[ensembl-dev] Compara: Inconsistent results in LastZ alignments?
Marc P Hoeppner
mphoeppner at gmail.com
Wed Dec 21 13:05:12 GMT 2016
Dear Matthieu,
I actually used the website to track down this issue:
Human-> Mouse:
http://www.ensembl.org/Homo_sapiens/Location/Compara_Alignments/Image?align=677&db=core&g=ENSG00000200222&r=3%3A64071807-64072019&t=ENST00000363352
And then the same in reverse, hitting a different human snoRNA:
http://www.ensembl.org/Mus_musculus/Location/Compara_Alignments/Image?align=677&db=core&g=ENSMUSG00000097846&r=CHR_MG153_PATCH%3A4078298-4078512&t=ENSMUST00000181407
I am using AlignSlice and AlignSlice::Slice objects to scan each locus,
so maybe that is hiding some of the underlying details (as compared to
genomic_align_blocks)?
Snippet, using the human-mouse LASTZ_NET MLSS:
my $gene_adaptor = $registry->get_adaptor("human", "Core", "Gene");
my $gene = $gene_adaptor->fetch_by_stable_id("ENSG00000200222");
my $slice = $gene->feature_Slice;
my $align_slice = $align_slice_adaptor->fetch_by_Slice_MethodLinkSpeciesSet(
    $slice,
    $method_link_species_set,
    "expanded"
);
# The slices making up this AlignSlice
my $sub_slices = $align_slice->get_all_Slices;
foreach my $slice (@$sub_slices) {
    print $slice->genome_db->name . "\n";
    foreach my $sg (@{$slice->get_all_Genes_by_source("RFAM")}) {
        print "\t" . $sg->stable_id . "\n";
    }
}
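
For comparison with the AlignSlice view above, here is a minimal sketch of listing the underlying genomic_align_blocks for a gene directly. It is not the gab1.pl script mentioned below. It assumes the public database server (ensembldb.ensembl.org, anonymous user), the registry aliases "human" and "mouse", and that the stable IDs from this thread still exist in the release being queried. The helper names region_string and list_blocks are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: format one aligned region as
# "species seq_name:start-end (strand)".
sub region_string {
    my ($species, $seq_name, $start, $end, $strand) = @_;
    return sprintf("%s %s:%d-%d (%s)",
        $species, $seq_name, $start, $end, $strand >= 0 ? "+" : "-");
}

# Hypothetical helper: print every human-mouse LASTZ_NET block that
# overlaps the given gene, one line per aligned region in the block.
sub list_blocks {
    my ($stable_id, $species) = @_;
    $species //= "human";

    # Loaded lazily so the script compiles without the Ensembl API installed.
    require Bio::EnsEMBL::Registry;
    my $registry = "Bio::EnsEMBL::Registry";
    $registry->load_registry_from_db(
        -host => "ensembldb.ensembl.org",   # assumption: public server
        -user => "anonymous",
    );

    my $gene = $registry->get_adaptor($species, "Core", "Gene")
        ->fetch_by_stable_id($stable_id)
        or die "Gene $stable_id not found for $species\n";

    my $mlss = $registry->get_adaptor("Multi", "compara", "MethodLinkSpeciesSet")
        ->fetch_by_method_link_type_registry_aliases("LASTZ_NET", ["human", "mouse"]);

    my $gabs = $registry->get_adaptor("Multi", "compara", "GenomicAlignBlock")
        ->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, $gene->feature_Slice);

    printf "%d block(s) overlap %s\n", scalar(@$gabs), $stable_id;
    foreach my $gab (@$gabs) {
        my @aligns = ($gab->reference_genomic_align,
                      @{ $gab->get_all_non_reference_genomic_aligns });
        foreach my $ga (@aligns) {
            print "\t", region_string(
                $ga->genome_db->name, $ga->dnafrag->name,
                $ga->dnafrag_start,   $ga->dnafrag_end,
                $ga->dnafrag_strand,
            ), "\n";
        }
        print "\n";
    }
}

# Query the database only when a gene ID is given on the command line,
# e.g. "perl list_gabs.pl ENSMUSG00000097846 mouse".
list_blocks(@ARGV) if @ARGV;
```

Running it once with the human gene and once with the mouse gene should show whether the two directions report different sets of blocks, which the AlignSlice projection may not make visible.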
On 21.12.2016 13:03, Matthieu Muffato wrote:
> Hi Marc,
>
> Can you please clarify something? When I try to get the human
> alignments for the mouse region CHR_MG153_PATCH: 4,078,298-4,078,512 I
> get 175 alignment blocks in total. Only 1 of them is on human chr 3,
> but at the correct position: 3:64071783-64072021
> (I've used this script
> http://www.ebi.ac.uk/~muffato/workshops/2016_06_Cambridge/solutions_compara/gab1.pl
> )
>
> In theory, the (initial) pairwise alignment of any two species only
> includes the primary assembly. Then, when patches are released, we
> top-up the alignments with the patches. This process isn't great for
> mouse patches on human vs mouse because human is the reference genome
> for our lastz pipeline. If all the sequences were considered at the
> same time, you'd indeed expect the chaining and netting steps to
> filter out such small alignments to only keep the principal ones, but
> here, because the pairwise alignment is done first without any mouse
> patches, and then only with the mouse patches, it can't filter out the
> secondary alignments properly.
> Perhaps in these cases, we should redo the human->mouse alignment
> entirely.
>
> Matthieu
>
> On 21/12/16 07:23, Marc P Hoeppner wrote:
>> Dear EnsEMBL team,
>>
>> I have been using LastZ alignments to check for locus conservation
>> between human and mouse. However, I have come across an issue that seems
>> to be related to the inclusion of ALT loci in the alignments.
>>
>> Specifically, when comparing human<->mouse for the human snoRNA
>> ENSG00000200222, I get the mouse U3 snoRNA ENSMUSG00000097846 as the
>> matching locus. However, this annotation sits on an ALT assembly. When I
>> do the comparison in reverse (mouse<->human), the mouse U3 snoRNA aligns
>> to the human locus ENSG00000212211 (same chromosome as the original
>> human U3 query, but 4 Mbp off).
>>
>> I suppose that shouldn't happen and may be related to these snoRNAs
>> being repetitive sequences. Still, I would have guessed that the genomic
>> context (i.e. neighboring coding genes) should provide some guidance to
>> how these loci ought to be aligned? Is this a LastZ problem? Wouldn't it
>> perhaps be more sensible to exclude ALT assemblies until these
>> alignments can be represented as graphs rather than flattened pairwise
>> comparisons?
>>
>> Kind regards,
>>
>> Marc
>