[ensembl-dev] Compara: Inconsistent results in LastZ alignments?

Marc P Hoeppner mphoeppner at gmail.com
Wed Dec 21 13:05:12 GMT 2016


Dear Matthieu,

I actually used the website to track down this issue:

Human-> Mouse:

http://www.ensembl.org/Homo_sapiens/Location/Compara_Alignments/Image?align=677&db=core&g=ENSG00000200222&r=3%3A64071807-64072019&t=ENST00000363352

And then the same in reverse, hitting a different human snoRNA:

http://www.ensembl.org/Mus_musculus/Location/Compara_Alignments/Image?align=677&db=core&g=ENSMUSG00000097846&r=CHR_MG153_PATCH%3A4078298-4078512&t=ENSMUST00000181407

I am using AlignSlice and AlignSlice::Slice objects to scan each locus, 
so maybe that is hiding some of the underlying details (as compared to 
genomic_align_blocks)?

Snippet, using the human-mouse LASTZ_NET MLSS:

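use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Setup that the snippet relies on but does not show. The public Ensembl
# MySQL server is an assumption; adjust to your own installation.
my $registry = "Bio::EnsEMBL::Registry";
$registry->load_registry_from_db(
           -host => "ensembldb.ensembl.org",
           -user => "anonymous"
       );

my $mlss_adaptor = $registry->get_adaptor("Multi", "compara", "MethodLinkSpeciesSet");
my $method_link_species_set = $mlss_adaptor->fetch_by_method_link_type_registry_aliases(
           "LASTZ_NET",
           ["human", "mouse"]
       );
my $align_slice_adaptor = $registry->get_adaptor("Multi", "compara", "AlignSlice");

my $gene_adaptor = $registry->get_adaptor("human", "Core", "Gene");
my $gene = $gene_adaptor->fetch_by_stable_id("ENSG00000200222");

my $slice = $gene->feature_Slice;

# The third argument is a flag; any true value requests the expanded alignment
my $align_slice = $align_slice_adaptor->fetch_by_Slice_MethodLinkSpeciesSet(
           $slice,
           $method_link_species_set,
           "expanded"
       );

# The slices making up this AlignSlice (one or more per species)
my $sub_slices = $align_slice->get_all_Slices;

foreach my $sub_slice (@$sub_slices) {
     print $sub_slice->genome_db->name . "\n";

     # List the RFAM-sourced genes projected onto this part of the alignment
     foreach my $sg (@{$sub_slice->get_all_Genes_by_source("RFAM")}) {
         print "\t" . $sg->stable_id . "\n";
     }
}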
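
To check whether AlignSlice is hiding detail, the underlying genomic_align_blocks can be listed directly. The following is only a sketch, not the script referred to elsewhere in this thread: it assumes the Ensembl Perl API is installed and that the public server ensembldb.ensembl.org is reachable, and the helper names describe_region and print_blocks_for_gene are made up for this illustration. A current database release will not necessarily reproduce the 2016 results.

```perl
use strict;
use warnings;

# Hypothetical helper: render one aligned region as "species seq_region:start-end(strand)".
sub describe_region {
    my ($species, $seq_region, $start, $end, $strand) = @_;
    return sprintf '%s %s:%d-%d(%s)',
        $species, $seq_region, $start, $end, ($strand < 0 ? '-' : '+');
}

# Hypothetical helper: print every human-mouse LASTZ_NET block overlapping a
# human gene, one block per line, reference region first.
sub print_blocks_for_gene {
    my ($registry, $stable_id) = @_;

    my $gene = $registry->get_adaptor('human', 'Core', 'Gene')
                        ->fetch_by_stable_id($stable_id);
    my $mlss = $registry->get_adaptor('Multi', 'compara', 'MethodLinkSpeciesSet')
                        ->fetch_by_method_link_type_registry_aliases(
                              'LASTZ_NET', ['human', 'mouse']);
    my $gabs = $registry->get_adaptor('Multi', 'compara', 'GenomicAlignBlock')
                        ->fetch_all_by_MethodLinkSpeciesSet_Slice(
                              $mlss, $gene->feature_Slice);

    foreach my $gab (@$gabs) {
        my @regions;
        foreach my $ga ($gab->reference_genomic_align,
                        @{ $gab->get_all_non_reference_genomic_aligns }) {
            push @regions, describe_region(
                $ga->genome_db->name, $ga->dnafrag->name,
                $ga->dnafrag_start, $ga->dnafrag_end, $ga->dnafrag_strand);
        }
        print join("\t", @regions), "\n";
    }
}

# Only query the database when the Ensembl API can be loaded (assumption:
# anonymous access to the public server).
if (eval { require Bio::EnsEMBL::Registry; 1 }) {
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    print_blocks_for_gene($registry, 'ENSG00000200222');
} else {
    warn "Ensembl Perl API not found; skipping the database query\n";
}
```

Each output line is one block, so a locus covered by several small blocks (for example one on the primary assembly and one on a patch) shows up as several lines rather than being merged into a single AlignSlice view.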


On 21.12.2016 13:03, Matthieu Muffato wrote:
> Hi Marc,
>
> Can you please clarify something? When I try to get the human 
> alignments for the mouse region CHR_MG153_PATCH:4,078,298-4,078,512, I 
> get 175 alignment blocks in total. Only 1 of them is on human chr 3, 
> but at the correct position: 3:64071783-64072021
> (I've used this script 
> http://www.ebi.ac.uk/~muffato/workshops/2016_06_Cambridge/solutions_compara/gab1.pl 
> )
>
> In theory, the (initial) pairwise alignment of any two species only 
> includes the primary assembly. Then, when patches are released, we 
> top up the alignments with the patches. This process isn't great for 
> mouse patches on human vs mouse because human is the reference genome 
> for our LastZ pipeline. If all the sequences were considered at the 
> same time, you'd indeed expect the chaining and netting steps to 
> filter out such small alignments and keep only the principal ones. 
> Here, however, the pairwise alignment is done first without any mouse 
> patches and then with the mouse patches alone, so it can't filter out 
> the secondary alignments properly.
> Perhaps in these cases, we should redo the human->mouse alignment 
> entirely.
>
> Matthieu
>
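
The two-pass effect described above can be illustrated with a toy model. This is a deliberately simplified sketch, not the Ensembl LastZ pipeline: "netting" is reduced to keeping the best-scoring chain per human (reference) locus, and all locus names, region names, and scores are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical candidate chains: each links a human locus to a mouse region.
my @chains = (
    { human => 'locusA', mouse => 'primary:region1', score => 9000 },
    { human => 'locusA', mouse => 'patch:region9',   score => 1200 },
    { human => 'locusB', mouse => 'primary:region2', score => 8000 },
    { human => 'locusB', mouse => 'patch:region9',   score => 1100 },
);

# Toy "netting": for each human locus keep only the best-scoring chain.
sub net {
    my @candidates = @_;
    my %best;
    for my $c (@candidates) {
        my $current = $best{ $c->{human} };
        $best{ $c->{human} } = $c
            if !$current || $c->{score} > $current->{score};
    }
    return map { $best{$_} } sort keys %best;
}

# All mouse sequences considered together: patch chains are outcompeted.
my @joint = net(@chains);

# Primary assembly first, patches topped up later: each pass is netted
# on its own, so the weaker patch chains have no competitor and survive.
my @two_pass = (
    net(grep { $_->{mouse} =~ /^primary:/ } @chains),
    net(grep { $_->{mouse} =~ /^patch:/   } @chains),
);

printf "joint: %d kept, two-pass: %d kept\n",
    scalar @joint, scalar @two_pass;
# prints "joint: 2 kept, two-pass: 4 kept"
```

In the joint run only the two primary-assembly chains survive, whereas the two-pass run also keeps both patch chains, which mirrors how one mouse patch region can end up aligned to several human loci.
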
> On 21/12/16 07:23, Marc P Hoeppner wrote:
>> Dear EnsEMBL team,
>>
>> I have been using LastZ alignments to check for locus conservation
>> between human and mouse. However, I have come across an issue that seems
>> to be related to the inclusion of ALT loci in the alignments.
>>
>> Specifically, when comparing human<->mouse for the human snoRNA
>> ENSG00000200222, I get the mouse U3 snoRNA ENSMUSG00000097846 as the
>> matching locus. However, this annotation sits on an ALT assembly. When I
>> do the comparison in reverse (mouse<->human), the mouse U3 snoRNA aligns
>> to the human locus ENSG00000212211 (same chromosome as the original
>> human U3 query, but 4 Mbp off).
>>
>> I suppose that shouldn't happen and may be related to these snoRNAs
>> being repetitive sequences. Still, I would have guessed that the genomic
>> context (i.e. neighboring coding genes) should provide some guidance on
>> how these loci ought to be aligned? Is this a LastZ problem? Wouldn't it
>> perhaps be more sensible to exclude ALT assemblies until these
>> alignments can be represented as graphs rather than flattened pairwise
>> comparisons?
>>
>> Kind regards,
>>
>> Marc
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: 
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




