[ensembl-dev] Compara: Inconsistent results in LastZ alignments?

Matthieu Muffato muffato at ebi.ac.uk
Wed Dec 21 12:03:57 GMT 2016


Hi Marc,

Can you please clarify something? When I try to get the human
alignments for the mouse region CHR_MG153_PATCH: 4,078,298-4,078,512, I
get 175 alignment blocks in total. Only one of them is on human chr 3,
but it is at the correct position: 3:64071783-64072021
(I used this script:
http://www.ebi.ac.uk/~muffato/workshops/2016_06_Cambridge/solutions_compara/gab1.pl
)

In theory, the (initial) pairwise alignment of any two species only 
includes the primary assembly. Then, when patches are released, we 
top-up the alignments with the patches. This process isn't great for 
mouse patches on human vs mouse because human is the reference genome 
for our lastz pipeline. If all the sequences were considered at the same 
time, you'd indeed expect the chaining and netting steps to filter out 
such small alignments to only keep the principal ones, but here, because 
the pairwise alignment is done first without any mouse patches, and then 
only with the mouse patches, it can't filter out the secondary 
alignments properly.
Perhaps in these cases, we should redo the human->mouse alignment entirely.
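
To make the filtering argument above concrete, here is a toy sketch in
Python. It is not the Compara pipeline and not real chaining/netting: the
"net" step is reduced to keeping the best-scoring hit per human locus, and
every name and score in it (Hit, best_per_human_locus, human_A, human_B,
the numbers) is made up for illustration only.

```python
from collections import namedtuple

# Hypothetical record: one alignment between a human locus and a mouse sequence.
Hit = namedtuple("Hit", "human_locus mouse_seq score")


def best_per_human_locus(hits):
    """Keep only the top-scoring hit for each human locus.

    This is a crude stand-in for netting with human as the reference.
    """
    best = {}
    for hit in hits:
        current = best.get(hit.human_locus)
        if current is None or hit.score > current.score:
            best[hit.human_locus] = hit
    return sorted(best.values())


# Made-up scores: the patch has one genuine match and one weak, repeat-like match.
primary_hits = [
    Hit("human_A", "mouse_primary", 900),
    Hit("human_B", "mouse_primary", 850),
]
patch_hits = [
    Hit("human_A", "mouse_patch", 950),  # genuine match for the patch
    Hit("human_B", "mouse_patch", 40),   # weak secondary match
]

# All mouse sequences considered together: the weak patch hit loses to the
# stronger primary hit on the same human locus and is dropped.
one_pass = best_per_human_locus(primary_hits + patch_hits)

# Primary assembly first, then a top-up with patches only: the weak patch hit
# never competes with the primary hit, so it is kept.
two_pass = best_per_human_locus(primary_hits) + best_per_human_locus(patch_hits)

print(len(one_pass), "hits kept in one pass")
print(len(two_pass), "hits kept in two passes")
```

In this toy, the weak human_B/mouse_patch hit is filtered out in the
one-pass case but kept in the two-pass case, which is the effect I mean
by the secondary alignments not being filtered out properly.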

Matthieu

On 21/12/16 07:23, Marc P Hoeppner wrote:
> Dear EnsEMBL team,
>
> I have been using LastZ alignments to check for locus conservation
> between human and mouse. However, I have come across an issue that seems
> to be related to the inclusion of ALT loci in the alignments.
>
> Specifically, when comparing human<->mouse for the human snoRNA
> ENSG00000200222, I get the mouse U3 snoRNA ENSMUSG00000097846 as the
> matching locus. However, this annotation sits on an ALT assembly. When I
> do the comparison in reverse (mouse<->human), the mouse U3 snoRNA aligns
> to the human locus ENSG00000212211 (same chromosome as the original
> human U3 query, but 4 Mbp off).
>
> I suppose that shouldn't happen and may be related to these snoRNAs
> being repetitive sequences. Still, I would have guessed that the genomic
> context (i.e. neighboring coding genes) should provide some guidance to
> how these loci ought to be aligned? Is this a LastZ problem? Wouldn't it
> perhaps be more sensible to exclude ALT assemblies until these
> alignments can be represented as graphs rather than flattened pairwise
> comparisons?
>
> Kind regards,
>
> Marc



