[ensembl-dev] Compara: Inconsistent results in LastZ alignments?
Matthieu Muffato
muffato at ebi.ac.uk
Wed Dec 21 12:03:57 GMT 2016
Hi Marc,
Can you please clarify something? When I try to get the human
alignments for the mouse region CHR_MG153_PATCH: 4,078,298-4,078,512 I
get 175 alignment blocks in total. Only 1 of them is on human chr 3, but
at the correct position: 3:64071783-64072021
(I've used this script
http://www.ebi.ac.uk/~muffato/workshops/2016_06_Cambridge/solutions_compara/gab1.pl
)
In theory, the (initial) pairwise alignment of any two species only
includes the primary assembly. Then, when patches are released, we
top up the alignments with the patches. This process isn't great for
mouse patches on human vs mouse because human is the reference genome
for our lastz pipeline. If all the sequences were considered at the same
time, you'd indeed expect the chaining and netting steps to filter out
such small alignments to only keep the principal ones, but here, because
the pairwise alignment is done first without any mouse patches, and then
only with the mouse patches, it can't filter out the secondary
alignments properly.
Perhaps in these cases, we should redo the human->mouse alignment entirely.
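
To illustrate why the two-pass approach lets secondary alignments through, here is a toy sketch in Python. It is not the real chaining/netting code: `toy_net` is a hypothetical greedy filter that keeps, for each stretch of the reference, only the best-scoring alignment, and all coordinates, names and scores are made up.

```python
from typing import List, NamedTuple


class Aln(NamedTuple):
    """One alignment, described by its span on the reference (human) genome."""
    ref_start: int
    ref_end: int
    query: str   # hypothetical label for the mouse sequence
    score: int


def overlaps(a: Aln, b: Aln) -> bool:
    """True if the two alignments share at least one reference position."""
    return a.ref_start <= b.ref_end and b.ref_start <= a.ref_end


def toy_net(alignments: List[Aln]) -> List[Aln]:
    """Toy stand-in for netting (an assumption, not the real algorithm):
    visit alignments from best to worst score and keep one only if it does
    not overlap an alignment that was already kept."""
    kept: List[Aln] = []
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        if not any(overlaps(aln, k) for k in kept):
            kept.append(aln)
    return sorted(kept, key=lambda a: a.ref_start)


# Made-up data: one strong primary-assembly alignment and one weak patch hit
# that falls on reference positions the primary alignment already covers.
primary = [Aln(100, 900, "mouse_primary", 5000)]
patch = [Aln(400, 600, "mouse_patch", 300)]

# All sequences considered together: the weak patch hit is filtered out.
joint = toy_net(primary + patch)

# Primary first, patches topped up later: each pass only sees its own
# alignments, so the weak patch hit has no competitor and survives.
two_pass = toy_net(primary) + toy_net(patch)

print([a.query for a in joint])
print([a.query for a in two_pass])
```

In this toy model the joint run keeps only the primary alignment, whereas the two-pass run keeps both, which is the behaviour described above.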
Matthieu
On 21/12/16 07:23, Marc P Hoeppner wrote:
> Dear EnsEMBL team,
>
> I have been using LastZ alignments to check for locus conservation
> between human and mouse. However, I have come across an issue that seems
> to be related to the inclusion of ALT loci in the alignments.
>
> Specifically, when comparing human<->mouse for the human snoRNA
> ENSG00000200222, I get the mouse U3 snoRNA ENSMUSG00000097846 as the
> matching locus. However, this annotation sits on an ALT assembly. When I
> do the comparison in reverse (mouse<->human), the mouse U3 snoRNA aligns
> to the human locus ENSG00000212211 (same chromosome as the original
> human U3 query, but 4 Mbp off).
>
> I suppose that shouldn't happen and may be related to these snoRNAs
> being repetitive sequences. Still, I would have guessed that the genomic
> context (i.e. neighboring coding genes) should provide some guidance to
> how these loci ought to be aligned? Is this a LastZ problem? Wouldn't it
> perhaps be more sensible to exclude ALT assemblies until these
> alignments can be represented as graphs rather than flattened pairwise
> comparisons?
>
> Kind regards,
>
> Marc