[ensembl-dev] API question about Multi Alignments (technical!)

Marc Hoeppner mphoeppner at gmail.com
Fri Jan 25 09:42:15 GMT 2013


Dear EnsEMBL team,

I am currently trying to convert the 19-vertebrate PECAN alignment into
something resembling a pairwise synteny map (I know there are ways to get
the actual pairwise synteny, but I have something else I want to test on
this data). To elaborate:

The goal is to find out which regions are aligned between human and each
of the other 18 species, with base-pair precision for the start and end of
each aligned region in genomic coordinates. The output should look like:

Human_chr human_chr_start human_chr_end target_chr target_chr_start target_chr_end score target_chr_strand
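For concreteness, one output row can be modeled as a small record type. This is a minimal sketch in Python rather than the Perl of the attached script. The class and field names are hypothetical, and so are the conventions it assumes: 1-based inclusive coordinates, with start <= end on both species and the target's orientation carried only in the strand column. These follow how Ensembl usually reports genomic positions, but they should be checked against the actual data. The score is just a placeholder field.

```python
from typing import NamedTuple


class AlignedBlock(NamedTuple):
    """One output row: a human interval and the target interval aligned to it.

    Hypothetical record type; coordinates are assumed 1-based and inclusive.
    """
    human_chr: str
    human_start: int
    human_end: int
    target_chr: str
    target_start: int   # assumed <= target_end; orientation lives in target_strand
    target_end: int
    score: float
    target_strand: int  # +1 or -1

    def to_tsv(self) -> str:
        """Render the fields as one tab-separated line, in declaration order."""
        return "\t".join(str(value) for value in self)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    row = AlignedBlock("17", 1000, 1049, "X", 5200, 5249, 0.0, -1)
    print(row.to_tsv())
```

Keeping start <= end and storing the strand separately means rows from forward- and reverse-strand targets can be sorted and compared in the same way.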

However, this turns out to be less straightforward than I had hoped. The
main steps, I suspect, would be:

1. Deconstruct the aligned sequences (AlignSlice) into their underlying
   AlignSlice::Slices.
2. Determine the genomic coordinates of the underlying genomic slices (many
   genomic slices can make up one AlignSlice::Slice, possibly separated by
   larger GAP slices).
3. Convert these genomic coordinates back to the AlignSlice::Slice, to get
   their positions in the fake alignment coordinate system.
4. Translate these alignment coordinates into human genomic coordinates via
   the human AlignSlice::Slice.
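The coordinate bookkeeping behind these steps can be sketched without the Ensembl API. This minimal Python sketch deliberately takes a different, simpler route from the steps above. Instead of going through the stitched AlignSlice::Slice, it works on one alignment block at a time. For the human row and one target row, it takes the gapped aligned strings together with their genomic start, end and strand, then walks the alignment columns and emits the ungapped stretches where both species have a base. The function names are hypothetical. In the Perl Compara API, the per-row inputs probably correspond to a GenomicAlign's `aligned_sequence`, `dnafrag_start`, `dnafrag_end` and `dnafrag_strand`, but that mapping is an assumption to check against the API documentation. The sketch also assumes that '-' is the only gap character and that the human row is on the forward strand. It does not compute a score.

```python
from typing import List, Optional, Tuple

GAP = "-"  # assumed to be the only gap character


def column_positions(row: str, start: int, end: int, strand: int) -> List[Optional[int]]:
    """Map each alignment column of one gapped row to a genomic position.

    start/end are the 1-based inclusive genomic bounds of the row's bases, with
    start <= end. On strand -1 the row is assumed to be reverse-complemented, so
    its first base corresponds to `end`. Gap columns map to None.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    n_bases = sum(1 for ch in row if ch != GAP)
    if n_bases != end - start + 1:
        raise ValueError("ungapped row length does not match start..end")
    pos, step = (start, 1) if strand == 1 else (end, -1)
    positions: List[Optional[int]] = []
    for ch in row:
        if ch == GAP:
            positions.append(None)
        else:
            positions.append(pos)
            pos += step
    return positions


def aligned_blocks(human_row: str, h_start: int, h_end: int,
                   target_row: str, t_start: int, t_end: int,
                   t_strand: int) -> List[Tuple[int, int, int, int]]:
    """Return the maximal ungapped runs shared by the two rows.

    Each tuple is (human_start, human_end, target_start, target_end), with
    target_start <= target_end. The human row is assumed to be on the forward
    strand. A gap in only one row ends the current run. A column gapped in both
    rows (e.g. an insertion in some third species) is skipped without ending it.
    """
    if len(human_row) != len(target_row):
        raise ValueError("rows must have the same alignment length")
    hpos = column_positions(human_row, h_start, h_end, 1)
    tpos = column_positions(target_row, t_start, t_end, t_strand)

    blocks: List[Tuple[int, int, int, int]] = []
    first: Optional[Tuple[int, int]] = None  # (human, target) at the start of the run
    last: Optional[Tuple[int, int]] = None   # (human, target) at the end of the run

    def close_run() -> None:
        (h0, t0), (h1, t1) = first, last
        blocks.append((h0, h1, min(t0, t1), max(t0, t1)))

    for h, t in zip(hpos, tpos):
        if h is None and t is None:
            continue  # neither sequence advances in this column
        if h is not None and t is not None:
            if first is None:
                first = (h, t)
            last = (h, t)
        elif first is not None:
            close_run()  # one-sided gap: the current ungapped run ends here
            first = last = None
    if first is not None:
        close_run()
    return blocks


if __name__ == "__main__":
    print(aligned_blocks("ACGT-A", 101, 105, "AC-TTA", 201, 205, 1))
    # [(101, 102, 201, 202), (104, 104, 203, 203), (105, 105, 205, 205)]
```

Each returned tuple fills the four coordinate columns of one output row. Because every run is ungapped in both species, the human and target intervals always have the same length.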

However, it turns out that for each genomic slice I basically end up with
the global coordinates of the AlignSlice::Slice I started from. That would
only make sense if each AlignSlice::Slice consisted of exactly one
genomic_align, aligned full-length to the human query, which I doubt.
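This symptom is what one would expect if every underlying genomic piece were mapped using the bounds of the whole AlignSlice::Slice rather than its own offset. A hypothetical Python model of a stitched row makes the per-segment lookup explicit. The row is represented as a sorted list of segments separated by GAP padding. This is a simplification and is not how the Ensembl API represents an AlignSlice::Slice internally. Each segment is assumed to be ungapped, so one alignment column consumes exactly one genomic base. `Segment` and `to_genomic` are invented names.

```python
import bisect
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One genomic piece of a stitched alignment row (hypothetical model)."""
    aln_start: int  # first alignment coordinate covered (1-based, inclusive)
    aln_end: int    # last alignment coordinate covered
    chrom: str
    gen_start: int  # lowest genomic coordinate of the piece (1-based)
    strand: int     # +1 or -1


def to_genomic(segments: List[Segment], aln_pos: int) -> Optional[Tuple[str, int]]:
    """Map an alignment coordinate to (chrom, position), or None inside GAP padding.

    `segments` must be sorted by aln_start and must not overlap. Each segment is
    assumed to be ungapped, so it spans aln_end - aln_start + 1 genomic bases.
    """
    starts = [seg.aln_start for seg in segments]
    i = bisect.bisect_right(starts, aln_pos) - 1
    if i < 0 or aln_pos > segments[i].aln_end:
        return None  # before the first piece, or in padding between pieces
    seg = segments[i]
    offset = aln_pos - seg.aln_start
    if seg.strand == 1:
        return seg.chrom, seg.gen_start + offset
    # Reverse strand: the piece's first alignment column corresponds to its genomic end.
    gen_end = seg.gen_start + (seg.aln_end - seg.aln_start)
    return seg.chrom, gen_end - offset


if __name__ == "__main__":
    # Made-up row: two pieces separated by ten columns of GAP padding (11..20).
    row = [Segment(1, 10, "3", 500, 1), Segment(21, 30, "3", 900, -1)]
    for pos in (1, 10, 15, 21, 30):
        print(pos, to_genomic(row, pos))
```

In this model, using the row's overall span (columns 1 to 30) in place of each segment's own `gen_start` and `strand` would reproduce the "global coordinates" effect.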

Clearly, I am missing something, somewhere ;) Perhaps there is some padding
going on that I can’t seem to get rid of?

I have attached the script in case anyone has an idea of how to do this
correctly.



Cheers,



Marc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DumpMultiAlignments.dev_list.pl
Type: application/octet-stream
Size: 5759 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20130125/e61140fc/attachment.obj>

