[ensembl-dev] Difference in genomic coordinates between REFSEQ and ENSEMBL

Duarte Molha duartemolha at gmail.com
Mon Feb 24 09:09:35 GMT 2014


Dear Developers...



I was wondering if any of you could help me with a problem I am having
comparing RefSeq with Ensembl transcripts...

I had assumed that the gene start and end coordinates in Ensembl were
obtained from the longest transcript model for each gene. However, this does
not seem to be the case when I compare a list of around 300 genes I have
queried.



Take a look at the example of transcript NM_001101426. In RefSeq this
transcript has the coordinates chr7:16127152-16460947. However, if you
search for it in Ensembl you get the transcript ENST00000407010 with
the coordinates chr7:16130817-16460947
<http://www.ensembl.org/Homo_sapiens/Location/View?r=7:16130817-16460947:-1>



If we assume that Ensembl uses the longest transcript to determine the
start and end coordinates, then the ISPD gene should start at 16127152
and not at 16130817. That is a difference of 3,665 bp, almost 4 kb. I
understand that the gene models are different and I would expect small
differences between the two, but not a difference of almost 4 kb. Can
you explain the discrepancy?
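
To make the arithmetic explicit, here is a minimal Python sketch of the
comparison I am doing for each gene. The names Interval and
boundary_offsets are made up for illustration and are not part of any
Ensembl or RefSeq API. The sketch also assumes both sources report
coordinates in the same convention.

```python
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # lower coordinate, as reported by the source
    end: int    # upper coordinate, as reported by the source


def boundary_offsets(a: Interval, b: Interval) -> Tuple[int, int]:
    """Return (start offset, end offset) of b relative to a, in bp."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    return b.start - a.start, b.end - a.end


# Coordinates quoted in this message.
refseq = Interval("chr7", 16127152, 16460947)   # NM_001101426
ensembl = Interval("chr7", 16130817, 16460947)  # ENST00000407010

start_diff, end_diff = boundary_offsets(refseq, ensembl)
print(start_diff, end_diff)  # 3665 0
```

Only the lower coordinate differs; the upper coordinate is identical. If
the ":-1" in the Ensembl URL above denotes the reverse strand, the lower
coordinate would be the 3' end of the transcript rather than its start.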

Best regards

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================