[ensembl-dev] exon coordinate discrepancy between NCBI and Ensembl

Reece Hart reece at harts.net
Sat May 21 20:49:29 BST 2011

Dear devs-

NCBI and Ensembl return different genomic exon coordinates for NM_023035.2.
These differences lead to discrepancies when mapping variants in my own code
and at the Ensembl and NCBI web sites. I'd appreciate some help
understanding the origin of these differences.

The following is a diff of exon start,stop,length between e61 and NCBI.

< Ensembl 61 (NM_023035.2; ENST00000360228)
> NCBI (NM_023035.2)
< 13441058 13441147 90
> 13441058 13441150 93
< 13414360 13414427 68
> 13414351 13414427 77
> 13352335 13352340 6

e61 and e62 give identical results for this transcript. Relative to NCBI,
the Ensembl transcript is 12 nt shorter across two exons (3 nt and 9 nt)
and lacks the 6 nt terminal exon entirely.
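
For reference, the arithmetic behind those numbers can be checked with a
minimal Python sketch. It assumes the coordinates in the diff are 1-based
and inclusive (consistent with the lengths shown) and pairs exons by
genomic overlap. The function names are illustrative only and are not
taken from the attached scripts.

```python
# Exon (start, end) pairs copied from the diff; only the differing exons are listed.
ensembl = [(13441058, 13441147), (13414360, 13414427)]
ncbi = [(13441058, 13441150), (13414351, 13414427), (13352335, 13352340)]


def exon_length(start, end):
    """Length of an exon, assuming 1-based inclusive coordinates."""
    return end - start + 1


def overlaps(a, b):
    """True if two (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare(ensembl_exons, ncbi_exons):
    """Return (nt missing from overlapping Ensembl exons, NCBI exons with no Ensembl match)."""
    shortfall = 0
    missing = []
    for n in ncbi_exons:
        matches = [e for e in ensembl_exons if overlaps(e, n)]
        if matches:
            shortfall += exon_length(*n) - exon_length(*matches[0])
        else:
            missing.append(n)
    return shortfall, missing


shortfall, missing = compare(ensembl, ncbi)
print(shortfall)  # nt by which the two shared exons are shorter in Ensembl
print(missing)    # NCBI exons that have no overlapping Ensembl exon
```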

This discrepancy between Ensembl and NCBI is also apparent in differences at
the Ensembl and NCBI web sites. For example, both concur that rs58729888 is
located at chr19:g.13368278, but NCBI maps it to NM_023035.2:r.4724,
NP_075461.2:p.1496V>V [1] whereas Ensembl 62 maps it to
ENST00000360228:r.4712, p.1492 [2]. ENST..228 is the transcript retrieved
from Ensembl using NM_023035.2 as an external reference, so I presume that
they're intended to be identical. Note that the mapping difference of 12 nt
(4724 - 4712) equals the sum of the length differences in the exon diff.
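
The link between the two observations can be sketched as follows. This
assumes the transcript lies on the minus strand, which is inferred from
the descending order of the exon coordinates in the diff and is not
stated explicitly here. Under that assumption, exons at higher genomic
coordinates than the variant come earlier in the transcript, so any
bases they lack shift the variant's transcript position. The function
name is illustrative only.

```python
variant_pos = 13368278  # chr19 position of rs58729888, as agreed by both sites

# (NCBI start, NCBI end, NCBI length minus Ensembl length) for the two exons that differ
shortened_exons = [
    (13441058, 13441150, 93 - 90),
    (13414351, 13414427, 77 - 68),
]


def upstream_shift(pos, exons):
    """Sum the length deficits of exons that precede `pos` in transcript order.

    Assumes a minus-strand transcript: an exon precedes the variant when
    its genomic start is greater than the variant position.
    """
    return sum(delta for start, _end, delta in exons if start > pos)


shift = upstream_shift(variant_pos, shortened_exons)
print(shift)        # nt lost upstream of the variant
print(4724 - 4712)  # difference between the NCBI and Ensembl r. positions
```

Both shortened exons lie upstream of the variant under this assumption,
while the missing terminal exon (13352335 to 13352340) lies downstream
and so would not affect the r. position.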

Thanks for any help in understanding the origin of this difference between
Ensembl and NCBI.

The code I used to extract exon coordinates from NCBI and Ensembl is
attached; if the attachments fail, it's also at
http://pastebin.com/Vuf55x2t and http://pastebin.com/G9sqgZqg.


[1] http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs58729888
