[ensembl-dev] exon coordinate discrepancy between NCBI and Ensembl

Susan Fairley sf7 at sanger.ac.uk
Mon May 23 17:47:55 BST 2011

Hi Reece,

Looking at the Ensembl transcript, ENST00000360228, here:


I noticed that it is one of two CCDS transcripts for the gene.

Checking the exon coordinates at NCBI for the CCDS, the structures seem 
to be the same.


Looking further at the Ensembl transcript, NM_023035.2 is listed as one 
of the pieces of evidence that were aligned to the genome and used to 
construct the transcript ENST00000360228 during the Ensembl genebuild 
process (the three points where the evidence extends beyond the 
transcript structure are highlighted).


External identifiers are associated with genes at the end of the 
genebuild process, on the basis of sequence similarity. It would be at 
this stage that the ID NM_023035.2 became associated with the transcript 
(ENST00000360228) that it contributed to building. As I understand 
things, an exact match is not necessary for an external ID to be 
associated with an Ensembl transcript.

You note that both Ensembl and NCBI map rs58729888 to the same genomic 
position. Because the two transcript structures you are comparing 
differ, the position of rs58729888 within each transcript also differs, 
even though the genomic location is the same.
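
To illustrate the point, here is a minimal Python sketch (not the code 
Ensembl or NCBI actually use) of how one genomic position can yield two 
transcript positions. It assumes a minus-strand transcript, since the 
exon coordinates in your diff decrease along the transcript. The first 
two exons in each list are the differing exons from your diff; the third 
exon, containing rs58729888, has made-up boundaries purely for 
illustration, and all other exons are omitted, so the absolute offsets 
are not the real r. positions. The function name is hypothetical.

```python
def transcript_offset(genomic_pos, exons, strand):
    """Return the 1-based position of genomic_pos within the spliced
    transcript, or None if the position is not inside any exon.

    exons: (start, end) pairs, 1-based inclusive genomic coordinates,
           listed in transcript (5' to 3') order.
    strand: +1 or -1.
    """
    upstream = 0  # transcript bases contributed by exons already passed
    for start, end in exons:
        if start <= genomic_pos <= end:
            if strand == -1:
                return upstream + (end - genomic_pos) + 1
            return upstream + (genomic_pos - start) + 1
        upstream += end - start + 1
    return None


SNP = 13368278  # genomic position of rs58729888, from the thread

# HYPOTHETICAL exon around the SNP; real boundaries are not in the thread.
snp_exon = (13368200, 13368300)

# The two differing exons, taken from the diff in the original message.
ensembl_exons = [(13441058, 13441147), (13414360, 13414427), snp_exon]
ncbi_exons = [(13441058, 13441150), (13414351, 13414427), snp_exon]

ensembl_pos = transcript_offset(SNP, ensembl_exons, -1)
ncbi_pos = transcript_offset(SNP, ncbi_exons, -1)
shift = ncbi_pos - ensembl_pos

print("Ensembl toy offset:", ensembl_pos)
print("NCBI toy offset:", ncbi_pos)
print("shift in nt:", shift, "shift in codons:", shift // 3)
```

In this toy model the two offsets differ by 12 nt, i.e. 4 codons, which 
is the same size as the r.4724 versus r.4712 and p.1496 versus p.1492 
differences you report.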

I'm not sure that this directly answers your question but I hope it may 
be of some assistance.

Kind regards,

Reece Hart wrote:
> Dear devs-
> NCBI and Ensembl return different genomic exon coordinates for 
> NM_023035.2. These differences lead to discrepancies when mapping 
> variants in my own code and at the Ensembl and NCBI web sites. I'd 
> appreciate some help understanding the origin of these differences.
> The following is a diff of exon start,stop,length between e61 and NCBI. 
>     1c1
>     < Ensembl 61 (NM_023035.2; ENST00000360228)
>     ---
>     > NCBI (NM_023035.2)
>     11c11
>     < 13441058 13441147 90
>     ---
>     > 13441058 13441150 93
>     18c18
>     < 13414360 13414427 68
>     ---
>     > 13414351 13414427 77
>     32a33
>     > 13352335 13352340 6
> e61 and e62 give identical results for this transcript. There is a net 
> loss of 12 nt in two exons, and the complete absence of the terminal exon. 
> This discrepancy between Ensembl and NCBI is also apparent in 
> differences at the Ensembl and NCBI web sites. For example, both concur 
> that rs58729888 is located at chr19:g.13368278, but NCBI maps it 
> to NM_023035.2:r.4724, NP_075461.2:p.1496V>V [1] whereas Ensembl 62 maps 
> it to ENST00000360228:r.4712, p.1492 [2]. ENST..228 is the transcript 
> retrieved from Ensembl using NM_023035.2 as an external reference, so I 
> presume that they're intended to be identical. Note the mapping 
> difference of 12nt is the same as the sum of the length differences in 
> the exon diffs. 
> Thanks for any help in understanding the origin of this difference 
> between Ensembl and NCBI.
> The code I used to extract exon coordinates from NCBI and Ensembl 
> is attached; if the attachments fail, it is also at 
> http://pastebin.com/Vuf55x2t and http://pastebin.com/G9sqgZqg.
> -Reece
> [1] http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs58729888
> [2] 
> http://www.ensembl.org/Homo_sapiens/Variation/Mappings?db=core;g=ENSG00000141837;r=19:13317256-13617274;t=ENST00000360228;v=rs58729888;vdb=variation;vf=31287499
