[ensembl-dev] exon coordinate discrepancy between NCBI and Ensembl

Kiran Mukhyala mukhyala at gmail.com
Mon May 23 17:43:43 BST 2011


Hi Reece,

I'll take a shot at this, since you haven't received any replies yet.

As far as I understand the Ensembl model, a refseq_dna entry (an external
reference) linked to an Ensembl transcript does not mean that their exons are
identical.
NCBI's mapping of NM_023035.2 to the genome could differ from
Ensembl's mapping of ENST00000360228 because the mapping methods are
different, and the underlying sequences could also be different.
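
For what it's worth, the numbers in your diff look internally consistent.
Here is a minimal Python sketch (mine, not taken from your attached scripts)
that recomputes the exon lengths from the coordinates you posted. It assumes
1-based inclusive start/stop values; the dictionary layout and the name
exon_length are only illustrative.

```python
# Coordinates copied from the quoted diff; assumed 1-based, inclusive.
# Keys are the line numbers reported by diff (11 and 18), used only as labels.
ENSEMBL = {11: (13441058, 13441147), 18: (13414360, 13414427)}
NCBI = {11: (13441058, 13441150), 18: (13414351, 13414427)}
NCBI_ONLY_EXON = (13352335, 13352340)  # present in NCBI, absent from Ensembl


def exon_length(start, stop):
    """Length of an exon given 1-based inclusive coordinates."""
    return stop - start + 1


# Per-exon length difference (NCBI minus Ensembl) for the two exons that differ.
deltas = {k: exon_length(*NCBI[k]) - exon_length(*ENSEMBL[k]) for k in ENSEMBL}
total_delta = sum(deltas.values())

# Offsets between the positions reported for rs58729888 at the two sites.
r_offset = 4724 - 4712  # transcript (r.) positions
p_offset = 1496 - 1492  # protein (p.) positions

print(deltas)
print(total_delta)
print(r_offset, p_offset, exon_length(*NCBI_ONLY_EXON))
```

The summed exon difference equals the offset between the two r. positions,
and that offset is a whole number of codons matching the shift in the p.
position. So the variant mapping difference appears to follow directly from
the two shorter exons rather than from a separate problem.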

Is there any other reason you expect the transcript models to be identical?

-Kiran

On Sat, May 21, 2011 at 12:48 PM, Reece Hart <reece at harts.net> wrote:

> Dear devs-
>
> NCBI and Ensembl return different genomic exon coordinates for NM_023035.2.
> These differences lead to discrepancies when mapping variants in my own code
> and at the Ensembl and NCBI web sites. I'd appreciate some help
> understanding the origin of these differences.
>
> The following is a diff of exon start,stop,length between e61 and NCBI.
>
> 1c1
> < Ensembl 61 (NM_023035.2; ENST00000360228)
> ---
> > NCBI (NM_023035.2)
> 11c11
> < 13441058 13441147 90
> ---
> > 13441058 13441150 93
> 18c18
> < 13414360 13414427 68
> ---
> > 13414351 13414427 77
> 32a33
> > 13352335 13352340 6
>
>
> e61 and e62 give identical results for this transcript. There is a net loss
> of 12 nt in two exons, and the complete absence of the terminal exon.
>
> This discrepancy between Ensembl and NCBI is also apparent in differences
> at the Ensembl and NCBI web sites. For example, both concur that rs58729888
> is located at chr19:g.13368278, but NCBI maps it to NM_023035.2:r.4724,
> NP_075461.2:p.1496V>V [1] whereas Ensembl 62 maps it to
> ENST00000360228:r.4712, p.1492 [2]. ENST..228 is the transcript retrieved
> from Ensembl using NM_023035.2 as an external reference, so I presume that
> they're intended to be identical. Note the mapping difference of 12nt is the
> same as the sum of the length differences in the exon diffs.
>
> Thanks for any help in understanding the origin of this difference between
> Ensembl and NCBI.
>
> The code I used to extract exon coordinates from NCBI and Ensembl is
> attached; if the attachments fail, they're also at
> http://pastebin.com/Vuf55x2t and http://pastebin.com/G9sqgZqg.
>
> -Reece
>
>
> [1] http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs58729888
> [2] http://www.ensembl.org/Homo_sapiens/Variation/Mappings?db=core;g=ENSG00000141837;r=19:13317256-13617274;t=ENST00000360228;v=rs58729888;vdb=variation;vf=31287499