[ensembl-dev] exon coordinate discrepancy between NCBI and Ensembl
reece at harts.net
Tue May 24 20:46:25 BST 2011
On Mon, May 23, 2011 at 2:00 PM, Paul Flicek <flicek at ebi.ac.uk> wrote:
> Once the Ensembl gene set is created we create "external references" to the
> RefSeq identifiers to identify those objects that are "biologically the
> same". Note however, that a RefSeq that corresponds to an Ensembl gene does
> not mean that these have identical placements on the genome assembly.
Thanks for your explanation. I understand that transcripts may be
different/similar/identical in many dimensions and that Ensembl made a
particular and reasonable choice about grouping transcripts by some
For whatever it's worth, my expectations were 1) fetch_all_by_external_name
would return a transcript that was fully consistent with the primary source
(rather than similar by some criteria to it); and 2) all transcript grouping
happened at the level of Ensembl Genes (which also groups transcripts).
Because it's so convenient to code for Ensembl, I'd still like to see if
there's a way to accomplish what I want with Ensembl. The goal is to convert
HGVS variants specified using NCBI accessions between genomic, raw
transcript (i.e., 'r.' variants), CDS, and protein coordinate systems. To
achieve accurate conversion in the general case, it is necessary to have a
single, shared understanding of the exon structure, accurate to nucleotide
level, as implied by the named transcript. Exon-level similarity, even when
the CDS is unchanged, doesn't cut it in this case.
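To illustrate why exon-level similarity is not enough, here is a minimal
Python sketch (not Ensembl API code) that maps a transcript position to a
genomic position from a list of exons. The function name, the exon
coordinates, and the plus-strand-only simplification are all assumptions
made up for this example; they are not taken from NCBI or Ensembl data.

```python
from typing import List, Tuple

# (genomic_start, genomic_end), 1-based inclusive, plus strand only.
Exon = Tuple[int, int]


def transcript_to_genomic(exons: List[Exon], pos: int) -> int:
    """Map a 1-based transcript position to a genomic position.

    Hypothetical helper: exons must be listed in transcript order.
    """
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    offset = pos
    for start, end in exons:
        length = end - start + 1
        if offset <= length:
            return start + offset - 1
        # Position lies past this exon; consume it and move on.
        offset -= length
    raise ValueError("position lies beyond the end of the transcript")


# Made-up exon structures for the "same" transcript. They differ only in
# that the first exon ends 3 nt earlier in the second version.
source_exons = [(1000, 1099), (2000, 2099)]
similar_exons = [(1000, 1096), (2000, 2099)]

# Positions inside the first exon agree ...
print(transcript_to_genomic(source_exons, 50),
      transcript_to_genomic(similar_exons, 50))
# ... but every position downstream of the differing boundary is shifted.
print(transcript_to_genomic(source_exons, 150),
      transcript_to_genomic(similar_exons, 150))
```

With these made-up coordinates, a 3 nt difference at one exon boundary
shifts every downstream transcript position by 3 nt on the genome, even
though the two exon structures would count as similar.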
Does anyone know whether it would work to load NCBI exons directly into
Ensembl? I'm hoping that populating the transcript, transcript_stable_id,
exon, and exon_transcript tables with original NCBI data would suffice. Is
that too naive?