[ensembl-dev] exon coordinate discrepancy between NCBI and Ensembl

Daniel Hughes dsth at ebi.ac.uk
Wed May 25 09:29:28 BST 2011


Could you load the NCBI exons, together with the RefSeq version of the assembly, as an
alternative assembly?

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org


2011/5/25 Kiran Mukhyala <mukhyala at gmail.com>

>
>
> On Tue, May 24, 2011 at 12:46 PM, Reece Hart <reece at harts.net> wrote:
>
>> Because it's so convenient to code for Ensembl, I'd still like to see if
>> there's a way to accomplish what I want with Ensembl. The goal is to convert
>> HGVS variants specified using NCBI accessions between genomic, raw
>> transcript (i.e., 'r.' variants), CDS, and protein coordinate systems. To
>> achieve accurate conversion in the general case, it is necessary to have a
>> single, shared understanding of the exon structure, accurate to nucleotide
>> level, as implied by the named transcript. Exon-level similarity, even when
>> the CDS is unchanged, doesn't cut it in this case.
>>
>> Does anyone know whether it would work to load NCBI exons directly into
>> Ensembl? I'm hoping that populating the transcript, transcript_stable_id,
>> exon, and exon_transcript tables with original NCBI data would suffice. Is
>> that too naive?
>>
>>
> In order to map genomic to transcript coordinates using the Ensembl API,
> one requirement is that the transcript be derived from the reference genome.
> Unfortunately, this is not true for a small percentage of RefSeqs. RefSeq
> UTRs especially do not match the reference genome well.
>
> This means that if you load NCBI exons directly into Ensembl, the API will
> still construct the transcript sequence from the genome. For those RefSeqs,
> the genome-derived transcript will not match the RefSeq sequence, so you will
> not be able to convert accurately between genomic and RefSeq coordinates.
>
> In theory this should not happen with CCDS genes, but I haven't
> tested it. By the way, Ensembl does import RefSeq and CCDS genes into the
> otherfeatures database.
>
> -Kiran
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
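Kiran's point is that positions are mapped by walking a transcript's exons over the reference genome. The standalone Python sketch below illustrates that arithmetic for genomic (g.) to transcript (n.) to coding (c.) positions. It is not the Ensembl Perl API: the `Transcript` class, its fields, and the example coordinates are hypothetical. It assumes exons are 1-based, inclusive genomic intervals sorted in ascending genomic order, and it does not handle HGVS intronic offsets such as c.100+5.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Transcript:
    """Hypothetical transcript model (not an Ensembl class).

    exons: 1-based inclusive genomic intervals, sorted ascending.
    strand: +1 or -1.
    cds_start / cds_end: n. positions of the first base of the start
    codon and the last base of the stop codon.
    """
    exons: List[Tuple[int, int]]
    strand: int
    cds_start: int
    cds_end: int

    def genomic_to_n(self, g: int) -> Optional[int]:
        """Map a genomic position to a 1-based n. position, or None if
        g is not inside any exon (intronic or outside the transcript)."""
        # Visit exons 5' to 3' along the transcript.
        ordered = self.exons if self.strand == 1 else list(reversed(self.exons))
        offset = 0  # transcript bases contributed by earlier exons
        for start, end in ordered:
            if start <= g <= end:
                if self.strand == 1:
                    return offset + (g - start) + 1
                return offset + (end - g) + 1
            offset += end - start + 1
        return None

    def n_to_c(self, n: int) -> str:
        """Convert an n. position to HGVS-style c. numbering: there is
        no c.0, 5' UTR positions are negative, 3' UTR positions use '*'."""
        if n < 1:
            raise ValueError("n. positions are 1-based")
        if n < self.cds_start:
            return f"c.{n - self.cds_start}"
        if n > self.cds_end:
            return f"c.*{n - self.cds_end}"
        return f"c.{n - self.cds_start + 1}"


if __name__ == "__main__":
    # Hypothetical two-exon plus-strand transcript (100 + 200 bases).
    tx = Transcript(exons=[(1001, 1100), (2001, 2200)], strand=1,
                    cds_start=51, cds_end=250)
    print(tx.genomic_to_n(1051), tx.n_to_c(51))  # 51 c.1
    print(tx.genomic_to_n(2001), tx.n_to_c(101))  # 101 c.51
    print(tx.genomic_to_n(1500))  # None (intronic)
```

Because every position is computed by summing exon lengths on the genome, the result is only correct if the transcript sequence equals the concatenated genomic exons. If a RefSeq mRNA carried, say, one extra base in its 5' UTR relative to the reference genome, every RefSeq n. position downstream of that base would be one greater than the value computed this way, which is the mismatch described above.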

