[ensembl-dev] loading NCBI exon structures into Ensembl

William Spooner whs at eaglegenomics.com
Fri May 27 07:16:40 BST 2011

Hi Reece,

My approach would be to load the entire set of NCBI genes as a separate analysis, perhaps even into a separate satellite core database. If you have the NCBI annotations in GFF3 format, then there are Ensembl scripts to load the data. There may be some faffing with assembly exceptions where the NCBI genes do not follow the reference assembly exactly.
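
Before loading anything, it may be worth checking that the exon structures in the GFF3 file are what you expect. Below is a minimal Python sketch of that check. It is not one of the Ensembl loading scripts and does not use the Ensembl API (which is Perl). It assumes that each exon row carries a Parent attribute naming its transcript, and the sample rows, the IDs (rna0, rna1), and both function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical GFF3 rows; real NCBI files carry many more feature types.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\tRefSeq\texon\t100\t199\t.\t+\t.\tID=e1;Parent=rna0\n"
    "chr1\tRefSeq\texon\t300\t349\t.\t+\t.\tID=e2;Parent=rna0\n"
    "chr2\tRefSeq\texon\t500\t549\t.\t-\t.\tID=e3;Parent=rna1\n"
    "chr2\tRefSeq\texon\t100\t199\t.\t-\t.\tID=e4;Parent=rna1\n"
)


def parse_gff3_exons(lines):
    """Group exon rows by parent transcript ID.

    Returns {transcript_id: [(seqid, start, end, strand), ...]} with exons
    in transcript order: ascending start on '+', descending on '-'.
    """
    exons = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon features matter here
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon shared between transcripts lists its parents comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((seqid, start, end, strand))
    for items in exons.values():
        items.sort(key=lambda e: e[1], reverse=(items[0][3] == "-"))
    return dict(exons)


def transcript_to_genome(exon_list, tpos):
    """Map a 1-based transcript position to (seqid, 1-based genomic position)."""
    if tpos < 1:
        raise ValueError("transcript positions are 1-based")
    offset = tpos
    for seqid, start, end, strand in exon_list:
        length = end - start + 1
        if offset <= length:
            if strand == "-":
                return seqid, end - offset + 1
            return seqid, start + offset - 1
        offset -= length  # position lies in a later exon
    raise ValueError("position lies beyond the end of the transcript")


structures = parse_gff3_exons(SAMPLE.splitlines())
```

With the sample rows, transcript_to_genome(structures["rna0"], 101) lands on the first base of the second exon, and the same position on the reverse-strand rna1 counts down from the end of its second exon. The coordinates are taken as given in the file, so any mismatch between RefSeq and the reference assembly is not corrected.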


On 27 May 2011, at 05:48, Reece Hart wrote:

> Hi-
> Does anyone know whether it would work to load NCBI exon structures directly into Ensembl?
> The goal is to use the Ensembl API to map between genome, transcript, and protein coordinates for variants specified using NCBI accessions. This requires exact NCBI exon structures.
> I'm hoping that populating the transcript, transcript_stable_id, exon, and exon_transcript tables with original NCBI data would suffice.
> As Kiran Mukhyala pointed out in a separate thread, the RefSeq sequence differs from the GRCh37 sequence in some cases. I am content with using RefSeq exon structures on GRCh37.
> All code and advice is appreciated.
> Thanks,
> Reece

William Spooner
whs at eaglegenomics.com
