[ensembl-dev] loading NCBI exon structures into Ensembl

William Spooner whs at eaglegenomics.com
Thu Jun 2 08:42:12 BST 2011


On 2 Jun 2011, at 06:58, Reece Hart wrote:

> On Thu, May 26, 2011 at 11:16 PM, William Spooner <whs at eaglegenomics.com> wrote:
> My approach would be to load the entire set of NCBI genes as a separate analysis, perhaps even into a separate satellite core database. If you have the NCBI annotations in gff3 format, then there are Ensembl scripts to load the data. There may be some faffing with assembly exceptions if the NCBI genes do not always follow the reference assembly exactly.
> 
> Hi Will-
> 
> Sorry for the delay... I was out of town.
> 
> Dang, this is proving to be harder than I expected. The end point I was hoping for was to use the Ensembl API on NCBI transcripts so that I could use SliceAdaptor::fetch_by_region, Slice::get_all_Transcripts, TranscriptMapper, etc. If I understand your suggestion, I don't think I get that functionality. (And, I don't have a gff3 of NCBI annotations either.)
> 
> 
> Do any of the Ensembl devs have any advice? 

Hi Reece,

It seems that the NCBI genes are already loaded into an Ensembl satellite database. For example:
http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=otherfeatures;g=55350;r=6:133017695-133161157;t=NM_001024460.1

So you should already be able to use the Ensembl API as you suggest. The NCBI genes are in the 'otherfeatures' database, so you need to specify this database when getting adaptors from the registry in your script. For example:

...
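use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl MySQL server and register its databases
Bio::EnsEMBL::Registry->load_registry_from_db(
      -host    => 'ensembldb.ensembl.org',
      -port    => 5306,
      -user    => 'anonymous',
      -verbose => 1 );

# Ask for the 'otherfeatures' group rather than the default 'core'
my $sa = Bio::EnsEMBL::Registry->get_adaptor('human','otherfeatures','Slice');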
...
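
To show how the slice adaptor might then be used, here is a minimal sketch that lists the transcripts overlapping the region from the URL above. It assumes the Ensembl Perl API is installed and the public server is reachable. The helper names parse_region and print_transcripts are hypothetical, and which transcripts come back depends on what the 'otherfeatures' database holds in the release you connect to.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "chr:start-end" string into its three parts.
sub parse_region {
    my ($region) = @_;
    my ($chr, $start, $end) = $region =~ /^([^:]+):(\d+)-(\d+)$/
        or die "Cannot parse region '$region'\n";
    return ($chr, $start, $end);
}

# Hypothetical helper: print one line per transcript found on the slice.
sub print_transcripts {
    my ($region) = @_;
    my ($chr, $start, $end) = parse_region($region);

    # Loaded here so the parsing helper can be used without the Ensembl API.
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -port => 5306,
        -user => 'anonymous',
    );

    # 'otherfeatures' selects the satellite database instead of 'core'.
    my $sa = Bio::EnsEMBL::Registry->get_adaptor('human', 'otherfeatures', 'Slice');
    my $slice = $sa->fetch_by_region('chromosome', $chr, $start, $end)
        or die "No slice found for region '$region'\n";

    for my $t (@{ $slice->get_all_Transcripts }) {
        my $id = defined $t->stable_id ? $t->stable_id : '(no stable id)';
        printf "%s\t%d-%d\t%d exons\n",
            $id, $t->seq_region_start, $t->seq_region_end,
            scalar @{ $t->get_all_Exons };
    }
}

# Run only when a region is given on the command line.
print_transcripts($ARGV[0]) if @ARGV;
```

Saved under a name of your choosing (list_transcripts.pl here is only a placeholder), it could be run as: perl list_transcripts.pl 6:133017695-133161157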

Will
--
William Spooner
whs at eaglegenomics.com
http://www.eaglegenomics.com






