[ensembl-dev] gene to coords code issue

Andy Yates ayates at ebi.ac.uk
Mon Apr 23 11:32:21 BST 2012


Hi Sean,

Our US East server holds only the current and previous releases of the Ensembl datasets. Archive data is served from a single server, ensembldb.ensembl.org. If you want an instance of the v53 data set hosted closer to you in the US, you will have to mirror the database yourself. We have a script called:

ensembl/misc-scripts/load_databases/load_database_from_ftp_site.pl

This will download, checksum and load an Ensembl database of your choosing into a MySQL server.
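
Once a release 53 database has been loaded into a local MySQL server, the registry can be pointed at that server instead of ensembldb.ensembl.org. What follows is only a minimal sketch under stated assumptions: the host, port and user name are hypothetical placeholders, mirror_args() is a helper made up for illustration, and it assumes the installed API accepts the -db_version argument of load_registry_from_db to pin the release.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for load_registry_from_db().
# All defaults are hypothetical; adjust them to your own mirror.
sub mirror_args {
    my (%opt) = @_;
    return (
        -host       => $opt{host}    // 'localhost',
        -port       => $opt{port}    // 3306,
        -user       => $opt{user}    // 'ensembl_ro',
        -db_version => $opt{version} // 53,    # assumed to be supported
        -verbose    => 1,                      # list what the registry finds
    );
}

# Only attempt a connection when the Ensembl API is on @INC.
if ( eval { require Bio::EnsEMBL::Registry; 1 } ) {
    my $ok = eval {
        Bio::EnsEMBL::Registry->load_registry_from_db( mirror_args() );
        1;
    };
    warn "Could not load registry from mirror: $@" unless $ok;
}
else {
    warn "Bio::EnsEMBL::Registry not found; skipping connection\n";
}
```

With -verbose set, the registry prints the "Species ... loaded from database ..." lines, which is a quick way to confirm that the mirrored databases were picked up.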

Hope this helps,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 20 Apr 2012, at 22:31, Sean O'Keeffe wrote:

> Hi Andy,
> Ok. It seems the 53 api doesn't load the species databases when I use the useastdb.ensembl.org host. See below.
> When I switch to ensembldb.ensembl.org, I get the species loaded.
> 
> The 66 api works for both hosts.
> 
> $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org',-user => 'anonymous', -verbose=>1);
> $registry->load_registry_from_db(-host => 'useastdb.ensembl.org',-port=>'5306',-user => 'anonymous', -verbose=>1);
> 
> Here's the ens 53 output (useastdb):
> >./gene2coords.pl WT_up_genes.txt
> Will only load v53 databases
> Bio::EnsEMBL::Variation::DBSQL::DBAdaptor module not found so variation databases will be ignored if found
> Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor module not found so functional genomics databases will be ignored if found
> No Compara databases found
> No ancestral database found
> No GO database found
> 
> And with ensembldb.ensembl.org:
> >./gene2coords.pl WT_up_genes.txt
> Will only load v53 databases
> Species 'saccharomyces_cerevisiae' loaded from database 'saccharomyces_cerevisiae_core_53_1i'
> Species 'oryctolagus_cuniculus' loaded from database 'oryctolagus_cuniculus_core_53_1h'
> Species 'gorilla_gorilla' loaded from database 'gorilla_gorilla_core_53_1'
> Species 'ciona_savignyi' loaded from database 'ciona_savignyi_core_53_2h'
> Species 'echinops_telfairi' loaded from database 'echinops_telfairi_core_53_1g'
> Species 'myotis_lucifugus' loaded from database 'myotis_lucifugus_core_53_1g'
> Species 'taeniopygia_guttata' loaded from database 'taeniopygia_guttata_core_53_1'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_53_36o'
> Species 'dipodomys_ordii' loaded from database 'dipodomys_ordii_core_53_1b'
> Species 'sorex_araneus' loaded from database 'sorex_araneus_core_53_1e'
> Species 'otolemur_garnettii' loaded from database 'otolemur_garnettii_core_53_1e'
> Species 'erinaceus_europaeus' loaded from database 'erinaceus_europaeus_core_53_1e'
> Species 'anolis_carolinensis' loaded from database 'anolis_carolinensis_core_53_1'
> Species 'canis_familiaris' loaded from database 'canis_familiaris_core_53_2k'
> Species 'dasypus_novemcinctus' loaded from database 'dasypus_novemcinctus_core_53_2'
> Species 'ornithorhynchus_anatinus' loaded from database 'ornithorhynchus_anatinus_core_53_1j'
> Species 'tetraodon_nigroviridis' loaded from database 'tetraodon_nigroviridis_core_53_8b'
> Species 'tursiops_truncatus' loaded from database 'tursiops_truncatus_core_53_1b'
> Species 'tarsius_syrichta' loaded from database 'tarsius_syrichta_core_53_1b'
> Species 'vicugna_pacos' loaded from database 'vicugna_pacos_core_53_1b'
> Species 'xenopus_tropicalis' loaded from database 'xenopus_tropicalis_core_53_41m'
> Species 'mus_musculus' loaded from database 'mus_musculus_core_53_37f'
> Species 'bos_taurus' loaded from database 'bos_taurus_core_53_4c'
> Species 'aedes_aegypti' loaded from database 'aedes_aegypti_core_53_1d'
> Species 'monodelphis_domestica' loaded from database 'monodelphis_domestica_core_53_5h'
> Species 'choloepus_hoffmanni' loaded from database 'choloepus_hoffmanni_core_53_1'
> Species 'cavia_porcellus' loaded from database 'cavia_porcellus_core_53_3a'
> Species 'anopheles_gambiae' loaded from database 'anopheles_gambiae_core_53_3k'
> Species 'rattus_norvegicus' loaded from database 'rattus_norvegicus_core_53_34u'
> Species 'takifugu_rubripes' loaded from database 'takifugu_rubripes_core_53_4k'
> Species 'caenorhabditis_elegans' loaded from database 'caenorhabditis_elegans_core_53_190'
> Species 'pteropus_vampyrus' loaded from database 'pteropus_vampyrus_core_53_1b'
> Species 'microcebus_murinus' loaded from database 'microcebus_murinus_core_53_1b'
> Species 'ochotona_princeps' loaded from database 'ochotona_princeps_core_53_1c'
> Species 'pan_troglodytes' loaded from database 'pan_troglodytes_core_53_21j'
> Species 'felis_catus' loaded from database 'felis_catus_core_53_1f'
> Species 'equus_caballus' loaded from database 'equus_caballus_core_53_2c'
> Species 'procavia_capensis' loaded from database 'procavia_capensis_core_53_1b'
> Species 'oryzias_latipes' loaded from database 'oryzias_latipes_core_53_1i'
> Species 'macaca_mulatta' loaded from database 'macaca_mulatta_core_53_10k'
> Species 'danio_rerio' loaded from database 'danio_rerio_core_53_7e'
> Species 'gallus_gallus' loaded from database 'gallus_gallus_core_53_2k'
> Species 'tupaia_belangeri' loaded from database 'tupaia_belangeri_core_53_1f'
> Species 'ciona_intestinalis' loaded from database 'ciona_intestinalis_core_53_2l'
> Species 'loxodonta_africana' loaded from database 'loxodonta_africana_core_53_2'
> Species 'spermophilus_tridecemlineatus' loaded from database 'spermophilus_tridecemlineatus_core_53_1g'
> Species 'pongo_pygmaeus' loaded from database 'pongo_pygmaeus_core_53_1c'
> Species 'drosophila_melanogaster' loaded from database 'drosophila_melanogaster_core_53_54a'
> Species 'gasterosteus_aculeatus' loaded from database 'gasterosteus_aculeatus_core_53_1j'
> homo_sapiens_cdna_53_36o loaded
> mus_musculus_cdna_53_37f loaded
> mus_musculus_vega_53_37f loaded
> homo_sapiens_vega_53_36o loaded
> takifugu_rubripes_otherfeatures_53_4k loaded
> danio_rerio_otherfeatures_53_7e loaded
> pan_troglodytes_otherfeatures_53_21j loaded
> taeniopygia_guttata_otherfeatures_53_1 loaded
> rattus_norvegicus_otherfeatures_53_34u loaded
> oryzias_latipes_otherfeatures_53_1i loaded
> drosophila_melanogaster_otherfeatures_53_54a loaded
> saccharomyces_cerevisiae_otherfeatures_53_1i loaded
> gallus_gallus_otherfeatures_53_2k loaded
> homo_sapiens_otherfeatures_53_36o loaded
> xenopus_tropicalis_otherfeatures_53_41m loaded
> pongo_pygmaeus_otherfeatures_53_1c loaded
> gasterosteus_aculeatus_otherfeatures_53_1j loaded
> bos_taurus_otherfeatures_53_4c loaded
> tetraodon_nigroviridis_otherfeatures_53_8b loaded
> anolis_carolinensis_otherfeatures_53_1 loaded
> cavia_porcellus_otherfeatures_53_3a loaded
> equus_caballus_otherfeatures_53_2c loaded
> macaca_mulatta_otherfeatures_53_10k loaded
> canis_familiaris_otherfeatures_53_2k loaded
> ciona_savignyi_otherfeatures_53_2h loaded
> ornithorhynchus_anatinus_otherfeatures_53_1j loaded
> mus_musculus_otherfeatures_53_37f loaded
> ciona_intestinalis_otherfeatures_53_2l loaded
> anopheles_gambiae_otherfeatures_53_3k loaded
> Bio::EnsEMBL::Variation::DBSQL::DBAdaptor module not found so variation databases will be ignored if found
> Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor module not found so functional genomics databases will be ignored if found
> Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_53
> ensembl_ancestral_53 loaded
> GO software not installed so GO database ensembl_go_53 will be ignored
> 
> 
> On 20 April 2012 15:51, Andy Yates <ayates at ebi.ac.uk> wrote:
> Hi Sean,
> 
> That is odd. Using the 53 API is the best way to access v53 data. Could you change your code to the following:
> 
> $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous', -verbose => 1);
> 
> This will emit a lot of debug information about the databases the registry can find. Please send that output back to us; we should then be able to debug your problem. Could you also send the latest version of your script, please?
> 
> Many thanks,
> 
> Andy
> 
> On 20 Apr 2012, at 20:34, Sean O'Keeffe wrote:
> 
> > Hi Andy,
> > You are indeed spot on. I am using the ensembl 53 api. Switching to ensembl 66 solves the issue.
> > However, I'm trying to extract hg18 coordinates not hg19 - this was why I used ensembl_53.
> > What should I do to get these coords?
> >
> > Sean.
> >
> > On 20 April 2012 12:40, Andy Yates <ayates at ebi.ac.uk> wrote:
> > Hi Sean,
> >
> > Normally, a response saying "can't call method on undefined value" points to an unreleased API version being used. Can you confirm the version of Ensembl you are using, please? Also, can you run the program ensembl/misc-scripts/ping_ensembl.pl, which will attempt to diagnose your connection/setup?
> >
> > All the best,
> >
> > Andy
> >
> > On 20 Apr 2012, at 16:16, Sean O'Keeffe wrote:
> >
> > > Thanks for the response Javier.
> > >
> > > I see the reference to an array of objects and I've implemented this.
> > > However I don't get it. The script dies at the call to fetch_all_by_external_name() - Can't call method "fetch_all_by_external_name" on an undefined value.
> > > It never reaches the loop over the array of objects. The variable $id is valid and prints out before the script dies.
> > >
> > > ...
> > > print $id,"\n";
> > > my $adaptor = $registry->get_adaptor( 'Human', 'Core', 'gene' );
> > >
> > > my $gene = $adaptor->fetch_all_by_external_name($id);
> > >
> > >   foreach $g(@$gene){
> > >     $chr = $g->seq_region_name();
> > >     $start = $g->seq_region_start();
> > >     $end = $g->seq_region_end();
> > >     print OUT join("\t", $chr,$start,$end,$id),"\n";
> > >   }
> > >
> > >
> > > On 20 April 2012 00:49, Javier Herrero <jherrero at ebi.ac.uk> wrote:
> > > Dear Sean
> > >
> > > The method fetch_all_by_external_name returns a reference to an array of Bio::EnsEMBL::Gene objects. All the methods named "fetch_all_by..." return a reference to an array. The array might be empty or contain just one entry, but you will always get a reference to an array. In contrast, all the methods named "fetch_by..." return either undef or a single object.
> > >
> > > Typically, you would use a foreach loop to go through all the returned objects:
> > >
> > >
> > > open OUT, ">$gene_file.coords" or die "Cannot write $gene_file.coords: $!";
> > > for my $geneid ( @unique ) {
> > >     chomp $geneid;
> > >     ensembl_coords($geneid);
> > > }
> > >
> > > sub ensembl_coords {
> > >   my ($id) = @_;
> > >
> > >   my $adaptor = $registry->get_adaptor( 'Human', 'Core', 'gene' );
> > >
> > >   my $all_genes = $adaptor->fetch_all_by_external_name($id);
> > >
> > >   foreach my $gene (@$all_genes) {
> > >
> > >     my $chr = $gene->seq_region_name();
> > >     my $start = $gene->seq_region_start();
> > >     my $end = $gene->seq_region_end();
> > >     print OUT join("\t", $chr,$start,$end,$id),"\n"; #I have added the original $id here
> > >   }
> > >
> > > }
> > >
> > >
> > > I hope this helps
> > >
> > > Javier
> > >
> > >
> > >
> > >
> > > On 20/04/12 04:49, Sean O'Keeffe wrote:
> > >> Hi,
> > >> I've used the code below on multiple occasions to convert external gene names to chromosome coords and it worked fine.
> > >> However when I tried it just now I get the error for the very first gene DNAI2 and the script crashes:
> > >>
> > >> Can't call method "seq_region_name" on unblessed reference
> > >>
> > >> When I tried fetch_by_display_label($id) - I get:
> > >>
> > >> Can't call method "seq_region_name" on an undefined value
> > >>
> > >> Have I missed something?
> > >> Thanks for any help,
> > >> Sean.
> > >>
> > >> p.s. I tried connecting to useastdb.ensembl.org, as I'm in the States, but it gave the following (maybe the 2 issues are related):
> > >>
> > >> DBI connect('host=useastdb.ensembl.org;port=3306','anonymous',...) failed: Can't connect to MySQL server on 'useastdb.ensembl.org' (111) at /home/sean/tools/ensembl_53/modules/Bio/EnsEMBL/Registry.pm line 1329
> > >> Can't call method "selectall_arrayref" on an undefined value at /home/sean/tools/ensembl_53/modules/Bio/EnsEMBL/Registry.pm line 1332.
> > >>
> > >> ==============
> > >>
> > >> #!/usr/bin/perl
> > >>
> > >> use strict;
> > >> use lib '/home/sean/tools/ensembl_53/modules';
> > >>
> > >> use Bio::SeqIO;
> > >> use Bio::Root::IO;
> > >> use Bio::EnsEMBL::DBSQL::BaseAdaptor;
> > >> use Bio::EnsEMBL::Registry;
> > >>
> > >> my $registry = 'Bio::EnsEMBL::Registry';
> > >> #$registry->load_registry_from_db(-host => 'useastdb.ensembl.org',-user => 'anonymous');
> > >> $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org',-user => 'anonymous');
> > >>
> > >> open OUT, ">$gene_file.coords";
> > >> for my $geneid ( @unique ) {
> > >>     chomp $geneid;
> > >>     ($chr,$start, $end) = ensembl_coords($geneid);
> > >>     print OUT join("\t", $chr,$start,$end,$geneid),"\n";
> > >> }
> > >>
> > >> sub ensembl_coords {
> > >>   my ($id) = @_;
> > >>
> > >>   my $adaptor = $registry->get_adaptor( 'Human', 'Core', 'gene' );
> > >>
> > >>   my $gene = $adaptor->fetch_all_by_external_name($id);
> > >>   # my $gene = $adaptor->fetch_by_display_label($id);
> > >>
> > >>   $chr = $gene->seq_region_name();
> > >>   $start = $gene->seq_region_start();
> > >>   $end = $gene->seq_region_end();
> > >>   return ($chr,$start,$end);
> > >>
> > >> }
> > >>
> > >>
> > >> _______________________________________________
> > >> Dev mailing list
> > >> Dev at ensembl.org
> > >>
> > >> List admin (including subscribe/unsubscribe):
> > >> http://lists.ensembl.org/mailman/listinfo/dev
> > >>
> > >> Ensembl Blog:
> > >> http://www.ensembl.info/
> > >
> > > --
> > > Javier Herrero, PhD
> > > Ensembl Coordinator and Ensembl Compara Project Leader
> > > European Bioinformatics Institute (EMBL-EBI)
> > > Wellcome Trust Genome Campus, Hinxton
> > > Cambridge - CB10 1SD - UK
> > >

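Two points from the thread can be combined into one defensive pattern: get_adaptor() gives back an undefined value when no matching species database was loaded into the registry, and every fetch_all_by_... method returns a reference to an array (possibly empty) rather than a single object. The sketch below is illustrative only. gene_coords() is a hypothetical helper, and MockGene and MockAdaptor are stand-in classes with made-up coordinates so that the example runs without the Ensembl API; with the real API the adaptor would come from $registry->get_adaptor( 'Human', 'Core', 'gene' ).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one [chr, start, end, id] row per gene matching $id.
# $adaptor is expected to behave like an Ensembl gene adaptor.
sub gene_coords {
    my ( $adaptor, $id ) = @_;

    # An undefined adaptor usually means the species was never loaded.
    die "No gene adaptor: check that the registry loaded the species\n"
        unless defined $adaptor;

    # fetch_all_by_* returns an array reference, which may be empty.
    my $genes = $adaptor->fetch_all_by_external_name($id);

    return map {
        [ $_->seq_region_name, $_->seq_region_start, $_->seq_region_end, $id ]
    } @{$genes};
}

# Stand-in classes (hypothetical) with placeholder data, not real genes.
package MockGene {
    sub new { my ( $class, %f ) = @_; return bless {%f}, $class }
    sub seq_region_name  { $_[0]{chr} }
    sub seq_region_start { $_[0]{start} }
    sub seq_region_end   { $_[0]{end} }
}

package MockAdaptor {
    sub new { return bless {}, shift }

    sub fetch_all_by_external_name {
        my ( $self, $id ) = @_;
        return $id eq 'GENE_A'
            ? [ MockGene->new( chr => '17', start => 100, end => 200 ) ]
            : [];
    }
}

package main;

# GENE_A yields one tab-separated row; GENE_B yields none.
for my $id (qw(GENE_A GENE_B)) {
    print join( "\t", @{$_} ), "\n" for gene_coords( MockAdaptor->new, $id );
}
```

Dying early with a clear message when the adaptor is undefined makes a registry or connection problem visible at the point where it occurs, instead of surfacing later as "Can't call method ... on an undefined value".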



