[ensembl-dev] gene to coords code issue

Sean O'Keeffe so2346 at columbia.edu
Mon Apr 23 15:42:53 BST 2012


I see. Thanks for clearing this up Andy.

Sean.

On 23 April 2012 06:32, Andy Yates <ayates at ebi.ac.uk> wrote:

> Hi Sean,
>
> Our US East server holds only the current and previous releases of the Ensembl
> datasets. For archive data we provide only one server,
> ensembldb.ensembl.org. If you want a US-based instance of the
> v53 data set, you will have to mirror the database yourself. We have a
> script called:
>
> ensembl/misc-scripts/load_databases/load_database_from_ftp_site.pl
>
> This will download, checksum and load an Ensembl database of your choosing
> into a MySQL server.
>
> Hope this helps,
>
> Andy
>
> Andrew Yates                   Ensembl Core Software Project Leader
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensembl.org/
>
> On 20 Apr 2012, at 22:31, Sean O'Keeffe wrote:
>
> > Hi Andy,
> > Ok. It seems the 53 API doesn't load the species databases when I use the useastdb.ensembl.org host. See below.
> > When I switch to ensembldb.ensembl.org, I get the species loaded.
> >
> > The 66 API works for both hosts.
> >
> > $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous', -verbose => 1);
> > $registry->load_registry_from_db(-host => 'useastdb.ensembl.org', -port => '5306', -user => 'anonymous', -verbose => 1);
> >
> > Here's the ens 53 output (useastdb):
> > >./gene2coords.pl WT_up_genes.txt
> > Will only load v53 databases
> > Bio::EnsEMBL::Variation::DBSQL::DBAdaptor module not found so variation databases will be ignored if found
> > Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor module not found so functional genomics databases will be ignored if found
> > No Compara databases found
> > No ancestral database found
> > No GO database found
> >
> > And with ensembldb.ensembl.org:
> > >./gene2coords.pl WT_up_genes.txt
> > Will only load v53 databases
> > Species 'saccharomyces_cerevisiae' loaded from database 'saccharomyces_cerevisiae_core_53_1i'
> > Species 'oryctolagus_cuniculus' loaded from database 'oryctolagus_cuniculus_core_53_1h'
> > Species 'gorilla_gorilla' loaded from database 'gorilla_gorilla_core_53_1'
> > Species 'ciona_savignyi' loaded from database 'ciona_savignyi_core_53_2h'
> > Species 'echinops_telfairi' loaded from database 'echinops_telfairi_core_53_1g'
> > Species 'myotis_lucifugus' loaded from database 'myotis_lucifugus_core_53_1g'
> > Species 'taeniopygia_guttata' loaded from database 'taeniopygia_guttata_core_53_1'
> > Species 'homo_sapiens' loaded from database 'homo_sapiens_core_53_36o'
> > Species 'dipodomys_ordii' loaded from database 'dipodomys_ordii_core_53_1b'
> > Species 'sorex_araneus' loaded from database 'sorex_araneus_core_53_1e'
> > Species 'otolemur_garnettii' loaded from database 'otolemur_garnettii_core_53_1e'
> > Species 'erinaceus_europaeus' loaded from database 'erinaceus_europaeus_core_53_1e'
> > Species 'anolis_carolinensis' loaded from database 'anolis_carolinensis_core_53_1'
> > Species 'canis_familiaris' loaded from database 'canis_familiaris_core_53_2k'
> > Species 'dasypus_novemcinctus' loaded from database 'dasypus_novemcinctus_core_53_2'
> > Species 'ornithorhynchus_anatinus' loaded from database 'ornithorhynchus_anatinus_core_53_1j'
> > Species 'tetraodon_nigroviridis' loaded from database 'tetraodon_nigroviridis_core_53_8b'
> > Species 'tursiops_truncatus' loaded from database 'tursiops_truncatus_core_53_1b'
> > Species 'tarsius_syrichta' loaded from database 'tarsius_syrichta_core_53_1b'
> > Species 'vicugna_pacos' loaded from database 'vicugna_pacos_core_53_1b'
> > Species 'xenopus_tropicalis' loaded from database 'xenopus_tropicalis_core_53_41m'
> > Species 'mus_musculus' loaded from database 'mus_musculus_core_53_37f'
> > Species 'bos_taurus' loaded from database 'bos_taurus_core_53_4c'
> > Species 'aedes_aegypti' loaded from database 'aedes_aegypti_core_53_1d'
> > Species 'monodelphis_domestica' loaded from database 'monodelphis_domestica_core_53_5h'
> > Species 'choloepus_hoffmanni' loaded from database 'choloepus_hoffmanni_core_53_1'
> > Species 'cavia_porcellus' loaded from database 'cavia_porcellus_core_53_3a'
> > Species 'anopheles_gambiae' loaded from database 'anopheles_gambiae_core_53_3k'
> > Species 'rattus_norvegicus' loaded from database 'rattus_norvegicus_core_53_34u'
> > Species 'takifugu_rubripes' loaded from database 'takifugu_rubripes_core_53_4k'
> > Species 'caenorhabditis_elegans' loaded from database 'caenorhabditis_elegans_core_53_190'
> > Species 'pteropus_vampyrus' loaded from database 'pteropus_vampyrus_core_53_1b'
> > Species 'microcebus_murinus' loaded from database 'microcebus_murinus_core_53_1b'
> > Species 'ochotona_princeps' loaded from database 'ochotona_princeps_core_53_1c'
> > Species 'pan_troglodytes' loaded from database 'pan_troglodytes_core_53_21j'
> > Species 'felis_catus' loaded from database 'felis_catus_core_53_1f'
> > Species 'equus_caballus' loaded from database 'equus_caballus_core_53_2c'
> > Species 'procavia_capensis' loaded from database 'procavia_capensis_core_53_1b'
> > Species 'oryzias_latipes' loaded from database 'oryzias_latipes_core_53_1i'
> > Species 'macaca_mulatta' loaded from database 'macaca_mulatta_core_53_10k'
> > Species 'danio_rerio' loaded from database 'danio_rerio_core_53_7e'
> > Species 'gallus_gallus' loaded from database 'gallus_gallus_core_53_2k'
> > Species 'tupaia_belangeri' loaded from database 'tupaia_belangeri_core_53_1f'
> > Species 'ciona_intestinalis' loaded from database 'ciona_intestinalis_core_53_2l'
> > Species 'loxodonta_africana' loaded from database 'loxodonta_africana_core_53_2'
> > Species 'spermophilus_tridecemlineatus' loaded from database 'spermophilus_tridecemlineatus_core_53_1g'
> > Species 'pongo_pygmaeus' loaded from database 'pongo_pygmaeus_core_53_1c'
> > Species 'drosophila_melanogaster' loaded from database 'drosophila_melanogaster_core_53_54a'
> > Species 'gasterosteus_aculeatus' loaded from database 'gasterosteus_aculeatus_core_53_1j'
> > homo_sapiens_cdna_53_36o loaded
> > mus_musculus_cdna_53_37f loaded
> > mus_musculus_vega_53_37f loaded
> > homo_sapiens_vega_53_36o loaded
> > takifugu_rubripes_otherfeatures_53_4k loaded
> > danio_rerio_otherfeatures_53_7e loaded
> > pan_troglodytes_otherfeatures_53_21j loaded
> > taeniopygia_guttata_otherfeatures_53_1 loaded
> > rattus_norvegicus_otherfeatures_53_34u loaded
> > oryzias_latipes_otherfeatures_53_1i loaded
> > drosophila_melanogaster_otherfeatures_53_54a loaded
> > saccharomyces_cerevisiae_otherfeatures_53_1i loaded
> > gallus_gallus_otherfeatures_53_2k loaded
> > homo_sapiens_otherfeatures_53_36o loaded
> > xenopus_tropicalis_otherfeatures_53_41m loaded
> > pongo_pygmaeus_otherfeatures_53_1c loaded
> > gasterosteus_aculeatus_otherfeatures_53_1j loaded
> > bos_taurus_otherfeatures_53_4c loaded
> > tetraodon_nigroviridis_otherfeatures_53_8b loaded
> > anolis_carolinensis_otherfeatures_53_1 loaded
> > cavia_porcellus_otherfeatures_53_3a loaded
> > equus_caballus_otherfeatures_53_2c loaded
> > macaca_mulatta_otherfeatures_53_10k loaded
> > canis_familiaris_otherfeatures_53_2k loaded
> > ciona_savignyi_otherfeatures_53_2h loaded
> > ornithorhynchus_anatinus_otherfeatures_53_1j loaded
> > mus_musculus_otherfeatures_53_37f loaded
> > ciona_intestinalis_otherfeatures_53_2l loaded
> > anopheles_gambiae_otherfeatures_53_3k loaded
> > Bio::EnsEMBL::Variation::DBSQL::DBAdaptor module not found so variation databases will be ignored if found
> > Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor module not found so functional genomics databases will be ignored if found
> > Bio::EnsEMBL::Compara::DBSQL::DBAdaptor not found so the following compara databases will be ignored: ensembl_compara_53
> > ensembl_ancestral_53 loaded
> > GO software not installed so GO database ensembl_go_53 will be ignored
> >
> >
> > On 20 April 2012 15:51, Andy Yates <ayates at ebi.ac.uk> wrote:
> > Hi Sean,
> >
> > That is odd. Using the 53 API is the best way to access v53 data. Could you change your code to the following:
> >
> > $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous', -verbose => 1);
> >
> > This will emit a lot of debug information about the databases the registry can find. Please send that output back to us; we should then be able to debug your problem. Also, can you send the latest version of your script, please?
> >
> > Many thanks,
> >
> > Andy
> >
> > Andrew Yates                   Ensembl Core Software Project Leader
> > EMBL-EBI                       Tel: +44-(0)1223-492538
> > Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> > Cambridge CB10 1SD, UK         http://www.ensembl.org/
> >
> > On 20 Apr 2012, at 20:34, Sean O'Keeffe wrote:
> >
> > > Hi Andy,
> > > You are indeed spot on. I am using the ensembl 53 API. Switching to ensembl 66 solves the issue.
> > > However, I'm trying to extract hg18 coordinates, not hg19; this was why I used ensembl_53.
> > > What should I do to get these coords?
> > >
> > > Sean.
> > >
> > > On 20 April 2012 12:40, Andy Yates <ayates at ebi.ac.uk> wrote:
> > > Hi Sean,
> > >
> > > Normally, a response saying "can't call method on undefined value" points to you using an unreleased API version. Can you confirm the version of Ensembl you are using, please? Also, can you run the program ensembl/misc-scripts/ping_ensembl.pl, which will attempt to diagnose your connection/setup?
> > >
> > > All the best,
> > >
> > > Andy
> > >
> > > Andrew Yates                   Ensembl Core Software Project Leader
> > > EMBL-EBI                       Tel: +44-(0)1223-492538
> > > Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> > > Cambridge CB10 1SD, UK         http://www.ensembl.org/
> > >
> > > On 20 Apr 2012, at 16:16, Sean O'Keeffe wrote:
> > >
> > > > Thanks for the response Javier.
> > > >
> > > > I see the reference to an array of objects and I've implemented this.
> > > > However, I don't get it. The script dies at the call to fetch_all_by_external_name(): Can't call method "fetch_all_by_external_name" on an undefined value.
> > > > It never gets to the loop over the array of objects. The variable $id is valid and prints out prior to the script dying.
> > > >
> > > > ...
> > > > print $id,"\n";
> > > > my $adaptor = $registry->get_adaptor( 'Human', 'Core', 'gene' );
> > > >
> > > > my $gene = $adaptor->fetch_all_by_external_name($id);
> > > >
> > > >   foreach my $g (@$gene) {
> > > >     my $chr = $g->seq_region_name();
> > > >     my $start = $g->seq_region_start();
> > > >     my $end = $g->seq_region_end();
> > > >     print OUT join("\t", $chr,$start,$end,$id),"\n";
> > > >   }
> > > >
> > > >
> > > > On 20 April 2012 00:49, Javier Herrero <jherrero at ebi.ac.uk> wrote:
> > > > Dear Sean
> > > >
> > > > > The method fetch_all_by_external_name returns a reference to an array of Bio::EnsEMBL::Gene objects. All the methods named "fetch_all_by..." return a reference to an array. The array might be empty or contain just one entry, but you will always get a reference to an array. In contrast, all the methods named "fetch_by..." return either undef or a single object.
> > > > >
> > > > > Typically, you would use a foreach loop to go through all the returned objects:
> > > >
> > > >
> > > > open OUT, ">$gene_file.coords";
> > > > for my $geneid ( @unique ) {
> > > >     chomp $geneid;
> > > >     ensembl_coords($geneid);
> > > > }
> > > >
> > > > sub ensembl_coords {
> > > >   my ($id) = @_;
> > > >
> > > >   my $adaptor = $registry->get_adaptor( 'Human', 'Core', 'gene' );
> > > >
> > > >   my $all_genes = $adaptor->fetch_all_by_external_name($id);
> > > >
> > > >   foreach my $gene (@$all_genes) {
> > > >
> > > > >     my $chr = $gene->seq_region_name();
> > > > >     my $start = $gene->seq_region_start();
> > > > >     my $end = $gene->seq_region_end();
> > > > >     print OUT join("\t", $chr,$start,$end,$id),"\n"; # I have added the original $id here
> > > >   }
> > > >
> > > > }
> > > >
> > > >
> > > > > I hope this helps
> > > >
> > > > Javier
> > > >
> > > >
> > > >
> > > >
> > > > On 20/04/12 04:49, Sean O'Keeffe wrote:
> > > >> Hi,
> > > > >> I've used the code below on multiple occasions to convert external gene names to chromosome coords and it worked fine.
> > > > >> However, when I tried it just now I get the error for the very first gene DNAI2 and the script crashes:
> > > >>
> > > >> Can't call method "seq_region_name" on unblessed reference
> > > >>
> > > >> When I tried fetch_by_display_label($id) - I get:
> > > >>
> > > >> Can't call method "seq_region_name" on an undefined value
> > > >>
> > > >> Have I missed something?
> > > >> Thanks for any help,
> > > >> Sean.
> > > >>
> > > > >> p.s. I tried connecting to useastdb.ensembl.org, as I'm in the States, but it gave the following (maybe the 2 issues are related):
> > > > >>
> > > > >> DBI connect('host=useastdb.ensembl.org;port=3306','anonymous',...) failed: Can't connect to MySQL server on 'useastdb.ensembl.org' (111) at /home/sean/tools/ensembl_53/modules/Bio/EnsEMBL/Registry.pm line 1329
> > > > >> Can't call method "selectall_arrayref" on an undefined value at /home/sean/tools/ensembl_53/modules/Bio/EnsEMBL/Registry.pm line 1332.
> > > >>
> > > >> ==============
> > > >>
> > > >> #!/usr/bin/perl
> > > >>
> > > >> use strict;
> > > >> use lib '/home/sean/tools/ensembl_53/modules';
> > > >>
> > > >> use Bio::SeqIO;
> > > >> use Bio::Root::IO;
> > > >> use Bio::EnsEMBL::DBSQL::BaseAdaptor;
> > > >> use Bio::EnsEMBL::Registry;
> > > >>
> > > >> my $registry = 'Bio::EnsEMBL::Registry';
> > > > >> #$registry->load_registry_from_db(-host => 'useastdb.ensembl.org', -user => 'anonymous');
> > > > >> $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');
> > > >>
> > > >> open OUT, ">$gene_file.coords";
> > > >> for my $geneid ( @unique ) {
> > > >>     chomp $geneid;
> > > >>     ($chr,$start, $end) = ensembl_coords($geneid);
> > > >>     print OUT join("\t", $chr,$start,$end,$geneid),"\n";
> > > >> }
> > > >>
> > > >> sub ensembl_coords {
> > > >>   my ($id) = @_;
> > > >>
> > > >>   my $adaptor = $registry->get_adaptor( 'Human', 'Core', 'gene' );
> > > >>
> > > >>   my $gene = $adaptor->fetch_all_by_external_name($id);
> > > >>   # my $gene = $adaptor->fetch_by_display_label($id);
> > > >>
> > > >>   $chr = $gene->seq_region_name();
> > > >>   $start = $gene->seq_region_start();
> > > >>   $end = $gene->seq_region_end();
> > > >>   return ($chr,$start,$end);
> > > >>
> > > >> }
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> Dev mailing list
> > > >> Dev at ensembl.org
> > > >>
> > > >> List admin (including subscribe/unsubscribe):
> > > >> http://lists.ensembl.org/mailman/listinfo/dev
> > > >>
> > > >> Ensembl Blog:
> > > >> http://www.ensembl.info/
> > > >
> > > > --
> > > > Javier Herrero, PhD
> > > > Ensembl Coordinator and Ensembl Compara Project Leader
> > > > European Bioinformatics Institute (EMBL-EBI)
> > > > Wellcome Trust Genome Campus, Hinxton
> > > > Cambridge - CB10 1SD - UK
> > > >
> > > >
> > > > _______________________________________________
> > > > Dev mailing list    Dev at ensembl.org
> > > > List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> > > > Ensembl Blog: http://www.ensembl.info/
> > > >
> > > >
> > > > _______________________________________________
> > > > Dev mailing list    Dev at ensembl.org
> > > > List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> > > > Ensembl Blog: http://www.ensembl.info/
> > >
> > >
> > > _______________________________________________
> > > Dev mailing list    Dev at ensembl.org
> > > List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> > > Ensembl Blog: http://www.ensembl.info/
> > >
> > >
> > > _______________________________________________
> > > Dev mailing list    Dev at ensembl.org
> > > List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> > > Ensembl Blog: http://www.ensembl.info/
> >
> >
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> >
> >
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
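
For anyone who hits the same "Can't call method ... on an undefined value" error from this thread, here is a minimal sketch of one way to guard against it. It assumes the adaptor comes from $registry->get_adaptor( 'Human', 'Core', 'gene' ) as in the scripts above. The helper gene_coord_lines and the MockGene/MockAdaptor classes are hypothetical names, not part of the Ensembl API; the mocks exist only so the sketch runs without an Ensembl installation, and the gene name and coordinates in them are placeholders, not real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): build one tab-separated
# coordinate line per gene found for an external name.
sub gene_coord_lines {
    my ($adaptor, $id) = @_;

    # In the thread, get_adaptor() returned undef when no species databases
    # were loaded, so fail early with a readable message.
    die "No gene adaptor available; check host, port and API version\n"
        unless defined $adaptor;

    # fetch_all_by_* methods return a reference to a (possibly empty) array.
    my $genes = $adaptor->fetch_all_by_external_name($id);

    my @lines;
    foreach my $gene (@$genes) {
        push @lines, join("\t",
            $gene->seq_region_name(),
            $gene->seq_region_start(),
            $gene->seq_region_end(),
            $id);
    }
    return @lines;
}

# Stand-in classes so the sketch runs without Ensembl; values are placeholders.
package MockGene;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub seq_region_name  { return $_[0]{chr} }
sub seq_region_start { return $_[0]{start} }
sub seq_region_end   { return $_[0]{end} }

package MockAdaptor;
sub new { my ($class) = @_; return bless {}, $class }
sub fetch_all_by_external_name {
    my ($self, $id) = @_;
    return $id eq 'GENE_A'
        ? [ MockGene->new(chr => '1', start => 100, end => 200) ]
        : [];
}

package main;

print "$_\n" for gene_coord_lines(MockAdaptor->new, 'GENE_A');
```

In a real script the mock adaptor would be replaced by the one returned from the registry. Checking it with defined right after load_registry_from_db turns a missing species database into an explicit error rather than a failure at the first method call.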