[ensembl-dev] Problems fetching GO terms for all genes of a bacterium using the API

mr6 at ebi.ac.uk
Tue Jan 7 14:21:03 GMT 2014


Hi Pengfei,

Replies inline below.

> Hi Magali,
> Thanks for your quick reply. I followed your instructions and modified the
> lines as follows:
> # get adaptor for ontology
> my $ontology_dba=Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
> -HOST=>"mysql.ebi.ac.uk",
> -USER=>'anonymous',
> -PORT=>'4157',
> -group =>'ontology',
> -dbname=>'ensemblgenomes_ontology_21_74',
> -species=>'multi');
>
> # One more question: where can I find information such as the host, user
> # and dbname that I need to get the data object or adaptor I want?

I used the information provided in the Ensembl Bacteria documentation:
http://bacteria.ensembl.org/info/data/accessing_ensembl_bacteria.html#advanced-use
The default host for Ensembl Genomes databases is mysql.ebi.ac.uk, with the
connection details (port, user) given there.
For the database name, an ontology database would normally be called
ensemblgenomes_ontology_egrelease_erelease,
where egrelease is the Ensembl Genomes release version (here 21)
and erelease is the corresponding Ensembl release version (here 74).
In Ensembl itself, the corresponding database is called ensembl_ontology_74.
If you have a MySQL client installed, you can log directly onto the
Ensembl Genomes server to find the exact name of the database you are
looking for.
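As an illustration of this naming convention, here is a small Perl sketch. ontology_db_name and list_ontology_dbs are hypothetical helper names; the listing part assumes DBI and DBD::mysql are installed and that the server accepts the anonymous account on port 4157 (from the mysql command-line client, the equivalent query is SHOW DATABASES LIKE 'ensemblgenomes_ontology%').

```perl
use strict;
use warnings;

# Hypothetical helper: build an Ensembl Genomes ontology database name from
# the Ensembl Genomes release and the matching Ensembl release, following
# the ensemblgenomes_ontology_<egrelease>_<erelease> convention.
sub ontology_db_name {
    my ($eg_release, $e_release) = @_;
    return sprintf 'ensemblgenomes_ontology_%d_%d', $eg_release, $e_release;
}

# Hypothetical helper: list the ontology databases actually present on the
# public server. Assumes DBI + DBD::mysql and network access to
# mysql.ebi.ac.uk:4157 with the anonymous account.
sub list_ontology_dbs {
    require DBI;
    my $dbh = DBI->connect('DBI:mysql:host=mysql.ebi.ac.uk;port=4157',
                           'anonymous', '', { RaiseError => 1 });
    my $names = $dbh->selectcol_arrayref(
        q{SHOW DATABASES LIKE 'ensemblgenomes_ontology%'});
    $dbh->disconnect;
    return @$names;
}

print ontology_db_name(21, 74), "\n";    # prints ensemblgenomes_ontology_21_74

# Only query the server when asked to: perl ontology_dbs.pl --list
if (@ARGV && $ARGV[0] eq '--list') {
    print "$_\n" for list_ontology_dbs();
}
```

Running it with --list prints the ontology databases actually available, which avoids guessing the release pair.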

>
> # now use the DBAdaptor to get_adaptor
> my $goada=$ontology_dba->get_adaptor('Multi','Ontology','OntologyTerm');
> # In your reply it is $goada=$registry, but I think it should be
> # $goada=$ontology_dba, right?

Sorry about the confusion.
We normally use the registry to connect to databases, but it does not
cope well with multi-species databases such as the bacterial ones, hence
the use of the LookUp and of DBAdaptors created directly.
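The correct syntax should have been as follows (get_adaptor called on a
DBAdaptor takes just the adaptor name, and the module has to be loaded
with use before new() can be called on it):
use Bio::EnsEMBL::DBSQL::OntologyDBAdaptor;

my $ontology_dba = Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
 -HOST    => 'mysql.ebi.ac.uk',
 -USER    => 'anonymous',
 -PORT    => '4157',
 -group   => 'ontology',
 -dbname  => 'ensemblgenomes_ontology_21_74',
 -species => 'multi' );

my $goada = $ontology_dba->get_adaptor('OntologyTerm');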

Hopefully, this should also solve the issue below.
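With $goada defined, fetch_by_accession can then be used as in your loop. A minimal Perl sketch, assuming the connection details above; describe_term is a hypothetical helper that returns the term and its ancestors. It checks the result of fetch_by_accession first, since calling ancestors() on an undefined term fails with the same kind of "undefined value" error.

```perl
use strict;
use warnings;

# Hypothetical helper: return printable lines for one GO accession, the
# term itself followed by its ancestors. $goada is expected to be the
# adaptor returned by $ontology_dba->get_adaptor('OntologyTerm').
sub describe_term {
    my ($goada, $accession) = @_;
    my $term = $goada->fetch_by_accession($accession);
    # Guard before calling methods on the result.
    return ("$accession (not found)") unless defined $term;
    my @lines = ($term->accession . ' (' . $term->name . ')');
    push @lines, "\t" . $_->accession . ' (' . $_->name . ')'
        for @{ $term->ancestors() };
    return @lines;
}

# Live use: perl describe_term.pl GO:0008150
if (@ARGV) {
    require Bio::EnsEMBL::DBSQL::OntologyDBAdaptor;
    my $ontology_dba = Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
        -HOST    => 'mysql.ebi.ac.uk',
        -USER    => 'anonymous',
        -PORT    => '4157',
        -group   => 'ontology',
        -dbname  => 'ensemblgenomes_ontology_21_74',
        -species => 'multi',
    );
    my $goada = $ontology_dba->get_adaptor('OntologyTerm');
    die "No OntologyTerm adaptor\n" unless $goada;
    print "$_\n" for describe_term($goada, $ARGV[0]);
}
```

Without arguments the script only defines the helper, so it can be reused from the gene loop.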

>
> The output:
> Can't locate object method "new" via package
> "Bio::EnsEMBL::DBSQL::OntologyDBAdaptor" at /home/liupf/hz254_2.pl line 21.
>
> I checked the Doxygen documentation for the new() method of
> OntologyDBAdaptor, but the examples returned were all
> Bio::EnsEMBL::DBSQL::DBAdaptor::new(), so I thought new() was no longer
> supported by OntologyDBAdaptor and also tried
> my $ontology_dba=Bio::EnsEMBL::DBSQL::DBAdaptor->new(.....
>
> Unfortunately, the output was:
> Can't call method "fetch_by_accession" on an undefined value at
> /home/liupf/hz254_2.pl line 37.
>
> ### Confusion about the Ensembl API
> To fetch data with the API, you need to use the right database and its
> corresponding DBAdaptor, then the right object adaptor and its methods.
> Is my understanding right?

That is correct.
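Those three layers can be sketched in Perl, reusing the LookUp call from your script (count_genes is a hypothetical helper). It checks that get_adaptor really returned an adaptor: in your case, asking a DBAdaptor for an adaptor with the wrong arguments gave undef, which is what led to the "Can't call method ... on an undefined value" errors.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the three layers:
#   1. a DBAdaptor for one species database ($dba),
#   2. an object adaptor obtained from it (here the Gene adaptor),
#   3. a fetch method on that object adaptor (fetch_all).
sub count_genes {
    my ($dba) = @_;
    # Same adaptor as $dba->get_GeneAdaptor().
    my $gene_adaptor = $dba->get_adaptor('Gene');
    # Fail early with a clear message instead of a later
    # "Can't call method ... on an undefined value".
    die "No Gene adaptor available from this DBAdaptor\n" unless $gene_adaptor;
    my $genes = $gene_adaptor->fetch_all();
    return scalar @$genes;
}

# Live use: perl count_genes.pl methanocella_conradii_hz254
if (@ARGV) {
    require Bio::EnsEMBL::LookUp;
    my $lookup = Bio::EnsEMBL::LookUp->new(
        -URL      => 'http://bacteria.ensembl.org/registry.json',
        -NO_CACHE => 1,
    );
    my ($dba) = @{ $lookup->get_by_name_exact($ARGV[0]) };
    die "No database found for $ARGV[0]\n" unless $dba;
    printf "%s: %d genes\n", $dba->species(), count_genes($dba);
}
```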

> What confuses me is this:
> my genes are in the bacteria database and I can fetch them, but if the gene
> ontology terms are in another database, how are the two connected? Does
> that mean I need a separate DBAdaptor for each of them?

The bacterial database contains the genes and all related information.
It thus contains the ontology term accessions attached to genes and
translations. It does not, however, contain the definition of each
ontology term, nor its relationships with other terms, such as descendants
and ancestors. This additional information is stored separately in the
ontology database, so yes, you need two DBAdaptors: one for the bacterial
database and one for the ontology database.
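Putting it together for your original goal (a gene / GO term table), one possible approach is the Perl sketch below, with two DBAdaptors: the one from the LookUp for the genes and their GO cross-references, and the OntologyDBAdaptor for the term names. collect_pairs and format_go_table are hypothetical helper names. get_all_DBLinks on a gene also returns the cross-references of its transcripts and translations, so the same GO accession may appear more than once; the sketch de-duplicates per gene and caches term names so each accession is looked up only once. The release numbers in -dbname are the ones discussed above and would need updating for other releases.

```perl
use strict;
use warnings;

# Collect [gene_stable_id, go_accession, go_term_name] triples.
# $dba   : DBAdaptor for the bacterial (core) database, e.g. from the LookUp
# $goada : OntologyTerm adaptor from the separate ontology database
sub collect_pairs {
    my ($dba, $goada) = @_;
    my (@pairs, %name_of);
    for my $gene (@{ $dba->get_GeneAdaptor()->fetch_all() }) {
        my %seen;    # same accession may come via several transcripts/translations
        for my $link (@{ $gene->get_all_DBLinks() }) {
            next unless $link->database eq 'GO';
            my $acc = $link->display_id;
            next if $seen{$acc}++;
            # Look each accession up in the ontology database only once.
            unless (exists $name_of{$acc}) {
                my $term = $goada->fetch_by_accession($acc);
                $name_of{$acc} = $term ? $term->name : undef;
            }
            push @pairs, [ $gene->stable_id, $acc, $name_of{$acc} ];
        }
    }
    return @pairs;
}

# Two tab-separated columns: gene stable ID, then "accession (name)";
# '-' marks accessions whose term could not be found.
sub format_go_table {
    my @lines = ("gene\tgo_term");
    for my $p (@_) {
        my ($gene_id, $acc, $name) = @$p;
        push @lines, sprintf "%s\t%s (%s)", $gene_id, $acc, $name // '-';
    }
    return @lines;
}

# Live use: perl go_table.pl methanocella_conradii_hz254 > go_table.tsv
if (@ARGV) {
    require Bio::EnsEMBL::LookUp;
    require Bio::EnsEMBL::DBSQL::OntologyDBAdaptor;
    my $lookup = Bio::EnsEMBL::LookUp->new(
        -URL      => 'http://bacteria.ensembl.org/registry.json',
        -NO_CACHE => 1,
    );
    my ($dba) = @{ $lookup->get_by_name_exact($ARGV[0]) };
    die "No database found for $ARGV[0]\n" unless $dba;
    my $ontology_dba = Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
        -HOST    => 'mysql.ebi.ac.uk',
        -USER    => 'anonymous',
        -PORT    => '4157',
        -group   => 'ontology',
        -dbname  => 'ensemblgenomes_ontology_21_74',
        -species => 'multi',
    );
    my $goada = $ontology_dba->get_adaptor('OntologyTerm');
    die "No OntologyTerm adaptor\n" unless $goada;
    print "$_\n" for format_go_table(collect_pairs($dba, $goada));
}
```

Keeping the database access (collect_pairs) separate from the output formatting (format_go_table) makes it easy to write the table in another layout, for example one accession per column.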

> I hope this is not bothering you too much!
> ###
> Thank you very much!


Hope this helps,
Magali

>
>
>
> 2014/1/7 <mr6 at ebi.ac.uk>
>
>> Hi Pengfei,
>>
>> The get_GeneAdaptor method is equivalent to using get_adaptor('Gene').
>> More documentation can be found here:
>>
>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1DBAdaptor.html#a2a1ee81ecb9507fc5ea7bdf39be97bf9
>>
>> As for the "undefined value" error you are getting: by calling
>> get_adaptor on $dba, you are attempting to get an object adaptor
>> defined in the context of your bacteria database. Ontologies are
>> stored separately in their own database, ensembl_ontology.
>>
>> The easiest way to access the ontology database would be as follows:
>> my $ontology_dba = Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
>> -HOST => 'mysql.ebi.ac.uk',
>> -USER => 'anonymous',
>> -PORT => '4157',
>> -group   => 'ontology',
>> -dbname => 'ensemblgenomes_ontology_21_74',
>> -species => 'multi' );
>>
>> my $goada = $registry->get_adaptor( 'Multi', 'Ontology', 'OntologyTerm' );
>>
>> You should then be able to call fetch_by_accession on $goada for a given
>> GO accession.
>>
>>
>> Hope that helps,
>> Magali
>>
>> > Hi all,
>> > I am new to the API. I am trying to use it to get all GO terms for each
>> > gene of an archaeon (Methanocella conradii HZ254), and want to get a
>> > table with two columns, one for the gene name and the other for the
>> > corresponding GO term.
>> >
>> > Following the API instructions and the modifications needed to use the
>> > API for bacteria, I used the following code:
>> > # load the lookup from the main Ensembl Bacteria public server
>> > my $lookup = Bio::EnsEMBL::LookUp->new(
>> >   -URL => "http://bacteria.ensembl.org/registry.json",
>> >   -NO_CACHE => 1
>> > );
>> > # find the correct database adaptor using a unique name
>> > my ($dba) = @{$lookup->get_by_name_exact(
>> >   'methanocella_conradii_hz254'
>> > )};
>> > # get adaptor for ontology
>> > my $goada=$dba->get_adaptor('Multi','Ontology','OntologyTerm');
>> > my $genes = $dba->get_GeneAdaptor()->fetch_all(); # where is the get_GeneAdaptor() documentation?
>> > # ###test####
>> > print "Found ".scalar @$genes." genes for ".$dba->species()."\n";
>> >
>> > # get GO information (modified from kokocinsky.net Ensembl coding)
>> > foreach my $gene (@$genes){
>> >   my $links = $gene->get_all_DBLinks();
>> >   foreach my $link (@$links){
>> >     if ($link->database eq "GO"){
>> >       my $term_id=$link->display_id;
>> >       my $term_name='-';
>> >       my $term=$goada->fetch_by_accession($term_id);
>> >       if($term and $term->name){
>> >         $term_name=$term->name;}
>> >       print $gene->stable_id.":$term_id ($term_name)\n";
>> >       # fetch complete GO hierarchy
>> >       foreach my $ancestor_term (@{$term->ancestors()}){
>> >         print "\t". $ancestor_term->accession." (".$ancestor_term->name.")\n";
>> >       }
>> >     }
>> >   }
>> > }
>> >
>> > It works well up to the "get GO information" part; after that,
>> > the output was as follows:
>> > Can't call method "fetch_by_accession" on an undefined value at
>> > /home/liupf/hz254.pl line 27.
>> > 1. I do not understand the use of 'get_GeneAdaptor'; I could not find
>> > documentation on this syntax.
>> > 2. Please give me some suggestions on how to accomplish my task.
>> >
>> > Thank you all!
>> >
>> > $ perl ~/ApiVersion.pl
>> > The API version used is 74
>> >
>> > --
>> > Pengfei Liu, PhD Candidate
>> >
>> > Lab of Microbial Ecology
>> > College of Resources and Environmental Sciences
>> > China Agricultural University
>> > No.2 Yuanmingyuanxilu, Beijing, 100193
>> > P.R. China
>> >
>> > Tel: +86-10-62731358
>> > Fax: +86-10-62731016
>> >
>> > E-mail: liupfskygre at gmail.com
>> > _______________________________________________
>> > Dev mailing list    Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info:
>> > http://lists.ensembl.org/mailman/listinfo/dev
>> > Ensembl Blog: http://www.ensembl.info/
>> >
>>
>>
>
>
>




