[ensembl-dev] Missing GO data for bacteria when using API
刘鹏飞
liupfskygre at gmail.com
Fri Jan 17 02:22:19 GMT 2014
Dear all,
I am using the following script to fetch all GO terms for a bacterium.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
#!/usr/bin/perl
# methanocella_conradii_hz254
use strict;
use warnings;
use Bio::EnsEMBL::LookUp;
# load the lookup from the main Ensembl Bacteria public server
my $lookup = Bio::EnsEMBL::LookUp->new(
  -URL      => "http://bacteria.ensembl.org/registry.json",
  -NO_CACHE => 1
);
# find the correct database adaptor using a unique name
my ($dba) = @{ $lookup->get_by_name_exact('methanocella_conradii_hz254') };
# where is the get_GeneAdaptor() documentation?
my $genes = $dba->get_GeneAdaptor()->fetch_all();
# test
print "Found " . scalar(@$genes) . " genes for " . $dba->species() . "\n";
use Bio::EnsEMBL::DBSQL::OntologyDBAdaptor;
# Did you load (use) the ontology adaptor module before constructing the
# adaptor? You need the line above!
# The problems listed below are solved!
# get adaptor for ontology
my $ontology_dba = Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
  -HOST    => 'mysql.ebi.ac.uk',
  -USER    => 'anonymous',
  -PORT    => '4157',
  -GROUP   => 'ontology',
  -DBNAME  => 'ensemblgenomes_ontology_21_74',
  -SPECIES => 'multi'
);
my $goada = $ontology_dba->get_adaptor('OntologyTerm');
# get GO information: append one line per GO xref (gene, GO ID, term name)
open(my $out, '>>', 'HZ254gene_go.txt') or die "Cannot open HZ254gene_go.txt: $!";
foreach my $gene (@$genes) {
  foreach my $link (@{ $gene->get_all_DBLinks() }) {
    next unless $link->database eq 'GO';
    my $term_id   = $link->display_id;
    my $term_name = '-';
    my $term      = $goada->fetch_by_accession($term_id);
    $term_name = $term->name if $term && $term->name;
    print $out $gene->stable_id, "\t", $term_id, "\t", $term_name, "\n";
  }
}
close($out);
# API version, 74
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Take Mtc_0001, one of my genes, as an example. The Perl API returned the
following:
Mtc_0001 GO:0016740 (molecular_function)
Searching Ensembl Bacteria for M. conradii and gene Mtc_0001
(http://bacteria.ensembl.org/methanocella_conradii_hz254/Transcript/Ontology/molecular_function?db=core;g=Mtc_0001;oid=molecular_function;r=Chromosome:38-805;t=AFC98776;tab=t),
I got the same result:
GO:0016740  transferase activity  UniProtKB/TrEMBL:H8I517 <http://www.uniprot.org/uniprot/H8I517>  molecular_function
However, when I followed its source, UniProtKB/TrEMBL:H8I517, I found the
following among all of its GO annotations:
DB        | Accession | Symbol   | GO ID        | Term name            | Aspect | Evidence | Reference                                      | With                 | Taxon   | Date     | Assigned by
UniProtKB | H8I517    | Mtc_0001 | *GO:0008152* | metabolic process    | P      | IEA      | UniProt Keywords2GO (UniProtKB/TrEMBL entries) | UniProtKB-KW:KW-0808 | 1041930 | 20140111 | GOC
UniProtKB | H8I517    | Mtc_0001 | *GO:0016740* | transferase activity | F      | IEA      | UniProt Keywords2GO (UniProtKB/TrEMBL entries) | UniProtKB-KW:KW-0808 | 1041930 | 20140107 | UniProt
*The API results above are missing the metabolic process annotation for
Mtc_0001. (The same holds for all other genes: only molecular function GO
term IDs were returned.)*
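The annotation rows above are essentially GAF (GO Annotation File) records from UniProt-GOA, so one way to see which annotations the API output lacks, for every gene rather than one at a time, is to compare the two files directly. This is a minimal, hypothetical helper sketch (not part of the Ensembl API). It assumes you have separately downloaded a GAF 2.x file for this organism, that the GAF "DB Object Symbol" column (column 3) holds the same gene IDs as the Ensembl stable IDs (as Mtc_0001 does for H8I517 above), and that the API output file has the gene ID and GO ID in its first two tab-separated columns, as the script writes them. The script and file names are placeholders.

```perl
#!/usr/bin/perl
# Hypothetical helper (not part of the Ensembl API): list GO annotations that
# appear in a UniProt-GOA GAF file but are absent from the API output file.
use strict;
use warnings;

# Read GAF 2.x records into { gene symbol => { GO ID => aspect (P/F/C) } }.
sub gaf_go_by_gene {
    my ($fh) = @_;
    my %go;
    while (my $line = <$fh>) {
        next if $line =~ /^!/;                 # '!' lines are GAF header comments
        $line =~ s/\r?\n\z//;                  # strip LF or CRLF line ending
        my @col = split /\t/, $line, -1;
        next if @col < 15;                     # skip blank or truncated lines
        next if $col[3] =~ /\bNOT\b/;          # skip negated (NOT) annotations
        my ($symbol, $go_id, $aspect) = @col[2, 4, 8];
        $go{$symbol}{$go_id} = $aspect;
    }
    return \%go;
}

# Read the API script's output (gene <TAB> GO ID [<TAB> ...]) into
# { gene => { GO ID => 1 } }.
sub api_go_by_gene {
    my ($fh) = @_;
    my %go;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        my ($gene, $go_id) = split /\t/, $line;
        next unless defined $go_id;            # skip blank or malformed lines
        $go{$gene}{$go_id} = 1;
    }
    return \%go;
}

# Return [gene, GO ID, aspect] for each GAF annotation missing from the API data.
sub missing_from_api {
    my ($gaf, $api) = @_;
    my @missing;
    for my $gene (sort keys %$gaf) {
        for my $go_id (sort keys %{ $gaf->{$gene} }) {
            next if $api->{$gene} && $api->{$gene}{$go_id};
            push @missing, [ $gene, $go_id, $gaf->{$gene}{$go_id} ];
        }
    }
    return \@missing;
}

# Usage (hypothetical names): perl compare_go.pl annotations.gaf HZ254gene_go.txt
if (@ARGV == 2) {
    open(my $gaf_fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    open(my $api_fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
    my $missing = missing_from_api(gaf_go_by_gene($gaf_fh), api_go_by_gene($api_fh));
    print join("\t", @$_), "\n" for @$missing;
}
elsif (@ARGV) {
    die "usage: $0 annotations.gaf api_output.txt\n";
}
```

Each output line is a gene, a GO ID and its aspect (P, F or C) that appear in the GAF file but not in the API output, so a run of P lines would confirm the pattern seen for Mtc_0001. Keying on the symbol column is a simplification; if symbols and Ensembl stable IDs differ for some genes, a separate ID mapping would be needed.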
I need to do a GO analysis for this bacterium, so I would like to fetch all
of its GO annotations. I am a little confused by this; any suggestions for
sorting it out would be appreciated!
--
Pengfei Liu, PhD Candidate
Lab of Microbial Ecology
College of Resources and Environmental Sciences
China Agricultural University
No.2 Yuanmingyuanxilu, Beijing, 100193
P.R. China
Tel: +86-10-62731358
Fax: +86-10-62731016
E-mail: liupfskygre at gmail.com