[ensembl-dev] Missing GO data for bacteria when using API

刘鹏飞 liupfskygre at gmail.com
Fri Jan 17 02:22:19 GMT 2014


Dear all

I am using the following script to fetch all GO terms for a bacterium.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

#!/usr/bin/perl
# Target species: methanocella_conradii_hz254
use strict;
use warnings;

use Bio::EnsEMBL::LookUp;

# load the lookup from the main Ensembl Bacteria public server
my $lookup = Bio::EnsEMBL::LookUp->new(
  -URL      => "http://bacteria.ensembl.org/registry.json",
  -NO_CACHE => 1
);

# find the correct database adaptor using a unique name
my ($dba) = @{$lookup->get_by_name_exact(
  'methanocella_conradii_hz254'
)};

# where is the get_GeneAdaptor() documentation?
my $genes = $dba->get_GeneAdaptor()->fetch_all();

# test
print "Found ".scalar @$genes." genes for ".$dba->species()."\n";



use Bio::EnsEMBL::DBSQL::OntologyDBAdaptor;
# Did you try to import the ontology adaptor before constructing it? You
# should have the line above!
# The problems listed below are solved.

# get adaptor for ontology
my $ontology_dba = Bio::EnsEMBL::DBSQL::OntologyDBAdaptor->new(
  -HOST    => 'mysql.ebi.ac.uk',
  -USER    => 'anonymous',
  -PORT    => '4157',
  -group   => 'ontology',
  -dbname  => 'ensemblgenomes_ontology_21_74',
  -species => 'multi'
);

my $goada = $ontology_dba->get_adaptor('OntologyTerm');





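# get GO information
open(my $out, '>>', 'HZ254gene_go.txt')
  or die "Cannot open HZ254gene_go.txt: $!";

foreach my $gene (@$genes) {
  foreach my $link (@{ $gene->get_all_DBLinks }) {
    # keep only cross-references that point to GO
    next unless $link->database eq "GO";
    my $term_id   = $link->display_id;
    my $term_name = '-';
    # look up the term name in the ontology database, if available
    my $term = $goada->fetch_by_accession($term_id);
    if ($term and $term->name) {
      $term_name = $term->name;
    }
    # one line per gene/term pair: stable ID, GO accession, term name
    print $out $gene->stable_id, "\t", $term_id, "\t", $term_name, "\n";
  }
}

close($out);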

# API version, 74

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Take Mtc_0001, one of my genes, as an example. The Perl API returned the
following:

Mtc_0001         GO:0016740 (molecular_function)

Searching Ensembl Bacteria for M. conradii gene Mtc_0001 (
http://bacteria.ensembl.org/methanocella_conradii_hz254/Transcript/Ontology/molecular_function?db=core;g=Mtc_0001;oid=molecular_function;r=Chromosome:38-805;t=AFC98776;tab=t)
gave the same result:

GO:0016740    transferase activity    UniProtKB/TrEMBL:H8I517 <http://www.uniprot.org/uniprot/H8I517>    molecular_function



However, when I browsed its source, UniProtKB/TrEMBL:H8I517, the full GO
annotation listed the following:



UniProtKB  H8I517  Mtc_0001  *GO:0008152*  metabolic process  P  IEA  UniProt Keywords2GO (UniProtKB/TrEMBL entries)  UniProtKB-KW:KW-0808  1041930  20140111  GOC

*Function*

UniProtKB  H8I517  Mtc_0001  *GO:0016740*  transferase activity  F  IEA  UniProt Keywords2GO (UniProtKB/TrEMBL entries)  UniProtKB-KW:KW-0808  1041930  20140107  UniProt



*The API results above are missing the metabolic process annotation for
Mtc_0001. (The same holds for all other genes: only molecular function GO
term IDs were returned.)*
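
To make the discrepancy easy to check for more genes, the comparison can be sketched in plain Perl. This is only a minimal illustration: the hashes %api_go and %uniprot_go and the helper missing_terms() are hypothetical names, filled in by hand with the two Mtc_0001 listings quoted in this message. It does not call the Ensembl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# GO accessions per gene as returned by the API script (values from this message)
my %api_go = ( 'Mtc_0001' => ['GO:0016740'] );

# GO accessions per gene as listed for UniProtKB H8I517 (values from this message)
my %uniprot_go = ( 'Mtc_0001' => [ 'GO:0008152', 'GO:0016740' ] );

# Hypothetical helper: return the accessions that are present in
# $reference but absent from $observed, sorted alphabetically.
sub missing_terms {
    my ($observed, $reference) = @_;
    my %seen    = map { $_ => 1 } @$observed;
    my @missing = grep { !$seen{$_} } @$reference;
    return [ sort @missing ];
}

# Report, per gene, the terms UniProt lists that the API output lacks
foreach my $gene (sort keys %uniprot_go) {
    my $missing = missing_terms($api_go{$gene} || [], $uniprot_go{$gene});
    print join("\t", $gene, @$missing), "\n" if @$missing;
}
```

With the values above this prints Mtc_0001 followed by GO:0008152, i.e. the metabolic process term is the only one absent from the API output.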



I need to do a GO analysis for this bacterium and would like to fetch all of
its GO terms. I am a little confused by this, so any suggestions for resolving
it would be appreciated!

-- 
Pengfei Liu, PhD Candidate

Lab of Microbial Ecology
College of Resources and Environmental Sciences
China Agricultural University
No.2 Yuanmingyuanxilu, Beijing, 100193
P.R. China

Tel: +86-10-62731358
Fax: +86-10-62731016

E-mail: liupfskygre at gmail.com