[ensembl-dev] a question about GO annotation

Andreas Kusalananda Kähäri ak4 at sanger.ac.uk
Thu Sep 13 09:19:49 BST 2012

Hi Mei,

In Script 1, you fetch all genes and select the ones that are directly
associated with GO terms having names matching "transcription",
"chromatin", or "histone".

In Script 2, you fetch all genes that are directly associated with GO
terms having names matching "transcription", "chromatin", or "histone",
*or* with any of the child terms of those GO terms.  The names of the
child terms might or might not match "transcription", "chromatin", or
"histone".

When you call fetch_all_by_GOTerm(), the API will take the GO hierarchy
into account and fetch all genes associated with the term itself, or with
any of its child terms.  This is also mentioned in the documentation of
that method.
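
To illustrate the difference, here is a minimal, self-contained sketch.
It does not use the Ensembl API at all: the term names, gene IDs, and
helper subroutines below are made up for the example, and the toy
hierarchy is a simple tree rather than the real GO graph.  It only shows
why a lookup that follows child terms can return more genes than a match
on directly annotated term names.

```perl
use strict;
use warnings;

# Toy hierarchy (hypothetical term names): term => its direct child terms.
my %children = (
    'transcription'      => ['mRNA synthesis'],
    'mRNA synthesis'     => ['promoter clearance'],
    'promoter clearance' => [],
    'lipid transport'    => [],
);

# Direct annotations only (hypothetical gene IDs): term => genes.
my %direct = (
    'transcription'      => ['geneA'],
    'mRNA synthesis'     => ['geneB'],
    'promoter clearance' => ['geneC'],
    'lipid transport'    => ['geneD'],
);

# Return a term together with all of its descendants (assumes a tree).
sub term_and_descendants {
    my ($term) = @_;
    my @all = ($term);
    push @all, term_and_descendants($_) for @{ $children{$term} // [] };
    return @all;
}

# Script 1 style: keep genes whose directly annotated term name matches.
sub genes_by_direct_name_match {
    my ($re) = @_;
    my %seen;
    for my $term ( grep { /$re/ } keys %direct ) {
        $seen{$_} = 1 for @{ $direct{$term} };
    }
    return sort keys %seen;
}

# Script 2 style: for each matching term, also collect the genes
# annotated to any of its descendants, whatever those are named.
sub genes_by_term_with_descendants {
    my ($re) = @_;
    my %seen;
    for my $term ( grep { /$re/ } keys %children ) {
        for my $t ( term_and_descendants($term) ) {
            $seen{$_} = 1 for @{ $direct{$t} // [] };
        }
    }
    return sort keys %seen;
}

my @script1 = genes_by_direct_name_match(qr/transcription/);
my @script2 = genes_by_term_with_descendants(qr/transcription/);

print "direct only:      @script1\n";
print "with child terms: @script2\n";
```

With this toy data, the first list contains only geneA, while the second
also contains geneB and geneC, because their terms are descendants of
"transcription" even though their names do not match the pattern.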


On Thu, Sep 13, 2012 at 12:20:34AM +0800, JiangMei wrote:
> Hi All. Sorry to bother you.
> I wrote two scripts to fetch genes annotated with specific GO terms. The scripts are shown in the following:
> Script 1:
> #Store target genes in @genelist
> my @genelist;
> use Bio::EnsEMBL::Registry;
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(
>       -host       =>'ensembldb.ensembl.org',
>       -user       =>'anonymous',
>       -db_version =>'67');
> my $go_adaptor=$registry->get_adaptor( 'Multi', 'Ontology', 'GOTerm' );
> my $gene_adaptor=$registry->get_adaptor( 'drosophila melanogaster', 'Core', 'Gene' );
> for $gene(@{$gene_adaptor->fetch_all}){
>       my @db_links=@{$gene->get_all_DBLinks('GO')};
>       for $dbe(@db_links){
>             my $go_name=$dbe->description;
>             push @genelist,$gene->stable_id if $go_name=~/transcription|chromatin/;
>      }
> }
> Script 2:
> #Store target genes in @genelist
> my @genelist;
> use Bio::EnsEMBL::Registry;
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(
>       -host       =>'ensembldb.ensembl.org',
>       -user       =>'anonymous',
>       -db_version =>'67');
> my $go_adaptor=$registry->get_adaptor( 'Multi', 'Ontology', 'GOTerm' );
> my $gene_adaptor=$registry->get_adaptor( 'drosophila melanogaster', 'Core', 'Gene' );
> for $term(@{$go_adaptor->fetch_all}){
>       my $name=$term->name;
>       my $acc=$term->accession;
>       if(($acc=~/^GO:/)&&($name=~/transcription|chromatin|histone/)){
>          for $gene(@{$gene_adaptor->fetch_all_by_GOTerm($term)}){
>                push @genelist,$gene->stable_id;
>          }
>      }
> }
> Basically, Script 1 fetched all the genes, then got the GO annotations for each gene. If a GO term matched the regular expression, the gene ID was pushed to @genelist. Script 2 fetched all the GO terms and, if a GO term matched the regular expression, pushed the IDs of its genes to @genelist. The two scripts were supposed to produce the same gene list. However, they produced different lists. Does anyone know the reason? Is there anything wrong in the scripts?
> Wish your help! Thanks a bunch! I really appreciate it.
> Best, Mei


Andreas Kusalananda Kähäri
Ensembl Gene Annotation Team

Sent from the tips of my fingers
