[ensembl-dev] How to programmatically get ontology go terms for transcript?

Thomason, James thomason at cshl.edu
Tue Apr 17 17:01:30 BST 2012


I was out of town for a week at a conference, but I'm back on this and hopefully nearly there. What I've got so far -

    my $trxA = ((code to get transcript adaptor));
    my $goA = ((code to get go term adaptor));
    my $transcript = $trxA->fetch_by_stable_id('AT3G52430.1');
    my $links = $transcript->get_all_DBLinks();
    foreach my $link (@$links) {
        # only the ontology xrefs carry GO accessions
        next unless (ref $link) =~ /OntologyXref/;
        my $term = $goA->fetch_by_accession($link->primary_id);
        # $method holds the name of the OntologyTerm method being tried
        foreach my $c (@{$term->$method}) {
            print "C->", $c->accession, "\n";
        }
    }

This is nearly there! I just seem to have too much data. Again, for this page:
http://gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1

Looking at the first go term (GO:0001666), there's only one GOSlim accession (GO:0006950)

But this code above gives me a bunch:

C->GO:0006950
C->GO:0070482
C->GO:0042221
C->GO:0050896
C->GO:0050896
C->GO:0008150

Grasping at straws and poking at the objects, I discovered the subsets array, which has values such as "goslim_plant". So I tried filtering down to only the GO slims that have goslim_plant as one of their subsets. That still gave me this:

C->GO:0006950
C->GO:0008150

Presumably, I should also be tossing out ID GO:0008150, but I don't know why. Is filtering on the subsets correct? Regardless, what additional filtration should I be applying at this point?
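
To make the subset filter concrete, here is a minimal sketch written against plain hashes rather than live OntologyTerm objects. The accessions are the ones printed above; the subset lists are assumptions chosen to reproduce the filtered output, not values read from the ontology database, and filter_by_subset is only an illustrative helper name.

```perl
use strict;
use warnings;

# Illustrative stand-ins for the terms printed above. With the real API these
# would be OntologyTerm objects; the subset lists are assumptions, not
# database values.
my @ancestors = (
    { accession => 'GO:0006950', subsets => ['goslim_plant'] },
    { accession => 'GO:0070482', subsets => [] },
    { accession => 'GO:0042221', subsets => [] },
    { accession => 'GO:0050896', subsets => [] },
    { accession => 'GO:0050896', subsets => [] },
    { accession => 'GO:0008150', subsets => ['goslim_plant'] },
);

# Keep only terms that list the wanted subset, dropping repeated accessions.
sub filter_by_subset {
    my ($terms, $wanted) = @_;
    my %seen;
    return grep {
        my $term = $_;
        (grep { $_ eq $wanted } @{ $term->{subsets} })
            && !$seen{ $term->{accession} }++;
    } @$terms;
}

print "C->", $_->{accession}, "\n" for filter_by_subset(\@ancestors, 'goslim_plant');
```

Membership in the subset alone keeps both GO:0006950 and GO:0008150, so whatever the web page does to drop the root term has to be a further step on top of this.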

Many thanks again for all the help.

On Apr 10, 2012, at 10:18 AM, Andy Yates wrote:

Hi James,

GO Slim terms are located in a secondary database; Ensembl calls this ensembl_ontology_RELEASE & Ensembl Genomes calls it ensemblgenomes_ontology_RELEASE. We have a lot of examples of how to use it in our core checkout under:

ensembl/misc-scripts/ontology

We have a README explaining the design of the schema & API and then examples in the scripts directory.

As for links, the web code has a mechanism which associates a URL pattern with an external identifier. I doubt you will be able to script against this, so I would suggest a small lookup in your own code to do the associations.
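
A minimal sketch of such a lookup, assuming the keys match the xref's database name (for example what $link->dbname returns). The URL pattern, the ###ID### placeholder and the url_for helper are all made up for illustration; real patterns would need to be filled in by hand.

```perl
use strict;
use warnings;

# Hypothetical URL patterns keyed by external database name. Both the
# pattern and the ###ID### placeholder are invented for this sketch.
my %url_pattern_for = (
    GO => 'http://example.org/go/###ID###',
);

# Return the URL for an identifier, or nothing if the database is not listed.
sub url_for {
    my ($dbname, $id) = @_;
    my $pattern = $url_pattern_for{$dbname};
    return unless defined $pattern;
    (my $url = $pattern) =~ s/###ID###/$id/;
    return $url;
}

print url_for('GO', 'GO:0001666'), "\n";
```

Databases missing from the hash simply yield no URL, so the caller can fall back to printing the bare accession.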

I hope this helps,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 10 Apr 2012, at 15:27, Thomason, James wrote:

Okay, good, I am getting somewhere. Hopefully I'm almost there.

I filtered out my call to get_all_DBLinks() to only look at the OntologyXref objects.

Comparing against these tables:
http://dev.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1

it looks like the OntologyXref objects directly give me the accession, term, and evidence codes. So next questions are -

1) How do I get those GO slim accessions? They don't appear to be OntologyXref objects, and I'm assuming they're somehow related. Trying the various get_* methods didn't yield the results I wanted.

2) How can I get the URLs that the accessions link to? For the non-slim ones, I see that for my purposes I could just hardwire to Gramene, but I'd really rather populate it correctly.

3) How do I group the accessions together? Again, that page has multiple tables: descendant of biological process, cellular component, etc. Is it a matter of looking at each OntologyXref's get_all_masters() values and piecing together a tree from that?

Many thanks for the help and the patience. :-)

On Apr 10, 2012, at 2:52 AM, Andy Yates wrote:

Hi James,

If you are going to use the API to extract this information then you should look at Bio::EnsEMBL::OntologyXref, which extends DBEntry. The API automatically creates these objects when it encounters an object_xref link which also has an entry in the ontology_xref table. As Jan said, get_all_DBLinks() is the method to use and will return these OntologyXref objects.

All the best,

Andy

On 10 Apr 2012, at 00:08, "Thomason, James" <thomason at cshl.edu> wrote:

Well, that's progress, I guess. But what do I do with a Bio::EnsEMBL::DBEntry object?

Looking at them, they don't appear to contain any of the data in that ontology table. Do I need to hop to some additional objects? Nothing looks obvious to jump to next.

On Apr 9, 2012, at 5:15 PM, Jan Vogel wrote:


Hi James,

check out the doxygen ensembl api doc:

http://uswest.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html#afbe0947fe458e2f2739f78852c292f7c

Bio::EnsEMBL::Transcript::get_all_DBLinks() is your friend - for Ensembl annotation on www.ensembl.org, this should return GO slim (http://www.geneontology.org/GO.slims.shtml#whatIs) annotations.
Another way would be to use biomart.

I'm unsure if this is set up for the gramene website …

Hope  this helps,

    Jan Vogel


On Apr 9, 2012, at 2:56 PM, Thomason, James wrote:

Hi all,

I'm completely stumped. I've been charged with programmatically extracting out ontology go terms from our ensembl installation. A relevant link would be:

http://www.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1

I want to pull out everything inside that "Ontology Table" bit. But I'm utterly stumped as to how to go about doing it. I dug through the code enough to find that the page is generated through an EnsEMBL::Web::Component::Transcript::Go object, but I don't know how to instantiate one on the command line to get at the info. Presumably, since it's in the Web sub-tree, I really shouldn't be doing that on the command line anyway. Is there some way to link to that data through a Bio::EnsEMBL::Transcript object, perhaps? An xref or something?

For now, I basically just want to dump out that data in a tab delimited format, so I don't need anything fancy other than actually getting to it.

Any pointers in the right direction would be greatly appreciated.

Thanks,

--
-Jim Thomason...

Scientific Informatics Developer @ The Ware Lab,
a USDA-ARS Laboratory at Cold Spring Harbor Laboratory
http://www.warelab.org/
http://www.cshl.edu/


_______________________________________________
Dev mailing list    Dev at ensembl.org
List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

