[ensembl-dev] How to programmatically get ontology go terms for transcript?
Andy Yates
ayates at ebi.ac.uk
Tue Apr 10 16:17:58 BST 2012
Hi James,
GO Slim terms are located in a secondary database; Ensembl calls this ensembl_ontology_RELEASE & Ensembl Genomes calls it ensemblgenomes_ontology_RELEASE. We have a lot of examples of how to use it in our core checkout under:
ensembl/misc-scripts/ontology
We have a README explaining the design of the schema & API and then examples in the scripts directory.
As for links, the web code has a mechanism that associates a URL pattern with an external identifier. I doubt you will be able to script against this, so I would suggest a small lookup in your code to do the associations.
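A minimal sketch of such a lookup, assuming the external database name and accession have already been pulled from each xref. The function name, the dictionary, and both URL patterns are illustrative assumptions rather than anything taken from the Ensembl web code, so check the patterns against the sites you actually want to link to.

```python
# Hypothetical lookup table: external database name -> URL pattern.
# Both patterns are assumptions for illustration; replace them with the
# link targets your installation should really use.
URL_PATTERNS = {
    "GO": "http://amigo.geneontology.org/amigo/term/{accession}",
    "PO": "http://example.org/plant-ontology/{accession}",  # placeholder
}


def accession_url(dbname, accession, patterns=URL_PATTERNS):
    """Return the URL for an accession, or None if the database is unknown."""
    pattern = patterns.get(dbname)
    if pattern is None:
        return None
    return pattern.format(accession=accession)


if __name__ == "__main__":
    # Example: emit one tab-delimited line per (database, accession) pair.
    for dbname, accession in [("GO", "GO:0008150"), ("XYZ", "XYZ:1")]:
        url = accession_url(dbname, accession) or ""
        print("\t".join([dbname, accession, url]))
```

Returning None for an unknown database keeps the dump going and leaves the URL column empty instead of guessing a link.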
I hope this helps,
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 10 Apr 2012, at 15:27, Thomason, James wrote:
> Okay, good, I am getting somewhere. Hopefully I'm almost there.
>
> I filtered out my call to get_all_DBLinks() to only look at the OntologyXref objects.
>
> Comparing against these tables:
> http://dev.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
>
> it looks like the OntologyXref objects directly give me the accession, term, and evidence codes. So next questions are -
>
> 1) How do I get those GO slim accessions? They don't appear to be OntologyXref objects, and I'm assuming they're somehow related. Trying the various get_* methods didn't seem to yield the results I wanted.
>
> 2) How can I get the URLs that the accessions link to? For the non-slim ones, I see that for my purposes I could just hardwire to Gramene, but I'd really rather populate it correctly.
>
> 3) How do I group the accessions together? Again, that page has multiple tables - descendent of biological process, cellular component, etc. Is it a matter of looking at each OntologyXref's get_all_masters() values and parsing together a tree from that?
>
> Many thanks for the help and the patience. :-)
>
> On Apr 10, 2012, at 2:52 AM, Andy Yates wrote:
>
> Hi James,
>
> If you are going to use the API to extract this information then you should look at the Bio::EnsEMBL::OntologyXref class, which extends DBEntry. The API automatically creates these objects when it encounters an object_xref link which also has an entry in the ontology_xref table. As Jan said, get_all_DBLinks() is the method to use and will return these OntologyXref objects.
>
> All the best,
>
> Andy
>
> On 10 Apr 2012, at 00:08, "Thomason, James" <thomason at cshl.edu> wrote:
>
> Well, that's progress, I guess. But what do I do with a Bio::EnsEMBL::DBEntry object?
>
> Looking at them, they don't appear to contain any of the data in that ontology table. Do I need to hop to some additional objects? Nothing looks obvious to jump to next.
>
> On Apr 9, 2012, at 5:15 PM, Jan Vogel wrote:
>
>
> Hi James,
>
> check out the doxygen ensembl api doc:
>
> http://uswest.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html#afbe0947fe458e2f2739f78852c292f7c
>
> Bio::EnsEMBL::Transcript::get_all_DBLinks() is your friend - for Ensembl annotation on www.ensembl.org, this should return http://www.geneontology.org/GO.slims.shtml#whatIs annotations.
> Another way would be to use BioMart.
>
> I'm unsure if this is set up for the gramene website …
>
> Hope this helps,
>
> Jan Vogel
>
>
> On Apr 9, 2012, at 2:56 PM, Thomason, James wrote:
>
> Hi all,
>
> I'm completely stumped. I've been charged with programmatically extracting ontology GO terms from our Ensembl installation. A relevant link would be:
>
> http://www.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
>
> I want to pull out everything inside that "Ontology Table" bit. But I'm utterly stumped as to how to go about doing it. I dug through the code enough to find that the page is generated through an EnsEMBL::Web::Component::Transcript::Go object, but I don't know how to instantiate one on the command line to get at the info. Presumably, since it's in the Web sub-tree, I really shouldn't be doing that on the command line anyway. Is there some way to link to that data through a Bio::EnsEMBL::Transcript object, perhaps? An xref or something?
>
> For now, I basically just want to dump out that data in a tab delimited format, so I don't need anything fancy other than actually getting to it.
>
> Any pointers in the right direction would be greatly appreciated.
>
> Thanks,
>
> --
> -Jim Thomason...
>
> Scientific Informatics Developer @ The Ware Lab,
> a USDA-ARS Laboratory at Cold Spring Harbor Laboratory
> http://www.warelab.org/
> http://www.cshl.edu/
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/