[ensembl-dev] How to programmatically get ontology go terms for transcript?
Thomason, James
thomason at cshl.edu
Tue Apr 17 17:01:30 BST 2012
I was out of town for a week at a conference, but I'm back on this and hopefully nearly there. What I've got so far -
    my $trxA = ((code to get transcript adaptor));
    my $goA  = ((code to get go term adaptor));
    my $transcript = $trxA->fetch_by_stable_id('AT3G52430.1');
    my $links = $transcript->get_all_DBLinks();
    foreach my $link (@$links) {
        # only the ontology cross-references are of interest
        next unless ref($link) =~ /OntologyXref/;
        my $term = $goA->fetch_by_accession($link->primary_id);
        # $method holds the name of whichever term-relation method is being tried
        foreach my $c (@{ $term->$method }) {
            print "C->", $c->accession, "\n";
        }
    }
This is nearly there! I just seem to have too much data. Again, for this page:
http://gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
Looking at the first go term (GO:0001666), there's only one GOSlim accession (GO:0006950)
But this code above gives me a bunch:
C->GO:0006950
C->GO:0070482
C->GO:0042221
C->GO:0050896
C->GO:0050896
C->GO:0008150
Grasping at straws and poking at the objects, I discovered the subsets array, which has, for example, "goslim_plant" as a value. So I tried filtering down to only the GO slims that have goslim_plant in their subsets. That still gave me this:
C->GO:0006950
C->GO:0008150
Presumably, I should also be tossing out ID GO:0008150, but I don't know why. Is filtering on the subsets correct? Regardless, what additional filtration should I be applying at this point?
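For reference, the subset filtering described above can be sketched in plain Perl. This is only an illustration under stated assumptions: the terms are mocked as hash references with "accession" and "subsets" keys rather than real ontology term objects, the subset memberships are invented so that the result matches the two lines printed above, and filter_by_subset is a hypothetical helper name. The sketch also drops the repeated accession (GO:0050896 is printed twice in the unfiltered output). It does not explain why GO:0008150 survives the filter.

```perl
use strict;
use warnings;

# Mock terms standing in for the objects returned by the ontology adaptor.
# The subset lists are invented for illustration only.
my @terms = (
    { accession => 'GO:0006950', subsets => ['goslim_plant', 'goslim_generic'] },
    { accession => 'GO:0070482', subsets => [] },
    { accession => 'GO:0042221', subsets => ['goslim_generic'] },
    { accession => 'GO:0050896', subsets => ['goslim_generic'] },
    { accession => 'GO:0050896', subsets => ['goslim_generic'] },
    { accession => 'GO:0008150', subsets => ['goslim_plant'] },
);

# Hypothetical helper: keep terms that belong to $subset, dropping
# repeated accessions while preserving the original order.
sub filter_by_subset {
    my ($subset, @candidates) = @_;
    my %seen;
    return grep {
        my $term = $_;
        (grep { $_ eq $subset } @{ $term->{subsets} })
            && !$seen{ $term->{accession} }++;
    } @candidates;
}

my @slim = filter_by_subset('goslim_plant', @terms);
print "C->", $_->{accession}, "\n" for @slim;
```

With the mock data this prints GO:0006950 and GO:0008150, the same two accessions as above, so any further filtering would have to use something other than subset membership.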
Many thanks again for all the help.
On Apr 10, 2012, at 10:18 AM, Andy Yates wrote:
Hi James,
GO Slim terms are located in a secondary database; Ensembl calls this ensembl_ontology_RELEASE & Ensembl Genomes calls it ensemblgenomes_ontology_RELEASE. We have a lot of examples of how to use it in our core checkout under:
ensembl/misc-scripts/ontology
We have a README explaining the design of the schema & API and then examples in the scripts directory.
As for links the web code has a mechanism which allows association of a URL pattern with an external identifier. I doubt you will be able to script against this so I would suggest a small lookup in your code to do the associations.
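A small lookup of the kind suggested here might look like the sketch below. Everything in it is an assumption made for illustration: the %url_pattern_for table, the '###ID###' placeholder convention, the url_for helper and the example.org URLs are not taken from the Ensembl web code, and the keys are assumed to match the external database name reported by each xref.

```perl
use strict;
use warnings;

# Hypothetical lookup: external database name => URL pattern.
# '###ID###' marks where the accession is substituted. The patterns
# are placeholders; replace them with the links your site should use.
my %url_pattern_for = (
    GO => 'http://example.org/go/term/###ID###',
    PO => 'http://example.org/po/term/###ID###',
);

# Return the URL for an accession, or nothing when the database is unknown.
sub url_for {
    my ($dbname, $accession) = @_;
    my $pattern = $url_pattern_for{$dbname};
    return unless defined $pattern;
    (my $url = $pattern) =~ s/###ID###/$accession/;
    return $url;
}

print url_for('GO', 'GO:0001666'), "\n";
```

Keeping the patterns in one hash means a new external database only needs one extra line, and unknown databases can be detected by checking whether url_for returned a defined value.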
I hope this helps,
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 10 Apr 2012, at 15:27, Thomason, James wrote:
Okay, good, I am getting somewhere. Hopefully I'm almost there.
I filtered the results of my call to get_all_DBLinks() to only look at the OntologyXref objects.
Comparing against these tables:
http://dev.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
it looks like the OntologyXref objects directly give me the accession, term, and evidence codes. So next questions are -
1) How do I get those go slim accessions? They don't appear to be OntologyXref objects, and I'm assuming they're somehow related. Trying the various get_* methods didn't seem to yield the results I wanted.
2) How can I get the URLs that the accessions link to? For the non-slim ones, I see that for my purposes I could just hardwire to Gramene, but I'd really rather populate it correctly.
3) How do I group the accessions together? Again, that page has multiple tables - descendent of biological process, cellular component, etc. Is it a matter of looking at each OntologyXref's get_all_masters() values and parsing together a tree from that?
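One possible shape for the grouping asked about in question 3, assuming each accession can be resolved to a record that carries its namespace (biological_process, cellular_component and so on), is sketched below. The rows are mocked as plain hashes and are not claimed to be the actual annotations of AT3G52430.1; group_by_namespace is a hypothetical helper name, and whether the namespace is best obtained this way from the API is left open.

```perl
use strict;
use warnings;

# Mock rows; in real code each would be built from an OntologyXref and
# its ontology term. They are not claimed to be the actual annotations
# of AT3G52430.1.
my @rows = (
    { accession => 'GO:0001666', name => 'response to hypoxia', namespace => 'biological_process' },
    { accession => 'GO:0005634', name => 'nucleus',             namespace => 'cellular_component' },
    { accession => 'GO:0006950', name => 'response to stress',  namespace => 'biological_process' },
);

# Hypothetical helper: bucket rows by namespace, keeping input order
# within each bucket.
sub group_by_namespace {
    my @input = @_;
    my %group;
    push @{ $group{ $_->{namespace} } }, $_ for @input;
    return \%group;
}

my $group = group_by_namespace(@rows);
for my $namespace (sort keys %$group) {
    print "# $namespace\n";
    print join("\t", $_->{accession}, $_->{name}), "\n" for @{ $group->{$namespace} };
}
```

Each bucket is printed as tab-delimited lines under a comment line naming the namespace, which mirrors the separate tables on the web page.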
Many thanks for the help and the patience. :-)
On Apr 10, 2012, at 2:52 AM, Andy Yates wrote:
Hi James,
If you are going to use the API to extract this information then you should look at the Bio::EnsEMBL::OntologyXref which extends DBEntry. The API automatically creates these objects when it encounters an object_xref link which also has an entry in the ontology_xref table. As Jan said get_all_DBLinks() is the method to use and will return these OntologyXref objects.
All the best,
Andy
On 10 Apr 2012, at 00:08, "Thomason, James" <thomason at cshl.edu> wrote:
Well, that's progress, I guess. But what do I do with a Bio::EnsEMBL::DBEntry object?
Looking at them, they don't appear to contain any of the data in that ontology table. Do I need to hop to some additional objects? Nothing looks obvious to jump to next.
On Apr 9, 2012, at 5:15 PM, Jan Vogel wrote:
Hi James,
check out the doxygen ensembl api doc:
http://uswest.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html#afbe0947fe458e2f2739f78852c292f7c
Bio::EnsEMBL::Transcript::get_all_DBLinks( ) is your friend - for Ensembl annotation on www.ensembl.org, this should return GO slim annotations (http://www.geneontology.org/GO.slims.shtml#whatIs).
Another way would be to use biomart.
I'm unsure if this is set up for the gramene website …
Hope this helps,
Jan Vogel
On Apr 9, 2012, at 2:56 PM, Thomason, James wrote:
Hi all,
I'm completely stumped. I've been charged with programmatically extracting out ontology go terms from our ensembl installation. A relevant link would be:
http://www.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
I want to pull out everything inside that "Ontology Table" bit. But I'm utterly stumped as to how to go about doing it. I dug through the code enough to find that the page is generated through an EnsEMBL::Web::Component::Transcript::Go object, but I don't know how to instantiate one on the command line to get at the info. Presumably, since it's in the Web sub-tree, I really shouldn't be doing that on the command line anyway. Is there some way to link to that data through a Bio::EnsEMBL::Transcript object, perhaps? An xref or something?
For now, I basically just want to dump out that data in a tab delimited format, so I don't need anything fancy other than actually getting to it.
Any pointers in the right direction would be greatly appreciated.
Thanks,
--
-Jim Thomason...
Scientific Informatics Developer @ The Ware Lab,
a USDA-ARS Laboratory at Cold Spring Harbor Laboratory
http://www.warelab.org/
http://www.cshl.edu/
_______________________________________________
Dev mailing list Dev at ensembl.org
List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/