[ensembl-dev] How to programmatically get ontology go terms for transcript?

Andy Yates ayates at ebi.ac.uk
Fri Apr 20 14:36:54 BST 2012


Hi James,

Normally we suggest bugs be submitted to helpdesk at ensembl.org as this offers a tracking service so you can be aware of when we make progress on the issue. In the meantime I will consult with those who use the ontology API & see if there is anything we can do to avoid hardcoding SQL and help users like yourself to replicate queries and views of data if required.

Best regards,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 20 Apr 2012, at 05:16, Thomason, James wrote:

> Thanks to all for the help, I got everything isolated, filtered, and functional and have all my data out. When working with Ensembl, this is how I always feel:
> 
> http://www.youmeworks.com/wheretotap.html
> 
> Once I know what I'm doing, it's pretty easy. It's figuring out what to do that's difficult.
> 
> Anyway. On a related topic, is this the proper place to report bugs? Because I seem to have found a few while working on this issue. Going back to this page:
> 
> http://gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
> 
> You'll note the 3rd entry in the first table - GO:0006629 shows up as both a term and a slim term. This took me a while to track down, and the issue is that in the closure table of our ontology database, that item has a record listing itself as its own parent. I don't know if that logically makes sense, so some data cleanup may or may not be required.
> 
> But my issue is with the code - OntologyTermAdaptor's fetch_all_by_descendant_term explicitly filters out relationships of distance 0 (that is, self-referential ones). But the page linked above doesn't use fetch_all_by_descendant_term and instead has a hardwired SQL statement in EnsEMBL::Web::Component::Transcript::Go that doesn't filter out self-references.
> 
> So trying to get the go slims on the website shows items which are their own parent, but trying to get them through the API does not. Regardless of whether or not the data is correct, the behavior is inconsistent and should be standardized on one of them.
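
A minimal sketch of the inconsistency described above, assuming a closure table whose rows are (child, parent, distance). The array is a hypothetical in-memory stand-in for the real table, the helper ancestors_of() is not part of the Ensembl API, and the non-zero distance is illustrative only. It just shows how a "distance >= 1" filter drops the self-referential row while a "distance >= 0" query keeps it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for closure table rows: [child, parent, distance].
# Only the self-referential row is taken from the thread; the distance to
# the root term GO:0008150 is an illustrative value.
my @closure = (
    [ 'GO:0006629', 'GO:0006629', 0 ],    # term listed as its own parent
    [ 'GO:0006629', 'GO:0008150', 3 ],
);

# Return the parent accessions of $child whose distance is at least
# $min_distance. This helper is hypothetical, not an Ensembl method.
sub ancestors_of {
    my ( $rows, $child, $min_distance ) = @_;
    return [
        map  { $_->[1] }
        grep { $_->[0] eq $child && $_->[2] >= $min_distance } @$rows
    ];
}

# Behaviour like the web page's SQL: self-references are kept.
my $with_self = ancestors_of( \@closure, 'GO:0006629', 0 );

# Behaviour like the API call described above: distance 0 is filtered out.
my $without_self = ancestors_of( \@closure, 'GO:0006629', 1 );

print "including distance 0: @$with_self\n";
print "excluding distance 0: @$without_self\n";
```

Whichever behaviour is chosen, applying the same minimum distance in both code paths would make the website and the API agree.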
> 
> This is on Ensembl release 65, fwiw. I haven't reported any ensembl bugs before - is this even the appropriate forum? If not, could someone point me to the right spot? And, if so, is there any other info I could provide?
> 
> Naturally, I'll cross my fingers that this is already fixed in a later release. :-)
> 
> Thanks again for all the help getting my data out.
> 
> On Apr 17, 2012, at 11:01 AM, Jim Thomason wrote:
> 
> I was out of town for a week at a conference, but I'm back on this and hopefully nearly there. What I've got so far -
> 
>    my $trxA = ((code to get transcript adaptor));
>    my $goA = ((code to get go term adaptor));
>    my $transcript = $trxA->fetch_by_stable_id('AT3G52430.1');
>    my $links = $transcript->get_all_DBLinks();
>    foreach my $link (@$links) {
>        next unless (ref $link) =~ /OntologyXref/;
>        my $term = $goA->fetch_by_accession($link->primary_id);
>        # $method holds the name of the term method being tried (not shown here)
>        foreach my $c (@{ $term->$method }) {
>            print "C->", $c->accession, "\n";
>        }
>    }
> 
> This is nearly there! I just seem to have too much data. Again, for this page:
> http://gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
> 
> Looking at the first go term (GO:0001666), there's only one GOSlim accession (GO:0006950)
> 
> But this code above gives me a bunch:
> 
> C->GO:0006950
> C->GO:0070482
> C->GO:0042221
> C->GO:0050896
> C->GO:0050896
> C->GO:0008150
> 
> Grasping at straws and poking at the objects, I discovered the subsets array, which, for example, has a value "goslim_plant". So I tried filtering down on only the go slims that have goslim_plant as their subset. That still gave me this:
> 
> C->GO:0006950
> C->GO:0008150
> 
> Presumably, I should also be tossing out ID GO:0008150, but I don't know why. Is filtering on the subsets correct? Regardless, what additional filtration should I be applying at this point?
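
The subset filter described above can be sketched as follows. This is only an illustration on plain hashes rather than real term objects: the accessions come from the output in this message, the two goslim_plant memberships match the filtered result, and the empty subset lists for the other terms are placeholders. The helper in_subset() is hypothetical. The sketch does not settle whether GO:0008150 should be dropped as well.

```perl
use strict;
use warnings;

# Plain-hash stand-ins for ontology terms. Subset lists other than the
# two goslim_plant entries are placeholders, not real data.
my @terms = (
    { accession => 'GO:0006950', subsets => ['goslim_plant'] },
    { accession => 'GO:0070482', subsets => [] },
    { accession => 'GO:0042221', subsets => [] },
    { accession => 'GO:0050896', subsets => [] },
    { accession => 'GO:0008150', subsets => ['goslim_plant'] },
);

# True if the term lists $subset among its subsets (hypothetical helper).
sub in_subset {
    my ( $term, $subset ) = @_;
    return scalar grep { $_ eq $subset } @{ $term->{subsets} };
}

# Keep only the terms that belong to the goslim_plant subset.
my @slim = grep { in_subset( $_, 'goslim_plant' ) } @terms;

print "C->$_->{accession}\n" for @slim;
```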
> 
> Many thanks again for all the help.
> 
> On Apr 10, 2012, at 10:18 AM, Andy Yates wrote:
> 
> Hi James,
> 
> GO Slim terms are located in a secondary database; Ensembl calls this ensembl_ontology_RELEASE & Ensembl Genomes calls it ensemblgenomes_ontology_RELEASE. We have a lot of examples of how to use it in our core checkout under:
> 
> ensembl/misc-scripts/ontology
> 
> We have a README explaining the design of the schema & API and then examples in the scripts directory.
> 
> As for links the web code has a mechanism which allows association of a URL pattern with an external identifier. I doubt you will be able to script against this so I would suggest a small lookup in your code to do the associations.
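
A small lookup of the kind suggested above might look like this. It is a sketch only: the database names used as keys, the ###ID### placeholder, the example.org URL patterns and the url_for() helper are all assumptions made for illustration, not what the web code actually uses.

```perl
use strict;
use warnings;

# Hypothetical URL patterns keyed by external database name. The
# example.org addresses are placeholders to be replaced with real ones.
my %url_pattern_for = (
    GO         => 'http://example.org/go/###ID###',
    goslim_goa => 'http://example.org/goslim/###ID###',
);

# Build the URL for an accession, or return nothing if the database
# name has no pattern registered (hypothetical helper).
sub url_for {
    my ( $dbname, $accession ) = @_;
    my $pattern = $url_pattern_for{$dbname};
    return unless defined $pattern;
    ( my $url = $pattern ) =~ s/###ID###/$accession/;
    return $url;
}

my $url = url_for( 'GO', 'GO:0001666' );
print defined $url ? "$url\n" : "no pattern registered\n";
```

Keeping the patterns in one hash means a new external database only needs one extra entry.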
> 
> I hope this helps,
> 
> Andy
> 
> 
> On 10 Apr 2012, at 15:27, Thomason, James wrote:
> 
> Okay, good, I am getting somewhere. Hopefully I'm almost there.
> 
> I filtered out my call to get_all_DBLinks() to only look at the OntologyXref objects.
> 
> Comparing against these tables:
> http://dev.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
> 
> it looks like the OntologyXref objects directly give me the accession, term, and evidence codes. So next questions are -
> 
> 1) How do I get those go slim accessions? They don't appear to be  OntologyXref objects, and I'm assuming they're somehow related. Trying the various get_* methods didn't seem to yield the results I wanted.
> 
> 2) How can I get the URLs that the accessions link to? For the non-slim ones, I see that for my purposes I could just hardwire to Gramene, but I'd really rather populate it correctly.
> 
> 3) How do I group the accessions together? Again, that page has multiple tables - descendent of biological process, cellular component, etc.  Is it a matter of looking at each OntologyXref's get_all_masters() values and parsing together a tree from that?
> 
> Many thanks for the help and the patience. :-)
> 
> On Apr 10, 2012, at 2:52 AM, Andy Yates wrote:
> 
> Hi James,
> 
> If you are going to use the API to extract this information then you should look at the Bio::EnsEMBL::OntologyXref which extends DBEntry. The API automatically creates these objects when it encounters an object_xref link which also has an entry in the ontology_xref table. As Jan said get_all_DBLinks() is the method to use and will return these OntologyXref objects.
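
Because OntologyXref extends DBEntry, one way to pick the ontology links out of a mixed list is a class check with isa() rather than a pattern match on the class name. The sketch below uses two tiny mock classes (Mock::DBEntry and Mock::OntologyXref, both hypothetical) so that it runs without an Ensembl installation. With the real API the check would presumably be against Bio::EnsEMBL::OntologyXref on the list returned by get_all_DBLinks().

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Mock base class standing in for Bio::EnsEMBL::DBEntry.
package Mock::DBEntry {
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub primary_id { return $_[0]{primary_id} }
}

# Mock subclass standing in for Bio::EnsEMBL::OntologyXref.
package Mock::OntologyXref {
    our @ISA = ('Mock::DBEntry');
}

# A mixed list such as a transcript's links might contain; the first
# identifier is a made-up placeholder.
my @links = (
    Mock::DBEntry->new( primary_id => 'SOME_OTHER_XREF' ),
    Mock::OntologyXref->new( primary_id => 'GO:0001666' ),
);

# Keep only objects that are, or inherit from, the ontology xref class.
my @ontology_links =
    grep { blessed($_) && $_->isa('Mock::OntologyXref') } @links;

print $_->primary_id, "\n" for @ontology_links;
```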
> 
> All the best,
> 
> Andy
> 
> On 10 Apr 2012, at 00:08, "Thomason, James" <thomason at cshl.edu> wrote:
> 
> Well, that's progress, I guess. But what do I do with a Bio::EnsEMBL::DBEntry object?
> 
> Looking at them, they don't appear to contain any of the data in that ontology table. Do I need to hop to some additional objects? Nothing looks obvious to jump to next.
> 
> On Apr 9, 2012, at 5:15 PM, Jan Vogel wrote:
> 
> 
> Hi James,
> 
> check out the doxygen ensembl api doc:
> 
> http://uswest.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html#afbe0947fe458e2f2739f78852c292f7c
> 
> Bio::EnsEMBL::Transcript::get_all_DBLinks( )  is your friend - for Ensembl annotation on www.ensembl.org, this should return http://www.geneontology.org/GO.slims.shtml#whatIs annotations.
> Another way would be to use biomart.
> 
> I'm unsure if this is set up for the gramene website …
> 
> Hope  this helps,
> 
>    Jan Vogel
> 
> 
> On Apr 9, 2012, at 2:56 PM, Thomason, James wrote:
> 
> Hi all,
> 
> I'm completely stumped. I've been charged with programmatically extracting out ontology go terms from our ensembl installation. A relevant link would be:
> 
> http://www.gramene.org/Arabidopsis_thaliana/Transcript/Ontology/Table?db=core;g=AT3G52430;oid=1;r=3:19431371-19434403;t=AT3G52430.1
> 
> I want to pull out everything inside that "Ontology Table" bit. But I'm utterly stumped as to how to go about doing it. I dug through the code enough to find that the page is generated through an EnsEMBL::Web::Component::Transcript::Go object, but I don't know how to instantiate one on the command line to get at the info. Presumably, since it's in the Web sub-tree, I really shouldn't be doing that on the command line anyway. Is there some way to link to that data through a Bio::EnsEMBL::Transcript object, perhaps? An xref or something?
> 
> For now, I basically just want to dump out that data in a tab delimited format, so I don't need anything fancy other than actually getting to it.
> 
> Any pointers in the right direction would be greatly appreciated.
> 
> Thanks,
> 
> --
> -Jim Thomason...
> 
> Scientific Informatics Developer @ The Ware Lab,
> a USDA-ARS Laboratory at Cold Spring Harbor Laboratory
> http://www.warelab.org/
> http://www.cshl.edu/
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




