[ensembl-dev] Gene ID <-> Gene Ontology mapping with REST or BioMart API

Joel Fillon, Mr joel.fillon at mcgill.ca
Tue Nov 18 15:33:29 GMT 2014


Dear Thomas and Andy,

Thanks a lot for your quick answers, much appreciated!

I will try both the BioMart and REST solutions and see which one suits my needs better.

Another question:
In BioMart, are goslim_goa_accession and goslim_goa_description available for EnsemblGenomes species,
or only for Ensembl species?
I can't find them in the athaliana_eg_gene dataset.

Thanks again
Joël
________________________________________
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] on behalf of Thomas Maurel [maurel at ebi.ac.uk]
Sent: Tuesday, 18 November 2014 08:05
To: Ensembl developers list
Subject: Re: [ensembl-dev] Gene ID <-> Gene Ontology mapping with REST or BioMart API

Dear Joël,

Please find below answers to your first two questions:
On 17 Nov 2014, at 21:09, Joel Fillon, Mr <joel.fillon at mcgill.ca> wrote:

Hi Ensembl people,

Given a list of gene IDs from one species, I would like to retrieve the associated GO ids programmatically.
Species can belong to Ensembl e.g. Mus musculus or EnsemblGenomes e.g. Arabidopsis thaliana.

I managed to access them using BioMart through R although the parameters differ between Ensembl and EnsemblGenomes.

Ensembl:
host: www.ensembl.org
mart: ENSEMBL_MART_ENSEMBL
dataset: <short_scientific_name>_gene_ensembl (e.g. mmusculus_gene_ensembl)
attributes: ensembl_gene_id, go_id

EnsemblGenomes:
host: <division>.ensembl.org (e.g. plants.ensembl.org)
mart: <division>_mart_<release_number> (e.g. plants_mart_24)
dataset: <short_scientific_name>_eg_gene (e.g. athaliana_eg_gene)
attributes: ensembl_gene_id, go_accession


1. Are those parameters consistent across species within Ensembl and EnsemblGenomes? E.g.
if I want IDs for Bos taurus, will the dataset be btaurus_gene_ensembl, and will the attributes
ensembl_gene_id and go_id be available?
You are right, the parameters that you have mentioned are consistent across species within Ensembl and EnsemblGenomes.
Your example on Bos taurus will work.

Or are they likely to be modified with the DB schema in the future and I shouldn't rely on them for a systematic automated solution?
We avoid changing the internal names of mart attributes, but if we do, we will announce the changes on the following page: http://www.ensembl.org/info/website/news.html
The mart internal names are more stable than the DB schema so you can rely on them for a systematic automated solution.

2. Is a go_id in Ensembl equivalent to a go_accession in EnsemblGenomes?
Yes, go_id in Ensembl corresponds to go_accession in EnsemblGenomes.
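
The parameter patterns confirmed above can be written down as a small helper. This is a minimal Python sketch, not an official client: the function names short_name and biomart_params are hypothetical, and the rule "first letter of the genus plus the species name" is only inferred from the examples in this thread (mmusculus, athaliana, btaurus), so it may not hold for every species. The function only builds the parameter values and performs no network access.

```python
def short_name(scientific_name):
    """Build the short name used in dataset names, e.g. 'Bos taurus' -> 'btaurus'.

    Assumption: first letter of the genus followed by the species name,
    as inferred from the examples given in this thread.
    """
    genus, species = scientific_name.lower().split()[:2]
    return genus[0] + species


def biomart_params(scientific_name, division=None, release=None):
    """Return host, mart, dataset and attributes for a species.

    division=None means Ensembl; otherwise an EnsemblGenomes division
    such as 'plants', in which case a release number is also needed.
    """
    name = short_name(scientific_name)
    if division is None:
        return {
            "host": "www.ensembl.org",
            "mart": "ENSEMBL_MART_ENSEMBL",
            "dataset": f"{name}_gene_ensembl",
            "attributes": ["ensembl_gene_id", "go_id"],
        }
    if release is None:
        raise ValueError("EnsemblGenomes marts need a release number")
    return {
        "host": f"{division}.ensembl.org",
        "mart": f"{division}_mart_{release}",
        "dataset": f"{name}_eg_gene",
        # go_accession is the EnsemblGenomes counterpart of go_id
        "attributes": ["ensembl_gene_id", "go_accession"],
    }


cow = biomart_params("Bos taurus")
cress = biomart_params("Arabidopsis thaliana", division="plants", release=24)
print(cow["dataset"], cow["attributes"])
print(cress["mart"], cress["dataset"], cress["attributes"])
```

The returned values could then be handed to whichever BioMart client is in use.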

3. Is there a better way to do this using REST API or other?
From http://rest.ensembl.org/, I can't find an endpoint
linking a gene ID to related Gene Ontologies

GET xrefs/symbol/:species/:symbol and GET xrefs/name/:species/:name seem to link to external gene databases only.
Also, for a list of 20000+ gene IDs, I guess I would need to split the list into smaller chunks and use POST requests.
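
The chunking idea can be sketched in a few lines of Python. The helper name chunked and the batch size of 1000 are assumptions made for illustration; the actual per-request limit should be checked against the REST documentation. The sketch only splits the list and sends no request.

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Placeholder identifiers standing in for a real gene ID list.
gene_ids = [f"GENE{i:05d}" for i in range(20001)]

# Assumed batch size; replace with the limit documented for the endpoint used.
batches = list(chunked(gene_ids, 1000))
print(len(batches), len(batches[0]), len(batches[-1]))
```

Each batch would then form the body of one POST request.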

4. On a different note, sorry if this was answered before. I can't find a "Search" function on this mailing list:
Am I missing it on:
http://lists.ensembl.org/mailman/listinfo/dev

or is search only available through Google or other with "site:http://lists.ensembl.org ..."?

Thanks a lot for your help!

Joël

Hope this helps,
Regards,
Thomas
--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom




