[ensembl-dev] Gene ID <-> Gene Ontology mapping with REST or BioMart API

mag mr6 at ebi.ac.uk
Wed Nov 19 08:55:04 GMT 2014


Hi Joel,

The REST API only provides information about the latest release; we
currently do not support archives.

You can, however, find some information about retired IDs using the
archive endpoint:
http://rest.ensembl.org/documentation/info/archive_id_get

This will tell you, for a given Ensembl stable id, if the object is 
still current in the latest database, and if not, if there are any 
potential replacements.
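A minimal sketch of calling that endpoint from Python with only the standard library. Two things here are assumptions rather than facts from this thread. The first is the URL pattern `archive/id/<stable_id>`, inferred from the documentation page name `archive_id_get`. The second is the JSON field names `is_current`, `possible_replacement` and `stable_id`, which should be checked against the endpoint documentation before relying on them.

```python
import json
import urllib.request

# The thread uses http://rest.ensembl.org; https is assumed to work as well.
REST_SERVER = "https://rest.ensembl.org"


def archive_url(stable_id):
    """Build the archive lookup URL (path pattern assumed from the docs page name)."""
    return f"{REST_SERVER}/archive/id/{stable_id}?content-type=application/json"


def fetch_archive(stable_id):
    """Query the live REST server; requires network access."""
    request = urllib.request.Request(archive_url(stable_id),
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_archive(record):
    """Turn one archive record into a one-line status.

    Assumed fields: 'id', 'is_current' ("1" when current) and
    'possible_replacement' (a list of objects carrying 'stable_id').
    """
    if record.get("is_current") in (1, "1"):
        return f"{record['id']} is current"
    replacements = [r.get("stable_id") for r in record.get("possible_replacement") or []]
    if replacements:
        return f"{record['id']} is retired; possible replacements: {', '.join(replacements)}"
    return f"{record['id']} is retired; no replacement suggested"
```

Usage, with network access: `print(summarize_archive(fetch_archive("ENSG00000157764")))`. Parsing is kept separate from fetching, so the status logic can be checked offline against saved responses.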

By definition, our stable IDs are stable between releases, so the
annotation for ENSG00000157764 in release 76 should closely match the
one available in the latest release (77):
http://aug2014.archive.ensembl.org/Homo_sapiens/Transcript/Ontology/Table?db=core;g=ENSG00000157764;r=7:140719327-140924764;t=ENST00000288602
http://www.ensembl.org/Homo_sapiens/Transcript/Ontology/Table?db=core;g=ENSG00000157764;r=7:140719327-140924764;t=ENST00000288602


Hope that helps,
Magali

On 18/11/2014 23:13, Joel Fillon, Mr wrote:
> Hi Andy et al,
>
> Regarding the REST API:
>
> Is there a way to specify an archive in the REST endpoint, or does the REST server only work with the latest release?
>
> I've got gene IDs from Ensembl 76 or EnsemblGenomes 23 and I'm trying:
> http://rest.aug2014.archive.ensembl.org/xrefs/id/ENSG00000157764?external_db=GO;all_levels=1
> http://rest.ensembl.org/archive/aug2014/xrefs/id/ENSG00000157764?external_db=GO;all_levels=1
> http://rest.ensembl.org/archive/76/xrefs/id/ENSG00000157764?external_db=GO;all_levels=1
>
> with no success.
>
> Thanks,
> Joël
> ________________________________________
> From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] on behalf of Andy Yates [ayates at ebi.ac.uk]
> Sent: Tuesday 18 November 2014 09:24
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Gene ID <-> Gene Ontology mapping with REST or BioMart API
>
> Hi Joel
>
> You're quite right that you can retrieve the gene -> GO mappings via REST using the /xrefs/id endpoint, e.g.
>
> http://rest.ensembl.org/xrefs/id/ENSG00000157764.json?external_db=GO;all_levels=1
>
> The all_levels parameter is important because GO terms are linked to proteins, not genes. Setting all_levels forces the REST API to descend through the transcripts and proteins belonging to the gene before sending back the results. It also means that you will probably see duplicate GO terms, since multiple proteins linked to the same gene can be annotated with similar functions.
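A minimal Python sketch of the call described above, followed by the de-duplication step that the duplicate GO terms make necessary. The query uses `&` instead of the `;` separators shown in the URL above, which is assumed to be accepted too. It also assumes each returned xref is a JSON object whose `primary_id` holds the GO accession (e.g. `GO:0005524`) and whose `description` holds the term name. Those field names are not confirmed in this thread.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def go_xrefs_url(gene_id):
    """external_db=GO keeps only GO xrefs; all_levels=1 descends to transcripts and proteins."""
    query = urllib.parse.urlencode({"external_db": "GO", "all_levels": 1})
    return f"{REST_SERVER}/xrefs/id/{gene_id}?{query}"


def fetch_go_xrefs(gene_id):
    """Query the live server; requires network access."""
    request = urllib.request.Request(go_xrefs_url(gene_id),
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def unique_go_terms(xrefs):
    """Collapse duplicates: one GO term can be returned once per annotated protein.

    Assumes each xref has 'primary_id' (the GO accession) and 'description'.
    Returns (go_id, description) pairs sorted by GO accession.
    """
    seen = {}
    for xref in xrefs:
        go_id = xref.get("primary_id")
        if go_id and go_id not in seen:
            seen[go_id] = xref.get("description")
    return sorted(seen.items())
```

With network access, `unique_go_terms(fetch_go_xrefs("ENSG00000157764"))` gives one entry per GO term however many proteins carry it. The first description seen is kept, which assumes the term name does not vary between duplicates.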
>
> As for sending 20K+ requests, this is fine as long as you respect the rate limit and the 429 responses the server will send back should you exceed it. If you send 15 requests per second (the maximum rate allowed), you should be able to process 20K in ~20 minutes. I admit this will be slower than using BioMart.
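The arithmetic behind the ~20 minute estimate is 20,000 / 15 ≈ 1,333 s ≈ 22 minutes. Below is a minimal sketch of a client that paces itself and backs off on HTTP 429, using only the standard library. The 15 requests/second figure is the one quoted in this message and may have changed since. The `Retry-After` header (in seconds) is an assumption about what the server sends with a 429, so the code falls back to a 1-second wait when the header is absent.

```python
import time
import urllib.error
import urllib.request

# Rate quoted in this thread (2014); current limits may differ.
MAX_REQUESTS_PER_SECOND = 15


def estimated_minutes(n_requests, rate=MAX_REQUESTS_PER_SECOND):
    """Best-case wall-clock time when sending requests back to back at `rate`."""
    return n_requests / rate / 60


def urllib_fetch(url):
    """Return (status, headers, body) without raising on HTTP error codes."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, dict(response.headers), response.read()
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers), err.read()


def get_with_backoff(url, fetch=urllib_fetch, sleep=time.sleep, max_attempts=5):
    """Fetch `url`, waiting and retrying whenever the server answers 429."""
    for _ in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 429:
            # Crude pacing so consecutive calls stay under the per-second limit.
            sleep(1 / MAX_REQUESTS_PER_SECOND)
            return status, body
        # Assumption: a Retry-After header (seconds) accompanies the 429; default to 1 s.
        lowered = {key.lower(): value for key, value in headers.items()}
        sleep(float(lowered.get("retry-after", 1)))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

`fetch` and `sleep` are injectable so the retry logic can be exercised without touching the server. In real use, call `get_with_backoff(url)` once per gene ID in a plain loop.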
>
> Andy
>
> ------------
> Andrew Yates - Ensembl Support Coordinator
> European Molecular Biology Laboratory
> European Bioinformatics Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge
> CB10 1SD, United Kingdom
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> Skype: andrewyatz
> http://www.ensembl.org/
>
> On 17 Nov 2014, at 21:09, "Joel Fillon, Mr" <joel.fillon at mcgill.ca> wrote:
>
>> Hi Ensembl people,
>>
>> Given a list of gene IDs from one species, I would like to retrieve the associated GO ids programmatically.
>> Species can belong to Ensembl e.g. Mus musculus or EnsemblGenomes e.g. Arabidopsis thaliana.
>>
>> I managed to access them using BioMart through R, although the parameters differ between Ensembl and EnsemblGenomes.
>>
>> Ensembl:
>> host: www.ensembl.org
>> mart: ENSEMBL_MART_ENSEMBL
>> dataset: <short_scientific_name>_gene_ensembl (e.g. mmusculus_gene_ensembl)
>> attributes: ensembl_gene_id, go_id
>>
>> EnsemblGenomes:
>> host: <division>.ensembl.org (e.g. plants.ensembl.org)
>> mart: <division>_mart_<release_number> (e.g. plants_mart_24)
>> dataset: <short_scientific_name>_eg_gene (e.g. athaliana_eg_gene)
>> attributes: ensembl_gene_id, go_accession
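For comparison with the R route, here is a minimal Python sketch that builds a BioMart XML query from the parameters listed above and sends it to the `martservice` endpoint. Several parts are assumptions rather than details from this thread: the `/biomart/martservice?query=` URL, the `ensembl_gene_id` filter name, and `virtualSchemaName="default"`, which may need a division-specific value on EnsemblGenomes marts. Check them against each mart before automating anything.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Parameters as reported in this thread; note go_id vs go_accession.
MART_PARAMS = {
    "ensembl": {"host": "www.ensembl.org", "dataset": "mmusculus_gene_ensembl",
                "attributes": ["ensembl_gene_id", "go_id"]},
    "ensemblgenomes_plants": {"host": "plants.ensembl.org", "dataset": "athaliana_eg_gene",
                              "attributes": ["ensembl_gene_id", "go_accession"]},
}


def biomart_query_xml(dataset, attributes, gene_ids, virtual_schema="default"):
    """Build a BioMart XML query restricted to the given gene IDs (TSV output)."""
    query = ET.Element("Query", virtualSchemaName=virtual_schema, formatter="TSV",
                       header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Assumed filter name; multiple values are comma-separated.
    ET.SubElement(ds, "Filter", name="ensembl_gene_id", value=",".join(gene_ids))
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def biomart_url(host, query_xml):
    """GET URL for the martservice endpoint (path assumed)."""
    return f"https://{host}/biomart/martservice?" + urllib.parse.urlencode({"query": query_xml})
```

Keeping the per-division settings in one table puts the `go_id`/`go_accession` difference in a single place, which helps if those names change between releases.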
>>
>>
>> 1. Are those parameters consistent across species within Ensembl and EnsemblGenomes? E.g.
>> if I want IDs for Bos taurus, will the dataset be btaurus_gene_ensembl, with the attributes ensembl_gene_id and go_id
>> available?
>>
>> Or are they likely to be modified with the DB schema in the future and I shouldn't rely on them for a systematic automated solution?
>>
>> 2. Is a go_id in Ensembl equivalent to a go_accession in EnsemblGenomes?
>>
>> 3. Is there a better way to do this using REST API or other?
>> From http://rest.ensembl.org/, I can't find an endpoint
>> linking a gene ID to its related GO terms.
>>
>> GET xrefs/symbol/:species/:symbol and GET xrefs/name/:species/:name seem to link to external gene databases only.
>> Also, for a list of 20000+ gene IDs, I guess I would need to use POST requests with smaller chunks.
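Splitting the ID list into fixed-size chunks is the same whatever the target, whether a batch endpoint or a BioMart filter value. The thread does not establish that a POST endpoint exists for this GO lookup, and the chunk size of 1000 below is purely illustrative.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs, preserving order."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

For example, 20,000 IDs split with `size=1000` give 20 batches; the last batch is shorter whenever the total is not a multiple of the size.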
>>
>> 4. On a different note, sorry if this was answered before. I can't find a "Search" function on this mailing list:
>> Am I missing it on:
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>> or is search only available through Google or other with "site:http://lists.ensembl.org ..."?
>>
>> Thanks a lot for your help!
>>
>> Joël
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/