[ensembl-dev] how to annotate many Ensembl gene IDs with gene names

Kieron Taylor ktaylor at ebi.ac.uk
Mon Jan 22 16:49:38 GMT 2018


Hi Mohammad,

Your request is quite non-specific, and that makes it difficult for me to know how to help.

If you know when your data was created, you can pick an archive and use the BioMart data from that release to fetch the gene names, just as you initially intended.

http://www.ensembl.org/info/website/archives/index.html

I would suggest somewhere around release 81, given the sample IDs you provided. You would then get the gene names we assigned to those loci at that point in time. The fact that the IDs are not present in our latest data suggests that the underlying sequence has since been revised, so you should expect some of those gene names to have changed.
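
As a sketch of doing this without the web form, the following Python posts a BioMart XML query to an archive's martservice and maps each gene ID to its gene name. It assumes that release 81 is served at jul2015.archive.ensembl.org (if that archive is still online), that the human dataset is hsapiens_gene_ensembl, and that the gene-name attribute is called external_gene_name in that release. Attribute names have changed between releases, so check them in the archive's own martview before relying on this.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

# Assumed host for Ensembl release 81 (July 2015); swap in the archive you need.
ARCHIVE_MART = "http://jul2015.archive.ensembl.org/biomart/martservice"


def build_query(gene_ids, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query that maps Ensembl gene IDs to gene names."""
    # Filter values are comma-separated; escape them for use inside an XML attribute.
    id_list = escape(",".join(gene_ids), {'"': "&quot;"})
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        f'<Dataset name="{dataset}" interface="default">'
        f'<Filter name="ensembl_gene_id" value="{id_list}"/>'
        '<Attribute name="ensembl_gene_id"/>'
        # Assumed attribute name for the gene name in this release.
        '<Attribute name="external_gene_name"/>'
        "</Dataset>"
        "</Query>"
    )


def parse_tsv(text):
    """Turn header-less two-column TSV output into a {gene_id: gene_name} dict."""
    if text.startswith("Query ERROR"):
        raise RuntimeError(text.strip())
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # A gene without a name may come back with an empty second column.
        mapping[fields[0]] = fields[1] if len(fields) > 1 else ""
    return mapping


def fetch_names(gene_ids, mart_url=ARCHIVE_MART):
    """POST the query to a martservice endpoint and return {gene_id: gene_name}."""
    data = urllib.parse.urlencode({"query": build_query(gene_ids)}).encode("utf-8")
    with urllib.request.urlopen(mart_url, data=data, timeout=300) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

With your IDs in a Python list, fetch_names(ids) should return a dictionary from ID to name; any ID missing from the result was not found in that release either.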

The scientific purpose of your work will then help you decide whether that output is useful or not. You may need to find which Ensembl IDs in our current release are closest to the retired Ensembl IDs you are working with. For this you can paste your list of IDs into the ID history tool (http://www.ensembl.org/Homo_sapiens/Tools/IDMapper?db=core) or send them one by one to our REST API (rest.ensembl.org). Our Perl API can also achieve the same result if Perl suits you (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1ArchiveStableIdAdaptor.html).
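
For a single ID, the REST archive endpoint can be queried from Python with only the standard library. This is a minimal sketch: the JSON field names it reads (is_current, latest, release, and possible_replacement with its stable_id entries) are assumptions about the endpoint's response and should be checked against a live reply such as http://rest.ensembl.org/archive/id/ENSG00000122718?content-type=application/json.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def is_current(record):
    """Treat the flag as true whether the server sends 1, "1" or true."""
    return record.get("is_current") in (1, "1", True)


def summarise(record):
    """Keep only the fields needed to decide what to do with an ID (assumed names)."""
    replacements = [r.get("stable_id") for r in record.get("possible_replacement") or []]
    return {
        "id": record.get("id"),
        "current": is_current(record),
        "latest": record.get("latest"),
        "release": record.get("release"),
        "replacements": replacements,
    }


def archive_lookup(stable_id):
    """Fetch one archive record over REST and summarise it."""
    url = f"{REST_SERVER}/archive/id/{stable_id}?content-type=application/json"
    with urllib.request.urlopen(url, timeout=60) as resp:
        return summarise(json.load(resp))
```

Calling archive_lookup("ENSG00000122718") returns a small dictionary; when the ID is retired, a non-empty replacements list is a reasonable starting point for finding the closest current ID.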

In my opinion, the most straightforward programming approach is to write a script that reads your list and sends requests to our REST API archive endpoint. The training materials from our REST API course can help you get started, although the archive endpoint is not covered explicitly there:

http://training.ensembl.org/events/2017/2017-11-27-REST_API_EBI_Nov
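
A minimal script along those lines might look like the following. It reads one ID per line from a file named on the command line, queries the archive endpoint for each ID in turn, and writes a tab-separated summary to standard output. The pause between requests is an assumed value chosen to stay well inside the REST API's rate limit, and the JSON field names are assumptions to check against a live response.

```python
import csv
import json
import sys
import time
import urllib.error
import urllib.request

REST_SERVER = "https://rest.ensembl.org"
PAUSE_SECONDS = 0.1  # assumed pause; keeps the script to at most ~10 requests per second


def read_ids(lines):
    """Strip whitespace, drop blank lines and duplicates, keep input order."""
    return list(dict.fromkeys(line.strip() for line in lines if line.strip()))


def fetch_archive(stable_id):
    """Return the archive record for one ID, or an error marker if the request fails."""
    url = f"{REST_SERVER}/archive/id/{stable_id}?content-type=application/json"
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumed: unknown IDs are answered with an HTTP error; record it and carry on.
        return {"error": f"HTTP {err.code}"}


def to_row(stable_id, record):
    """Flatten a record into the columns written by main()."""
    current = record.get("is_current") in (1, "1", True)
    return [
        stable_id,
        record.get("latest", ""),
        record.get("release", ""),
        "yes" if current else "no",
        record.get("error", ""),
    ]


def main(path):
    with open(path) as handle:
        ids = read_ids(handle)
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "latest", "last_release", "is_current", "error"])
    for stable_id in ids:
        writer.writerow(to_row(stable_id, fetch_archive(stable_id)))
        time.sleep(PAUSE_SECONDS)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a name of your choosing (archive_ids.py is used here only as an example), it would run as `python archive_ids.py ids.txt > archive.tsv`. The REST documentation also lists a POST form of the archive endpoint that accepts several IDs per request, which may be worth checking for a list of 3000.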

If you wish to understand why we retire stable IDs, you can consult our documentation on the topic: http://www.ensembl.org/info/genome/stable_ids/index.html

I hope that is sufficient to get you going. 

Regards,

Kieron


Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute


> On 22 Jan 2018, at 13:55, Mohammad Goodarzi <mohammad.godarzi at gmail.com> wrote:
> 
> Hello,
> 
> Thank you for your reply.
> Could you guide me on how to use one of your archives with BioMart or any programming language?
> With 3000 genes, it is very difficult to do them one by one.
> 
> Thanks 
> Mohammad 
> 
> On Mon, 22 Jan 2018 at 04:21, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
> Hi Mohammad.
> 
> It looks like most of your IDs are now retired. If we take your first example:
> 
> http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000122718
> 
> Our Gene page for this ID reports that it was retired in release 84. A revision of the sequence there, or validation of our genebuild, has caused us to retire the ID because it no longer means what we thought it did.
> 
> The BioMart service has no way to report data that is not current. Depending on your needs, you could use one of our archive servers to get the data surrounding your IDs, for example: [1]
> 
> The easiest thing might be to feed your list of failed IDs into our ID history tool [2]. This will tell you the last Ensembl release in which that ID was seen, and if possible report the ID that replaced it.
> 
> Another alternative is to cross-check your IDs against our REST archive endpoint to see which ones have been retired: [3]
> 
> Hopefully one of these methods will suit your needs.
> 
> 
> Regards,
> 
> Kieron
> 
> [1] - http://mar2016.archive.ensembl.org/biomart/martview/3459e207de70960baa9be743908900d2
> [2] - http://www.ensembl.org/Homo_sapiens/Tools/IDMapper?db=core
> [3] - http://rest.ensembl.org/archive/id/ENSG00000122718?content-type=application/json
> 
> 
> 
> Kieron Taylor PhD.
> Ensembl Developer
> 
> EMBL, European Bioinformatics Institute
> 
> > On 21 Jan 2018, at 20:53, Mohammad Goodarzi <mohammad.godarzi at gmail.com> wrote:
> >
> > hello,
> >
> > I recently tried to annotate a set of genes with gene names. I have over 3000 genes that I cannot annotate using BioMart, for example.
> >
> > I have been searching a lot but I could not find a solution. Can you please comment on how you would do this? I have posted a small set of them for practice purposes, and I would be happy to get any advice that helps me do this automatically.
> >
> > Please see below
> >
> > Thanks
> >
> > ENSG00000122718
> > ENSG00000130201
> > ENSG00000150076
> > ENSG00000150526
> > ENSG00000155640
> > ENSG00000166748
> > ENSG00000168260
> > ENSG00000168787
> > ENSG00000170590
> > ENSG00000170803
> > ENSG00000171484
> > ENSG00000172381
> > ENSG00000172774
> >
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> 



