[ensembl-dev] getting the entrez gene id from an ensembl record

Ewan Birney birney at ebi.ac.uk
Thu Dec 2 13:26:21 GMT 2010


On 2 Dec 2010, at 11:47, Andreas Kahari wrote:

> On Thu, Dec 02, 2010 at 10:22:35AM -0000, Oliver, Gavin wrote:
>> Is it possible to have Entrez genes at a couple of levels?
>>
>> i.e. a top-level Entrez ID linked via the gene and the others below  
>> this at the transcript or translation stage?
>>
>> Most end users expect a single accurate Entrez gene ID that will  
>> tie to a single HGNC ID.  They don't want to be aware of the actual  
>> complexity of things.  It would be good to be able to cater for  
>> this somehow.
>>
>
> Just to comment on the last part of this:  It is not our job to provide
> a mapping between HGNC and EntrezGene.  We provide mappings from Ensembl
> objects to HGNC, and to EntrezGene.  That's different.
>

That said, this is all converging into one consistent set of identifiers
because:

   The HGNC<=>EntrezGene link is direct and complete by design

   The EntrezGene<=>RefSeq link is very good as they are both in NCBI

   The Ensembl<=>Uniprot link is very good as they are both in EBI

   The Ensembl<=>Havana link is very good because Havana is merged into
the Ensembl build each release cycle

   The Ensembl/Havana <=> RefSeq <=> Uniprot links are locked down each
time there is a CCDS, and this has a ratchet-like behaviour (i.e., it is
intentionally very difficult to withdraw a CCDS)

   HGNC is now housed at the EBI, so the HGNC<=>Havana and HGNC<=>Uniprot
links are far better.


In short, it should be a playground of mutually consistent identifiers,
carefully bringing the different parts of the legacy diaspora of gene and
protein bioinformatics into a single, smooth, consistent space, accessible
by many different APIs/Systems etc.

So - in this heavenly world, the Ensembl API should be just as sensible
a place to go from HGNC<=>EntrezGene as anywhere else :)



(genuinely... this ideal gets closer and closer each year)
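
As a rough illustration of going HGNC<=>EntrezGene by way of Ensembl
objects, here is a minimal Python sketch. It assumes the two sets of
Ensembl mappings (gene to HGNC, gene to EntrezGene) have already been
pulled out into plain dictionaries. The identifiers, the dictionary names
and the function join_via_ensembl are all hypothetical placeholders, not
real records and not part of the Ensembl API.

```python
from collections import defaultdict

# Toy xref tables keyed by Ensembl gene ID.
# All identifiers below are made-up placeholders, not real records.
ensembl_to_hgnc = {
    "ENSG_A": {"HGNC:1"},
    "ENSG_B": {"HGNC:2"},
    "ENSG_C": set(),          # gene with no HGNC xref
}
ensembl_to_entrez = {
    "ENSG_A": {"100"},
    "ENSG_B": {"200", "201"},  # gene with two EntrezGene xrefs
    "ENSG_C": {"300"},
}


def join_via_ensembl(to_hgnc, to_entrez):
    """Derive HGNC -> EntrezGene by joining both mappings on the Ensembl gene ID."""
    hgnc_to_entrez = defaultdict(set)
    for gene_id, hgnc_ids in to_hgnc.items():
        for hgnc_id in hgnc_ids:
            hgnc_to_entrez[hgnc_id] |= to_entrez.get(gene_id, set())
    return dict(hgnc_to_entrez)


derived = join_via_ensembl(ensembl_to_hgnc, ensembl_to_entrez)

# HGNC IDs that do not resolve to exactly one EntrezGene ID need a closer look.
ambiguous = {h: sorted(e) for h, e in derived.items() if len(e) != 1}
```

The derived link is only as good as the two Ensembl mappings it is joined
from, so the ambiguous cases are the ones worth checking by hand.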


This also reminds me that I need to sweet-talk Kim and Jen into publishing
the read-through loci rules, one of the fiddly bits of biology on which we
(mainly Jen and Kim) agreed a consistent approach earlier this year.



Ewan


> Andreas
>
> -- 
> Andreas Kähäri, Ensembl Software Developer
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus
> Hinxton, Cambridge CB10 1SD, United Kingdom
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev




