[ensembl-dev] getting the entrez gene id from an ensembl record
Ewan Birney
birney at ebi.ac.uk
Thu Dec 2 13:26:21 GMT 2010
On 2 Dec 2010, at 11:47, Andreas Kahari wrote:
> On Thu, Dec 02, 2010 at 10:22:35AM -0000, Oliver, Gavin wrote:
>> Is it possible to have Entrez genes at a couple of levels?
>>
>> i.e. a top-level Entrez ID linked via the gene and the others below
>> this at the transcript or translation stage?
>>
>> Most end users expect a single accurate Entrez gene ID that will
>> tie to a single HGNC ID. They don't want to be aware of the actual
>> complexity of things. It would be good to be able to cater for
>> this somehow.
>>
>
> Just to comment on the last part of this: It is not our job to provide
> a mapping between HGNC and EntrezGene. We provide mappings from Ensembl
> objects to HGNC, and to EntrezGene. That's different.
>
That said, this is all converging into one consistent set of identifiers
because:
 - The HGNC<=>EntrezGene link is direct and complete by design
 - The EntrezGene<=>RefSeq link is very good as they are both in NCBI
 - The Ensembl<=>Uniprot link is very good as they are both in EBI
 - The Ensembl<=>Havana link is very good because Havana is merged into
   the Ensembl build each release cycle
 - The Ensembl/Havana <=> RefSeq <=> Uniprot links are locked down each
   time there is a CCDS, and this has a ratchet-like behaviour (i.e., it
   is intentionally very difficult to withdraw a CCDS)
 - HGNC is now housed at the EBI, so the HGNC<=>Havana and HGNC<=>Uniprot
   links are far better.
In short, it should be a playground of mutually consistent identifiers,
carefully tracking different parts of the legacy diaspora of gene and
protein bioinformatics into a single, smooth, consistent space, accessible
by many different APIs/Systems etc.
So - in this heavenly world, the Ensembl API should be just as sensible
a place to go from HGNC<=>EntrezGene as anywhere else :)
(genuinely... this ideal gets closer and closer each year)
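As a rough illustration of going from HGNC to EntrezGene by way of Ensembl,
here is a minimal Python sketch that joins two per-gene mappings on the
shared Ensembl gene ID. It does not use the Ensembl API: the function name
link_via_ensembl, the dictionary layout and every identifier in it are
made-up placeholders, and in practice the two input mappings would have to
be filled from whatever xref source you trust.

```python
from collections import defaultdict


def link_via_ensembl(ensembl_to_hgnc, ensembl_to_entrez):
    """Join two Ensembl-keyed mappings to get HGNC -> set of EntrezGene IDs.

    Both arguments map an Ensembl gene ID to a set of external IDs.
    (Hypothetical helper for illustration; not part of any Ensembl API.)
    """
    hgnc_to_entrez = defaultdict(set)
    for gene_id, hgnc_ids in ensembl_to_hgnc.items():
        for hgnc_id in hgnc_ids:
            # A gene with no EntrezGene xref contributes an empty set.
            hgnc_to_entrez[hgnc_id].update(ensembl_to_entrez.get(gene_id, ()))
    return dict(hgnc_to_entrez)


# Placeholder identifiers only; none of these are real records.
ensembl_to_hgnc = {
    "ENSG_A": {"HGNC:A"},
    "ENSG_B": {"HGNC:B"},
    "ENSG_C": {"HGNC:C"},
}
ensembl_to_entrez = {
    "ENSG_A": {"entrez_1"},
    "ENSG_B": {"entrez_2", "entrez_3"},  # more than one EntrezGene xref
    # ENSG_C has no EntrezGene xref at all
}

links = link_via_ensembl(ensembl_to_hgnc, ensembl_to_entrez)

# Anything that is not exactly one-to-one is worth a manual look.
needs_review = {h: ids for h, ids in links.items() if len(ids) != 1}
```

Here needs_review collects the HGNC IDs that reach zero or several
EntrezGene IDs through Ensembl, which is where the "single accurate ID"
expectation breaks down.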
This also reminds me that I need to sweet-talk Kim and Jen into publishing
the read-through loci rules, which are one of the fiddly bits of biology
where we (mainly Jen and Kim) agreed earlier this year on how to be
consistent.
Ewan
> Andreas
>
> --
> Andreas Kähäri, Ensembl Software Developer
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus
> Hinxton, Cambridge CB10 1SD, United Kingdom
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev