[ensembl-dev] Ensembl IDs versus HGNC

mag mr6 at ebi.ac.uk
Thu Feb 20 18:14:01 GMT 2014


Hi Genomeo,

On 20/02/2014 16:48, Genomeo Dev wrote:
> Thanks Magali for the response.
>
> 1) I don't quite understand these two points:
>
> '..As each entry is manually curated, not all ensembl mappings are 
> necessarily available..'
>
> Is this due to the non-practicality of manually working with large 
> ensembl IDs?
Typically, an entry in HGNC is created once and updated when new 
information becomes available.
If no Ensembl locus was available at the time of creation, the entry 
might lack the link we are looking for.
Given the rate of Ensembl releases, HGNC would have to go through all 
entries at every single release, which is where a manual approach is 
more limited than an automatic one.
>
> '.. What can also happen is that our ensembl stable ID changes between 
> releases due to massive changes in the underlying sequence..We then 
> feed those cases back to HGNC for them to update their records if they 
> agree with the replacement.'
>
> But if the sequence changes, it will change for both Ensembl and HGNC 
> because they both refer to the same reference genome, right? So why 
> does HGNC need to agree?
Sometimes, the sequence change is simply the addition of a large 3' UTR, 
so it seems quite obvious that it is still the same locus.
In other cases, a gene was split into two, so there could be an argument 
as to which of the new genes should keep the locus name.

>
> 2) Based on the rest of your email, let me reiterate back my 
> understanding and let me know if that is correct:
>
> - Ensembl doesn't assign any of its IDs to HGNC IDs on its own 
> judgement, rather it uses the manual assignments of HGNC DB and 
> assignments done by Uniprot and RefSeq.
Correct

>
> - Because A) Ensembl updates these assignments each release and not 
> more frequently, B) HGNC keeps making more manual assignments 
> continuously, and C) Ensembl IDs can change between releases, all 
> these factors lead to differences between what is in Ensembl and HGNC 
> in terms of mapping.
Correct as well.
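
To see how large those differences are for a given list, the two exports can be compared directly. This is only a minimal sketch: the dictionaries stand in for mappings exported from each source, the IDs are placeholders rather than real accessions, and compare_mappings is a hypothetical helper, not part of any Ensembl or HGNC tool.

```python
def compare_mappings(ensembl_map, hgnc_map):
    """Compare two Ensembl ID -> HGNC ID mappings.

    Returns three sorted lists of Ensembl IDs: those mapped only in the
    first source, those mapped only in the second, and those mapped in
    both but to different HGNC IDs.
    """
    shared = set(ensembl_map) & set(hgnc_map)
    only_first = sorted(set(ensembl_map) - set(hgnc_map))
    only_second = sorted(set(hgnc_map) - set(ensembl_map))
    conflicting = sorted(g for g in shared if ensembl_map[g] != hgnc_map[g])
    return only_first, only_second, conflicting


# Placeholder data: what each source reports for the same set of Ensembl IDs.
ensembl_map = {"ENSG1": "HGNC1", "ENSG2": "HGNC2", "ENSG3": "HGNC3"}
hgnc_map = {"ENSG2": "HGNC2", "ENSG3": "HGNC9", "ENSG4": "HGNC4"}

only_in_ensembl, only_in_hgnc, conflicting = compare_mappings(ensembl_map, hgnc_map)
print("Only in Ensembl:", only_in_ensembl)
print("Only in HGNC:", only_in_hgnc)
print("Different HGNC ID:", conflicting)
```

The third list is the one worth checking by hand, as it may point to stable IDs that changed between releases.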

>
> If this understanding is correct, I don't see how this will lead to 
> Ensembl having greater coverage than HGNC.
Let's say we have an ensembl gene ENSG1 and an HGNC entry HGNC1.
HGNC1 has a mapping to Uniprot Uniprot1, but none to an Ensembl entry.
We are able to align Uniprot1 against ENSG1 and by proxy assign HGNC1 to 
that locus.
Looking for ENSG1 in HGNC will not return anything.
Looking for HGNC1 in Ensembl will return ENSG1.
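
The same example can be written as a toy lookup. This is a rough sketch of the idea only, not the actual Ensembl xref pipeline: the dictionaries and the ensembl_view function are made up for illustration, and the IDs are the placeholders used above.

```python
# HGNC's own curated records for the example entry.
hgnc_to_ensembl = {}                         # HGNC1 has no Ensembl link
hgnc_to_uniprot = {"HGNC1": "Uniprot1"}      # but it does have a Uniprot link

# Result of aligning Uniprot entries against Ensembl genes.
uniprot_to_ensembl = {"Uniprot1": "ENSG1"}


def ensembl_view(direct, via_uniprot, uniprot_hits):
    """Return HGNC -> Ensembl links: direct links plus indirect ones.

    A direct link from HGNC always wins; an indirect link through
    Uniprot is only added when no direct link exists.
    """
    links = dict(direct)
    for hgnc_id, uniprot_id in via_uniprot.items():
        ensembl_id = uniprot_hits.get(uniprot_id)
        if ensembl_id is not None:
            links.setdefault(hgnc_id, ensembl_id)
    return links


links_in_hgnc = hgnc_to_ensembl
links_in_ensembl = ensembl_view(hgnc_to_ensembl, hgnc_to_uniprot, uniprot_to_ensembl)

# Looking for ENSG1 in HGNC finds nothing.
print("ENSG1" in links_in_hgnc.values())
# Looking for HGNC1 in Ensembl finds ENSG1 through the Uniprot proxy.
print(links_in_ensembl.get("HGNC1"))
```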

>
> G.
>
>
>
>
>
> On 20 February 2014 16:18, mag <mr6 at ebi.ac.uk> wrote:
>
>     Hi Genomeo,
>
>     HGNC data is manually curated, so HGNC curators check a locus and
>     assign the corresponding ensembl entry.
>     As each entry is manually curated, not all ensembl mappings are
>     necessarily available.
>     It does mean, though, that HGNC can be updated continuously.
>
>     In Ensembl, we typically update those mappings every release, as
>     the human gene set is updated every release.
>     We assign HGNC IDs using direct mappings from HGNC.
>     These are complemented by indirect mappings, via Uniprot or RefSeq.
>     If a Uniprot entry is mapped to an ensembl entry and that same
>     Uniprot entry is mapped in HGNC to an HGNC symbol, the HGNC symbol
>     is assigned to the ensembl entry.
>
>     So there are more HGNC-ensembl ID links in Ensembl than there are
>     in HGNC.
>
>     What can also happen is that our ensembl stable ID changes between
>     releases due to massive changes in the underlying sequence.
>     For those cases, we will not be able to get the direct mapping
>     from HGNC.
>     We might still be able to keep the same name for the gene thanks
>     to the two-step mappings via RefSeq or Uniprot.
>     We then feed those cases back to HGNC for them to update their
>     records if they agree with the replacement.
>
>     I am unsure how NCBI assigns mappings to Ensembl; they could be
>     importing the mappings from us directly or generating their own
>     mappings.
>
>     I hope this answers most of your questions.
>
>
>     Regards,
>     Magali
>
>
>     On 20/02/2014 11:43, Genomeo Dev wrote:
>>     Hi,
>>
>>     I have a set of ~ 6000 Ensembl IDs which I want to map to HGNC
>>     IDs. I am faced with the following situation:
>>
>>     Based on Ensembl Biomart or Ensembl Rest, there are ~ 4000 of
>>     these that have HGNC IDs.
>>
>>     Based on HGNC biomart, there are ~ 3000 which have HGNC IDs. HGNC
>>     DB mentions that they themselves use mappings supplied by Ensembl.
>>
>>     The IDs mapped from each of these sources are not always the same.
>>
>>     Questions:
>>
>>     - What is causing the different level of coverage?
>>     - What is causing the differences in specific mappings if all of
>>     it is done by Ensembl?
>>     - How often does this mapping change at any of these sources?
>>     - How do other sources like NCBI assign Ensembl IDs to their
>>     Entrez IDs?
>>     - What is the best way of getting HGNC IDs for Ensembl IDs? from
>>     Ensembl or HGNC DB?
>>
>>     Thanks!
>>
>>     -- 
>>     G.
>>
>>
>>     _______________________________________________
>>     Dev mailing list Dev at ensembl.org
>>     Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/
>
>
>
>
>
>
> -- 
> G.
>
>
