[ensembl-dev] Genes with identical symbols but different ENSG

Mark McDowall mcdowall at ebi.ac.uk
Mon Jun 17 17:21:43 BST 2019


Dear Duarte,

There are 3 subtle issues at play here.

The first is to do with the names. Ensembl assigns gene names primarily from HGNC, but other sources are aggregated to fill in 
missing names. Some of the cases you have counted can be where the display_label taken from another source is the same as an already assigned HGNC name.

The second is that if a gene is also present in a patch then you can get this duplication behaviour.
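As a rough illustration of the patch case, the sketch below groups gene records by symbol after dropping anything that is not on a primary chromosome. The record layout, the PRIMARY set and the patch region name are assumptions made up for the example, not Ensembl API output. It also assumes that patch copies can be recognised by a seq_region name outside 1-22, X, Y and MT.

```python
from collections import defaultdict

# Assumed primary human seq_region names; anything else is treated as a patch copy.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def symbols_with_multiple_ids(records, primary_only=True):
    """Map each symbol to its sorted ENSG IDs, keeping only symbols with more than one ID."""
    by_symbol = defaultdict(set)
    for rec in records:
        # Skip records on non-primary regions when primary_only is set.
        if primary_only and rec["seq_region"] not in PRIMARY:
            continue
        by_symbol[rec["symbol"]].add(rec["ensg"])
    return {sym: sorted(ids) for sym, ids in by_symbol.items() if len(ids) > 1}


# GENE_A and its IDs are hypothetical, as is the patch region name.
# The ATXN7 IDs are the two from the question.
records = [
    {"symbol": "GENE_A", "ensg": "ENSG_HYPOTHETICAL_1", "seq_region": "6"},
    {"symbol": "GENE_A", "ensg": "ENSG_HYPOTHETICAL_2", "seq_region": "CHR_EXAMPLE_PATCH"},
    {"symbol": "ATXN7", "ensg": "ENSG00000163635", "seq_region": "3"},
    {"symbol": "ATXN7", "ensg": "ENSG00000285258", "seq_region": "3"},
]

duplicates = symbols_with_multiple_ids(records)
```

With the patch copy filtered out, only symbols that are genuinely duplicated on the primary assembly remain in duplicates. Passing primary_only=False shows the count before filtering.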

The third is in the case of genes like ATXN7 and DIABLO, where the 2 genes are close or overlapping. This is an example where the 
genome annotation team has made changes to the gene structure. These should get resolved in a later release.
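To spot this third case in your own list, one option is a simple coordinate overlap check on the location strings. The sketch below uses the two ATXN7 locations from your message. The helper names are made up for illustration, and it assumes locations are always written as chromosome:start-end:strand.

```python
def parse_location(loc):
    """Parse 'chrom:start-end:strand' into (chrom, start, end, strand)."""
    chrom, span, strand = loc.split(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end, int(strand)


def shared_length(loc_a, loc_b):
    """Return the number of bases shared by two locations, or 0 if they do not overlap."""
    chrom_a, start_a, end_a, strand_a = parse_location(loc_a)
    chrom_b, start_b, end_b, strand_b = parse_location(loc_b)
    # Genes on different chromosomes or strands are not treated as overlapping here.
    if chrom_a != chrom_b or strand_a != strand_b:
        return 0
    # Coordinates are inclusive, hence the +1.
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


# Locations quoted in the question for the two ATXN7 genes.
atxn7_main = "3:63898399-64003453:1"   # ENSG00000163635
atxn7_other = "3:63864557-64003462:1"  # ENSG00000285258

atxn7_shared = shared_length(atxn7_main, atxn7_other)
atxn7_overlaps = atxn7_shared > 0
```

Pairs that share a symbol and overlap on the same strand are the likely candidates for this kind of annotation change. How you then pick the main entry is up to you.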

Cheers,

Mark

On 17/06/2019 16:42, Duarte Molha wrote:
> Dear Devs
> 
> Back in March I asked this question but got no answer.
> 
> Since this is still relevant to my work I was hoping I could still get some clarification regarding this issue:
> 
> 
> Could you help me understand why the gene
> 
> ATXN7
> 
> has 2 ENSG IDs, but they both map to the same external reference:
> ATAXIN 7; ATXN7 [*607640] (MIM gene record; description: ATAXIN 7; ATXN7,)
> 
> The Gene with the most transcripts associated with it is:
> ATXN7 (Human Gene)
> ENSG00000163635 3:63898399-64003453:1
> 
> But overlapping with it you have
> ATXN7 (Human Gene)
> ENSG00000285258 3:63864557-64003462:1
> 
> Would it not make sense to add all transcripts to the 1st ID and drop the second?
> This unfortunately is not the only gene where this occurs.
> 
> For example, the HGNC symbol DIABLO is associated with 2 ENSG IDs (also overlapping).
> The same is true of CCDC39, IGF2, MATR3, PDE11A, RMRP, SCO2, SPATA13 and TBCE.
> 
> I am sure these are not the only ones where this is true, and it creates a bit of a problem because I now need to either merge 
> distinct entities or choose one of the 2 entries as the main entry for that gene symbol.
> 
> Your help on why this occurs, and any possible solutions for selecting only the main one, would be much appreciated.
> 
> Best regards
> 
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
> 

-- 
Mark McDowall, PhD | Bioinformatician, Ensembl - Applications
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494589
WWW: http://www.ensembl.org



