[ensembl-dev] Shouldn't the same gene ID mean the same gene across different Ensembl GTF releases?

Dejian Zhao dejian.zhao at gmail.com
Fri Feb 20 04:22:25 GMT 2015


Hi,

I assumed that the same Ensembl gene ID refers to the same gene across 
different GTF releases, but I recently found that this does not hold for 
some gene IDs.

I am trying to map some old human gene IDs in release-51 (hg18) to those 
in release-65 (hg19). I tried the ID History Converter both online 
(http://useast.ensembl.org/Homo_sapiens/UserData/UploadStableIDs?db=core) and 
locally 
(https://github.com/Ensembl/ensembl-tools/tree/release/78/scripts/id_history_converter). 
The two gave consistent results, as expected.

However, when I double-checked with a second method based on liftover 
(described below), I got a different mapping. Most results agreed 
between the two methods, but some (~10%) were contradictory. Take 
ENSG00000181404 for example. ID History Converter shows that this ID is 
stable from release 14 to 78, so it maps the gene to itself in 
release-65. The liftover-based method instead maps it to a different 
gene ID (ENSG00000234769) in release-65.

I then took a closer look at both IDs in the release-51 and release-65 
GTF files. For ENSG00000181404, the gene name was "WASH4P" in release-51 
and "WASH1" in release-65. These are two different genes: WASH4P is a 
pseudogene 
(http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=14126) and 
WASH1 is a protein-coding gene 
(http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=24361). 
ENSG00000234769 was absent from release-51 (it first appeared in 
release-55 according to ID History Converter), and its gene name was 
"WASH4P" in release-65. Therefore, based on liftover and gene name, 
ENSG00000181404 in release-51 should map to ENSG00000234769 in 
release-65 rather than to itself. Hence my question: does a gene ID 
refer to the same gene across different GTF releases?

Here is a list of gene IDs and gene names from release-51 (the first 2 
columns) and release-65 (the 3rd and 4th columns). Based on the gene 
names, it seems that the same IDs may mean different genes across releases.
ENSG00000018607 ZNF221  ENSG00000018607 ZNF806
ENSG00000080910 CFHR1   ENSG00000080910 CFHR2
ENSG00000081665 ZNF93   ENSG00000081665 ZNF506
ENSG00000127589 TUBB4Q  ENSG00000127589 TUBBP1
ENSG00000140478 GOLGA6B ENSG00000140478 GOLGA6D
ENSG00000147996 CBWD1   ENSG00000147996 CBWD5
ENSG00000159904 ZNF225  ENSG00000159904 ZNF890P
ENSG00000160229 ZNF486  ENSG00000160229 ZNF66P
ENSG00000170356 OR2A5   ENSG00000170356 OR2A20P
ENSG00000174353 STAG3L1 ENSG00000174353 STAG3L3
ENSG00000181404 WASH4P  ENSG00000181404 WASH1
ENSG00000181997 AQP7P3  ENSG00000181997 AQP7P2
ENSG00000183206 A26B1   ENSG00000183206 POTEC
ENSG00000184324 CSAG3   ENSG00000184324 CSAG2
ENSG00000184923 FAM22D  ENSG00000184923 FAM22A
ENSG00000185829 ARL17P1 ENSG00000185829 ARL17A
ENSG00000187537 A26C2   ENSG00000187537 POTEM
ENSG00000187754 SSX2    ENSG00000187754 SSX7
ENSG00000198566 ZNF658  ENSG00000198566 ZNF658B
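
A list like the one above could be produced with something along these 
lines. This is only a minimal sketch: it assumes both GTF files carry 
gene_id and gene_name in the attribute column, and the function names 
(gene_names, renamed_ids) are made up for illustration. A changed name 
is only a hint, since it can also be an alias of the same gene.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_names(gtf_lines):
    """Return {gene_id: gene_name} from an iterable of GTF lines."""
    names = {}
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            # Keep the first name seen for each gene ID.
            names.setdefault(attrs["gene_id"], attrs["gene_name"])
    return names


def renamed_ids(old_names, new_names):
    """List (gene_id, old_name, new_name) for IDs whose name changed."""
    shared = old_names.keys() & new_names.keys()
    return sorted(
        (gid, old_names[gid], new_names[gid])
        for gid in shared
        if old_names[gid] != new_names[gid]
    )
```

Each GTF would be passed in as an open file handle (or any iterable of 
lines), and the two resulting dictionaries compared with renamed_ids.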

I am aware that different gene symbols or names may refer to the same 
gene because of aliases. For the consistent mapping results, I checked 
and confirmed that this was the case. For the inconsistent ones, 
however, the names do seem to refer to different genes in different 
releases. Shouldn't the same ID mean the same gene? How was the ID 
mapping done across releases?

Here is a brief description of how I did the liftover and ID mapping. 
First, for the old IDs of interest in release-51, I converted the exons 
to BED format, with one column recording the gene ID, gene name, etc. 
Then I lifted these exons to hg19 coordinates using UCSC liftOver 
(https://genome.ucsc.edu/cgi-bin/hgLiftOver) and intersected the lifted 
exons with the exons in release-65 to determine the correspondence 
between old and new exons. Finally, I derived the mapping between old 
and new gene IDs from the exon correspondence.
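
The last two steps can be sketched roughly as follows. This is an 
illustration under assumptions, not the exact procedure I ran: exons 
are taken as (chrom, start, end, gene_id) tuples in 0-based, half-open 
BED coordinates already on hg19, strand is ignored, and each old gene is 
assigned to the new gene with the largest total exonic overlap. The 
function name is hypothetical.

```python
from collections import defaultdict


def map_genes_by_exon_overlap(lifted_exons, new_exons):
    """Map each old gene ID to the new gene ID sharing the most exonic bases.

    Both arguments are iterables of (chrom, start, end, gene_id) tuples
    on the same assembly. Exons shared by several transcripts should be
    de-duplicated beforehand, otherwise their bases are counted twice.
    """
    # Group the new-release exons by chromosome for a simple scan.
    by_chrom = defaultdict(list)
    for chrom, start, end, gene in new_exons:
        by_chrom[chrom].append((start, end, gene))

    # overlap[old_gene][new_gene] = total overlapping bases
    overlap = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, old_gene in lifted_exons:
        for n_start, n_end, new_gene in by_chrom.get(chrom, ()):
            shared = min(end, n_end) - max(start, n_start)
            if shared > 0:
                overlap[old_gene][new_gene] += shared

    # Pick the best-overlapping new gene for each old gene.
    return {old: max(hits, key=hits.get) for old, hits in overlap.items()}
```

Old genes with no overlapping exon in the new release are simply absent 
from the result. The quadratic scan per chromosome is fine for a few 
thousand exons; an interval tree or bedtools would scale better.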

Thanks!
Dejian
