[ensembl-dev] 100% identical genes

Podtelezhnikov, Alexei alexei_podtelezhnikov at merck.com
Wed Aug 19 22:27:14 BST 2020


Dear Ensembl team,

Here is a probably incomplete list of pairs of genes with 100% identical sequences in Rnor_6.0.

Cyp2b1 RGD:2466   and   LOC108348266 RGD:11439359
Cyp3a23-3a1 RGD:628626   and   LOC100910877 RGD:6495986
A2m RGD:2004   and   LOC100911545 RGD:6492449
LOC100360087 RGD:2322860   and   Ftl1 RGD:61813
LOC102549542 RGD:7527008   and   Elovl6 RGD:620585

Such genes pose a problem for RNA-Seq aligners, because a read maps equally well to both copies, and this can lead to incorrect quantification. I noticed that Ensembl has removed some, but not all, of the duplicated genes. Do you keep a full list of such genes? Would you consider them genome assembly errors and therefore flag, fix, or remove them?
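
For illustration, a minimal sketch of how a list like the one above could be generated from a FASTA file of gene or transcript sequences. The function names and the toy records (geneA, geneB, geneC) are hypothetical and not part of any Ensembl tool. The comparison is exact and case-insensitive, and it does not check reverse complements.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def group_identical(records):
    """Return lists of record IDs whose sequences are exactly identical."""
    groups = defaultdict(list)
    for record_id, seq in records:
        # Key on the upper-cased sequence so soft-masked copies still match.
        groups[seq.upper()].append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Toy input (made-up sequences, for demonstration only).
toy_fasta = """\
>geneA
ACGTACGT
ACGT
>geneB
acgtacgtacgt
>geneC
ACGTACGTACGA
""".splitlines()

duplicates = group_identical(read_fasta(toy_fasta))
print(duplicates)  # [['geneA', 'geneB']]
```

For a real run, the toy lines would be replaced by an open file handle, for example a cDNA FASTA downloaded for the assembly of interest.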

Thank you,
