[ensembl-dev] Ugt1a locus in rat

Thibaut Hourlier thibaut at ebi.ac.uk
Tue Nov 2 13:53:19 GMT 2021


Dear Alexei,
Our current data model and pipeline always assign transcripts with overlapping CDS regions to the same gene, and the name is assigned to the gene rather than to individual transcripts. With manual curation it would be possible to separate them and then, hopefully, assign the correct name to the correct transcript. Unfortunately, this is not something we can do at the moment.
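
As a rough illustration of that grouping rule (a toy sketch only, not the actual Ensembl pipeline code; the class, function names and coordinates below are all made up), transcripts that share any coding overlap on the same strand collapse into one gene, so alternative first exons joined to shared trailing exons can never come out as separate genes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript: a name, a strand and its CDS exons."""
    name: str
    strand: int
    cds_exons: tuple  # tuple of (start, end) pairs, inclusive coordinates


def cds_overlap(a, b):
    """True if any CDS exon of a overlaps any CDS exon of b on the same strand."""
    if a.strand != b.strand:
        return False
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a.cds_exons
               for s2, e2 in b.cds_exons)


def cluster_into_genes(transcripts):
    """Group transcripts transitively by CDS overlap (simple union-find)."""
    parent = list(range(len(transcripts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(transcripts)):
        for j in range(i + 1, len(transcripts)):
            if cds_overlap(transcripts[i], transcripts[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, t in enumerate(transcripts):
        groups.setdefault(find(i), []).append(t.name)
    return sorted(sorted(names) for names in groups.values())


# Made-up coordinates: two transcripts with distinct first exons but
# shared trailing exons, plus one unrelated transcript.
shared = ((1000, 1100), (1200, 1300))
example = [
    Transcript("tx_A", 1, ((100, 200),) + shared),
    Transcript("tx_B", 1, ((400, 500),) + shared),
    Transcript("tx_C", 1, ((5000, 5100),)),
]
print(cluster_into_genes(example))
```

Under these assumptions tx_A and tx_B end up in a single gene purely because of the shared trailing exons, and whichever name is chosen for that gene applies to both.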

Our gene name assignment quite often has trouble with gene families. Unfortunately, this means the name changes you describe are a known bug, but we have not yet found the best way to fix it, as a solution for one gene might break the naming of a different gene.

Because human and mouse have been manually curated, and thus multiple genes have been created, I don’t think that using human and mouse more actively would solve the problem of Ugt1a. However, it might be helpful for other gene families.

Kind regards
Thibaut

> On 21 Oct 2021, at 22:19, Podtelezhnikov, Alexei <alexei_podtelezhnikov at merck.com> wrote:
> 
> Dear developers,
>  
> The tox community looks forward to the mRatBN7.2 assembly that should come in Ensembl 105. My benchmark is the Ugt1a locus (ENSRNOG00000018740), which has historically been very problematic for Ensembl. Mouse and human get it right, with multiple genes having distant first exons (and promoters) but sharing the trailing exons. Rat has never been correct, since it uses transcript variants of a single gene. What is even more confusing is that the rat gene name changes seemingly at random: Ugt1a3 in Rapid, Ugt1a6 in 104, Ugt1a3 in 96, Ugt1a5 in 90, and so on.
>  
> In general, Rapid with mRatBN7.2 is much improved, with fewer gene duplications and less gene entanglement in large families. It is just the Ugt1a family that continues to be a problem. I suggest that this family be aligned with NCBI and with mouse and human. I think there should be reasonable alignment between the three mammalian genomes selected for GRC.
>  
> Please consider this improvement for a future release. Thank you for your time and help.
>  
> Alexei
>  
> Alexei A Podtelezhnikov, PhD
> Principal Scientist
> Genome & Biomarker Sciences
> Merck & Co., Inc.
>  