[ensembl-dev] Probably duplicated human gene in latest release

Carlos carlos at ebi.ac.uk
Fri Apr 6 10:08:03 BST 2018


Hi Wolf,

Just to confirm, this is now fixed in Ensembl release 92 (GENCODE 28),
which was released yesterday.

Thank you for reporting that.

Cheers,
Carlos

On 01/12/17 16:22, Fergal wrote:
> Hi Wolf,
>
> Yes, we should be able to apply the code change to
> stop ENST00000614349.4 being added to the readthrough for e92.
>
> Fergal.
>
> On 1 Dec 2017, at 15:56, Wolf Beat <Beat.Wolf at hefr.ch> wrote:
>
>> Thank you very much for the detailed answer. Will this be fixed for a
>> future release? (specifically the ENST00000614349.4 transcript).
>>
>>
>> In the meantime I will go with the idea of filtering out readthrough
>> genes (although I will have to read up a little more on what exactly
>> it means, even if your explanation is already quite good). I will
>> check whether I can do this through the BioMart interface.
>>
>>
>> Kind regards
>>
>>
>> Beat Wolf
>>
>> ________________________________
>> From: Dev <dev-bounces at ensembl.org>
>> on behalf of Fergal <fergal at ebi.ac.uk>
>> Sent: Friday, December 1, 2017 4:52:56 PM
>> To: Ensembl developers list
>> Subject: Re: [ensembl-dev] Probably duplicated human gene in latest
>> release
>>
>> Hi Wolf,
>>
>> This is a rather complicated scenario. ENSG00000255292 is a
>> readthrough gene. Readthrough genes are manually annotated by the
>> Havana team and are made when there is some biological evidence of
>> transcription of a single molecule that spans two distinct loci (in
>> this case ENSG00000204370 and ENSG00000197580). This information can
>> be seen on the gene summary via the annotation attribute “overlapping
>> locus”, though admittedly this is not very obvious.
>>
>> While there is experimental evidence for this occurring, it is
>> unclear if such events have any true biological meaning. Often the
>> assigned biotype in these instances is nonsense-mediated decay, to
>> signify that the product is not viable.
>>
>> As ENSG00000204370 and ENSG00000197580 are two distinct genes and
>> ENSG00000255292 represents a readthrough event between them, the
>> records should not be merged. However, as you’ve noted, this scenario
>> does then pose challenges for mapping pipelines when it comes to
>> naming and cross-referencing.
>>
>> One thing that does appear to have gone wrong is the inclusion of
>> ENST00000614349.4 in the readthrough gene. This was added to the
>> gene by our merge code, which decides whether to merge automatically
>> annotated transcripts into manually curated genes based on exon
>> overlap. ENST00000614349.4 had the most exon overlap with one of the
>> transcripts in the readthrough gene and thus was merged in. We are
>> going to add a rule to avoid merging protein-coding transcripts into
>> readthrough genes, to hopefully solve the issue in future releases.
>>
>> A workaround (depending on what you’re doing) is to just filter
>> readthrough genes out of your analysis. You can generate a list of
>> readthroughs via the following SQL:
>>
>> mysql -uanonymous -hensembldb.ensembl.org homo_sapiens_core_90_38 -NB -e
>> "select distinct(concat(gene.stable_id,'.',gene.version)) from gene
>> join transcript using(gene_id) join transcript_attrib
>> using(transcript_id) where value='readthrough'"
>>
>> This can also be done through the API by looking at the transcript
>> attributes.
>>
>> Hope this helps,
>>
>> Fergal.
>>
>>
>> On 1 Dec 2017, at 14:56, Wolf Beat <Beat.Wolf at hefr.ch> wrote:
>>
>> Sorry, this is my fault. I was comparing all possible Ensembl
>> versions and copied the link from the wrong tab.
>>
>>
>> So the correct links are:
>>
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000255292;r=11:112086824-112193805
>>
>>
>> Sorry for the incorrect links in my last email.
>>
>> ________________________________
>> From: Matthew Laird <lairdm at ebi.ac.uk>
>> Sent: Friday, December 1, 2017 3:54:18 PM
>> To: Ensembl developers list; Wolf Beat
>> Subject: Re: [ensembl-dev] Probably duplicated human gene in latest
>> release
>>
>> Hello Wolf,
>>
>> The latter link is for Ensembl release 75, which was the final release
>> in which GRCh37 was used. So the records represented by those two links
>> are for two different assemblies of the genome. Between that and the
>> periodic updates that do happen to annotations between releases, it's
>> not surprising the transcripts for the gene would be different. If you
>> look in the gene history [1] page on the current release you can see the
>> gene was updated and the version number incremented in Ensembl
>> release 81.
>>
>> If I'm misunderstanding your question, please let me know and we can try
>> to resolve it. Cheers.
>>
>> [1]
>> http://www.ensembl.org/Homo_sapiens/Gene/Idhistory?db=core;g=ENSG00000204370;r=11:112086773-112120013
>>
>> On 01/12/17 13:27, Wolf Beat wrote:
>> Hello,
>>
>>
>> I just noticed that the human gene SDHD appears twice. The two
>> entries do not have the same transcripts, but at least one
>> protein-coding transcript is present in both. Also the description,
>> including the HGNC symbol, is the same, which makes me think that
>> this is some kind of error. Both entries should probably be merged.
>> Here are the two entries for the same gene:
>>
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:112086773-112120013
>>
>>
>> http://feb2014.archive.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204370;r=11:111957497-111990353
>>
>>
>> Kind regards
>>
>>
>> Beat Wolf
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/

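The readthrough workaround described in the quoted thread could be sketched in Python roughly as follows. This is only an illustration, not Ensembl code: it assumes the output of the mysql query above has been captured as text (one versioned stable ID per line, as -NB produces), and the function names parse_readthrough_ids and drop_readthrough, as well as the version suffix in the sample, are made up for the example. Comparing IDs without their version suffix is a design choice here, so that a gene list without versions still matches.

```python
from typing import Iterable, List, Set


def parse_readthrough_ids(mysql_output: str) -> Set[str]:
    """Parse one 'ENSG...[.version]' ID per line into a set of unversioned IDs."""
    ids: Set[str] = set()
    for line in mysql_output.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            ids.add(line.split(".", 1)[0])  # drop the '.version' suffix
    return ids


def drop_readthrough(gene_ids: Iterable[str], readthrough: Set[str]) -> List[str]:
    """Keep only genes whose unversioned stable ID is not a known readthrough."""
    return [g for g in gene_ids if g.split(".", 1)[0] not in readthrough]


if __name__ == "__main__":
    # Hypothetical sample of the query output; the version suffix is illustrative.
    sample_output = "ENSG00000255292.1\n"
    genes = ["ENSG00000204370", "ENSG00000255292", "ENSG00000197580"]
    print(drop_readthrough(genes, parse_readthrough_ids(sample_output)))
```

With the three gene IDs from this thread, only the readthrough gene ENSG00000255292 is dropped; the two distinct loci are kept in their original order.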