[ensembl-dev] Transcripts with stop codon ?

Michael Paulini mh6 at sanger.ac.uk
Wed Feb 1 16:32:56 GMT 2012

On 01/02/12 16:25, Lukasz Huminiecki wrote:
> While it sounds like the problem discussed here is an experimental or prediction artefact, please also note that there are some genuine (if not necessarily always functional) transcripts with stop codons. For example:
> Pseudo-messenger RNA: phantoms of the transcriptome.
> Frith MC, Wilming LG, Forrest A, Kawaji H, Tan SL, Wahlestedt C, Bajic VB, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bailey TL, Huminiecki L.
> PLoS Genet. 2006 Apr;2(4):e23. Epub 2006 Apr 28.
> PMID: 16683022
> kind regards, Lukasz

Obviously they would have seq_edits attached to their transcripts, 
similar to the selenocysteines, and so would not appear with internal stops ;-)
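Here is a minimal sketch of the seq_edit idea. The raw translation still contains '*' at the readthrough codon, and a recorded edit replaces it with selenocysteine ('U'), so no internal stop remains. `ResidueEdit` and `apply_edits` are hypothetical names. They are only loosely modelled on a seq_edit as a 1-based start/end range plus replacement residues. This is not the Ensembl API, and the example sequence is made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueEdit:
    """Hypothetical stand-in for a seq_edit: replace residues start..end
    (1-based, inclusive) with alt_seq."""
    start: int
    end: int
    alt_seq: str


def apply_edits(protein: str, edits: list[ResidueEdit]) -> str:
    """Apply edits right-to-left so that an edit changing the sequence
    length does not shift the coordinates of edits further left."""
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        protein = protein[: edit.start - 1] + edit.alt_seq + protein[edit.end:]
    return protein


# A UGA read through as selenocysteine is stored as an edit, so the
# final protein carries 'U' instead of an internal stop '*'.
print(apply_edits("MKT*GLC", [ResidueEdit(4, 4, "U")]))  # prints: MKTUGLC
```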


> On Feb 1, 2012, at 4:11 PM, Michael Paulini wrote:
>> On 01/02/12 14:54, Moretti Sébastien wrote:
>>>>> Hi
>>>>> I have just noticed that some transcripts have stop codon(s) in their
>>>>> sequence, e.g. ENSCJAT00000065209.
>>>>> Is this normal?
>>>>> These stop codons and, more problematically, the "fake" codons after
>>>>> the stop are included in Compara alignments.
>>>> you mean translations?
>>>> Since the translation also doesn't have a start, I would put that
>>>> down as a prediction artefact, similar to what you see in a lot of
>>>> low-coverage gene sets, where you get gene fragments and in-frame
>>>> stops.
>>> I fully agree about prediction artefacts.
>>> But in the case of ENSCJAT00000065209, there are 2 predicted amino acids after the stop codon. These are included in your alignment and tree-building processes.
>>> Two amino acids should not disturb the phylogeny too much, but what happens if there are 40 untranslated amino acids?
>> I think the usual rule applies here: "bad protein predictions make bad phylogenetic trees".
>> In the case of our nematode genomes, we fix them when we see them, but in more hands-off operations, if it can't be scripted, it will not happen.
>> I also had a long discussion about in-frame stops in gene models on next-gen assemblies. As most of these genomes will not get fixed in the foreseeable future, there are two options: allow the stops, treating them as genomic errors, or count them as real stops and lose potential parts of a gene (which will also mess up phylogenetic trees).
>> M
>>>> M
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
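The check Sébastien describes can be sketched in a few lines. Translate the CDS with the standard genetic code, look for stops that are not at the C-terminus, and count how many residues follow the first one. Those are the residues that would end up in an alignment. This is a minimal plain-Python sketch, not Ensembl code. It assumes a frame-0 CDS string and NCBI translation table 1, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# NCBI standard genetic code (translation table 1); codons are enumerated
# in TCAG order, which is the order this amino-acid string follows.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS in frame 0. A trailing partial codon is ignored and
    codons containing anything other than A/C/G/T become 'X'."""
    cds = cds.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def internal_stops(protein: str) -> list[int]:
    """1-based positions of stops ('*'), ignoring a single terminal stop."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


def residues_after_first_stop(protein: str) -> int:
    """Number of positions that follow the first internal stop."""
    body = protein[:-1] if protein.endswith("*") else protein
    first = body.find("*")
    return 0 if first == -1 else len(body) - first - 1


protein = translate("ATGAAATGAGCTGGT")
print(protein, internal_stops(protein), residues_after_first_stop(protein))
# prints: MK*AG [3] 2
```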

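Michael describes two options for in-frame stops on next-gen assemblies. They can be sketched as alternative clean-up policies for a protein string in which '*' marks a stop. This is a hedged illustration only. The function names are hypothetical. Replacing a masked stop with 'X' (unknown residue) is an assumption borrowed from common alignment-input practice, not anything Ensembl is documented to do.

```python
def truncate_at_first_stop(protein: str) -> str:
    """Count stops as real: drop the first '*' and everything after it
    (downstream parts of the gene are lost)."""
    first = protein.find("*")
    return protein if first == -1 else protein[:first]


def mask_internal_stops(protein: str, mask: str = "X") -> str:
    """Allow stops as probable genomic errors: replace each internal '*'
    with an unknown residue so the rest of the model is kept.
    A terminal stop is simply dropped."""
    body = protein[:-1] if protein.endswith("*") else protein
    return body.replace("*", mask)


print(truncate_at_first_stop("MK*AGW*"))  # prints: MK
print(mask_internal_stops("MK*AGW*"))     # prints: MKXAGW
```

Neither choice is free. Truncation loses real sequence when the stop is an assembly error. Masking keeps residues that may be genuinely untranslated. That is exactly the trade-off for tree building discussed in this thread.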