[ensembl-dev] Transcripts with stop codon ?

Wed Feb 1 15:11:14 GMT 2012

On 01/02/12 14:54, Moretti Sébastien wrote:
>>> Hi
>>>
>>> I have just noticed that some transcripts have stop codon(s) in their
>>> sequence. E.g. ENSCJAT00000065209
>>>
>>> Is this normal?
>>>
>>>
>>> These stop codons and, more problematically, the "fake" codons that
>>> follow the stop are included in the compara alignments.
>>>
>> You mean translations?
>> Given that the translation also lacks a start, I would put that down
>> as a prediction artefact, similar to what you can see in a lot of
>> low-coverage gene sets, where you get fragments of genes and
>> in-frame stops.
>
> I fully agree about prediction artefacts.
> But in the case of ENSCJAT00000065209, there are 2 predicted amino 
> acids after the stop codon. Those ones are included in your alignment 
> and tree building processes.
> Two aa should not disturb the phylogeny too much, but what happens if
> there are 40 untranslated aa?
I think the usual rule applies here: "bad protein predictions make bad 
phylogenetic trees".
In the case of our nematode genomes, we fix them when we see them, but
in more hands-off operations, if the fix can't be scripted, it will not
happen.
I also had a long discussion about in-frame stops in models on next-gen
assemblies. As most of these genomes will not get fixed in the
foreseeable future, there are two options: allow the stops, so that the
model can span genomic errors, or count them as real stops and lose
potential parts of a gene (which will also mess up phylogenetic trees).
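
As a rough illustration of those two options, here is a minimal Python
sketch. It assumes the peptide is a plain string in which "*" marks a stop
(a common convention, but not necessarily how any given pipeline exports
it); the function names are hypothetical and are not part of the Ensembl
API.

```python
# Hypothetical helpers; "*" as the stop symbol is an assumption.

def find_internal_stops(peptide, stop="*"):
    """Return 0-based positions of stops that are not the final residue."""
    body = peptide[:-1] if peptide.endswith(stop) else peptide
    return [i for i, aa in enumerate(body) if aa == stop]


def mask_stops(peptide, stop="*", unknown="X"):
    """Option 1: tolerate in-frame stops by masking them as unknown residues.

    A terminal stop, if present, is dropped rather than masked.
    """
    body = peptide[:-1] if peptide.endswith(stop) else peptide
    return body.replace(stop, unknown)


def truncate_at_first_stop(peptide, stop="*"):
    """Option 2: treat the first stop as real and discard everything after it."""
    return peptide.split(stop, 1)[0]


if __name__ == "__main__":
    pep = "MKT*LV"  # made-up peptide with two residues after an in-frame stop
    print(find_internal_stops(pep))
    print(mask_stops(pep))
    print(truncate_at_first_stop(pep))
```

Masking keeps the downstream residues in the alignment, which helps when
the stop is a sequencing error and hurts when it is real; truncating does
the opposite.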

M