[ensembl-dev] Transcripts with stop codon ?
jherrero at ebi.ac.uk
Wed Feb 1 16:37:55 GMT 2012
This particular example is likely a prediction artefact: the stop
codon is very close to the splice site in orthologous proteins.
As pointed out by Lukasz, not all stop codons are wrong. Selenocysteines
are encoded by stop codons. This is why you may find them in our compara
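To illustrate the distinction, here is a minimal sketch (not Ensembl API code) that lists in-frame stop codons in a coding sequence and separates out the TGA ones, since TGA (UGA) is the stop codon that can be recoded as selenocysteine. The function names are hypothetical, and the input is assumed to be a coding sequence that starts in frame. A TGA flagged here is only a candidate: confirming a selenocysteine needs annotation evidence that this check does not look at.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds):
    """Return (codon_index, codon) for every in-frame stop before the last codon.

    Assumes `cds` starts in frame. Any trailing 1-2 bases that do not form
    a complete codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The last codon may be the normal terminator, so it is not reported.
    return [(i, c) for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]


def possible_selenocysteines(cds):
    """Subset of internal stops that are TGA, i.e. candidates only."""
    return [(i, c) for i, c in internal_stops(cds) if c == "TGA"]
```

For example, internal_stops("ATGTAGGCCTAA") reports the TAG at codon index 1, while possible_selenocysteines on the same sequence returns an empty list.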
On 01/02/12 16:25, Lukasz Huminiecki wrote:
> Dear ENSEMBL,
> While it sounds like the problem discussed here is an experimental or prediction artefact, please also note that there are some genuine (if not necessarily always functional) transcripts with stop codons. For example:
> Pseudo-messenger RNA: phantoms of the transcriptome.
> Frith MC, Wilming LG, Forrest A, Kawaji H, Tan SL, Wahlestedt C, Bajic VB, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bailey TL, Huminiecki L.
> PLoS Genet. 2006 Apr;2(4):e23. Epub 2006 Apr 28.
> PMID: 16683022
> kind regards, Lukasz
> On Feb 1, 2012, at 4:11 PM, Michael Paulini wrote:
>> On 01/02/12 14:54, Moretti Sébastien wrote:
>>>>> I have just noticed that some transcripts have stop codon(s) in their
>>>>> sequence. E.g. ENSCJAT00000065209
>>>>> Is this normal?
>>>>> These stop codons and, more problematically, the "fake" codons right after
>>>>> the stop are included in compara alignments.
>>>> you mean translations?
>>>> Given that the translation also doesn't have a start, I would
>>>> put that down as a prediction artefact, similar to what you can see on a
>>>> lot of low-coverage gene sets where you get fragments of genes and
>>> I fully agree about prediction artefacts.
>>> But in the case of ENSCJAT00000065209, there are 2 predicted amino acids after the stop codon. They are included in your alignment and tree-building processes.
>>> Two aa should not disturb the phylogeny too much, but what happens if there are 40 untranslated aa?
>> I think the usual rule applies here: "bad protein predictions make bad phylogenetic trees".
>> In the case of our nematode genomes, we fix them when we see them, but in more hands-off operations if it can't be scripted, it will not happen.
>> I also had a long discussion about in-frame stops in models on next-gen assemblies, and as most of these genomes will not get fixed in the foreseeable future, there are two options: allow them to cover genomic errors, or count them as real stops and lose potential parts of a gene (which will also mess up phylogenetic trees).
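The two options above can be sketched on a translated peptide string, assuming internal stops are written as '*' in the translation. Both function names are hypothetical, and replacing a stop with 'X' (unknown residue) is an assumption made for illustration, not a description of what the Ensembl or Compara pipelines do.

```python
def truncate_at_first_stop(peptide):
    """Count the stop as real: keep only the residues before the first '*'.

    Anything predicted after the stop (the "fake" residues) is dropped,
    at the cost of losing real sequence if the stop was an assembly error.
    """
    return peptide.split("*", 1)[0]


def mask_internal_stops(peptide):
    """Allow the stop: keep the full length and mark each internal '*' as 'X'.

    A terminal '*' is removed rather than masked.
    """
    body = peptide.rstrip("*")
    return body.replace("*", "X")
```

With a peptide such as "MKT*LV", the first function returns "MKT" and the second returns "MKTXLV", so the choice decides whether the two residues after the stop reach the alignment.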
>> Dev mailing list Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK