[ensembl-dev] Transcripts with stop codon ?

Wed Feb 1 16:54:38 GMT 2012

On Feb 1, 2012, at 5:37 PM, Javier Herrero wrote:

> Dear Sébastien
> 
> This particular example is likely a prediction artefact: in orthologous proteins, the stop codon falls very close to the splice site (http://tinyurl.com/7k2lfsf)
> 
> As pointed out by Lukasz, not all stop codons are wrong. Selenocysteine is encoded by a stop codon (UGA). This is why you may find stop codons in our Compara alignments.

Our study focused on FANTOM3 mouse cDNAs, while ENSEMBL starts its predictions from the genome and relies mostly on protein homology as evidence, so the two are difficult to compare directly.

Still, intron retention was a common cause of internal STOP codons in the F3 collection, and the same could happen in gene predictions if substantial weight were given to full-length cDNAs as evidence.
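Whatever the cause (intron retention, a prediction artefact, or an assembly error), translations with internal stops can be flagged before they enter an alignment. The sketch below is a minimal illustration, not Ensembl or Compara code: the function names `translate`, `internal_stops` and `clean_for_alignment` and the two policy names are hypothetical. It assumes the standard genetic code (NCBI table 1) and a CDS that starts in frame 0, and it ignores selenocysteine, so a genuine Sec-encoding UGA would be wrongly truncated or masked. The two policies mirror the options Michael describes: treat the stop as real and drop the rest of the protein, or assume a genomic error and mask it.

```python
from typing import List

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds: str) -> str:
    """Translate a CDS in frame 0. A trailing partial codon is ignored and
    codons containing non-ACGT characters become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def internal_stops(protein: str) -> List[int]:
    """0-based positions of '*' residues, ignoring a single terminal stop."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [i for i, aa in enumerate(body) if aa == "*"]


def clean_for_alignment(protein: str, policy: str) -> str:
    """Drop a terminal stop, then handle internal stops (hypothetical policies):
    'truncate' treats the first internal stop as real and discards what follows;
    'mask' assumes a genomic/prediction error and replaces each internal stop with 'X'."""
    body = protein[:-1] if protein.endswith("*") else protein
    if policy == "truncate":
        return body.split("*", 1)[0]
    if policy == "mask":
        return body.replace("*", "X")
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    protein = translate("ATGGCCTAAGGCTGA")
    print(protein, internal_stops(protein))
    print(clean_for_alignment(protein, "truncate"), clean_for_alignment(protein, "mask"))
```

For the toy CDS above, the translation is `MA*G*` with one internal stop at position 2; `truncate` keeps `MA`, while `mask` keeps `MAXG`. Truncation loses any real sequence downstream of a spurious stop, and masking keeps residues that may not belong to the protein, which is the trade-off raised later in the thread.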

> 
> Kind regards
> 
> Javier
> 
> On 01/02/12 16:25, Lukasz Huminiecki wrote:
>> Dear ENSEMBL,
>> While it sounds like the problem discussed here is an experimental or prediction artefact, please also note that there are some genuine (if not necessarily always functional) transcripts with stop codons. For example:
>> 
>> Pseudo-messenger RNA: phantoms of the transcriptome.
>> Frith MC, Wilming LG, Forrest A, Kawaji H, Tan SL, Wahlestedt C, Bajic VB, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bailey TL, Huminiecki L.
>> PLoS Genet. 2006 Apr;2(4):e23. Epub 2006 Apr 28.
>> PMID: 16683022
>> 
>> kind regards, Lukasz
>> 
>> On Feb 1, 2012, at 4:11 PM, Michael Paulini wrote:
>> 
>>> On 01/02/12 14:54, Moretti Sébastien wrote:
>>>>>> Hi
>>>>>> 
>>>>>> I have just noticed that some transcripts have stop codon(s) in their
>>>>>> sequence. E.g. ENSCJAT00000065209
>>>>>> 
>>>>>> Is this normal?
>>>>>> 
>>>>>> 
>>>>>> These stop codons and, more problematically, the "fake" codons that
>>>>>> follow the stop are included in compara alignments.
>>>>>> 
>>>>> You mean translations?
>>>>> Since the translation also lacks a start codon, I would
>>>>> put that down to a prediction artefact, similar to what you can see in a
>>>>> lot of low-coverage gene sets, where you get fragments of genes and
>>>>> in-frame stops.
>>>> I fully agree about prediction artefacts.
>>>> But in the case of ENSCJAT00000065209, there are 2 predicted amino acids after the stop codon. Those are included in your alignment and tree-building processes.
>>>> Two amino acids should not disturb the phylogeny too much, but what happens if there are 40 amino acids that should not have been translated?
>>> I think the usual rule applies here: "bad protein predictions make bad phylogenetic trees".
>>> In the case of our nematode genomes, we fix them when we see them, but in more hands-off operations, if it can't be scripted, it will not happen.
>>> I also had a long discussion about in-frame stops in models on next-gen assemblies. As most of these genomes will not be fixed in the foreseeable future, there are two options: allow in-frame stops to cover genomic errors, or count them as real stops and lose potential parts of a gene (which will also mess up phylogenetic trees).
>>> 
>>> M
>>>>> M
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> 
>> 
> 
> -- 
> Javier Herrero, PhD
> Ensembl Coordinator and Ensembl Compara Project Leader
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge - CB10 1SD - UK
> 
> 