[ensembl-dev] Translations

Arnaud Kerhornou arnaud at ebi.ac.uk
Thu Oct 4 16:27:59 BST 2012


On 04/10/2012 15:45, Sam Seaver wrote:
> Dear Arnaud,
>
> Apparently these embedded stop codons were found in a few sequences in
> O. sativa and V. vinifera.  There was a miscommunication and by
> "ignored", my colleague actually meant '*'.
Re. V. vinifera, we have noticed that some genes have translations 
containing internal stop codons. This will be fixed in the next release, 
which is coming at the end of this month.
Because of their number (44 cases), it would be difficult to go through 
each of them to find out how to fix them, so we have removed their 
translation and updated their biotype to 'nontranslating_cds'.

Re. O. sativa, I cannot find any cases of translations with internal 
stop codons, or of translations where we perform an amino acid 
substitution. Can you direct us to a gene or translation?
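
A quick way to list candidate cases from a set of protein sequences is to 
look for a '*' anywhere except the final position. This is only a minimal 
sketch, not Ensembl code; the function name is hypothetical and the input is 
assumed to be a plain amino acid string, as found in a peptide FASTA file.

```python
def internal_stop_positions(protein: str) -> list[int]:
    """Return 1-based positions of '*' characters that are not the last residue."""
    # A single trailing '*' is a normal terminal stop, so drop it before scanning.
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    for seq in ("MKTLLA*", "MKT*LLA*"):
        print(seq, internal_stop_positions(seq))
```

Any translation for which the list is not empty would be worth reporting 
with its gene or translation identifier.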
> However, your email provokes another question: how do you decide
> whether a stop codon actually encodes another amino acid, such as
> selenocysteine? Is this a case where, for the species, every instance
> of TGA is known to encode selenocysteine?
Not all TGAs encode selenocysteine. Selenocysteine residues are defined by 
the presence of an RNA motif, called SECIS, in the 3' UTR of the 
transcript.
Ideally, they are specified in the GFF3 file we load to build our core 
databases, but this is not always the case.
What I usually do is look at the gene function, as these genes are 
associated with oxidation-reduction reactions. Then in Ensembl we have 
mechanisms to substitute one or more amino acids at a given position in 
the protein sequence.
That is what we did for Chlamydomonas, e.g.:
http://plants.ensembl.org/Chlamydomonas_reinhardtii/Transcript/Sequence_Protein?db=core;g=CHLREDRAFT_206086;r=DS496117:1347779-1349885;t=EDP05676 
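
To illustrate the idea of substituting an amino acid at a given position, 
here is a rough sketch in Python. It is not the Ensembl API: the function 
names are hypothetical, the CDS is made up, and it assumes the standard 
genetic code and 1-based protein positions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon; stop codons become '*', unknown codons 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def apply_substitutions(protein: str, edits: dict[int, str]) -> str:
    """Replace the residue at each 1-based position in `edits` with the given amino acid."""
    residues = list(protein)
    for position, amino_acid in edits.items():
        residues[position - 1] = amino_acid
    return "".join(residues)


if __name__ == "__main__":
    cds = "ATGTGAGCTTAA"  # ATG TGA GCT TAA, invented for this example
    raw = translate(cds)
    # Position 2 is the in-frame TGA that we assume encodes selenocysteine ('U').
    print(raw, apply_substitutions(raw, {2: "U"}))
```

Without the substitution the internal TGA stays as '*' in the protein; with 
it, only the position named in the edit changes.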


Arnaud
> Thanks
> Sam
>
> On Thu, Oct 4, 2012 at 8:50 AM, Arnaud Kerhornou <arnaud at ebi.ac.uk> wrote:
>> Dear Sam,
>>
>> Could you give us the list of species where it is the case ?
>> There are some cases where the transcribed DNA sequence has stop codons but
>> they're not real, and we have a mechanism in the Ensembl API to replace the
>> stop codon by the right amino acid.
>>
>> A typical case is selenocysteine genes, where an internal stop codon (TGA)
>> is replaced by a 'U' in the amino acid sequence.
>>
>> In all cases, they should not be ignored. If we don't specify the correct
>> amino acid behind a stop codon, it is not discarded and the amino acid
>> sequence would hold an internal '*' character.
>>
>> Arnaud
>>
>>
>> On 04/10/2012 14:30, Sam Seaver wrote:
>>> Dear ensembl-dev,
>>>
>>> A colleague has discovered that in a few of the plant genomes, the
>>> underlying DNA sequence of a CDS may have some embedded stop codons.
>>> He subsequently found that the resulting translation, as performed by
>>> Ensembl, ignores these completely.
>>>
>>> We were wondering what, if any, other problems are encountered when
>>> translating plant genes, and what the Ensembl translation code does to
>>> address these?
>>>
>>> Thanks
>>> Sam
>>>
>
>




