[ensembl-dev] Translations

Sam Seaver samseaver at gmail.com
Thu Oct 4 16:59:50 BST 2012


Arnaud,

One example we have in Oryza sativa is: LOC_Os10g21210.1

which translates to:
MTIALGRVTKEENDLFDIMDDWLRRDRFVFVGWSGLFFFLVLISL*EVGLQGQLL*LLGI
PMDWRVPIWKVAIS*PQQFPPLPIV*HTLCCYYGARKHKGILLVGVN*VVCGLLLLSMGL
LH**VSCYVNLNLLGLFNCGLIMQFHSLAQSLFLFPYS*FIHWGNPVGSLRRVLA*QRYF
DSSSSSKDFIIGR*THFI*WELPEY*ARLCYALFMGQPWKTLYLRTVMVQIPSALLTQLK
LKKLIQWSPLIAFGPKSLVLLFPINVGYISLCYLYRSPVYG*VLLA*SAWL*TYVPMTSF
PRKSVQRKILNLRLSTPKIFF*TRVFVRGWQLRISLMKILYSLRRFYHVEMLF
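
For anyone wanting to screen their own downloads for the same problem, here is a minimal Python sketch that reports where internal '*' characters fall in a peptide string. The function name `internal_stops` is made up for illustration and is not part of any Ensembl or Gramene API. It assumes a trailing '*' (if present) is the normal terminal stop and should not be reported.

```python
def internal_stops(protein):
    """Return 1-based positions of '*' characters that are not the final residue."""
    # A '*' in the last position is treated as the ordinary terminal stop.
    body = protein[:-1] if protein.endswith("*") else protein
    return [i for i, aa in enumerate(body, start=1) if aa == "*"]


# First line of the LOC_Os10g21210.1 translation quoted above.
first_line = "MTIALGRVTKEENDLFDIMDDWLRRDRFVFVGWSGLFFFLVLISL*EVGLQGQLL*LLGI"
print(internal_stops(first_line))
```

An empty list means the peptide has no embedded stops.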

However, I'm also trying to find the actual Ensembl release this came
from. We got the data from Gramene, and the release numbers don't
match.  To be perfectly honest with you, we are confused as to whether
to discuss these issues with Gramene or Ensembl Plants. Does this
depend on the species?

S

On Thu, Oct 4, 2012 at 10:23 AM, Arnaud Kerhornou
<arnaudbioinfo at gmail.com> wrote:
> On 04/10/2012 15:45, Sam Seaver wrote:
>>
>> Dear Arnaud,
>>
>> Apparently these embedded stop codons were found in a few sequences in
>> O. sativa and V. vinifera.  There was a miscommunication: by
>> "ignored", my colleague actually meant that they are translated as '*'.
>
> Re. V. vinifera, we have noticed that some genes have translations containing
> internal stop codons. This will be fixed in the next release, which is coming
> at the end of this month.
> Because of their number (44 cases), it would be difficult to go through each
> of them to find out how to fix them, so we have removed their translations
> and updated their biotype to 'nontranslating_cds'.
>
> Re. O. sativa, I cannot find any cases of translations with internal stop
> codons, or of translations where we perform amino acid substitution. Can you
> direct us to a gene or translation?
>
>> However, your email provokes another question: how do you determine
>> whether a stop codon actually encodes another amino acid, such as
>> selenocysteine?  Is this a case where, for the species, every instance
>> of TGA is known to encode selenocysteine?
>
> Not all TGAs encode selenocysteine. Selenocysteine residues are defined by the
> presence of an RNA motif, called SECIS, in the 3' UTR of the transcript.
> Ideally, they are specified in the GFF3 file we load to build our core
> databases, but that is not always the case.
> What I usually do is look at the gene function, as these genes are
> associated with oxidation-reduction reactions. Then in Ensembl we have mechanisms
> to substitute one or more amino acids at a given position in the protein
> sequence.
> That is what we did for Chlamydomonas, e.g.:
> http://plants.ensembl.org/Chlamydomonas_reinhardtii/Transcript/Sequence_Protein?db=core;g=CHLREDRAFT_206086;r=DS496117:1347779-1349885;t=EDP05676
>
> Arnaud
>
>>
>> Thanks
>> Sam
>>
>> On Thu, Oct 4, 2012 at 8:50 AM, Arnaud Kerhornou <arnaud at ebi.ac.uk> wrote:
>>>
>>> Dear Sam,
>>>
>>> Could you give us the list of species where this is the case?
>>> There are some cases where the transcribed DNA sequence has stop codons
>>> that are not real, and we have a mechanism in the Ensembl API to replace
>>> the stop codon with the right amino acid.
>>>
>>> A typical case is selenocysteine genes, where an internal stop codon
>>> (TGA) is replaced by a 'U' in the amino acid sequence.
>>>
>>> In all cases, they should not be ignored. If we don't specify the correct
>>> amino acid behind a stop codon, it is not discarded and the amino acid
>>> sequence will hold an internal '*' character.
>>>
>>> Arnaud
>>>
>>>
>>> On 04/10/2012 14:30, Sam Seaver wrote:
>>>>
>>>> Dear ensembl-dev,
>>>>
>>>> A colleague has discovered that in a few of the plant genomes, the
>>>> underlying DNA sequence of a CDS may have some embedded stop codons.
>>>> He subsequently found that the resulting translation, as performed by
>>>> Ensembl, ignores these completely.
>>>>
>>>> We were wondering what, if any, other problems are encountered when
>>>> translating plant genes, and what the Ensembl translation code does to
>>>> address these?
>>>>
>>>> Thanks
>>>> Sam
>>>>
>>
>>
>
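
To make sure I have understood the behaviour Arnaud describes in the thread above, here is a rough Python sketch of it: every codon is translated, an internal stop stays in the peptide as '*', and only positions that are explicitly listed get a replacement residue such as 'U'. This is not the Ensembl API (which is Perl). The names `translate` and `substitutions` are hypothetical, and I am assuming the standard genetic code and that a terminal stop is trimmed from the peptide.

```python
from itertools import product

# Standard genetic code, written in the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, substitutions=None):
    """Translate a CDS; internal stops stay as '*' unless substituted.

    substitutions maps a 1-based peptide position to a replacement residue,
    e.g. {2: "U"} for a selenocysteine (hypothetical interface).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    protein = [CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)]
    if protein and protein[-1] == "*":
        protein.pop()  # assumption: the terminal stop is not part of the peptide
    for pos, residue in (substitutions or {}).items():
        protein[pos - 1] = residue
    return "".join(protein)


cds = "ATGTGAAAATAA"  # toy sequence: Met, TGA, Lys, terminal stop
print(translate(cds))            # internal TGA is kept as '*'
print(translate(cds, {2: "U"}))  # TGA at peptide position 2 replaced by 'U'
```

The point of the explicit position map is that a TGA is never reinterpreted automatically; without an entry for that position, the '*' stays visible in the output.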



-- 
Postdoctoral Fellow
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S. Cass Avenue
Argonne, IL 60439

http://www.linkedin.com/pub/sam-seaver/0/412/168
samseaver at gmail.com
(773) 796-7144

"We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time."
   --T. S. Eliot



