[ensembl-dev] Translations

Arnaud Kerhornou arnaud at ebi.ac.uk
Thu Oct 4 17:18:38 BST 2012


On 04/10/2012 16:59, Sam Seaver wrote:
> Arnaud,
>
> One example we have in Oryza sativa is: LOC_Os10g21210.1
>
> which translates to:
> MTIALGRVTKEENDLFDIMDDWLRRDRFVFVGWSGLFFFLVLISL*EVGLQGQLL*LLGI
> PMDWRVPIWKVAIS*PQQFPPLPIV*HTLCCYYGARKHKGILLVGVN*VVCGLLLLSMGL
> LH**VSCYVNLNLLGLFNCGLIMQFHSLAQSLFLFPYS*FIHWGNPVGSLRRVLA*QRYF
> DSSSSSKDFIIGR*THFI*WELPEY*ARLCYALFMGQPWKTLYLRTVMVQIPSALLTQLK
> LKKLIQWSPLIAFGPKSLVLLFPINVGYISLCYLYRSPVYG*VLLA*SAWL*TYVPMTSF
> PRKSVQRKILNLRLSTPKIFF*TRVFVRGWQLRISLMKILYSLRRFYHVEMLF
>
> However, I'm also trying to find the actual Ensembl release this came
> from; we got the data from Gramene and the release numbers don't
> match.  To be perfectly honest with you, we are confused as to whether
> to discuss these issues with Gramene or Ensembl Plants. Does this
> depend on the species?
We work with Gramene to produce Ensembl datasets, so Gramene and Ensembl
Genomes should hold the same underlying data.

To comment on this particular case: because the translation contains
internal stop codons, we assign the gene the biotype
'nontranslating_cds' and delete the translation object, if any.
It looks like we failed to remove the translation object here, so it was
visible when it should not have been. That is something we can clear up
for the next release.
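
As a rough illustration of the check described above, here is a minimal
Python sketch. It is not the Ensembl pipeline code: the function names
and the default 'protein_coding' biotype are hypothetical, and it only
looks at the peptide string, treating a trailing '*' as a normal
terminal stop.

```python
from typing import List


def internal_stop_positions(protein: str) -> List[int]:
    """Return 1-based positions of '*' characters that are not the last residue."""
    # A single trailing '*' is a normal terminal stop, so leave it out of the scan.
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


def suggested_biotype(protein: str, current: str = "protein_coding") -> str:
    """Suggest 'nontranslating_cds' when the peptide has internal stops.

    Hypothetical helper: the default biotype is an assumption for this sketch.
    """
    return "nontranslating_cds" if internal_stop_positions(protein) else current


# First line of the LOC_Os10g21210.1 translation quoted below.
fragment = "MTIALGRVTKEENDLFDIMDDWLRRDRFVFVGWSGLFFFLVLISL*EVGLQGQLL*LLGI"
print(internal_stop_positions(fragment))
print(suggested_biotype(fragment))
```

A translation flagged this way would be a candidate for having its
translation object removed, as described above.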

Thanks for coming back to us about this,
Arnaud
> S
>
> On Thu, Oct 4, 2012 at 10:23 AM, Arnaud Kerhornou
> <arnaudbioinfo at gmail.com> wrote:
>> On 04/10/2012 15:45, Sam Seaver wrote:
>>> Dear Arnaud,
>>>
>>> Apparently these embedded stop codons were found in a few sequences in
>>> O. sativa and V. vinifera.  There was a miscommunication and by
>>> "ignored", my colleague actually meant '*'.
>> Re. V. vinifera, we have noticed that some genes had translations containing
>> internal stop codons. This will be fixed in the next release, which is coming
>> at the end of this month.
>> Because of their number (44 cases), it would be difficult to go through each
>> of them to find out how to fix them, so we have removed their translations
>> and updated their biotype to 'nontranslating_cds'.
>>
>> Re. O. sativa, I cannot find any cases of translations with internal stop
>> codons or of translations where we perform amino acid substitution. Can you
>> direct us to a gene or translation?
>>
>>> However, your email provokes another question: how do you determine
>>> whether a stop codon actually encodes another amino acid such as
>>> selenocysteine?  Is this a case where, for the species, every instance
>>> of TGA is known to encode selenocysteine?
>> Not all TGAs encode selenocysteine. Selenocysteine residues are defined by
>> the presence of an RNA motif, called SECIS, in the 3' UTR of the transcript.
>> Ideally, they are specified in the gff3 file we load to build our core
>> databases, but that is not always the case.
>> What I usually do is look at the gene function, as these genes are
>> associated with oxidation-reduction reactions. Then in Ensembl we have
>> mechanisms to substitute one or more amino acids at a given position in the
>> protein sequence.
>> That is what we did for Chlamydomonas, e.g.:
>> http://plants.ensembl.org/Chlamydomonas_reinhardtii/Transcript/Sequence_Protein?db=core;g=CHLREDRAFT_206086;r=DS496117:1347779-1349885;t=EDP05676
>>
>> Arnaud
>>
>>> Thanks
>>> Sam
>>>
>>> On Thu, Oct 4, 2012 at 8:50 AM, Arnaud Kerhornou <arnaud at ebi.ac.uk> wrote:
>>>> Dear Sam,
>>>>
>>>> Could you give us the list of species where this is the case?
>>>> There are some cases where the transcribed DNA sequence has stop codons
>>>> but they're not real, and we have a mechanism in the Ensembl API to
>>>> replace the stop codon with the right amino acid.
>>>>
>>>> A typical case is selenocysteine genes, where an internal stop codon
>>>> (TGA) is replaced by a 'U' in the amino acid sequence.
>>>>
>>>> In all cases, they should not be ignored. If we don't specify the correct
>>>> amino acid behind a stop codon, it is not discarded and the amino acid
>>>> sequence will contain an internal '*' character.
>>>>
>>>> Arnaud
>>>>
>>>>
>>>> On 04/10/2012 14:30, Sam Seaver wrote:
>>>>> Dear ensembl-dev,
>>>>>
>>>>> A colleague has discovered that in a few of the plant genomes, the
>>>>> underlying DNA sequence of a CDS may have some embedded stop codons.
>>>>> He subsequently found that the resulting translation, as performed by
>>>>> Ensembl, ignores these completely.
>>>>>
>>>>> We were wondering what, if any, other problems are encountered when
>>>>> translating plant genes, and what the Ensembl translation code does to
>>>>> address these?
>>>>>
>>>>> Thanks
>>>>> Sam
>>>>>
>>>
>
>




