[ensembl-dev] translateable_seq returning sequences that don't appear to be translateable

Michael Schuster michaels at ebi.ac.uk
Tue Jan 25 11:13:19 GMT 2011


Hi Jeff,

Just to clarify this a bit more. The transcripts you see come mainly from the manual genome annotation effort by the Havana group at the Wellcome Trust Sanger Institute. Because these transcripts are manually annotated and hand-checked, curators extend each transcript as far as the supporting evidence from cDNAs and ESTs allows. As a result, the end of a transcript, and thus of its translation, may not fall on a codon boundary. Truncating the translation back to a codon boundary would imply that the remaining bases are UTR, which is not true. Rather, these cases indicate that the translation must extend further biologically, but there is currently no evidence to support its complete annotation.

The Ensembl API handles these cases correctly: when you retrieve the translated sequence, you will find an X at either the beginning or the end of the peptide for those transcripts. An X in the translated sequence therefore tells you that the sequence is incomplete and that biological evidence for the rest of it is missing.
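
To make the X behaviour concrete, here is a minimal standalone sketch in Python. It is not the Ensembl API; it only assumes the standard genetic code and the convention that any codon which is incomplete or contains a non-ACGT base (such as N) translates to X. The helper names `translate` and `is_strictly_translatable` are hypothetical, and the second one encodes the "strict" notion of translateable from the original question (whole number of codons, no N).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate a nucleotide string codon by codon.

    Assumption for this sketch: a trailing partial codon, or any codon
    containing a base other than A/C/G/T (e.g. N), becomes 'X'.
    """
    nt = nt.upper()
    peptide = []
    for i in range(0, len(nt), 3):
        codon = nt[i:i + 3]
        peptide.append(CODON_TABLE.get(codon, "X"))
    return "".join(peptide)


def is_strictly_translatable(nt: str) -> bool:
    """True only for a whole number of codons made of A/C/G/T."""
    return len(nt) % 3 == 0 and set(nt.upper()) <= set("ACGT")
```

For example, `translate("ATGGCCTA")` gives `"MAX"` (the dangling `TA` becomes X) and `translate("NNGATGGCC")` gives `"XMA"`, so checking for `"X"` in the result flags both kinds of incomplete end.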

The automated Ensembl genome annotation pipeline handles these cases differently: all transcripts and their translations are truncated back to a codon boundary, however short the overhang.
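
As a rough illustration of what "truncated back to a codon boundary" means for a coding sequence, the Python sketch below drops an incomplete leading codon and any trailing bases that do not form a full codon. It is not the pipeline's actual code; the function name and the `leading_overhang` parameter (how many 5' bases belong to an incomplete codon) are hypothetical and assume the caller already knows that value.

```python
def trim_to_codon_boundary(cds: str, leading_overhang: int = 0) -> str:
    """Trim a coding sequence so it contains only whole codons.

    leading_overhang: hypothetical count (0-2) of bases at the 5' end that
    belong to an incomplete codon; they are removed first, then any
    trailing bases beyond the last complete codon are removed.
    """
    if not 0 <= leading_overhang <= 2:
        raise ValueError("leading_overhang must be 0, 1 or 2")
    core = cds[leading_overhang:]
    trailing = len(core) % 3
    return core[:len(core) - trailing]
```

Here `trim_to_codon_boundary("ATGGCCTA")` returns `"ATGGCC"`. This is exactly the trade-off described above: the result always translates cleanly, but the trimmed bases are no longer represented in the translation.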

Sequence edits are a separate issue entirely, and the Ensembl API supports them at both levels: on Transcripts and on Translations. Ensembl derives transcripts strictly from the genome sequence. A transcript is simply the concatenation of all its exon sequences, and a translation is the translated part of its coding region. Sequence edits therefore let us override the genome sequence locally.
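
The idea of building a transcript from exon sequences and then overriding it locally can be sketched in Python as follows. This is a simplified model, not the Ensembl API: it assumes forward-strand exons given as 1-based inclusive genomic coordinates, and a hypothetical `SeqEdit` record holding 1-based inclusive transcript coordinates plus replacement bases (with `end == start - 1` meaning a pure insertion). Overlapping edits are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqEdit:
    """Hypothetical edit record, not Ensembl's class."""
    start: int    # 1-based, inclusive, transcript coordinates
    end: int      # 1-based, inclusive; end == start - 1 is an insertion
    alt_seq: str  # bases that replace positions start..end


def transcript_seq(genome: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate forward-strand exon sequences (1-based, inclusive)."""
    return "".join(genome[s - 1:e] for s, e in exons)


def apply_edits(seq: str, edits: list[SeqEdit]) -> str:
    """Apply non-overlapping edits right to left so earlier coordinates stay valid."""
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        seq = seq[:edit.start - 1] + edit.alt_seq + seq[edit.end:]
    return seq
```

For example, with `genome = "GGATGTGACCCCGCCTT"` and exons `[(3, 8), (13, 15)]`, the transcript is `"ATGTGAGCC"`, which contains an in-frame `TGA` stop. Applying `SeqEdit(6, 6, "G")` yields `"ATGTGGGCC"`, the kind of local patch described in the next paragraph for polymorphic pseudogenes, while the genome string itself is unchanged.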

As far as I am aware, we are not using RNA edits at this stage. We could, however, use them for polymorphic pseudogenes, where the reference genome carries either a missense mutation or a stop codon while other populations clearly have a functional gene. With such a sequence edit we could patch the mRNA resulting from the reference genome into a functional molecule.

An example of a sequence edit at the translation level is selenocysteine, where the stop codon symbol (*) is replaced by the selenocysteine symbol (U).
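
A translation-level edit of this kind only touches the peptide string; the underlying nucleotide sequence still contains the TGA codon. The Python sketch below shows the idea under the assumption that the selenocysteine positions (1-based, in the peptide) are already known. The function name is hypothetical and this is not how the Ensembl API stores or applies such edits.

```python
def apply_selenocysteine_edits(peptide: str, positions: list[int]) -> str:
    """Replace '*' with 'U' at the given 1-based peptide positions.

    Raises if a position is out of range or is not a stop symbol, so an
    edit cannot silently change an ordinary amino acid.
    """
    residues = list(peptide)
    for pos in positions:
        if not 1 <= pos <= len(residues):
            raise IndexError(f"position {pos} outside peptide of length {len(residues)}")
        if residues[pos - 1] != "*":
            raise ValueError(f"position {pos} is {residues[pos - 1]!r}, not '*'")
        residues[pos - 1] = "U"
    return "".join(residues)
```

For instance, `apply_selenocysteine_edits("MA*GK", [3])` returns `"MAUGK"`, while the coding sequence that produced `"MA*GK"` is left as it was.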

I hope this clarifies your observations.

Best regards,
Michael


> Hello. I recently asked a question about the EnsEMBL Perl API on the biostar stackexchange site - http://biostar.stackexchange.com/questions/5044/ensembl-perl-api-translateable-seq-returns-sequences-that-arent-multiples-of-3-n. I have some questions about Giulietta's response to my question, and this list seemed a more appropriate place to continue discussion than in comments on biostar.
> 
> 1) Could anyone elaborate on Giulietta's point involving "all defined RNA edits" and selenocysteine? My (very limited) understanding of selenocysteine incorporation is that in eukaryotes, nothing in the mRNA in the immediate vicinity of a UGA codon is changed by the fact that the UGA will eventually be translated into selenocysteine. The database would need to know about this in order to return the correct amino acid sequence for a transcript, but translateable_seq doesn't return an amino acid sequence. It returns a nucleotide sequence.
> 
> 2) The focus on ENSMUSG00000064363 in the biostar thread is unfortunate. I was pressed for a specific example and chose one randomly. I am more concerned with the issue of whether I have realistic expectations for the translateable_seq method. A sequence that isn't a whole number of codons long or that contains an 'N' character doesn't seem translateable in a strict sense of the word. Is it consistent with the design intent of the method for these sequences to be returned by it?
> 
>  - Jeff
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev

--
Michael Schuster
Ensembl Genome Browser
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridgeshire CB10 1SD
United Kingdom

URL: http://www.ensembl.org/





