[ensembl-dev] translateable_seq returning sequences that don't appear to be translateable

Mon Jan 24 17:14:08 GMT 2011

Hello. I recently asked a question about the EnsEMBL Perl API on the biostar
stackexchange site -
http://biostar.stackexchange.com/questions/5044/ensembl-perl-api-translateable-seq-returns-sequences-that-arent-multiples-of-3-n.
I have some questions about Giulietta's response to my question, and this
list seemed a more appropriate place to continue discussion than in comments
on biostar.

1) Could anyone elaborate on Giulietta's point involving "all defined RNA
edits" and selenocysteine? My (very limited) understanding of selenocysteine
incorporation is that in eukaryotes, nothing in the mRNA in the immediate
vicinity of a UGA codon is changed by the fact that the UGA will eventually
be translated into selenocysteine. The database would need to know about
this in order to return the correct amino acid sequence for a transcript,
but translateable_seq doesn't return an amino acid sequence. It returns a
nucleotide sequence.

2) The focus on ENSMUSG00000064363 in the biostar thread is unfortunate. I
was pressed for a specific example and chose one randomly. I am more
concerned with the issue of whether I have realistic expectations for the
translateable_seq method. A sequence that isn't a whole number of codons
long or that contains an 'N' character doesn't seem translateable in a
strict sense of the word. Is it consistent with the design intent for the
method for these sequences to be returned by it?
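
To make that expectation concrete, here is a minimal sketch of the check I
have in mind. It is written in Python only for brevity and operates on a
plain nucleotide string, such as one already fetched via translateable_seq.
The function name and the example strings are hypothetical, not real Ensembl
output, and the sketch assumes that only unambiguous A/C/G/T bases count as
translatable.

```python
def strictly_translatable(seq: str) -> bool:
    """Return True if seq is a non-empty whole number of codons
    made up only of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    # Length must be a positive multiple of 3 (whole codons only).
    whole_codons = len(seq) > 0 and len(seq) % 3 == 0
    # Any character outside ACGT (for example 'N') fails the check.
    unambiguous = set(seq) <= set("ACGT")
    return whole_codons and unambiguous


# Hypothetical example strings, not taken from the database.
for example in ("ATGGCCTGA", "ATGGCCTG", "ATGNCCTGA"):
    print(example, strictly_translatable(example))
```

The second string fails because it is eight bases long, and the third fails
because of the 'N'. These are the two cases I am asking about.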

 - Jeff