[ensembl-dev] Find out where translation errors occur within transcript?

Thu May 3 11:39:00 BST 2018

Hi ensembl-dev, 

Sometimes the annotation I try to load has transcripts that don't
translate to valid proteins. I then look at them in a genome viewer to
get an idea of what's wrong, and it helps to know where in the
transcript to look.

I tried to work with the positions reported in the ProteinTranslation
healthcheck log, until I realised they are meaningless. I think this
code is wrong:

https://github.com/Ensembl/ensj-healthcheck/blob/release/92/perl/Bio/EnsEMBL/Healthcheck/Translation.pm#L306

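It takes the protein sequence (in the peptide alphabet), looks for the
indexes of '*', adds these to the transcript start coordinate (which is
in the DNA alphabet), and reports the results as the locations of stop
codons. A peptide index counts residues, not bases, so the reported
positions do not point at the stop codons.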
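
To illustrate the mismatch, here is a minimal sketch in plain Python of
the conversion that seems to be needed. It does not use the Ensembl API;
the function names, the exon coordinates and the coding start are all
made up for the example. It assumes 1-based inclusive genomic
coordinates, exons listed in transcript (5' to 3') order, and a peptide
translated in frame from the given coding start.

```python
def transcript_positions(exons, strand):
    """List the genomic coordinate of every transcript base, 5' to 3'.

    exons  -- (start, end) pairs, 1-based inclusive, in transcript order
    strand -- 1 for the forward strand, -1 for the reverse strand
    """
    positions = []
    for start, end in exons:
        if strand == 1:
            positions.extend(range(start, end + 1))
        else:
            # On the reverse strand the transcript runs from end to start.
            positions.extend(range(end, start - 1, -1))
    return positions


def stop_codon_positions(peptide, exons, strand, coding_start):
    """Return the genomic coordinates of the three bases of each '*'.

    coding_start is the genomic coordinate of the first coding base
    (hypothetical input; it must lie inside one of the exons).
    """
    positions = transcript_positions(exons, strand)
    # Offset of the first coding base within the spliced transcript.
    offset = positions.index(coding_start)
    stops = []
    for i, residue in enumerate(peptide):
        if residue == "*":
            first = offset + 3 * i  # one residue spans three bases
            stops.append(tuple(positions[first:first + 3]))
    return stops


# Made-up transcript: two exons on the forward strand, coding from 103.
exons = [(101, 110), (201, 210)]
print(stop_codon_positions("MA*K", exons, 1, 103))
# [(109, 110, 201)]
```

In this made-up case the '*' is at peptide index 2, so adding the index
to the transcript start gives 101 + 2 = 103, which is the first base of
the start codon. The stop codon actually covers 109, 110 and 201,
because each residue spans three bases, the 5' UTR shifts the frame
origin, and the codon straddles the intron.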

I currently have no good way of locating the stop codons. I have been
translating the exons in all three phases, saving the translations to a
file, and then text searching for bits of the sequence around the '*'.
Does Ensembl offer a better way that I couldn't find, or can you think
of one?

Thanks,
Wojtek