[ensembl-dev] Find out where translation errors occur within transcript?

Matthew Laird lairdm at ebi.ac.uk
Fri May 4 11:00:25 BST 2018


Hi Wojtek,

I went to speak with Kevin to better understand what you're trying to 
accomplish. Based on our conversation, you'll want to look at the 
pep2genomic() method on the Transcript object.

For the transcripts where you're trying to check whether the data you 
used to load an Ensembl db is correct, use the translateable_seq() method 
to retrieve the coding sequence that would be translated from the input 
data. Translate it and step through the resulting peptide looking for 
stop codon characters ('*'). You can then feed each of those protein 
positions into pep2genomic($pos, $pos) to find the genomic coordinates.
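If you want to cross-check the positions outside the API, the stop-codon 
search can also be done directly on the coding nucleotide sequence. Below 
is a minimal Python sketch, assuming the standard genetic code and a CDS 
string that starts at the first base of the first codon (roughly what 
translateable_seq() returns). The function name is hypothetical and not 
part of the Ensembl API.

```python
# Stop codons of the standard genetic code (assumption: other codon
# tables, e.g. mitochondrial ones, use a different set).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> list:
    """Return 1-based protein positions of in-frame stop codons in a CDS.

    Hypothetical helper: `cds` is assumed to begin at the first base of
    the first codon. The final codon is excluded if it is a stop, since a
    correctly annotated CDS is expected to end with one.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3  # any trailing partial codon is ignored
    positions = []
    for i in range(n_codons):
        codon = seq[3 * i : 3 * i + 3]
        if codon in STOP_CODONS:
            positions.append(i + 1)  # protein positions are 1-based
    # Drop the terminal stop codon, which is expected.
    if positions and positions[-1] == n_codons:
        positions.pop()
    return positions
```

For example, internal_stop_positions("ATGAAATAGCCCTAA") returns [3]: the 
TAG at codon 3 is an internal stop, while the closing TAA is treated as 
the normal terminal stop. Position 3 is the value you would pass to 
pep2genomic(3, 3).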

The Ensembl API documentation [1] details the return type for this call: 
a list of Bio::EnsEMBL::Mapper::Coordinate and Bio::EnsEMBL::Mapper::Gap 
objects, where the Coordinate objects hold the genomic coordinates.
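To see why the call returns a series of objects rather than a single 
range, here is a minimal Python sketch of the underlying arithmetic for a 
forward-strand transcript. It is not the Ensembl implementation: the 
function and argument names are hypothetical, and reverse-strand 
transcripts, phase padding and Gap handling are deliberately left out.

```python
def pep_to_genomic(pep_pos, exons, coding_start):
    """Map a 1-based protein position to genomic (start, end) segments.

    Hypothetical helper, forward-strand transcripts only:
      exons        -- list of (start, end) genomic pairs, 1-based and
                      inclusive, in transcript order
      coding_start -- 1-based cDNA position of the first coding base
    Returns one segment per exon the codon overlaps, so a codon split
    by an intron yields two segments.
    """
    # Protein position -> cDNA range of its codon (3 bases per residue).
    cdna_start = coding_start + 3 * (pep_pos - 1)
    cdna_end = cdna_start + 2

    segments = []
    offset = 0  # cDNA bases contained in the exons already walked
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start + 1
        # cDNA range covered by this exon
        ex_cdna_start = offset + 1
        ex_cdna_end = offset + ex_len
        lo = max(cdna_start, ex_cdna_start)
        hi = min(cdna_end, ex_cdna_end)
        if lo <= hi:
            # Translate the overlapping cDNA range back to genomic space.
            segments.append((ex_start + lo - ex_cdna_start,
                             ex_start + hi - ex_cdna_start))
        offset += ex_len
    return segments
```

With exons [(100, 109), (200, 220)] and coding_start=1, protein position 
4 maps to [(109, 109), (200, 201)] because its codon straddles the 
intron, whereas simply adding the peptide index to the transcript start 
would give roughly 103, nowhere near the codon.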

If you have any further questions, or if we didn't properly understand 
what you were trying to accomplish, let us know. Thanks.

[1] http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1TranscriptMapper.html#afb66982442190f4eaa711fdee89b0418

On 03/05/18 11:39, Wojtek Bażant wrote:
>
> Hi ensembl-dev,
>
> Sometimes the annotation I try to load up has transcripts that don't 
> translate to valid proteins. I then go to look at them in a genome 
> viewer to get an idea what's wrong, and it's helpful to know where to 
> look.
>
> I've tried to work with the values reported to me by the 
> ProteinTranslation healthcheck log, until I realised they're nonsense. 
> I think this code is wrong:
>
> https://github.com/Ensembl/ensj-healthcheck/blob/release/92/perl/Bio/EnsEMBL/Healthcheck/Translation.pm#L306
>
> It takes the protein sequence (in the peptide alphabet), looks for the 
> indices of '*', adds these to the transcript start (a position in the 
> DNA alphabet), and reports the results as the locations of stop codons.
>
> I currently have no good way of doing this. I have been translating 
> the exons in all three phases, saving them to a file, and then text 
> searching for bits of the sequence around the *. Does Ensembl offer a 
> better way that I couldn't find, or can you think of one?
>
> Thanks,
> Wojtek
