[ensembl-dev] Find out where translation errors occur within transcript?
Matthew Laird
lairdm at ebi.ac.uk
Fri May 4 11:00:25 BST 2018
Hi Wojtek,
I went to speak with Kevin to better understand what you're trying to
accomplish. Based on our conversation, you'll want to look at the
pep2genomic() method in the Transcript object.
For the transcripts whose input data you are trying to validate, use the
translateable_seq() method to retrieve the coding sequence that would be
translated from that data. Then step through it codon by codon (or
translate it and look for '*' characters) to find the stop codons.
You can then take those positions in the protein and feed them into
pep2genomic(x, x) to get the genomic coordinates.
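The scanning step can be sketched in Python. This is not part of the Ensembl API, which is Perl. Treat it as a minimal sketch under some assumptions. The input is the string that translateable_seq() would return. Translation starts in phase 0 (a non-zero start phase may pad the sequence, which this sketch ignores). The transcript uses the standard genetic code. The name stop_positions is hypothetical. Each position p it reports is what you would then pass as pep2genomic(p, p).

```python
from typing import List

# Stop codons of the standard genetic code. Assumption: the transcript uses
# the standard table (mitochondrial transcripts, for example, differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(cds: str, include_terminal: bool = False) -> List[int]:
    """Return 1-based peptide positions whose codon is a stop codon.

    `cds` is assumed to be a spliced coding sequence starting at the first
    base of the first codon (phase 0). A trailing partial codon is ignored.
    The last complete codon, where a normal stop belongs, is skipped unless
    include_terminal is True.
    """
    n_codons = len(cds) // 3
    positions = []
    for k in range(n_codons):
        codon = cds[3 * k:3 * k + 3].upper()
        is_last = k == n_codons - 1
        if codon in STOP_CODONS and (include_terminal or not is_last):
            positions.append(k + 1)  # peptide positions are 1-based
    return positions


if __name__ == "__main__":
    # ATG TAA GGC TGA: premature stop at codon 2, normal stop at codon 4.
    print(stop_positions("ATGTAAGGCTGA"))        # [2]
    print(stop_positions("ATGTAAGGCTGA", True))  # [2, 4]
```

By default, the terminal codon is excluded. A stop there is expected, and the internal (premature) stops are usually the ones worth inspecting in a genome viewer.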
The Ensembl API documentation [1] details the return type for this call:
a list of Coordinate and Gap objects. Each Coordinate holds a mapped
genomic range, and each Gap marks part of the input that could not be mapped.
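A single peptide position can map to more than one genomic range, which is why the call returns a list. The simplified Python sketch below shows the idea. It is not Ensembl's TranscriptMapper. It assumes the forward strand, phase 0, and coding exon boundaries with UTRs already trimmed. The names pep_to_genomic_blocks and Block are hypothetical. A codon that straddles an intron comes back as two blocks. Positions past the end of the CDS come back empty, where the real API would report a Gap.

```python
from typing import List, Tuple

# A genomic block as an inclusive (start, end) pair.
Block = Tuple[int, int]


def pep_to_genomic_blocks(pep_pos: int, coding_exons: List[Block]) -> List[Block]:
    """Map one 1-based peptide position to genomic blocks (forward strand only).

    `coding_exons` are the coding parts of the exons in transcript order,
    each as an inclusive genomic (start, end) pair.
    """
    # CDS nucleotides covered by this codon (1-based, inclusive).
    cds_start = 3 * (pep_pos - 1) + 1
    cds_end = 3 * pep_pos
    blocks: List[Block] = []
    offset = 0  # CDS length contributed by earlier exons
    for g_start, g_end in coding_exons:
        length = g_end - g_start + 1
        # Overlap of the codon with this exon's CDS span [offset+1, offset+length].
        lo = max(cds_start, offset + 1)
        hi = min(cds_end, offset + length)
        if lo <= hi:
            blocks.append((g_start + (lo - offset - 1), g_start + (hi - offset - 1)))
        offset += length
    return blocks


if __name__ == "__main__":
    exons = [(100, 104), (200, 210)]
    # Codon 2 is CDS bases 4-6: two bases end exon 1, one base starts exon 2.
    print(pep_to_genomic_blocks(2, exons))  # [(103, 104), (200, 200)]
```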
If you have any further questions, or if we didn't properly understand
what you were trying to accomplish, let us know. Thanks.
[1] http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1TranscriptMapper.html#afb66982442190f4eaa711fdee89b0418
On 03/05/18 11:39, Wojtek Bażant wrote:
>
> Hi ensembl-dev,
>
> Sometimes the annotation I try to load has transcripts that don't
> translate to valid proteins. I then look at them in a genome viewer
> to get an idea of what's wrong, and it helps to know where to look.
>
> I tried to work with the values reported by the ProteinTranslation
> healthcheck log, until I realised they're nonsense. I think this code
> is wrong:
>
> https://github.com/Ensembl/ensj-healthcheck/blob/release/92/perl/Bio/EnsEMBL/Healthcheck/Translation.pm#L306
>
> It takes the protein sequence (in the peptide alphabet), finds the
> indexes of '*', adds them to the transcript start (a position in the
> DNA alphabet), and reports the results as the locations of stop codons.
>
> I currently have no good way of doing this. I have been translating
> the exons in all three phases, saving the results to a file, and then
> text-searching for the sequence around each '*'. Does Ensembl offer a
> better way that I couldn't find, or can you think of one?
>
> Thanks,
> Wojtek
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
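The arithmetic skipped by the healthcheck shortcut in the quoted message can be written out as follows. This assumes translation starts in phase 0. Let p be the 1-based peptide position of a '*' and c be the cDNA (transcript) position of the first coding base. Both symbols are introduced here for illustration. The stop codon then occupies these ranges:

```latex
\text{CDS}(p) = [\,3p-2,\; 3p\,], \qquad
\text{cDNA}(p) = [\,c + 3p - 3,\; c + 3p - 1\,]
```

For example, a '*' at peptide position 10 corresponds to CDS nucleotides 28 to 30, not to an offset of 10 from the transcript start. Getting from cDNA to genomic coordinates still requires the exon structure, which is what pep2genomic() handles.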