[ensembl-dev] protein coordinates of domains and exons

Mon May 18 15:01:27 BST 2015

Hi Leila,

For a given transcript, you can access all of its exons and, when it has
one, its translation together with the related protein features.

This snippet shows how to print the protein coordinates of every coding
exon, and of every protein feature (domains included) on the related
translation, starting from a given transcript:

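use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# load_registry_from_db is a class method and does not return the registry,
# so keep the class name itself in $registry
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org',
-user => 'anonymous',
-port => '3306'
);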

my $transcript_adaptor = $registry->get_adaptor('human', 'core', 'Transcript');
my $stable_id = 'ENST00000380152';
my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);

# Only get exons within the coding region (UTR portions are trimmed off)
my $exons = $transcript->get_all_translateable_Exons();
foreach my $exon (@$exons) {
   # Print the genomic coordinates for each exon
   print "Exon " . $exon->stable_id . ":" . $exon->start . "-" . $exon->end . "\t";
   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end, $exon->strand);
   # genomic2pep can also return Gap objects for non-coding positions; skip them
   my @mapped = grep { $_->isa('Bio::EnsEMBL::Mapper::Coordinate') } @pep_coords;
   # Print the protein coordinates for each exon, one exon per line
   print join(",", map { $_->start . "-" . $_->end } @mapped) . "\n";
}

my $translation = $transcript->translation;
# Check if there is a translation
if ($translation) {
   my $pfs = $translation->get_all_ProteinFeatures();
   # Display all protein features
   foreach my $pf (@$pfs) {
     print $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
   }
}
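
Since the exon ranges and the protein feature ranges printed above are
both in protein coordinates, the overlap you asked about comes down to
comparing intervals. Below is only a minimal sketch of that step: the
helper name ranges_overlap, the hash layout and all the numbers and IDs
are made up for illustration and are not Ensembl output. In practice you
would fill the two lists from genomic2pep and get_all_ProteinFeatures.

```perl
use strict;
use warnings;

# Two closed intervals [s1,e1] and [s2,e2] overlap when each one
# starts no later than the other one ends.
sub ranges_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return $s1 <= $e2 && $s2 <= $e1;
}

# Hypothetical protein-coordinate ranges (illustrative values only).
# Neighbouring exons can share a residue when a codon spans the junction.
my @exons = (
    { id => 'exonA', start => 1,   end => 23 },
    { id => 'exonB', start => 23,  end => 106 },
    { id => 'exonC', start => 107, end => 150 },
);
my @domains = (
    { id => 'domainX', start => 30,  end => 110 },
    { id => 'domainY', start => 200, end => 260 },
);

# For each exon, collect the ids of the domains it overlaps.
my %domains_for_exon;
foreach my $exon (@exons) {
    my @hits = grep {
        ranges_overlap($exon->{start}, $exon->{end}, $_->{start}, $_->{end})
    } @domains;
    $domains_for_exon{ $exon->{id} } = [ map { $_->{id} } @hits ];
}

# One line per exon: exon id, then the overlapping domain ids (or "-").
foreach my $exon (@exons) {
    my @hits = @{ $domains_for_exon{ $exon->{id} } };
    print $exon->{id} . "\t" . (@hits ? join(",", @hits) : "-") . "\n";
}
```

Comparing every exon against every domain is fine for a single
translation, where both lists are short.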

If you only have exon coordinates to start with, you will need to create 
a slice for each set of coordinates, then retrieve transcripts 
overlapping that slice and use the process described above.

my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
# $chromosome, $exon_start and $exon_end come from your own list of exons
my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome, $exon_start, $exon_end);
my $transcripts = $slice->get_all_Transcripts();
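
Putting the two steps together for one exon from your list might look
like the sketch below. The sub name pep_ranges_for_region and the layout
of the returned hash are hypothetical; the API calls are the ones already
used in this mail. Features fetched from a slice carry slice-relative
start and end values, so the sketch re-fetches each transcript by stable
ID in order to work in chromosome coordinates throughout.

```perl
use strict;
use warnings;

# Hypothetical helper: map one genomic exon region to protein coordinates
# in every overlapping transcript that has a translation.
# The two adaptors are the ones obtained from the registry.
sub pep_ranges_for_region {
    my ($slice_adaptor, $transcript_adaptor,
        $chromosome, $start, $end, $strand) = @_;

    my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome,
                                                $start, $end);
    return {} unless $slice;

    my %ranges;
    foreach my $on_slice (@{ $slice->get_all_Transcripts() }) {
        # Re-fetch by stable ID to get chromosome-based coordinates
        my $transcript =
            $transcript_adaptor->fetch_by_stable_id($on_slice->stable_id);
        next unless $transcript && $transcript->translation;

        # Keep mapped regions only; Gap objects mark non-coding positions
        my @mapped = grep { $_->isa('Bio::EnsEMBL::Mapper::Coordinate') }
                     $transcript->genomic2pep($start, $end, $strand);
        next unless @mapped;    # region is not coding in this transcript

        # transcript stable ID => list of [protein_start, protein_end]
        $ranges{ $transcript->stable_id } =
            [ map { [ $_->start, $_->end ] } @mapped ];
    }
    return \%ranges;
}
```

Called as pep_ranges_for_region($slice_adaptor, $transcript_adaptor,
$chromosome, $exon_start, $exon_end, $exon_strand), it returns a hash
reference keyed by transcript stable ID, so an exon shared by several
isoforms gets one entry per isoform.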

Hope that helps,
Magali

On 16/05/2015 00:32, Leila Alieh wrote:
> Hi all!
>
> I have a list of genomic coordinates of exons and I want to convert
> them into protein coordinates for the different protein isoforms these
> exons belong to. I also want to find the protein coordinates of
> the domains of these proteins, and then overlap the two sets of
> information to find exons that encode protein domains. From what I
> have read, the (only?) way to do so is to use the Ensembl Perl API, and
> in particular I should use TranscriptMapper and ProteinFeature,
> right? I read the tutorial and the documentation, but I still find
> it very difficult to understand the API and I don't know how to write
> the code so that the query is restricted to my list of
> exons/proteins. Could you please show me some examples? In particular,
> I'd like to know what Greg did to find the protein coordinates of the
> protein domains
> (http://lists.ensembl.org/pipermail/dev/2015-April/011013.html).
>
> Thank you in advance, and I apologize if I made a mistake in the
> thread; it's the first time I'm using the Ensembl mailing list.
>
> P.S. Please, please, please make the protein coordinates accessible
> in the Ensembl gene mart as soon as possible; it would save a lot of
> work and time.
>
> Thanks again!
