[ensembl-dev] protein coordinates of domains and exons

Leila Alieh alieh.leila at gmail.com
Tue May 19 15:23:00 BST 2015


Thank you very much!!!

Magali, your code has been really helpful. I just modified it to read the
list of transcript IDs from a text file. Here is the version I'm using, in
case you want to check it (it seems to be working fine) or someone else
needs it:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous',
  -port => '3306'
);

my $transcript_adaptor = $registry->get_adaptor('mouse', 'core', 'Transcript');

# Text file with one transcript stable ID per line
my $txinput = 'tx_test.txt';
open my $TX, '<', $txinput or die "Cannot open $txinput: $!";

while (my $line = <$TX>) {
  # Strip all whitespace, including the trailing newline
  $line =~ s/\s+//g;
  next unless length $line;

  my $transcript = $transcript_adaptor->fetch_by_stable_id($line);
  # fetch_by_stable_id returns undef for an unknown ID
  unless ($transcript) {
    warn "Transcript $line not found, skipping\n";
    next;
  }

  # Only get exons within the coding region
  my $exons = $transcript->get_all_translateable_Exons();
  foreach my $exon (@$exons) {
    # Print the genomic coordinates for each exon
    print "Transcript " . $transcript->stable_id . "\t" . "Exon " .
      $exon->stable_id . ":" . $exon->start . "-" . $exon->end . "\t";
    my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
      $exon->strand);
    foreach my $pep (@pep_coords) {
      # Print the protein coordinates for each exon
      print $pep->start() . "-" . $pep->end() . "\n";
    }
  }

  my $translation = $transcript->translation;
  # Non-coding transcripts have no translation
  if ($translation) {
    my $pfs = $translation->get_all_ProteinFeatures();
    foreach my $pf (@$pfs) {
      print "Transcript " . $transcript->stable_id . "\t" . "Domain " .
        $pf->hseqname . ":" . $pf->start . "-" . $pf->end . "\n";
    }
  }
}
close $TX;
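
The script prints exon and domain protein coordinates separately. The last step from my original question, overlapping the two sets to find exons that encode part of a domain, could look roughly like the sketch below. It is plain Perl and does not call the Ensembl API. The sub name find_overlaps, the hash layout (id, start, end) and the coordinates are all made up for illustration; in practice the values would come from genomic2pep and get_all_ProteinFeatures for the same transcript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every (exon, domain) pair whose protein coordinates overlap.
# Both arguments are array refs of hashes with keys id, start, end
# (1-based, inclusive protein coordinates). This record layout is an
# assumption for illustration, not an Ensembl data structure.
sub find_overlaps {
  my ($exons, $domains) = @_;
  my @hits;
  foreach my $exon (@$exons) {
    foreach my $domain (@$domains) {
      # Two closed intervals overlap when each starts before the other ends
      next unless $exon->{start} <= $domain->{end}
               && $domain->{start} <= $exon->{end};
      # The shared region runs from the larger start to the smaller end
      my $from = $exon->{start} > $domain->{start} ? $exon->{start} : $domain->{start};
      my $to   = $exon->{end}   < $domain->{end}   ? $exon->{end}   : $domain->{end};
      push @hits, { exon => $exon->{id}, domain => $domain->{id},
                    start => $from, end => $to };
    }
  }
  return @hits;
}

# Made-up coordinates, for illustration only
my @exons = (
  { id => 'exon_1', start => 1,  end => 40  },
  { id => 'exon_2', start => 40, end => 95  },
  { id => 'exon_3', start => 96, end => 150 },
);
my @domains = (
  { id => 'domain_A', start => 30, end => 80 },
);

foreach my $hit (find_overlaps(\@exons, \@domains)) {
  print join("\t", $hit->{exon}, $hit->{domain},
    $hit->{start} . "-" . $hit->{end}), "\n";
}
```

The nested loop compares every exon with every domain, which should be fine for the handful of exons and domains in a single transcript. The comparison only makes sense within one transcript, since each isoform has its own protein coordinates.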

Thanks again!

2015-05-18 16:01 GMT+02:00 mag <mr6 at ebi.ac.uk>:

>  Hi Leila,
>
> For a given transcript, you can access all its exons and its translation
> (when available) with related protein features.
>
> This snippet of code shows how you can display protein coordinates for all
> exons and protein domains for the related translation, starting from a
> given transcript:
>
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(
> -host => 'ensembldb.ensembl.org',
> -user => 'anonymous',
> -port => '3306'
> );
>
> my $transcript_adaptor = $registry->get_adaptor('human', 'core',
> 'Transcript');
> my $stable_id = 'ENST00000380152';
> my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
>
> # Only get exons within the coding region
> my $exons = $transcript->get_all_translateable_Exons();
> foreach my $exon (@$exons) {
>   # Print the genomic coordinates for each exon
>   print "Exon " . $exon->stable_id . ":" . $exon->start . "-" .
> $exon->end. "\t";
>   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
> $exon->strand);
>   foreach my $pep (@pep_coords) {
>     # Print the protein coordinates for each exon
>     print $pep->start() . "-" . $pep->end() . "\n";
>   }
> }
>
> my $translation = $transcript->translation;
> # Check if there is a translation
> if ($translation) {
>   my $pfs = $translation->get_all_ProteinFeatures();
>   # Display all protein features
>   foreach my $pf (@$pfs) {
>     print $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
>   }
> }
>
>
> If you only have exon coordinates to start with, you will need to create a
> slice for each set of coordinates, then retrieve transcripts overlapping
> that slice and use the process described above.
>
> my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
> my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome,
> $exon_start, $exon_end);
> my $transcripts = $slice->get_all_Transcripts();
>
>
> Hope that helps,
> Magali
>
>
> On 16/05/2015 00:32, Leila Alieh wrote:
>
>    Hi all!
>
>  I have a list of genomic coordinates of exons and I want to transform
> them into protein coordinates of the different protein isoforms these exons
> belong to. Moreover, I want to find the protein coordinates of the domains
> of these proteins, and then overlap the two sets of information to find
> exons which encode protein domains. From what I read, the (only?) way to do
> so is to use the Perl API of Ensembl, and in particular I should use
> TranscriptMapper and ProteinFeature, right? I read the tutorial and
> the documentation, but I still find it very difficult to understand the API
> and I don't know how to write the code so that the query is restricted
> to my list of exons/proteins. Could you please show me some examples? In
> particular I'd like to know what Greg did to find the protein coordinates
> of the protein domains (
> http://lists.ensembl.org/pipermail/dev/2015-April/011013.html).
>
>  Thank you in advance, and I apologize if I made a mistake in the thread;
> it's the first time that I'm using the Ensembl mailing list.
>
>  P.S. Please, please, please, make the protein coordinates accessible in
> Ensembl gene mart as soon as possible; it would save a lot of work and time.
>
>  Thanks again!
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>