[ensembl-dev] protein coordinates of domains and exons
Leila Alieh
alieh.leila at gmail.com
Tue May 19 15:23:00 BST 2015
Thank you very much!!!
Magali, your code has been really helpful. I just modified it to read the
list of transcript IDs from a text file. Here is the version I'm using, in
case you want to check it (it seems to be working fine) or someone else
needs it:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
    -port => '3306'
);
my $transcript_adaptor = $registry->get_adaptor('mouse', 'core', 'Transcript');

# Text file with one transcript stable ID per line
my $txinput = 'tx_test.txt';
open my $TX, '<', $txinput or die "Cannot open $txinput: $!";
my @data = <$TX>;
close $TX;

foreach my $line (@data) {
    # Strip all whitespace (spaces, tabs, line endings), then skip blank lines
    $line =~ s/\s+//g;
    next unless length $line;

    my $transcript = $transcript_adaptor->fetch_by_stable_id($line);
    unless ($transcript) {
        warn "No transcript found for '$line'\n";
        next;
    }

    # Protein coordinates of each coding exon
    my $exons = $transcript->get_all_translateable_Exons();
    foreach my $exon (@$exons) {
        print "Transcript " . $transcript->stable_id . "\t" . "Exon " .
            $exon->stable_id . ":" . $exon->start . "-" . $exon->end . "\t";
        my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
            $exon->strand);
        foreach my $pep (@pep_coords) {
            print $pep->start() . "-" . $pep->end() . "\n";
        }
    }

    # Protein coordinates of each domain, if the transcript has a translation
    my $translation = $transcript->translation;
    if ($translation) {
        my $pfs = $translation->get_all_ProteinFeatures();
        foreach my $pf (@$pfs) {
            print "Transcript " . $transcript->stable_id . "\t" . "Domain " .
                $pf->hseqname . ":" . $pf->start . "-" . $pf->end . "\n";
        }
    }
}
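The script above prints the two sets of protein coordinates but does not overlap them. A minimal sketch of that last step, assuming the exon and domain ranges have already been collected into lists of hashes. The hash layout, the sub names ranges_overlap and exons_encoding_domains, and the coordinates are all made up for illustration and are not part of the Ensembl API:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two closed intervals [s1, e1] and [s2, e2] overlap when each one
# starts at or before the position where the other one ends.
sub ranges_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

# For every exon, collect the names of the domains whose protein range
# it overlaps. Both arguments are array references of hashes with
# name, start and end keys (a hypothetical layout for this sketch).
sub exons_encoding_domains {
    my ($exons, $domains) = @_;
    my %hits;
    foreach my $exon (@$exons) {
        foreach my $domain (@$domains) {
            next unless ranges_overlap($exon->{start}, $exon->{end},
                                       $domain->{start}, $domain->{end});
            push @{ $hits{ $exon->{name} } }, $domain->{name};
        }
    }
    return \%hits;
}

# Made-up protein coordinates, for illustration only
my @exons = (
    { name => 'exon_1', start => 1,  end => 40  },
    { name => 'exon_2', start => 40, end => 95  },
    { name => 'exon_3', start => 96, end => 150 },
);
my @domains = (
    { name => 'domain_A', start => 30,  end => 60  },
    { name => 'domain_B', start => 120, end => 140 },
);

my $hits = exons_encoding_domains(\@exons, \@domains);
foreach my $exon (@exons) {
    my $names = $hits->{ $exon->{name} } || [];
    printf "%s\t%s\n", $exon->{name}, @$names ? join(',', @$names) : '-';
}
```

In a real run the exon ranges would come from the genomic2pep results and the domain ranges from the ProteinFeature start and end values, kept per transcript so that only coordinates from the same protein are compared.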
Thanks again!
2015-05-18 16:01 GMT+02:00 mag <mr6 at ebi.ac.uk>:
> Hi Leila,
>
> For a given transcript, you can access all its exons and its translation
> (when available) with related protein features.
>
> This snippet of code shows how you can display protein coordinates for all
> exons and protein domains for the related translation, starting from a
> given transcript:
>
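> use Bio::EnsEMBL::Registry;
>
> # load_registry_from_db does not return the registry itself,
> # so keep the class name and call the method on it
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(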
> -host => 'ensembldb.ensembl.org',
> -user => 'anonymous',
> -port => '3306'
> );
>
> my $transcript_adaptor = $registry->get_adaptor('human', 'core',
> 'Transcript');
> my $stable_id = 'ENST00000380152';
> my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
>
> # Only get exons within the coding region
> my $exons = $transcript->get_all_translateable_Exons();
> foreach my $exon (@$exons) {
> # Print the genomic coordinates for each exon
> print "Exon " . $exon->stable_id . ":" . $exon->start . "-" .
> $exon->end. "\t";
> my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
> $exon->strand);
> foreach my $pep (@pep_coords) {
> # Print the protein coordinates for each exon
> print $pep->start() . "-" . $pep->end() . "\n";
> }
> }
>
> my $translation = $transcript->translation;
> # Check if there is a translation
> if ($translation) {
> my $pfs = $translation->get_all_ProteinFeatures();
> # Display all protein features
> foreach my $pf (@$pfs) {
> print $pf->hseqname . ":" . $pf->start . "-" . $pf->end . "\n";
> }
> }
>
>
> If you only have exon coordinates to start with, you will need to create a
> slice for each set of coordinates, then retrieve transcripts overlapping
> that slice and use the process described above.
>
> my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
> my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome,
> $exon_start, $exon_end);
> my $transcripts = $slice->get_all_Transcripts();
>
>
> Hope that helps,
> Magali
>
>
> On 16/05/2015 00:32, Leila Alieh wrote:
>
> Hi all!
>
> I have a list of genomic coordinates of exons and I want to transform
> them into protein coordinates of the different protein isoforms these exons
> belong to. Moreover, I want to find the protein coordinates of the domains
> of these proteins, and then overlap the two sets of information to find exons
> which encode protein domains. From what I have read, the (only?) way to do so
> is to use the Ensembl Perl API, and in particular I should use
> TranscriptMapper and ProteinFeature, right? I read the tutorial and
> the documentation but I still find it very difficult to understand the API
> and I don't know how to write the code so that the query is restricted
> to my list of exons/proteins. Could you please show me some examples? In
> particular I'd like to know what Greg did to find the protein coordinates
> of the protein domains (
> http://lists.ensembl.org/pipermail/dev/2015-April/011013.html).
>
> Thank you in advance, and I apologize if I made a mistake in the thread;
> it's the first time that I'm using the Ensembl mailing list.
>
> P.S. Please, please, please, make the protein coordinates accessible in
> Ensembl gene mart as soon as possible, it would save a lot of work/time.
>
> Thanks again!
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>