[ensembl-dev] protein coordinates of domains and exons

Leila Alieh alieh.leila at gmail.com
Tue Jun 2 11:02:16 BST 2015


Hi!

I'm having a stupid problem with my input file. I made a list of
transcript IDs in a csv file, opened it, copied the list and pasted it into a
text file (UTF-8 plain text), one transcript ID per line, without any
commas or quotes. The code gives me this error:

Can't call method "get_all_translateable_Exons" on an undefined value at
./transdom_RR.pl line 53, <$TX> line 1.

The same code runs fine on an earlier text file of transcript IDs that I
used as a trial. If I copy and paste one of those transcript IDs from the "old"
file into the "new" one, the code runs until it reaches the first "new"
transcript ID. The trial transcript IDs are also present in my new csv
list, and if I copy and paste them from the csv into a new text file the code
doesn't work. So I think the problem is somehow in the format of the
transcript IDs in the Excel file. I tried converting the csv file to xlsx
and changing the cell format to General and to Text, but it didn't work.
Do you have any suggestions? How should I prepare the txt input file for
the Perl code?
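
A note on one possible cause, offered as an assumption rather than a
diagnosis: the error means that $transcript is undefined, which is what
fetch_by_stable_id returns when it finds no transcript for the string it is
given. The script quoted below strips only spaces, tabs and "\n", so a
carriage return ("\r") or a UTF-8 byte order mark left behind by the
spreadsheet export would stay attached to the ID. A minimal sketch of a
more defensive way to read the list follows. The names parse_ids and
show_hidden, and the example IDs, are hypothetical and not part of the
Ensembl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split raw file content into clean IDs, whatever the line-ending style.
sub parse_ids {
    my ($content) = @_;
    # Drop a leading byte order mark, seen either as raw bytes or as a character.
    $content =~ s/^(?:\xEF\xBB\xBF|\x{FEFF})//;
    my @ids;
    for my $line (split /\r\n|\r|\n/, $content) {
        $line =~ s/\s+//g;                 # remove spaces, tabs, stray \r
        push @ids, $line if length $line;  # skip blank lines
    }
    return @ids;
}

# Make invisible characters visible so a bad line can be inspected.
sub show_hidden {
    my ($text) = @_;
    (my $shown = $text) =~ s/([^\x21-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $shown;
}

# Demonstration on an in-memory string (made-up IDs, Windows line endings).
my $raw = "\xEF\xBB\xBF" . "ENSMUST00000000001\r\nENSMUST00000000002\r\n";
print "raw:   ", show_hidden($raw), "\n";
print "clean: $_\n" for parse_ids($raw);

# With a real file, the content could be slurped first, for example:
#   open my $TX, '<', 'tx_test.txt' or die $!;
#   my @ids = parse_ids(do { local $/; <$TX> });
```

Checking "defined $transcript" after each fetch_by_stable_id call, and
printing show_hidden($line) when it is not defined, would show whether
hidden characters are actually the problem before changing anything else.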

thanks!


2015-05-19 16:23 GMT+02:00 Leila Alieh <alieh.leila at gmail.com>:

> Thank you very much!!!
>
> Magali, your code has been really helpful. I just modified it to read the
> list of transcript IDs from a text file. Here is the version I'm using, in
> case you want to check it (it seems to be working fine) or someone else
> needs it:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry' ;
>
> $registry->load_registry_from_db(
> -host => 'ensembldb.ensembl.org' ,
> -user => 'anonymous' ,
> -port => '3306'
> );
>
> my $transcript_adaptor = $registry->get_adaptor( 'mouse', 'core',
> 'Transcript');
> my $txinput= 'tx_test.txt' ;
> open my $TX, $txinput or die $!;
> my @data= <$TX> ;
> foreach my $line(@data)
>
> {
> $line=~s/ //g;
> $line=~s/\t//g;
> $line=~s/\n//g;
>
> my $transcript = $transcript_adaptor->fetch_by_stable_id($line);
>
> my $exons = $transcript->get_all_translateable_Exons();
> foreach my $exon (@$exons) {
>   print "Transcript " . $transcript->stable_id . "\t" ."Exon " .
> $exon->stable_id . ":" . $exon->start . "-" . $exon->end. "\t";
>   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
> $exon->strand);
>   foreach my $pep (@pep_coords) {
>
>     print $pep->start() . "-" . $pep->end() . "\n";
>   }
> }
> my $translation = $transcript->translation;
>
> if ($translation) {
>   my $pfs = $translation->get_all_ProteinFeatures();
>
>   foreach my $pf (@$pfs) {
>     print "Transcript " . $transcript->stable_id ."\t" . "Domain ".
> $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
>   }
> }
> }
> close $TX;
>
> Thanks again!
>
> 2015-05-18 16:01 GMT+02:00 mag <mr6 at ebi.ac.uk>:
>
>>  Hi Leila,
>>
>> For a given transcript, you can access all its exons and its translation
>> (when available) with related protein features.
>>
>> This snippet of code shows how you can display protein coordinates for
>> all exons and protein domains for the related translation, starting from a
>> given transcript:
>>
>> my $registry = Bio::EnsEMBL::Registry->load_registry_from_db(
>> -host => 'ensembldb.ensembl.org',
>> -user => 'anonymous',
>> -port => '3306'
>> );
>>
>> my $transcript_adaptor = $registry->get_adaptor('human', 'core',
>> 'Transcript');
>> my $stable_id = 'ENST00000380152';
>> my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
>>
>> # Only get exons within the coding region
>> my $exons = $transcript->get_all_translateable_Exons();
>> foreach my $exon (@$exons) {
>>   # Print the genomic coordinates for each exon
>>   print "Exon " . $exon->stable_id . ":" . $exon->start . "-" .
>> $exon->end. "\t";
>>   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
>> $exon->strand);
>>   foreach my $pep (@pep_coords) {
>>     # Print the protein coordinates for each exon
>>     print $pep->start() . "-" . $pep->end() . "\n";
>>   }
>> }
>>
>> my $translation = $transcript->translation;
>> # Check if there is a translation
>> if ($translation) {
>>   my $pfs = $translation->get_all_ProteinFeatures();
>>   # Display all protein features
>>   foreach my $pf (@$pfs) {
>>     print $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
>>   }
>> }
>>
>>
>> If you only have exon coordinates to start with, you will need to create
>> a slice for each set of coordinates, then retrieve transcripts overlapping
>> that slice and use the process described above.
>>
>> my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
>> my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome,
>> $exon_start, $exon_end);
>> my $transcripts = $slice->get_all_Transcripts();
>>
>>
>> Hope that helps,
>> Magali
>>
>>
>> On 16/05/2015 00:32, Leila Alieh wrote:
>>
>>    Hi all!
>>
>>  I have a list of genomic coordinates of exons and I want to transform
>> them into protein coordinates of the different protein isoforms these exons
>> belong to. Moreover I want to find the protein coordinates of the domains
>> of these proteins, and then overlap the 2 sets of information to find exons
>> which encode protein domains. From what I read, the (only?) way to do so
>> is to use the Perl API of Ensembl, and in particular I should use
>> TranscriptMapper and ProteinFeature, right? I read the tutorial and
>> the documentation but I still find it very difficult to understand the API
>> and I don't know how to write the code so that the query is restricted
>> to my list of exons/proteins. Could you please show me some examples? In
>> particular I'd like to know what Greg did to find the protein coordinates
>> of the protein domains (
>> http://lists.ensembl.org/pipermail/dev/2015-April/011013.html).
>>
>>  Thank you in advance, and I apologize if I made a mistake in the
>> thread; it's the first time that I'm using the Ensembl mailing list.
>>
>>  P.S. Please, please, please, make the protein coordinates accessible in
>> Ensembl gene mart as soon as possible, it would save a lot of work/time
>>
>>  Thanks again!
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/