[ensembl-dev] protein coordinates of domains and exons

Tue Jun 2 11:22:45 BST 2015

Thanks for your reply. I did put 'use utf8;' in the code, and I think the
problem is not the format of the text file per se but the transcript ID
list copied from the csv or xlsx file: if I copy an ID from a previous text
file the code works, whereas if I copy the same ID from the csv/xlsx file
the code doesn't work. I worked around the problem by putting my list into
BioMart, selecting only transcript IDs as the attribute, and converting the
output csv file into txt. That worked perfectly, but I'd like to avoid this
extra step every time I have to analyze some IDs with Perl. I'll try the
editors you suggested.
Thanks again
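
A note on the workaround above: as far as I know, 'use utf8;' only declares
that the script's own source is UTF-8; it does not change how the input file
is read, and it does not remove invisible characters (a byte-order mark, a
carriage return, a non-breaking space) that can come along when IDs are
pasted from a spreadsheet. Below is a minimal sketch, under stated
assumptions, of cleaning each line before it reaches fetch_by_stable_id. The
helper names clean_id and show_hidden are hypothetical (they are not part of
the Ensembl API), and the set of characters allowed in a stable ID (ASCII
letters, digits, underscore, dot, hyphen) is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only characters assumed to be legal in a stable
# ID, so a BOM, carriage return, non-breaking space or zero-width character
# is dropped along with ordinary whitespace.
sub clean_id {
    my ($line) = @_;
    $line =~ s/[^A-Za-z0-9_.\-]//g;
    return $line;
}

# Hypothetical helper: make hidden characters visible by replacing every
# character outside printable ASCII with its code point.
sub show_hidden {
    my ($line) = @_;
    (my $visible = $line) =~ s/([^\x20-\x7E])/sprintf('<U+%04X>', ord $1)/ge;
    return $visible;
}

# Usage: perl check_ids.pl tx_test.txt   (the script name is illustrative)
my $txinput = shift @ARGV;
if (defined $txinput) {
    # Decode the file as UTF-8 while reading it
    open my $TX, '<:encoding(UTF-8)', $txinput
        or die "Cannot open $txinput: $!";
    while (my $line = <$TX>) {
        my $id = clean_id($line);
        next if $id eq '';    # skip blank lines
        print "line $.: ", show_hidden($line), " -> $id\n";
        # In the real script, fetch the transcript here and skip IDs that
        # are not found instead of calling a method on undef:
        #   my $transcript = $transcript_adaptor->fetch_by_stable_id($id);
        #   unless ($transcript) { warn "No transcript for '$id'\n"; next; }
    }
    close $TX;
}
```

Run on the problematic file, the printed lines should reveal what the
spreadsheet added: a Windows line ending, for example, shows up as <U+000D>
just before the <U+000A> newline marker.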

2015-06-02 12:14 GMT+02:00 Kieron Taylor <ktaylor at ebi.ac.uk>:

> Dear Leila,
>
> File formatting can be tricky, especially if you’re using UTF-8. The first
> thing you can try is putting ‘use utf8;’ in your script. Perl is capable
> of reading that encoding natively as well as the usual Latin sets that are
> common defaults.
>
> If that doesn’t immediately help, then you should consider examining your
> file in a different text editor. Some editors hide formatting characters or
> other details from your sight. Good ones for seeing hidden characters are
> TextWrangler, Emacs and Vi (in no particular order). There are many more
> too, but you’ll have to experiment to find what is wrong with your input
> file.
>
> Regards,
>
> Kieron
>
>
> Kieron Taylor PhD.
> Ensembl Core senior software developer
>
> EMBL, European Bioinformatics Institute
>
> > On 2 Jun 2015, at 11:02, Leila Alieh <alieh.leila at gmail.com> wrote:
> >
> > Hi!
> >
> > I'm having a stupid problem with my input file. I made a list of
> > transcript IDs in a csv file, opened it, copied the list and pasted it
> > into a text file (UTF-8, plain text), one transcript ID per line, without
> > any commas or quotes. The code is giving me this error:
> >
> > Can't call method "get_all_translateable_Exons" on an undefined value at ./transdom_RR.pl line 53, <$TX> line 1.
> >
> > The same code runs on a previous text file with transcript IDs that I
> > used for a trial. If I copy and paste one of these transcript IDs from
> > the "old" file into the "new" one, the code runs until the first "new"
> > transcript ID. The transcript IDs that I used for the trial are also
> > present in my new csv list, and if I copy and paste them from the csv
> > into a new text file the code doesn't work. So I think the problem is
> > somehow in the format of the transcript IDs in the Excel file. I tried
> > converting the csv file into xlsx, and also changing the cell format to
> > General and to Text, but it didn't work.
> > Do you have any suggestions? How should I prepare the txt input file for
> > the Perl code?
> >
> > thanks!
> >
> >
> > 2015-05-19 16:23 GMT+02:00 Leila Alieh <alieh.leila at gmail.com>:
> > Thank you very much!!!
> >
> > Magali, your code has been really helpful. I just modified it to read
> > the list of transcript IDs from a text file. Here is the version I'm
> > using, in case you want to check it (it seems to be working fine) or
> > someone else needs it:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> > use Bio::EnsEMBL::Registry;
> >
> > my $registry = 'Bio::EnsEMBL::Registry' ;
> >
> > $registry->load_registry_from_db(
> > -host => 'ensembldb.ensembl.org' ,
> > -user => 'anonymous' ,
> > -port => '3306'
> > );
> >
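> > my $transcript_adaptor = $registry->get_adaptor( 'mouse', 'core', 'Transcript');
> > my $txinput= 'tx_test.txt' ;
> > # Three-argument open with an explicit read mode
> > open my $TX, '<', $txinput or die "Cannot open $txinput: $!";
> > my @data= <$TX> ;
> > foreach my $line(@data)
> >
> > {
> > # Remove all whitespace (spaces, tabs, \r, \n) from the ID
> > $line=~s/\s+//g;
> >
> > my $transcript = $transcript_adaptor->fetch_by_stable_id($line);
> > # Skip IDs that could not be fetched instead of calling a method on undef
> > unless ($transcript) { warn "No transcript found for '$line'\n"; next; }
> >
> > my $exons = $transcript->get_all_translateable_Exons();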
> > foreach my $exon (@$exons) {
> >   print "Transcript " . $transcript->stable_id . "\t" ."Exon " . $exon->stable_id . ":" . $exon->start . "-" . $exon->end. "\t";
> >   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end, $exon->strand);
> >   foreach my $pep (@pep_coords) {
> >
> >     print $pep->start() . "-" . $pep->end() . "\n";
> >   }
> > }
> > my $translation = $transcript->translation;
> >
> > if ($translation) {
> >   my $pfs = $translation->get_all_ProteinFeatures();
> >
> >   foreach my $pf (@$pfs) {
> >     print "Transcript " . $transcript->stable_id ."\t" . "Domain ". $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
> >   }
> > }
> > }
> > close $TX;
> >
> > Thanks again!
> >
> > 2015-05-18 16:01 GMT+02:00 mag <mr6 at ebi.ac.uk>:
> > Hi Leila,
> >
> > For a given transcript, you can access all its exons and its translation
> (when available) with related protein features.
> >
> > This snippet of code shows how you can display protein coordinates for
> all exons and protein domains for the related translation, starting from a
> given transcript:
> >
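> > use Bio::EnsEMBL::Registry;
> >
> > # Keep the class name as the registry handle, then load the databases into it
> > my $registry = 'Bio::EnsEMBL::Registry';
> > $registry->load_registry_from_db(
> > -host => 'ensembldb.ensembl.org',
> > -user => 'anonymous',
> > -port => '3306'
> > );
> >
> > my $transcript_adaptor = $registry->get_adaptor('human', 'core', 'Transcript');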
> > my $stable_id = 'ENST00000380152';
> > my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
> >
> > # Only get exons within the coding region
> > my $exons = $transcript->get_all_translateable_Exons();
> > foreach my $exon (@$exons) {
> >   # Print the genomic coordinates for each exon
> >   print "Exon " . $exon->stable_id . ":" . $exon->start . "-" . $exon->end. "\t";
> >   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end, $exon->strand);
> >   foreach my $pep (@pep_coords) {
> >     # Print the protein coordinates for each exon
> >     print $pep->start() . "-" . $pep->end() . "\n";
> >   }
> > }
> >
> > my $translation = $transcript->translation;
> > # Check if there is a translation
> > if ($translation) {
> >   my $pfs = $translation->get_all_ProteinFeatures();
> >   # Display all protein features
> >   foreach my $pf (@$pfs) {
> >     print $pf->hseqname . ":" .  $pf->start . "-" . $pf->end . "\n";
> >   }
> > }
> >
> >
> > If you only have exon coordinates to start with, you will need to create
> a slice for each set of coordinates, then retrieve transcripts overlapping
> that slice and use the process described above.
> >
> > my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
> > my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome, $exon_start, $exon_end);
> > my $transcripts = $slice->get_all_Transcripts();
> >
> >
> > Hope that helps,
> > Magali
> >
> >
> > On 16/05/2015 00:32, Leila Alieh wrote:
> >> Hi all!
> >>
> >> I have a list of genomic coordinates of exons and I want to transform
> >> them into protein coordinates of the different protein isoforms these
> >> exons belong to. Moreover, I want to find the protein coordinates of the
> >> domains of these proteins, and then overlap the two sets of information
> >> to find exons which encode protein domains. From what I read, the
> >> (only?) way to do so is to use the Perl API of Ensembl, and in
> >> particular I should use TranscriptMapper and ProteinFeature, right? I
> >> read the tutorial and the documentation but I still find it very
> >> difficult to understand the API, and I don't know how to write the code
> >> so as to restrict the query to my list of exons/proteins only. Could you
> >> please show me some examples? In particular I'd like to know what Greg
> >> did to find the protein coordinates of the protein domains
> >> (http://lists.ensembl.org/pipermail/dev/2015-April/011013.html).
> >>
> >> Thank you in advance, and I apologize if I made a mistake in the
> >> thread; it's the first time that I'm using the Ensembl mailing list.
> >>
> >> P.S. Please, please, please, make the protein coordinates accessible in
> Ensembl gene mart as soon as possible, it would save a lot of work/time
> >>
> >> Thanks again!
> >>
> >>
> >> _______________________________________________
> >> Dev mailing list
> >> Dev at ensembl.org
> >>
> >> Posting guidelines and subscribe/unsubscribe info:
> >> http://lists.ensembl.org/mailman/listinfo/dev
> >>
> >> Ensembl Blog:
> >> http://www.ensembl.info/