[ensembl-dev] protein coordinates of domains and exons

Tue Jun 2 19:36:45 BST 2015

Haha! Nice mnemonic! Thanks, I'll try it!
On 2 Jun 2015 20:35, "Jan Vogel" <jan.vogel at gmail.com> wrote:

> Leila,
>
> just run “*cat -vet*” on your file from the command line and you will see any
> control and non-printing characters that might be screwing your file up…
> my mnemonic for this:  if you have a sick *cat*, take it to the *vet* …
>
> Jan
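Where cat -vet is not at hand (for example on Windows without a Unix toolset), roughly the same view can be produced from Perl. This is a minimal sketch, not a full reimplementation of cat; the script name show_hidden.pl and the function show_hidden are hypothetical. It reads the file as raw bytes and prints tabs as ^I, a carriage return as ^M, other control characters as ^X, bytes above 127 as M- plus their low 7-bit form (so a UTF-8 byte order mark appears as M-oM-;M-?), and a $ where each newline was. A line ending in ^M$ points to Windows (CRLF) line endings; a whole file printed as one line full of ^M points to the bare carriage returns written by some older Mac exports.

```perl
use strict;
use warnings;

# Render one raw line roughly the way `cat -vet` does: tabs as ^I, other
# control characters as ^X (so a carriage return is ^M), bytes above 127
# as M- followed by their low 7-bit form, and '$' where a newline was.
sub show_hidden {
    my ($line) = @_;
    my $had_newline = ($line =~ s/\n\z//);
    my $out = '';
    for my $char (split //, $line) {
        my $byte   = ord $char;
        my $prefix = '';
        if ($byte > 127) {              # high-bit byte, e.g. part of UTF-8
            $prefix = 'M-';
            $byte  -= 128;
        }
        if ($byte < 32) {               # control character
            $out .= $prefix . '^' . chr($byte + 64);
        } elsif ($byte == 127) {        # DEL
            $out .= $prefix . '^?';
        } else {
            $out .= $prefix . chr($byte);
        }
    }
    return $had_newline ? $out . '$' : $out;
}

# Usage: perl show_hidden.pl ids.txt
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        print show_hidden($line), "\n";
    }
    close $fh;
}
```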
>
>
> On Jun 2, 2015, at 3:22 AM, Leila Alieh <alieh.leila at gmail.com> wrote:
>
> Thanks for your reply. I did put use utf8 in the code, and I think the
> problem is not the format of the text file per se but the transcript ID
> list copied from the CSV or XLSX file: if I copy an ID from a previous
> text file the code works, whereas if I copy the same ID from the CSV/XLSX
> file it doesn't. I circumvented the problem by putting my list into
> BioMart, selecting only transcript IDs as the attribute, and converting
> the output CSV file to TXT. It worked perfectly, but I'd like to avoid
> this extra step every time I have to analyze some IDs with Perl. I'll
> try the editors you suggested.
> Thanks again
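One way to skip the extra BioMart round trip is to normalise each ID inside the script. A minimal sketch, assuming one ID per line and that IDs contain only letters, digits, underscores and dots (true of Ensembl stable IDs such as ENST00000380152, including versioned forms); clean_id is a hypothetical helper name. It keeps only those characters, which drops carriage returns, quotes, commas, tabs and the multi-byte sequences of a byte order mark or a non-breaking space.

```perl
use strict;
use warnings;

# Reduce one line copied from a spreadsheet export to a bare stable ID.
# Assumption: an ID only contains [A-Za-z0-9_.], as Ensembl stable IDs do,
# so everything else (\r, quotes, commas, tabs, and the bytes of a UTF-8
# byte order mark or non-breaking space) is simply deleted.
sub clean_id {
    my ($raw) = @_;
    (my $id = $raw) =~ s/[^A-Za-z0-9_.]//g;
    return $id;
}
```

In the reading loop of the script in this thread, each line could be passed through clean_id() before fetch_by_stable_id(), skipping lines that come back empty. The whitelist is deliberately strict: if a line ever held more than one field, the fields would be glued together, so it only suits the one-ID-per-line layout described here.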
>
> 2015-06-02 12:14 GMT+02:00 Kieron Taylor <ktaylor at ebi.ac.uk>:
>
>> Dear Leila,
>>
>> File formatting can be tricky, especially if you’re using UTF-8. The
>> first thing you can try is putting ‘use utf8;’ in your script; Perl is
>> capable of reading that encoding natively, as well as the usual Latin sets
>> that are common defaults.
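A point worth adding: in Perl, the use utf8 pragma only declares that the script's own source code is written in UTF-8; it does not change how input files are read. Decoding a file as UTF-8 is normally done with an encoding layer on the filehandle, as in this minimal sketch (read_utf8_lines is a hypothetical helper name). A UTF-8 byte order mark, which some editors and exporters prepend, then appears as the single character U+FEFF at the start of the first line and can be removed explicitly. Note that chomp only removes "\n", so on Linux or macOS a "\r" from a Windows-style file survives decoding; that is a separate problem from the encoding.

```perl
use strict;
use warnings;

# 'use utf8' would only affect how Perl reads THIS file's source text.
# Decoding input is controlled per filehandle with an encoding layer.
sub read_utf8_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;                            # removes "\n" only, not "\r"
    $lines[0] =~ s/\A\x{FEFF}// if @lines;   # drop a leading byte order mark
    return @lines;
}
```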
>>
>> If that doesn’t immediately help, then you should consider examining your
>> file in a different text editor. Some editors hide formatting characters or
>> other details from view. Good ones for seeing hidden characters are
>> TextWrangler, Emacs and Vi (in no particular order). There are many more,
>> but you’ll have to experiment to find what is wrong with your input
>> file.
>>
>> Regards,
>>
>> Kieron
>>
>>
>> Kieron Taylor PhD.
>> Ensembl Core senior software developer
>>
>> EMBL, European Bioinformatics Institute
>>
>> > On 2 Jun 2015, at 11:02, Leila Alieh <alieh.leila at gmail.com> wrote:
>> >
>> > Hi!
>> >
>> > I'm having a stupid problem with my input file. I made a list of
>> > transcript IDs in a CSV file, opened it, copied the list and pasted it
>> > into a text file (UTF-8, plain text, one transcript ID per line, without
>> > any commas or quotes). The code gives me an error:
>> >
>> > Can't call method "get_all_translateable_Exons" on an undefined value at ./transdom_RR.pl line 53, <$TX> line 1.
>> >
>> > The same code runs on a previous text file with transcript IDs that I
>> > used for a trial. If I copy and paste one of these transcripts from the
>> > "old" file to the "new" one, the code runs until the first "new"
>> > transcript ID. The transcript IDs that I used for the trial are also
>> > present in my new CSV list, and if I copy and paste them from the CSV
>> > into a new text file, the code doesn't work. So I think the problem is
>> > somehow in the format of the transcript IDs in the Excel file. I tried
>> > converting the CSV file into XLSX and also changing the cell format to
>> > General and to Text, but it didn't work.
>> > Do you have any suggestions? How should I prepare the TXT input file
>> > for the Perl code?
>> >
>> > thanks!
>> >
>> >
>> > 2015-05-19 16:23 GMT+02:00 Leila Alieh <alieh.leila at gmail.com>:
>> > Thank you very much!!!
>> >
>> > Magali, your code has been really helpful. I just modified it to read
>> > the list of transcript IDs from a text file. Here is the version I'm
>> > using, in case you want to check it (it seems to be working fine) or in
>> > case someone else needs it:
>> >
>> > #!/usr/bin/perl
>> >
>> > use strict;
>> > use warnings;
>> > use Bio::EnsEMBL::Registry;
>> >
>> > my $registry = 'Bio::EnsEMBL::Registry';
>> >
>> > $registry->load_registry_from_db(
>> >   -host => 'ensembldb.ensembl.org',
>> >   -user => 'anonymous',
>> >   -port => 3306,
>> > );
>> >
>> > my $transcript_adaptor = $registry->get_adaptor('mouse', 'core', 'Transcript');
>> > my $txinput = 'tx_test.txt';
>> > open my $TX, '<', $txinput or die "Cannot open $txinput: $!";
>> > # Slurp the file and split on any line ending (\n, \r\n, or a bare \r),
>> > # so IDs copied from a spreadsheet export never keep a trailing \r
>> > my $content = do { local $/; <$TX> };
>> > close $TX;
>> >
>> > foreach my $raw (split /[\r\n]+/, $content) {
>> >   (my $line = $raw) =~ s/\s+//g;   # drop stray spaces and tabs
>> >   next unless length $line;        # skip blank lines
>> >
>> >   my $transcript = $transcript_adaptor->fetch_by_stable_id($line);
>> >   # fetch_by_stable_id() returns undef for an ID it cannot find; calling a
>> >   # method on undef is what produces "Can't call method ... on an undefined value"
>> >   unless ($transcript) {
>> >     warn "No transcript found for '$line'\n";
>> >     next;
>> >   }
>> >
>> >   my $exons = $transcript->get_all_translateable_Exons();
>> >   foreach my $exon (@$exons) {
>> >     print "Transcript " . $transcript->stable_id . "\t" . "Exon " .
>> >       $exon->stable_id . ":" . $exon->start . "-" . $exon->end . "\t";
>> >     my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
>> >       $exon->strand);
>> >     foreach my $pep (@pep_coords) {
>> >       print $pep->start() . "-" . $pep->end() . "\n";
>> >     }
>> >   }
>> >
>> >   my $translation = $transcript->translation;
>> >   if ($translation) {
>> >     my $pfs = $translation->get_all_ProteinFeatures();
>> >     foreach my $pf (@$pfs) {
>> >       print "Transcript " . $transcript->stable_id . "\t" . "Domain " .
>> >         $pf->hseqname . ":" . $pf->start . "-" . $pf->end . "\n";
>> >     }
>> >   }
>> > }
>> >
>> > Thanks again!
>> >
>> > 2015-05-18 16:01 GMT+02:00 mag <mr6 at ebi.ac.uk>:
>> > Hi Leila,
>> >
>> > For a given transcript, you can access all its exons and its
>> translation (when available) with related protein features.
>> >
>> > This snippet of code shows how you can display protein coordinates for
>> all exons and protein domains for the related translation, starting from a
>> given transcript:
>> >
>> > use strict;
>> > use warnings;
>> > use Bio::EnsEMBL::Registry;
>> >
>> > # Registry methods are called on the class name, as in the Ensembl API tutorial
>> > my $registry = 'Bio::EnsEMBL::Registry';
>> > $registry->load_registry_from_db(
>> >   -host => 'ensembldb.ensembl.org',
>> >   -user => 'anonymous',
>> >   -port => 3306,
>> > );
>> >
>> > my $transcript_adaptor = $registry->get_adaptor('human', 'core', 'Transcript');
>> > my $stable_id = 'ENST00000380152';
>> > my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id)
>> >   or die "No transcript found for $stable_id\n";
>> >
>> > # Only get exons within the coding region
>> > my $exons = $transcript->get_all_translateable_Exons();
>> > foreach my $exon (@$exons) {
>> >   # Print the genomic coordinates for each exon
>> >   print "Exon " . $exon->stable_id . ":" . $exon->start . "-" .
>> >     $exon->end . "\t";
>> >   my @pep_coords = $transcript->genomic2pep($exon->start, $exon->end,
>> >     $exon->strand);
>> >   foreach my $pep (@pep_coords) {
>> >     # Print the protein coordinates for each exon
>> >     print $pep->start() . "-" . $pep->end() . "\n";
>> >   }
>> > }
>> >
>> > my $translation = $transcript->translation;
>> > # Check if there is a translation
>> > if ($translation) {
>> >   my $pfs = $translation->get_all_ProteinFeatures();
>> >   # Display all protein features
>> >   foreach my $pf (@$pfs) {
>> >     print $pf->hseqname . ":" . $pf->start . "-" . $pf->end . "\n";
>> >   }
>> > }
>> >
>> >
>> > If you only have exon coordinates to start with, you will need to
>> create a slice for each set of coordinates, then retrieve transcripts
>> overlapping that slice and use the process described above.
>> >
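>> > my $slice_adaptor = $registry->get_adaptor('human', 'core', 'Slice');
>> > # $chromosome, $exon_start and $exon_end hold one set of exon coordinates
>> > my $slice = $slice_adaptor->fetch_by_region('chromosome', $chromosome,
>> >   $exon_start, $exon_end);
>> > my $transcripts = $slice->get_all_Transcripts();
>> > foreach my $transcript (@$transcripts) {
>> >   # run the exon/domain code above on each $transcript
>> > }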
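Exon coordinates often arrive as text such as 13:32315474-32315667 or as tab-separated columns, sometimes with a chr prefix, whereas the Ensembl databases name human chromosomes 13, X, MT and so on, without chr. A minimal sketch for turning one such line into the chromosome, start and end passed to fetch_by_region() above; parse_region and the accepted input formats are assumptions for illustration, and a name like chrM (which Ensembl calls MT) would still need mapping by hand.

```perl
use strict;
use warnings;

# Parse one exon coordinate line into (chromosome, start, end).
# Assumed input forms: "13:32315474-32315667", "chr13:32,315,474-32,315,667"
# or whitespace-separated "13 32315474 32315667". A leading "chr" is removed
# because Ensembl names human chromosomes "13", "X", "MT" and so on.
sub parse_region {
    my ($line) = @_;
    $line =~ s/\s+\z//;   # trailing newline, including a \r
    $line =~ s/,//g;      # thousands separators
    my ($chr, $start, $end) =
        $line =~ /\A\s*(?:chr)?(\w+)[:\s]+(\d+)[-\s]+(\d+)/i
        or return;        # empty list if the line does not look like a region
    ($start, $end) = ($end, $start) if $start > $end;
    return ($chr, $start, $end);
}
```

A usage sketch inside a loop over the coordinate file: my ($chr, $start, $end) = parse_region($line) or next; followed by $slice_adaptor->fetch_by_region('chromosome', $chr, $start, $end).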
>> >
>> >
>> > Hope that helps,
>> > Magali
>> >
>> >
>> > On 16/05/2015 00:32, Leila Alieh wrote:
>> >> Hi all!
>> >>
>> >> I have a list of genomic coordinates of exons and I want to transform
>> >> them into protein coordinates of the different protein isoforms these
>> >> exons belong to. Moreover, I want to find the protein coordinates of the
>> >> domains of these proteins, and then overlap the two sets of information
>> >> to find exons which encode protein domains. From what I read, the (only?)
>> >> way to do so is to use the Ensembl Perl API, and in particular I should
>> >> use TranscriptMapper and ProteinFeature, right? I read the tutorial and
>> >> the documentation, but I still find the API very difficult to understand,
>> >> and I don't know how to write the code so that the query is restricted
>> >> to my list of exons/proteins. Could you please show me some examples? In
>> >> particular, I'd like to know what Greg did to find the protein
>> >> coordinates of the protein domains
>> >> (http://lists.ensembl.org/pipermail/dev/2015-April/011013.html).
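The last step, finding which exons encode part of a domain, is plain interval arithmetic once both sets are in protein coordinates. A minimal sketch with hypothetical helper names (overlap, exons_in_domains) and made-up example data: each exon piece and each domain is a hash with id, start and end, which in practice could be filled from genomic2pep() and from get_all_ProteinFeatures() (hseqname, start, end) as in the snippets quoted above. Two closed intervals overlap when the later of the two starts is not past the earlier of the two ends, and the shared span is returned. Because genomic2pep() may return more than one piece per exon, each piece would go in as its own entry; skipping any gap objects (Bio::EnsEMBL::Mapper::Gap) it returns is an assumption worth checking against the API documentation.

```perl
use strict;
use warnings;

# Return the shared span of two closed intervals, or undef if they do not
# overlap. Overlap exists when the later start is not past the earlier end.
sub overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    my $start = $s1 > $s2 ? $s1 : $s2;   # later of the two starts
    my $end   = $e1 < $e2 ? $e1 : $e2;   # earlier of the two ends
    return $start <= $end ? [$start, $end] : undef;
}

# Pair every exon piece with every domain it overlaps.
# Both arguments are array refs of hashes { id => ..., start => ..., end => ... }
# in protein (peptide) coordinates. Each hit is [exon_id, domain_id, start, end].
sub exons_in_domains {
    my ($exons, $domains) = @_;
    my @hits;
    for my $exon (@$exons) {
        for my $dom (@$domains) {
            my $span = overlap($exon->{start}, $exon->{end},
                               $dom->{start},  $dom->{end});
            push @hits, [ $exon->{id}, $dom->{id}, @$span ] if $span;
        }
    }
    return @hits;
}

# Hypothetical example data, not taken from any real transcript
my @exons   = ( { id => 'E1', start => 1,   end => 50  },
                { id => 'E2', start => 51,  end => 120 },
                { id => 'E3', start => 121, end => 200 } );
my @domains = ( { id => 'PF1', start => 40,  end => 90  },
                { id => 'PF2', start => 150, end => 160 } );

print join("\t", @$_), "\n" for exons_in_domains(\@exons, \@domains);
```

With this example data it reports E1/PF1 over 40-50, E2/PF1 over 51-90 and E3/PF2 over 150-160. The nested loop compares every exon with every domain, which is fine for the handful of features a single transcript carries.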
>> >>
>> >> Thank you in advance, and I apologize if I made a mistake in the
>> >> thread; it's the first time that I'm using the Ensembl mailing list.
>> >>
>> >> P.S. Please, please, please make the protein coordinates accessible
>> >> in the Ensembl gene mart as soon as possible; it would save a lot of work and time.
>> >>
>> >> Thanks again!
>> >>
>> >>
>> >> _______________________________________________
>> >> Dev mailing list
>> >> Dev at ensembl.org
>> >>
>> >> Posting guidelines and subscribe/unsubscribe info:
>> >> http://lists.ensembl.org/mailman/listinfo/dev
>> >>
>> >> Ensembl Blog:
>> >> http://www.ensembl.info/
>> >
>>
>