[ensembl-dev] Get genomic location for translatable cdna seq

Magali mr6 at ebi.ac.uk
Fri Feb 8 09:59:03 GMT 2013


Hi Abhishek,

As Kieron mentioned, Ensembl Exon objects have a coding_region_start()
method.

If, for the exons, you replace the $feature->start() and $feature->end()
calls with $exon->coding_region_start($transcript) and
$exon->coding_region_end($transcript), you will get only the coding part
of each exon. The transcript has to be passed in because the same exon
can be shared by transcripts with different coding regions.
If the entire exon is non-coding, these methods return undef.
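For illustration, here is a minimal, self-contained sketch of what that clipping does, using plain numbers instead of API objects. `coding_part` is a hypothetical helper, not part of the Ensembl API. It assumes genomic coordinates with start <= end on either strand (Ensembl reports strand separately), and the example coordinates are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not Ensembl API): clip an exon to the transcript's
# coding region. Coordinates are genomic with start <= end on either strand.
# Returns (start, end) of the coding part, or an empty list if the exon is
# entirely non-coding (the Ensembl methods return undef in that case).
sub coding_part {
    my ( $exon_start, $exon_end, $cds_start, $cds_end ) = @_;
    my $start = $exon_start > $cds_start ? $exon_start : $cds_start;
    my $end   = $exon_end < $cds_end     ? $exon_end   : $cds_end;
    return $start <= $end ? ( $start, $end ) : ();
}

# Forward-strand example: coding region 1050..1420 and four exons; the
# first and third contain UTR, the fourth lies entirely in the 3' UTR.
my ( $cds_start, $cds_end ) = ( 1050, 1420 );
my @exons = ( [ 1000, 1100 ], [ 1200, 1300 ], [ 1400, 1500 ], [ 1600, 1700 ] );

for my $exon (@exons) {
    my @part = coding_part( @$exon, $cds_start, $cds_end );
    if (@part) {
        printf "exon %d-%d: coding %d-%d\n", @$exon, @part;
    }
    else {
        printf "exon %d-%d: non-coding\n", @$exon;
    }
}
```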


Hope that helps,
Magali

On 07/02/13 17:30, Abhishek Niroula wrote:
> Thanks Kieron.
> I have pasted below the code I used to extract the exon information.
> The ensembl_gene_transcript_id.txt file contains an Ensembl gene ID
> and an Ensembl transcript ID on each line. For each transcript, I want
> to obtain the corresponding genomic coordinate for each amino acid
> position. I am not sure whether somebody has already done this.
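The Ensembl Transcript object provides pep2genomic($start, $end), which returns the genomic coordinates for a peptide range, so the API can do this mapping for you. The arithmetic involved is sketched below in plain Perl, assuming you already have the coding part of each exon as a genomic [start, end] pair listed in transcript order (5' to 3') and that the CDS begins with a complete codon. `aa_to_genomic` is a hypothetical helper, not an Ensembl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not Ensembl API): map a 1-based amino-acid position
# to the genomic range(s) of its codon. @$segments holds the coding part of
# each exon as [start, end] (start <= end), listed in transcript order (5'
# to 3'). Assumes the CDS starts with a complete codon.
sub aa_to_genomic {
    my ( $segments, $strand, $aa_pos ) = @_;
    my $cds_from = ( $aa_pos - 1 ) * 3 + 1;    # first base of codon, CDS coords
    my $cds_to   = $cds_from + 2;              # last base of codon, CDS coords
    my $offset   = 0;                          # CDS bases in earlier segments
    my @ranges;

    for my $seg (@$segments) {
        my ( $s, $e ) = @$seg;
        my $seg_from = $offset + 1;            # CDS range covered by this segment
        my $seg_to   = $offset + ( $e - $s + 1 );
        $offset = $seg_to;
        next if $seg_to < $cds_from || $seg_from > $cds_to;

        # Part of the codon that lies in this segment, in CDS coords
        my $from = $cds_from > $seg_from ? $cds_from : $seg_from;
        my $to   = $cds_to < $seg_to     ? $cds_to   : $seg_to;

        if ( $strand > 0 ) {    # forward: CDS runs from $s upwards
            push @ranges, [ $s + $from - $seg_from, $s + $to - $seg_from ];
        }
        else {                  # reverse: CDS runs from $e downwards
            push @ranges, [ $e - ( $to - $seg_from ), $e - ( $from - $seg_from ) ];
        }
    }
    return @ranges;    # empty list if $aa_pos lies beyond the CDS
}

# Forward-strand example: codon 4 spans the junction between two segments
my @codon = aa_to_genomic( [ [ 101, 110 ], [ 201, 220 ] ], 1, 4 );
print join( ", ", map { "$_->[0]-$_->[1]" } @codon ), "\n";    # 110-110, 201-202
```

A codon that spans an exon junction comes back as two ranges. On the reverse strand, list the segments from the highest coordinates downwards, since that is the 5' end of the transcript.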
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
>
> # Format a feature as "stable_id: seq_region:start-end (strand)"
> sub feature2string
> {
>     my $feature = shift;
>
>     my $stable_id  = $feature->stable_id();
>     my $seq_region = $feature->slice->seq_region_name();
>     my $start      = $feature->start();
>     my $end        = $feature->end();
>     my $strand     = $feature->strand();
>
>     return sprintf( "%s: %s:%d-%d (%+d)",
>         $stable_id, $seq_region, $start, $end, $strand );
> }
>
> my $registry = "Bio::EnsEMBL::Registry";
> ## Load the databases into the registry
> $registry->load_registry_from_db( -host => 'ensembldb.ensembl.org',
>                                   -user => 'anonymous' );
>
> ## This is a Transcript adaptor (not a Gene adaptor)
> my $transcript_adaptor =
>     $registry->get_adaptor( 'Human', 'Core', 'Transcript' );
>
> ## Each input line is "gene_stable_id|transcript_stable_id"
> open my $id_fh, '<', 'ensembl_gene_transcript_id.txt' or die "open: $!";
> while ( my $line = <$id_fh> ) {
>     print $line;
>     chomp $line;
>     my ( $gene, $transcript_id ) = split /\|/, $line;
>
> ### Fetch the transcript; skip IDs that are not in the database
>     my $transcript = $transcript_adaptor->fetch_by_stable_id($transcript_id);
>     unless ($transcript) {
>         warn "Transcript $transcript_id not found\n";
>         next;
>     }
>
> ### Write the translatable (CDS) sequence to <gene>.fa
>     my $cdsseq = $transcript->translateable_seq();
>     open my $fa_fh, '>', "${gene}.fa" or die "open: $!";
>     print $fa_fh ">$gene\n$cdsseq\n";
>     close $fa_fh;
>
> ### Write one line per exon to <gene>_exon.txt
>     open my $exon_fh, '>', "${gene}_exon.txt" or die "open: $!";
>     foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>         print $exon_fh feature2string($exon), "\n";
>     }
>     close $exon_fh;
> }
> close $id_fh;
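One cheap consistency check when combining the two outputs of this script: the coding parts of a transcript's exons should together be exactly as long as the sequence returned by translateable_seq(). A minimal sketch with plain Perl data; the helper name `cds_length` and the example segments and sequence are hypothetical stand-ins for values you would get from the API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: total length of the coding segments
# ([start, end] pairs with start <= end, inclusive coordinates)
sub cds_length {
    my ($segments) = @_;
    my $total = 0;
    $total += $_->[1] - $_->[0] + 1 for @$segments;
    return $total;
}

my @segments = ( [ 101, 110 ], [ 201, 220 ] );
my $cdsseq   = 'ATG' . ( 'GCT' x 8 ) . 'TAA';    # stand-in for translateable_seq()

my $len = cds_length( \@segments );
if ( $len == length $cdsseq ) {
    printf "OK: %d bp, %d codons\n", $len, $len / 3;
}
else {
    printf "Mismatch: exons give %d bp, sequence has %d bp\n",
        $len, length $cdsseq;
}
```

A mismatch usually means an exon was included whole instead of only its coding part.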
>
>
>
>
> On Thu, Feb 7, 2013 at 6:13 PM, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>     Hi Abhishek,
>
>     We need you to provide more specifics in order to determine what
>     the difficulty is.
>
>     If you have Ensembl Exon objects, their coding_region_start() will
>     tell you whether the Exon codes: it returns undef for an entirely
>     non-coding Exon.
>
>     We can be of more assistance if you can tell us more or provide
>     code samples. There are several ways to approach the task and we
>     wouldn't want to recommend the most difficult for you!
>
>     Regards,
>
>     -- 
>     Kieron Taylor PhD.
>     Ensembl Core team
>     EBI
>
>
>
>     On 30/01/2013 15:15, Abhishek Niroula wrote:
>
>         Hello,
>
>         I am trying to get genomic coordinates for the translated
>         portion of some human cDNA sequences. I successfully extracted
>         the transcript start and end coordinates, as well as the
>         coordinates of each exon in a transcript. However, not every
>         exon in a transcript is necessarily translated, and this is
>         where I am stuck.
>         My goal is to check whether a given genomic coordinate on a
>         chromosome lies in a protein-coding (translated) region.
>
>         Thanks for your effort in advance.
>
>         --
>         Best Regards,
>         Abhishek Niroula
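Once the coding part of each exon is known (for example from coding_region_start($transcript) and coding_region_end($transcript)), the check itself is an interval lookup. A minimal sketch, assuming a single chromosome and coding segments collected as [start, end] pairs; `in_coding_region` is a hypothetical helper and the coordinates are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $pos lies inside any coding segment.
# Segments are genomic [start, end] pairs (start <= end) on one seq_region.
sub in_coding_region {
    my ( $pos, $segments ) = @_;
    for my $seg (@$segments) {
        return 1 if $pos >= $seg->[0] && $pos <= $seg->[1];
    }
    return 0;
}

# Example coding segments on one chromosome (values are made up)
my @coding = ( [ 1050, 1100 ], [ 1200, 1300 ], [ 1400, 1420 ] );

for my $pos ( 1020, 1075, 1150, 1420 ) {
    printf "%d: %s\n", $pos,
        in_coding_region( $pos, \@coding ) ? 'coding' : 'not coding';
}
```

A linear scan is fine for a handful of transcripts. For genome-wide queries, grouping the segments by seq_region name and sorting them (or using an interval tree) avoids rescanning every segment for each position.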
>
>
>         _______________________________________________
>         Dev mailing list    Dev at ensembl.org
>         Posting guidelines and subscribe/unsubscribe info:
>         http://lists.ensembl.org/mailman/listinfo/dev
>         Ensembl Blog: http://www.ensembl.info/
>
>
> -- 
> Best Regards,
> Abhishek Niroula
>
