[ensembl-dev] Get genomic location for translatable cdna seq

Abhishek Niroula abhishekniroula7 at gmail.com
Thu Feb 7 17:30:34 GMT 2013


Thanks Kieron.
I have pasted below the code I used to extract the exon information.
The file ensembl_gene_transcript_id.txt contains Ensembl gene and
transcript IDs, one gene|transcript pair per line. For each transcript,
I want to obtain the corresponding genome coordinate for each amino acid
position. I am not sure whether somebody has already done this.


#!/usr/bin/perl

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Format a feature (here an Exon) as "stable_id: region:start-end (strand)"
sub feature2string
{
    my $feature = shift;

    my $stable_id  = $feature->stable_id();
    my $seq_region = $feature->slice->seq_region_name();
    my $start      = $feature->start();
    my $end        = $feature->end();
    my $strand     = $feature->strand();

    return sprintf( "%s: %s:%d-%d (%+d)",
        $stable_id, $seq_region, $start, $end, $strand );
}

my $registry = "Bio::EnsEMBL::Registry";
## Load the databases into the registry
$registry->load_registry_from_db( -host => 'ensembldb.ensembl.org',
                                  -user => 'anonymous' );

## This is a Transcript adaptor: the IDs we fetch below are transcript IDs
my $transcript_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Transcript' );

open( my $id_fh, '<', 'ensembl_gene_transcript_id.txt' ) or die "open: $!";
while ( my $line = <$id_fh> ) {
    chomp $line;                  # strip the trailing newline safely
    next unless $line =~ /\S/;    # skip blank lines
    print "$line\n";

    ## Each line is "gene_id|transcript_id"
    my ( $gene, $transcript_id ) = split( /\|/, $line );

    ## Fetch the transcript; warn and skip if the ID is unknown
    my $transcript = $transcript_adaptor->fetch_by_stable_id($transcript_id);
    unless ($transcript) {
        warn "No transcript found for $transcript_id\n";
        next;
    }

    ## Write the translatable (CDS) sequence to <gene>.fa
    my $cdsseq = $transcript->translateable_seq();
    open( my $fa_fh, '>', "$gene.fa" ) or die "open: $!";
    print {$fa_fh} ">$gene\n$cdsseq\n";
    close($fa_fh);

    ## Write one line per exon to <gene>_exon.txt
    open( my $exon_fh, '>', "${gene}_exon.txt" ) or die "open: $!";
    foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
        print {$exon_fh} feature2string($exon), "\n";
    }
    close($exon_fh);
}
close($id_fh);
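The per-residue mapping asked about above can be sketched in pure Perl. This is a minimal illustration, not the Ensembl implementation: it assumes you already have the coding segments of the transcript as [start, end] pairs in transcription order (for example the start/end of each exon returned by $transcript->get_all_translateable_Exons()), and that the CDS begins with a complete codon (phase 0). The helper name aa2genomic is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based amino acid position to the genomic
# coordinates of its codon.
#   $segments: array ref of [start, end] genomic ranges covering the CDS,
#              in transcription order (5' to 3' of the transcript)
#   $strand:   1 or -1
# Returns a list of [start, end] pieces (two if the codon spans an intron),
# or an empty list if the codon lies entirely beyond the CDS.
# Assumes the CDS starts with a complete codon (phase 0).
sub aa2genomic {
    my ( $segments, $strand, $aa_pos ) = @_;
    my $cds_start = ( $aa_pos - 1 ) * 3 + 1;    # first CDS base of the codon
    my $cds_end   = $cds_start + 2;             # last CDS base of the codon
    my @pieces;
    my $offset = 0;    # CDS bases covered by earlier segments
    for my $seg (@$segments) {
        my ( $g_start, $g_end ) = @$seg;
        my $len       = $g_end - $g_start + 1;
        my $seg_first = $offset + 1;       # CDS position of segment's first base
        my $seg_last  = $offset + $len;    # CDS position of segment's last base

        # Overlap of the codon with this segment, in CDS coordinates
        my $lo = $cds_start > $seg_first ? $cds_start : $seg_first;
        my $hi = $cds_end < $seg_last    ? $cds_end   : $seg_last;
        if ( $lo <= $hi ) {
            if ( $strand == 1 ) {
                push @pieces,
                    [ $g_start + ( $lo - $seg_first ), $g_start + ( $hi - $seg_first ) ];
            }
            else {
                # Reverse strand: the segment's first CDS base is its genomic end
                push @pieces,
                    [ $g_end - ( $hi - $seg_first ), $g_end - ( $lo - $seg_first ) ];
            }
        }
        $offset += $len;
        last if $offset >= $cds_end;
    }
    return @pieces;
}

# Example: a forward-strand CDS split across two exons (5 + 11 bases)
my @codon = aa2genomic( [ [ 100, 104 ], [ 200, 210 ] ], 1, 2 );
print join( ", ", map { $_->[0] . '-' . $_->[1] } @codon ), "\n";  # prints "103-104, 200-200"
```

The Ensembl Transcript object also provides a pep2genomic() method that should return this mapping directly (as Mapper coordinate and gap objects), so it is worth comparing the two on a few transcripts before relying on either.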




On Thu, Feb 7, 2013 at 6:13 PM, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:

> Hi Abhishek,
>
> We need you to provide more specifics in order to determine what the
> difficulty is.
>
> If you have Ensembl Exon objects, their coding_region_start() method
> (called with the Transcript as argument) will tell you whether an Exon
> codes: it returns undef for a non-coding Exon.
>
> We can be of more assistance if you can tell us more or provide code
> samples. There are several ways to approach the task and we wouldn't want
> to recommend the most difficult for you!
>
> Regards,
>
> --
> Kieron Taylor PhD.
> Ensembl Core team
> EBI
>
>
>
> On 30/01/2013 15:15, Abhishek Niroula wrote:
>
>> Hello,
>>
>> I am trying to get genomic coordinates for the translatable portion of some
>> human cDNA sequences. I successfully extracted the transcript start
>> and end coordinates, as well as the coordinates of each exon in a transcript.
>> However, not every exon of a transcript is necessarily translated, and I am
>> stuck at this point.
>> My goal is to check whether a given genomic coordinate on a chromosome
>> falls within a protein-coding (translated) region.
>>
>> Thanks in advance for your help.
>>
>> --
>> Best Regards,
>> Abhishek Niroula
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
>
>
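Kieron's pointer to coding_region_start() comes down to intersecting each exon with the transcript's coding region. Below is a minimal pure-Perl sketch of that intersection. It assumes the CDS bounds are the values of $transcript->coding_region_start() and $transcript->coding_region_end(), which, as far as I know, are genomic (slice) coordinates with start <= end on either strand. The helper coding_part is hypothetical and not part of the Ensembl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the coding part of an exon as (start, end),
# or an empty list if the exon lies entirely in UTR. All coordinates are
# genomic with start <= end, whatever the strand; $cds_start/$cds_end are
# the transcript's coding region bounds.
sub coding_part {
    my ( $exon_start, $exon_end, $cds_start, $cds_end ) = @_;

    # No overlap with the CDS: the exon is entirely non-coding
    return () if $exon_end < $cds_start || $exon_start > $cds_end;

    # Otherwise clip the exon to the CDS bounds
    my $start = $exon_start > $cds_start ? $exon_start : $cds_start;
    my $end   = $exon_end < $cds_end     ? $exon_end   : $cds_end;
    return ( $start, $end );
}

# Example: a CDS spanning 150..500 and three exons (made-up coordinates)
for my $exon ( [ 10, 50 ], [ 100, 200 ], [ 300, 400 ] ) {
    my @cds = coding_part( @$exon, 150, 500 );
    print "$exon->[0]-$exon->[1]: ",
        ( @cds ? "coding $cds[0]-$cds[1]" : "non-coding" ), "\n";
}
```

With real Ensembl objects, looping over $transcript->get_all_Exons() and calling $exon->coding_region_start($transcript) and $exon->coding_region_end($transcript) should give the same clipped bounds (undef for UTR-only exons), and $transcript->get_all_translateable_Exons() returns the already clipped exons.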



-- 
Best Regards,
Abhishek Niroula
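For the original goal of testing whether a genomic coordinate is protein-coding, one simple approach is to collect the coding segments of the transcripts of interest and test membership. The sketch below assumes each segment is a [chromosome, start, end] triple (for instance built from get_all_translateable_Exons() together with the slice's seq_region_name()); the function name in_coding_region is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if position $pos on chromosome $chr lies
# inside any coding segment, otherwise 0. Each segment is
# [chromosome, start, end] with start <= end (inclusive coordinates).
sub in_coding_region {
    my ( $chr, $pos, $segments ) = @_;
    for my $seg (@$segments) {
        my ( $s_chr, $s_start, $s_end ) = @$seg;
        return 1 if $s_chr eq $chr && $pos >= $s_start && $pos <= $s_end;
    }
    return 0;
}

# Example: two coding segments on chromosome 17 (made-up coordinates)
my @segments = ( [ '17', 100, 200 ], [ '17', 300, 400 ] );
for my $pos ( 150, 250, 400 ) {
    printf "17:%d %s\n", $pos,
        in_coding_region( '17', $pos, \@segments ) ? 'coding' : 'not coding';
}
```

A linear scan is fine for a handful of transcripts; for genome-wide queries, sorting the segments per chromosome and using a binary search or an interval tree would scale better.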

