[ensembl-dev] getting gene exons and transcripts that overlap only the original slice...

Steve Moss gawbul at gmail.com
Wed Jan 12 11:42:02 GMT 2011


Hi Andrea,

I'm not sure about the solution to your problem yet; I will have to do some
investigating. In the meantime, I can recommend a way to reduce your dataset.
You are currently retrieving all exons for each transcript of each gene, which
returns a redundant set of exons, because exons can be shared between
transcripts. A better way would be either to use the canonical transcript,
e.g. $c_transcript = $gene->canonical_transcript(); and then
$c_transcript->get_all_Exons(); or, even better, just to call
$gene->get_all_Exons(). Or do you need the transcripts?
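
As a rough, untested sketch of the two options just mentioned: the helper
name exons_for_gene and its $canonical_only flag are made up for
illustration, and $gene is assumed to be a Bio::EnsEMBL::Gene such as the
ones returned by $slice->get_all_Genes().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API).
# $gene is assumed to be a Bio::EnsEMBL::Gene, or any object that offers
# canonical_transcript() and get_all_Exons().
# Returns an array ref of exons.
sub exons_for_gene {
    my ( $gene, $canonical_only ) = @_;

    # Option 1: only the exons of the canonical transcript.
    return $gene->canonical_transcript()->get_all_Exons()
      if $canonical_only;

    # Option 2: the gene's own exon list, which avoids walking every
    # transcript and collecting shared exons more than once.
    return $gene->get_all_Exons();
}
```

Either call replaces the nested transcript/exon loops when the transcripts
themselves are not needed, e.g. exons_for_gene($gene) inside the gene loop.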

Perhaps I am missing the point, but I wonder whether it could be done by
comparing seq_region_name and checking the coordinates for overlaps? I find
this link particularly useful when trying to understand the more intricate
functions of the API:
http://www.ensembl.org/info/docs/Pdoc/ensembl/index.html
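
The overlap check I have in mind would look roughly like the sketch below.
It is untested against a real database, and features_overlapping is a
made-up helper name. It assumes each feature responds to seq_region_name(),
seq_region_start() and seq_region_end(), as Bio::EnsEMBL::Feature objects
do. Note that it filters after retrieval, so it does not by itself avoid
fetching the features.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API).
# Returns an array ref of the features in @$features that overlap the
# inclusive range [$start, $end] on the sequence region $seq_region.
sub features_overlapping {
    my ( $features, $seq_region, $start, $end ) = @_;

    my @hits = grep {
             $_->seq_region_name() eq $seq_region
          && $_->seq_region_start() <= $end
          && $_->seq_region_end() >= $start
    } @{$features};

    return \@hits;
}
```

For the one bp slice in your example this would be called as
features_overlapping( $gene->get_all_Exons(), '9', 21816758, 21816758 ).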

Cheers,

Steve

On 12 January 2011 10:43, <dev-request at ensembl.org> wrote:
>
> Date: Tue, 11 Jan 2011 19:38:02 +0000
> From: Andrea Edwards <edwardsa at cs.man.ac.uk>
> Subject: [ensembl-dev] getting gene exons and transcripts that overlap
>        only the original slice
> To: Dev at ensembl.org
> Message-ID: <4D2CB19A.7010501 at cs.man.ac.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello
>
> I have the code below, taken from the core API tutorial, which gets me
> all the exons and transcripts for the gene(s) that overlap a slice.
>
> I was hoping for an easy way to get only those features of the gene that
> overlap the original one bp slice; this code gets all exons and
> transcripts associated with the gene.
>
> I thought you might be able to call 'get_all_Object' methods with a
> parameter which represents a region of sequence overlap but it seems not.
> I also thought they might be filtered automatically based on the
> underlying slice but it seems not.
>
> Naturally I can filter the features in the list based on their start and
> end positions, but for speed it would be better not to retrieve them at all.
> I have a lot of data, so speed is important. Please can you advise on the
> best way to do this?
>
> $slice = $slice_adaptor->fetch_by_region( 'chromosome', '9', 21816758,
> 21816758 );
>
> my $genes = $slice->get_all_Genes();
> while ( my $gene = shift @{$genes} ) {
>     my $gstring = feature2string($gene);
>     print "$gstring\n";
>
>     my $transcripts = $gene->get_all_Transcripts();
>     while ( my $transcript = shift @{$transcripts} ) {
>         my $tstring = feature2string($transcript);
>         print "\t$tstring\n";
>
>         foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
>             my $estring = feature2string($exon);
>             print "\t\t$estring\n";
>         }
>     }
> }
>
> print "done\n";
>
> Many thanks
>
>
>
>
-- 
Kindest regards,

Steve Moss
http://stevemoss.ath.cx