[ensembl-dev] Performance issues getting spliced sequence from Bio::EnsEMBL::Transcript

Wed Nov 12 22:39:13 GMT 2014

I'm working on a VEP plugin where I need to look at a section of cDNA
around the variant.

In a previous plugin, where I needed to do something similar with genomic
DNA, I was able to get a slice from the VariationFeature and subslice it
like this:

my $subseq = $vf->slice->sub_Slice($start, $end)->seq;

That worked well and was fast.

I can't find anything similar for the cDNA so I'm getting the spliced
sequence from the transcript and then using substr() to do what sub_Slice
did above.

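my $cdna_seq = $transcript->spliced_seq;
# substr() takes (offset, length), not (start, end); this assumes $start and
# $end are 1-based and inclusive, as they are for sub_Slice
my $subseq = substr($cdna_seq, $start - 1, $end - $start + 1);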

It works well enough, but performance is too poor to be useful, taking 2 or
3 seconds to get $subseq per transcript. I'm wondering if I'm going about
things the wrong way and am skipping a cache or something with the methods
I'm using.

Any ideas for how I can get better performance? Is there a better way to
get a chunk of a transcript's spliced sequence?
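
To make the question concrete, here is a minimal sketch of the kind of
caching I have in mind, so that spliced_seq is only called once per
transcript even when several variants hit it. The %cdna_cache hash, the
cdna_subseq helper and the MockTranscript stand-in are all made up for
illustration (the mock is only there so the snippet runs without the Ensembl
API); it assumes $start and $end are 1-based, inclusive cDNA coordinates, and
I have not checked whether it helps with the real objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::EnsEMBL::Transcript so this runs on its own.
# It counts how often spliced_seq is called.
package MockTranscript;
sub new {
    my ($class, %args) = @_;
    return bless { %args, calls => 0 }, $class;
}
sub stable_id   { return $_[0]->{stable_id}; }
sub spliced_seq { my $self = shift; $self->{calls}++; return $self->{seq}; }

package main;

# Hypothetical cache: transcript stable ID => spliced cDNA sequence
my %cdna_cache;

# Return cDNA positions $start..$end (assumed 1-based, inclusive),
# fetching the spliced sequence only the first time a transcript is seen.
sub cdna_subseq {
    my ($transcript, $start, $end) = @_;
    my $id = $transcript->stable_id;
    $cdna_cache{$id} //= $transcript->spliced_seq;
    return substr($cdna_cache{$id}, $start - 1, $end - $start + 1);
}

my $t = MockTranscript->new(stable_id => 'ENST_EXAMPLE', seq => 'ACGTACGTAC');
print cdna_subseq($t, 3, 6), "\n";   # GTAC
print cdna_subseq($t, 1, 4), "\n";   # ACGT
print $t->{calls}, "\n";             # 1 (second lookup came from the cache)
```

This only helps if the same transcript comes up more than once in a run; it
does nothing for the cost of the first spliced_seq call.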

Thanks,
Matt Wood
Codified Genomics