[ensembl-dev] VEP Cache

Konrad Karczewski konradk at broadinstitute.org
Mon Apr 7 03:21:04 BST 2014


Hello!

I've been developing a loss-of-function plugin for VEP and have run into an implementation issue with the VEP cache. Specifically, when accessing transcripts via the API with the --offline flag set, the cache does not seem to store intronic sequences. When I run the code below without the --offline flag, it works as expected. With --offline, the length prints correctly, but the sequence is "N" repeated length times.

# $transcript_variation is provided from VEP plugin "run" subroutine
my @gene_introns = @{$transcript_variation->transcript->get_all_Introns()};
my $intron_number = 0;
print length($gene_introns[$intron_number]->seq()) . "\n"; # Returns correct length for first intron of the transcript
print $gene_introns[$intron_number]->seq() . "\n"; # Returns "N"*length(intron)

I can rebuild my cache if need be, but I was wondering whether there are any plans to integrate intron (and exon) sequences into the cache. (It seems like this should be reasonably straightforward, since VEP requires the genome FASTA anyway, but I'm not sure about the details of how this part is implemented.) This would be very helpful for a number of reasons, including checking that an intron has a proper sequence (i.e. a canonical splice motif).
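
To make the use case concrete, here is a rough sketch of the kind of check I have in mind. It works on plain strings, so it does not depend on the cache. The helper names is_masked and has_canonical_splice_motif are made up for illustration and are not part of the Ensembl API, and I'm assuming the intron sequence is given 5' to 3' on the transcript strand, with "canonical" meaning the common GT...AG motif only.

```perl
use strict;
use warnings;

# Hypothetical helper (not an Ensembl API call): true if the sequence is
# nothing but N's, which is what I get back for introns with --offline.
sub is_masked {
    my ($seq) = @_;
    return (defined $seq && $seq =~ /^N+$/i) ? 1 : 0;
}

# Hypothetical helper: true if the intron starts with GT and ends with AG.
# Assumes the sequence is 5' to 3' on the transcript strand.
sub has_canonical_splice_motif {
    my ($seq) = @_;
    return 0 if !defined $seq || length($seq) < 4;
    my $donor    = uc substr($seq, 0, 2);   # first two bases of the intron
    my $acceptor = uc substr($seq, -2);     # last two bases of the intron
    return ($donor eq 'GT' && $acceptor eq 'AG') ? 1 : 0;
}

# Made-up example sequences, not real introns.
for my $seq ('GTAAGTCCTTTTCAG', 'NNNNNNNNNN', 'GCAAGTCCTTTTCAG') {
    my $status = is_masked($seq)                  ? 'masked (no sequence available)'
               : has_canonical_splice_motif($seq) ? 'canonical GT...AG'
               :                                    'non-canonical';
    print "$seq\t$status\n";
}
```

In the plugin, the string passed in would be the result of $gene_introns[$intron_number]->seq(), and the is_masked check would let the plugin skip the motif test when only N's come back instead of reporting a false non-canonical call.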

(This happens with API versions 74 and 75.)

Thanks!
-Konrad


