[ensembl-dev] VEP Cache

Will McLaren wm2 at ebi.ac.uk
Mon Apr 7 10:41:34 BST 2014


Hi Konrad,

Assuming you have the FASTA file available and working, retrieving intron
sequences with --offline should work; at VEP startup you should see messages
like these:

2014-04-07 10:32:11 - Read existing cache info
2014-04-07 10:32:12 - Auto-detected FASTA file in cache directory
2014-04-07 10:32:13 - Checking/creating FASTA index

or only the final message if you point VEP to a FASTA file manually with
--fasta.

I just tested this with a rudimentary plugin and I can retrieve intron
sequences correctly, with no Ns. Let me know if you still have any problems.
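A plugin can check for itself whether real sequence is coming back by testing for the symptom reported in the quoted question: a sequence made up entirely of N characters. This is a minimal sketch; `seq_is_all_n` is a hypothetical helper, not part of the VEP or Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VEP): returns 1 if the sequence is
# non-empty and consists only of N/n characters, i.e. the all-N result
# described in the question when no genome sequence was available.
sub seq_is_all_n {
    my ($seq) = @_;
    return 0 unless defined $seq && length $seq;
    return $seq =~ /\A[Nn]+\z/ ? 1 : 0;
}

# Example: flag a placeholder sequence versus a real one.
for my $seq ('NNNNNNNNNN', 'GTGAGTTTCAG') {
    print seq_is_all_n($seq) ? "$seq: no real sequence\n" : "$seq: sequence OK\n";
}
```

In a plugin, you might call this on the output of an intron's `seq()` and warn once if it returns true, which would suggest the FASTA file is not being picked up.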

Plugin code:

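package IntronSeq;

use strict;
use warnings;

use Bio::EnsEMBL::Variation::Utils::BaseVepPlugin;
use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepPlugin);

# Called for each transcript variation allele; prints the sequence of the
# transcript's first intron to STDERR. Note: this assumes the transcript
# has at least one intron.
sub run {
  my ($self, $tva) = @_;
  print STDERR $tva->transcript->get_all_Introns->[0]->seq()."\n";
  return {};  # no extra output fields
}

1;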

Output:

> perl variant_effect_predictor.pl -i example.vcf -force -plugin IntronSeq -offline -no_progress
2014-04-07 10:39:27 - Read existing cache info
2014-04-07 10:39:27 - Auto-detected FASTA file in cache directory
2014-04-07 10:39:27 - Checking/creating FASTA index
2014-04-07 10:39:27 - Loaded plugin: IntronSeq
2014-04-07 10:39:27 - Starting...
2014-04-07 10:39:27 - Detected format of input file as vcf
2014-04-07 10:39:27 - Read 173 variants into buffer
2014-04-07 10:39:27 - Reading transcript data from cache and/or database
2014-04-07 10:39:28 - Retrieved 3097 transcripts (0 mem, 3162 cached, 0 DB, 65 duplicates)
2014-04-07 10:39:28 - Analyzing chromosome 21
2014-04-07 10:39:28 - Analyzing variants
2014-04-07 10:39:28 - Calculating consequences
Plugin 'IntronSeq' went wrong: Can't call method "seq" on an undefined value at /nfs/users/nfs_w/wm2/.vep/Plugins/IntronSeq.pm line 48, <GEN0> line 175.
GTGAGTTTCAGAGGCCGTAGGGACAGGGAGCGAGGCCTAGATAGTGGTGTCTGTCTAGATTGGGTCTGAGGCGGGGCCGGGGAGGTCCCGCGGGGCAGAGGAAGGAGGAGGGTTTCTTAGTCCCTCCGCGGCGGTCGCTCTTGCACAGCTTGGGAGGACTAATTTATGGGAACGAGGGTCTGGCGGAGGGCAGGGGCAAGGGCAGGGGTCGGGGCCAGGGGTCGGAGCCAGGCCGCGGGAGGAGCTTGGGCCCGCCTCTGGGAAGCAGCGCACGTTCCGTGCACATCTGTCCATGTCTTCCCAAGGAATACTCGTACTTGCCTTGGCAGGTTCCCTGATTTGGCCTTTGGGATATAAACTCAGCATTTCTCATTCTGGATATTGATAGTTTCGGTGTGGGACCTTTGGTTTCCTGAAATTTTCTTGTTTTTCTTCAGACCCTGTCAAACCGACCACTTTGTTCACCTTCCCAATGACTCTAGTCCAGTTTTGACTCCGTTTCCTGGTTACTTTTTGCCCCTTATTGTAAAGCACTGATTGGAAACACGACACAGGAAATTGGTGGGAAATAGCGATCTGATGTGAAAGAGCCAAATTTAAAAGTAGAGGCACGTATCTGGGCCAGCTCTGTTTCTTCCGCTGGTGTTTGTTAATATTACAAATTGGTTTAATTTTACCTCTGAGCGCACTTTTGGCAGTACGTTAATCATTTTTTCAGTCTTCATATTTATTGTAACTTCTCCACAG
GTGAGTTTCAGAGGCCGTAGGGACAGGGAGCGAGGCCTAGATAGTGGTGTCTGTCTAGATTGGGTCTGAGGCGGGGCCGGGGAGGTCCCGCGGGGCAGAGGAAGGAGGAGGGTTTCTTAGTCCCTCCGCGGCGGTCGCTCTTGCACAGCTTGGGAGGACTAATTTATGGGAACGAGGGTCTGGCGGAGGGCAGGGGCAAGGGCAGGGGTCGGGGCCAGGGGTCGGAGCCAGGCCGCGGGAGGAGCTTGGGCCCGCCTCTGGGAAGCAGCGCACGTTCCGTGCACATCTGTCCATGTCTTCCCAAGGAATACTCGTACTTGCCTTGGCAGGTTCCCTGATTTGGCCTTTGGGATATAAACTCAGCATTTCTCATTCTGGATATTGATAGTTTCGGTGTGGGACCTTTGGTTTCCTGAAATTTTCTTGTTTTTCTTCAGACCCTGTCAAACCGACCACTTTGTTCACCTTCCCAATGACTCTAGTCCAGTTTTGACTCCGTTTCCTGGTTACTTTTTGCCCCTTATTGTAAAGCACTGATTGGAAACACGACACAGGAAATTGGTGGGAAATAGCGATCTGATGTGAAAGAGCCAAATTTAAAAGTAGAGGCACGTATCTGGGCCAGCTCTGTTTCTTCCGCTGGTGTTTGTTAATATTACAAATTGGTTTAATTTTACCTCTGAGCGCACTTTTGGCAGTACGTTAATCATTTTTTCAGTCTTCATATTTATTGTAACTTCTCCACAG

etc etc
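The `Can't call method "seq" on an undefined value` error in the output above presumably comes from a transcript with no introns (for example a single-exon transcript): `get_all_Introns` then returns an empty list, so `->[0]` is undefined. A minimal sketch of a guard follows; `first_intron_seq` is a hypothetical helper, and `FakeIntron` is a stand-in so the example runs without the Ensembl API installed.

```perl
use strict;
use warnings;

# Stand-in for an intron object so this runs without Ensembl installed; in a
# real plugin the list would come from $tva->transcript->get_all_Introns.
package FakeIntron;
sub new { my ($class, $seq) = @_; return bless { seq => $seq }, $class }
sub seq { return $_[0]{seq} }

package main;

# Hypothetical helper: returns the first intron's sequence, or undef when
# the list of introns is missing or empty (e.g. a single-exon transcript).
sub first_intron_seq {
    my ($introns) = @_;
    return unless $introns && @$introns;
    return $introns->[0]->seq();
}

# Example: one transcript without introns, one with a single intron.
for my $introns ([], [FakeIntron->new('GTAAGTCCAG')]) {
    my $seq = first_intron_seq($introns);
    print defined $seq ? "$seq\n" : "no introns\n";
}
```

Inside the plugin's `run`, the print line would then become something like `my $seq = first_intron_seq($tva->transcript->get_all_Introns); print STDERR "$seq\n" if defined $seq;`, which skips intronless transcripts instead of dying.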

Regards

Will McLaren
Ensembl Variation


On 7 April 2014 03:21, Konrad Karczewski <konradk at broadinstitute.org> wrote:

> Hello!
>
> I've been developing a loss-of-function plugin for VEP and am having some
> implementation issues relating to the VEP cache. Specifically, when
> accessing transcripts via the API (with the --offline flag set) it seems
> the cache does not store intronic sequences. When I run the code below
> without the --offline flag, it works as expected. With --offline, the
> length prints correctly, but the sequence is just N repeated length times.
>
> # $transcript_variation is provided by the VEP plugin "run" subroutine
> my @gene_introns = @{$transcript_variation->transcript->get_all_Introns()};
> my $intron_number = 0;
> # Returns the correct length for the first intron of the transcript
> print length($gene_introns[$intron_number]->seq()) . "\n";
> # Returns "N" x length(intron)
> print $gene_introns[$intron_number]->seq() . "\n";
>
> I can rebuild my cache if need be, but I was wondering if there were any
> plans to integrate intron (and exon) sequence into the cache? (Seems like
> it should be reasonably straightforward, since VEP requires the genome
> fasta anyway, but I'm not sure about the details of how this part is
> implemented). This would be very helpful for a number of reasons, including
> detecting proper intron sequences (i.e. with a canonical splice motif).
>
> (This happens in API versions 74 and 75).
>
> Thanks!
> -Konrad
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
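For the canonical splice motif check mentioned in the question, one simple approach is to classify each intron by its first and last two bases. This is a minimal sketch assuming the sequence is in transcript orientation, as the intron sequences in the plugin output above appear to be (they begin with GT and end with AG). `splice_motif_class` and `is_canonical` are hypothetical helpers; the classes listed are the canonical GT-AG plus the well-known GC-AG and AT-AC variants.

```perl
use strict;
use warnings;

# Hypothetical helper: classify an intron by its terminal dinucleotides.
# Assumes $seq is given 5'->3' in transcript orientation.
sub splice_motif_class {
    my ($seq) = @_;
    return 'too_short' unless defined $seq && length($seq) >= 4;
    my $motif = uc(substr($seq, 0, 2)) . '-' . uc(substr($seq, -2));
    my %known = ('GT-AG' => 1, 'GC-AG' => 1, 'AT-AC' => 1);
    return $known{$motif} ? $motif : 'other';
}

# Hypothetical helper: true only for the canonical GT-AG motif.
sub is_canonical { return splice_motif_class($_[0]) eq 'GT-AG' ? 1 : 0 }

# Example: an all-N sequence (no real sequence available) falls into 'other'.
for my $seq (qw(GTAAGTTTCAG gcaagtcag ATATCCAC NNNNNNNN)) {
    printf "%s\t%s\n", $seq, splice_motif_class($seq);
}
```

In a plugin this could be applied to each element returned by `get_all_Introns`, after checking that the sequence is not all Ns.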

