David Tamborero david.tamborero at gmail.com
Fri Jun 7 10:17:59 BST 2019

Hi all,

I'm working with VEP (command line) v95 to analyse large VCF files and
generate tab-delimited VEP output (--tab).

I've realized that the output does not contain all the entries I'm
feeding to the tool. I've run several tests with VCF files of 1K, 5K,
10K, 20K, 50K and 100K entries, and it looks to me as though VEP reads a
maximum of about 16.8K variants (see the VEP stats output below).
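For anyone who wants to repeat the comparison, here is a minimal Python sketch of how the input and output counts can be checked. It assumes the usual VCF convention that header lines start with '#' and every other non-empty line is one record. For the --tab output it assumes that comment/header lines also start with '#' and that the first column holds the uploaded variant identifier. Because one variant can produce several rows (one per overlapping feature), the script counts distinct identifiers rather than rows. If the input IDs are not unique, this will undercount.

```python
import gzip
import sys


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_vcf_records(path):
    """Count VCF data lines: non-empty lines that do not start with '#'."""
    n = 0
    with _open_text(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                n += 1
    return n


def count_tab_variants(path):
    """Count distinct first-column values in a VEP --tab file.

    Assumption: comment/header lines start with '#', and the first column
    is the uploaded variant identifier, which may repeat across rows.
    """
    seen = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            seen.add(line.split("\t", 1)[0])
    return len(seen)


if __name__ == "__main__":
    vcf_path, tab_path = sys.argv[1], sys.argv[2]
    print("VCF records:", count_vcf_records(vcf_path))
    print("Distinct variants in VEP output:", count_tab_variants(tab_path))
```

Usage (the file names are placeholders): `python count_check.py input.vcf vep_output.txt`.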

I was not aware that such a limit existed, nor can I find any flag
related to this issue. I feel I'm missing something important, but I
don't know what; sorry in advance if so.

This is the command I'm running:

./vep -i blablabla.vcf -o blbabla.txt --tab --warning_file blabla.txt \
  --format vcf --cache --offline --force_overwrite --hgvs --symbol \
  --stats_file blablabla.txt --stats_text

release 95
sub 4f834538054c1aee24098c72f31f92d4c5aa303b

And here is the stats file content for the run with the 100K-entry VCF file:
[VEP run statistics]
VEP version (API) 95 (95)
Annotation sources Cache: /home/david/.vep/homo_sapiens/95_GRCh37
Species homo_sapiens
Command line options --cache --dir /home/david/.vep --force_overwrite
--format vcf --hgvs --input_file /home/david/Desktop/tmp/tmp/hq.vcf
--offline --output_file /home/david/Desktop/tmp/tmp/vep.txt --stats_file
/home/david/Desktop/tmp/tmp/stats_file.txt --stats_text --symbol --tab
--warning_file /home/david/Desktop/tmp/tmp/vep_warnings.txt
Start time 2019-06-07 10:50:24
End time 2019-06-07 10:59:46
Run time 562 seconds
Input file /home/david/Desktop/tmp/tmp/hq.vcf
Output file /home/david/Desktop/tmp/tmp/vep.txt

[General statistics]
Lines of input read 16856
Variants processed 16855
Variants filtered out 0
Novel / existing variants -
Overlapped genes 2809
Overlapped transcripts 15672
Overlapped regulatory features -
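To quantify the gap, here is a rough Python sketch that pulls the integer counters out of a --stats_text report like the one above and subtracts "Variants processed" from the number of records expected in the input. It assumes each counter sits on its own line as a label followed by whitespace and a number (tab-separated in the file, as far as I can tell). Counters shown as '-' are skipped. The helper names and the subtraction are my own illustration, not part of VEP.

```python
import re

# Counter labels copied from the [General statistics] section above.
STATS_KEYS = ("Lines of input read", "Variants processed", "Variants filtered out")


def parse_vep_stats(text):
    """Return {label: int} for the known counters found in a stats text.

    Assumes lines of the form '<label><whitespace><integer>'; labels whose
    value is '-' or otherwise non-numeric are left out.
    """
    out = {}
    for line in text.splitlines():
        line = line.strip()
        for key in STATS_KEYS:
            m = re.match(re.escape(key) + r"\s+(\d+)$", line)
            if m:
                out[key] = int(m.group(1))
    return out


def missing_variants(stats_text, expected_records):
    """Expected input records minus the 'Variants processed' counter."""
    stats = parse_vep_stats(stats_text)
    return expected_records - stats.get("Variants processed", 0)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:
        text = fh.read()
    print(parse_vep_stats(text))
    print("Missing:", missing_variants(text, int(sys.argv[2])))
```

With the counts above and a file of exactly 100,000 records, this would report 100000 - 16855 = 83145 variants unaccounted for.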
