[ensembl-dev] dev-owner at ensembl.org

Helen Schuilenburg helens at ebi.ac.uk
Fri Jun 7 11:39:14 BST 2019

Hi David

VEP can run with 100K variants. The VEP publication (McLaren W et al. 
2016, doi:10.1186/s13059-016-0974-4) gives examples of variant counts 
and run times, e.g. for an input of 4,474,140 variants.

Were any warnings reported in your warnings file (--warning_file)?

From the stats file, there may be a problem with your input file 
causing VEP to stop processing at line 16856 of your input file.
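
One way to check this is to look at the lines around 16856 of the input VCF. The sketch below is only an illustration, not part of VEP: the function names and the "fewer than 8 tab-separated columns" heuristic are my own assumptions, based on the VCF format requiring 8 fixed columns. I am also not certain whether the "Lines of input read" count includes header lines, so it prints a small window rather than a single line.

```python
def inspect_vcf_lines(path, center, context=2):
    """Return (line_number, n_columns, text) for the lines around `center`.

    Line numbers are 1-based. `center` and `context` are illustrative
    parameters, not VEP options.
    """
    start = max(1, center - context)
    stop = center + context
    rows = []
    # errors="replace" keeps reading even if the file has odd bytes in it.
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if number < start:
                continue
            if number > stop:
                break
            text = line.rstrip("\n")
            rows.append((number, len(text.split("\t")), text))
    return rows


def suspicious(rows, min_columns=8):
    """Return non-header rows with fewer tab-separated columns than VCF needs."""
    return [row for row in rows
            if not row[2].startswith("#") and row[1] < min_columns]


# Example use (path taken from the stats output quoted below):
# rows = inspect_vcf_lines("/home/david/Desktop/tmp/tmp/hq.vcf", 16856, context=3)
# for row in rows:
#     print(row)
# print(suspicious(rows))
```

A blank line, a line split by a stray newline, or spaces used in place of tabs would all show up here with a low column count.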


On 07/06/2019 10:17, David Tamborero wrote:
> Hi all,
> I'm working with VEP (command line) v95 to analyse (large) VCF files 
> and generate VEP --tab output files.
> I've realized that the output does not contain all the entries that I 
> am feeding to the tool. I've run several tests with VCF files 
> of 1K, 5K, 10K, 20K, 50K and 100K entries, and it looks to me as if VEP 
> reads 'a maximum' of about 16.8K variants (see the VEP stats output below).
> I was not aware that such a limit existed, nor can I 
> find any flag related to this issue. I feel that I'm missing a big 
> point, but I do not know which one; sorry in advance if so.
> command:
> ./vep -i blablabla.vcf -o blbabla.txt --tab --warning_file blabla.txt 
> --format vcf --cache --offline --force_overwrite --hgvs --symbol 
> --stats_file blablabla.txt --stats_text
> version:
> release 95
> sub 4f834538054c1aee24098c72f31f92d4c5aa303b
> and the stats file content for the run with a VCF file of 100K entries:
>  [VEP run statistics]
> VEP version (API) 95 (95)
> Annotation sources Cache: /home/david/.vep/homo_sapiens/95_GRCh37
> Species homo_sapiens
> Command line options --cache --dir /home/david/.vep --force_overwrite 
> --format vcf --hgvs --input_file /home/david/Desktop/tmp/tmp/hq.vcf 
> --offline --output_file /home/david/Desktop/tmp/tmp/vep.txt 
> --stats_file /home/david/Desktop/tmp/tmp/stats_file.txt --stats_text 
> --symbol --tab --warning_file /home/david/Desktop/tmp/tmp/vep_warnings.txt
> Start time 2019-06-07 10:50:24
> End time 2019-06-07 10:59:46
> Run time 562 seconds
> Input file /home/david/Desktop/tmp/tmp/hq.vcf
> Output file /home/david/Desktop/tmp/tmp/vep.txt
> [General statistics]
> Lines of input read 16856
> Variants processed 16855
> Variants filtered out 0
> Novel / existing variants -
> Overlapped genes 2809
> Overlapped transcripts 15672
> Overlapped regulatory features -
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
