[ensembl-dev] dev-owner at ensembl.org
Helen Schuilenburg
helens at ebi.ac.uk
Fri Jun 7 11:39:14 BST 2019
Hi David
VEP can handle 100K variants. The VEP publication (McLaren W et al.
2016, doi:10.1186/s13059-016-0974-4) gives examples of numbers of
variants and run times, e.g. for 4,474,140 variants.
Were any warnings reported in your warning file (--warning_file
/home/david/Desktop/tmp/tmp/vep_warnings.txt)?
Judging from the stats file, there may be a problem with your input file
that causes VEP to stop processing at line 16856.
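One way to check this is a small sketch like the one below. The function names `count_vcf_records` and `check_vcf_lines` are hypothetical helpers, not part of VEP. It also assumes a malformed data line (fewer than the eight mandatory tab-separated VCF columns, or a blank line) is a plausible culprit; that is a guess, not a diagnosis of your file.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a VCF before running VEP; not part of VEP.

# Count data records: every line that does not start with '#'.
# Compare this number with "Variants processed" in the VEP stats file.
count_vcf_records() {
  grep -vc '^#' "$1"
}

# Report data lines with fewer than the 8 mandatory tab-separated VCF columns
# (CHROM POS ID REF ALT QUAL FILTER INFO). Blank lines show up as 0 fields,
# and space-separated lines show up as 1 field.
check_vcf_lines() {
  awk -F'\t' '
    /^#/ { next }
    NF < 8 { printf "line %d: only %d tab-separated fields\n", NR, NF }
  ' "$1"
}
```

For example, `check_vcf_lines /home/david/Desktop/tmp/tmp/hq.vcf` lists suspect lines, and `sed -n '16850,16860p' /home/david/Desktop/tmp/tmp/hq.vcf` prints a window around the line where processing stopped. A window is used because it is not certain whether the stats count includes header lines.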
Regards
Helen
On 07/06/2019 10:17, David Tamborero wrote:
> Hi all,
>
> I'm working with the VEP command-line tool (v95) to analyse (large) VCF
> files and generate tab-delimited VEP output (--tab).
>
> I've realized that the output does not contain all the entries I'm
> feeding to the tool. I've run several tests with VCF files of
> 1K, 5K, 10K, 20K, 50K and 100K entries, and it looks to me as though VEP
> reads a maximum of about 16.8K variants (see the VEP stats output below).
>
> I was not aware that such a limit existed, and I cannot find any flag
> related to it. I feel that I'm missing a big point, but I do not know
> which one; sorry in advance if so.
>
> command:
> ./vep -i blablabla.vcf -o blbabla.txt --tab --warning_file blabla.txt \
>   --format vcf --cache --offline --force_overwrite --hgvs --symbol \
>   --stats_file blablabla.txt --stats_text
>
> version:
> release 95
> sub 4f834538054c1aee24098c72f31f92d4c5aa303b
>
> And here is the stats file content for the run with a 100K-entry VCF file:
> [VEP run statistics]
> VEP version (API) 95 (95)
> Annotation sources Cache: /home/david/.vep/homo_sapiens/95_GRCh37
> Species homo_sapiens
> Command line options --cache --dir /home/david/.vep --force_overwrite
> --format vcf --hgvs --input_file /home/david/Desktop/tmp/tmp/hq.vcf
> --offline --output_file /home/david/Desktop/tmp/tmp/vep.txt
> --stats_file /home/david/Desktop/tmp/tmp/stats_file.txt --stats_text
> --symbol --tab --warning_file /home/david/Desktop/tmp/tmp/vep_warnings.txt
> Start time 2019-06-07 10:50:24
> End time 2019-06-07 10:59:46
> Run time 562 seconds
> Input file /home/david/Desktop/tmp/tmp/hq.vcf
> Output file /home/david/Desktop/tmp/tmp/vep.txt
>
> [General statistics]
> Lines of input read 16856
> Variants processed 16855
> Variants filtered out 0
> Novel / existing variants -
> Overlapped genes 2809
> Overlapped transcripts 15672
> Overlapped regulatory features -