[ensembl-dev] dev-owner at ensembl.org

David Tamborero david.tamborero at gmail.com
Fri Jun 7 13:02:07 BST 2019


Hi Helen, wow, thanks for the quick reply, you guys are awesome!

There was no warnings file created, that's why I was assuming that the VEP
run was OK and it just stopped at some point 'naturally'.

But after your reply, I took lines 16500-17000 (to catch the supposedly
wrong one) and ran them through VEP:

-> In the output tab file, there are 465 variants

--> The stats file shows that 465 (not 500) variants are read, 464
processed, and 0 filtered out:

Lines of input read 465
Variants processed 464
Variants filtered out 0

-> The ids of the variants that are in the input but not in the output
(note that the id is the variant number) run from 16965 to 17000. This
would be compatible with VEP finding a line that it does not like and
stopping, as you pointed out.
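
For anyone wanting to repeat this comparison, here is a minimal sketch of
how the missing ids can be listed. It assumes the VCF ID column (3rd field)
is what ends up in the first column of the tab output, as it did in my run;
the function name missing_ids is just a hypothetical helper, not part of
VEP.

```python
def missing_ids(vcf_lines, vep_tab_lines):
    """Return input IDs (VCF column 3) that never appear in the first
    column of the VEP tab output, in input order.

    Assumption: VEP reports the VCF ID field as the uploaded variation.
    """
    seen = set()
    for line in vep_tab_lines:
        # Skip '##' metadata, the '#Uploaded_variation' header, blank lines.
        if line.startswith("#") or not line.strip():
            continue
        seen.add(line.split("\t", 1)[0].strip())

    missing = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a usable VCF record
        if fields[2] not in seen:
            missing.append(fields[2])
    return missing
```

Both arguments can be open file handles, e.g.
missing_ids(open("input.vcf"), open("vep_output.txt")), where the two file
names are placeholders. A contiguous block of missing ids starting at one
record suggests the run stopped there rather than skipping single variants.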

-> The first 'not mapped' variant is (GRCh37)
2  167099158  16965     A   A
which of course makes no sense to consider as a variant, but I do not
know if it's such a weird thing that it should make everything stop
(indeed, in the web VEP this is not mapped either)
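
Entries like this one can be spotted before running VEP. The following is
only a rough sketch, assuming whitespace-separated records with REF and ALT
in the 4th and 5th fields as in the line above; ref_equals_alt is a
hypothetical helper name, and flagging any ALT allele equal to REF is my
own choice of check, not something VEP does.

```python
def ref_equals_alt(vcf_lines):
    """Return (line_number, variant_id) for records where an ALT allele
    is identical to REF, e.g. '2  167099158  16965  A  A'."""
    hits = []
    for number, line in enumerate(vcf_lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # too short to hold CHROM POS ID REF ALT
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")  # ALT may list several alleles
        if ref in alts:
            hits.append((number, fields[2]))
    return hits
```

Called on an open file handle, e.g. ref_equals_alt(open("input.vcf")) with
a placeholder file name, it returns the line numbers to inspect or remove
before feeding the file to VEP.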

--> And there is no warning file

--> I removed this 'conflicting' entry and VEP then ran the 499
variants

So I think this is a bug since, as far as I know, the normal behaviour
when VEP encounters a line that it does not like is to ignore it and
generate the corresponding warning file, right?




On Fri, 7 Jun 2019 at 12:39, Helen Schuilenburg (<helens at ebi.ac.uk>)
wrote:

> Hi David
>
> VEP can run with 100K variants. The VEP publication (McLaren W et al.
> 2016, doi:10.1186/s13059-016-0974-4) gives examples of numbers of
> variants and timings, e.g. for 4,474,140 variants.
>
> Were any warnings reported in your warnings_file ( --warning_file
> /home/david/Desktop/tmp/tmp/vep_warnings.txt)?
>
> From the stats file, there may be a problem with your input file causing
> it to stop processing at line 16856 of your input file.
>
> Regards
> Helen
> On 07/06/2019 10:17, David Tamborero wrote:
>
> Hi all,
>
> I'm working with VEP (command line) v95 to analyse (large) VCF files and
> generate VEP tab files.
>
> I've realized that the output does not contain all the entries that I'm
> feeding to the tool. I've been doing several tests with VCF files of 1K,
> 5K, 10K, 20K, 50K and 100K entries, and it looks to me like VEP reads 'a
> maximum' of 16.8K variants (see VEP stats output below).
>
> I was not aware that such a limit existed, nor am I able to find any
> flag regarding this issue. I feel that I'm missing something obvious, but
> I do not know what. Sorry in advance if so.
>
> command:
> ./vep -i blablabla.vcf -o blbabla.txt --tab --warning_file blabla.txt
> --format vcf --cache --offline --force_overwrite --hgvs --symbol
> --stats_file blablabla.txt --stats_text
>
> version:
> release 95
> sub 4f834538054c1aee24098c72f31f92d4c5aa303b
>
> and the stats file content for the run with a vcf file with 100K entries :
>  [VEP run statistics]
> VEP version (API) 95 (95)
> Annotation sources Cache: /home/david/.vep/homo_sapiens/95_GRCh37
> Species homo_sapiens
> Command line options --cache --dir /home/david/.vep --force_overwrite
> --format vcf --hgvs --input_file /home/david/Desktop/tmp/tmp/hq.vcf
> --offline --output_file /home/david/Desktop/tmp/tmp/vep.txt --stats_file
> /home/david/Desktop/tmp/tmp/stats_file.txt --stats_text --symbol --tab
> --warning_file /home/david/Desktop/tmp/tmp/vep_warnings.txt
> Start time 2019-06-07 10:50:24
> End time 2019-06-07 10:59:46
> Run time 562 seconds
> Input file /home/david/Desktop/tmp/tmp/hq.vcf
> Output file /home/david/Desktop/tmp/tmp/vep.txt
>
> [General statistics]
> Lines of input read 16856
> Variants processed 16855
> Variants filtered out 0
> Novel / existing variants -
> Overlapped genes 2809
> Overlapped transcripts 15672
> Overlapped regulatory features -
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/