[ensembl-dev] dev-owner at ensembl.org

David Tamborero david.tamborero at gmail.com
Fri Jun 7 18:44:49 BST 2019


Hi Helen

FYI, if I'm not wrong at this time of the day, the VEP run (--tab output)
stops (without any message or warning file being created) as soon as a
line with REF==ALT is reached, so it does not seem to need to 'accumulate'
a number of these 'non-variant' entries.
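As a workaround until this is fixed, and consistent with the test further down the thread (removing the REF==ALT entry let VEP process all the variants), such records can be dropped before running VEP. Below is a minimal Python sketch, assuming a tab-separated VCF. `drop_non_variant_records` and `filter_vcf_file` are hypothetical helpers, not part of VEP.

```python
from typing import Iterable, Iterator


def drop_non_variant_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping data records in which every ALT allele equals REF.

    Header lines (starting with '#') pass through unchanged. Multi-allelic
    records such as REF=A, ALT=A,G are kept because at least one ALT differs.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            # Too few columns to judge; keep the line so the problem stays visible.
            yield line
            continue
        ref = fields[3].upper()
        alts = [a.upper() for a in fields[4].split(",")]
        if all(a == ref for a in alts):
            continue  # e.g. "2  167099158  16965  A  A" is dropped here
        yield line


def filter_vcf_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path without REF==ALT records (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_non_variant_records(src))
```

The filtered file can then be passed to the same `./vep -i ... --tab` command as before.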

On the other hand, as you say, when using VCF as output with the
--allow_non_variant flag, entries such as

12 111352091 16500 C C . PASS

are included in the output without any annotation.
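To list which records came through unannotated in that VCF output, a minimal Python sketch could look for data lines without a consequence entry in INFO. It assumes VEP writes its annotations to the `CSQ` INFO key (the default unless `--vcf_info_field` is changed). `unannotated_record_ids` is a hypothetical helper.

```python
from typing import Iterable, List


def unannotated_record_ids(vcf_lines: Iterable[str], csq_key: str = "CSQ") -> List[str]:
    """Return the ID column of VCF data records whose INFO lacks `csq_key`."""
    ids = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Records such as "12 111352091 16500 C C . PASS" may have no INFO column.
        info = fields[7] if len(fields) > 7 else "."
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if csq_key not in keys:
            ids.append(fields[2] if len(fields) > 2 else ".")
    return ids
```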

Thanks!
And have a nice weekend,
d
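The check described further down in the thread (finding which input IDs never reach the --tab output) can be sketched in Python as below. It assumes the first column of the --tab output (Uploaded_variation) carries the VCF ID, which holds here because each input line has a unique numeric ID. The function names are hypothetical. A set is used for the output side because one input variant can produce several output rows (one per overlapping transcript).

```python
from typing import Iterable, List, Set


def vcf_ids(vcf_lines: Iterable[str]) -> List[str]:
    """IDs (column 3) of VCF data records, in input order."""
    ids = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        ids.append(line.rstrip("\n").split("\t")[2])
    return ids


def tab_output_ids(tab_lines: Iterable[str]) -> Set[str]:
    """First-column values of data rows in VEP --tab output ('#' lines skipped)."""
    return {
        line.split("\t", 1)[0].rstrip("\n")
        for line in tab_lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_output(vcf_lines: Iterable[str], tab_lines: Iterable[str]) -> List[str]:
    """Input IDs that never appear in the --tab output, in input order."""
    seen = tab_output_ids(tab_lines)
    return [i for i in vcf_ids(vcf_lines) if i not in seen]
```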


On Fri, 7 Jun 2019 at 18:25, Helen Schuilenburg (<helens at ebi.ac.uk>)
wrote:

> Hi David
>
> Thanks for the information.
>
> VEP should skip non-variant lines of input by default, not stop.
>
> We will look at updating the stats to report the lines skipped.
>
> When using VCF format as input and output, by default VEP will skip
> non-variant lines of input.  Please could you try running your sample
> variants with vcf output (--vcf) and --allow_non_variant.
>
> https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#filt
>
> With the text output, it should also skip the non-variant lines. VEP could
> be stopping on your input file if it has skipped a number of non-variant
> lines. We will look into this.
>
> Regards
> Helen
>
>
> On 07/06/2019 13:02, David Tamborero wrote:
>
> Hi Helen, wow, thanks for the quick reply, you're awesome guys!
>
> There was no warnings file created; that's why I was assuming that the VEP
> run was OK and had just stopped at some point 'naturally'.
>
> But after your reply, I took lines 16500-17000 (to catch the supposedly
> wrong one) and fed them to VEP:
>
> -> In the output tab file, there are 465 variants
>
> --> the stats file shows that 465 (not 500) variants are read, 464
> processed, and 0 filtered out:
>
> Lines of input read 465
> Variants processed 464
> Variants filtered out 0
>
> -> the IDs of the variants that are in the input but not in the output
> (note that the ID is the variant number) are from 16965 to 17000. This
> would be compatible with VEP finding a line that it does not like and
> stopping, as you pointed out
>
> -> the first 'not mapped' variant is (GRCh37):
> 2  167099158  16965     A   A
> which of course makes no sense to consider as a variant, but I do not
> know if it's such a weird thing as to make everything stop (indeed, the
> web VEP does not map it either)
>
> --> And there is no warning file
>
> --> I removed this 'conflicting' entry and VEP then ran all 499
> variants
>
> So I think this is a bug since, as far as I know, the normal behaviour
> when VEP encounters a line that it does not like is to ignore it and
> generate the corresponding warning file, right?
>
>
>
>
> On Fri, 7 Jun 2019 at 12:39, Helen Schuilenburg (<helens at ebi.ac.uk>)
> wrote:
>
>> Hi David
>>
>> VEP can run with 100K variants. The VEP publication (McLaren W et al.
>> 2016, doi:10.1186/s13059-016-0974-4) gives examples of
>> numbers of variants and timings, e.g. for 4,474,140 variants.
>>
>> Were any warnings reported in your warnings file (--warning_file
>> /home/david/Desktop/tmp/tmp/vep_warnings.txt)?
>>
>> From the stats file, there may be a problem with your input file causing
>> it to stop processing at line 16856 of your input file.
>>
>> Regards
>> Helen
>> On 07/06/2019 10:17, David Tamborero wrote:
>>
>> Hi all,
>>
>> I'm working with VEP (command line) v95 to analyse (large) VCF files and
>> generate VEP --tab files.
>>
>> I've realized that the output does not contain all the entries that I'm
>> feeding to the tool. I've been doing several tests with VCF files of 1K,
>> 5K, 10K, 20K, 50K and 100K entries, and it looks to me like VEP reads 'a
>> maximum' of ~16.8K variants (see VEP stats output below).
>>
>> I was not aware that such a limit existed, nor am I able to find
>> any flag regarding this issue. I feel that I'm missing a big point, but I
>> do not know which one; sorry in advance if so.
>>
>> command:
>> ./vep -i blablabla.vcf -o blbabla.txt --tab --warning_file blabla.txt
>> --format vcf --cache --offline --force_overwrite --hgvs --symbol
>> --stats_file blablabla.txt --stats_text
>>
>> version:
>> release 95
>> sub 4f834538054c1aee24098c72f31f92d4c5aa303b
>>
>> And the stats file content for the run with a VCF file with 100K entries:
>> [VEP run statistics]
>> VEP version (API) 95 (95)
>> Annotation sources Cache: /home/david/.vep/homo_sapiens/95_GRCh37
>> Species homo_sapiens
>> Command line options --cache --dir /home/david/.vep --force_overwrite
>> --format vcf --hgvs --input_file /home/david/Desktop/tmp/tmp/hq.vcf
>> --offline --output_file /home/david/Desktop/tmp/tmp/vep.txt --stats_file
>> /home/david/Desktop/tmp/tmp/stats_file.txt --stats_text --symbol --tab
>> --warning_file /home/david/Desktop/tmp/tmp/vep_warnings.txt
>> Start time 2019-06-07 10:50:24
>> End time 2019-06-07 10:59:46
>> Run time 562 seconds
>> Input file /home/david/Desktop/tmp/tmp/hq.vcf
>> Output file /home/david/Desktop/tmp/tmp/vep.txt
>>
>> [General statistics]
>> Lines of input read 16856
>> Variants processed 16855
>> Variants filtered out 0
>> Novel / existing variants -
>> Overlapped genes 2809
>> Overlapped transcripts 15672
>> Overlapped regulatory features -
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/