[ensembl-dev] VEP does not like whitespace in VCF input

João Eiras joao.eiras at gmail.com
Sat May 12 12:58:11 BST 2018


Hi.

I've upgraded VEP in my pipeline from version 87 to version 92.
I have a large test suite for my pipeline (275 VCF files). These are
artificial files, written by hand, so they naturally contain comments
and whitespace such as blank lines.

VEP 92 stops reading the input when it finds a blank line immediately
after the header :( In other words, if there is no blank line between
the header and the variant lines, VEP annotates my variants; otherwise
it stops at that blank line and annotates nothing.

Example:
$ perl $vep_dist/vep \
  -i a.vcf --format vcf \
  --cache --offline --merged \
  --species mus_musculus --assembly GRCm38 \
  --force_overwrite --verbose \
  --dir $vep_cache \
  -o a.vcf.txt

$ cat a.vcf
##fileformat=VCFv4.1
##reference=GRCm38
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

# Substitutions, non synonymous
# AGA -> AGT (R/s)
chr1 99772782 . A T 5000 . . .

$ cat a.vcf.txt
## ENSEMBL VARIANT EFFECT PREDICTOR v92.1
# ...
#Uploaded_variation

^^^ End of file

If, however, my input file looks like the following, VEP does produce output:

$ cat a.vcf
##fileformat=VCFv4.1
##reference=GRCm38
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
# Substitutions, non synonymous
# AGA -> AGT (R/s)
chr1 99772782 . A T 5000 . . .

# CTG -> ATG (L/m)
chr1 99772783 . C A 5000 . . .

$ cat a.vcf.txt
## ENSEMBL VARIANT EFFECT PREDICTOR v92.1
# ...
#Uploaded_variation
chr1_99772782_A/T       ...
chr1_99772783_C/A       ...
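
Until this is clarified, one possible workaround for hand-written test files is to strip whitespace-only lines before passing them to VEP. This is a minimal sketch that assumes, based on the second example, that blank lines are the only thing VEP 92 objects to; the function name `strip_blank_lines` and the output file name are hypothetical, and only standard POSIX `grep` is used.

```shell
# Hypothetical helper: strip_blank_lines IN OUT
# Copies the VCF IN to OUT, dropping lines that are empty or contain only
# whitespace. Header lines (## and #CHROM), free-text "#" comments and
# variant lines are copied unchanged.
# Assumption: blank lines are the only construct VEP 92 stops on.
strip_blank_lines() {
  grep -v '^[[:space:]]*$' "$1" > "$2"
}
```

For example, run `strip_blank_lines a.vcf a.clean.vcf` and then pass `-i a.clean.vcf` to the VEP command above.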

Is this behavior expected?

Thank you.
