[ensembl-dev] VEP incorrectly detects pileup, if the first line of a VCF is missing an ID

Cyriac Kandoth kandothc at mskcc.org
Wed Oct 5 19:17:52 BST 2016


Hi Dev,

This exception shouldn't ever happen because VCF specs require that the ID
field always be non-empty. Specifically "If there is no identifier
available, then the missing value '.' should be used".

However, we all deal with crappy VCFs, so it's worthwhile to support an
empty ID in code, or at least fail gracefully. Attached is a sample VCF,
and here is the command I used to reproduce the bug:

/home/kandoth/perl/perl-5.22.2/bin/perl \
  /home/kandoth/vep/variant_effect_predictor.pl \
  --species homo_sapiens --assembly GRCh37 \
  --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs \
  --symbol --numbers --domains --gene_phenotype --canonical --protein \
  --biotype --uniprot --polyphen b --gmaf --maf_1kg --maf_esp \
  --regulatory --tsl --pubmed --variant_class --shift_hgvs 1 \
  --check_existing --total_length --allele_number --no_escape \
  --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele \
  --pick_order canonical,tsl,biotype,rank,ccds,length \
  --dir /home/kandoth/.vep \
  --fasta /home/kandoth/.vep/homo_sapiens/85_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz \
  --input_file test.vcf --output_file test.vep.vcf

This is the terminal output I get:

2016-10-05 18:01:37 - Read existing cache info
2016-10-05 18:01:37 - Starting...
2016-10-05 18:01:37 - *Detected format of input file as pileup*
WARNING: Length of reference allele (A length 1) does not match
co-ordinates 112175600-112175599 on line 3
2016-10-05 18:01:37 - Read 2 variants into buffer
2016-10-05 18:01:37 - Checking for existing variations
2016-10-05 18:01:37 - Reading transcript data from cache and/or database
2016-10-05 18:01:38 - Retrieved 653 transcripts (0 mem, 653 cached, 0 DB, 0
duplicates)
2016-10-05 18:01:38 - Reading regulatory data from cache and/or database
2016-10-05 18:01:38 - Retrieved 1940 regulatory features (0 mem, 1940
cached, 0 DB, 0 duplicates)
2016-10-05 18:01:38 - Analyzing chromosome 12
2016-10-05 18:01:38 - Analyzing variants
2016-10-05 18:01:38 - Analyzing RegulatoryFeatures
2016-10-05 18:01:38 - Analyzing MotifFeatures
2016-10-05 18:01:38 - Calculating consequences
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
2016-10-05 18:01:44 - Analyzing chromosome 17
2016-10-05 18:01:44 - Analyzing variants
2016-10-05 18:01:44 - Analyzing RegulatoryFeatures
2016-10-05 18:01:44 - Analyzing MotifFeatures
2016-10-05 18:01:44 - Calculating consequences
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
Use of uninitialized value in string eq at
/home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line
1269.
Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line
290.
2016-10-05 18:01:52 - Processed 2 total variants (0 vars/sec, 0 vars/sec
total)
2016-10-05 18:01:52 - See test.vep.vcf_warnings.txt for details of 1
warnings
2016-10-05 18:01:52 - Finished!

The output file has errors too. Basically, VEP thinks that the rsIDs are
the reference alleles.
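
As a stopgap until an empty ID is handled in VEP itself, a small pre-filter
along these lines could put the missing value '.' into the ID column before
the file reaches VEP. This is only a sketch, not anything that exists in VEP:
the function name fill_missing_ids is made up for illustration, and it
assumes a tab-delimited VCF where ID is the third column.

```python
def fill_missing_ids(lines):
    """Yield VCF lines, replacing an empty ID column (3rd field) with '.'.

    Hypothetical helper for illustration; assumes tab-delimited input.
    """
    for line in lines:
        # Meta lines, the header line, and blank lines pass through untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # Remember whether the line had a terminator (re-emitted as "\n").
        newline = "\n" if line.endswith("\n") else ""
        # Split on tabs only, so an empty field keeps its column position.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > 2 and fields[2].strip() == "":
            fields[2] = "."
        yield "\t".join(fields) + newline
```

It could be wired up as a filter with something like
sys.stdout.writelines(fill_missing_ids(sys.stdin)), writing the result to a
new file that is then passed to --input_file.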

~Cyriac
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.vcf
Type: text/vcard
Size: 161 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161005/82521a55/attachment.vcf>

