[ensembl-dev] Bad VCF file crashing VEP

Stuart Watt morungos at gmail.com
Tue May 24 19:26:58 BST 2016


Hi all

I’ve hit an issue where some invalid VarScan2 VCF files crash VEP outright. A VCF that triggers this is:

> ##fileformat=VCFv4.1
> ##source=VarScan2
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
> ##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
> ##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
> ##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
> ##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
> ##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
> ##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
> ##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
> ##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
> ##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
> ##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
> ##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NORMAL	498_tissue
> chr2	242814072	.	TG	T	.	PASS	.	GT:GQ:DP:RD:AD:FREQ:DP4	0/0:.:34:34:0:0%:18,16,0,0	0/1:.:77:73:2:2.67%:35,38,1,1
> chr3	239555	.	C	CT/-T	.	PASS	.	GT:GQ:DP:RD:AD:FREQ:DP4	0/1:.:77:29:19:39.58%:10,19,4,15	0/1:.:72:43:15:25.86%:19,24,4,11
> 

It’s the last line that does this. If the chr2 record is removed, the file isn’t even detected as a VCF.
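Until VEP handles this itself, one workaround is to pre-screen the file so a single malformed record is set aside instead of killing the run. Below is a minimal sketch, assuming a tab-delimited VCF 4.1 file. It applies a simplified version of the spec’s REF/ALT rules (bases A/C/G/T/N, `.` for no alternate, or a `<SYMBOLIC>` allele; breakends and the VCF 4.2 `*` allele are not handled). The helper names `record_problem`, `split_vcf` and `filter_vcf` are hypothetical and not part of VEP. With this check, the VarScan-style allele `CT/-T` is rejected while the chr2 deletion passes.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Simplified VCF 4.1 allele rules (an assumption of this sketch):
# REF is one or more of A/C/G/T/N (any case); each comma-separated ALT
# allele is bases, "." (no alternate) or a symbolic allele like <DEL>.
_REF = re.compile(r"^[ACGTNacgtn]+$")
_ALT_ALLELE = re.compile(r"^(?:[ACGTNacgtn]+|\.|<[^<>]+>)$")


def record_problem(line: str) -> Optional[str]:
    """Return why a VCF data line looks invalid, or None if it passes."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        return f"expected at least 8 tab-separated columns, got {len(fields)}"
    ref, alt = fields[3], fields[4]
    if not _REF.match(ref):
        return f"invalid REF {ref!r}"
    for allele in alt.split(","):
        if not _ALT_ALLELE.match(allele):
            return f"invalid ALT allele {allele!r}"
    return None


def split_vcf(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str, str]]]:
    """Keep header lines and valid records; collect (line number, reason, line) for the rest."""
    kept: List[str] = []
    rejected: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # silently drop blank lines
        if line.startswith("#"):
            kept.append(line)
            continue
        problem = record_problem(line)
        if problem is None:
            kept.append(line)
        else:
            rejected.append((lineno, problem, line))
    return kept, rejected


def filter_vcf(in_path: str, out_path: str) -> List[Tuple[int, str, str]]:
    """Write the header and valid records to out_path; return the rejected records."""
    with open(in_path) as src:
        kept, rejected = split_vcf(src)
    with open(out_path, "w") as dst:
        dst.writelines(kept)
    return rejected
```

For example, `for lineno, problem, _ in filter_vcf("input.vcf", "clean.vcf"): print(lineno, problem)` (file names hypothetical) lists the skipped records, and VEP can then be run on `clean.vcf` as usual.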

The error is:

> MSG: start arg must be less than or equal to end arg + 1
> STACK Bio::EnsEMBL::TranscriptMapper::genomic2cds /mnt/work1/software/vep/83/Bio/EnsEMBL/TranscriptMapper.pm:397
> STACK Bio::EnsEMBL::Variation::BaseTranscriptVariation::cds_coords /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/BaseTranscriptVariation.pm:325
> STACK Bio::EnsEMBL::Variation::BaseVariationFeatureOverlapAllele::_pre_consequence_predicates /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/BaseVariationFeatureOverlapAllele.pm:393
> STACK Bio::EnsEMBL::Variation::BaseVariationFeatureOverlapAllele::get_all_OverlapConsequences /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/BaseVariationFeatureOverlapAllele.pm:237
> STACK Bio::EnsEMBL::Variation::Utils::VEP::tva_to_line /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:2568
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:2504
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:2191
> STACK Bio::EnsEMBL::Variation::Utils::VEP::rejoin_variants /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:1777
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:1485
> STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /mnt/work1/software/vep/83/Bio/EnsEMBL/Variation/Utils/VEP.pm:1205
> STACK main::main /mnt/work1/software/vep/83/variant_effect_predictor.pl:321
> STACK toplevel /mnt/work1/software/vep/83/variant_effect_predictor.pl:148
> Date (localtime)    = Tue May 24 13:59:39 2016
> Ensembl API version = 83
> ---------------------------------------------------
> ERROR: Forked process(es) died

We’re still trying to figure out the VarScan issue, but one bad record shouldn’t take out an entire VEP run. Even the fact that this one line breaks detection of the input format is, I’d say, less than ideal, as the file contains about 7000 other valid records.

All the best
Stuart

—
Stuart Watt, PhD
Scientific Research Associate, Princess Margaret Cancer Centre
MaRS Centre, 101 College Street
Toronto Medical Discovery Tower, Room 9-302
Toronto, Ontario, Canada M5G 1L7
stuart.watt at uhnresearch.ca
416-634-8816
