[ensembl-dev] N in regulatory feature causes bug in VEP
Chris Penkett
cjp64 at cam.ac.uk
Tue Apr 28 14:51:04 BST 2015
Hi Ensembl developers,
When I run VEP with the --regulatory flag in cache mode for GRCh37, it
throws an exception on this particular SNV:
% cat tmp.vcf
7 61967171 . C A 33 PASS SNVSB=0;SNVHPOL=3;AN=2;AC=1
[To make your own version of this file with tabs, you can do this:
% echo "7_61967171_._C_A_33_PASS_SNVSB=0;SNVHPOL=3;AN=2;AC=1" | tr _ \\t > tmp.vcf
]
% variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT --regulatory | grep -v ^#
-------------------- EXCEPTION --------------------
MSG: Sequence NGAATTCTCAGTAAC contains invalid characters: Only Aa Cc Gg Tt accepted
STACK Bio::EnsEMBL::Funcgen::BindingMatrix::relative_affinity /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Funcgen/BindingMatrix.pm:301
STACK Bio::EnsEMBL::Variation::MotifFeatureVariationAllele::motif_score_delta /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/MotifFeatureVariationAllele.pm:226
STACK Bio::EnsEMBL::Variation::Utils::VEP::mfva_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2361
STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2121
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1835
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1473
STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1193
STACK main::main /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:306
STACK toplevel /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:144
Date (localtime) = Tue Apr 28 14:38:17 2015
Ensembl API version = 79
---------------------------------------------------
If you look in the regulatory cache file for this region, there is a
sequence that starts with an N:
% zcat /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/.vep/homo_sapiens/79_GRCh37/7/61000001-62000000_reg.gz | strings | grep NG
NGAATTCTCAGTAAC
This appears to be the same sequence reported in the exception message
above ("Sequence NGAATTCTCAGTAAC contains invalid characters").
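The exception message spells out the rule being enforced: only A, C, G
and T (upper or lower case) are accepted, so an ambiguity code such as N
is rejected. As a minimal sketch, assuming a plain character-set test of
the kind the message describes (check_seq below is a hypothetical helper
for illustration only, not part of VEP or the Ensembl Funcgen API), the
cached sequence fails while the same sequence with the N replaced passes:

```shell
# Hypothetical helper (not part of VEP): apply the rule stated in the
# exception message, i.e. accept only A, C, G, T in either case.
# Empty input is also treated as invalid.
check_seq() {
  case "$1" in
    ''|*[!ACGTacgt]*) echo invalid ;;
    *) echo valid ;;
  esac
}

check_seq NGAATTCTCAGTAAC   # prints "invalid" (leading N)
check_seq GGAATTCTCAGTAAC   # prints "valid"
```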
VEP works fine if you switch off the --regulatory flag:
% variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT | grep -v ^#
7_61967171_C/A 7:61967171 A - - - intergenic_variant - - - - - - IMPACT=MODIFIER
Best wishes,
Chris
--
Head of Pipelines
NIHR BioResource Rare Diseases
Department of Haematology
University of Cambridge
NHSBT Building
Long Road
Cambridge CB2 0PT
Tel: 01223 588092