[ensembl-dev] N in regulatory feature causes bug in VEP

Chris Penkett cjp64 at cam.ac.uk
Tue Apr 28 14:51:04 BST 2015


Hi Ensembl developers,

When I run VEP with the --regulatory flag in cache mode for GRCh37, it 
throws an exception on this particular SNV:

% cat tmp.vcf
7    61967171    .    C    A    33    PASS SNVSB=0;SNVHPOL=3;AN=2;AC=1

[To make your own version of this file with tabs, you can do this:
% echo "7_61967171_._C_A_33_PASS_SNVSB=0;SNVHPOL=3;AN=2;AC=1" | tr _ \\t > tmp.vcf
]

% variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT --regulatory | grep -v ^#

-------------------- EXCEPTION --------------------
MSG: Sequence NGAATTCTCAGTAAC contains invalid characters: Only Aa Cc Gg Tt accepted
STACK Bio::EnsEMBL::Funcgen::BindingMatrix::relative_affinity /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Funcgen/BindingMatrix.pm:301
STACK Bio::EnsEMBL::Variation::MotifFeatureVariationAllele::motif_score_delta /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/MotifFeatureVariationAllele.pm:226
STACK Bio::EnsEMBL::Variation::Utils::VEP::mfva_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2361
STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2121
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1835
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1473
STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1193
STACK main::main /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:306
STACK toplevel /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:144
Date (localtime)    = Tue Apr 28 14:38:17 2015
Ensembl API version = 79
---------------------------------------------------

If you look in the regulatory cache file for this region, there is an N 
in there:

% zcat /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/.vep/homo_sapiens/79_GRCh37/7/61000001-62000000_reg.gz | strings | grep NG
NGAATTCTCAGTAAC

This appears to be the same sequence as the one in the exception 
message above (Sequence NGAATTCTCAGTAAC contains invalid characters).
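
The exception message spells out the rule being enforced: only A, C, G and T (in either case) are accepted, so the single N at the start of the motif sequence is enough to trigger the exception raised in BindingMatrix::relative_affinity. Below is a minimal sketch of that check, assuming a POSIX shell; check_seq is a hypothetical helper for illustration only, not the actual VEP or Ensembl Funcgen code:

```shell
# Hypothetical helper (not part of VEP): reports whether a sequence
# contains any character other than A, C, G or T (upper or lower case),
# mirroring the rule stated in the exception message.
check_seq() {
    case "$1" in
        *[!ACGTacgt]*) echo "invalid" ;;  # at least one disallowed character
        *)             echo "valid" ;;    # only Aa Cc Gg Tt present
    esac
}

check_seq NGAATTCTCAGTAAC   # motif from the exception; prints: invalid
check_seq GGAATTCTCAGTAAC   # same motif with the N replaced; prints: valid
```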

VEP works fine if you switch off the --regulatory flag:

% variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT | grep -v ^#
7_61967171_C/A    7:61967171    A    -    -    - intergenic_variant    -    -    -    -    -    - IMPACT=MODIFIER

Best wishes,
Chris


-- 
Head of Pipelines
NIHR BioResource Rare Diseases
Department of Haematology
University of Cambridge
NHSBT Building
Long Road
Cambridge CB2 0PT
Tel: 01223 588092




