[ensembl-dev] N in regulatory feature causes bug in VEP

njohnson njohnson at ebi.ac.uk
Tue Apr 28 15:01:29 BST 2015


Hi Chris

Yes, it looks like our relative_affinity method doesn’t handle Ns in the sequence. We’ll look for a fix as soon as possible.
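
For readers following the thread, here is a minimal sketch of one way a scoring routine could decline to score a sequence containing N instead of throwing. It is written in Python for illustration and is not the BindingMatrix code: the function name is only borrowed from the stack trace, and the frequency-matrix layout, the pseudocount, the uniform background and the scaling between the worst and best possible site are all assumptions.

```python
import math

BASES = "ACGT"


def relative_affinity(seq, freqs, pseudocount=0.1):
    """Score seq against a position frequency matrix, scaled to the range 0..1.

    freqs is a list with one dict per motif position, mapping A/C/G/T to counts
    (a hypothetical layout chosen for this sketch).
    Returns None, rather than raising, when seq cannot be scored.
    """
    seq = seq.upper()
    if len(seq) != len(freqs) or any(base not in BASES for base in seq):
        return None  # e.g. an ambiguous N: no meaningful score

    score = best = worst = 0.0
    for base, column in zip(seq, freqs):
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        # Log-odds of each base against a uniform 0.25 background (an assumption).
        weights = {
            b: math.log((column[b] + pseudocount) / total / 0.25) for b in BASES
        }
        score += weights[base]
        best += max(weights.values())
        worst += min(weights.values())

    if best == worst:
        return None  # flat matrix: every site scores the same
    return (score - worst) / (best - worst)


# Toy three-column matrix, invented for the example.
toy_matrix = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]

print(relative_affinity("AGA", toy_matrix))  # consensus site
print(relative_affinity("NGA", toy_matrix))  # None: contains an N
```

With a None return, a caller computing a motif score delta could skip that one value and still report the rest of the consequence line. Whether to skip, or to score N as an average over the four bases, is a design choice this sketch does not settle.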

Nathan Johnson

Ensembl Regulation
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

http://www.ensembl.info/
http://twitter.com/#!/ensembl
https://www.facebook.com/Ensembl.org

> On 28 Apr 2015, at 14:51, Chris Penkett <cjp64 at cam.ac.uk> wrote:
> 
> 
> Hi Ensembl developers,
> 
> When I run VEP with the --regulatory flag in cache mode for GRCh37, it gives me an exception on this particular SNV:
> 
> % cat tmp.vcf
> 7    61967171    .    C    A    33    PASS    SNVSB=0;SNVHPOL=3;AN=2;AC=1
> 
> [To make your own version of this file with tabs, you can do this:
> % echo "7_61967171_._C_A_33_PASS_SNVSB=0;SNVHPOL=3;AN=2;AC=1" | tr _ \\t > tmp.vcf
> ]
> 
> % variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT --regulatory | grep -v ^#
> 
> -------------------- EXCEPTION --------------------
> MSG: Sequence NGAATTCTCAGTAAC contains invalid characters: Only Aa Cc Gg Tt accepted
> STACK Bio::EnsEMBL::Funcgen::BindingMatrix::relative_affinity /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Funcgen/BindingMatrix.pm:301
> STACK Bio::EnsEMBL::Variation::MotifFeatureVariationAllele::motif_score_delta /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/MotifFeatureVariationAllele.pm:226 
> STACK Bio::EnsEMBL::Variation::Utils::VEP::mfva_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2361
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2121
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1835
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1473 
> STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1193
> STACK main::main /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:306
> STACK toplevel /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:144
> Date (localtime)    = Tue Apr 28 14:38:17 2015
> Ensembl API version = 79
> ---------------------------------------------------
> 
> If you look in the regulatory cache file for this region, there is an N in there:
> 
> % zcat /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/.vep/homo_sapiens/79_GRCh37/7/61000001-62000000_reg.gz | strings | grep NG
> NGAATTCTCAGTAAC
> 
> This appears to be the same sequence reported in the exception message above (Sequence NGAATTCTCAGTAAC contains invalid characters).
> 
> VEP works fine if you switch off the --regulatory flag:
> 
> % variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT | grep -v ^#
> 7_61967171_C/A    7:61967171    A    -    -    -    intergenic_variant    -    -    -    -    -    -    IMPACT=MODIFIER
> 
> Best wishes,
> Chris
> 
> 
> -- 
> Head of Pipelines
> NIHR BioResource Rare Diseases
> Department of Haematology
> University of Cambridge
> NHSBT Building
> Long Road
> Cambridge CB2 0PT
> Tel: 01223 588092
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
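
The `zcat | strings | grep NG` check quoted above can be generalised to list every nucleotide-like run containing an N in a regulatory cache file, which may help locate other regions likely to hit the same exception. This is a rough sketch in Python: it treats the decompressed file as opaque bytes, as `strings` does, and makes no assumption about the cache's internal layout. The function names and the minimum run length of 8 are arbitrary choices for illustration.

```python
import gzip
import re

# Runs of at least 8 nucleotide letters; the minimum length is an arbitrary
# threshold chosen to cut down on chance matches in binary data.
NUCLEOTIDE_RUN = re.compile(rb"[ACGTN]{8,}")


def find_ambiguous_runs(data, pattern=NUCLEOTIDE_RUN):
    """Return the distinct nucleotide runs in data that contain at least one N."""
    hits = {match.group().decode("ascii") for match in pattern.finditer(data)}
    return sorted(run for run in hits if "N" in run)


def scan_cache_file(path):
    """Decompress a gzipped cache file and scan its raw bytes for such runs."""
    with gzip.open(path, "rb") as handle:
        return find_ambiguous_runs(handle.read())
```

Calling `scan_cache_file` on a `*_reg.gz` path such as the one in the quoted message would return a sorted list of candidate sequences. Because the match is purely textual, some hits may be unrelated byte runs rather than motif sequences, so the output is a starting point for inspection and not a definitive list.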




