[ensembl-dev] N in regulatory feature causes bug in VEP
njohnson
njohnson at ebi.ac.uk
Tue Apr 28 15:01:29 BST 2015
Hi Chris
Yes, it looks like our relative_affinity method doesn’t handle Ns in the motif sequence. We’ll look for a solution asap.
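
To see whether other regions are affected in the meantime, the cache check you ran can be generalised from "grep NG" to any motif-like string containing an N. This is only a rough sketch: the function name find_n_motifs and the minimum length of 8 are assumptions for illustration, not anything taken from the VEP or Funcgen code, and the filter may also pick up other N-containing sequence strings stored in the cache.

```shell
# Hypothetical helper (not part of VEP): read candidate strings on stdin and
# print those that look like motif sequences (only A/C/G/T/N, at least 8
# characters, an assumed cut-off) and contain at least one N.
find_n_motifs() {
    grep -E '^[ACGTNacgtn]{8,}$' | grep -i 'N'
}

# Intended usage, with the cache file from the report below:
#   zcat <path to>/61000001-62000000_reg.gz | strings | find_n_motifs
```

Any sequence it prints would presumably trigger the same exception in relative_affinity when a variant overlaps that motif feature.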
Nathan Johnson
Ensembl Regulation
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
http://www.ensembl.info/
http://twitter.com/#!/ensembl
https://www.facebook.com/Ensembl.org
> On 28 Apr 2015, at 14:51, Chris Penkett <cjp64 at cam.ac.uk> wrote:
>
>
> Hi Ensembl developers,
>
> When I run VEP with the --regulatory flag in cache mode for GRCh37, it gives me an exception on this particular SNV:
>
> % cat tmp.vcf
> 7 61967171 . C A 33 PASS SNVSB=0;SNVHPOL=3;AN=2;AC=1
>
> [To make your own version of this file with tabs, you can do this:
> % echo "7_61967171_._C_A_33_PASS_SNVSB=0;SNVHPOL=3;AN=2;AC=1" | tr _ \\t > tmp.vcf
> ]
>
> % variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT --regulatory | grep -v ^#
>
> -------------------- EXCEPTION --------------------
> MSG: Sequence NGAATTCTCAGTAAC contains invalid characters: Only Aa Cc Gg Tt accepted
> STACK Bio::EnsEMBL::Funcgen::BindingMatrix::relative_affinity /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Funcgen/BindingMatrix.pm:301
> STACK Bio::EnsEMBL::Variation::MotifFeatureVariationAllele::motif_score_delta /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/MotifFeatureVariationAllele.pm:226
> STACK Bio::EnsEMBL::Variation::Utils::VEP::mfva_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2361
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:2121
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1835
> STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1473
> STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/perl/Bio/EnsEMBL/Variation/Utils/VEP.pm:1193
> STACK main::main /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:306
> STACK toplevel /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/bin/variant_effect_predictor.pl:144
> Date (localtime) = Tue Apr 28 14:38:17 2015
> Ensembl API version = 79
> ---------------------------------------------------
>
> If you look in the regulatory cache file for this region, there is an N in there:
>
> % zcat /home/cbrcmod/scratch/modules/out/modulebin/VEP/79/.vep/homo_sapiens/79_GRCh37/7/61000001-62000000_reg.gz | strings | grep NG
> NGAATTCTCAGTAAC
>
> It looks like this is the same sequence as in the error/exception message above (Sequence NGAATTCTCAGTAAC contains invalid characters).
>
> VEP works fine if you switch off the --regulatory flag:
>
> % variant_effect_predictor.pl --quiet --force_overwrite -i tmp.vcf --offline --cache -o STDOUT | grep -v ^#
> 7_61967171_C/A 7:61967171 A - - - intergenic_variant - - - - - - IMPACT=MODIFIER
>
> Best wishes,
> Chris
>
>
> --
> Head of Pipelines
> NIHR BioResource Rare Diseases
> Department of Haematology
> University of Cambridge
> NHSBT Building
> Long Road
> Cambridge CB2 0PT
> Tel: 01223 588092