[ensembl-dev] Variant Effect Predictor not using HTS to read gzipped FASTA

Will McLaren wm2 at ebi.ac.uk
Wed Dec 7 14:45:04 GMT 2016


Hi Sebastian,

For some reason or other it looks as though your installation of
Bio::DB::HTS is not working. You can double check by executing:

$ perl -MBio::DB::HTS -e1

This should give you an error if the module is not installed properly, and
nothing if it is working OK.

In the case of the error, I'd try setting up Bio::DB::HTS again according
to the README here: http://search.cpan.org/dist/Bio-DB-HTS/

Having said this, it may be that you can avoid having to use it. From your
command it looks as though you are not using a VEP flag that requires
reading sequence data from a FASTA file (the role of Bio::DB::HTS) - those
that require this are --hgvs and --check_ref.
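
As a quick way to check a long command line for those flags, something like the following could be used. It is a sketch only: needs_fasta is a made-up name, and the list contains just the two flags mentioned above, which may not be complete for every VEP version.

```shell
# Hypothetical helper: succeed if any argument is a flag that needs FASTA.
# Only --hgvs and --check_ref are checked (an assumption taken from the
# text above, not a complete list).
needs_fasta() {
    for arg in "$@"; do
        case "$arg" in
            --hgvs|--check_ref) return 0 ;;
        esac
    done
    return 1
}

# Usage example with a few of the flags from the original command:
if needs_fasta --cache --sift b --polyphen b --symbol; then
    echo "FASTA needed"
else
    echo "FASTA not needed"
fi
```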

You may therefore try one of two solutions:

1) unarchive the bgzipped FASTA file that VEP is finding, something like:

$ gzip -d /opt/modules/i12g/ensembl-tools/85/cachedir/homo_sapiens/85_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

This will allow the code to use Bio::DB::Fasta to index the unarchived
file, meaning that if you wish to use --hgvs, for example, it should work
OK.

2) move the FASTA file so that it is not picked up by VEP for indexing:

$ mv /opt/modules/i12g/ensembl-tools/85/cachedir/homo_sapiens/85_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz \
    /opt/modules/i12g/ensembl-tools/85/cachedir/homo_sapiens/85_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz.bak

You could then move it back again in future if you do require it.
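
The two options above could be wrapped up roughly as follows. The function names are invented for illustration, and the path is passed in as an argument so that nothing is assumed about your cache layout.

```shell
# Hypothetical helpers; pass the full path of the .fa.gz file as $1.

# Option 1: check the archive is intact, then decompress it in place.
# bgzipped files are valid gzip streams, so plain gzip can read them.
decompress_fasta() {
    gzip -t "$1" && gzip -d "$1"
}

# Option 2: move the file aside without clobbering an existing backup.
hide_fasta() {
    if [ -e "$1.bak" ]; then
        echo "backup already exists: $1.bak" >&2
        return 1
    fi
    mv "$1" "$1.bak"
}

# Undo option 2 when the FASTA file is needed again.
restore_fasta() {
    mv "$1.bak" "$1"
}
```

The gzip -t step is only a precaution, so that a truncated download is noticed before the compressed copy is replaced.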

Hope that solves the issue for you!

Will McLaren
Ensembl Variation

On 7 December 2016 at 13:12, Hollizeck, Sebastian <
Sebastian.Hollizeck at med.uni-muenchen.de> wrote:

> Hi,
>
> I installed the standalone script in my module system for use on our
> server, but I can't seem to get it working with the cache.
> The script works when using just the --database option, but when I use
> --cache I get the following error:
>
> MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta
>
> This is the full output with verbose enabled:
>
> [hollizeck at kkf0f6ee kleinPipeline]$ variant_effect_predictor.pl
> --check_existing --gmaf --maf_1kg --maf_esp --maf_exac  --pubmed
> --regulatory --species homo_sapiens --port 3337 --buffer_size 40000 --cache
> --minimal  --sift b --polyphen b --ccds --symbol --biotype --gene_phenotype
> --variant_class  --fork 6 --force_overwrite --filter_common --dir_cache
> /opt/modules/i12g/ensembl-tools/85/cachedir -i
> /archive/sample_data/scn281/scn281pa_fa.recalibrated_variants.vcf.gz
> --vcf -o /archive/sample_data/scn281/scn281pa_fa.recalibrated_variants_vep.vcf.gz
> -v
> #----------------------------------#
> # ENSEMBL VARIANT EFFECT PREDICTOR #
> #----------------------------------#
>
> version 85
>
> By Will McLaren (wm2 at ebi.ac.uk)
>
> Configuration options:
>
> biotype            1
> buffer_size        40000
> cache              1
> ccds               1
> check_existing     1
> core_type          core
> dir                /opt/modules/i12g/ensembl-tools/85/cachedir
> dir_cache          /opt/modules/i12g/ensembl-tools/85/cachedir
> dir_plugins        /home/hollizeck/.vep/Plugins
> filter_common      1
> force_overwrite    1
> fork               6
> gene_phenotype     1
> gmaf               1
> host               ensembldb.ensembl.org
> input_file         /archive/sample_data/scn281/scn281pa_fa.recalibrated_variants.vcf.gz
> maf_1kg            1
> maf_esp            1
> maf_exac           1
> minimal            1
> numbers            1
> output_file        /archive/sample_data/scn281/scn281pa_fa.recalibrated_variants_vep.vcf.gz
> polyphen           b
> port               3337
> pubmed             1
> regulatory         1
> sift               b
> species            homo_sapiens
> stats              HASH(0x58ccbc0)
> symbol             1
> variant_class      1
> vcf                1
> verbose            1
>
> --------------------
>
> Will only load v85 databases
> Species 'homo_sapiens' loaded from database 'homo_sapiens_core_85_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_cdna_85_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_vega_85_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_otherfeatures_85_37'
> Species 'homo_sapiens' loaded from database 'homo_sapiens_rnaseq_85_37'
> homo_sapiens_variation_85_37 loaded
> homo_sapiens_funcgen_85_37 loaded
> No ancestral database found
> No ontology database found
> No taxonomy database found
> No ensembl_metadata database found
> No production database or adaptor found
> 2016-12-07 14:04:31 - Connected to core version 85 database and variation
> version 85 database
> 2016-12-07 14:04:31 - Read existing cache info
> 2016-12-07 14:04:32 - Auto-detected FASTA file in cache directory
>
> -------------------- EXCEPTION --------------------
> MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta
>
> STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /opt/modules/i12g/ensembl-tools/85/modules/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:194
> STACK main::configure /opt/modules/i12g/ensembl-tools/85/bin/variant_effect_predictor.pl:835
> STACK toplevel /opt/modules/i12g/ensembl-tools/85/bin/variant_effect_predictor.pl:146
> Date (localtime)    = Wed Dec  7 14:04:32 2016
> Ensembl API version = 85
> ---------------------------------------------------
>
> It looks as though the program does not select the Bio::DB::HTS module,
> but it is present and also in the Perl path:
>
> [hollizeck at kkf0f6ee kleinPipeline]$ ll /opt/modules/i12g/ensembl-tools/85/modules/Bio/DB/HTS.pm
> -rw-rw-rw- 1 root root 77700 Sep 21 18:44 /opt/modules/i12g/ensembl-tools/85/modules/Bio/DB/HTS.pm
> [hollizeck at kkf0f6ee kleinPipeline]$ echo $PERL5LIB
> /opt/modules/i12g/perl/5.24.0/perl5lib/lib/perl5:/opt/modules/i12g/ensembl-tools/85/modules
>
> I do not know if I am even looking in the right spot.
> Can you please help me to debug this further?
>
> Is there any way to specify which plugin to use?
>
>
> Sebastian Hollizeck
> Bioinformatics
>
> Dr. von Hauner Children's Hospital
> Kubus Research Center
> Lindwurmstraße 2a
> D-80337 München
> phone +49 89-4400-57488
> fax +49 89-4400-57979
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>