[ensembl-dev] Problems installing VEP

Joshua Randall joshua.randall at sanger.ac.uk
Thu Mar 10 17:40:59 GMT 2016

I’ve been struggling to install VEP because the tests are failing. I originally started with release 84 but have now reverted to trying release 83. The failures happen with both GRCh37 and GRCh38.

I’ve dug a bit deeper into what is going wrong in the current attempt (release 83 with GRCh38), and the problem seems to be in the tests of convert_cache.

What seems to happen is that the first time convert_cache.t calls the convert_cache.pl script, it successfully runs bgzip on the all_vars file in the cache and then successfully indexes it with tabix. On the next call, however, the tabix indexing step fails.

This appears to be because on the first invocation, it is calling tabix like this:
tabix -s 1 -b 5 -e 5 /software/hgi/pkglocal/ensembl-vep-release-83-GRCh38/lib/perl5//t/testdata//vep-cache//homo_sapiens/78_GRCh38/21/all_vars.gz

Whereas on all subsequent invocations, it calls tabix like this:
tabix -s 1 -b 6 -e 6 /software/hgi/pkglocal/ensembl-vep-release-83-GRCh38/lib/perl5//t/testdata//vep-cache//homo_sapiens/78_GRCh38/21/all_vars.gz

This causes an error message like this:
ERROR: tabix failed
[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "21     rs753123870     .       .       25973491        .       G/C     .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .  .

As you can see, the “offending line” (which is the first line of all_vars) does indeed have the position in the 5th column, not the 6th. Something the first invocation of convert_cache.pl does changes the configuration so that later invocations try to index the wrong column.
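As a quick sanity check on the column numbers, here is a minimal Python sketch. It only splits the leading fields of the offending line quoted above (the long run of trailing "." fields is left out) and looks at what tabix would find in columns 5 and 6. The variable names are just for illustration.

```python
# Leading fields of the first line of all_vars, as quoted in the error message.
# The trailing "." fields are omitted because they do not affect the check.
offending_line = "21 rs753123870 . . 25973491 . G/C"

fields = offending_line.split()

# tabix column numbers are 1-based, so column N is fields[N - 1].
col5 = fields[5 - 1]
col6 = fields[6 - 1]

print("column 5:", col5, "-> integer?", col5.isdigit())
print("column 6:", col6, "-> integer?", col6.isdigit())
```

Column 5 holds the coordinate 25973491, while column 6 holds ".", which is consistent with "-b 5 -e 5" working and "-b 6 -e 6" failing to parse.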

I’ve checked the info.txt in the cache before and after running convert_cache.pl for the first time, and the differences are:

# diff -y -W 200 --suppress-common-lines fresh-info.txt.sort broken-info.txt.sort
# CACHE UPDATED 2014-11-20 10:08:53                                                                |    # CACHE UPDATED 2016-03-10 17:16:31
regulatory      1                                                                                  <
source_gencode  GENCODE 22                                                                         |    source_gencode  GENCODE
variation_cols  variation_name,failed,somatic,start,end,allele_string,strand,minor_allele,minor_   |    variation_cols  chr,variation_name,failed,somatic,start,end,allele_string,strand,minor_allele,mi
                                                                                                   >    var_type        tabix

I’ve tracked this down to: https://github.com/Ensembl/ensembl-tools/blob/release/84/scripts/variant_effect_predictor/convert_cache.pl#L283

Why does this line exist? It forces the first column to be “chr”, which seems to break when that column has actually been called “variation_name”. And why is it only a problem for me?
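One possible reading, which I have not verified against the script itself, is that “chr” gets prepended to variation_cols on every invocation, including when info.txt already lists it. The Python sketch below is not code from convert_cache.pl; with_chr is a hypothetical stand-in for that step, and the column list is only the visible prefix of variation_cols from the diff above (the diff output truncates the rest). Under that assumption it reproduces the shift from column 5 to column 6.

```python
# Visible prefix of variation_cols from the fresh info.txt.
# The diff output truncates the remainder, so it is not reproduced here.
FRESH_COLS = [
    "variation_name", "failed", "somatic", "start", "end",
    "allele_string", "strand", "minor_allele",
]


def with_chr(cols):
    """Hypothetical stand-in: unconditionally prepend 'chr' to the column list."""
    return ["chr"] + list(cols)


def tabix_column(cols, name="start"):
    """Return the 1-based column number of `name`, as used for tabix -b / -e."""
    return cols.index(name) + 1


first_run = with_chr(FRESH_COLS)   # what the first invocation would write back
second_run = with_chr(first_run)   # what a later invocation would compute

print(tabix_column(first_run))   # 5
print(tabix_column(second_run))  # 6
```

If this is what happens, a check for an existing “chr” entry before prepending would keep the index at 5, but that is a guess on my part.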

I am also confused about why release 83 is using test cache data from release 78. Is that normal, or might that be part of the problem?

Perhaps it is useful to note the installer command line I am running:
# perl scripts/variant_effect_predictor/INSTALL.pl --DESTDIR /software/hgi/pkglocal/ensembl-vep-release-83-GRCh38/lib/perl5 --CACHEDIR /software/hgi/resources/ensembl/vep --VERSION 83 --AUTO acfp --SPECIES homo_sapiens_merged --ASSEMBLY GRCh38 --PLUGINS all


