[ensembl-dev] Problems installing VEP

Will McLaren wm2 at ebi.ac.uk
Fri Mar 11 10:11:45 GMT 2016


Hi Josh,

This isn't the first time this has been reported. I committed fixes for it
to release/83, and they should be in release/84 too, so I'm surprised it is
still an issue.

Basically, it happens for users who have the Sereal Perl module installed.
The info.txt file that you refer to gets modified by the convert_cache.pl
script (the chr column gets added, which is what tabix uses to try to
index it). However, it should be reverted to its original state after the
tests, and this works fine for me every time I test it.
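
As a rough illustration of why a leftover, already-converted info.txt breaks
the indexing, here is a small shell sketch. It assumes (this is inferred from
the two tabix calls quoted below, not taken from convert_cache.pl itself) that
chr is put in front of variation_cols and that the 1-based position of "start"
is then passed to tabix as -b/-e. The start_col helper and the shortened
column list are hypothetical and only for illustration.

```shell
# Hypothetical helper: print the 1-based position of "start" in a
# comma-separated variation_cols value.
start_col() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -n -x 'start' | cut -d: -f1
}

# Shortened column list, as found in an unconverted info.txt
cols='variation_name,failed,somatic,start,end,allele_string'

start_col "chr,$cols"       # chr added once: prints 5
start_col "chr,chr,$cols"   # chr added a second time: prints 6
```

Under that assumption, a second conversion on top of an unreverted info.txt
would shift "start" from column 5 to column 6, which matches the -b 6 -e 6
call that tabix rejects.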

Line 42 of convert_cache.t makes a backup of info.txt to info.txt.bak, then
the backup is reinstated on lines 98 and 99. This then happens again on lines
144 and 145.

Can you check if you have info.txt.bak in
[install_dir]/t/testdata/vep-cache/homo_sapiens/78_GRCh38/, and if so do:

mv [install_dir]/t/testdata/vep-cache/homo_sapiens/78_GRCh38/info.txt.bak \
   [install_dir]/t/testdata/vep-cache/homo_sapiens/78_GRCh38/info.txt
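
The same check-and-restore step could be wrapped in a small shell function so
that it is safe to run whether or not the backup exists. This is only a
sketch; restore_info is a hypothetical name, and the directory is whatever
[install_dir] resolves to on your system.

```shell
# Hypothetical helper: reinstate info.txt from info.txt.bak if a test run
# left the backup behind; do nothing otherwise.
restore_info() {
  if [ -f "$1/info.txt.bak" ]; then
    mv "$1/info.txt.bak" "$1/info.txt"
    echo "restored"
  else
    echo "no backup found"
  fi
}

# Usage (substitute your own install directory for the placeholder):
# restore_info "[install_dir]/t/testdata/vep-cache/homo_sapiens/78_GRCh38"
```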

Then try running the test script manually from the VEP directory (here's my
output after a fresh install):

> perl t/convert_cache.t
ok 1 - script exists
ok 2 - run script
ok 3 - info.txt - var_type tabix
ok 4 - info.txt - variation_cols
ok 5 - info.txt - first col chr
ok 6 - all_vars.gz
ok 7 - all_vars.gz.tbi
ok 8 - all_vars.gz - column number
ok 9 - all_vars.gz - start int
ok 10 - run script with --sereal
ok 11 - info.txt - serialiser_type sereal
ok 12 - transcript file exists
ok 13 - regfeat file exists
ok 14 - parsed transcript file
ok 15 - transcript hash index
ok 16 - transcript isa
ok 17 - transcript stable_id
ok 18 - parsed regfeat file
ok 19 - regfeat hash index 1
ok 20 - regfeat hash index 2
ok 21 - regfeat hash index 3
ok 22 - regfeat isa
ok 23 - regfeat stable_id
1..23
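
If a run still fails, one quick way to see whether info.txt was left in its
converted state is to look at the variation_cols line. The sketch below
assumes the file uses whitespace between key and value, as in the diff quoted
further down; info_is_converted is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if variation_cols in the given info.txt
# already starts with "chr", i.e. the file was not reverted after conversion.
info_is_converted() {
  grep -q -E '^variation_cols[[:space:]]+chr,' "$1"
}

# Usage (substitute your own install directory for the placeholder):
# if info_is_converted "[install_dir]/t/testdata/vep-cache/homo_sapiens/78_GRCh38/info.txt"; then
#   echo "info.txt is still in its converted state"
# fi
```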

Thanks for your help with this

Will McLaren
Ensembl Variation

On 10 March 2016 at 17:40, Joshua Randall <joshua.randall at sanger.ac.uk>
wrote:

> I’ve been struggling to install VEP (originally started trying with
> release 84 but now have reverted to trying release 83), but the tests are
> failing. This happens with both GRCh37 and GRCh38.
>
> I’ve dug in a bit deeper to what is going wrong in the current attempt,
> which is release 83 with GRCh38, and the problem seems to be with the tests
> of convert_cache.
>
> What seems to happen is that the first time the convert_cache.pl script
> is called by convert_cache.t, it successfully runs bgzip on the all_vars
> file in the cache and then successfully indexes it with tabix. However, the
> next time it is called, it fails to tabix index it.
>
> This appears to be because on the first invocation, it is calling tabix
> like this:
> ```
> tabix -s 1 -b 5 -e 5 /software/hgi/pkglocal/ensembl-vep-release-83-GRCh38/lib/perl5//t/testdata//vep-cache//homo_sapiens/78_GRCh38/21/all_vars.gz
> ```
>
> Whereas on all subsequent invocations, it calls tabix like this:
> ```
> tabix -s 1 -b 6 -e 6 /software/hgi/pkglocal/ensembl-vep-release-83-GRCh38/lib/perl5//t/testdata//vep-cache//homo_sapiens/78_GRCh38/21/all_vars.gz
> ```
>
> This causes an error message like this:
> ```
> ERROR: tabix failed
> [E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
> The offending line was: "21     rs753123870     .       .       25973491
>       .       G/C     .       .       .       .       .       .       .
>    .       .       .       .       .       .       .       .       .
>  .       .       .       .       .  .
> "
> ```
>
> As you can see, the “offending line” (which is the first line of all_vars)
> does indeed have position in the 5th column as opposed to the 6th.
> Something that the first invocation of convert_cache.pl is doing is
> changing the configuration such that it tries to index the wrong column.
>
>
> I’ve checked the info.txt in the cache before and after running
> convert_cache.pl for the first time, and the differences are:
>
> ```
> # diff -y -W 200 --suppress-common-lines fresh-info.txt.sort broken-info.txt.sort
> # CACHE UPDATED 2014-11-20 10:08:53    |    # CACHE UPDATED 2016-03-10 17:16:31
> regulatory      1                      <
> source_gencode  GENCODE 22             |    source_gencode  GENCODE
> variation_cols  variation_name,failed,somatic,start,end,allele_string,strand,minor_allele,minor_    |    variation_cols  chr,variation_name,failed,somatic,start,end,allele_string,strand,minor_allele,mi
>                                        >    var_type        tabix
> ```
>
>
> I’ve tracked this down to:
> https://github.com/Ensembl/ensembl-tools/blob/release/84/scripts/variant_effect_predictor/convert_cache.pl#L283
>
> Why does this line exist (forcing the first column to be “chr”, which
> seems to break when the “chr” has actually been called “variation_name”)
> and why is it only a problem for me?
>
> I am also confused why release 83 is using test cache data from release 78
> - is that normal, or might that be part of the problem?
>
> Perhaps it is useful to note the installer command line I am running:
> ```
> # perl scripts/variant_effect_predictor/INSTALL.pl \
>     --DESTDIR /software/hgi/pkglocal/ensembl-vep-release-83-GRCh38/lib/perl5 \
>     --CACHEDIR /software/hgi/resources/ensembl/vep \
>     --VERSION 83 --AUTO acfp --SPECIES homo_sapiens_merged \
>     --ASSEMBLY GRCh38 --PLUGINS all
> ```
>
> Cheers,
>
> Josh.