[ensembl-dev] VEP installation problems - unable to install GRCh37 caches

Paul Hatton P.S.HATTON at bham.ac.uk
Tue Apr 26 17:09:32 BST 2016


Cyriac,

I have to send abject apologies - I had omitted the

cat $VEP_DATA/*_vep_84_GRC{h37,h38,m38}.tar.gz | tar -izxf - -C $VEP_DATA

command, and all is fine now. I had been looking at the instructions for so long that my eyes glazed over. It worked fine with the combined

convert_cache.pl --species homo_sapiens,mus_musculus --version 84_GRCh37,84_GRCh38,84_GRCm38 --dir $VEP_DATA

command, and the example program on the gist ran absolutely fine.
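
For anyone hitting the same error, here is a minimal sketch of a check for the skipped untar step. It is not part of VEP or the gist. The helper names expected_cache_dir and unextracted_tarballs are hypothetical, and it assumes the naming seen in this thread: a tarball called <species>_vep_<version>.tar.gz unpacks to <species>/<version>/ and leaves an info.txt there.

```shell
# Hypothetical helper (not part of VEP or the gist): map a cache tarball name
# to the directory it is expected to unpack into, relative to the cache root.
# Assumes <species>_vep_<version>.tar.gz unpacks to <species>/<version>/.
expected_cache_dir() {
    _base=${1##*/}            # strip any leading path
    _base=${_base%.tar.gz}    # strip the archive suffix
    # species is everything before the first "_vep_", version everything after
    printf '%s/%s\n' "${_base%%_vep_*}" "${_base#*_vep_}"
}

# Print every tarball in the given directory whose unpacked cache has no
# info.txt, i.e. tarballs that look as if they were never extracted.
unextracted_tarballs() {
    for _tarball in "$1"/*_vep_*.tar.gz; do
        [ -f "$_tarball" ] || continue   # glob matched nothing
        [ -f "$1/$(expected_cache_dir "$_tarball")/info.txt" ] || echo "$_tarball"
    done
}
```

Running unextracted_tarballs "$VEP_DATA" before convert_cache.pl should print nothing once every tarball has been unpacked.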

Many thanks for pointing me to the gist and apologies again for my error at the final hurdle.

Regards

--
Paul Hatton
High Performance Computing and Visualisation Specialist
IT Services, The University of Birmingham
Ph: 0121-414-3994  Mob: 07785-977340  Skype: P.S.Hatton
[Service Manager, Birmingham Environment for Academic Research]
[    http://www.birmingham.ac.uk/bear ]
[Also Technical Director, IBM Visual and Spatial Technology Centre]

From: Cyriac Kandoth [mailto:kandoth at cbio.mskcc.org]
Sent: 25 April 2016 23:22
To: Paul Hatton
Cc: Ensembl developers list
Subject: Re: [ensembl-dev] VEP installation problems - unable to install GRCh37 caches

I'm unable to reproduce that error. But looking at the code, it appears to happen if you're missing the "info.txt" file in a cache folder. Check for them like this:

$ ll -h $VEP_DATA/{homo_sapiens,mus_musculus}/*/info.txt
-rw-r--r-- 1 kandoth pwgmgr  786 Feb 26 16:03 /opt/common/CentOS_6-dev/vep/v84/homo_sapiens/84_GRCh37/info.txt
-rw-r--r-- 1 kandoth pwgmgr 1.4K Feb 29 12:31 /opt/common/CentOS_6-dev/vep/v84/homo_sapiens/84_GRCh38/info.txt
-rw-r--r-- 1 kandoth pwgmgr  533 Feb 23 10:14 /opt/common/CentOS_6-dev/vep/v84/mus_musculus/84_GRCm38/info.txt

If your info.txt files are missing, then one of the steps before "convert_cache.pl" was skipped. E.g. make sure you didn't forget to untar those cache tarballs after rsync-ing them.
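
The same check can be sketched as a small shell function that starts from the directories rather than from a fixed species list. This is an illustration only: missing_info_txt is a hypothetical name, and it assumes the <cache_root>/<species>/<version>/info.txt layout shown in the listing above, so any other directory two levels below the root would also be reported.

```shell
# Hypothetical helper (not part of VEP): report cache directories lacking info.txt.
# Assumes the layout <cache_root>/<species>/<version>/info.txt.
# Returns non-zero if at least one version directory has no info.txt.
missing_info_txt() {
    _status=0
    for _dir in "$1"/*/*/; do
        [ -d "$_dir" ] || continue       # glob matched nothing
        if [ ! -f "${_dir}info.txt" ]; then
            echo "missing: ${_dir}info.txt"
            _status=1
        fi
    done
    return "$_status"
}
```

For example, missing_info_txt "$VEP_DATA" || echo "untar the cache tarballs first" flags the situation described above before convert_cache.pl is run.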

~Cyriac

On Mon, Apr 25, 2016 at 1:01 PM, Paul Hatton <P.S.HATTON at bham.ac.uk> wrote:
Afraid not:

[vep 17:57] $ perl ./convert_cache.pl --species mus_musculus --version 84_GRCm38 --dir $VEP_DATA
2016-04-25 17:58:10 - Processing mus_musculus
2016-04-25 17:58:10 - Processing version 84_GRCm38
Can't use an undefined value as an ARRAY reference at ./convert_cache.pl line 188.

Not sure if this helps:

[vep 18:00] $ ll -h $VEP_DATA
total 13G
-rw-r--r-- 1 appmaint appmaint 1.7G Apr 24 17:32 ExAC.r0.3.sites.minus_somatic.vcf.gz
-rw-r--r-- 1 appmaint appmaint 800K Apr 24 17:35 ExAC.r0.3.sites.minus_somatic.vcf.gz.tbi
drwxr-xr-x 4 appmaint appmaint  512 Apr 24 17:03 homo_sapiens
-r-x------ 1 appmaint appmaint 4.8G Apr 24 16:42 homo_sapiens_vep_84_GRCh37.tar.gz
-r-x------ 1 appmaint appmaint 4.8G Apr 24 16:42 homo_sapiens_vep_84_GRCh38.tar.gz
drwxr-xr-x 3 appmaint appmaint  512 Apr 24 17:17 mus_musculus
-r-x------ 1 appmaint appmaint 1.4G Apr 24 16:42 mus_musculus_vep_84_GRCm38.tar.gz
drwxr-xr-x 2 appmaint appmaint  512 Apr 24 16:59 Plugins

Regards

--
Paul Hatton

From: Cyriac Kandoth [mailto:kandoth at cbio.mskcc.org]
Sent: 25 April 2016 16:34
To: Paul Hatton
Cc: Ensembl developers list
Subject: Re: [ensembl-dev] VEP installation problems - unable to install GRCh37 caches

Try separating out the mouse caches from the human caches...

perl convert_cache.pl --species mus_musculus --version 84_GRCm38 --dir $VEP_DATA
perl convert_cache.pl --species homo_sapiens --version 84_GRCh37,84_GRCh38 --dir $VEP_DATA

If that works, lemme know, I'll update the gist.

~Cyriac

On Sun, Apr 24, 2016 at 12:40 PM, Paul Hatton <P.S.HATTON at bham.ac.uk> wrote:
When I follow the gist it is fine apart from:

[variant_effect_predictor 17:28] $ convert_cache.pl --species homo_sapiens,mus_musculus --version 84_GRCh37,84_GRCh38,84_GRCm38 --dir $VEP_DATA
2016-04-24 17:29:18 - Processing homo_sapiens
2016-04-24 17:29:18 - Processing version 84_GRCh38
Can't use an undefined value as an ARRAY reference at ./convert_cache.pl line 188.

Does this look at all familiar? Maybe an error on my part but I have followed the gist closely.

This is using perl 5.20, which is the version that has been recommended to me for vcf2maf, in case that is relevant:

[variant_effect_predictor 17:35] $ which perl
/gpfs/apps/perl/v5.20.0_gcc-v4.7.2/bin/perl

Many thanks (again)

--
Paul Hatton

From: Cyriac Kandoth [mailto:kandoth at cbio.mskcc.org]
Sent: 19 April 2016 21:59
To: Ensembl developers list
Cc: Paul Hatton
Subject: Re: [ensembl-dev] VEP installation problems - unable to install GRCh37 caches

I forgot to mention - the order of instructions in that gist is specifically addressing the error you reported - "For technical reasons this installer is unable to install GRCh37 caches alongside others; please install them separately"

Also, I'd recommend against installing all plugins; it can get messy. Only the ExAC plugin is currently used by vcf2maf. In the future, it may use more. Here is a list of all the available plugins:
https://github.com/Ensembl/VEP_plugins

~C

On Tue, Apr 19, 2016 at 4:18 PM, Cyriac Kandoth <kandoth at cbio.mskcc.org> wrote:
Hi Paul,

It is appropriate to post such a question to dev at ensembl. If you have an issue specifically with vcf2maf, we (mskcc) can help you at https://github.com/mskcc/vcf2maf/issues

The latest readme for vcf2maf includes instructions for installing VEP v83 - https://github.com/mskcc/vcf2maf - I will remove these instructions in the next few days in favor of gists. Here's a gist for installing VEP v84 with an offline cache for GRCh37 - https://gist.github.com/ckandoth/57d189f018b448774704d3b2191720a6

~Cyriac

On Tue, Apr 19, 2016 at 4:07 AM, Paul Hatton <P.S.HATTON at bham.ac.uk> wrote:
Apologies if this is the wrong list to post this to, but a search for this problem led me to this list and I can't find any mention of it in the archives (which sort-of suggests that I'm on the wrong list).

I look after the applications base on our Linux-based HPC service at the University of Birmingham (UK) and we have recently established a new Centre for Computational Biology and hence I am asked to install much specialist software such as VEP. I have a great deal of experience in installing applications on a Linux HPC service but limited experience of these specialist applications, so apologies again if this is a naive question posted to the wrong list .... feel free to point me elsewhere if more appropriate ......

Anyhow, I have been asked to get vcf2maf running, which depends on VEP. I have been unable to get a clean installation of VEP 82, 83 or 84, which I think is causing knock-on problems for users running vcf2maf, and so I'd like to get VEP installed cleanly first. Whilst VEP itself builds fine with

cd ensembl-tools-release-84/scripts/variant_effect_predictor
export PERL5LIB=/gpfs/apps/VEP/84:$PERL5LIB
export PATH=/gpfs/apps/VEP/84/htslib:$PATH
perl INSTALL.pl --DESTDIR /gpfs/apps/VEP/84 --CACHEDIR /gpfs/apps/VEP/84/cache --PLUGINS all

and installs VEP as expected, when I ask it to download the cache files it fails at the end with

 - downloading ftp://ftp.ensembl.org/pub/release-84/variation/VEP/xiphophorus_maculatus_vep_84_Xipmac4.4.2.tar.gz
 - unpacking xiphophorus_maculatus_vep_84_Xipmac4.4.2.tar.gz
ERROR: For technical reasons this installer is unable to install GRCh37 caches alongside others; please install them separately

and I can't find any help as to what I should do next. For example, /gpfs/apps/VEP/84/cache/homo_sapiens has directories 84_GRCh37 and 84_GRCh38, which both seem fully populated, so can this error be safely ignored?

If I then repeat the installation asking not to install any cache files and ask it to install all of the FASTA files the third one fails with

ERROR: Could not change directory to dna

and the installer then terminates, rather than trying the next one. I think this comes from lines 737 to 739 in INSTALL.pl:

    foreach my $sub(split /\//, $3) {
      $ftp->cwd($sub) or die "ERROR: Could not change directory to $sub\n$@\n";
    }

and suggests that there are some missing directories on the download site for these files. Is this the case and, if so, is there any way around this apart from rerunning the installer for each of the 70 options one-by-one, which would take quite a while?

Apologies again if this is the wrong list and/or these are simplistic questions. Any help is much appreciated.

Regards

--
Paul Hatton



_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/



