[ensembl-dev] VEP installer script fails to download homo_sapiens_refseq_vep_73.tar.gz

Will McLaren wm2 at ebi.ac.uk
Fri Nov 8 09:36:54 GMT 2013


Hi Chris,

Thanks for your mail - answers inline below.

On 7 November 2013 16:47, Chris Boustred <cboustred at gmail.com> wrote:

>  Hi,
>
> I am using the VEP installer script to download and unpack caches to use
> with the VEP script.
>
> I would like to use the human refseq cache, to get NM_ transcript IDs, as
> this is what my colleagues would like reported in their output.
>
> When prompted for which cache to download, if I choose '25 :
> homo_sapiens_refseq_vep_73.tar.gz' it is downloaded and put into a tmp
> folder within ~/.vep. However, it looks as if it fails to unpack, as the
> resulting cache folder (homo_sapiens) is empty.
>

There's a bug with the installer when you select the refseq cache - I'm
working on fixing it for the next VEP release.


>
> If I choose '26 : homo_sapiens_vep_73.tar.gz' the unpacked 'homo_sapiens'
> folder contains all the cache information.
>
> I therefore downloaded the cache files directly from
> ftp://ftp.ensembl.org/pub/release-73/variation/VEP/ but when I unpack
> them, both produce a folder named 'homo_sapiens'. I believe in the past
> the refseq cache had a different name, e.g. homo_sapiens_refseq? I am
> using --dir_cache to get around this.
>

Both have always been called just homo_sapiens, which is not ideal. The
original intention was that users would choose one or the other, so there
wouldn't be a conflict. However, several users use both. I'll try to come
up with a better solution.
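
In the meantime, one way to keep the two caches apart is to unpack each
tarball into its own parent directory and point --dir_cache at it, as you
are already doing. A rough sketch, assuming a POSIX shell with tar
available; unpack_cache is a hypothetical helper, not part of the VEP
installer:

```shell
#!/bin/sh
# unpack_cache TARBALL DEST: hypothetical helper, not part of VEP.
# Extracts a downloaded cache tarball into DEST and checks that the
# resulting homo_sapiens folder exists and is not empty.
unpack_cache() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    if [ -z "$(ls -A "$dest/homo_sapiens" 2>/dev/null)" ]; then
        echo "unpack failed: $dest/homo_sapiens is missing or empty" >&2
        return 1
    fi
}
```

For example, "unpack_cache homo_sapiens_refseq_vep_73.tar.gz ~/.vep/Refseq"
followed by running the script with "--dir_cache ~/.vep/Refseq" would
keep the refseq cache separate from the default one in ~/.vep.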


>
> Finally, when running the VEP script with the refseq cache and using the
> --symbol flag I was getting the error:
>
> Can't call method "display_xref" on an undefined value at
> /home/chris/VEP/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm
> line 1997.
>

Unfortunately, the refseq cache does not contain gene symbols. I will
update the script to report this instead of failing.
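
Until then, a small wrapper can pick the cache-specific flags so that
--refseq is always passed with the refseq cache and --symbol is only
requested from the Ensembl cache. A minimal sketch, assuming a POSIX
shell; cache_flags is a hypothetical helper, and the flag combinations
simply mirror the behaviour described in this thread:

```shell
#!/bin/sh
# cache_flags KIND DIR: hypothetical helper, not part of VEP.
# Prints the cache-related flags for the given cache kind.
cache_flags() {
    kind=$1
    dir=$2
    case "$kind" in
        refseq)
            # RefSeq cache: pass --refseq and leave out --symbol,
            # since this cache carries no gene symbols.
            printf '%s\n' "--cache --dir_cache $dir --refseq"
            ;;
        ensembl)
            # Ensembl cache: gene symbols are available.
            printf '%s\n' "--cache --dir_cache $dir --symbol"
            ;;
        *)
            echo "unknown cache kind: $kind" >&2
            return 1
            ;;
    esac
}
```

The output is meant to be spliced into the command line unquoted, e.g.
"perl $VEP/variant_effect_predictor.pl $(cache_flags refseq
/home/chris/.vep/Refseq) ...", so it relies on word splitting and would
not cope with a cache path containing spaces.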

Cheers

Will McLaren
Ensembl Variation


>
> And the process hangs.
>
> If I run with the --refseq flag I no longer get the error, but the
> --symbol output (i.e. the HGNC gene symbol) is not populated.
>
> I don't get any errors if I use the ensembl vep cache...
>
> Here are the three commands I am running:
>
> 1. Using refseq cache without --refseq flag (throws the
> "/VEP/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line
> 1997" error)
>
> perl $VEP/variant_effect_predictor.pl \
> -fork 4 \
> --buffer_size 10000 \
> --cache \
> --dir_cache /home/chris/.vep/Refseq \
> --dir_plugins /home/chris/.vep/Plugins \
> --fasta /home/chris/.vep/EnsemblRef/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa \
> --input_file $inputVCF \
> --output_file $outputVCF \
> --sift b  \
> --polyphen b  \
> --allele_number \
> --numbers \
> --domains \
> --HGVS \
> --protein \
> --symbol \
> --ccds \
> --canonical \
> --biotype \
> --check_alleles \
> --gmaf \
> --maf_1kg \
> --maf_esp \
> --pubmed \
> --vcf \
> --force_overwrite \
> --plugin FATHMM,"python ~/Reference_sequences/Variants/FATHMM/fathmm.py"
>
>
> 2. As above but with --refseq flag - works without an error but HGNC
> (--symbol) is not populated?
>
> perl $VEP/variant_effect_predictor.pl \
> -fork 4 \
> --buffer_size 10000 \
> --cache \
> --dir_cache /home/chris/.vep/Refseq \
> --dir_plugins /home/chris/.vep/Plugins \
> --fasta /home/chris/.vep/EnsemblRef/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa \
> --input_file $inputVCF \
> --output_file $outputVCF \
> --sift b  \
> --polyphen b  \
> --allele_number \
> --numbers \
> --domains \
> --HGVS \
> --protein \
> --symbol \
> --ccds \
> --canonical \
> --biotype \
> --check_alleles \
> --gmaf \
> --maf_1kg \
> --maf_esp \
> --pubmed \
> --vcf \
> --refseq \
> --force_overwrite \
> --plugin FATHMM,"python ~/Reference_sequences/Variants/FATHMM/fathmm.py"
>
> 3. Using ensembl cache - works but no refseq transcript IDs!
>
> perl $VEP/variant_effect_predictor.pl \
> -fork 4 \
> --buffer_size 10000 \
> --cache \
> --dir_cache /home/chris/.vep/ \
> --dir_plugins /home/chris/.vep/Plugins \
> --fasta /home/chris/.vep/EnsemblRef/Homo_sapiens.GRCh37.73.dna.primary_assembly.fa \
> --input_file $inputVCF \
> --output_file $outputVCF \
> --sift b  \
> --polyphen b  \
> --allele_number \
> --numbers \
> --domains \
> --HGVS \
> --protein \
> --symbol \
> --ccds \
> --canonical \
> --biotype \
> --check_alleles \
> --gmaf \
> --maf_1kg \
> --maf_esp \
> --pubmed \
> --vcf \
> --refseq \
> --force_overwrite \
> --plugin FATHMM,"python ~/Reference_sequences/Variants/FATHMM/fathmm.py"
>
> Any help with the above would be much appreciated!
>
> Thanks
>
> Chris
>
>
>
> --
>
> *Chris Boustred*
> Laboratory Bioinformatician
> Regional Molecular Genetics
> Great Ormond Street for Children NHS Foundation Trust
> Level 6, York House
> 37 Queen Square
> London
> WC1N 3BH
> christopher.boustred at gosh.nhs.uk
> cboustred at gmail.com
> Phone: 020 7762 6874
> Fax: 020 7813 8196
>