[ensembl-dev] VEP "Fork of Death" possible interaction between --hgvs/--check_ref

Stuart Watt Stuart.Watt at oicr.on.ca
Thu Jun 27 15:39:50 BST 2013


Hi all

I found an interesting corner case for another VEP "fork of death" problem. I managed to replicate it fairly successfully, but haven't had a chance to pin it down.

The scenario: I'm using a subset of COSMIC in a fairly large input file (700k variants). I'm using both --hgvs and --check_ref, and they seem to interact. Using --hgvs alone works. Using both in non-fork mode, I sometimes got errors about the reference and variant alleles being the same. In fork mode the same parameters throw an error, possibly because the error messages leak into the communication between the forked processes. What is odd is that these errors went away when --check_ref was not used, and that when I checked the lines reporting errors, the alleles were in fact different. It is as if something accessed or revealed by --check_ref is being modified behind the scenes, triggering the HGVS errors.
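The manual check of the reported lines could be scripted roughly as follows. This is only a sketch: it assumes the input is in the default Ensembl format, with the allele string (reference first, alleles separated by "/") in the fourth whitespace-separated column. The function name same_allele_lines is made up for illustration.

```shell
#!/bin/sh
# Print every input line whose reference allele also appears as a variant allele.
# Assumption: column 4 holds the allele string, e.g. A/G, with the reference first.
same_allele_lines() {
    awk '{
        n = split($4, a, "/")
        for (i = 2; i <= n; i++) {
            if (a[i] == a[1]) {      # a variant allele equal to the reference
                print NR ": " $0     # report the line number and the line
                break
            }
        }
    }' "$1"
}
```

Running same_allele_lines vep1.txt and getting no output would support the observation that the alleles in the input really are different.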

  *   Succeeds: perl -X /Users/swatt/ensembl/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl --format ensembl --offline --no_progress --canonical --check_existing --force_overwrite --numbers --buffer_size 5000 --sift b --polyphen b --compress gzcat --fasta /Users/swatt/fasta --input_file vep1.txt --output_file vep1.vep_output --hgvs --fork 8
  *   Fails (Fork of Death): perl -X /Users/swatt/ensembl/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl --format ensembl --offline --no_progress --canonical --check_existing --force_overwrite --numbers --buffer_size 5000 --sift b --polyphen b --compress gzcat --fasta /Users/swatt/fasta --input_file vep1.txt --output_file vep1.vep_output --hgvs --fork 8 --check_ref
  *   Succeeds (probably, eventually, but it takes a very long time): perl -X /Users/swatt/ensembl/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl --format ensembl --offline --no_progress --canonical --check_existing --force_overwrite --numbers --buffer_size 5000 --sift b --polyphen b --compress gzcat --fasta /Users/swatt/fasta --input_file vep1.txt --output_file vep1.vep_output --hgvs --check_ref

The only options that seem to determine these behaviours are --fork, --check_ref, and --hgvs, but I haven't exhaustively factored out the other options.
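One way to factor out the three suspect options more systematically would be to run every on/off combination and record the exit status of each run. The following is a minimal sketch, not something I have run against VEP: the function names, the DRY_RUN switch and the matrix_N output file names are invented for illustration, while the common options are copied from the commands above. By default it only prints the eight commands.

```shell
#!/bin/sh
# Path and common options taken from the commands above; override VEP if needed.
VEP="${VEP:-perl -X /Users/swatt/ensembl/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl}"
COMMON="--format ensembl --offline --no_progress --canonical --check_existing --force_overwrite --numbers --buffer_size 5000 --sift b --polyphen b --compress gzcat --fasta /Users/swatt/fasta --input_file vep1.txt"

# Emit one line per on/off combination of --hgvs, --check_ref and --fork 8.
combos() {
    for hgvs in "" "--hgvs"; do
        for check_ref in "" "--check_ref"; do
            for fork in "" "--fork 8"; do
                echo "$hgvs $check_ref $fork"
            done
        done
    done
}

# Print each command (DRY_RUN=1, the default) or run it and report its exit status.
run_matrix() {
    i=0
    combos | while read -r opts; do
        i=$((i + 1))
        cmd="$VEP $COMMON --output_file matrix_$i.vep_output $opts"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Word splitting of $cmd is intentional; stdin is redirected so the
            # command cannot consume the remaining combinations from the pipe.
            $cmd > "matrix_$i.log" 2>&1 < /dev/null
            echo "exit=$? opts=$opts"
        fi
    done
}

run_matrix
```

Setting DRY_RUN=0 would actually run the eight commands, ideally on a small slice of the input, since the non-fork runs take a very long time on the full file.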

Despite this, I can now complete the full file in fork mode on OSX, and that saves me many hours. So thanks for all the great work on these issues so far.

I'm happy to provide data files and other info if required. If anybody has any tips on debugging this, I may get time to look further into the Perl code and figure out where things go wrong.

Configuration info:
OSX - Lion
perl - 5.16.1 (not system perl, built using perlbrew)
Ensembl/VEP version 72

All the best

Stuart

--

Stuart Watt, PhD

Scientific Associate

Ontario Institute for Cancer Research

MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Toll-free: 1-866-678-6427
www.oicr.on.ca


