[ensembl-dev] Variant Effect Predictor --gene option not working

Sebastian Ginzel sginze2s at inf.h-brs.de
Wed Jan 9 13:14:31 GMT 2013


Hello Will,

thanks for your quick response.

I would like to stay on version 66, because our NGS pipeline is built
around this release and I want to keep all our results as consistent
as possible.

But now back to my problem: I tried what you suggested and set up two
VEP installations, one for Ensembl 66 and one for Ensembl 69 (following
the link you provided), both times using the install script to also set
up the corresponding API and cache versions. At first I thought I had
downloaded VEP version 2.4, as stated by the link description on the
website and by the README file inside the tar.gz. However, the source
code and the help screen show that the script is actually version 2.6.
This is somewhat confusing, because if I understood you correctly, each
VEP version should be used with a specific API version.

I only saw Ensembl Gene IDs when I used the Ensembl 69 API, never with
Ensembl API version 66. The VEP version did not matter, and the --gene
option wasn't available in either of the two VEP versions.

I also noticed that when using version 66 (with VEP 2.7 and VEP 2.6), I
get an error message that says
     Can't use string ("21    26960070    rs116645811    G    A .    
.    "...) as a SCALAR ref while "strict refs" in use at 
variant_effect_predictor.pl line 1550

When I change this line in the source from
     $output = $$line;
to
     $output = $line;
it works, but then it no longer works with API version 69. Maybe this
is a bit off topic, though.
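A version-agnostic form of that one-line change can be sketched as follows. It assumes, as the error message suggests, that under API 66 the variable holds a plain string while under API 69 it holds a reference to a string; that assumption has not been checked against the rest of the script. The helper name line_text is hypothetical and not part of VEP. It uses Perl's built-in ref(), which returns 'SCALAR' for a reference to a plain scalar and an empty string for a non-reference, so the dereference only happens when it is legal under "strict refs":

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of VEP): return the line text whether the
# caller passed a plain string or a reference to a string.
sub line_text {
    my ($line) = @_;
    # ref() yields 'SCALAR' for a scalar reference and '' for a plain value,
    # so $$line is only evaluated when $line really is a scalar ref.
    return ref($line) eq 'SCALAR' ? $$line : $line;
}

# Both calling conventions yield the same text.
my $raw = "21\t26960070\trs116645811\tG\tA\t.\t.\t.";
print line_text($raw) eq line_text(\$raw) ? "same\n" : "different\n";
```

Applied inline at the line quoted above, the equivalent change would be `$output = ref($line) eq 'SCALAR' ? $$line : $line;`. Whether other places in the script make the same string-versus-reference assumption is untested.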

So I now have two questions regarding the Gene IDs:

1) How can I download the actual VEP 2.4 script to try out what you
suggested? The link on the official website only lets me download VEP 2.6.
2) It seems to me that the API version is the cause of my problem,
because I get Gene IDs when I use Ensembl 69 with either of the two VEP
versions. Is there any explanation for this?

And a third question, about the cache: What is included in the cache,
and where can I find information on how to use the local database to
add custom annotations? I would really like to use only the local cache,
but I have always used our local database because I figured that you
couldn't possibly have stored all information for all possible variants
in a single 2GB cache file, or could you?


Best wishes & sorry for all the text and all the questions :)
Sebastian

On 08.01.2013 14:17, Will McLaren wrote:
> Hello Sebastien,
>
> Thanks for the detailed report.
>
> I think the problem may be caused by using an older version of the 
> API. The latest version of the script (2.7) should be used with 
> version 69 of the Ensembl API.
>
> If you need to use version 66 of the API (for example if you are 
> unable to upgrade the database you are using), you should use the 
> appropriate script version with this. You can see which versions go 
> together here:
>
> http://www.ensembl.org/info/docs/variation/vep/vep_script.html#download
>
> As an aside, you may find that the latest version of the VEP gives you 
> everything you need in the cache files available from us without 
> having to use a local database. However, of course if you are using a 
> local database for custom annotations, you should continue to do so.
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
> On 8 January 2013 11:59, Sebastian Ginzel <sginze2s at inf.h-brs.de> wrote:
>
>     Dear Ensembl-Developer Team,
>
>     I want to use the Variant Effect Predictor standalone script to
>     annotate my VCF file for further processing and I need the
>     variants to have Ensembl Gene IDs.
>
>     Unfortunately, the --gene option does not work and results in this
>     error message:
>
>     Unknown option: gene
>     ERROR: Failed to parse command-line flags
>
>     Without the --gene option everything runs through perfectly, but
>     no Ensembl Gene IDs show up in the output, although the
>     documentation available at
>     http://www.ensembl.org/info/docs/variation/vep/vep_script.html#output
>     suggests that the output of ENSG IDs is forced when using the
>     --cache option (which I also use). A quick check of the source
>     code of the variant_effect_predictor.pl script showed me that the
>     --gene option no longer seems to be implemented.
>
>     I saw that someone mentioned the removal of the --gene option in
>     the mailing list archives, in a thread that started on 5th
>     December 2012 09:12:48. But it doesn't mention anything like my
>     problem.
>
>     That leaves me with two questions:
>
>     1) What happened to the --gene option, and can anyone reproduce this?
>     2) How can I force the population of the Ensembl Gene ID column
>     when using --cache does not populate it either?
>
>
>     Best wishes,
>     Sebastian Ginzel
>
>     PS: Here is what I did to set up VEP on my Ubuntu 12.04 system with
>     Perl v5.14.2.
>
>     I downloaded and set up the latest VEP version 2.7
>     (http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-tools/scripts/variant_effect_predictor.tar.gz?view=tar&root=ensembl&pathrev=branch-ensembl-69
>     - MD5Sum ab780dcb0267e5872f85ebe2ff4837f5)
>
>     "perl variant_effect_predictor.pl --help" confirms that I am
>     actually using version 2.7.
>
>     I also downloaded some plugins through Git using:
>     git clone "https://github.com/ensembl-variation/VEP_plugins"
>
>     I used this command line to call the script:
>
>     perl /home/sginze2s/vep/lib/vep/bin/variant_effect_predictor/variant_effect_predictor.pl \
>       -i /home/sginze2s/vep/lib/vep/sample1.vcf -o /tmp/bla.vcf \
>       --cache --dir /home/sginze2s/vep/lib/vep/bin/cache --prefetch \
>       --no_adaptor_cache --write_cache --strip --everything --gmaf \
>       --xref_refseq --failed 1 --fork 4 --vcf --format vcf \
>       --no_progress --check_existing --check_svs \
>       --plugin Condel,/home/sginze2s/vep/lib/vep/bin/cache/Plugins/config/Condel/config \
>       --plugin Blosum62 --plugin Downstream --species homo_sapiens \
>       --db_version=66 --host bio.inf.h-brs.de \
>       --user ensembl --password ******* --port 13306 --force_overwrite \
>       --quiet --gene
>
>     I use Ensembl API version 66 and point to it via the PERL5LIB
>     variable:
>     PERL5LIB=/lib/ensembl_66/ensembl-functgenomics/modules:/lib/ensembl_66/ensembl-variation/modules:/lib/ensembl_66/ensembl-compara/modules:/lib/ensembl_66/ensembl/modules:/lib/bioperl-1.2.3:/lib/bioperl-1.5.2_102_Matrix
>
>
>
>     _______________________________________________
>     Dev mailing list    Dev at ensembl.org
>     Posting guidelines and subscribe/unsubscribe info:
>     http://lists.ensembl.org/mailman/listinfo/dev
>     Ensembl Blog: http://www.ensembl.info/
