[ensembl-dev] VEP 79 API problems

Healy, Matthew Matthew.Healy at bms.com
Thu Jun 4 17:19:08 BST 2015


The regex documentation for Perl 5.8.8 makes no mention of \R (only the lowercase \r), so the Perl version is probably the issue:
http://perldoc.perl.org/5.8.8/perlre.html
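For context, \R (generic line break) was added to the regex engine in Perl 5.10, so on 5.8.8 it cannot be relied on as a line-break escape. Below is a minimal sketch of a split that only uses escapes 5.8.8 understands. The sub name split_lines is hypothetical and is not part of VEP.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split text on CRLF, LF or a lone CR using only
# regex escapes available in Perl 5.8.8 (\R needs Perl 5.10 or later).
sub split_lines {
    my ($text) = @_;
    # \r\n is listed first so a Windows line ending counts as one separator
    # rather than producing an empty field between the \r and the \n.
    return split /\r\n|\n|\r/, $text;
}

my @lines = split_lines("1\t10\t.\tA\tG\r\n2\t20\t.\tC\tT\rGENE=BRCA2");
print scalar(@lines), " lines\n";   # 3 lines; the capital R in BRCA2 is left alone
```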


From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Will McLaren
Sent: 04 June, 2015 12:12 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP 79 API problems

I'm wondering if 5.8.8 has different regex handling to newer Perl versions. Someone else in the Ensembl team may know better than me on this one.

I believe the Ensembl project now recommends at least 5.10 (according to http://www.ensembl.org/info/docs/api/api_installation.html at least); most people in the wild use 5.14 or 5.16 AFAIK.

If you can possibly try a newer version of Perl this may solve your issues. Perlbrew is a nice way to manage different versions and module sets http://perlbrew.pl/
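Before or after switching Perls, a quick way to confirm whether the running interpreter treats \R as a line-break escape is a small probe like the sketch below. It is an illustrative check, not something shipped with VEP, and the sub name is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Probe whether the regex engine treats \R as a generic line break
# (Perl 5.10+). The test regex is compiled in a string eval so an old Perl
# only sees it at run time; there \R falls back to a literal "R", so
# "a\nb" does not match and the probe returns 0.
sub has_generic_linebreak_escape {
    no warnings;    # silence any "Unrecognized escape" warning on old Perls
    my $ok = eval q{ "a\nb" =~ /\Aa\Rb\z/ ? 1 : 0 };
    return $ok ? 1 : 0;
}

printf "Perl %vd\n", $^V;
print has_generic_linebreak_escape()
    ? "\\R is supported\n"
    : "\\R is NOT supported; avoid it or upgrade Perl\n";
```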

Will

On 4 June 2015 at 17:02, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hello Will,

The machine with the wrong behaviour runs CentOS 5.4 with Perl v5.8.8 built for x86_64-linux-thread-multi.

So should I remove this line completely?

 foreach my $line(split /\r|\R/) {

I was thinking about just removing \R from the regex.
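Removing \R should be enough if, as is typical, each line has already been read with <...> and chomped. In that case only a stray \r from a Windows (CRLF) or old Mac (CR-only) file can remain. Below is a minimal sketch under that assumption. read_records is a hypothetical stand-in for an input loop, not the actual VEP code, and the sample records are made up.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for an input loop: read line by line, chomp the
# "\n", then split on "\r" alone (i.e. the original regex with \R removed).
sub read_records {
    my ($text) = @_;
    # In-memory filehandle (supported since Perl 5.8), so no file is needed.
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @records;
    while (my $raw = <$fh>) {
        chomp $raw;    # strips the trailing "\n" only; a CRLF "\r" survives
        # split drops the trailing empty field left by that "\r", and a
        # capital R in the data is never treated as a separator.
        push @records, split /\r/, $raw;
    }
    close $fh;
    return @records;
}

my $vcf = "1\t10\t.\tA\tG\t.\t.\tGENE=BRCA2\r\n"
        . "2\t20\t.\tC\tT\t.\t.\tGENE=ATR\r\n";
my @records = read_records($vcf);
print scalar(@records), " records\n";   # 2 records
```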

Regards,
Guillermo.

On 04/06/15 17:49, Will McLaren wrote:
Thanks

I had forgotten about that change. You could just edit the script and change or even remove the regexp:

foreach my $line(($_)) {

What's your Perl version and system architecture? I'm surprised this has not caught anyone else out.

Will

On 4 June 2015 at 14:47, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hi Will,

I've been comparing the variant_effect_predictor script from version 75 with version 79.
After adding a few print statements inside VEP.pm I've spotted the bug. However, I cannot resolve it.

These lines (175 and 176) are new in the VEP script between 75 and 79:

    # split again to avoid Windows character nonsense

    foreach my $line(split /\r|\R/) {


I've checked that the script splits the line every time it finds a capital R in the VCF file, treating it as a Windows newline character. I can't reproduce it in the virtual machine, since that is a fresh Linux install. In my work environment I'm getting this bug, so I guess it has something to do with file encoding or locale? Has anyone else experienced this?

Now I know where the error is, but I've no idea how to solve it.
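The 5.8.8 behaviour can be reproduced on any Perl by writing out explicitly the literal R that \R degraded to. The sketch below is an emulation for illustration, not the VEP code itself, and the sample data line is made up. Lines containing a capital R end up broken into fragments that no longer have the tab-separated VCF layout, which plausibly explains the "Could not detect input file format" error.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Emulation: on Perl 5.8.8, /\r|\R/ behaved like /\r|R/, so the literal R
# is written explicitly to reproduce the bug on a modern Perl.
my $header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO";
my $data   = "1\t10\t.\tA\tG\t.\t.\tGENE=BRCA2";

for my $line ($header, $data) {
    my @fragments = split /\r|R/, $line;
    print scalar(@fragments), " fragment(s):\n";
    for my $frag (@fragments) {
        # -1 keeps empty trailing columns so the count is honest
        my @cols = split /\t/, $frag, -1;
        printf "  %d column(s): %s\n", scalar(@cols), join('|', @cols);
    }
}
```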

Regards,
Guillermo.

On 04/06/15 15:16, Will McLaren wrote:

Sorry Guillermo, I'm running out of ideas.

Does the unit test run OK?

perl ensembl-tools/scripts/variant_effect_predictor/t/variant_effect_predictor.t

Will
On 4 Jun 2015 12:27, "Guillermo Marco Puche" <guillermo.marco at sistemasgenomicos.com> wrote:
Hi Will,

I'm getting the exact same error with example_GRCh37.vcf:

ERROR: Could not detect input file format

I've made a test script as you suggested with the following code, and I don't get any error:

#!/usr/bin/env perl

use strict;

use Bio::EnsEMBL::Variation::Utils::VEP qw(detect_format);

Regards,
Guillermo.
On 04/06/15 13:12, Will McLaren wrote:

Hi again

If the script is not detecting the input format then it is almost certainly an issue with the input file. There's very little code that gets run to detect the format, and it's all internal to the VEP code.

You could write a short script to test the method; just import detect_format from Bio::EnsEMBL::Variation::Utils::VEP.

Does it detect the example_GRCh37.vcf format correctly?

The file you shared on Dropbox works fine for me on my Mac and a Linux box.

Will
On 4 Jun 2015 10:44, "Guillermo Marco Puche" <guillermo.marco at sistemasgenomicos.com> wrote:
Hi again Will,

I'm trying with the latest Ensembl, release 80.
If I don't specify -format vcf I get the following error:

perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite

2015-06-04 11:36:59 - Starting...

ERROR: Could not detect input file format

If I force the format with -format vcf I get all the errors (see attached error log). I'm using the same input.vcf file I posted yesterday.
To rule out a problem with the VCF, I've installed a fresh Linux on a virtual machine and just cloned and set up Ensembl and BioPerl. On the fresh Linux install I was only asked to install the MySQL Perl module (I installed it via CPAN).
It works like a charm.

I rule out a problem with the input VCF because I'm using exactly the same input in both environments (and the same one you used to test it yesterday).

So my question is: Does VEP script use any other library, environment variable or tool which may be interfering?

Best regards,
Guillermo.

On 04/06/15 09:32, Guillermo Marco Puche wrote:
Hi again Will,

I've completely cleaned the PERL5LIB environment variable. I've tested switching between BioPerl 1.2.3 and BioPerl 1.6.1 and got the same warnings/errors.
I've cloned the whole 79 API again, as you suggested, into a new tmp location and included it in $PERL5LIB.

echo $PERL5LIB
/share/apps/local/bioperl-live:/share/gluster/tests/gmarco/tmp/ensembl/modules:/share/gluster/tests/gmarco/tmp/ensembl-funcgen/modules:/share/gluster/tests/gmarco/tmp/ensembl-variation/modules

ll /share/gluster/tests/gmarco/tmp
total 20
drwxrwxr-x  8 gmarco users 4096 jun  4 08:44 ensembl
drwxrwxr-x  8 gmarco users  146 jun  4 08:46 ensembl-funcgen
drwxrwxr-x  5 gmarco users   64 jun  4 08:43 ensembl-tools
drwxrwxr-x 10 gmarco users 4096 jun  4 08:45 ensembl-variation

perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
2015-06-04 09:29:13 - Starting...
ERROR: Could not detect input file format

If I use the flags -format vcf -vcf, I start getting all those errors (see yesterday's log).

Is there any other Perl library or requirement I could be missing? As I said, it's very weird: I have no problems at all with the Ensembl 75 local API.

Best regards,
Guillermo.
On 03/06/15 18:14, Will McLaren wrote:
Hi again,

I can't recreate the problem with that input file I'm afraid, either on my normal setup or scrubbing PERL5LIB and starting from scratch.

See commands I used and input below.

Perhaps you haven't got release/79 of ensembl-tools too?

Have you tried running the installer from within ensembl-tools/scripts/variant_effect_predictor? This shouldn't affect your PERL5LIB or other git checkouts.

Will

===================

mkdir ~/src/tmp
cd ~/src/tmp
git clone --branch release/79 https://github.com/Ensembl/ensembl-tools.git
git clone --branch release/79 https://github.com/Ensembl/ensembl.git
git clone --branch release/79 https://github.com/Ensembl/ensembl-variation.git
git clone --branch release/79 https://github.com/Ensembl/ensembl-funcgen.git
export PERL5LIB=ensembl/modules:ensembl-variation/modules:ensembl-funcgen/modules:/Users/will/src/bioperl-1.2.3/:/Users/will/src/lib/perl5/
perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i ~/Downloads/input.vcf -database
2015-06-03 17:09:54 - Starting...
2015-06-03 17:09:54 - Detected format of input file as vcf
2015-06-03 17:09:54 - Read 1 variants into buffer
2015-06-03 17:09:54 - Reading transcript data from cache and/or database
[================================================================================================================================]  [ 100% ]
2015-06-03 17:10:00 - Retrieved 7 transcripts (0 mem, 0 cached, 7 DB, 0 duplicates)
2015-06-03 17:10:00 - Analyzing chromosome 1
2015-06-03 17:10:00 - Analyzing variants
[================================================================================================================================]  [ 100% ]
2015-06-03 17:10:00 - Calculating consequences
2015-06-03 17:10:00 - Processed 1 total variants (0 vars/sec, 0 vars/sec total)
2015-06-03 17:10:00 - Wrote stats summary to variant_effect_output.txt_summary.html
2015-06-03 17:10:00 - Finished!




On 3 June 2015 at 16:51, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Hi Will,

I've been checking and I can't see any unintended whitespace or problems with tabs.
I have no issues with the old VEP 75 script and API. I've updated the BioPerl lib in the $PERL5LIB variable from 1.2.3 to 1.6.1 (sorry, I hadn't noticed this change before); however, I'm still getting all those errors.

Here's a link where you can download the VCF I'm using as input: https://www.dropbox.com/sh/felwyoo5kl2mgty/AAC177Digqy-_mEmyk9WvmYba/input.vcf?dl=0

Thank you.

Best regards,
Guille.

On 03/06/15 17:30, Will McLaren wrote:
Hi Guille,

It looks to me like your input is not being parsed properly.

Check the formatting of your input VCF; double check that it is valid VCF, and that you haven't got any unintended whitespace on any of the lines.
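A rough way to run that kind of check by script is sketched below. It is a heuristic written for illustration, not a full VCF validator and not part of VEP. It only flags carriage returns, stray whitespace, and data lines with fewer than the 8 fixed VCF columns; the sub and script names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Heuristic check for common formatting problems in a VCF line.
sub vcf_line_problems {
    my ($line) = @_;
    my @problems;
    push @problems, 'contains a carriage return (\r)' if $line =~ /\r/;
    return @problems if $line =~ /^#/;    # header lines: only the \r check
    push @problems, 'leading or trailing whitespace' if $line =~ /^\s|\s\z/;
    my @cols = split /\t/, $line, -1;
    push @problems, 'fewer than 8 tab-separated columns' if @cols < 8;
    push @problems, 'space next to a tab or line end'
        if grep { /^ | \z/ } @cols;
    return @problems;
}

# Usage: perl check_vcf.pl input.vcf (script name hypothetical)
if (@ARGV) {
    while (my $line = <>) {
        chomp $line;
        my @p = vcf_line_problems($line);
        print "line $.: ", join('; ', @p), "\n" if @p;
    }
}
```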

If you still have an issue, can you send a line or two of the input that recreates these issues?

Thanks

Will McLaren
Ensembl Variation


On 3 June 2015 at 16:16, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
Dear devs,

I'm trying ensembl 79 VEP.

This is my dummy input VCF: http://pastebin.com/kFKWH50q

I've cloned and installed the API from GitHub as always (this step is repeated for variation, funcgen and compara):
    git clone --branch release/79 https://github.com/Ensembl/ensembl.git ensembl_79

The PERL5LIB env variable correctly points to the cloned API:
    echo $PERL5LIB
/share/apps/local/bioperl-live:/share/apps/src/ensembl_79/modules:/share/apps/src/ensembl_79-compara/modules:/share/apps/src/ensembl_79-variation/modules:/share/apps/src/ensembl_79-functgenomics/modules

However, I'm getting a lot of errors I really don't understand. It seems like a problem with my API installation. If I change the $PERL5LIB variable to point to the 75 API (the previous version I was using), I can't reproduce the errors: the VEP script works with the old 75 version.

I've been reading the docs again and I can't see any additional Perl library requirement.

Here's the error log: http://pastebin.com/VvQrkEQZ

Thank you!

Best regards,
Guille.

_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/



