[ensembl-dev] VEP 79 API problems
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Fri Jun 5 11:36:24 BST 2015
Hello Will,
Thank you for the info Matthew.
I'm still getting lines split in my work environment.
1. *foreach my $line(split /\r|(?>\v|\x0D\x0A)/) {...}*
variant_effect_predictor.pl -database -i input.vcf -o test.vcf
--force_overwrite
2015-06-05 12:26:14 - Starting...
*4.2* ("##fileformat=VCFv4.2" is getting split into "*4.2*"; I
put a print in VEP.pm)
2015-06-05 12:26:14 - Detected format of input file as id
2. *foreach my $line(split /(?>\v|\x0D\x0A)/) {...}*
same result as 1
3. *foreach my $line(split /\r/) {...}*
same result as 1 and 2
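One possible reading of the "4.2" output, assuming the work machine is still on the Perl 5.8.8 reported earlier in the thread: \v, like \R, only arrived in Perl 5.10, so an older regex engine would presumably read it as a literal "v", just as \R was read as a capital R. The sketch below only emulates that by splitting on a literal v. It runs on any Perl, does not prove what 5.8.8 actually does, and would not account for attempt 3, which contains no \v.

```perl
use strict;
use warnings;

# Emulation only: on a Perl older than 5.10 the \v escape is ASSUMED to be
# read as a plain "v", so /\v/ would behave like /v/ on that machine.
my $header = "##fileformat=VCFv4.2";
my @pieces = split /v/, $header;

# Show every piece the loop body would receive as a separate "line".
print join(" | ", @pieces), "\n";
```

If the assumption holds, the second piece is the "4.2" seen in the log, and the format detection then fails because no piece looks like a VCF header.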
I cannot update the Perl version at this moment so I guess I will have to
completely remove this split from the variant_effect_predictor.pl code.
How can I update this line of code (176) to avoid any split?
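A minimal sketch of a split that avoids both \R and \v, assuming the only line endings that matter are \r\n, \n and a lone \r. The helper name split_lines and the sample text are hypothetical and are not part of VEP.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VEP): split text on CRLF, LF or a lone CR
# using only escapes that Perl 5.8 already understands.
sub split_lines {
    my ($text) = @_;
    # \r\n comes first so a Windows line ending is consumed as one break.
    return split /\r\n|\n|\r/, $text;
}

# Made-up sample mixing Windows and Unix line endings.
my @lines = split_lines("##fileformat=VCFv4.2\r\n#CHROM\tPOS\tID\r\n1\t100\trs1\n");
print scalar(@lines), " lines\n";
print "$_\n" for @lines;
```

In the script itself the equivalent change would be to use the pattern /\r\n|\n|\r/ in the existing split, which still operates on $_ as the original line does. To avoid any split at all, the form suggested earlier in this thread, foreach my $line(($_)), passes the line through unchanged.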
Regards,
Guillermo.
On 05/06/15 10:28, Will McLaren wrote:
> Thanks Matthew for the detective work.
>
> I've removed the \R from the split function and replaced it with what
> perldoc says it is shorthand for; tests pass OK and it seems to work
> on the Windows input file that prompted me to make this change in the
> first place.
>
> I've patched the fix to release/79 and release/80, so Guillermo I'd
> appreciate if you could update your ensembl-tools checkout and give
> this a test run for me.
>
> Thanks everyone
>
> Will
>
> On 4 June 2015 at 19:17, Healy, Matthew <Matthew.Healy at bms.com> wrote:
>
> The \R was added in Perl 5.10.0:
>
> http://perldoc.perl.org/5.10.0/perldelta.html
>
> *Vertical and horizontal whitespace, and linebreak*
>
> Regular expressions now recognize the \v and \h escapes that match
> vertical and horizontal whitespace, respectively. \V and
> \H logically match their complements.
>
> \R matches a generic linebreak, that is, vertical whitespace, plus
> the multi-character sequence "\x0D\x0A".
>
> *From:* dev-bounces at ensembl.org
> *On Behalf Of* Guillermo Marco Puche
> *Sent:* 04 June, 2015 2:10 PM
> *To:* dev at ensembl.org
>
>
> *Subject:* Re: [ensembl-dev] VEP 79 API problems
>
> Yes, I guess it's clear that the Perl version is the problem. I'll remove \R
> from this line in the script until I can update the Perl version in my
> work environment.
>
> As always, thank you for your fantastic support.
>
> Best regards,
> Guillermo.
>
> On 04/06/2015 at 18:19, Healy, Matthew wrote:
>
> In the regex documentation for Perl 5.8.8 there is no mention
> of \R (there is of course \r lowercase), so the Perl version
> probably is the issue:
>
> http://perldoc.perl.org/5.8.8/perlre.html
>
> *From:* dev-bounces at ensembl.org
> *On Behalf Of* Will McLaren
> *Sent:* 04 June, 2015 12:12 PM
> *To:* Ensembl developers list
> *Subject:* Re: [ensembl-dev] VEP 79 API problems
>
> I'm wondering if 5.8.8 has different regex handling to newer
> Perl versions. Someone else in the Ensembl team may know
> better than me on this one.
>
> I believe the Ensembl project now recommends at least 5.10
> (according to
> http://www.ensembl.org/info/docs/api/api_installation.html at
> least); most people in the wild use 5.14 or 5.16 AFAIK.
>
> If you can possibly try a newer version of Perl this may solve
> your issues. Perlbrew is a nice way to manage different
> versions and module sets http://perlbrew.pl/
>
> Will
>
> On 4 June 2015 at 17:02, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello Will,
>
> The machine showing the wrong behavior runs CentOS 5.4 and Perl v5.8.8 built
> for x86_64-linux-thread-multi.
>
> So should I completely remove this line?
>
> * foreach my $line(split /\r|\R/) {*
>
>
> I was thinking about just removing \R from regex.
>
> Regards,
> Guillermo.
>
> On 04/06/15 17:49, Will McLaren wrote:
>
> Thanks
>
> I had forgotten about that change. You could just edit the
> script and change or even remove the regexp:
>
> foreach my $line(($_)) {
>
> What's your Perl version and system architecture? I'm
> surprised this has not caught anyone else out.
>
> Will
>
> On 4 June 2015 at 14:47, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hi Will,
>
> I've been comparing the variant_effect_predictor script from
> version 75 with the one from 79.
> After adding a few prints inside VEP.pm I've spotted
> the bug. However, I cannot resolve it.
>
> These lines are new from 75 to 79 in the VEP script (175 and 176):
>
> * # split again to avoid Windows character nonsense*
>
> * foreach my $line(split /\r|\R/) {*
>
>
>
> I've checked that the script splits the line each time it
> finds a capital R in the VCF file, treating it as a
> Windows newline character. I can't reproduce it in a
> virtual machine since it's a fresh Linux install. In my
> work environment I'm getting this kind of bug, so I guess
> it has something to do with file encoding or locale? Has
> anyone else experienced this?
>
> Now I know where the error is but I've no idea how to solve it.
>
> Regards,
> Guillermo.
>
> On 04/06/15 15:16, Will McLaren wrote:
>
> Sorry Guillermo, I'm running out of ideas.
>
> Does the test unit run OK?
>
> perl ensembl-tools/scripts/variant_effect_predictor/t/variant_effect_predictor.t
>
> Will
>
> On 4 Jun 2015 12:27, "Guillermo Marco Puche"
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hi Will,
>
> I'm getting the exact same error with example_GRCh37.vcf:
>
> ERROR: Could not detect input file format
>
>
> I've made a test script as you suggested with the
> following code and I don't get any error:
>
> #!/usr/bin/env perl
>
>
>
> use strict;
>
> use Bio::EnsEMBL::Variation::Utils::VEP qw(detect_format);
>
>
> Regards,
> Guillermo.
>
> On 04/06/15 13:12, Will McLaren wrote:
>
> Hi again
>
> If the script is not detecting the input format
> then it is almost certainly an issue with the
> input file. There's very little code that gets run
> to detect the format, and it's all internal to the
> VEP code.
>
> You could write a short script to test the method,
> just import detect_format from
> Bio::EnsEMBL::Variation::Utils::VEP
>
> Does it detect the example_GRCh37.vcf format
> correctly?
>
> The file you shared on Dropbox works fine for me
> on my Mac and a Linux box.
>
> Will
>
> On 4 Jun 2015 10:44, "Guillermo Marco Puche"
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hi again Will,
>
> I'm trying with the latest Ensembl 80.
> If I don't specify *-format vcf* I get the
> following error:
>
> perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
>
> 2015-06-04 11:36:59 - Starting...
>
> ERROR: Could not detect input file format
>
>
> If I force the format with *-format vcf* I get all the
> errors (see error log attached). I'm using the
> same input.vcf file I posted yesterday.
> Just to rule out the VCF as the cause, I've installed a
> fresh Linux on a virtual machine and just cloned and
> set up Ensembl and BioPerl. On the fresh Linux install
> I was only asked to install the MySQL Perl module (I
> installed it via CPAN).
> It works perfectly there.
>
> I've ruled out a problem with the input VCF
> because I'm using exactly the same input in the two
> environments (and the same one you used to test it
> yesterday).
>
> So my question is: Does VEP script use any other
> library, environment variable or tool which may be
> interfering?
>
> Best regards,
> Guillermo.
>
>
> On 04/06/15 09:32, Guillermo Marco Puche wrote:
>
> Hi again Will,
>
> I've completely cleaned the PERL5LIB environment
> variable. I've been testing, switching between
> bioperl 1.2.3 and bioperl 1.6.1, and got the same
> warnings/errors.
> I've cloned the whole 79 API again, as you
> suggested, into a new tmp location and included
> it in $PERL5LIB.
>
> *echo $PERL5LIB*
>
> /share/apps/local/bioperl-live:/share/gluster/tests/gmarco/tmp/ensembl/modules:/share/gluster/tests/gmarco/tmp/ensembl-funcgen/modules:/share/gluster/tests/gmarco/tmp/ensembl-variation/modules
>
> *ll /share/gluster/tests/gmarco/tmp*
>
> total 20
> drwxrwxr-x 8 gmarco users 4096 jun 4 08:44 ensembl
> drwxrwxr-x 8 gmarco users 146 jun 4 08:46 ensembl-funcgen
> drwxrwxr-x 5 gmarco users 64 jun 4 08:43 ensembl-tools
> drwxrwxr-x 10 gmarco users 4096 jun 4 08:45 ensembl-variation
>
> *perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite*
>
> 2015-06-04 09:29:13 - Starting...
> ERROR: Could not detect input file format
>
> If I use the following flags, *-format vcf* *-vcf*,
> then I start getting all those errors (see
> yesterday's log).
>
> Is there any other Perl lib or requirement I
> could be missing? As I said it's very weird I
> have 0 problems with Ensembl 75 local API.
>
> Best regards,
> Guillermo.
>
> On 03/06/15 18:14, Will McLaren wrote:
>
> Hi again,
>
> I can't recreate the problem with that
> input file I'm afraid, either on my normal
> setup or scrubbing PERL5LIB and starting
> from scratch.
>
> See commands I used and input below.
>
> Perhaps you haven't got release/79 of
> ensembl-tools too?
>
> Have you tried running the installer from
> within
> ensembl-tools/scripts/variant_effect_predictor?
> This shouldn't affect your PERL5LIB or
> other git checkouts.
>
> Will
>
> ===================
>
> mkdir ~/src/tmp
>
> cd ~/src/tmp
>
> git clone --branch release/79 https://github.com/Ensembl/ensembl-tools.git
>
> git clone --branch release/79 https://github.com/Ensembl/ensembl.git
>
> git clone --branch release/79 https://github.com/Ensembl/ensembl-variation.git
>
> git clone --branch release/79 https://github.com/Ensembl/ensembl-funcgen.git
>
> export PERL5LIB=ensembl/modules:ensembl-variation/modules:ensembl-funcgen/modules:/Users/will/src/bioperl-1.2.3/:/Users/will/src/lib/perl5/
>
> perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i ~/Downloads/input.vcf -database
>
> 2015-06-03 17:09:54 - Starting...
>
> 2015-06-03 17:09:54 - Detected format of
> input file as vcf
>
> 2015-06-03 17:09:54 - Read 1 variants into
> buffer
>
> 2015-06-03 17:09:54 - Reading transcript
> data from cache and/or database
>
> [================================================================================================================================]
> [ 100% ]
>
> 2015-06-03 17:10:00 - Retrieved 7
> transcripts (0 mem, 0 cached, 7 DB, 0
> duplicates)
>
> 2015-06-03 17:10:00 - Analyzing chromosome 1
>
> 2015-06-03 17:10:00 - Analyzing variants
>
> [================================================================================================================================]
> [ 100% ]
>
> 2015-06-03 17:10:00 - Calculating consequences
>
> 2015-06-03 17:10:00 - Processed 1 total
> variants (0 vars/sec, 0 vars/sec total)
>
> 2015-06-03 17:10:00 - Wrote stats summary
> to variant_effect_output.txt_summary.html
>
> 2015-06-03 17:10:00 - Finished!
>
> On 3 June 2015 at 16:51, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hi Will,
>
> I've been checking and I can't see any
> unintended whitespace or problem with
> tabs.
> I have no issues with the old VEP 75 script and
> API. I've updated the BioPerl lib in the
> $PERL5LIB variable from 1.2.3 to 1.6.1 (sorry, I
> hadn't seen this change before);
> however, I'm still getting all those errors.
>
> Here's a link where you can download the
> VCF I'm using as input:
> https://www.dropbox.com/sh/felwyoo5kl2mgty/AAC177Digqy-_mEmyk9WvmYba/input.vcf?dl=0
>
> Thank you.
>
> Best regards,
> Guille.
>
> On 03/06/15 17:30, Will McLaren wrote:
>
> Hi Guille,
>
>
> It looks to me like your input is not
> being parsed properly.
>
> Check the formatting of your input
> VCF; double check that it is valid
> VCF, and that you haven't got any
> unintended whitespace on any of the lines.
>
> If you still have an issue, can you
> send a line or two of the input that
> recreates these issues?
>
> Thanks
>
> Will McLaren
>
> Ensembl Variation
>
> On 3 June 2015 at 16:16, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Dear devs,
>
> I'm trying ensembl 79 VEP.
>
> This is my dummy input VCF:
> http://pastebin.com/kFKWH50q
>
> I've cloned and installed the API from
> GitHub as always (this step is
> repeated for variation, funcgen and
> compara):
>
> git clone --branch release/79 https://github.com/Ensembl/ensembl.git ensembl_79
>
> PERL5LIB env variable is correctly
> pointing to the cloned API:
>
> echo $PERL5LIB
> /share/apps/local/bioperl-live:/share/apps/src/ensembl_79/modules:/share/apps/src/ensembl_79-compara/modules:/share/apps/src/ensembl_79-variation/modules:/share/apps/src/ensembl_79-functgenomics/modules
>
> However, I'm getting a lot of errors I
> really don't understand. It seems like
> a problem with my API installation.
> If I change the $PERL5LIB variable to
> point to the 75 API (the previous version I
> was using) I can't reproduce the
> errors; the VEP script works with this old
> 75 version.
>
> I've been reading the docs again and I
> can't see any additional Perl library
> requirement.
>
> Here's the error log:
> http://pastebin.com/VvQrkEQZ
>
>
> Thank you!
>
> Best regards,
> Guille.
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> <mailto:Dev at ensembl.org>
> Posting guidelines and
> subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
> ------------------------------------------------------------------------
>
> This message (including any attachments) may contain
> confidential, proprietary, privileged and/or private
> information. The information is intended to be for the use of
> the individual or entity designated above. If you are not the
> intended recipient of this message, please notify the sender
> immediately, and delete the message and any attachments. Any
> disclosure, reproduction, distribution or other use of this
> message or any attachments by an individual or entity other
> than the intended recipient is prohibited.