[ensembl-dev] VEP 79 API problems

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Fri Jun 5 11:36:24 BST 2015


Hello Will,

Thank you for the info, Matthew.

I'm still getting lines split in my work environment.

 1. *foreach my $line(split /\r|(?>\v|\x0D\x0A)/) {...}*

        variant_effect_predictor.pl -database -i input.vcf -o test.vcf --force_overwrite
        2015-06-05 12:26:14 - Starting...
        *4.2* (I added a print in VEP.pm: the line "##fileformat=VCFv4.2" is getting split, leaving only "4.2")
        2015-06-05 12:26:14 - Detected format of input file as id

 2. *foreach my $line(split /(?>\v|\x0D\x0A)/) {...}*

        same result as 1

 3. *foreach my $line(split /\r/) {...}*

        same result as 1 and 2

I cannot update the Perl version at the moment, so I guess I will have to
completely remove this split from the variant_effect_predictor.pl code.
How can I change this line of code (line 176) to avoid any split?

Regards,
Guillermo.
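On Perl 5.8.8 neither \R nor \v is a recognised regex escape (both arrived in Perl 5.10.0), and an unrecognised backslash-letter in a regex is passed through as the letter itself. With \v read as a literal "v", "##fileformat=VCFv4.2" is split just before "4.2", which would fit the output of attempts 1 and 2 (the result reported for attempt 3 is not explained by this). A minimal sketch of a split that uses only \r and \n, which 5.8.8 already understands; the helper name split_lines_portable is hypothetical and not part of VEP:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split text on CRLF, lone CR or lone LF using only
# \r and \n, so it behaves the same on Perl 5.8.8 and on 5.10+.
# CRLF is listed first so a Windows line ending counts as one break, not two.
sub split_lines_portable {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

my @lines = split_lines_portable("##fileformat=VCFv4.2\r\n#CHROM\tPOS\r1\t100\n");
print "$_\n" for @lines;
```

Applied directly to line 176, and assuming the surrounding loop still holds each input record in $_ (as the existing `split /\r|\R/` implies), the same pattern would read `foreach my $line (split /\r\n|\r|\n/) {`.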


On 05/06/15 10:28, Will McLaren wrote:
> Thanks Matthew for the detective work.
>
> I've removed the \R from the split function and replaced it with what 
> perldoc says it is shorthand for; tests pass OK and it seems to work 
> on the Windows input file that prompted me to make this change in the 
> first place.
>
> I've patched the fix to release/79 and release/80, so Guillermo, I'd
> appreciate it if you could update your ensembl-tools checkout and give
> this a test run for me.
>
> Thanks everyone
>
> Will
>
> On 4 June 2015 at 19:17, Healy, Matthew <Matthew.Healy at bms.com> wrote:
>
>     The \R was added in Perl 5.10.0:
>
>     http://perldoc.perl.org/5.10.0/perldelta.html
>
>     *Vertical and horizontal whitespace, and linebreak*
>
>     Regular expressions now recognize the \v and \h escapes that match
>     vertical and horizontal whitespace, respectively. \V and
>     \H logically match their complements.
>
>     \R matches a generic linebreak, that is, vertical whitespace, plus
>     the multi-character sequence "\x0D\x0A".
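On Perl 5.10 or later, the effect of \R and of a spelled-out equivalent can be checked directly. Recent perlrebackslash documentation gives \R as equivalent to (?>\x0D\x0A|\v); the order of the alternatives matters, because if \v is listed first it already matches the lone \r of a CRLF pair and the following \n becomes a second break. A small sketch (it deliberately refuses to run on Perl older than 5.10):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # \R and \v are only understood from Perl 5.10.0 onwards

my $text = "a\r\nb\rc\nd";    # CRLF, lone CR and lone LF line breaks

# \R treats CRLF as a single line break
my @via_R = split /\R/, $text;

# Spelled-out form with CRLF tried first: same result as \R
my @crlf_first = split /(?>\x0D\x0A|\v)/, $text;

# \v tried first: the \r of the CRLF matches on its own, so the \n
# produces an extra empty field between "a" and "b"
my @v_first = split /(?>\v|\x0D\x0A)/, $text;

print join(',', @via_R), "\n";         # a,b,c,d
print join(',', @crlf_first), "\n";    # a,b,c,d
print scalar(@v_first), "\n";          # 5
```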
>
>     *From:* dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
>     *On Behalf Of* Guillermo Marco Puche
>     *Sent:* 04 June, 2015 2:10 PM
>     *To:* dev at ensembl.org
>     *Subject:* Re: [ensembl-dev] VEP 79 API problems
>
>     Yes, I guess it's clear the Perl version is the problem. I'll remove \R
>     from this line of the script until I can update the Perl version in my
>     work environment.
>
>     As always, thank you for your fantastic support.
>
>     Best regards,
>     Guillermo.
>
>     On 04/06/2015 at 18:19, Healy, Matthew wrote:
>
>         In the regex documentation for Perl 5.8.8 there is no mention
>         of \R (there is, of course, lowercase \r), so the Perl version
>         is probably the issue:
>
>         http://perldoc.perl.org/5.8.8/perlre.html
>
>         *From:* dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
>         *On Behalf Of* Will McLaren
>         *Sent:* 04 June, 2015 12:12 PM
>         *To:* Ensembl developers list
>         *Subject:* Re: [ensembl-dev] VEP 79 API problems
>
>         I'm wondering if 5.8.8 has different regex handling to newer
>         Perl versions. Someone else in the Ensembl team may know
>         better than me on this one.
>
>         I believe the Ensembl project now recommends at least 5.10
>         (according to
>         http://www.ensembl.org/info/docs/api/api_installation.html at
>         least); most people in the wild use 5.14 or 5.16 AFAIK.
>
>         If you can possibly try a newer version of Perl this may solve
>         your issues. Perlbrew is a nice way to manage different
>         versions and module sets http://perlbrew.pl/
>
>         Will
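If upgrading Perl is an option, a script that depends on these escapes can refuse to run on an older interpreter rather than silently mis-splitting its input. A minimal sketch, assuming a 5.10 minimum as recommended in the message above; the helper name perl_has_linebreak_escapes is hypothetical, and a plain `use 5.010;` line at the top of the script would have a similar effect with Perl's own error message:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true when the running perl understands the \R, \v and
# \h regex escapes, which were introduced in Perl 5.10.0.
# $] is the numeric version, e.g. 5.008008 for 5.8.8 or 5.016003 for 5.16.3.
sub perl_has_linebreak_escapes {
    return $] >= 5.010 ? 1 : 0;
}

# Stop before doing any work if the interpreter is too old.
if (!perl_has_linebreak_escapes()) {
    die sprintf("This script needs Perl 5.10 or later (running %vd)\n", $^V);
}

printf "Running Perl %vd, \\R supported\n", $^V;
```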
>
>         On 4 June 2015 at 17:02, Guillermo Marco Puche
>         <guillermo.marco at sistemasgenomicos.com> wrote:
>
>         Hello Will,
>
>         The machine with the wrong behaviour has CentOS 5.4 and Perl v5.8.8
>         built for x86_64-linux-thread-multi.
>
>         So should I completely remove this line?
>
>         *  foreach my $line(split /\r|\R/) {*
>
>         I was thinking about just removing \R from the regex.
>
>         Regards,
>         Guillermo.
>
>         On 04/06/15 17:49, Will McLaren wrote:
>
>             Thanks
>
>             I had forgotten about that change. You could just edit the
>             script and change or even remove the regexp:
>
>             foreach my $line(($_)) {
>
>             What's your Perl version and system architecture? I'm
>             surprised this has not caught anyone else out.
>
>             Will
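Will's suggestion of `foreach my $line(($_))` removes the split altogether, so the loop body runs once on each record as read. If the only concern is a stray Windows carriage return, it can instead be stripped from the end of each record. A minimal sketch, assuming records are read with the default `$/` (so each ends in \n, except possibly the last); strip_line_ending is a hypothetical helper that uses only \r and \n, so it behaves the same on 5.8.8 and later:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing line ending (CRLF, CR or LF).
# Uses only \r and \n, so it works the same on Perl 5.8.8 and 5.10+.
sub strip_line_ending {
    my ($record) = @_;
    $record =~ s/(?:\r\n|\r|\n)\z//;
    return $record;
}

# Typical use inside a read loop (sketch):
#   while (my $record = <$fh>) {
#       my $line = strip_line_ending($record);
#       ...
#   }
for my $sample ("chr1\t100\r\n", "chr1\t100\n", "chr1\t100\r", "chr1\t100") {
    my $clean = strip_line_ending($sample);
    print length($clean), "\n";    # 8 each time
}
```

Unlike the split, this does not break up a file whose only line terminator is a lone \r: such a file arrives from `<$fh>` as a single record.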
>
>             On 4 June 2015 at 14:47, Guillermo Marco Puche
>             <guillermo.marco at sistemasgenomicos.com> wrote:
>
>             Hi Will,
>
>             I've been comparing the variant_effect_predictor script from
>             version 75 with version 79.
>             After adding a few prints inside VEP.pm I've spotted
>             the bug. However, I cannot resolve it.
>
>             These lines are new in the VEP script between 75 and 79 (lines 175 and 176):
>
>             *     # split again to avoid Windows character nonsense*
>
>             *     foreach my $line(split /\r|\R/) {*
>
>
>
>             I've checked that the script is splitting the line each time it
>             finds a capital R in the VCF file, treating it as a Windows
>             newline character. I can't reproduce it in a virtual machine
>             since it's a fresh Linux install. In my work environment I'm
>             getting this bug, so I guess it has something to do with the
>             file encoding or locale? Has anyone else experienced this?
>
>             Now I know where the error is, but I've no idea how to solve it.
>
>             Regards,
>             Guillermo.
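The behaviour described above fits the Perl version rather than file encoding or locale, as the rest of the thread concludes: on Perl 5.8.8 the unrecognised escape \R is passed through as a literal "R", so `/\r|\R/` acts like `/\r|R/`. A sketch that reproduces the effect on any Perl by writing the literal R explicitly, followed by a runtime probe; the #CHROM header line is only an illustrative input:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# What /\r|\R/ amounts to on Perl 5.8.8: a split on CR or on a capital R.
my $chrom_header = "#CHROM\tPOS\tID\tREF\tALT";
my @pieces = split /\r|R/, $chrom_header;
print scalar(@pieces), " pieces\n";    # 3 pieces
print "[$_]\n" for @pieces;

# Runtime probe: on Perl 5.10+ \R matches line breaks only, so a lone "R"
# does not match. On 5.8.x the match succeeds (typically with an
# "Unrecognized escape" warning) because \R is read as the letter R.
my $R_is_literal = ("R" =~ /\A\R\z/) ? 1 : 0;
print "\\R read as a literal R: ", ($R_is_literal ? "yes" : "no"), "\n";
```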
>
>             On 04/06/15 15:16, Will McLaren wrote:
>
>                 Sorry Guillermo, I'm running out of ideas.
>
>                 Does the test unit run OK?
>
>                 perl ensembl-tools/scripts/variant_effect_predictor/t/variant_effect_predictor.t
>
>                 Will
>
>                 On 4 Jun 2015 12:27, "Guillermo Marco Puche"
>                 <guillermo.marco at sistemasgenomicos.com> wrote:
>
>                 Hi Will,
>
>                 I'm getting the exact same error with example_GRCh37.vcf:
>
>                 ERROR: Could not detect input file format
>
>
>                 I've made a test script as you suggested with the
>                 following code and I don't get any error:
>
>                 #!/usr/bin/env perl
>                 use strict;
>                 use Bio::EnsEMBL::Variation::Utils::VEP qw(detect_format);
>
>
>                 Regards,
>                 Guillermo.
>
>                 On 04/06/15 13:12, Will McLaren wrote:
>
>                     Hi again
>
>                     If the script is not detecting the input format
>                     then it is almost certainly an issue with the
>                     input file. There's very little code that gets run
>                     to detect the format, and it's all internal to the
>                     VEP code.
>
>                     You could write a short script to test the method;
>                     just import detect_format from
>                     Bio::EnsEMBL::Variation::Utils::VEP
>
>                     Does it detect the example_GRCh37.vcf format
>                     correctly?
>
>                     The file you shared on Dropbox works fine for me
>                     on my Mac and a Linux box.
>
>                     Will
>
>                     On 4 Jun 2015 10:44, "Guillermo Marco Puche"
>                     <guillermo.marco at sistemasgenomicos.com> wrote:
>
>                     Hi again Will,
>
>                     I'm trying with the latest Ensembl 80.
>                     If I don't specify *-format vcf* I get the
>                     following error:
>
>                     perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
>
>                     2015-06-04 11:36:59 - Starting...
>
>                     ERROR: Could not detect input file format
>
>
>                     If I force the format with *-format vcf* I get all the
>                     errors (see the attached error log). I'm using the
>                     same input.vcf file I posted yesterday.
>                     To rule out a problem with the VCF, I've installed a
>                     fresh Linux on a virtual machine and just cloned and
>                     set up Ensembl and BioPerl. On the fresh Linux install
>                     I was only asked to install the MySQL Perl module (I
>                     installed it via CPAN).
>                     It works perfectly.
>
>                     I rule out a problem with the input VCF
>                     because I'm using exactly the same input in both
>                     environments (and the same one you used to test it
>                     yesterday).
>
>                     So my question is: does the VEP script use any other
>                     library, environment variable or tool which may be
>                     interfering?
>
>                     Best regards,
>                     Guillermo.
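On the question of what else might differ between the two environments: one way to compare them is to print the interpreter, library path and locale settings on each machine and diff the output. A sketch under that assumption; the report field names are arbitrary, the module path comes from the detect_format import used in this thread, and the code avoids 5.10-only syntax so it also runs on Perl 5.8.8:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: gather facts that often differ between two machines
# running the same script. Avoids 5.10-only syntax so it runs on 5.8.8 too.
sub environment_report {
    my %report = (
        perl_version => sprintf('%vd', $^V),
        perl_binary  => $^X,
        perl5lib     => defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(unset)',
        locale       => join(' ', map {
            "$_=" . (defined $ENV{$_} ? $ENV{$_} : '')
        } qw(LANG LC_ALL)),
    );

    # First @INC entry that contains the VEP utility module, without loading it.
    my $module = 'Bio/EnsEMBL/Variation/Utils/VEP.pm';
    my ($dir) = grep { !ref($_) && -e "$_/$module" } @INC;
    $report{vep_module} = defined $dir ? "$dir/$module" : '(not found in @INC)';

    $report{inc} = [@INC];
    return %report;
}

my %r = environment_report();
for my $key (qw(perl_version perl_binary perl5lib locale vep_module)) {
    print "$key: $r{$key}\n";
}
print "inc: $_\n" for grep { !ref($_) } @{ $r{inc} };
```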
>
>
>                     On 04/06/15 09:32, Guillermo Marco Puche wrote:
>
>                         Hi again Will,
>
>                         I've completely cleaned the PERL5LIB environment
>                         variable. I've been testing switching between
>                         BioPerl 1.2.3 and BioPerl 1.6.1 and got the same
>                         warnings/errors.
>                         I've cloned the whole 79 API again, as you
>                         suggested, in a new tmp location and included
>                         it in $PERL5LIB.
>
>                         *echo $PERL5LIB*
>
>                         /share/apps/local/bioperl-live:/share/gluster/tests/gmarco/tmp/ensembl/modules:/share/gluster/tests/gmarco/tmp/ensembl-funcgen/modules:/share/gluster/tests/gmarco/tmp/ensembl-variation/modules
>
>                         *ll /share/gluster/tests/gmarco/tmp*
>
>                         total 20
>                         drwxrwxr-x  8 gmarco users 4096 jun  4 08:44 ensembl
>                         drwxrwxr-x  8 gmarco users 146 jun  4 08:46 ensembl-funcgen
>                         drwxrwxr-x  5 gmarco users 64 jun  4 08:43 ensembl-tools
>                         drwxrwxr-x 10 gmarco users 4096 jun  4 08:45 ensembl-variation
>
>                         *perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite*
>
>                         2015-06-04 09:29:13 - Starting...
>                         ERROR: Could not detect input file format
>
>                         If I use the flags *-format vcf* *-vcf*,
>                         then I start getting all those errors (see
>                         yesterday's log).
>
>                         Is there any other Perl library or requirement I
>                         could be missing? As I said, it's very weird: I
>                         have no problems at all with the local Ensembl 75 API.
>
>                         Best regards,
>                         Guillermo.
>
>                         On 03/06/15 18:14, Will McLaren wrote:
>
>                             Hi again,
>
>                             I can't recreate the problem with that
>                             input file I'm afraid, either on my normal
>                             setup or scrubbing PERL5LIB and starting
>                             from scratch.
>
>                             See commands I used and input below.
>
>                             Perhaps you haven't got release/79 of
>                             ensembl-tools too?
>
>                             Have you tried running the installer from
>                             within
>                             ensembl-tools/scripts/variant_effect_predictor?
>                             This shouldn't affect your PERL5LIB or
>                             other git checkouts.
>
>                             Will
>
>                             ===================
>
>                             mkdir ~/src/tmp
>
>                             cd ~/src/tmp
>
>                             git clone --branch release/79 https://github.com/Ensembl/ensembl-tools.git
>
>                             git clone --branch release/79 https://github.com/Ensembl/ensembl.git
>
>                             git clone --branch release/79 https://github.com/Ensembl/ensembl-variation.git
>
>                             git clone --branch release/79 https://github.com/Ensembl/ensembl-funcgen.git
>
>                             export PERL5LIB=ensembl/modules:ensembl-variation/modules:ensembl-funcgen/modules:/Users/will/src/bioperl-1.2.3/:/Users/will/src/lib/perl5/
>
>                             perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i ~/Downloads/input.vcf -database
>
>                             2015-06-03 17:09:54 - Starting...
>
>                             2015-06-03 17:09:54 - Detected format of input file as vcf
>
>                             2015-06-03 17:09:54 - Read 1 variants into buffer
>
>                             2015-06-03 17:09:54 - Reading transcript data from cache and/or database
>
>                             [================================================================================================================================]
>                              [ 100% ]
>
>                             2015-06-03 17:10:00 - Retrieved 7 transcripts (0 mem, 0 cached, 7 DB, 0 duplicates)
>
>                             2015-06-03 17:10:00 - Analyzing chromosome 1
>
>                             2015-06-03 17:10:00 - Analyzing variants
>
>                             [================================================================================================================================]
>                              [ 100% ]
>
>                             2015-06-03 17:10:00 - Calculating consequences
>
>                             2015-06-03 17:10:00 - Processed 1 total variants (0 vars/sec, 0 vars/sec total)
>
>                             2015-06-03 17:10:00 - Wrote stats summary to variant_effect_output.txt_summary.html
>
>                             2015-06-03 17:10:00 - Finished!
>
>                             On 3 June 2015 at 16:51, Guillermo Marco Puche
>                             <guillermo.marco at sistemasgenomicos.com> wrote:
>
>                             Hi Will,
>
>                             I've been checking and I can't see any
>                             unintended whitespace or problem with
>                             tabs.
>                             I've no issues with the old VEP 75 script and
>                             API. I've updated the BioPerl lib in the
>                             $PERL5LIB variable from 1.2.3 to 1.6.1 (sorry, I
>                             hadn't noticed this before); however, I'm still
>                             getting all those errors.
>
>                             Here's a link where you can download the
>                             VCF I'm using as input:
>                             https://www.dropbox.com/sh/felwyoo5kl2mgty/AAC177Digqy-_mEmyk9WvmYba/input.vcf?dl=0
>
>                             Thank you.
>
>                             Best regards,
>                             Guille.
>
>                             On 03/06/15 17:30, Will McLaren wrote:
>
>                                 Hi Guille,
>
>
>                                 It looks to me like your input is not
>                                 being parsed properly.
>
>                                 Check the formatting of your input
>                                 VCF; double check that it is valid
>                                 VCF, and that you haven't got any
>                                 unintended whitespace on any of the lines.
>
>                                 If you still have an issue, can you
>                                 send a line or two of the input that
>                                 recreates these issues?
>
>                                 Thanks
>
>                                 Will McLaren
>
>                                 Ensembl Variation
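Following the advice above to check the input for unintended whitespace, a quick self-check can be scripted. A minimal sketch, assuming tab-separated VCF with at least the eight fixed columns on each data line; vcf_whitespace_problems is a hypothetical helper, not part of VEP, and it only reports the issues it looks for (carriage returns, trailing whitespace, too few tab-separated columns):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: list formatting problems that commonly confuse VCF
# parsers. Takes the whole file as a string; returns human-readable messages.
sub vcf_whitespace_problems {
    my ($text) = @_;
    my @problems;
    my $n = 0;
    for my $line (split /\n/, $text) {
        $n++;
        push @problems, "line $n: contains a carriage return (\\r)" if $line =~ /\r/;
        (my $clean = $line) =~ s/\r\z//;
        push @problems, "line $n: trailing whitespace" if $clean =~ /[ \t]\z/;
        next if $clean eq '' || $clean =~ /^#/;    # headers: no column check
        my @cols = split /\t/, $clean, -1;
        push @problems, "line $n: " . scalar(@cols)
            . " tab-separated column(s), expected at least 8"
            if @cols < 8;
    }
    return @problems;
}

my $vcf = "##fileformat=VCFv4.2\n"
        . "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        . "1\t100\t.\tA\tG\t.\t.\t.\n"
        . "1 200 . C T . . .\n";    # space-separated by mistake
print "$_\n" for vcf_whitespace_problems($vcf);
```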
>
>                                 On 3 June 2015 at 16:16, Guillermo Marco Puche
>                                 <guillermo.marco at sistemasgenomicos.com> wrote:
>
>                                 Dear devs,
>
>                                 I'm trying the Ensembl 79 VEP.
>
>                                 This is my dummy input VCF:
>                                 http://pastebin.com/kFKWH50q
>
>                                 I've cloned and installed the API from
>                                 GitHub as always (this step is
>                                 repeated for variation, funcgen and
>                                 compara):
>
>                                 git clone --branch release/79 https://github.com/Ensembl/ensembl.git ensembl_79
>
>                                 The PERL5LIB environment variable is correctly
>                                 pointing to the cloned API:
>
>                                 echo $PERL5LIB
>                                 /share/apps/local/bioperl-live:/share/apps/src/ensembl_79/modules:/share/apps/src/ensembl_79-compara/modules:/share/apps/src/ensembl_79-variation/modules:/share/apps/src/ensembl_79-functgenomics/modules
>
>                                 However, I'm getting a lot of errors I
>                                 really don't understand. It looks like
>                                 a problem with my API installation.
>                                 If I change the $PERL5LIB variable to
>                                 point to the 75 API (the previous version I
>                                 was using) I can't reproduce the
>                                 errors; the VEP script works with this old
>                                 75 version.
>
>                                 I've been reading the docs again and I
>                                 can't see any additional Perl library
>                                 requirement.
>
>                                 Here's the error log:
>                                 http://pastebin.com/VvQrkEQZ
>
>
>                                 Thank you!
>
>                                 Best regards,
>                                 Guille.
>
>
>                                 _______________________________________________
>                                 Dev mailing list Dev at ensembl.org
>                                 <mailto:Dev at ensembl.org>
>                                 Posting guidelines and
>                                 subscribe/unsubscribe info:
>                                 http://lists.ensembl.org/mailman/listinfo/dev
>                                 Ensembl Blog: http://www.ensembl.info/
>
>         ------------------------------------------------------------------------
>
>         This message (including any attachments) may contain
>         confidential, proprietary, privileged and/or private
>         information. The information is intended to be for the use of
>         the individual or entity designated above. If you are not the
>         intended recipient of this message, please notify the sender
>         immediately, and delete the message and any attachments. Any
>         disclosure, reproduction, distribution or other use of this
>         message or any attachments by an individual or entity other
>         than the intended recipient is prohibited.