[ensembl-dev] VEP 79 API problems

Will McLaren wm2 at ebi.ac.uk
Thu Jun 4 17:11:47 BST 2015


I'm wondering if 5.8.8 has different regex handling to newer Perl versions.
Someone else in the Ensembl team may know better than me on this one.
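
As far as I can tell from the perl5100delta notes, \R (generic linebreak) was
only added to the regex engine in 5.10.0, so an older Perl would probably treat
it as an unrecognised escape and match a literal "R", which would fit what you
are seeing. A minimal sketch to check this on the affected machine (the variable
names are just for illustration; nothing here comes from the VEP code):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# On a Perl that understands \R as "generic linebreak", a plain capital R
# must NOT match and a newline must match. An older Perl may also print an
# "Unrecognized escape" warning when it compiles these patterns.
my $matches_literal_r = ("R"  =~ /\R/) ? 1 : 0;
my $matches_newline   = ("\n" =~ /\R/) ? 1 : 0;

printf "Perl %vd\n", $^V;
print "\\R matches a literal 'R': ", ($matches_literal_r ? "yes" : "no"), "\n";
print "\\R matches a newline:     ", ($matches_newline   ? "yes" : "no"), "\n";
```

If the first answer is "yes" on your CentOS box, that would explain lines being
split at every capital R.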

I believe the Ensembl project now recommends at least 5.10 (according to
http://www.ensembl.org/info/docs/api/api_installation.html at least); most
people in the wild use 5.14 or 5.16 AFAIK.

If you can possibly try a newer version of Perl this may solve your issues.
Perlbrew is a nice way to manage different versions and module sets
http://perlbrew.pl/
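
If upgrading is not possible, one workaround might be to spell the line endings
out explicitly instead of relying on \R. This is only a sketch, assuming the
split is the sole place \R is used; split_lines is a made-up helper name, not
something in VEP.pm:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split text on Windows (\r\n), old Mac (\r) or Unix (\n)
# line endings without using \R, so it does not depend on Perl 5.10+.
# \r\n is listed first so a Windows line ending is consumed as one separator.
sub split_lines {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

# Example: a capital R inside a field must stay inside its line.
my @lines = split_lines("1\t100\tR\r\n1\t200\tA\n");
print scalar(@lines), " lines\n";
print join(" | ", @lines), "\n";
```

In the script that would mean changing the pattern in the foreach to
/\r\n|\r|\n/ rather than removing the split altogether.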

Will

On 4 June 2015 at 17:02, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Hello Will,
>
> The machine with the wrong behavior has CentOS 5.4 and Perl v5.8.8 built for
> x86_64-linux-thread-multi.
>
> So should I completely remove this line?
>
>  foreach my $line(split /\r|\R/) {
>
>
> I was thinking about just removing \R from the regex.
>
> Regards,
> Guillermo.
>
>
> On 04/06/15 17:49, Will McLaren wrote:
>
> Thanks
>
>  I had forgotten about that change. You could just edit the script and
> change or even remove the regexp:
>
>  foreach my $line(($_)) {
>
>  What's your Perl version and system architecture? I'm surprised this has
> not caught anyone else out.
>
>  Will
>
> On 4 June 2015 at 14:47, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
>>  Hi Will,
>>
>> I've been comparing the variant_effect_predictor script from version 75 vs 79.
>> After adding a few prints inside VEP.pm I've spotted the bug. However,
>> I cannot resolve it.
>>
>> These lines are new between 75 and 79 in the VEP script (lines 175 and 176):
>>
>>      # split again to avoid Windows character nonsense
>>
>>      foreach my $line(split /\r|\R/) {
>>
>>
>>
>> I've checked that the script splits the line each time it finds a capital R
>> in the VCF file, as if it were a Windows newline character. I can't
>> reproduce it in the virtual machine since it's a fresh Linux install. In my work
>> environment I'm getting this bug, so I guess it has something to do
>> with file encoding or locale? Has anyone else experienced this?
>>
>> Now I know where the error is, but I have no idea how to solve it.
>>
>> Regards,
>> Guillermo.
>>
>>
>> On 04/06/15 15:16, Will McLaren wrote:
>>
>> Sorry Guillermo, I'm running out of ideas.
>>
>> Does the unit test run OK?
>>
>> perl ensembl-tools/scripts/variant_effect_predictor/t/variant_effect_predictor.t
>>
>> Will
>> On 4 Jun 2015 12:27, "Guillermo Marco Puche" <
>> guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>>  Hi Will,
>>>
>>> I'm getting the exact same error with example_GRCh37.vcf:
>>>
>>> ERROR: Could not detect input file format
>>>
>>>
>>> I've made a test script as you suggest with the following code and I
>>> don't get any error:
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use Bio::EnsEMBL::Variation::Utils::VEP qw(detect_format);
>>>
>>>
>>> Regards,
>>> Guillermo.
>>>
>>> On 04/06/15 13:12, Will McLaren wrote:
>>>
>>> Hi again
>>>
>>> If the script is not detecting the input format then it is almost
>>> certainly an issue with the input file. There's very little code that gets
>>> run to detect the format, and it's all internal to the VEP code.
>>>
>>> You could write a short script to test the method; just import
>>> detect_format from Bio::EnsEMBL::Variation::Utils::VEP
>>>
>>> Does it detect the example_GRCh37.vcf format correctly?
>>>
>>> The file you shared on Dropbox works fine for me on my Mac and a Linux
>>> box.
>>>
>>> Will
>>> On 4 Jun 2015 10:44, "Guillermo Marco Puche" <
>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>  Hi again Will,
>>>>
>>>> I'm trying with the latest Ensembl 80.
>>>> If I don't specify -format vcf I get the following error:
>>>>
>>>> perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
>>>> 2015-06-04 11:36:59 - Starting...
>>>> ERROR: Could not detect input file format
>>>>
>>>>
>>>> If I force the format with -format vcf I get all the errors (see error
>>>> log attached). I'm using the same input.vcf file I posted yesterday.
>>>> To rule out the VCF itself, I've installed a fresh Linux on a virtual
>>>> machine and just cloned and set up Ensembl and BioPerl. On the fresh Linux
>>>> install I was only asked to install the MySQL Perl module (I installed it via
>>>> CPAN).
>>>> It works like a charm there.
>>>>
>>>> I've ruled out a problem with the input VCF because I'm using exactly
>>>> the same input in both environments (and the same one you used to
>>>> test it yesterday).
>>>>
>>>> So my question is: Does VEP script use any other library, environment
>>>> variable or tool which may be interfering?
>>>>
>>>> Best regards,
>>>> Guillermo.
>>>>
>>>>
>>>> On 04/06/15 09:32, Guillermo Marco Puche wrote:
>>>>
>>>> Hi again Will,
>>>>
>>>> I've completely cleaned the PERL5LIB environment variable. I've tested
>>>> switching between BioPerl 1.2.3 and BioPerl 1.6.1 and got the same
>>>> warnings/errors.
>>>> I've cloned the whole 79 API again, as you suggested, into a new tmp location
>>>> and included it in $PERL5LIB.
>>>>
>>>> echo $PERL5LIB
>>>>
>>>>
>>>> /share/apps/local/bioperl-live:/share/gluster/tests/gmarco/tmp/ensembl/modules:/share/gluster/tests/gmarco/tmp/ensembl-funcgen/modules:/share/gluster/tests/gmarco/tmp/ensembl-variation/modules
>>>>
>>>> ll /share/gluster/tests/gmarco/tmp
>>>>
>>>> total 20
>>>> drwxrwxr-x  8 gmarco users 4096 jun  4 08:44 ensembl
>>>> drwxrwxr-x  8 gmarco users  146 jun  4 08:46 ensembl-funcgen
>>>> drwxrwxr-x  5 gmarco users   64 jun  4 08:43 ensembl-tools
>>>> drwxrwxr-x 10 gmarco users 4096 jun  4 08:45 ensembl-variation
>>>>
>>>> perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
>>>>
>>>> 2015-06-04 09:29:13 - Starting...
>>>> ERROR: Could not detect input file format
>>>>
>>>> If I use the flags -format vcf -vcf then I start getting
>>>> all those errors (see yesterday's log).
>>>>
>>>> Is there any other Perl lib or requirement I could be missing? As I
>>>> said, it's very odd that I have no problems with the local Ensembl 75 API.
>>>>
>>>> Best regards,
>>>> Guillermo.
>>>>
>>>> On 03/06/15 18:14, Will McLaren wrote:
>>>>
>>>> Hi again,
>>>>
>>>>  I can't recreate the problem with that input file I'm afraid, either
>>>> on my normal setup or scrubbing PERL5LIB and starting from scratch.
>>>>
>>>>  See commands I used and input below.
>>>>
>>>>  Perhaps you haven't got release/79 of ensembl-tools too?
>>>>
>>>>  Have you tried running the installer from within
>>>> ensembl-tools/scripts/variant_effect_predictor? This shouldn't affect your
>>>> PERL5LIB or other git checkouts.
>>>>
>>>>  Will
>>>>
>>>>  ===================
>>>>
>>>> mkdir ~/src/tmp
>>>> cd ~/src/tmp
>>>> git clone --branch release/79 https://github.com/Ensembl/ensembl-tools.git
>>>> git clone --branch release/79 https://github.com/Ensembl/ensembl.git
>>>> git clone --branch release/79 https://github.com/Ensembl/ensembl-variation.git
>>>> git clone --branch release/79 https://github.com/Ensembl/ensembl-funcgen.git
>>>> export PERL5LIB=ensembl/modules:ensembl-variation/modules:ensembl-funcgen/modules:/Users/will/src/bioperl-1.2.3/:/Users/will/src/lib/perl5/
>>>> perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i ~/Downloads/input.vcf -database
>>>> 2015-06-03 17:09:54 - Starting...
>>>> 2015-06-03 17:09:54 - Detected format of input file as vcf
>>>> 2015-06-03 17:09:54 - Read 1 variants into buffer
>>>> 2015-06-03 17:09:54 - Reading transcript data from cache and/or database
>>>> [================================================================================================================================]
>>>>  [ 100% ]
>>>> 2015-06-03 17:10:00 - Retrieved 7 transcripts (0 mem, 0 cached, 7 DB, 0
>>>> duplicates)
>>>> 2015-06-03 17:10:00 - Analyzing chromosome 1
>>>> 2015-06-03 17:10:00 - Analyzing variants
>>>> [================================================================================================================================]
>>>>  [ 100% ]
>>>> 2015-06-03 17:10:00 - Calculating consequences
>>>> 2015-06-03 17:10:00 - Processed 1 total variants (0 vars/sec, 0
>>>> vars/sec total)
>>>> 2015-06-03 17:10:00 - Wrote stats summary to
>>>> variant_effect_output.txt_summary.html
>>>> 2015-06-03 17:10:00 - Finished!
>>>>
>>>>
>>>>
>>>>
>>>> On 3 June 2015 at 16:51, Guillermo Marco Puche <
>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>>  Hi Will,
>>>>>
>>>>> I've been checking and I can't see any unintended whitespace or
>>>>> problem with tabulations.
>>>>> I've no issues with old vep 75 script and API. I've updated the
>>>>> Bioperl lib in $PERL5LIB variable from 1.2.3 to 1.6.1 (I didn't see this
>>>>> change before sorry) however I'm still getting all those errors.
>>>>>
>>>>> Here's a link where you can download the VCF I'm using as input:
>>>>> https://www.dropbox.com/sh/felwyoo5kl2mgty/AAC177Digqy-_mEmyk9WvmYba/input.vcf?dl=0
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Best regards,
>>>>> Guille.
>>>>>
>>>>>
>>>>> On 03/06/15 17:30, Will McLaren wrote:
>>>>>
>>>>> Hi Guille,
>>>>>
>>>>> It looks to me like your input is not being parsed properly.
>>>>>
>>>>>  Check the formatting of your input VCF; double check that it is
>>>>> valid VCF, and that you haven't got any unintended whitespace on any of the
>>>>> lines.
>>>>>
>>>>>  If you still have an issue, can you send a line or two of the input
>>>>> that recreates these issues?
>>>>>
>>>>>  Thanks
>>>>>
>>>>>  Will McLaren
>>>>> Ensembl Variation
>>>>>
>>>>>
>>>>> On 3 June 2015 at 16:16, Guillermo Marco Puche <
>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>>  Dear devs,
>>>>>>
>>>>>> I'm trying ensembl 79 VEP.
>>>>>>
>>>>>> This is my dummy input VCF: http://pastebin.com/kFKWH50q#
>>>>>>
>>>>>> I've cloned and installed the API from GitHub as always (this step is
>>>>>> repeated for variation, funcgen and compara):
>>>>>>
>>>>>>    - git clone --branch release/79
>>>>>>    https://github.com/Ensembl/ensembl.git ensembl_79
>>>>>>
>>>>>> PERL5LIB env variable is correctly pointing to the cloned API:
>>>>>>
>>>>>>    - echo $PERL5LIB
>>>>>>
>>>>>>    /share/apps/local/bioperl-live:/share/apps/src/ensembl_79/modules:/share/apps/src/ensembl_79-compara/modules:/share/apps/src/ensembl_79-variation/modules:/share/apps/src/ensembl_79-functgenomics/modules
>>>>>>
>>>>>> However, I'm getting a lot of errors I really don't understand. It
>>>>>> looks like a problem with my API installation. If I change the $PERL5LIB
>>>>>> variable to point to the 75 API (the previous version I was using) I can't
>>>>>> reproduce the errors; the VEP script works with the old 75 version.
>>>>>>
>>>>>> I've been reading the docs again and I can't see any additional Perl
>>>>>> library requirement.
>>>>>>
>>>>>> Here's the error log: http://pastebin.com/VvQrkEQZ
>>>>>>
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>> Guille.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/