[ensembl-dev] VEP 79 API problems

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Jun 4 14:47:35 BST 2015


Hi Will,

I've been comparing the variant_effect_predictor script between versions 75 and 79.
After adding a few print statements inside VEP.pm I've spotted the bug. However,
I cannot resolve it.

These lines are new between 75 and 79 in the VEP script (lines 175 and 176):

     # split again to avoid Windows character nonsense
     foreach my $line(split /\r|\R/) {



I've checked that the script splits a line every time it finds a capital R
in the VCF file, as if it were a Windows newline character. I
can't reproduce this in the virtual machine, since it's a fresh Linux install.
In my work environment I do get this bug, so I guess it has
something to do with file encoding or locale? Has anyone else
experienced this?

Now I know where the error is, but I have no idea how to solve it.
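
One possible explanation, stated as an assumption rather than a confirmed diagnosis: the generic line-break escape \R was added to Perl regular expressions in Perl 5.10, and an older interpreter would treat it as a literal capital R, which would match the behaviour described above. The sketch below avoids \R by listing the line endings explicitly. The helper name split_lines and the sample VCF-like data are made up for illustration and are not part of VEP.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of VEP): split a chunk of text into lines
# without using \R, so it behaves the same on interpreters older than 5.10.
sub split_lines {
    my ($text) = @_;
    # \r\n is listed first so a Windows line ending is consumed as one break.
    return split /\r\n|\r|\n/, $text;
}

# Made-up sample data containing capital R characters and mixed line endings.
my $chunk = "1\t100\trs1\tA\tG\tRARE\r\n1\t200\trs2\tC\tT\tREF\n";
my @lines = split_lines($chunk);

# $] holds the numeric version of the running interpreter.
print "perl version: $]\n";
printf "%d lines\n", scalar @lines;
print "[$_]\n" for @lines;
```

Comparing the printed interpreter version on the working and failing machines would show whether the two environments really run the same Perl.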

Regards,
Guillermo.

On 04/06/15 15:16, Will McLaren wrote:
>
> Sorry Guillermo, I'm running out of ideas.
>
> Does the test unit run OK?
>
> perl ensembl-tools/scripts/variant_effect_predictor/t/variant_effect_predictor.t
>
> Will
>
> On 4 Jun 2015 12:27, "Guillermo Marco Puche" <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hi Will,
>
>     I'm getting the exact same error with example_GRCh37.vcf:
>
>     ERROR: Could not detect input file format
>
>
>     I've made a test script as you suggested, with the following code, and
>     I don't get any error:
>
>     #!/usr/bin/env perl
>
>     use strict;
>     use Bio::EnsEMBL::Variation::Utils::VEP qw(detect_format);
>
>
>     Regards,
>     Guillermo.
>
>     On 04/06/15 13:12, Will McLaren wrote:
>>
>>     Hi again
>>
>>     If the script is not detecting the input format then it is almost
>>     certainly an issue with the input file. There's very little code
>>     that gets run to detect the format, and it's all internal to the
>>     VEP code.
>>
>>     You could write a short script to test the method; just import
>>     detect_format from Bio::EnsEMBL::Variation::Utils::VEP.
>>
>>     Does it detect the example_GRCh37.vcf format correctly?
>>
>>     The file you shared on Dropbox works fine for me on my Mac and a
>>     Linux box.
>>
>>     Will
>>
>>     On 4 Jun 2015 10:44, "Guillermo Marco Puche" <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Hi again Will,
>>
>>         I'm trying with the latest Ensembl 80.
>>         If I don't specify -format vcf I get the following error:
>>
>>         perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
>>         2015-06-04 11:36:59 - Starting...
>>         ERROR: Could not detect input file format
>>
>>
>>         If I force the format with -format vcf I get all the errors
>>         (see error log attached). I'm using the same input.vcf file I
>>         posted yesterday.
>>         To rule out the VCF as the cause, I installed a fresh Linux on a
>>         virtual machine and just cloned and set up Ensembl and
>>         BioPerl. On the fresh Linux install I was only asked to install
>>         the MySQL Perl module (I installed it via CPAN).
>>         It works perfectly there.
>>
>>         So I can rule out a problem with the input VCF, because I'm
>>         using exactly the same input in both environments (and the
>>         same one you used to test it yesterday).
>>
>>         So my question is: does the VEP script use any other library,
>>         environment variable or tool which may be interfering?
>>
>>         Best regards,
>>         Guillermo.
>>
>>
>>         On 04/06/15 09:32, Guillermo Marco Puche wrote:
>>>         Hi again Will,
>>>
>>>         I've completely cleaned the PERL5LIB environment variable. I've been
>>>         switching between bioperl 1.2.3 and bioperl 1.6.1 and
>>>         got the same warnings/errors.
>>>         I've cloned the whole 79 API again, as you suggested, into a new tmp
>>>         location and included it in $PERL5LIB.
>>>
>>>         echo $PERL5LIB
>>>         /share/apps/local/bioperl-live:/share/gluster/tests/gmarco/tmp/ensembl/modules:/share/gluster/tests/gmarco/tmp/ensembl-funcgen/modules:/share/gluster/tests/gmarco/tmp/ensembl-variation/modules
>>>
>>>         ll /share/gluster/tests/gmarco/tmp
>>>         total 20
>>>         drwxrwxr-x  8 gmarco users 4096 jun  4 08:44 ensembl
>>>         drwxrwxr-x  8 gmarco users  146 jun  4 08:46 ensembl-funcgen
>>>         drwxrwxr-x  5 gmarco users   64 jun  4 08:43 ensembl-tools
>>>         drwxrwxr-x 10 gmarco users 4096 jun  4 08:45 ensembl-variation
>>>
>>>         perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i input.vcf -database --force_overwrite
>>>         2015-06-04 09:29:13 - Starting...
>>>         ERROR: Could not detect input file format
>>>
>>>         If I use the flags -format vcf -vcf then I start
>>>         getting all those errors (see yesterday's log).
>>>
>>>         Is there any other Perl lib or requirement I could be
>>>         missing? As I said, it's very strange: I have no problems with
>>>         the local Ensembl 75 API.
>>>
>>>         Best regards,
>>>         Guillermo.
>>>
>>>         On 03/06/15 18:14, Will McLaren wrote:
>>>>         Hi again,
>>>>
>>>>         I can't recreate the problem with that input file I'm
>>>>         afraid, either on my normal setup or scrubbing PERL5LIB and
>>>>         starting from scratch.
>>>>
>>>>         See commands I used and input below.
>>>>
>>>>         Perhaps you haven't got release/79 of ensembl-tools too?
>>>>
>>>>         Have you tried running the installer from within
>>>>         ensembl-tools/scripts/variant_effect_predictor? This
>>>>         shouldn't affect your PERL5LIB or other git checkouts.
>>>>
>>>>         Will
>>>>
>>>>         ===================
>>>>
>>>>         mkdir ~/src/tmp
>>>>         cd ~/src/tmp
>>>>         git clone --branch release/79 https://github.com/Ensembl/ensembl-tools.git
>>>>         git clone --branch release/79 https://github.com/Ensembl/ensembl.git
>>>>         git clone --branch release/79 https://github.com/Ensembl/ensembl-variation.git
>>>>         git clone --branch release/79 https://github.com/Ensembl/ensembl-funcgen.git
>>>>         export PERL5LIB=ensembl/modules:ensembl-variation/modules:ensembl-funcgen/modules:/Users/will/src/bioperl-1.2.3/:/Users/will/src/lib/perl5/
>>>>         perl ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl -i ~/Downloads/input.vcf -database
>>>>         2015-06-03 17:09:54 - Starting...
>>>>         2015-06-03 17:09:54 - Detected format of input file as vcf
>>>>         2015-06-03 17:09:54 - Read 1 variants into buffer
>>>>         2015-06-03 17:09:54 - Reading transcript data from cache and/or database
>>>>         [================================================================================================================================]
>>>>          [ 100% ]
>>>>         2015-06-03 17:10:00 - Retrieved 7 transcripts (0 mem, 0 cached, 7 DB, 0 duplicates)
>>>>         2015-06-03 17:10:00 - Analyzing chromosome 1
>>>>         2015-06-03 17:10:00 - Analyzing variants
>>>>         [================================================================================================================================]
>>>>          [ 100% ]
>>>>         2015-06-03 17:10:00 - Calculating consequences
>>>>         2015-06-03 17:10:00 - Processed 1 total variants (0 vars/sec, 0 vars/sec total)
>>>>         2015-06-03 17:10:00 - Wrote stats summary to variant_effect_output.txt_summary.html
>>>>         2015-06-03 17:10:00 - Finished!
>>>>
>>>>
>>>>
>>>>
>>>>         On 3 June 2015 at 16:51, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>             Hi Will,
>>>>
>>>>             I've been checking and I can't see any unintended
>>>>             whitespace or problems with tabs.
>>>>             I have no issues with the old VEP 75 script and API. I've
>>>>             updated the BioPerl lib in the $PERL5LIB variable from
>>>>             1.2.3 to 1.6.1 (sorry, I hadn't noticed this change before);
>>>>             however, I'm still getting all those errors.
>>>>
>>>>             Here's a link where you can download the VCF I'm using
>>>>             as input:
>>>>             https://www.dropbox.com/sh/felwyoo5kl2mgty/AAC177Digqy-_mEmyk9WvmYba/input.vcf?dl=0
>>>>
>>>>             Thank you.
>>>>
>>>>             Best regards,
>>>>             Guille.
>>>>
>>>>
>>>>             On 03/06/15 17:30, Will McLaren wrote:
>>>>>             Hi Guille,
>>>>>
>>>>>             It looks to me like your input is not being parsed
>>>>>             properly.
>>>>>
>>>>>             Check the formatting of your input VCF; double check
>>>>>             that it is valid VCF, and that you haven't got any
>>>>>             unintended whitespace on any of the lines.
>>>>>
>>>>>             If you still have an issue, can you send a line or two
>>>>>             of the input that recreates these issues?
>>>>>
>>>>>             Thanks
>>>>>
>>>>>             Will McLaren
>>>>>             Ensembl Variation
>>>>>
>>>>>
>>>>>             On 3 June 2015 at 16:16, Guillermo Marco Puche <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>                 Dear devs,
>>>>>
>>>>>                 I'm trying ensembl 79 VEP.
>>>>>
>>>>>                 This is my dummy input VCF:
>>>>>                 http://pastebin.com/kFKWH50q#
>>>>>
>>>>>                 I've cloned and installed the API from GitHub as
>>>>>                 always (this step is repeated for variation,
>>>>>                 funcgen and compara):
>>>>>
>>>>>                   * git clone --branch release/79
>>>>>                     https://github.com/Ensembl/ensembl.git ensembl_79
>>>>>
>>>>>                 PERL5LIB env variable is correctly pointing to the
>>>>>                 cloned API:
>>>>>
>>>>>                   * echo $PERL5LIB
>>>>>                     /share/apps/local/bioperl-live:/share/apps/src/ensembl_79/modules:/share/apps/src/ensembl_79-compara/modules:/share/apps/src/ensembl_79-variation/modules:/share/apps/src/ensembl_79-functgenomics/modules
>>>>>
>>>>>                 However, I'm getting a lot of errors I really don't
>>>>>                 understand. It looks like a problem with my API
>>>>>                 installation. If I change the $PERL5LIB
>>>>>                 variable to point to the 75 API (the previous version I
>>>>>                 was using), I can't reproduce the errors: the VEP script
>>>>>                 works with the old 75 version.
>>>>>
>>>>>                 I've been reading the docs again and I can't see
>>>>>                 any additional Perl library requirement.
>>>>>
>>>>>                 Here's the error log: http://pastebin.com/VvQrkEQZ
>>>>>
>>>>>
>>>>>                 Thank you!
>>>>>
>>>>>                 Best regards,
>>>>>                 Guille.
>>>>>
>>>>>
>>>>>                 _______________________________________________
>>>>>                 Dev mailing list Dev at ensembl.org
>>>>>                 <mailto:Dev at ensembl.org>
>>>>>                 Posting guidelines and subscribe/unsubscribe info:
>>>>>                 http://lists.ensembl.org/mailman/listinfo/dev
>>>>>                 Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>
