[ensembl-dev] Bug in VEP

Will McLaren wm2 at ebi.ac.uk
Fri Sep 14 13:26:08 BST 2012


Hi Duarte,

The best way I've found to locate offending lines is to iterate the
following procedure - a pain, but it works!

1. reduce buffer size
2. run until it crashes - note last "Processed n total variants" line
3. do "head -n [n + buffer] input_file | tail -n [buffer] >
temp_input_file" (you'll need to account for header lines here too if
you're using e.g. VCF)

Once temp_input_file gets down to a sensible size (a few hundred
lines maybe) you can run with "--buffer_size 1", and when it crashes the
exact line number in the input file should appear as "<GEN0> line n" in
the error output.
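
For step 3, here is a minimal sketch of the extraction that also carries
the header lines along. It is an illustration only, under these
assumptions: header lines start with "#" and all sit at the top of the
file (as in VCF), the input is uncompressed text (a .vcf.gz would need
decompressing first), the "Processed n total variants" count refers to
data lines only, and the function name extract_window is made up here.

```shell
#!/usr/bin/env bash
# Sketch: pull out the buffer of variants that was being processed when
# the run crashed, keeping any header lines so the result is still valid.
# extract_window is a hypothetical helper name, not part of VEP.
#
# usage: extract_window INPUT N BUFFER > temp_input_file
#   N      = number from the last "Processed n total variants" line
#   BUFFER = buffer size used for that run
extract_window() {
    local input=$1 n=$2 buffer=$3
    awk -v n="$n" -v b="$buffer" '
        /^#/ { print; next }            # keep every header line
        { d++ }                         # count data (variant) lines only
        d > n && d <= n + b { print }   # data lines n+1 .. n+buffer
        d > n + b { exit }              # stop reading once past the window
    ' "$input"
}
```

For example, "extract_window input_file 5000 500 > temp_input_file"
would keep the header plus data lines 5001 to 5500. Under the same
assumptions, a "<GEN0> line m" reported against temp_input_file would
correspond to line n + m of the file it was cut from, since both files
start with the same header lines.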

Will

On 14 September 2012 13:15, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:
> Hi Will
>
> Here are the different chromosomes that the input file contains:
>
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> 11
> 12
> 13
> 14
> 15
> 16
> 17
> 18
> 19
> 20
> 21
> 22
> MT
> X
> Y
> GL000192.1
> GL000193.1
> GL000194.1
> GL000195.1
> GL000198.1
> GL000199.1
> GL000203.1
> GL000204.1
> GL000205.1
> GL000208.1
> GL000209.1
> GL000211.1
> GL000212.1
> GL000214.1
> GL000216.1
> GL000217.1
> GL000218.1
> GL000219.1
> GL000220.1
> GL000221.1
> GL000222.1
> GL000224.1
> GL000225.1
> GL000228.1
> GL000229.1
> GL000230.1
> GL000231.1
> GL000232.1
> GL000233.1
> GL000234.1
> GL000235.1
> GL000237.1
> GL000238.1
> GL000239.1
> GL000240.1
> GL000241.1
> GL000247.1
>
> As for your request ... Can you tell me what is the best way of determining the line of input that is causing the problem?
>
> Best regards,
>
> Duarte
>
> -----Original Message-----
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Will McLaren
> Sent: 14 September 2012 13:07
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Bug in VEP
>
> I'm surprised you're seeing this error using --cache, since it shouldn't be fetching data from the DB adaptor unless it is outside the chromosome/position range covered by the cache.
>
> Do you have some weirdly named chromosomes in your input file, or perhaps some coordinates that may fall outside of the GRCh37 normal coordinates?
>
> In any case, I've committed a fix that should bypass this problem. Let me know if it doesn't work, and preferably in this case if you could give me a line of input that recreates the problem (as I've said before, this is always extremely useful when debugging) that would be great.
>
> Thanks
>
> Will
>
> On 14 September 2012 12:55, Nathan Johnson <njohnson at ebi.ac.uk> wrote:
>> Hi Duarte
>>
>> That is caused by a bad logical test, which I have corrected in the
>> patch below (updated on the head). However, this is indicative of
>> another problem: the reason it is dying is that the Slice argument
>> being passed to the method in question is undefined.
>>
>> ...over to Will
>>
>> Thanks
>>
>> Nath
>>
>>
>> Index: BaseFeatureAdaptor.pm
>> ===================================================================
>> RCS file: /nfs/ensembl/cvsroot/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/DBSQL/BaseFeatureAdaptor.pm,v
>> retrieving revision 1.65
>> diff -u -r1.65 BaseFeatureAdaptor.pm
>> --- BaseFeatureAdaptor.pm       16 Jul 2012 12:10:40 -0000      1.65
>> +++ BaseFeatureAdaptor.pm       14 Sep 2012 11:52:48 -0000
>> @@ -5,7 +5,7 @@
>>
>>  =head1 LICENSE
>>
>> -  Copyright (c) 1999-2011 The European Bioinformatics Institute and
>> +  Copyright (c) 1999-2012 The European Bioinformatics Institute and
>>    Genome Research Limited.  All rights reserved.
>>
>>    This software is distributed under a modified Apache license.
>> @@ -132,8 +132,8 @@
>>
>>    my @result;
>>
>> -  if(!ref($slice) || !$slice->isa("Bio::EnsEMBL::Slice")) {
>> -    throw("Bio::EnsEMBL::Slice argument expected.");
>> +  if(! (ref($slice) && $slice->isa('Bio::EnsEMBL::Slice'))) {
>> +    throw('Bio::EnsEMBL::Slice argument expected.');
>>    }
>>
>>    $constraint ||= '';
>>
>>
>>
>>
>>
>> On 14 Sep 2012, at 12:44, Duarte Molha wrote:
>>
>> Dear developers,
>>
>> I am sorry to bother you again (especially Will :) ) but in testing the
>> VEP26, I got a crash using the command:
>>
>> perl variant_effect_predictor.pl -i input_variants.vcf.gz -o
>> output.ann --config vep.ini
>>
>> Here is the error output:
>>
>> 2012-09-14 12:29:30 - Reading regulatory data from cache and/or database
>> [==================================> ]   [ 22% ]
>> Can't call method "isa" on unblessed reference at
>> /NGS_Tools/bin/VEP_26_testing/Bio/EnsEMBL/Funcgen/DBSQL/BaseFeatureAdaptor.pm
>> line 135, <GEN0> line 28406.
>>
>> In attachment is the config file I use.
>>
>> Best regards,
>>
>> Duarte Molha
>>
>> <vep.ini>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>> Nathan Johnson
>> Senior Scientific Programmer
>> Ensembl Regulation
>> European Bioinformatics Institute
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge CB10 1SD
>>
>> http://www.ensembl.info/
>> http://twitter.com/#!/ensembl



