[ensembl-dev] Warning message in from VariationFeature.pm
Duarte Molha
duartemolha at gmail.com
Thu Aug 1 10:39:11 BST 2013
On a related note, Will:
I have a plugin that I want to run for every output annotation line. It
basically adds the genotype fields from the VCF as extra fields.
It works fine for the large majority of cases, but in some
the $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line}
field seems to be undefined (I've highlighted the line in question with comments).
Since I need this line to extract the fields I am interested in, can you
tell me what I might be doing wrong?
Here is the code of the plugin:
###########################################
=head1 LICENSE
Selected_VCF_fields_output
Copyright (C) 2013 Duarte Molha
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
=head1 CONTACT
Questions may also be sent to <duartemolha at gmail.com>.
=cut
=head1 NAME
Selected_VCF_fields_output
=head1 SYNOPSIS
mv Selected_VCF_fields_output.pm ~/.vep/Plugins
perl variant_effect_predictor.pl -i variations.vcf --plugin Selected_VCF_fields_output
=head1 DESCRIPTION
This plugin retrieves the quality score fields and the genotype fields
from the input VCF and outputs them in the output tab-delimited annotation
file
=cut
package Selected_VCF_fields_output;
use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepPlugin);
use strict;
use warnings;
sub version {
    return '71';
}
sub new {
    my $class = shift;
    my $self = $class->SUPER::new(@_);
    return $self;
}
sub get_header_info {
    return {
        "quality_score" => "Quality score from VCF input Field",
        "GT_PARAMS_AD" => "Allelic depths for the ref and alt alleles in the order listed",
        "GT_PARAMS_DP" => "Read Depth (only filtered reads used for calling)",
        "GT_PARAMS_GQ" => "Genotype Quality",
        "GT_PARAMS_GT" => "Genotype",
        "GT_PARAMS_PL" => "Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic",
        "GT_PARAMS_SDP" => "Raw Read Depth as reported by SAMtools",
        "GT_PARAMS_RD" => "Depth of reference-supporting bases (reads1)",
        "GT_PARAMS_FREQ" => "Variant allele frequency",
        "GT_PARAMS_PVAL" => "P-value from Fisher's Exact Test",
        "GT_PARAMS_RBQ" => "Average quality of reference-supporting bases (qual1)",
        "GT_PARAMS_ABQ" => "Average quality of variant-supporting bases (qual2)",
        "GT_PARAMS_RDF" => "Depth of reference-supporting bases on forward strand (reads1plus)",
        "GT_PARAMS_RDR" => "Depth of reference-supporting bases on reverse strand (reads1minus)",
        "GT_PARAMS_ADF" => "Depth of variant-supporting bases on forward strand (reads2plus)",
        "GT_PARAMS_ADR" => "Depth of variant-supporting bases on reverse strand (reads2minus)",
    };
}
sub feature_types {
    return ['Feature', 'Intergenic'];
}
sub run {
    my $self = shift;
    my $vf = shift;
    my $line_hash = shift;
    my $config = $self->{config};
    if(defined($config->{individual}) && $config->{format} eq 'vcf') {
        my $ind_cols = $config->{ind_cols};
        ############################################################################################################################
        my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
        # In this line the {_line} field is sometimes undef. Why???
        ############################################################################################################################
        my $individual = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{individual};
        my @split_line = split /[\s\t]+/, $line;
        my @gt_format = split /:/, $split_line[8];
        foreach my $p (@gt_format){
            $p = "GT_PARAMS_".$p ;
        }
        my @gt_data = split /:/, $split_line[$ind_cols->{$individual}];
        my $results = {map { shift @gt_format => $_ } @gt_data};
        $results->{"quality_score"} = $split_line[5];
        return $results;
    }else{
        return {};
    }
}
1;
###########################################################
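Until the cause of the undefined {_line} is known, the parsing step in run() can at least be made defensive. The following is only a sketch under assumptions: extract_gt_fields is a hypothetical helper, not part of the VEP plugin API, and it does not explain why {_line} is undef. It simply returns an empty hash when the line or the sample column is missing, instead of splitting an undefined value.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the VEP plugin API.
# $line       : raw VCF data line as stored by VEP (may be undef)
# $sample_col : 0-based column index of the individual's genotype field
sub extract_gt_fields {
    my ($line, $sample_col) = @_;

    # Guard against features that carry no stored input line.
    return {} unless defined $line && defined $sample_col;

    chomp $line;
    my @cols = split /\t/, $line;

    # In VCF, QUAL is column 5 and FORMAT is column 8 (0-based).
    return {} unless defined $cols[8] && defined $cols[$sample_col];

    my @keys   = map { "GT_PARAMS_$_" } split /:/, $cols[8];
    my @values = split /:/, $cols[$sample_col];

    # Pair keys with values by index; a sample may carry fewer
    # fields than FORMAT declares.
    my %results;
    for my $i (0 .. $#values) {
        last if $i > $#keys;
        $results{ $keys[$i] } = $values[$i];
    }
    $results{quality_score} = $cols[5];

    return \%results;
}
```

Inside run() the call would then be something like extract_gt_fields($line, $ind_cols->{$individual}), with $line and $individual read from $vf as in the plugin above.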
=========================
Duarte Miguel Paulo Molha
http://about.me/duarte
=========================
On Thu, Aug 1, 2013 at 9:49 AM, Duarte Molha <duartemolha at gmail.com> wrote:
> Thanks Will
>
> I should have checked that before asking :S
>
> I'll redownload and check if the error is gone ... thanks
>
> Duarte
>
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
>
> On Thu, Aug 1, 2013 at 9:46 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Hi Duarte,
>>
>> I think this is a bug I've already found and fixed - can you update your
>> ensembl-variation API and try again?
>>
>> Here's the fix for reference:
>>
>>
>> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm?root=ensembl&r1=1.101.2.4&r2=1.101.2.5
>>
>> Will
>>
>>
>> On 1 August 2013 09:33, Duarte Molha <duartemolha at gmail.com> wrote:
>>
>>> I believe the main problem is that this VariationFeature, for some
>>> reason, does not have a slice attached to it: 'slice' => undef
>>>
>>> so the call that extracts the slice and expands it
>>>
>>>
>>> 471: my $slice = $self->feature_Slice->expand(
>>> 472: MAX_DISTANCE_FROM_TRANSCRIPT,
>>> 473: MAX_DISTANCE_FROM_TRANSCRIPT
>>> 474: );
>>>
>>> Fails.
>>>
>>> Does anyone know what might be causing this?
>>>
>>> Best regards
>>>
>>> Duarte
>>>
>>>
>>>
>>> =========================
>>> Duarte Miguel Paulo Molha
>>> http://about.me/duarte
>>> =========================
>>>
>>>
>>> On Wed, Jul 31, 2013 at 5:05 PM, Duarte Molha <duartemolha at gmail.com> wrote:
>>>
>>>> In an effort to better understand what might be causing this, here
>>>> is a dump of one such object that triggers the error message:
>>>>
>>>> the VCF line:
>>>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
>>>> sample-01 sample-02 sample-03 sample-04 sample-05 sample-06
>>>> 1 777437 . A C 667.93 PASS
>>>> AC=1;AF=0.083;AN=12;BaseQRankSum=6.089;DP=487;Dels=0.00;FS=1.662;HRun=0;HaplotypeScore=0.9613;MQ=44.32;MQ0=49;MQRankSum=-1.435;QD=8.35;ReadPosRankSum=-1.520;SB=-247.14;set=variant2
>>>> GT:AD:DP:GQ:PL 0/0:60,0:60:99:0,138,1819 0/0:82,0:82:99:0,205,2614
>>>> 0/1:52,28:80:99:706,0,1290 0/0:100,0:100:99:0,253,3074
>>>> 0/0:83,0:83:99:0,178,2360 0/0:82,0:82:99:0,166,2135
>>>>
>>>> at line 471 of /Bio/EnsEMBL/Variation/VariationFeature.pm
>>>>
>>>> my object $self contains:
>>>>
>>>> 0 Bio::EnsEMBL::Variation::VariationFeature=HASH(0x57802b0)
>>>> '_line' =>
>>>> "1\cI777437\cI.\cIA\cIC\cI667.93\cIPASS\cIAC=1;AF=0.083;AN=12;BaseQRankSum=6.089;DP=487;Dels=0.00;FS=1.662;HRun=0;HaplotypeScore=0.9613;MQ=44.32;MQ0=49;MQRankSum=-1.435;QD=8.35;ReadPosRankSum=-1.520;SB=-247.14;set=variant2\cIGT:AD:DP:GQ:PL\cI0/0:60,0:60:99:0,138,1819\cI0/0:82,0:82:99:0,205,2614\cI0/1:52,28:80:99:706,0,1290\cI0/0:100,0:100:99:0,253,3074\cI0/0:83,0:83:99:0,178,2360\cI0/0:82,0:82:99:0,166,2135"
>>>> 'adaptor' =>
>>>> Bio::EnsEMBL::Variation::DBSQL::VariationFeatureAdaptor=HASH(0x4b06030)
>>>> '_is_multispecies' => ''
>>>> 'db' => Bio::EnsEMBL::Variation::DBSQL::DBAdaptor=HASH(0x52b4f08)
>>>> '_dbc' => Bio::EnsEMBL::DBSQL::DBConnection=HASH(0x52b50d0)
>>>> '_dbname' => 'homo_sapiens_variation_72_37'
>>>> '_driver' => 'mysql'
>>>> '_host' => 'ensembldb.ensembl.org'
>>>> '_port' => 5306
>>>> '_query_count' => 4
>>>> '_timeout' => 0
>>>> '_username' => 'anonymous'
>>>> 'connected32406' => 1
>>>> 'db_handle32406' => DBI::db=HASH(0x51e5ee8)
>>>> empty hash
>>>> 'reconnect_when_lost' => 1
>>>> '_group' => 'variation'
>>>> '_is_multispecies' => ''
>>>> '_no_cache' => 1
>>>> '_species' => 'homo_sapiens'
>>>> '_species_id' => 1
>>>> 'dbc' => Bio::EnsEMBL::DBSQL::DBConnection=HASH(0x52b50d0)
>>>> -> REUSED_ADDRESS
>>>> 'species_id' => 1
>>>> 'allele_string' => 'A'
>>>> 'chr' => 1
>>>> 'end' => 777437
>>>> 'existing' => ARRAY(0x10861900)
>>>> empty array
>>>> 'genotype' => ARRAY(0x577ff80)
>>>> 0 'A'
>>>> 1 'A'
>>>> 'individual' => 'sample-01'
>>>> 'map_weight' => 1
>>>> 'non_variant' => 1
>>>> 'phased' => 1
>>>> 'slice' => undef
>>>> 'start' => 777437
>>>> 'strand' => 1
>>>> 'variation_name' => '1_777437_A'
>>>>
>>>>
>>>>
>>>>
>>>> =========================
>>>> Duarte Miguel Paulo Molha
>>>> http://about.me/duarte
>>>> =========================
>>>>
>>>>
>>>> On 31 July 2013 13:52, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>
>>>>> Hi Devs
>>>>>
>>>>> I have been trying to run a VCF file through the variant annotation script
>>>>> and I've been getting a warning message that I have never
>>>>> encountered before.
>>>>>
>>>>> I was wondering if someone could let me know if it is something I am
>>>>> doing wrong…
>>>>>
>>>>> The message is :
>>>>>
>>>>> Can't call method "expand" on an undefined value at
>>>>> <sic>/Bio/EnsEMBL/Variation/VariationFeature.pm line 471
>>>>>
>>>>> Here are the configuration options I am using:
>>>>>
>>>>>
>>>>>
>>>>> Configuration options:
>>>>> ###
>>>>> allow_non_variant 1
>>>>> cache 1
>>>>> canonical 1
>>>>> ccds 1
>>>>> check_alleles 1
>>>>> check_existing 1
>>>>> config vep_human.ini
>>>>> core_type core
>>>>> custom
>>>>> ./vep_additional_annotations/Somatic_variation_phenotypes.bed.gz,Somatic,bed,exact
>>>>> ./vep_additional_annotations/dbsnp135_ensembl_variation_phenotype.bed.gz,dbsnp135,bed,exact
>>>>> db_version 72
>>>>> dir /ReferenceData/vep_cache
>>>>> dir_cache /ReferenceData/vep_cache
>>>>> dir_plugins ./Plugins
>>>>> domains 1
>>>>> force_overwrite 1
>>>>> fork 5
>>>>> gmaf 1
>>>>> hgnc 1
>>>>> host ensembldb.ensembl.org
>>>>> individual all
>>>>> input_file All_BOTH_SNPINDELfilter_PASSED.vcf
>>>>> maf_1kg 1
>>>>> numbers 1
>>>>> output_file All_BOTH_SNPINDELfilter_PASSED.ann
>>>>> plugin
>>>>> OGT_NHBLI_MAF,/ReferenceData/NHLBI_EVS/NHLBI_OGT.gz
>>>>> OGT_selected_VCF_fields_output Blosum62 Carol OGT_Condel,b
>>>>> OGT_Grantham TSSDistance Downstream
>>>>> polyphen b
>>>>> port 5306
>>>>> protein 1
>>>>> regulatory 1
>>>>> sift b
>>>>> species homo_sapiens
>>>>> stats HASH(0x4370ad8)
>>>>> terms SO
>>>>> verbose 1
>>>>>
>>>>>
>>>>>
>>>>> I would be very grateful for your help.
>>>>>
>>>>> Duarte Molha
>>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>
>
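For reference, the warning quoted above is Perl's generic error for calling a method on undef. A minimal, self-contained reproduction is sketched below, assuming a stand-in class: Demo::Feature is hypothetical and is not the Ensembl VariationFeature, and its feature_Slice accessor only mimics a feature whose slice is undef.

```perl
use strict;
use warnings;

package Demo::Feature;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Stand-in accessor: returns whatever was stored, possibly undef.
sub feature_Slice {
    my $self = shift;
    return $self->{slice};
}

package main;

my $vf = Demo::Feature->new(slice => undef);

# A method call on undef dies; eval traps it so the message can be inspected.
# The two distances are arbitrary illustrative values.
my $slice = eval { $vf->feature_Slice->expand(5000, 5000) };
my $error = $@;

print "trapped: $error" unless defined $slice;

# Checking the accessor's return value first avoids the exception.
my $maybe_slice = $vf->feature_Slice;
my $expanded = defined $maybe_slice
    ? $maybe_slice->expand(5000, 5000)
    : undef;
```

In the thread, the fix was applied upstream in VEP.pm, so a guard like the last statement is only a way to confirm the diagnosis locally, not a replacement for updating the API.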