[ensembl-dev] Input information plugin for a variation on same chromosome and position
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Thu Jun 6 10:50:09 BST 2013
Thanks to Duarte's code I've fixed my plugin to parse the VCF input.
It's a shame this information isn't in a developer's guide for VEP. At
least some documentation of the available objects would be nice.
Regards,
Guillermo.
On 06/05/2013 06:20 PM, Guillermo Marco Puche wrote:
> Hello,
>
> I've noticed that the plugin I'm using to parse my VCF gives a wrong
> result in one case.
>
> It happens when there are two variants at the same chromosome and
> position:
>
> Here's the example:
>
> Input VCF:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT DATA
> chr11 123502514 . G A 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0
> chr11 123502514 . G C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0
>
>
> The output is correct in that it reports both variants, but the
> VAR ALLELE column (which is filled in by my plugin) reports the "C"
> allele in both cases. This is wrong: "A" should be reported for the
> first variant and "C" for the second.
>
> I suppose this happens because my plugin reads the VCF input with
> Tabix, and Tabix queries the file by chromosome, start and end
> position. Since the chromosome and position are the same for both
> variants, the lookup cannot tell them apart and the parsed value is
> incorrect.
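
A minimal Python sketch (not the plugin's Perl, and not calling Tabix itself) of why a lookup keyed on chromosome and position alone cannot separate the two records. The in-memory RECORDS list and the helper names fetch_region, alt_by_position and alt_by_position_and_allele are hypothetical, and keeping the last record returned is only an assumption about how such a lookup might behave.

```python
# Two records sharing chromosome and position, as in the example above.
# Only the first five VCF columns are kept; this is a simplification.
RECORDS = [
    ("chr11", 123502514, ".", "G", "A"),
    ("chr11", 123502514, ".", "G", "C"),
]


def fetch_region(chrom, start, end):
    """Hypothetical stand-in for a region query: all records in chrom:start-end."""
    return [r for r in RECORDS if r[0] == chrom and start <= r[1] <= end]


def alt_by_position(chrom, pos):
    """Position-only lookup: keeps whichever record is seen last (assumption)."""
    alt = None
    for record in fetch_region(chrom, pos, pos):
        alt = record[4]
    return alt


def alt_by_position_and_allele(chrom, pos, ref, alt):
    """Lookup that also matches REF and ALT, so each variant finds its own record."""
    for record in fetch_region(chrom, pos, pos):
        if record[3] == ref and record[4] == alt:
            return record[4]
    return None


if __name__ == "__main__":
    for ref, alt in (("G", "A"), ("G", "C")):
        print(alt_by_position("chr11", 123502514),
              alt_by_position_and_allele("chr11", 123502514, ref, alt))
```

In this sketch the position-only lookup gives "C" for both variants, while the lookup that also compares REF and ALT gives "A" and "C" respectively.
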
>
> I had to do it this way because the VCF input used by my
> workmates is a bit unusual.
>
> The code for my vcf_input.pm parser is located here:
> https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm
>
> Important lines for this problem are from line 102 to 126 (tabix related).
>
> If accessing the VCF file with Tabix makes it impossible to
> distinguish between two variations at the same position, is there any
> other way I can access the VCF line for the variation whose
> consequence is being calculated, without having to parse the whole
> input file?
>
> I know Duarte Molha had a script to get VCF input information for the
> input line of consequence being calculated.
>
> This must be the important line to access the input VCF line object:
> my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>
> This is part of Duarte's code:
>
> sub run {
>     my $self      = shift;
>     my $vf        = shift;
>     my $line_hash = shift;
>     my $config    = $self->{config};
>     my $ind_cols  = $config->{ind_cols};
>
>     # The original input line and the individual are stored on the
>     # base variation feature.
>     my $bvf        = $vf->{base_variation_feature_overlap}->{base_variation_feature};
>     my $line       = $bvf->{_line};
>     my $individual = $bvf->{individual};
>
>     # Split on whitespace: index 5 is QUAL, index 8 is FORMAT.
>     my @split_line = split /\s+/, $line;
>     my $qual_score = $split_line[5];
>     my @gt_format  = split /:/, $split_line[8];
>     my @gt_data    = split /:/, $split_line[$ind_cols->{$individual}];
>
>     # Pair each FORMAT key with the matching value from the sample column.
>     my $results = {map { shift @gt_format => $_ } @gt_data};
>     $results->{"quality_score"} = $qual_score;
>
>     return $results;
> }
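
For illustration, the parsing step in the code above can be sketched in Python on a single VCF line. This is only a sketch of the same idea, not a port of the plugin: the function name parse_genotype_fields and the default sample column index of 9 are assumptions, and the sample line is written with both FILTER and INFO set to "." so that it has the ten columns named in the header.

```python
def parse_genotype_fields(line, sample_col=9):
    """Pair FORMAT keys with one sample's values and attach QUAL.

    sample_col is the 0-based index of the sample column; 9 is an
    assumption that fits a single-sample VCF line with ten columns.
    """
    fields = line.split()                # split on any run of whitespace
    keys = fields[8].split(":")          # FORMAT column, e.g. GT:GK:VS
    values = fields[sample_col].split(":")
    results = dict(zip(keys, values))    # key -> value for this sample
    results["quality_score"] = fields[5] # QUAL column, kept as a string
    return results


if __name__ == "__main__":
    # Illustrative line; FILTER and INFO are both "." here by assumption.
    line = "chr11 123502514 . G A 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0"
    print(parse_genotype_fields(line))
```

Because the line comes from the variant being processed rather than from a region query, two variants at the same position are each parsed from their own line.
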
> Thank you.
>
> Best regards,
> Guillermo.