[ensembl-dev] Input information plugin for a variation on same chromosome and position

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed Jun 5 17:20:15 BST 2013


Hello,

I've noticed that the plugin I use to parse my VCF input gives wrong
results in one case: when there are two variants at the same chromosome
and position.

Here's an example:

Input VCF:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    DATA
chr11    123502514    .    G    A    1000    .    GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0
chr11    123502514    .    G    C    1000    .    GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0


The output is otherwise correct and reports both variants, but in the
column referring to the variant allele (VAR ALLELE, which my plugin
parses), the "C" allele is reported for both lines. This is wrong: "A"
should be reported for the first line and "C" for the second.

I suppose this happens because my plugin uses tabix to look up the
information in the input VCF, and tabix queries the file by chromosome,
start and end position. Since the chromosome and position are the same
for both changes, the lookup cannot tell them apart and the parsed output
is incorrect.
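One possible workaround, sketched in Python rather than Perl purely to illustrate the logic: a tabix region query returns every record overlapping the position, so both lines come back, and the plugin can keep only the one whose REF and ALT match the variant being annotated. The function name `pick_matching_line` is hypothetical, and the sketch assumes standard tab-separated VCF columns (REF in column 4, ALT in column 5). It only covers simple SNVs like these; VEP trims the shared leading base from indel alleles, so indels would need extra handling.

```python
def pick_matching_line(lines, ref, alt):
    """Return the first VCF line whose REF equals ref and whose ALT contains alt.

    lines: raw VCF records, e.g. everything tabix returned for one position.
    Assumes tab-separated standard columns: REF is field 4, ALT is field 5.
    Returns None when no record matches.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # too short to be a VCF record
        # ALT can list several comma-separated alleles in multi-allelic records
        if fields[3] == ref and alt in fields[4].split(","):
            return line
    return None


# First columns of the two example records, which share chromosome and position.
hits = [
    "chr11\t123502514\t.\tG\tA\t1000\t.",
    "chr11\t123502514\t.\tG\tC\t1000\t.",
]
print(pick_matching_line(hits, "G", "A"))
print(pick_matching_line(hits, "G", "C"))
```

In the plugin, `ref` and `alt` would come from the variant currently being annotated, so each of the two records is matched to its own allele instead of both resolving to the same line.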

I had to take this approach because the VCF input format used by my
colleagues is a bit unusual.

The code for my vcf_input.pm parser is located here: 
https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm

The lines relevant to this problem are 102 to 126 (the tabix-related code).

If it's impossible to distinguish between two variants at the same
position when accessing the VCF with tabix, is there another way to
access the input VCF line for the variant whose consequence is currently
being calculated, without having to parse the whole input file?

I know Duarte Molha had a script that gets the input VCF information for
the input line whose consequence is being calculated.

This must be the key line for accessing the input VCF line:

my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};


This is part of Duarte's code:

sub run {
    my $self      = shift;
    my $vf        = shift;    # allele-level overlap object passed to the plugin
    my $line_hash = shift;    # output line hash (unused here)

    my $config   = $self->{config};
    my $ind_cols = $config->{ind_cols};    # maps individual name to column index

    # The underlying VariationFeature; _line is the raw input line this code relies on
    my $base_vf    = $vf->{base_variation_feature_overlap}->{base_variation_feature};
    my $line       = $base_vf->{_line};
    my $individual = $base_vf->{individual};

    # Standard VCF columns: QUAL is index 5, FORMAT is index 8
    my @split_line = split /\s+/, $line;
    my $qual_score = $split_line[5];
    my @gt_format  = split /:/, $split_line[8];
    my @gt_data    = split /:/, $split_line[$ind_cols->{$individual}];

    # Pair each FORMAT key with the matching value from the individual's column
    my $results = {map { shift @gt_format => $_ } @gt_data};
    $results->{"quality_score"} = $qual_score;

    return $results;
}
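Assuming `_line` holds the raw input line for the specific variant being annotated (which Duarte's code relies on), this approach distinguishes the two same-position records without any tabix lookup. For comparison, the same FORMAT/sample pairing is sketched below in Python. It assumes a standard ten-column VCF line; the data lines in the example above appear to have one column fewer than the header (no INFO field), in which case the FORMAT and sample indices would shift down by one. The function name `parse_vcf_line` and the example line (with `.` added as a placeholder INFO value) are illustrative only.

```python
def parse_vcf_line(line, sample_col):
    """Pair FORMAT keys with one sample's values and add the QUAL score.

    Mirrors the Perl run() above: index 5 is QUAL, index 8 is FORMAT,
    and sample_col is the 0-based index of the sample's column.
    """
    fields = line.split()  # split on any run of whitespace, like /\s+/
    keys = fields[8].split(":")
    values = fields[sample_col].split(":")
    results = dict(zip(keys, values))
    results["quality_score"] = fields[5]
    return results


# Second example record, with "." as a placeholder INFO column.
line = "chr11\t123502514\t.\tG\tC\t1000\t.\t.\tGT:GK:VS:GF:PA:F3\t0/1:0:2:0:2:0"
print(parse_vcf_line(line, 9))
```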

Thank you.

Best regards,
Guillermo.