[ensembl-dev] Input information plugin for a variation on same chromosome and position

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Jun 6 10:50:09 BST 2013


Thanks to Duarte's code, I've fixed my plugin so that it parses the VCF input correctly.

It's a shame that none of this information is in a developer guide for 
VEP. At least some documentation of the available objects would be nice.

Regards,
Guillermo.


On 06/05/2013 06:20 PM, Guillermo Marco Puche wrote:
> Hello,
>
> I've noticed that the plugin I'm using to parse my VCF gives the wrong 
> result in one case:
>
> when there are two variants on the same chromosome at the same 
> position.
>
> Here's the example:
>
> Input VCF:
>
> #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    DATA
> chr11    123502514    .    G    A    1000    .    GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0
> chr11    123502514    .    G    C    1000    .    GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0
>
>
> The output is otherwise correct and reports both variants, but in the 
> VAR ALLELE column (which is filled in by my plugin) the "C" allele is 
> reported for both. This is wrong: "A" should be reported for the first 
> variant and "C" for the second.
>
> I suppose this happens because my plugin reads the extra information 
> from the VCF input with tabix, and tabix looks records up by chromosome, 
> start and end position. Since the chromosome and position are the same 
> for both changes, the parsed output is incorrect.
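A region query like this returns every record at that position, so one way to tell the two apart is to filter the returned lines by allele. The Perl sketch below is only an illustration and is not taken from the plugin: `pick_matching_line` is a hypothetical helper, the input lines are simplified versions of the two example records, and it assumes the REF and ALT of the variant being annotated are already known (for an SNV, for example from the two halves of an allele string such as `G/A`). Indels may be represented differently inside VEP than in the VCF (the shared leading base can be trimmed), so a literal comparison like this is only safe for SNVs.

```perl
use strict;
use warnings;

# Hypothetical helper: from the raw VCF lines returned by a positional
# (tabix-style) lookup, return the one whose CHROM, POS, REF and ALT match
# the variant being annotated. Assumes tab-separated VCF columns
# (CHROM POS ID REF ALT ...); a comma-separated multi-allelic ALT matches
# if any of its alleles equals $alt.
sub pick_matching_line {
    my ($lines, $chrom, $pos, $ref, $alt) = @_;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my @f = split /\t/, $copy;
        next unless @f >= 5;
        return $line
            if $f[0] eq $chrom
            && $f[1] == $pos
            && $f[3] eq $ref
            && grep { $_ eq $alt } split /,/, $f[4];
    }
    return;    # no line matched this allele
}

# Simplified copies of the two example records: same CHROM and POS,
# so a region query returns both of them.
my @hits = (
    join("\t", 'chr11', 123502514, '.', 'G', 'A', 1000, '.', '.', 'GT', '0/1'),
    join("\t", 'chr11', 123502514, '.', 'G', 'C', 1000, '.', '.', 'GT', '0/1'),
);

my $line_a = pick_matching_line(\@hits, 'chr11', 123502514, 'G', 'A');
my $line_c = pick_matching_line(\@hits, 'chr11', 123502514, 'G', 'C');
print((split /\t/, $line_a)[4], "\n");    # A
print((split /\t/, $line_c)[4], "\n");    # C
```

Filtering on REF/ALT keeps the cheap tabix lookup but removes the ambiguity, because two records at the same position must differ in at least one allele.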
>
> I had to take this approach because the VCF input used by my 
> workmates is a bit unusual.
>
> The code for my vcf_input.pm parser is located here: 
> https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm
>
> The lines relevant to this problem are 102 to 126 (the tabix-related code).
>
> Since tabix can't distinguish between two variants at the same 
> position, is there another way to access the VCF line for the variant 
> whose consequence is currently being calculated, without having to 
> parse the whole input file?
>
> I know Duarte Molha had a script that retrieves the VCF input line for 
> the variant whose consequence is being calculated.
>
> This seems to be the key line for accessing the input VCF line:
> my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>
> This is part of Duarte's code:
>
> sub run {
>      my ($self, $tva, $line_hash) = @_;
>      my $config   = $self->{config};
>      my $ind_cols = $config->{ind_cols};
>      # run() receives a TranscriptVariationAllele; walk up to the
>      # VariationFeature, which keeps the raw input line in its _line field
>      my $vf = $tva->{base_variation_feature_overlap}->{base_variation_feature};
>      my $line       = $vf->{_line};
>      my $individual = $vf->{individual};
>      # Split the line into columns (\s+ already matches tabs): QUAL is
>      # column 5 and FORMAT column 8
>      my @split_line = split /\s+/, $line;
>      my $qual_score = $split_line[5];
>      my @gt_format  = split /:/, $split_line[8];
>      my @gt_data    = split /:/, $split_line[$ind_cols->{$individual}];
>      # Pair each FORMAT key with this individual's value for it
>      my $results = {map { shift @gt_format => $_ } @gt_data};
>      $results->{"quality_score"} = $qual_score;
>
>      return $results;
> }
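The `map { shift @gt_format => $_ } @gt_data` line is what turns the FORMAT column and a sample column into a key/value hash. A standalone Perl sketch of the same pairing, using the FORMAT and sample strings from the example VCF, might look like the following. `format_to_hash` is a hypothetical name used only for illustration, and unlike the one-liner it also copes with a sample column that has dropped trailing fields (which the VCF specification allows).

```perl
use strict;
use warnings;

# Hypothetical helper: pair FORMAT keys with one sample's values.
sub format_to_hash {
    my ($format, $sample) = @_;
    my @keys   = split /:/, $format;
    my @values = split /:/, $sample;
    # Trailing FORMAT fields may be dropped from a sample column, so pair
    # up only as many entries as both lists have.
    my $last = @keys < @values ? $#keys : $#values;
    my %h;
    @h{ @keys[0 .. $last] } = @values[0 .. $last];
    return \%h;
}

my $gt = format_to_hash('GT:GK:VS:GF:PA:F3', '0/1:0:2:0:2:0');
print join(',', map { $_ . '=' . $gt->{$_} } sort keys %$gt), "\n";
# F3=0,GF=0,GK=0,GT=0/1,PA=2,VS=2
```

Using a hash slice instead of repeated `shift` leaves the key array intact, which makes the helper safe to call once per sample column.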
> Thank you.
>
> Best regards,
> Guillermo.
>
>
