[ensembl-dev] Input information plugin for a variation on same chromosome and position

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Jun 6 11:26:08 BST 2013


Hello Duarte,

I know it isn't standard VCF. You're totally right, but my workmates are
using their own modified VCF, so I have to deal with it.

Anyway, thanks to your code, Duarte, I can access the VCF line directly
from the object data, so I don't need tabix.

Best regards,
Guillermo.

On 06/06/2013 12:19 PM, Duarte Molha wrote:
>
> Dear Guillermo
>
> How did you create the VCF file that produced the 2 lines that gave 
> you the problem?
>
> Correct me if I am wrong, devs, but I believe the file you indicated does
> not follow the VCF specification. Were these 2 individuals you were
> trying to merge? If so, the 2 variants should be merged into 1 line
> like so:
>
> #CHROM  POS        ID  REF  ALT  QUAL  FILTER  INFO  FORMAT             Sample1        Sample2
> chr11   123502514  .   G    A,C  1000  .             GT:GK:VS:GF:PA:F3  0/1:0:2:0:2:0  0/2:0:2:0:2:0
>
> Use something like vcf-merge (from VCFtools) to correctly merge VCF files.
>
> If this was 1 individual, then it would have become:
>
> #CHROM  POS        ID  REF  ALT  QUAL  FILTER  INFO  FORMAT             Sample1
> chr11   123502514  .   G    A,C  1000  .             GT:GK:VS:GF:PA:F3  1/2:0:2:0:2:0
>
> Just concatenating and sorting 2 VCF files will not give a correct result.
>
> To make sure your file is valid, you could use vcf-validator (also in
> VCFtools).
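The two-sample merge described above can be sketched in Perl. This is a minimal illustration under stated assumptions, not how vcf-merge works internally: it assumes two tab-separated records that share CHROM, POS and REF, each with a single ALT allele, identical FORMAT columns and GT as the first FORMAT field. The helper name merge_same_site is hypothetical, and the INFO column, which the archived records do not show, is filled in as "." for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: merge two single-ALT VCF records at the same site
# into one multi-allelic record, one sample column per input record.
# Assumes identical FORMAT columns with GT first.
sub merge_same_site {
    my ($line1, $line2) = @_;
    my @rec1 = split /\t/, $line1;
    my @rec2 = split /\t/, $line2;

    # Columns 0..3 are CHROM, POS, ID, REF; the site must match.
    die "records are not at the same site\n"
        unless $rec1[0] eq $rec2[0] && $rec1[1] == $rec2[1] && $rec1[3] eq $rec2[3];

    # Keep CHROM..FORMAT from the first record, but list both ALT alleles.
    my @merged = @rec1[0 .. 8];
    $merged[4] = "$rec1[4],$rec2[4]";

    # The second record's ALT is now allele 2, so renumber "1" in its GT.
    my @fields2 = split /:/, $rec2[9];
    $fields2[0] =~ s/(?<![0-9])1(?![0-9])/2/g;

    return join "\t", @merged, $rec1[9], join(':', @fields2);
}

# Usage with the two records from this thread (INFO filled in as ".").
my $rec_a = join "\t", qw(chr11 123502514 . G A 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0);
my $rec_b = join "\t", qw(chr11 123502514 . G C 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0);
print merge_same_site($rec_a, $rec_b), "\n";
```

For the single-individual case the two calls would instead be combined into one GT (1/2 in Duarte's example), which this sketch does not attempt.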
>
> Best regards
>
>
> Duarte
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
> Behalf Of Guillermo Marco Puche
> Sent: 06 June 2013 10:50
> To: dev at ensembl.org
> Subject: Re: [ensembl-dev] Input information plugin for a variation
> on same chromosome and position
>
> Thanks to Duarte's code, I've fixed my plugin to parse VCF input.
>
> It's a shame all this information isn't in a developer's guide for VEP.
> At least some documentation of all the available objects would be nice.
>
> Regards,
> Guillermo.
>
>
> On 06/05/2013 06:20 PM, Guillermo Marco Puche wrote:
>
>     Hello,
>
>     I've noticed that the plugin I'm using to parse my VCF gives wrong
>     results in one case: when there are 2 variants on the same
>     chromosome at the same position.
>
>     Here's the example:
>
>     Input VCF:
>
>     #CHROM  POS        ID  REF  ALT  QUAL  FILTER  INFO  FORMAT             DATA
>     chr11   123502514  .   G    A    1000  .             GT:GK:VS:GF:PA:F3  0/1:0:2:0:2:0
>     chr11   123502514  .   G    C    1000  .             GT:GK:VS:GF:PA:F3  0/1:0:2:0:2:0
>
>
>     The output is partly correct: it reports all variants, but in the
>     VAR ALLELE column (which is parsed by my plugin) the "C" allele is
>     reported in both cases. This is wrong; "A" should be reported for
>     the first variant and "C" for the second.
>
>     I suppose this is because my plugin uses tabix to read the VCF
>     input, and tabix looks up records by chromosome, start and end
>     position. Since the chromosome and position are the same for both
>     variants, the parsed output is incorrect.
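The collision can be reproduced without tabix. In the sketch below, a hypothetical list of hits stands in for what a region query on chr11:123502514 would return; keeping only the last hit is one way to get exactly the symptom described ("C" for both variants), whereas also matching REF and ALT against the variant being annotated selects the right record. The subs pick_by_position and pick_by_allele are illustrative and are not taken from vcf_input.pm; exact ALT matching also assumes one ALT per record and alleles that VEP has not trimmed or normalised.

```perl
use strict;
use warnings;

# Hypothetical hits for a region query at chr11:123502514
# (INFO filled in as "." for this sketch).
my @hits = (
    join("\t", qw(chr11 123502514 . G A 1000 . . GT 0/1)),
    join("\t", qw(chr11 123502514 . G C 1000 . . GT 0/1)),
);

# Position-only lookup (hypothetical): loop over the hits and keep the
# last one, so every variant at this position gets the same record.
sub pick_by_position {
    my ($records) = @_;
    my $picked;
    $picked = $_ for @$records;
    return $picked;
}

# Allele-aware lookup (hypothetical): also require REF and ALT to match.
sub pick_by_allele {
    my ($records, $ref, $alt) = @_;
    for my $rec (@$records) {
        my @f = split /\t/, $rec;
        return $rec if $f[3] eq $ref && $f[4] eq $alt;
    }
    return;    # no matching record
}

for my $alt (qw(A C)) {
    my $naive = (split /\t/, pick_by_position(\@hits))[4];
    my $exact = (split /\t/, pick_by_allele(\@hits, 'G', $alt))[4];
    print "variant G>$alt: position-only=$naive allele-matched=$exact\n";
}
```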
>
>     I had to do it this way because the VCF input used by my workmates
>     is a bit unusual.
>
>     The code for my vcf_input.pm parser is located here:
>     https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm
>
>     The lines relevant to this problem are 102 to 126 (the
>     tabix-related code).
>
>     If it's impossible to distinguish between two variants at the same
>     position when accessing the VCF file with tabix, is there any other
>     way I can access the VCF line for the variant whose consequence is
>     currently being calculated, without having to parse the whole
>     input file?
>
>     I know Duarte Molha had a script to get the VCF input information
>     for the input line whose consequence is being calculated.
>
>     This must be the key line for accessing the input VCF line from the
>     object:
>
>     my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>
>
>     This is part of Duarte's code:
>
>     sub run {
>         my ($self, $vf, $line_hash) = @_;
>
>         # Map of individual name => column index in the input VCF
>         my $ind_cols = $self->{config}->{ind_cols};
>
>         # The variation feature behind this allele keeps the raw input
>         # line and the individual it was read for
>         my $base_vf    = $vf->{base_variation_feature_overlap}->{base_variation_feature};
>         my $line       = $base_vf->{_line};
>         my $individual = $base_vf->{individual};
>
>         # Split on runs of whitespace (tabs in standard VCF)
>         my @split_line = split /\s+/, $line;
>
>         # FORMAT keys (column 9) and this individual's values
>         my @gt_format = split /:/, $split_line[8];
>         my @gt_data   = split /:/, $split_line[ $ind_cols->{$individual} ];
>
>         # Pair each FORMAT key with its value, then add QUAL (column 6)
>         my %results;
>         @results{@gt_format} = @gt_data;
>         $results{quality_score} = $split_line[5];
>
>         return \%results;
>     }
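Because the parsing inside run() depends only on the raw line, it can be checked outside VEP. The sketch below moves that logic into a hypothetical parse_sample_columns sub and passes the sample column index directly instead of looking it up through $config->{ind_cols}. It splits on tabs, as the VCF specification requires, and the example line fills in "." for the INFO column, which the archived message does not show.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the parsing done in run():
# map FORMAT keys to one sample's values and add the QUAL score.
sub parse_sample_columns {
    my ($line, $sample_col) = @_;
    my @cols   = split /\t/, $line;
    my @keys   = split /:/, $cols[8];             # FORMAT column
    my @values = split /:/, $cols[$sample_col];   # one sample column
    my %results;
    @results{@keys} = @values;                    # hash slice: GT => '0/1', ...
    $results{quality_score} = $cols[5];           # QUAL column
    return \%results;
}

my $line = join "\t", qw(chr11 123502514 . G C 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0);
my $res  = parse_sample_columns($line, 9);
print join(', ', map { $_ . '=' . $res->{$_} } sort keys %$res), "\n";
```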
>
>     Thank you.
>
>     Best regards,
>     Guillermo.
>
>
>
>     _______________________________________________
>     Dev mailing list    Dev at ensembl.org
>     Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>     Ensembl Blog: http://www.ensembl.info/
>
>

