[ensembl-dev] Input information plugin for a variation on same chromosome and position
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Thu Jun 6 11:26:08 BST 2013
Hello Duarte,
I know it's not standard VCF. You're totally right, but my workmates are
using their own modified VCF, so I have to deal with it.
Anyway, thanks to your code, Duarte, I can read the VCF line directly from
the object data, so I don't need tabix.
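A minimal sketch of what reading the per-sample values straight from a raw VCF line can look like, without going through tabix. Note that `parse_vcf_line` is a hypothetical helper, not part of VEP, and the two example records are the ones from this thread with a placeholder `.` INFO column added so that FORMAT sits at index 8; that column layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VEP): turn one raw VCF data line into a
# hash of per-sample FORMAT values, plus the ALT allele and QUAL score.
# Assumes the standard column order: FORMAT at index 8, first sample at 9.
sub parse_vcf_line {
    my ($line, $sample_col) = @_;
    $sample_col = 9 unless defined $sample_col;

    my @cols = split /\s+/, $line;
    my @keys = split /:/, $cols[8];
    my @vals = split /:/, $cols[$sample_col];

    my %result;
    @result{@keys} = @vals;            # hash slice pairs FORMAT keys with values
    $result{alt}           = $cols[4]; # ALT column
    $result{quality_score} = $cols[5]; # QUAL column
    return \%result;
}

# Illustrative input: the two same-position records from this thread, with a
# placeholder "." INFO column added (an assumption, see above).
my @lines = (
    "chr11\t123502514\t.\tG\tA\t1000\t.\t.\tGT:GK:VS:GF:PA:F3\t0/1:0:2:0:2:0",
    "chr11\t123502514\t.\tG\tC\t1000\t.\t.\tGT:GK:VS:GF:PA:F3\t0/1:0:2:0:2:0",
);

my @parsed = map { parse_vcf_line($_) } @lines;
printf "%s GT=%s QUAL=%s\n", @{$_}{qw(alt GT quality_score)} for @parsed;
```

Because each line is parsed on its own, the two records at the same position keep their own ALT allele, which a lookup by chromosome and position alone cannot guarantee.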
Best regards,
Guillermo.
On 06/06/2013 12:19 PM, Duarte Molha wrote:
>
> Dear Guillermo
>
> How did you create the VCF file that produced the 2 lines that gave
> you the problem?
>
> Correct me if I am wrong, devs, but I believe the file you indicated
> does not follow the VCF specification. Were these 2 individuals you
> were trying to merge? If so, the 2 variations should be merged into 1
> line, like so:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
> chr11 123502514 . G A,C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0 0/2:0:2:0:2:0
>
> Use something like vcf-merge (from VCFtools) to correctly merge VCF files.
>
> If this was 1 individual, then it would become:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
> chr11 123502514 . G A,C 1000 . GT:GK:VS:GF:PA:F3 1/2:0:2:0:2:0
>
> Just concatenating and sorting 2 VCF files will not be correct.
>
> To make sure your file is valid, you could use vcf-validator (also
> in VCFtools).
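Before merging, it can help to see which sites are affected. The following is a rough sketch, not a replacement for vcf-merge: it groups data lines by CHROM, POS and REF and reports the sites that occur on more than one line, together with the combined ALT list a merged record would carry. The helper name `find_same_position_records` and the third input record are made up for illustration; genotype merging is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: report sites (CHROM:POS:REF) that appear on more than
# one VCF data line, with their ALT alleles joined in input order.
sub find_same_position_records {
    my @lines = @_;
    my (%alts, @order);
    for my $line (@lines) {
        next if $line =~ /^#/;    # skip header lines
        my ($chrom, $pos, undef, $ref, $alt) = split /\s+/, $line;
        my $key = "$chrom:$pos:$ref";
        push @order, $key unless exists $alts{$key};   # remember first-seen order
        push @{ $alts{$key} }, $alt;
    }

    my @dups;
    for my $key (@order) {
        my @alt = @{ $alts{$key} };
        push @dups, { site => $key, alt => join(',', @alt) } if @alt > 1;
    }
    return @dups;
}

my @lines = (
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT DATA",
    "chr11 123502514 . G A 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",
    "chr11 123502514 . G C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",
    "chr11 123502600 . T G 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",  # made-up record
);

my @dups = find_same_position_records(@lines);
print "$_->{site} ALT=$_->{alt}\n" for @dups;
```

Only the fixed columns up to ALT are read, so the sketch does not depend on how the remaining columns are laid out.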
>
> Best regards
>
>
> Duarte
>
> *From:* dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
> *On Behalf Of* Guillermo Marco Puche
> *Sent:* 06 June 2013 10:50
> *To:* dev at ensembl.org
> *Subject:* Re: [ensembl-dev] Input information plugin for a variation
> on same chromosome and position
>
> Thanks to Duarte's code I've fixed my plugin to parse VCF input.
>
> It's a shame all this information isn't in a developer guide for VEP.
> At least some information about all the available objects would be nice.
>
> Regards,
> Guillermo.
>
>
> On 06/05/2013 06:20 PM, Guillermo Marco Puche wrote:
>
> Hello,
>
> I've noticed that the plugin I'm using to parse my VCF gives a
> wrong result in one case: when there are 2 variants on the same
> chromosome and at the same position.
>
> Here's the example:
>
> Input VCF:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT DATA
> chr11 123502514 . G A 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0
> chr11 123502514 . G C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0
>
>
> The output is correct in that it reports all variants, but the column
> for VAR ALLELE (which is filled in by my plugin) reports the "C"
> allele in both cases. This is wrong: the "A" and "C" alleles should
> each be reported.
>
> I suppose this is because my plugin uses tabix to read the VCF input,
> and tabix queries the file by chromosome, start and end position.
> Since the chromosome and position are the same for both changes, the
> parsed output is incorrect.
>
> I had to do it this way because the VCF input used by my workmates
> is a bit weird.
>
> The code for my vcf_input.pm parser is located here:
> https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm
>
> Important lines for this problem are from line 102 to 126 (tabix
> related).
>
> If it's impossible to distinguish between two variations at the same
> position when accessing the VCF file with tabix, is there any other
> way I can access the VCF line for the current variation consequence
> without having to parse the whole input file?
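A region query returns every record overlapping the requested coordinates, so both lines come back; they can still be told apart afterwards by comparing REF and ALT with the allele being annotated. A minimal sketch of that filtering step follows. The helper `pick_records` is hypothetical, the region result is hard-coded as a list of lines rather than fetched with tabix, and it assumes the allele currently being processed is known to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: from the lines returned by a region query, keep those
# whose POS and REF match and whose ALT list contains the wanted allele.
sub pick_records {
    my ($lines, $pos, $ref, $alt) = @_;
    my @hits;
    for my $line (@$lines) {
        my @c = split /\s+/, $line;
        next unless $c[1] == $pos && $c[3] eq $ref;
        # ALT may be a comma-separated list in multi-allelic records
        push @hits, $line if grep { $_ eq $alt } split /,/, $c[4];
    }
    return @hits;
}

# Stand-in for what a query on chr11:123502514-123502514 would return.
my @region = (
    "chr11 123502514 . G A 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",
    "chr11 123502514 . G C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",
);

my ($hit) = pick_records(\@region, 123502514, 'G', 'C');
print "$hit\n";
```

Matching on the allele as well as the position is what separates the two records; the position alone cannot.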
>
> I know Duarte Molha had a script to get the VCF input information for
> the input line whose consequence is being calculated.
>
> This must be the important line to access the input VCF line:
>
> my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>
> This is part of Duarte's code:
>
> sub run {
>     my $self      = shift;
>     my $vf        = shift;
>     my $line_hash = shift;
>
>     my $config   = $self->{config};
>     my $ind_cols = $config->{ind_cols};
>
>     # Raw VCF input line and individual for the variation being annotated
>     my $bvf        = $vf->{base_variation_feature_overlap}->{base_variation_feature};
>     my $line       = $bvf->{_line};
>     my $individual = $bvf->{individual};
>
>     my @split_line = split /\s+/, $line;
>     my $qual_score = $split_line[5];    # QUAL column
>
>     # Pair each FORMAT key (index 8) with this individual's value
>     my @gt_format = split /:/, $split_line[8];
>     my @gt_data   = split /:/, $split_line[$ind_cols->{$individual}];
>     my $results   = {map { shift @gt_format => $_ } @gt_data};
>
>     $results->{"quality_score"} = $qual_score;
>
>     return $results;
> }
>
> Thank you.
>
> Best regards,
> Guillermo.
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/