[ensembl-dev] Input information plugin for a variation on same chromosome and position
Duarte Molha
duartemolha at gmail.com
Thu Jun 6 11:47:13 BST 2013
But if they are for different individuals, then you could just change the
header information for each file and use vcf-merge to deal with these
issues.
IMHO, your approach might result in unforeseen errors.
Best regards
Duarte
=========================
Duarte Miguel Paulo Molha
http://about.me/duarte
=========================
On Thu, Jun 6, 2013 at 11:26 AM, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:
> Hello Duarte,
>
> I know it's standard VCF. You're totally right, but my workmates are using
> their own modified VCF, so I have to deal with it.
>
> Anyway, thanks to your code, Duarte, I can access the VCF line directly
> from the object data, so I don't need tabix.
>
> Best regards,
> Guillermo.
>
>
> On 06/06/2013 12:19 PM, Duarte Molha wrote:
>
> Dear Guillermo,
>
> How did you create the VCF file that produced the 2 lines that gave you
> the problem?
>
> Correct me if I am wrong, devs, but I believe the file you indicated does
> not follow the VCF specification. Were these 2 individuals you were trying
> to merge? If so, the 2 variations should be merged into 1 line, like so:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
> chr11 123502514 . G A,C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0 0/2:0:2:0:2:0
>
> Use something like vcf-merge (from VCFtools) to correctly merge VCF files.
>
> If this was 1 individual, then it would have become:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
> chr11 123502514 . G A,C 1000 . GT:GK:VS:GF:PA:F3 1/2:0:2:0:2:0
>
> Just concatenating and sorting 2 VCF files will not be correct.
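
One quick way to spot a file that was built by concatenating and sorting is to look for sites that occur on more than one data line. Below is a minimal Perl sketch, assuming whitespace-separated columns with CHROM and POS first. `find_duplicate_sites` is a hypothetical helper name (not part of VCFtools or VEP), and the sample lines are shortened, made-up illustrations rather than real data:

```perl
use strict;
use warnings;

# Return "chrom:pos" keys that appear on more than one data line.
# Header lines (starting with '#') and blank lines are skipped.
sub find_duplicate_sites {
    my @lines = @_;
    my (%count, @order);
    for my $line (@lines) {
        next if $line =~ /^#/ || $line !~ /\S/;
        my ($chrom, $pos) = split /\s+/, $line;
        my $key = "$chrom:$pos";
        # Remember first-seen order so the report is stable.
        push @order, $key unless $count{$key}++;
    }
    return grep { $count{$_} > 1 } @order;
}

# Illustrative input only: FORMAT is reduced to GT for brevity.
my @vcf = (
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT DATA",
    "chr11 123502514 . G A 1000 . . GT 0/1",
    "chr11 123502514 . G C 1000 . . GT 0/1",
    "chr11 123502600 . T G 1000 . . GT 0/1",
);
print "$_\n" for find_duplicate_sites(@vcf);
```

Any site this reports is a candidate for proper merging with vcf-merge rather than evidence of an error on its own.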
>
> To make sure your file is valid, you could use vcf-validator (also in
> VCFtools).
>
> Best regards,
>
> Duarte
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
> On Behalf Of Guillermo Marco Puche
> Sent: 06 June 2013 10:50
> To: dev at ensembl.org
> Subject: Re: [ensembl-dev] Input information plugin for a variation on
> same chromosome and position
>
> Thanks to Duarte's code I've fixed my plugin to parse VCF input.
>
> It's a shame all this information isn't in a developer guide for VEP. At
> least some information about the available objects would be nice.
>
> Regards,
> Guillermo.
>
>
> On 06/05/2013 06:20 PM, Guillermo Marco Puche wrote:
>
> Hello,
>
> I've noticed that the plugin I'm using to parse my VCF gives a wrong
> result in one case: when there are 2 variants on the same chromosome at
> the same position.
>
> Here's the example:
>
> Input VCF:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT DATA
> chr11 123502514 . G A 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0
> chr11 123502514 . G C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0
>
>
> The output is otherwise correct and reports all variants, but in the
> VAR ALLELE column (which is filled in by my plugin) the "C" allele is
> reported in both cases. This is wrong: "A" and "C" should each be reported.
>
> I suppose this is because my plugin reads the VCF input with tabix, and
> tabix queries the input file by chromosome, start and end position. Since
> the chromosome and position are the same for both changes, the parsed
> output is incorrect.
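
To illustrate why a lookup by region alone cannot tell the two records apart: any query for chr11:123502514 returns both lines, so code that keeps only one hit reports the same allele twice. Below is a minimal sketch in plain Perl, with no tabix involved. The array stands in for the hits a region query would return, and `pick_by_alt` is a hypothetical helper that selects the hit whose ALT column matches the allele being annotated:

```perl
use strict;
use warnings;

# Stand-in for the lines a region query on chr11:123502514 would return.
my @hits = (
    "chr11 123502514 . G A 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",
    "chr11 123502514 . G C 1000 . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0",
);

# Return the first hit whose ALT field (5th column) lists the wanted
# allele, or nothing if no hit does. ALT may be comma-separated.
sub pick_by_alt {
    my ($wanted, @lines) = @_;
    for my $line (@lines) {
        my @cols = split /\s+/, $line;
        return $line if grep { $_ eq $wanted } split /,/, $cols[4];
    }
    return;
}

for my $allele (qw(A C)) {
    my $line = pick_by_alt($allele, @hits);
    my $alt  = (split /\s+/, $line)[4];
    print "allele $allele -> ALT $alt\n";
}
```

Matching on the allele as well as the position is only one possible design. Reading the input line straight from the variation feature object avoids the second lookup altogether.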
>
> I had to do it this way because the VCF input used by my workmates is
> a bit weird.
>
> The code for my vcf_input.pm parser is located here:
> https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm
>
> Important lines for this problem are from line 102 to 126 (tabix related).
>
> If it's impossible to distinguish between two variations at the same
> position when accessing the VCF file with tabix, is there any other way I
> can access the VCF line for the current variation consequence without
> having to parse the whole input file?
>
> I know Duarte Molha had a script to get VCF input information for the
> input line of consequence being calculated.
>
> This must be the important line to access the input VCF line object:
>
> my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>
>
> This is part of Duarte's code:
>
> sub run {
>     my $self      = shift;
>     my $vf        = shift;
>     my $line_hash = shift;
>
>     my $config   = $self->{config};
>     my $ind_cols = $config->{ind_cols};
>
>     # Raw input VCF line and individual stored on the variation feature
>     my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>     my $individual = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{individual};
>
>     my @split_line = split /\s+/, $line;
>     my $qual_score = $split_line[5];    # QUAL column
>
>     # Pair each FORMAT key with this individual's value
>     my @gt_format = split /:/, $split_line[8];
>     my @gt_data   = split /:/, $split_line[$ind_cols->{$individual}];
>     my $results   = {map { shift @gt_format => $_ } @gt_data};
>     $results->{"quality_score"} = $qual_score;
>
>     return $results;
> }
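
The same parsing can be tried outside VEP. Below is a minimal standalone sketch, assuming a spec-style line with ten whitespace-separated columns. The example lines in this thread show only one '.' between QUAL and FORMAT, so a second '.' is added here as an assumption to put FORMAT at index 8. `parse_sample_fields` is a hypothetical name, and the sample column index is passed in directly instead of being looked up in `ind_cols`:

```perl
use strict;
use warnings;

# Map each FORMAT key to the matching value in one sample column,
# and add the QUAL column under "quality_score".
sub parse_sample_fields {
    my ($line, $sample_col) = @_;
    my @cols = split /\s+/, $line;
    my @keys = split /:/, $cols[8];              # FORMAT column
    my @vals = split /:/, $cols[$sample_col];    # chosen sample column
    my %results;
    @results{@keys} = @vals;                     # hash slice pairs them up
    $results{quality_score} = $cols[5];          # QUAL column
    return \%results;
}

# Assumed 10-column layout: FILTER and INFO are both '.'.
my $line = "chr11 123502514 . G A 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0";
my $res  = parse_sample_fields($line, 9);
print "$_ = $res->{$_}\n" for sort keys %$res;
```

The hash slice does the same pairing as the `map`/`shift` idiom in the plugin code, but leaves the FORMAT key list intact.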
>
> Thank you.
>
> Best regards,
> Guillermo.
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>