[ensembl-dev] Input information plugin for a variation on same chromosome and position

Duarte Molha duartemolha at gmail.com
Thu Jun 6 11:47:13 BST 2013


But if they are for different individuals, then you could just change the
header information for each file and use vcf-merge to deal with these
issues.

IMHO, I think that your approach might result in unforeseen errors.

Best regards

Duarte

=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================


On Thu, Jun 6, 2013 at 11:26 AM, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

> Hello Duarte,
>
> I know it's standard VCF. You're totally right, but my workmates are using
> their own modified VCF, so I have to deal with it.
>
> Anyway, thanks to your code, Duarte, I can access the VCF line straight
> from the object data, so I don't need tabix.
>
> Best regards,
> Guillermo.
>
>
> On 06/06/2013 12:19 PM, Duarte Molha wrote:
>
> Dear Guillermo,
>
> How did you create the VCF file that produced the 2 lines that gave you
> the problem?
>
> Correct me if I am wrong, devs, but I believe the file you indicated does
> not follow the VCF specification. Were these 2 individuals you were trying
> to merge? If so, the 2 variations should be merged into 1 line like so:
>
> #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1    Sample2
> chr11    123502514    .    G    A,C    1000    .    GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0    0/2:0:2:0:2:0
>
> Use something like vcf-merge (from VCFtools) to correctly merge VCF files.
>
> If this was 1 individual, then it would have become:
>
> #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1
> chr11    123502514    .    G    A,C    1000    .    GT:GK:VS:GF:PA:F3    1/2:0:2:0:2:0
>
> Just concatenating and sorting 2 VCF files will not be correct.
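
As an illustration of the two-individual case described above, here is a minimal Perl sketch of how two single-sample records at the same site collapse into one multi-allelic line. It is not a replacement for vcf-merge. `merge_site` is a hypothetical helper, and it assumes tab-delimited records with all nine fixed columns present (INFO set to "."), identical CHROM, POS, REF and FORMAT across the inputs, and GT as the first FORMAT key.

```perl
use strict;
use warnings;

# Hypothetical helper: merge single-sample records that share CHROM, POS
# and REF into one multi-allelic record with one genotype column per input.
sub merge_site {
    my @records = map { [ split /\t/, $_, -1 ] } @_;
    my (@alts, %alt_index, @samples);
    for my $r (@records) {
        # Give each distinct ALT allele a 1-based index in the merged record.
        my @local = split /,/, $r->[4];
        for my $alt (@local) {
            next if exists $alt_index{$alt};
            push @alts, $alt;
            $alt_index{$alt} = scalar @alts;
        }
        # Remap this sample's GT from its local allele indices to the merged ones.
        my ($gt, @rest) = split /:/, $r->[9], -1;
        $gt =~ s{(\d+)}{ $1 == 0 ? 0 : $alt_index{ $local[$1 - 1] } }ge;
        push @samples, join ':', $gt, @rest;
    }
    # Fixed columns come from the first record; only ALT is rewritten.
    my @out = @{ $records[0] }[0 .. 8];
    $out[4] = join ',', @alts;
    return join "\t", @out, @samples;
}

# The two records from the thread, with "." assumed for INFO.
my $merged = merge_site(
    join("\t", qw(chr11 123502514 . G A 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0)),
    join("\t", qw(chr11 123502514 . G C 1000 . . GT:GK:VS:GF:PA:F3 0/1:0:2:0:2:0)),
);
print "$merged\n";
```

The second sample's genotype is rewritten from 0/1 to 0/2 because "C" becomes the second ALT allele in the merged record. A real merge also has to reconcile QUAL, FILTER, INFO and differing FORMAT keys, which this sketch ignores.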
>
> Just to make sure your file is valid, you could use vcf-validator (also
> in VCFtools).
>
> Best regards,
>
> Duarte
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
> On Behalf Of Guillermo Marco Puche
> Sent: 06 June 2013 10:50
> To: dev at ensembl.org
> Subject: Re: [ensembl-dev] Input information plugin for a variation on
> same chromosome and position
>
> Thanks to Duarte's code I've fixed my plugin to parse VCF input.
>
> It's a shame all this information isn't in a developer guide for VEP. At
> least some information about all the available objects would be nice.
>
> Regards,
> Guillermo.
>
>
> On 06/05/2013 06:20 PM, Guillermo Marco Puche wrote:
>
> Hello,
>
> I've noticed that the plugin I'm using to parse my VCF gives the wrong
> result in one case: when there are 2 variants on the same chromosome at
> the same position.
>
> Here's the example:
>
> Input VCF:
>
> #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    DATA
> chr11    123502514    .    G    A    1000    .        GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0
> chr11    123502514    .    G    C    1000    .        GT:GK:VS:GF:PA:F3    0/1:0:2:0:2:0
>
>
> The output is correct in that it reports all variants, but in the column
> for VAR ALLELE (which is parsed by my plugin) the "C" allele is reported
> in both cases. This is wrong: the "A" and "C" alleles should both be
> reported.
>
> I suppose this is because my plugin uses tabix to access the VCF input
> and parse the information, and tabix accesses the VCF input file by
> chromosome, start and end position. Since the chromosome and position are
> the same for both changes, the parsed output is incorrect.
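
The ambiguity described above can be sketched in plain Perl, without calling tabix itself. A lookup by chromosome and position returns every record at that site, so the caller still has to choose the record whose REF and ALT match the allele being annotated. `pick_record` is a hypothetical helper, the `@hits` list stands in for whatever the region query returned, and the lines are assumed to be tab-delimited.

```perl
use strict;
use warnings;

# Hypothetical helper: from all records returned for one position, pick
# the one whose REF matches and whose ALT list contains the given allele.
sub pick_record {
    my ($ref, $alt, @records) = @_;
    for my $record (@records) {
        my @f = split /\t/, $record, -1;
        my %alts = map { $_ => 1 } split /,/, $f[4];
        return $record if $f[3] eq $ref && $alts{$alt};
    }
    return;    # no record carries this allele
}

# The two records from the example; a query on chr11:123502514 would
# return both of them. The empty string is the empty INFO column.
my @hits = (
    join("\t", qw(chr11 123502514 . G A 1000 .), '', 'GT:GK:VS:GF:PA:F3', '0/1:0:2:0:2:0'),
    join("\t", qw(chr11 123502514 . G C 1000 .), '', 'GT:GK:VS:GF:PA:F3', '0/1:0:2:0:2:0'),
);

my $rec_a = pick_record('G', 'A', @hits);
my $rec_c = pick_record('G', 'C', @hits);
print "$rec_a\n$rec_c\n";
```

Taking the first or last hit for a position, with no allele check, would report the same record for both variants, which is consistent with the symptom described.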
>
> I had to do this because the VCF input used by my workmates is a bit
> weird.
>
> The code for my vcf_input.pm parser is located here:
> https://github.com/guillermomarco/vep_plugins_71/blob/master/vcf_input.pm
>
> Important lines for this problem are from line 102 to 126 (tabix related).
>
> If it's impossible to distinguish between two variations at the same
> position when accessing the VCF file with tabix, is there any other way I
> can access the VCF line for the current variation consequence without
> having to parse the whole input file?
>
> I know Duarte Molha had a script to get VCF input information for the
> input line of consequence being calculated.
>
> This must be the important line to access the input VCF line object:
>
> my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>
> This is part of Duarte's code:
>
> sub run {
>     my $self = shift;
>     my $vf = shift;
>     my $line_hash = shift;
>
>     my $config = $self->{config};
>     my $ind_cols = $config->{ind_cols};
>
>     # Raw input VCF line and individual for the variant being annotated
>     my $line = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{_line};
>     my $individual = $vf->{base_variation_feature_overlap}->{base_variation_feature}->{individual};
>
>     my @split_line = split /[\s\t]+/, $line;
>     my $qual_score = $split_line[5];             # QUAL column
>     my @gt_format  = split /:/, $split_line[8];  # FORMAT keys
>     my @gt_data    = split /:/, $split_line[$ind_cols->{$individual}];
>
>     # Pair each FORMAT key with the matching value from the sample column
>     my $results = {map { shift @gt_format => $_ } @gt_data};
>     $results->{"quality_score"} = $qual_score;
>
>     return $results;
> }
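
For reference, the key/value pairing in the code above can be tried outside VEP with a standalone sketch. `parse_sample` is a hypothetical helper, not part of VEP or of Duarte's plugin, and the sample column index is passed in directly instead of coming from `$config->{ind_cols}`. It assumes tab-delimited input and splits on single tabs with a limit of -1, because splitting on runs of whitespace collapses an empty column: if INFO is empty instead of ".", as the input example in this thread appears to show, FORMAT would move from index 8 to index 7.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each FORMAT key with the matching value from
# one sample column. $sample_col is the 0-based column index (9 = first sample).
sub parse_sample {
    my ($line, $sample_col) = @_;
    chomp $line;
    # Split on single tabs and keep empty fields, so an empty INFO column
    # does not shift FORMAT away from index 8.
    my @fields = split /\t/, $line, -1;
    my @keys   = split /:/, $fields[8];
    my @values = split /:/, $fields[$sample_col], -1;
    my %result;
    @result{@keys} = @values;
    $result{quality_score} = $fields[5];
    return \%result;
}

# First record from the example, with an empty INFO column.
my $line = join "\t", qw(chr11 123502514 . G A 1000 .), '',
    'GT:GK:VS:GF:PA:F3', '0/1:0:2:0:2:0';
my $data = parse_sample($line, 9);
print "$_=$data->{$_}\n" for sort keys %$data;
```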
>
> Thank you.
>
> Best regards,
> Guillermo.
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>