[ensembl-dev] Dev Digest, Vol 23, Issue 5

Will McLaren wm2 at ebi.ac.uk
Wed May 2 12:02:16 BST 2012


Hi Duarte,

You can access this (somewhat indirectly) again through the variation
feature object; the original input data is stored as:

$vf->{_line}

so you could parse the individual data from there. You would also need
the individual ID -> column mapping, which is stored in the config
hash. So, within a plugin, you could do something like the following:

my $config = $self->{config};
my %ind_cols = %{ $config->{ind_cols} };    # ind_cols is a hash ref, so dereference it

my @split_line = split /\s+/, $vf->{_line};

my $gt_data = $split_line[ $ind_cols{ $vf->{individual} } ];

This is untested code, but I think it should work!
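
To get from that raw column to the individual FORMAT values (GT, AD, DP and so on), something like the following might be a starting point. It is a minimal standalone sketch rather than VEP code: parse_sample_fields is a hypothetical helper name, and it assumes the line is tab-delimited with FORMAT in the ninth column (as in the VCF specification) and that the sample column index is zero-based. The INFO column in the example is shortened to AC=3 for readability.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the VEP): split one sample column of a
# VCF data line into a hash keyed by the FORMAT field names.
#   $line    - a full VCF data line, assumed to be tab-delimited
#   $col_idx - zero-based index of the sample column of interest
sub parse_sample_fields {
    my ($line, $col_idx) = @_;

    chomp $line;
    my @cols = split /\t/, $line;

    # FORMAT is the ninth column (index 8); sample columns follow it.
    return unless @cols > 8;
    return unless defined $col_idx && defined $cols[$col_idx];

    my @keys   = split /:/, $cols[8];
    my @values = split /:/, $cols[$col_idx];

    # Pair each FORMAT key with its value; keys without a value stay undef.
    my %fields;
    @fields{@keys} = @values;
    return \%fields;
}

# Example using the record quoted in this thread (INFO shortened).
my $line = join "\t",
    qw(1 50311454 . G A 5322.41 PASS AC=3 GT:AD:DP:GQ:PL),
    '1/1:0,40:40:99:1456,114,0',
    '0/0:37,0:37:99:0,102,1245',
    '0/1:23,26:49:99:839,0,617';

my $f = parse_sample_fields($line, 9);    # index 9 = first sample column
print "GT=$f->{GT} DP=$f->{DP}\n";        # prints "GT=1/1 DP=40"
```

Within a plugin, the two arguments would presumably be $vf->{_line} and $ind_cols{ $vf->{individual} } from the snippet above, provided the stored line and column indices match the assumptions stated here.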

Thanks

Will

On 2 May 2012 11:50, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:
> Thank you Will. That is exactly what I was hoping for!
>
> Thank you very much. I will eagerly wait for the new version release :)
> If it is not too much to ask, would it also be possible to make the complete genotype associated with the variation, along with the FORMAT field, available to plug-ins? For example:
>
> GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0
>
> This would also be very useful, since I analyse the variant calls using the quality metrics associated with them.
>
> Best regards
>
> Duarte
>
> -----Original Message-----
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of dev-request at ensembl.org
> Sent: 02 May 2012 11:41
> To: dev at ensembl.org
> Subject: Dev Digest, Vol 23, Issue 5
>
> Today's Topics:
>
>   1. Re: Question regarding the varian_effect_predictor VCF
>      support for multiple samples (Will McLaren)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 2 May 2012 11:40:43 +0100
> From: Will McLaren <wm2 at ebi.ac.uk>
> Subject: Re: [ensembl-dev] Question regarding the
>        varian_effect_predictor VCF support for multiple samples
> To: Ensembl developers list <dev at ensembl.org>
> Message-ID:
>        <CAMVEDX3YfPzEVLT5yOYt+jH41rX=zBABTJ8MWsQx-P2QkPUxzQ at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Duarte,
>
> If I'm understanding you correctly, I've implemented such a feature for v2.5 of the VEP, due for release next week.
>
> You will be able to supply an individual ID, more than one as a comma-separated list, or "all" to consider each individually e.g.:
>
> perl variant_effect_predictor.pl -individual MYSAMPLEID1,MYSAMPLEID2
>
> or
>
> perl variant_effect_predictor.pl -individual all
>
> and the consequences will be calculated only where the genotype for that individual is non-reference (either hetero- or homozygous). The individual ID will appear as a field in the "Extra" column. Note that unless you are using VCF output, a locus that is homozygous for the reference allele will not be considered as a variant and will not appear in the output (if you are using VCF output, it will appear but with no consequence data added).
>
> In terms of plugins, what is actually happening internally is that for each individual, a copy of the variation feature object is being created with an appropriate allele string. You can access this in a plugin as follows:
>
> sub run {
>    my ($self, $tva) = @_;
>
>    my $tv = $tva->transcript_variation;
>    my $vf = $tv->variation_feature;
>
>    my $individual_id = $vf->{individual};
>
> ....
>
> }
>
> Hope this helps!
>
> Cheers
>
> Will McLaren
> Ensembl Variation
>
> On 2 May 2012 11:18, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:
>
>> Dear Developers,
>>
>> I have been playing around with the latest version of the VEP and I
>> would like to congratulate you on the many nice features you have
>> been able to include.
>>
>> I particularly like the new plug-in support. This will allow
>> me to develop new features in my analysis pipeline without having to
>> hack your code too much :).
>>
>> There is, however, one very important feature I would love to see
>> included in the VEP: support for VCF files with multiple samples.
>>
>> I had to change a lot of your code in a previous version of the VEP in
>> order to get some sort of support for this, and it has become very
>> complicated to merge what I did with your earlier
>> version into the new versions because the code is evolving very
>> fast.
>>
>> I noticed that you say that you now support all fields in a VCF. Does
>> this mean that your script reads in the sample fields but
>> disregards them for the analysis?
>>
>> It would be great if the VEP could analyse each variant and,
>> for each allelic substitution, include the sample information
>> for which it is relevant.
>>
>> Here is an example of what your code outputs and what I think
>> would be very useful to have it do:
>>
>> Input VCF entry:
>>
>> #CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  sample_01  sample_02  sample_03
>> 1  50311454  .  G  A  5322.41  PASS  AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  0/0:37,0:37:99:0,102,1245  0/1:23,26:49:99:839,0,617
>>
>> Current OUTPUT:
>>
>> #Uploaded_variation  Location    Allele  Gene             Feature          Feature_type  Consequence                      cDNA_position  CDS_position  Protein_position  Amino_acids  Codons  Existing_variation  Extra
>> 1_50311454_G/A       1:50311454  A       ENSG00000186094  ENST00000371839  Transcript    INTRONIC                         -              -             -                 -            -       rs4926833           ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
>> 1_50311454_G/A       1:50311454  A       ENSG00000215887  ENST00000502859  Transcript    WITHIN_NON_CODING_GENE           1348           -             -                 -            -       rs4926833           HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
>> 1_50311454_G/A       1:50311454  A       ENSG00000186094  ENST00000411952  Transcript    INTRONIC                         -              -             -                 -            -       rs4926833           ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
>> 1_50311454_G/A       1:50311454  A       ENSG00000186094  ENST00000497451  Transcript    WITHIN_NON_CODING_GENE,INTRONIC  -              -             -                 -            -       rs4926833           HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
>> 1_50311454_G/A       1:50311454  A       ENSG00000186094  ENST00000371838  Transcript    INTRONIC                         -              -             -                 -            -       rs4926833           ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
>> 1_50311454_G/A       1:50311454  A       ENSG00000186094  ENST00000371836  Transcript    INTRONIC                         -              -             -                 -            -       rs4926833           ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>>
>> Same output but containing sample information for non-reference
>> samples:
>>
>> 1_50311454_G/A  1:50311454  A  Sample_01  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  ENSG00000186094  ENST00000371839  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
>> 1_50311454_G/A  1:50311454  A  Sample_01  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  ENSG00000215887  ENST00000502859  Transcript  WITHIN_NON_CODING_GENE           1348  -  -  -  -  rs4926833  HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
>> 1_50311454_G/A  1:50311454  A  Sample_01  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  ENSG00000186094  ENST00000411952  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
>> 1_50311454_G/A  1:50311454  A  Sample_01  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  ENSG00000186094  ENST00000497451  Transcript  WITHIN_NON_CODING_GENE,INTRONIC  -     -  -  -  -  rs4926833  HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
>> 1_50311454_G/A  1:50311454  A  Sample_01  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  ENSG00000186094  ENST00000371838  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
>> 1_50311454_G/A  1:50311454  A  Sample_01  GT:AD:DP:GQ:PL  1/1:0,40:40:99:1456,114,0  ENSG00000186094  ENST00000371836  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>> 1_50311454_G/A  1:50311454  A  Sample_03  GT:AD:DP:GQ:PL  0/1:23,26:49:99:839,0,617  ENSG00000186094  ENST00000371839  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
>> 1_50311454_G/A  1:50311454  A  Sample_03  GT:AD:DP:GQ:PL  0/1:23,26:49:99:839,0,617  ENSG00000215887  ENST00000502859  Transcript  WITHIN_NON_CODING_GENE           1348  -  -  -  -  rs4926833  HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
>> 1_50311454_G/A  1:50311454  A  Sample_03  GT:AD:DP:GQ:PL  0/1:23,26:49:99:839,0,617  ENSG00000186094  ENST00000411952  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
>> 1_50311454_G/A  1:50311454  A  Sample_03  GT:AD:DP:GQ:PL  0/1:23,26:49:99:839,0,617  ENSG00000186094  ENST00000497451  Transcript  WITHIN_NON_CODING_GENE,INTRONIC  -     -  -  -  -  rs4926833  HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
>> 1_50311454_G/A  1:50311454  A  Sample_03  GT:AD:DP:GQ:PL  0/1:23,26:49:99:839,0,617  ENSG00000186094  ENST00000371838  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
>> 1_50311454_G/A  1:50311454  A  Sample_03  GT:AD:DP:GQ:PL  0/1:23,26:49:99:839,0,617  ENSG00000186094  ENST00000371836  Transcript  INTRONIC                         -     -  -  -  -  rs4926833  ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>>
>> I would do this myself if there was a way for the plug-in feature to
>> give me the sample information for each variant.
>>
>> Any ideas how this can be accomplished?
>>
>> Best regards,
>>
>> Duarte Molha
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
