[ensembl-dev] Question regarding the variant_effect_predictor VCF support for multiple samples

Will McLaren wm2 at ebi.ac.uk
Wed May 2 11:40:43 BST 2012


Hi Duarte,

If I'm understanding you correctly, I think I've implemented such a
feature for v2.5 of the VEP, which is due for release next week.

You will be able to supply a single individual ID, several IDs as a
comma-separated list, or "all" to consider each individual in the file
separately, e.g.:

perl variant_effect_predictor.pl -individual MYSAMPLEID1,MYSAMPLEID2

or

perl variant_effect_predictor.pl -individual all

and the consequences will be calculated only where that individual's
genotype is non-reference (either heterozygous or homozygous non-reference).
The individual ID will appear as a field in the "Extra" column. Note that
unless you are using VCF output, a locus at which the individual is
homozygous for the reference allele will not be considered a variant and
will not appear in the output (with VCF output, it will appear, but without
any consequence data added).
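
A minimal stand-alone Perl sketch of the selection rule described above, assuming a tab-separated VCF data line and the sample names from its header line are already in hand. This is not the VEP's own code: the helper select_individuals is hypothetical, and it finds the GT sub-field by looking up its position in the FORMAT column.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the VEP): resolve an -individual value
# ("all" or a comma-separated list of IDs) against the VCF header sample
# names, then keep only samples whose GT call has a non-reference allele.
sub select_individuals {
    my ($individual_opt, $samples, $vcf_line) = @_;

    my @cols   = split /\t/, $vcf_line;
    my @format = split /:/, $cols[8];                 # FORMAT column
    my ($gt_idx) = grep { $format[$_] eq 'GT' } 0 .. $#format;
    return () unless defined $gt_idx;                 # no genotypes to check

    # Columns 10 onwards hold one genotype column per sample, in header order
    my %sample_col;
    @sample_col{@$samples} = @cols[9 .. $#cols];

    my @wanted = $individual_opt eq 'all'
        ? @$samples
        : split(/,/, $individual_opt);

    my @selected;
    for my $name (@wanted) {
        my $data = $sample_col{$name};
        next unless defined $data;                    # unknown sample ID
        my $gt = (split /:/, $data)[$gt_idx];
        next unless defined $gt;

        # "/" separates unphased and "|" phased alleles; "." is a missing call.
        # Any allele index other than 0 or "." is non-reference.
        push @selected, $name
            if grep { $_ ne '0' && $_ ne '.' } split /[\/|]/, $gt;
    }
    return @selected;
}

# The three-sample record from the message quoted below (INFO abbreviated)
my @samples = qw(sample_01 sample_02 sample_03);
my $line = join "\t", '1', '50311454', '.', 'G', 'A', '5322.41', 'PASS',
    'AC=3;AF=0.500;AN=6', 'GT:AD:DP:GQ:PL',
    '1/1:0,40:40:99:1456,114,0', '0/0:37,0:37:99:0,102,1245',
    '0/1:23,26:49:99:839,0,617';

print join(',', select_individuals('all', \@samples, $line)), "\n";                  # prints sample_01,sample_03
print join(',', select_individuals('sample_02,sample_03', \@samples, $line)), "\n";  # prints sample_03
```

With "all", the homozygous-reference sample_02 is dropped; with an explicit list, only the named samples are checked.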

As for plugins, what actually happens internally is that, for each
individual, a copy of the variation feature object is created with the
appropriate allele string. You can access the individual ID in a plugin as
follows:

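sub run {
    my ($self, $tva) = @_;

    # the TranscriptVariation this allele belongs to, and its VariationFeature
    my $tv = $tva->transcript_variation;
    my $vf = $tv->variation_feature;

    # ID of the individual this copy of the variation feature was made for
    my $individual_id = $vf->{individual};

    # ... your plugin logic using $individual_id goes here ...

    # run() returns a hashref of key/value pairs to add to the "Extra" column
    return {};
}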
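
To picture what "a copy of the variation feature object ... with an appropriate allele string" means, here is a rough sketch that uses plain Perl hashes as stand-ins for variation feature objects. It is not the VEP's implementation: copies_per_individual is a hypothetical helper, and building the allele string as the reference allele followed by the distinct non-reference alleles in the genotype is an assumption. Only the individual key mirrors the $vf->{individual} lookup in the plugin code above.

```perl
use strict;
use warnings;

# Hypothetical sketch: make one shallow copy of a hash-based "variation
# feature" per individual with a non-reference genotype, set an allele
# string derived from that individual's GT call, and record the individual ID.
sub copies_per_individual {
    my ($vf, $genotypes) = @_;    # $genotypes: { individual_id => 'GT' }

    my @alleles = ($vf->{ref}, @{ $vf->{alts} });    # index 0 = REF, 1.. = ALT
    my @copies;
    for my $ind (sort keys %$genotypes) {
        my %seen;
        # keep distinct, non-missing, non-reference allele indices
        my @idx = grep { $_ ne '.' && $_ ne '0' && !$seen{$_}++ }
                  split /[\/|]/, $genotypes->{$ind};
        next unless @idx;                              # hom-ref or no call: skip

        my %copy = %$vf;                               # shallow copy of the feature
        $copy{allele_string} = join '/', $vf->{ref}, map { $alleles[$_] } @idx;
        $copy{individual}    = $ind;                   # what the plugin reads back
        push @copies, \%copy;
    }
    return @copies;
}

my $vf = { chr => 1, start => 50311454, ref => 'G', alts => ['A'] };
my %gt = (sample_01 => '1/1', sample_02 => '0/0', sample_03 => '0/1');

# prints "sample_01<TAB>G/A" and "sample_03<TAB>G/A"
for my $copy (copies_per_individual($vf, \%gt)) {
    print "$copy->{individual}\t$copy->{allele_string}\n";
}
```

Copying rather than mutating keeps the original feature untouched, so each individual's consequences can be computed independently from its own copy.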

Hope this helps!

Cheers

Will McLaren
Ensembl Variation

On 2 May 2012 11:18, Duarte Molha <Duarte.Molha at ogt.co.uk> wrote:

> Dear Developers,
>
> I have been playing around with the latest version of the VEP and I would
> like to congratulate you on the many nice features you have been able to
> include.
>
> I particularly like the new plugin support. This will allow me to develop
> new features in my analysis pipeline without having to hack your code too
> much :-)
>
> There is, however, one very important feature I would love to see included
> in the VEP: support for VCFs with multiple samples.
>
> I had to change a lot of your code in a previous version of the VEP in order
> to get some sort of support for this, and it is very complicated to merge
> the changes I made to that earlier version into the new versions, because
> the code is evolving very fast.
>
> I noticed that you say you now support all fields in a VCF. Does this mean
> that your script reads in the sample fields but disregards them for the
> analysis?
>
> It would be great if the VEP could analyse each variant and, for each
> allelic substitution, include information about the samples for which it
> is relevant.
>
> Here is an example of what your code currently outputs, and of what I
> think would be very useful for it to output:
>
> Input VCF entry:
>
> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample_01	sample_02	sample_03
> 1	50311454	.	G	A	5322.41	PASS	AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	0/0:37,0:37:99:0,102,1245	0/1:23,26:49:99:839,0,617
>
> Current output:
>
> #Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
> 1_50311454_G/A	1:50311454	A	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
> 1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>
> The same output, but with sample information added for each sample with a
> non-reference genotype:
>
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
> 1_50311454_G/A	1:50311454	A	Sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
> 1_50311454_G/A	1:50311454	A	Sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
>
> I would do this myself if there were a way for the plugin feature to give
> me the sample information for each variant.
>
> Any ideas how this can be accomplished?
>
> Best regards,
>
> Duarte Molha
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>

