[ensembl-dev] Question regarding the variant_effect_predictor VCF support for multiple samples
Duarte Molha
Duarte.Molha at ogt.co.uk
Wed May 2 11:18:38 BST 2012
Dear Developers
I have been playing around with the latest version of the VEP and would like to congratulate you on the many nice features you have been able to include.
I particularly like the new plug-in support. It will allow me to add new features to my analysis pipeline without having to hack your code too much :).
There is, however, one very important feature I would love to see included in the VEP: support for VCF files with multiple samples.
I had to change a lot of your code in a previous version of the VEP to get some sort of support for this, and it has become very difficult to merge those changes into newer versions because the code is evolving so fast.
I noticed that you say you now support all fields in a VCF. Does this mean that your script reads in the sample fields but disregards them in the analysis?
It would be great if the VEP could analyse each variant and, for each allelic substitution, include the information for the samples in which that allele occurs.
Here is an example of what your code currently outputs and of what I think would be very useful for it to output:
Input VCF entry:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_01 sample_02 sample_03
1 50311454 . G A 5322.41 PASS AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 0/0:37,0:37:99:0,102,1245 0/1:23,26:49:99:839,0,617
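As a minimal sketch (in Python, so not part of the VEP itself, which is written in Perl), the sample columns of a record like the one above can be decoded by splitting the FORMAT string and each sample column on ":" and translating the GT indices (0 = REF, 1 = first ALT, and so on) into allele strings. The function names are hypothetical, and the parsing assumes one well-formed data line with every sample column present.

```python
from typing import Dict, List, Tuple


def parse_record(header_line: str, data_line: str) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Split one VCF data line into its fixed columns and per-sample FORMAT values."""
    columns = header_line.lstrip("#").split()
    values = data_line.split()  # VCF is tab-delimited; split() also copes with spaces
    fixed = dict(zip(columns[:9], values[:9]))
    keys = fixed["FORMAT"].split(":")
    samples = {name: dict(zip(keys, raw.split(":")))
               for name, raw in zip(columns[9:], values[9:])}
    return fixed, samples


def genotype_alleles(fixed: Dict[str, str], sample: Dict[str, str]) -> List[str]:
    """Translate a GT such as '0/1' into allele strings, e.g. ['G', 'A']."""
    alleles = [fixed["REF"]] + fixed["ALT"].split(",")
    gt = sample.get("GT", ".").replace("|", "/")  # treat phased and unphased alike
    return [alleles[int(i)] for i in gt.split("/") if i != "."]  # skip missing calls


HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_01 sample_02 sample_03"
LINE = (
    "1 50311454 . G A 5322.41 PASS "
    "AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;"
    "HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;"
    "ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2 "
    "GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 0/0:37,0:37:99:0,102,1245 "
    "0/1:23,26:49:99:839,0,617"
)

fixed, samples = parse_record(HEADER, LINE)
for name, data in samples.items():
    # Keep only samples carrying at least one ALT allele ("non-reference" samples).
    alts = [a for a in genotype_alleles(fixed, data) if a != fixed["REF"]]
    if alts:
        print(name, data["GT"], alts)
```

For this record it prints `sample_01 1/1 ['A', 'A']` and `sample_03 0/1 ['A']`; sample_02 (0/0) is skipped, which matches the samples shown in the proposed output.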
Current output:
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
1_50311454_G/A 1:50311454 A ENSG00000186094 ENST00000371839 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
1_50311454_G/A 1:50311454 A ENSG00000215887 ENST00000502859 Transcript WITHIN_NON_CODING_GENE 1348 - - - - rs4926833 HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
1_50311454_G/A 1:50311454 A ENSG00000186094 ENST00000411952 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
1_50311454_G/A 1:50311454 A ENSG00000186094 ENST00000497451 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - rs4926833 HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
1_50311454_G/A 1:50311454 A ENSG00000186094 ENST00000371838 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
1_50311454_G/A 1:50311454 A ENSG00000186094 ENST00000371836 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
The same output, but with the sample name, FORMAT string and sample genotype data added for each non-reference sample:
1_50311454_G/A 1:50311454 A sample_01 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 ENSG00000186094 ENST00000371839 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
1_50311454_G/A 1:50311454 A sample_01 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 ENSG00000215887 ENST00000502859 Transcript WITHIN_NON_CODING_GENE 1348 - - - - rs4926833 HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
1_50311454_G/A 1:50311454 A sample_01 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 ENSG00000186094 ENST00000411952 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
1_50311454_G/A 1:50311454 A sample_01 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 ENSG00000186094 ENST00000497451 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - rs4926833 HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
1_50311454_G/A 1:50311454 A sample_01 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 ENSG00000186094 ENST00000371838 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
1_50311454_G/A 1:50311454 A sample_01 GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 ENSG00000186094 ENST00000371836 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
1_50311454_G/A 1:50311454 A sample_03 GT:AD:DP:GQ:PL 0/1:23,26:49:99:839,0,617 ENSG00000186094 ENST00000371839 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
1_50311454_G/A 1:50311454 A sample_03 GT:AD:DP:GQ:PL 0/1:23,26:49:99:839,0,617 ENSG00000215887 ENST00000502859 Transcript WITHIN_NON_CODING_GENE 1348 - - - - rs4926833 HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
1_50311454_G/A 1:50311454 A sample_03 GT:AD:DP:GQ:PL 0/1:23,26:49:99:839,0,617 ENSG00000186094 ENST00000411952 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
1_50311454_G/A 1:50311454 A sample_03 GT:AD:DP:GQ:PL 0/1:23,26:49:99:839,0,617 ENSG00000186094 ENST00000497451 Transcript WITHIN_NON_CODING_GENE,INTRONIC - - - - - rs4926833 HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
1_50311454_G/A 1:50311454 A sample_03 GT:AD:DP:GQ:PL 0/1:23,26:49:99:839,0,617 ENSG00000186094 ENST00000371838 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
1_50311454_G/A 1:50311454 A sample_03 GT:AD:DP:GQ:PL 0/1:23,26:49:99:839,0,617 ENSG00000186094 ENST00000371836 Transcript INTRONIC - - - - - rs4926833 ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
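Until something like this exists in the VEP, a post-processing pass over the current output could produce the layout above. The sketch below is a hypothetical illustration in Python, not an existing VEP option or plug-in: it indexes VCF records by an identifier assumed to match the Uploaded_variation column as it appears in this example (CHROM_POS_REF/ALT, for a record whose ID is "."), then emits one row per sample whose GT carries the row's Allele, inserting the sample name, FORMAT string and sample column after Allele. For indels the VEP's Allele column may not match the VCF ALT string exactly, so those would need extra handling.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical index: Uploaded_variation -> (ALT alleles, FORMAT, [(sample, column), ...])
VcfIndex = Dict[str, Tuple[List[str], str, List[Tuple[str, str]]]]


def index_vcf(header_line: str, data_lines: Iterable[str]) -> VcfIndex:
    """Index VCF records by an ID shaped like the example's Uploaded_variation."""
    sample_names = header_line.lstrip("#").split()[9:]
    index: VcfIndex = {}
    for line in data_lines:
        f = line.split()
        chrom, pos, ref, alts = f[0], f[1], f[3], f[4].split(",")
        # Assumption: "CHROM_POS_REF/ALT", as in 1_50311454_G/A above.
        key = f"{chrom}_{pos}_{ref}/{'/'.join(alts)}"
        index[key] = (alts, f[8], list(zip(sample_names, f[9:])))
    return index


def carriers(alts: List[str], format_str: str,
             samples: List[Tuple[str, str]], allele: str) -> Iterator[Tuple[str, str]]:
    """Yield (sample, column) for each sample whose GT contains `allele`."""
    if allele not in alts:
        return
    wanted = str(alts.index(allele) + 1)  # GT indices: 0 = REF, 1 = first ALT, ...
    gt_pos = format_str.split(":").index("GT")
    for name, column in samples:
        gt = column.split(":")[gt_pos]
        if wanted in gt.replace("|", "/").split("/"):
            yield name, column


def add_sample_columns(vep_rows: Iterable[str], index: VcfIndex) -> Iterator[str]:
    """Insert sample name, FORMAT and sample column after Allele (data rows only)."""
    for row in vep_rows:
        fields = row.split()
        record = index.get(fields[0])
        if record is None:
            # No matching VCF record: keep the row, with placeholder columns.
            yield "\t".join(fields[:3] + ["-", "-", "-"] + fields[3:])
            continue
        alts, format_str, samples = record
        for name, column in carriers(alts, format_str, samples, fields[2]):
            yield "\t".join(fields[:3] + [name, format_str, column] + fields[3:])


HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_01 sample_02 sample_03"
VCF_LINE = (
    "1 50311454 . G A 5322.41 PASS "
    "AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;"
    "HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;"
    "ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2 "
    "GT:AD:DP:GQ:PL 1/1:0,40:40:99:1456,114,0 0/0:37,0:37:99:0,102,1245 "
    "0/1:23,26:49:99:839,0,617"
)
VEP_ROW = (
    "1_50311454_G/A 1:50311454 A ENSG00000186094 ENST00000371839 Transcript "
    "INTRONIC - - - - - rs4926833 "
    "ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13"
)

for out in add_sample_columns([VEP_ROW], index_vcf(HEADER, [VCF_LINE])):
    print(out)
```

Run on the first row of the current output, this yields two tab-separated rows, one for sample_01 and one for sample_03, each with 17 columns. Keeping the join outside the VEP means the script only depends on the output format, not on the VEP's internals, which avoids the merge problem described above at the cost of a second pass over the data.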
I would do this myself if the plug-in interface gave me access to the sample information for each variant.
Any ideas on how this can be accomplished?
Best regards,
Duarte Molha