[ensembl-dev] Question regarding the variant_effect_predictor VCF support for multiple samples

Duarte Molha Duarte.Molha at ogt.co.uk
Wed May 2 11:18:38 BST 2012


Dear Developers

I have been playing around with the latest version of the VEP, and I would like to congratulate you on the many nice features you have included.
I particularly like the new plug-in support. It will allow me to add new features to my analysis pipeline without having to hack your code too much :).

There is, however, one very important feature I would love to see included in the VEP: support for VCF files with multiple samples.

I had to change a lot of your code in a previous version of the VEP to get some support for this, and merging those changes into each new version is very complicated because the code is evolving so fast.

I noticed that you say the VEP now supports all fields in a VCF file. Does this mean that your script reads the sample fields but disregards them during the analysis?
It would be great if the VEP could analyse each variant and, for each allelic substitution, include the information for the samples to which it is relevant.
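Deciding which samples an allelic substitution is relevant to mostly comes down to each sample's GT subfield: a sample is relevant to ALT allele n if n appears among its called alleles (0 is the REF allele). A minimal sketch of that check, in Python purely for illustration; `carries_allele` is a hypothetical helper name, and it ignores every other FORMAT subfield (for example, it does no filtering on GQ):

```python
import re


def carries_allele(gt: str, allele_index: int) -> bool:
    """Return True if a VCF GT value such as '0/1' or '1|2' includes the allele.

    Allele index 0 is REF; 1, 2, ... refer to the comma-separated ALT alleles
    in the order they appear in the ALT column. Missing calls ('.') never match.
    """
    # '/' separates unphased alleles and '|' separates phased ones.
    called = re.split(r"[/|]", gt)
    return str(allele_index) in called
```

For the entry below, `carries_allele("1/1", 1)` and `carries_allele("0/1", 1)` are True while `carries_allele("0/0", 1)` is False, which is why only sample_01 and sample_03 would be reported.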

Here is an example of what your code currently outputs, and of what I think would be very useful for it to output:

Input VCF entry:
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample_01	sample_02	sample_03
1	50311454	.	G	A	5322.41	PASS	AC=3;AF=0.500;AN=6;BaseQRankSum=5.991;DP=271;Dels=0.00;FS=3.551;HRun=0;HaplotypeScore=2.6095;MQ=59.14;MQ0=0;MQRankSum=0.759;QD=28.46;ReadPosRankSum=-0.332;SB=-2325.79;SF=0,1,2	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	0/0:37,0:37:99:0,102,1245	0/1:23,26:49:99:839,0,617
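Extracting the per-sample information from a record like this one only needs the FORMAT column (the ninth column) and the sample columns after it, with the sample names taken from the #CHROM header line. A minimal sketch, assuming a tab-delimited, uncompressed VCF; `parse_samples` is a hypothetical name and not part of the VEP:

```python
def parse_samples(header: str, record: str) -> dict:
    """Map each sample name to a {FORMAT key: value} dict for one VCF record.

    header: the '#CHROM ...' line; record: one tab-delimited data line.
    """
    names = header.rstrip("\n").split("\t")[9:]
    fields = record.rstrip("\n").split("\t")
    # The ninth column (index 8) is FORMAT, e.g. 'GT:AD:DP:GQ:PL'.
    keys = fields[8].split(":")
    samples = {}
    for name, column in zip(names, fields[9:]):
        # A sample column may omit trailing FORMAT subfields, so zip()
        # simply stops at the shorter of the two lists.
        samples[name] = dict(zip(keys, column.split(":")))
    return samples
```

Applied to the entry above, this gives `{'GT': '1/1', 'AD': '0,40', 'DP': '40', 'GQ': '99', 'PL': '1456,114,0'}` for sample_01.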

Current output:

#Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
1_50311454_G/A	1:50311454	A	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
1_50311454_G/A	1:50311454	A	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
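This output is tab-delimited with the 14 columns named in the header line, and the Extra column packs optional KEY=VALUE pairs separated by semicolons. A minimal sketch that turns one result line into a dictionary, assuming this default tab-delimited layout (the column list is copied from the header above; `parse_vep_line` is a hypothetical helper, and header lines starting with '#' are expected to be skipped by the caller):

```python
VEP_COLUMNS = [
    "Uploaded_variation", "Location", "Allele", "Gene", "Feature",
    "Feature_type", "Consequence", "cDNA_position", "CDS_position",
    "Protein_position", "Amino_acids", "Codons", "Existing_variation", "Extra",
]


def parse_vep_line(line: str) -> dict:
    """Turn one tab-delimited VEP result line into a {column: value} dict.

    The Extra column is further split into its own {KEY: VALUE} dict.
    """
    row = dict(zip(VEP_COLUMNS, line.rstrip("\n").split("\t")))
    extra = {}
    # '-' (or a missing Extra column) means there are no extra annotations.
    if row.get("Extra", "-") != "-":
        for item in row["Extra"].split(";"):
            key, _, value = item.partition("=")
            extra[key] = value
    row["Extra"] = extra
    return row
```

Multi-term consequences such as `WITHIN_NON_CODING_GENE,INTRONIC` are left as a single comma-separated string.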



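Desired output: the same rows, repeated for each sample carrying the non-reference allele (here sample_01 and sample_03; sample_02 is homozygous reference), with the sample name, the FORMAT string and that sample's genotype fields inserted after the Allele column: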

1_50311454_G/A	1:50311454	A	sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
1_50311454_G/A	1:50311454	A	sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
1_50311454_G/A	1:50311454	A	sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
1_50311454_G/A	1:50311454	A	sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
1_50311454_G/A	1:50311454	A	sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
1_50311454_G/A	1:50311454	A	sample_01	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1456,114,0	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
1_50311454_G/A	1:50311454	A	sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371839	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360905;HGVSc=ENST00000371839.1:c.157+5614C>T;INTRON=2/13
1_50311454_G/A	1:50311454	A	sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000215887	ENST00000502859	Transcript	WITHIN_NON_CODING_GENE	1348	-	-	-	-	rs4926833	HGVSc=ENST00000502859.1:1348G>A;EXON=3/3
1_50311454_G/A	1:50311454	A	sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000411952	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000411423;HGVSc=ENST00000411952.2:c.139+5614C>T;INTRON=2/14
1_50311454_G/A	1:50311454	A	sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000497451	Transcript	WITHIN_NON_CODING_GENE,INTRONIC	-	-	-	-	-	rs4926833	HGVSc=ENST00000497451.1:123+5614C>T;INTRON=1/2
1_50311454_G/A	1:50311454	A	sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371838	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360904;HGVSc=ENST00000371838.1:c.157+5614C>T;INTRON=2/8
1_50311454_G/A	1:50311454	A	sample_03	GT:AD:DP:GQ:PL	0/1:23,26:49:99:839,0,617	ENSG00000186094	ENST00000371836	Transcript	INTRONIC	-	-	-	-	-	rs4926833	ENSP=ENSP00000360902;HGVSc=ENST00000371836.1:c.157+5614C>T;INTRON=2/6
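Without plug-in access to the sample columns, this layout could be approximated as a post-processing step outside the VEP: repeat each consequence row once per sample whose GT includes that row's allele, inserting the sample name, the FORMAT string and the raw sample column after the Allele column. This is a sketch, not a VEP plug-in, and it uses no VEP API; `expand_by_sample` is a hypothetical name. It assumes the Allele column matches a VCF ALT string exactly, which holds for SNVs like this one but may not hold for indels, where the VEP can write the allele differently from the VCF.

```python
import re


def expand_by_sample(consequences, alt_alleles, fmt, samples):
    """Yield one row per (carrier sample, consequence row) pair.

    consequences: VEP result rows as lists of column strings (Allele at index 2).
    alt_alleles: the record's ALT alleles in VCF order, e.g. ['A'].
    fmt: the record's FORMAT string, e.g. 'GT:AD:DP:GQ:PL'.
    samples: (sample_name, sample_column) pairs in VCF column order.
    """
    gt_pos = fmt.split(":").index("GT")
    for name, column in samples:
        # Called alleles for this sample, e.g. '0/1' -> ['0', '1'].
        called = re.split(r"[/|]", column.split(":")[gt_pos])
        for row in consequences:
            # ALT allele n has index n in GT (0 is REF). Assumes the VEP Allele
            # column equals the VCF ALT string (true for SNVs); raises
            # ValueError otherwise.
            allele_index = alt_alleles.index(row[2]) + 1
            if str(allele_index) in called:
                yield row[:3] + [name, fmt, column] + row[3:]
```

Given the six rows of the current output and the three sample columns of the example record, this yields twelve rows (six for sample_01, then six for sample_03), skipping sample_02 because its GT is 0/0.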



I would do this myself if there were a way for the plug-in interface to give me the sample information for each variant.
Any ideas how this can be accomplished?

Best regards,

Duarte Molha


