[ensembl-dev] Confused by --gene_phenotype and PHENO in VEP v82

Jessica Chong jxchong at uw.edu
Mon Nov 9 00:20:05 GMT 2015


I am trying to use the --gene_phenotype option in VEP v82, but I am running into problems.

1) If a variant is in a gene that is associated with a particular phenotype (e.g. CFTR and cystic fibrosis), where is this information stored in the resulting annotated VCF? I don’t see a corresponding field name listed among the possible Extra column outputs on this page: http://uswest.ensembl.org/info/docs/tools/vep/vep_formats.html#output

2) If a variant itself is associated with a particular “phenotype, disease, or trait”, my understanding from the VEP output documentation is that this information should show up in the PHENO field. Is that correct?


I tried annotating a tiny VCF that contains just one variant in CFTR. The variant is ΔF508 (p.Phe508del), so it is definitely pathogenic, and CFTR should certainly be associated with a phenotype at the gene level as well. However, PHENO is always blank, and I don’t see any field mentioning cystic fibrosis as a phenotype/disease name.


Here is what I ran:
perl variant_effect_predictor/variant_effect_predictor.pl \
-i CFTR.VT.vcf \
-o CFTR.VT.VEP.vcf \
--vcf --offline --cache \
--dir_cache variant_effect_predictor/cache/ \
--species homo_sapiens --assembly GRCh37 \
--fasta Homo_sapiens_assembly19.fasta \
--fork 8 --force_overwrite \
--compress 'gunzip -c' \
--sift b --polyphen b --symbol --numbers --biotype \
--total_length --canonical --ccds --hgvs --shift_hgvs 1 --gene_phenotype \
--fields Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,PolyPhen,SIFT,Protein_position,BIOTYPE,CANONICAL,CCDS,HGVSc,HGVSp,PHENO


The resulting vcf contains these lines:
##VEP=v82 cache=/ensembl-tools/82/Linux/RHEL6/x86_64/variant_effect_predictor/cache//homo_sapiens/82_GRCh37 db=. polyphen=2.2.2 sift=sift5.2.2 COSMIC=71 ESP=20141103 gencode=GENCODE 19 HGMD-PUBLIC=20152 genebuild=2011-04 regbuild=13 assembly=GRCh37.p13 dbSNP=144 ClinVar=201507
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE|CANONICAL|CCDS|HGVSc|HGVSp|PHENO">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1A3	sample1A4	sample1A5	sample1A6	sample1A7	sample1A8	sample1A3	sample1A4	sample1A5	sample1A6	sample1A7	sample1A8
7	117199644	rs199826652	ATCT	A	3996.41	PASS	AC=4;AF=0.023;AN=24;BaseQRankSum=-0.941;ClippingRankSum=-0.033;DB;DP=7203;FS=2.556;InbreedingCoeff=-0.0257;MLEAC=5;MLEAF=0.023;MQ0=0;MQ=60.36;MQRankSum=0.129;QD=21.37;ReadPosRankSum=0.673;SOR=0.921;VQSLOD=1.51;culprit=SOR;CSQ=downstream_gene_variant|||ENSG00000232661|AC000111.3|ENST00000441019|||||antisense|YES||||,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000426809|10/26|||477-478/1438|protein_coding|||ENST00000426809.1:c.1431_1433delCTT|ENSP00000389119.1:p.Phe478del|,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000454343|10/26|||446-447/1419|protein_coding|||ENST00000454343.1:c.1338_1340delCTT|ENSP00000403677.1:p.Phe447del|,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000003084|11/27|||507-508/1480|protein_coding|YES|CCDS5773.1|ENST00000003084.6:c.1521_1523delCTT|ENSP00000003084.6:p.Phe508del|,upstream_gene_variant|||ENSG00000001626|CFTR|ENST00000472848|||||processed_transcript|||||	GT:AD:DP:GQ:PL	0/0:4,0:4:9:0,9,135	0/0:10,0:10:24:0,24,360	0/0:3,0:3:5:0,5,86	0/1:10,12:22:99:441,0,474	0/1:9,18:27:99:729,0,424	0/0:23,0:23:63:0,63,945	0/0:22,0:22:66:0,66,714	0/0:33,0:33:84:0,84,1096	0/0:22,0:22:62:0,62,736	0/1:17,14:31:99:537,0,881	0/1:13,16:29:99:620,0,632	0/0:24,0:24:60:0,60,791
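To confirm which consequence entries carry a PHENO value, the CSQ string can be split using the "Format:" list declared in the ##INFO=<ID=CSQ header: entries are comma-separated and sub-fields within an entry are pipe-separated, in header order. Below is a minimal Python sketch of that check. It assumes an uncompressed VCF, and the function names (csq_fields, csq_entries, report_pheno) are illustrative only, not part of VEP.

```python
from __future__ import annotations

import sys


def csq_fields(info_header: str) -> list[str]:
    """Return the CSQ sub-field names from a VEP '##INFO=<ID=CSQ,...>' header line."""
    # VEP lists the sub-fields after 'Format: ', separated by '|'.
    fmt = info_header.split("Format: ", 1)[1]
    # Drop the trailing newline and closing '">' (also tolerating a stray curly quote).
    fmt = fmt.rstrip().rstrip(">").rstrip('"\u201d')
    return fmt.split("|")


def csq_entries(info: str, fields: list[str]) -> list[dict[str, str]]:
    """Split the CSQ value of a VCF INFO column into one dict per consequence entry."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            # Entries are comma-separated; sub-fields within an entry are '|'-separated.
            return [dict(zip(fields, entry.split("|"))) for entry in item[4:].split(",")]
    return []


def report_pheno(path: str) -> None:
    """Print CHROM, POS, SYMBOL, Feature and the (possibly empty) PHENO value per entry."""
    fields: list[str] = []
    with open(path) as vcf:
        for line in vcf:
            if line.startswith("##INFO=<ID=CSQ,"):
                fields = csq_fields(line)
            elif not line.startswith("#"):
                cols = line.rstrip("\n").split("\t")
                for entry in csq_entries(cols[7], fields):
                    print(cols[0], cols[1], entry.get("SYMBOL", ""),
                          entry.get("Feature", ""), repr(entry.get("PHENO", "")))


if __name__ == "__main__" and len(sys.argv) > 1:
    report_pheno(sys.argv[1])
```

For example, `python csq_pheno.py CFTR.VT.VEP.vcf` (script name hypothetical) would print one line per transcript. Against the record above, every PHENO value would show as `''`.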


Thanks!


