[ensembl-dev] Confused by --gene_phenotype and PHENO in VEP v82
Will McLaren
wm2 at ebi.ac.uk
Mon Nov 9 09:44:21 GMT 2015
Hi Jessica,
On 9 November 2015 at 00:20, Jessica Chong <jxchong at uw.edu> wrote:
> I am trying to use the --gene_phenotype option in VEP v82 but I am having
> problems.
>
> 1) if a variant is in a gene that is associated with a particular
> phenotype (e.g. CFTR and cystic fibrosis), where does this information get
> stored in the resulting annotated vcf? I don’t see a corresponding field
> name listed as a possible extras column output on this page
> http://uswest.ensembl.org/info/docs/tools/vep/vep_formats.html#output
The phenotype name does not get stored in the output. Firstly, many
phenotype names are long and contain odd characters that can break file
format encoding. Secondly, many genes and/or variants have several
phenotypes associated with them, so to list all of them (and multiple times
in the case where you are reporting e.g. the same gene multiple times)
would cause the output file size to increase hugely.
Example for CFTR:
http://www.ensembl.org/Homo_sapiens/Gene/Phenotype?g=ENSG00000001626
>
>
> 2) if a variant itself is associated with a particular “phenotype,
> disease, or trait” then my understanding from the VEP output documentation
> is that I should expect this information to show up under the PHENO field?
>
I don't think this is stated anywhere in the documentation.
http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#output shows
the output fields and their descriptions.
>
>
> I tried annotating a tiny vcf that just includes a variant in CFTR (and
> the variant is dF508, so it is definitely pathogenic and CFTR should
> certainly be associated with a phenotype as well on the gene level) but
> PHENO is always blank (and I don’t see any field mentioning cystic fibrosis
> as a phenotype/disease name).
>
Apologies, there seems to be an issue with the v82 caches in that they are
missing the necessary information to populate the fields for
--gene_phenotype. However, the field you want is GENE_PHENO, which should
show a binary flag indicating whether the gene is associated with a
phenotype.
I will look into why this information is missing.
Regards
Will McLaren
Ensembl Variation
>
>
> Here is what I ran:
> perl variant_effect_predictor/variant_effect_predictor.pl \
> -i CFTR.VT.vcf \
> -o CFTR.VT.VEP.vcf \
> --vcf --offline --cache \
> --dir_cache variant_effect_predictor/cache/ \
> --species homo_sapiens --assembly GRCh37 \
> --fasta Homo_sapiens_assembly19.fasta \
> --fork 8 --force_overwrite \
> --compress 'gunzip -c' \
> --sift b --polyphen b --symbol --numbers --biotype \
> --total_length --canonical --ccds --hgvs --shift_hgvs 1 --gene_phenotype \
> --fields
> Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,PolyPhen,SIFT,Protein_position,BIOTYPE,CANONICAL,CCDS,HGVSc,HGVSp,PHENO
>
>
> The resulting vcf contains these lines:
> ##VEP=v82
> cache=/ensembl-tools/82/Linux/RHEL6/x86_64/variant_effect_predictor/cache//homo_sapiens/82_GRCh37
> db=. polyphen=2.2.2 sift=sift5.2.2 COSMIC=71 ESP=20141103 gencode=GENCODE
> 19 HGMD-PUBLIC=20152 genebuild=2011-04 regbuild=13 assembly=GRCh37.p13
> dbSNP=144 ClinVar=201507
> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations
> from Ensembl VEP. Format:
> Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE|CANONICAL|CCDS|HGVSc|HGVSp|PHENO”>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
> sample1A3 sample1A4 sample1A5 sample1A6 sample1A7
> sample1A8 sample1A3 sample1A4 sample1A5
> sample1A6 sample1A7 sample1A8
> 7 117199644 rs199826652 ATCT A 3996.41 PASS
> AC=4;AF=0.023;AN=24;BaseQRankSum=-0.941;ClippingRankSum=-0.033;DB;DP=7203;FS=2.556;InbreedingCoeff=-0.0257;MLEAC=5;MLEAF=0.023;MQ0=0;MQ=60.36;MQRankSum=0.129;QD=21.37;ReadPosRankSum=0.673;SOR=0.921;VQSLOD=1.51;culprit=SOR;CSQ=downstream_gene_variant|||ENSG00000232661|AC000111.3|ENST00000441019|||||antisense|YES||||,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000426809|10/26|||477-478/1438|protein_coding|||ENST00000426809.1:c.1431_1433delCTT|ENSP00000389119.1:p.Phe478del|,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000454343|10/26|||446-447/1419|protein_coding|||ENST00000454343.1:c.1338_1340delCTT|ENSP00000403677.1:p.Phe447del|,inframe_deletion|aTCTtt/att|IF/I|ENSG00000001626|CFTR|ENST00000003084|11/27|||507-508/1480|protein_coding|YES|CCDS5773.1|ENST00000003084.6:c.1521_1523delCTT|ENSP00000003084.6:p.Phe508del|,upstream_gene_variant|||ENSG00000001626|CFTR|ENST00000472848|||||processed_transcript|||||
> GT:AD:DP:GQ:PL 0/0:4,0:4:9:0,9,135 0/0:10,0:10:24:0,24,360
> 0/0:3,0:3:5:0,5,86 0/1:10,12:22:99:441,0,474
> 0/1:9,18:27:99:729,0,424 0/0:23,0:23:63:0,63,945
> 0/0:22,0:22:66:0,66,714 0/0:33,0:33:84:0,84,1096
> 0/0:22,0:22:62:0,62,736 0/1:17,14:31:99:537,0,881
> 0/1:13,16:29:99:620,0,632 0/0:24,0:24:60:0,60,791
>
>
> Thanks!
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20151109/2a47c369/attachment.html>
More information about the Dev
mailing list