[ensembl-dev] Question regarding MAF frequencies from VEP
Svein Tore Koksrud Seljebotn
s.t.seljebotn at medisin.uio.no
Fri May 29 14:29:35 BST 2015
Hi,
I am trying to figure out some of the output I get from VEP (version 79)
when annotating VCF files. See the end of this email for the input and command.
Please note, I am new to this field, so I might misunderstand a few
concepts...
For the variant (1 197390368 rs3902057 A G) I get the following
output:
CSQ=G|upstream_gene_variant|MODIFIER|CRB1|ENSG00000134376|Transcript|ENST00000480086|processed_transcript||||||||||rs3902057&RISN_CRB1:c.1410A>G|1|1573|1|HGNC|2343||||||||A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1|G:0.999&G:0.999|G:1&G:1|G:0.7696&G:0.7696|G:0.9986&G:0.9986|||19339744||||
{rest of transcripts omitted...}
- This might be a silly question, but why is GMAF given for REF, while
the subpopulation frequencies are given for ALT? In my case I'm interested
in the frequency of the ALT allele, not the REF. I assume VEP always reports
the minor allele frequency? But then why does the allele reported for GMAF
differ from the one reported for e.g. AFR_MAF?
Looking at a later transcript for the same variant, I see the following:
G|synonymous_variant|LOW|CRB1|23418|Transcript|NM_001193640.1|protein_coding|4/10||NM_001193640.1:c.1074A>G|NM_001193640.1:c.1074A>G(p.=)|1283|1074|358|L|ctA/ctG|rs3902057&RISN_CRB1:c.1410A>G|1||1|||||NP_001180569.1|rseq_mrna_nonmatch&rseq_cds_mismatch&rseq_ens_match_cds||||A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1|G:0.999&G:0.999|G:1&G:1|G:0.7696&G:0.7696|G:0.9986&G:0.9986|||19339744||||,G|5_prime_UTR_variant|MODIFIER|CRB1|ENSG00000134376|Transcript|ENST00000367397|protein_coding|2/6||ENST00000367397.1:c.-448A>G||411|||||rs3902057&RISN_CRB1:c.1410A>G|1||1|HGNC|2343|||ENSP00000356367|||||A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1|G:0.999&G:0.999|G:1&G:1|G:0.7696&G:0.7696|G:0.9986&G:0.9986|||19339744||||
- Why is the frequency for each subpopulation allele given twice with the
same value? Why not always give the frequency for all alleles?
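
To make the two questions above concrete, here is a minimal Python sketch of how the frequency fields in the CSQ output can be read. It only splits the "ALLELE:freq&ALLELE:freq" strings shown above and collapses identical repeats. The function names are hypothetical, and the complement calculation assumes the site is biallelic, which is an assumption on my part and not something VEP reports.

```python
def parse_allele_freqs(field):
    """Parse 'ALLELE:freq&ALLELE:freq' into {allele: freq}.

    Identical repeated entries (e.g. 'G:0.7065&G:0.7065') collapse to one key.
    An empty field returns an empty dict.
    """
    freqs = {}
    if not field:
        return freqs
    for item in field.split("&"):
        allele, _, value = item.partition(":")
        freqs[allele] = float(value)
    return freqs


def other_allele_freq(freqs, allele):
    """Frequency of the allele that is NOT `allele`.

    Assumption: the site is biallelic, so the two frequencies sum to 1.
    """
    return round(1.0 - freqs[allele], 4)


# Values copied from the CSQ string above.
gmaf = parse_allele_freqs("A:0.0803")            # field reported for REF (A)
subpop = parse_allele_freqs("G:0.7065&G:0.7065")  # first subpopulation field

print(gmaf)
print(subpop)
print(other_allele_freq(gmaf, "A"))  # implied global frequency of G, if biallelic
```

Under the biallelic assumption, a GMAF of A:0.0803 would correspond to a global frequency of about 0.9197 for G, which is the number I am actually after.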
Best regards,
Svein Tore Koksrud Seljebotn
***** Example VCF: *****
##fileformat=VCFv4.1
##INFO=<ID=class,Number=.,Type=String,Description="class">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT H02
1 197390368 rs3902057 A G 7128.77 . AC=2;AF=1.00;AN=2;DB;DP=193;Dels=0.00;FS=0.000;HaplotypeScore=4.6974;MLEAC=2;MLEAF=1.00;MQ=70.00;MQ0=0;QD=29.21 GT:AD:DP:GQ:PL 1/1:0,192:193:99:7157,518,0
***** Command: *****
vep --cache --dir_cache=/work/VEP/cache/ \
  --fasta=/work/human_g1k_v37_decoy.fasta --offline --sift=b --polyphen=b \
  --ccds --hgvs --numbers --domains --regulatory --canonical --protein \
  --biotype --gmaf --maf_1kg --maf_esp --pubmed --allow_non_variant \
  --fork=4 --vcf --allele_number --no_escape --failed=1 --no_stats \
  --merged --symbol -i testfile.vcf -o testfile.annotated.vcf