[ensembl-dev] Question regarding MAF frequencies from VEP

Svein Tore Koksrud Seljebotn s.t.seljebotn at medisin.uio.no
Fri May 29 14:29:35 BST 2015


Hi,

I am trying to understand some of the output I get from VEP (version 79)
when annotating VCF files. The input and command are at the end of this
email. Please note that I am new to this field, so I may be
misunderstanding a few concepts.

For the variant (1   197390368   rs3902057   A   G) I get the following 
output:

CSQ=G|upstream_gene_variant|MODIFIER|CRB1|ENSG00000134376|Transcript|ENST00000480086|processed_transcript||||||||||rs3902057&RISN_CRB1:c.1410A>G|1|1573|1|HGNC|2343||||||||A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1|G:0.999&G:0.999|G:1&G:1|G:0.7696&G:0.7696|G:0.9986&G:0.9986|||19339744|||| 
{rest of transcripts omitted...}

- This might be a silly question, but why is GMAF given for the REF
allele (A:0.0803), while the subpopulation frequencies are given for the
ALT allele (e.g. G:0.7065)? In my case I'm interested in the frequency
of the ALT, not the REF. I assume GMAF always reports the minor allele
frequency? If so, why does the allele reported for GMAF differ from the
one reported for e.g. AFR_MAF?

Looking at a later transcript for the same variant, I see the following:

G|synonymous_variant|LOW|CRB1|23418|Transcript|NM_001193640.1|protein_coding|4/10||NM_001193640.1:c.1074A>G|NM_001193640.1:c.1074A>G(p.=)|1283|1074|358|L|ctA/ctG|rs3902057&RISN_CRB1:c.1410A>G|1||1|||||NP_001180569.1|rseq_mrna_nonmatch&rseq_cds_mismatch&rseq_ens_match_cds||||A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1|G:0.999&G:0.999|G:1&G:1|G:0.7696&G:0.7696|G:0.9986&G:0.9986|||19339744||||,G|5_prime_UTR_variant|MODIFIER|CRB1|ENSG00000134376|Transcript|ENST00000367397|protein_coding|2/6||ENST00000367397.1:c.-448A>G||411|||||rs3902057&RISN_CRB1:c.1410A>G|1||1|HGNC|2343|||ENSP00000356367|||||A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1|G:0.999&G:0.999|G:1&G:1|G:0.7696&G:0.7696|G:0.9986&G:0.9986|||19339744||||

- Why is the frequency for each subpopulation allele repeated twice with
the same value? And why not always give the frequencies for all alleles?
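
To make the two questions concrete, here is a minimal Python sketch of how
the frequency fields in the CSQ strings above can be pulled apart. It is
only an illustration: the function name frequency_fields and the regular
expression are hypothetical and not part of VEP, and it identifies fields
by their "ALLELE:frequency" shape rather than by name, because the CSQ
header that names each field is not included in this email.

```python
import re

# One "ALLELE:frequency" token such as "A:0.0803" or "G:1".
# Assumption: frequency fields are the only CSQ fields made up
# entirely of tokens with this shape.
FREQ_TOKEN = re.compile(r"^([ACGTN-]+):(\d+(?:\.\d+)?)$")


def frequency_fields(csq_entry):
    """Map field index -> list of unique (allele, frequency) pairs.

    csq_entry is one comma-separated element of the CSQ value,
    i.e. a single '|'-delimited transcript annotation.
    """
    result = {}
    for index, field in enumerate(csq_entry.split("|")):
        if not field:
            continue  # empty field, nothing reported
        matches = [FREQ_TOKEN.match(token) for token in field.split("&")]
        if not all(matches):
            continue  # not a frequency field (e.g. IDs, HGVS strings)
        pairs = []
        for match in matches:
            pair = (match.group(1), float(match.group(2)))
            if pair not in pairs:  # collapse repeats like G:1&G:1
                pairs.append(pair)
        result[index] = pairs
    return result


# Shortened excerpt of the frequency part of the output above.
excerpt = "A:0.0803|G:0.7065&G:0.7065|G:0.9813&G:0.9813||G:1&G:1"
for index, pairs in frequency_fields(excerpt).items():
    print(index, pairs)
```

Run on the excerpt, the first field carries the REF allele A and the
remaining non-empty fields carry the ALT allele G, each listed twice in
the raw string.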


Best regards,
Svein Tore Koksrud Seljebotn




***** Example VCF: *****

##fileformat=VCFv4.1
##INFO=<ID=class,Number=.,Type=String,Description="class">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO FORMAT    H02
1   197390368   rs3902057   A   G   7128.77   .   AC=2;AF=1.00;AN=2;DB;DP=193;Dels=0.00;FS=0.000;HaplotypeScore=4.6974;MLEAC=2;MLEAF=1.00;MQ=70.00;MQ0=0;QD=29.21   GT:AD:DP:GQ:PL   1/1:0,192:193:99:7157,518,0

***** Command: *****
vep --cache --dir_cache=/work/VEP/cache/ \
  --fasta=/work/human_g1k_v37_decoy.fasta --offline --sift=b --polyphen=b \
  --ccds --hgvs --numbers --domains --regulatory --canonical --protein \
  --biotype --gmaf --maf_1kg --maf_esp --pubmed --allow_non_variant \
  --fork=4 --vcf --allele_number --no_escape --failed=1 --no_stats \
  --merged --symbol -i testfile.vcf -o testfile.annotated.vcf




