[ensembl-dev] Understanding VEP Rest output

Anja Thormann anja at ebi.ac.uk
Tue Apr 10 17:11:34 BST 2018


Dear Beat,

First of all, thank you for reporting the problem with the strange allele frequencies. This has been corrected. Please check the REST output again: http://rest.ensembl.org/vep/hsapiens/id/rs351771?ExAC=1&content-type=application/json

Secondly, we recently replaced _maf with _af, for exactly the reasons for confusion that you mention. We also no longer report the minor allele and minor allele frequency with the VEP.

Unfortunately, we overlooked the usage of maf in our REST output.

We will update the REST output to match the output format of the VEP script output. This is planned for release/94.

To summarise: When annotating a variant with the VEP, the VEP reports allele frequencies from co-located variants. Frequencies are only reported for the non-reference input allele.  
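
For anyone who wants to script this check, here is a minimal Python sketch (an illustration, not part of the VEP) that builds the request URL used above and collects the co-located variants from the response. The helper names build_vep_url and fetch_colocated are made up for this example, and the sketch assumes the response is a JSON list of results, each of which may carry a colocated_variants list.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

REST_SERVER = "http://rest.ensembl.org"  # server from the URL quoted above


def build_vep_url(variant_id, species="hsapiens", exac=True):
    """Build the VEP REST URL for a variant identifier such as rs351771."""
    params = {}
    if exac:
        params["ExAC"] = 1
    params["content-type"] = "application/json"
    # safe="/" keeps "application/json" readable instead of encoding it as %2F
    query = urlencode(params, safe="/")
    return f"{REST_SERVER}/vep/{species}/id/{variant_id}?{query}"


def fetch_colocated(variant_id):
    """Return all co-located variant records for one identifier.

    Assumption: the response is a list of result objects, each of which
    may have a "colocated_variants" list.
    """
    with urlopen(build_vep_url(variant_id)) as response:
        results = json.load(response)
    records = []
    for result in results:
        records.extend(result.get("colocated_variants", []))
    return records
```

Calling fetch_colocated("rs351771") needs network access; build_vep_url can be used on its own to inspect the request first.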

The new keys in the REST output will be:

AF (global allele frequency (AF) from 1000 Genomes Phase 3)
MAX_AF, MAX_AF_POPS (highest allele frequency observed in any population from 1000 Genomes, ESP or gnomAD, and the populations in which it was observed)
AFR_AF, AMR_AF, EAS_AF, EUR_AF, SAS_AF (allele frequency from continental populations (AFR,AMR,EAS,EUR,SAS) of 1000 Genomes Phase 3)
AA_AF, EA_AF (allele frequency from NHLBI-ESP)
gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_ASJ_AF, gnomAD_EAS_AF, gnomAD_FIN_AF, gnomAD_NFE_AF, gnomAD_OTH_AF, gnomAD_SAS_AF (allele frequency from Genome Aggregation Database (gnomAD))
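
To make the relationship between MAX_AF, MAX_AF_POPS and the per-population keys concrete, here is a small Python sketch. It is only an illustration and not the VEP's own implementation. It assumes the frequencies are available as a flat dictionary keyed by the names listed above, that the combined AF and gnomAD_AF values are left out of the comparison, and that population labels are the key names without the _AF suffix. The function name max_af and the example values are hypothetical.

```python
# Per-population keys taken from the list above. Assumption: the combined
# AF and gnomAD_AF values are not part of the comparison.
POPULATION_KEYS = [
    "AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF",
    "AA_AF", "EA_AF",
    "gnomAD_AFR_AF", "gnomAD_AMR_AF", "gnomAD_ASJ_AF", "gnomAD_EAS_AF",
    "gnomAD_FIN_AF", "gnomAD_NFE_AF", "gnomAD_OTH_AF", "gnomAD_SAS_AF",
]


def max_af(frequencies):
    """Return (highest frequency, populations reaching it), or (None, [])."""
    present = {
        key: frequencies[key]
        for key in POPULATION_KEYS
        if frequencies.get(key) is not None
    }
    if not present:
        return None, []
    highest = max(present.values())
    # Strip the trailing "_AF" to get a population label (assumed convention)
    populations = [key[:-3] for key, value in present.items() if value == highest]
    return highest, populations


# Hypothetical values, for illustration only
example = {"AF": 0.9, "AFR_AF": 0.1, "EUR_AF": 0.3, "gnomAD_FIN_AF": 0.3}
highest, populations = max_af(example)
```

With these made-up values, two populations tie for the highest frequency, so both are returned.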

Here is a more detailed description of the frequency-related output fields: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#existing

For now, please don’t be confused by the usage of maf in our REST output.

minor_allele and minor_allele_freq refer to the second most common allele, and its frequency, at the position where a sequence variant (such as a SNP) has been identified. In Ensembl, the global MAF is calculated using the allele frequencies across all 1000 Genomes Phase 3 populations.

population_maf and population_allele match the input non-reference allele and give the allele frequency as reported for the respective population.
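
The distinction can be sketched in Python as follows. This is an illustration only. It assumes "population" in population_maf and population_allele stands for a population prefix (for example gnomad_maf paired with gnomad_allele) and that these keys sit next to minor_allele in one co-located variant record. The function name split_frequencies and the numbers in the example record are hypothetical, not real output.

```python
def split_frequencies(record):
    """Separate the global minor allele from per-population frequencies.

    Returns ((minor_allele, minor_allele_freq), {population: (allele, freq)}).
    Assumption: per-population keys end in "_maf" and have a matching
    "<population>_allele" key in the same record.
    """
    global_minor = (record.get("minor_allele"), record.get("minor_allele_freq"))
    per_population = {}
    for key, value in record.items():
        if key.endswith("_maf"):
            population = key[: -len("_maf")]
            allele = record.get(population + "_allele")
            per_population[population] = (allele, value)
    return global_minor, per_population


# Hypothetical record with made-up numbers, for illustration only
record = {
    "id": "rs351771",
    "minor_allele": "G",
    "minor_allele_freq": 0.4,
    "gnomad_allele": "A",
    "gnomad_maf": 0.6,
}
global_minor, per_population = split_frequencies(record)
```

In this made-up record the global minor allele is G, while the gnomad entry describes the input allele A, so the two values are not expected to refer to the same allele.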


Please let me know if you have any further questions,

Kind regards,
Anja

> On 10 Apr 2018, at 12:35, Wolf Beat <Beat.Wolf at hefr.ch> wrote:
> 
> Hi, I have a question about the minor_allele field of the colocated_variants field in the VEP response.
> 
> 
> For rs351771 (G>A) I don't fully understand the logic of the answer:
> 
> 
> http://rest.ensembl.org/vep/hsapiens/id/rs351771?ExAC=1&content-type=application/json
> 
> 
> What I have trouble understanding is the minor allele for 1000 Genomes (that's minor_allele, I think) and for the others.
> 
> For all except minor_allele, the minor allele is "A". But for minor_allele it's "G".
> 
> 
> From what I can guess, the variant is borderline between a variant and a false reference, meaning the reference sequence might actually be the minor allele.
> 
> 
> But what I don't understand are the numbers from gnomAD etc. reported by VEP. If I read the http://gnomad.broadinstitute.org/variant/5-112164561-G-A website correctly, the frequency of A should be quite high, 50%+. Yet VEP tells me that the frequency of A in gnomAD is way under 1%.
> 
> 
> Am I missing something here? Can somebody help me to better understand the output of VEP?
> 
> 
> Kind regards
> 
> 
> Beat Wolf
