[ensembl-dev] Understanding VEP Rest output

Wolf Beat Beat.Wolf at hefr.ch
Wed Apr 11 08:05:57 BST 2018


Thank you very much for the fast answer and fix to the problem. Also thank you for the heads up for the API changes in the next version.


I do have a small request though. Is there a way to disable the whole transcript consequences part of the REST VEP API? I know that sounds counterintuitive, but I'm only interested in the allele frequencies.


While we are on the subject of requests, I could further optimize my code and reduce the load on the REST server if I were able to combine two of my queries.


Currently I first search for variants overlapping a certain region (where my variants of interest are) using the overlap/region endpoint.

I then collect the IDs of all my variants and query the VEP endpoint to get the MAF numbers.

If there were an optional way to get the MAF numbers directly from the overlap endpoint, I would not have to query the VEP endpoint at all.
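
For readers following the thread, the two-step workflow described above can be sketched in Python roughly as follows. This is only a sketch under assumptions, not an official client: it assumes that overlap/region is called with feature=variation and that the VEP endpoint accepts a POST with a JSON body of the form {"ids": [...]}, as described in the Ensembl REST documentation. The helper names and the idea of keeping only colocated_variants are made up for illustration. Dropping the transcript consequences here happens on the client only, so it does not reduce the work done by the server.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"  # public Ensembl REST server


def fetch_json(url, payload=None):
    """GET url, or POST payload as JSON when a payload is given."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def overlap_url(region, species="hsapiens"):
    """Build the overlap/region URL restricted to variation features."""
    return f"{SERVER}/overlap/region/{species}/{region}?feature=variation"


def variant_ids(features):
    """Collect the IDs of the features returned by overlap/region."""
    return [f["id"] for f in features if "id" in f]


def colocated_only(vep_results):
    """Keep only colocated_variants per input ID (hypothetical helper)."""
    return {r.get("id"): r.get("colocated_variants", []) for r in vep_results}


def frequencies_for_region(region, species="hsapiens"):
    """Step 1: find variants in the region. Step 2: ask VEP about them."""
    ids = variant_ids(fetch_json(overlap_url(region, species)))
    results = fetch_json(f"{SERVER}/vep/{species}/id", {"ids": ids})
    return colocated_only(results)
```

Calling frequencies_for_region with a region string such as "chromosome:start-end" would issue both requests. Large regions may return more IDs than the server accepts in one POST, in which case the ID list would have to be sent in batches.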


Kind regards


Beat Wolf

________________________________
From: Dev <dev-bounces at ensembl.org> on behalf of Anja Thormann <anja at ebi.ac.uk>
Sent: Tuesday, April 10, 2018 6:11:34 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] Understanding VEP Rest output


Dear Beat,

First of all, thank you for reporting the problem with the strange allele frequencies. This has been corrected. Please check the REST output again: http://rest.ensembl.org/vep/hsapiens/id/rs351771?ExAC=1&content-type=application/json

Secondly, we recently replaced _maf with _af for exactly the reasons for confusion that you mention. We are also no longer reporting the minor allele and minor allele frequency with the VEP.

Unfortunately, we overlooked the usage of maf in our REST output.

We will update the REST output to match the output format of the VEP script output. This is planned for release/94.

To summarise: When annotating a variant with the VEP, the VEP reports allele frequencies from co-located variants. Frequencies are only reported for the non-reference input allele.

The new keys in the REST output will be:

AF (global allele frequency (AF) from 1000 Genomes Phase 3)
MAX_AF, MAX_AF_POPS (Report the highest allele frequency observed in any population from 1000 Genomes, ESP or gnomAD.)
AFR_AF, AMR_AF, EAS_AF, EUR_AF, SAS_AF (allele frequency from continental populations (AFR,AMR,EAS,EUR,SAS) of 1000 Genomes Phase 3)
AA_AF, EA_AF (allele frequency from NHLBI-ESP)
gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_ASJ_AF, gnomAD_EAS_AF, gnomAD_FIN_AF, gnomAD_NFE_AF, gnomAD_OTH_AF, gnomAD_SAS_AF (allele frequency from Genome Aggregation Database (gnomAD))
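
As an illustration of how a client might organise these keys, here is a minimal Python sketch that groups them by source project. It assumes the frequencies arrive as one flat mapping from key to value with exactly the spellings listed above. The actual nesting and casing in the REST JSON may differ, and the function names are hypothetical.

```python
KG_KEYS = {"AF", "AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF"}
ESP_KEYS = {"AA_AF", "EA_AF"}


def frequency_source(key):
    """Map a frequency key to its source project, or None for other keys."""
    if key in KG_KEYS:
        return "1000 Genomes Phase 3"
    if key in ESP_KEYS:
        return "NHLBI-ESP"
    if key.startswith("gnomAD_") and key.endswith("AF"):
        return "gnomAD"
    return None  # MAX_AF, MAX_AF_POPS and any non-frequency keys


def group_frequencies(record):
    """Group the frequency entries of a flat record by source project."""
    grouped = {}
    for key, value in record.items():
        source = frequency_source(key)
        if source is not None:
            grouped.setdefault(source, {})[key] = value
    return grouped
```

MAX_AF and MAX_AF_POPS are left out of the grouping on purpose, because they summarise several sources rather than belonging to one.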

Here is a more detailed description of frequency related output fields: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#existing

For now please don’t be confused by the usage of maf in our REST output.

minor_allele_freq and minor_allele: refer to the frequency of the second most common allele at the position where a sequence variant (such as a SNP) has been identified. In Ensembl, the global MAF is calculated using the allele frequencies across all 1000 Genomes Phase 3 populations.

population_maf and population_allele: match the input non-reference allele and report the allele frequency as reported for the respective population.
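
The difference between the two kinds of fields can be shown with a small Python sketch. The frequencies below are made-up placeholders, not the real values for rs351771, and minor_allele is a hypothetical helper that simply applies the "second most common allele" definition given above.

```python
def minor_allele(freqs):
    """Return (allele, frequency) for the second most common allele."""
    if len(freqs) < 2:
        raise ValueError("need frequencies for at least two alleles")
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return ranked[1]


# Illustrative numbers only: the non-reference allele A is the more common one.
freqs = {"G": 0.45, "A": 0.55}

allele, maf = minor_allele(freqs)  # minor_allele / minor_allele_freq style
input_allele_freq = freqs["A"]     # population_maf style, for input allele A
```

With numbers like these, the minor allele is the reference allele G, while the frequency reported for the input allele A is the larger of the two values. That is why the two kinds of fields can name different alleles for the same variant.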


Please let me know if you have any further questions,

Kind regards,
Anja

On 10 Apr 2018, at 12:35, Wolf Beat <Beat.Wolf at hefr.ch> wrote:

Hi, I have a question about the minor_allele field of the colocated_variants field in the VEP response.


For rs351771 (G>A) I don't fully understand the logic of the answer:


http://rest.ensembl.org/vep/hsapiens/id/rs351771?ExAC=1&content-type=application/json


What I have trouble understanding is the minor allele for 1000 Genomes (that's minor_allele, I think) compared with the others.

For all of them, except minor_allele, the minor allele is "A". But for minor_allele it's "G".


From what I can guess, the variant is borderline between a true variant and a false reference, meaning the reference allele might actually be the minor allele.


But what I don't understand are the numbers from gnomAD etc. reported by VEP. If I read the gnomAD page at http://gnomad.broadinstitute.org/variant/5-112164561-G-A correctly, the frequency of A should be quite high, above 50%. Yet VEP tells me that the frequency of A in gnomAD is well under 1%.


Am I missing something here? Can somebody help me to better understand the output of VEP?


Kind regards


Beat Wolf
_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/



