[ensembl-dev] VEP maf problems with certain variants

Wolf Beat Beat.Wolf at hefr.ch
Wed Aug 31 16:12:16 BST 2016


Thank you for your answer.

Is there a bug tracker where I can follow the issues I submit? There was another one a couple of weeks back, and I would like to stay up to date.

So I guess there is no way for me to detect such a situation client side? Well, I guess I can detect when the MAF and the ExAC frequency totally disagree, but that won't tell me which one is "correct".
I'm just trying to figure out how to correctly annotate variants in the meantime.
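
A minimal sketch of such a client-side check, in Python. It assumes the four fields shown in the VEP output quoted below (minor_allele, minor_allele_freq, exac_allele, exac_maf) have already been pulled into a flat dict, and that the variant is biallelic. The function name, the flat-dict layout, and the 0.05 tolerance are illustrative assumptions, not part of VEP.

```python
def frequency_conflict(record, tolerance=0.05):
    """Classify how the general frequency and the ExAC frequency relate.

    `record` is a hypothetical flat dict holding the four fields from
    the VEP output; the tolerance is an arbitrary illustrative choice.
    """
    minor = record["minor_allele"]
    maf = record["minor_allele_freq"]
    exac_allele = record["exac_allele"]
    exac_freq = record["exac_maf"]

    if minor == exac_allele:
        # Both frequencies describe the same allele: compare directly.
        if abs(maf - exac_freq) <= tolerance:
            return "same_allele_agree"
        return "same_allele_disagree"

    # The frequencies describe different alleles. For a biallelic
    # variant they should then sum to roughly 1.
    if abs((maf + exac_freq) - 1.0) <= tolerance:
        return "different_alleles_consistent"
    return "different_alleles_disagree"


rs4784677 = {
    "minor_allele": "C",
    "minor_allele_freq": 0.0036,
    "exac_allele": "T",
    "exac_maf": 0.994,
}
print(frequency_conflict(rs4784677))  # different_alleles_consistent
```

For rs4784677 the two values look contradictory only because they describe different alleles; once that is taken into account they are consistent with each other.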

Kind regards

Beat Wolf
________________________________________
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] on behalf of Will McLaren [wm2 at ebi.ac.uk]
Sent: Wednesday, August 31, 2016 5:04 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP maf problems with certain variants

Hello,

This is a not uncommon situation, particularly on GRCh37, where the reference allele is the least frequently observed allele. In situations like this, VEP doesn't currently report frequencies consistently.

The minor_allele and minor_allele_freq are just that; they represent the least frequently observed allele and the frequency of that allele.

For the ExAC data (and other data sources like 1000 genomes), the exac_allele represents the non-reference allele (regardless of its minor/major status), and any frequencies refer to this allele.

The fields really should not be named *_maf but rather *_af, and should consistently refer to the non-reference allele, as this is typically the one of interest to users. We are working on correcting these behaviours in a future VEP release.
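
Until then, one possible client-side workaround is to convert minor_allele_freq into the frequency of the non-reference allele. The Python sketch below assumes a biallelic variant whose allele_string is written as reference/alternate; the function name is hypothetical and this is not how VEP itself computes anything.

```python
def alt_allele_frequency(allele_string, minor_allele, minor_allele_freq):
    """Return (alt_allele, frequency of the alt allele).

    Assumes a biallelic allele_string of the form "REF/ALT".
    """
    alleles = allele_string.split("/")
    if len(alleles) != 2:
        raise ValueError("this sketch only handles biallelic variants")
    ref, alt = alleles

    if minor_allele == alt:
        # The minor allele is already the non-reference allele.
        return alt, minor_allele_freq
    if minor_allele == ref:
        # The reference allele is the minor one, so the non-reference
        # allele carries the remaining frequency.
        return alt, 1.0 - minor_allele_freq
    raise ValueError("minor_allele is not listed in allele_string")


alt, af = alt_allele_frequency("C/T", "C", 0.0036)
print(alt, round(af, 4))  # T 0.9964
```

For rs4784677 this gives a frequency of about 0.9964 for T, in line with the exac_maf of 0.994 reported for the same allele.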

Hopefully that clarifies things!

I can't really explain what's going on with the ClinVar page - possibly this is a consequence of the RefSeq transcript sequence differing from the reference genome.

Regards

Will McLaren
Ensembl Variation


On 31 August 2016 at 15:39, Wolf Beat <Beat.Wolf at hefr.ch> wrote:
Hello,

I stumbled upon an issue I don't really know how to solve right now.

When looking at rs4784677:
http://grch37.ensembl.org/Homo_sapiens/Variation/Explore?r=16:56548001-56549001;v=rs4784677;vdb=variation;vf=105727621

We see a MAF of < 0.01, and it's pathogenic.
The variant is C>T.

Then we look at ClinVar for this variant:
http://www.ncbi.nlm.nih.gov/clinvar/variation/4576/

The same variant, but now it's C>C, which looks like some reference problem/mismatch.

Looking at ExAC:
http://exac.broadinstitute.org/variant/16-56548501-C-T

We get a MAF of 0.9938, so it's clearly a reference sequence problem.

The problem now arises when using VEP:

http://grch37.rest.ensembl.org/vep/human/id/rs4784677?content-type=application/json

There we find:
"minor_allele_freq":0.0036
"allele_string":"C/T"
as well as:
"minor_allele":"C"
"exac_allele":"T"
"exac_maf":0.994

So both the "general" MAF and the ExAC MAF seem to describe the same rs number, but in reality they refer to different alleles (C and T).

How do I solve this? Can I solve this? It's not the first time I have seen this, but I never really investigated the reasons before.

Should I take the higher MAF of the two? Should I take the ExAC one? I'm afraid that by using such a workaround I will just run into trouble elsewhere, where the situation is reversed.

Thank you for your help

Kind regards

Beat Wolf
_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/




