[ensembl-dev] VEP maf problems with certain variants

Will McLaren wm2 at ebi.ac.uk
Wed Aug 31 16:04:03 BST 2016


Hello,

This situation is not uncommon, particularly on GRCh37: the reference
allele is the less frequently observed one. In cases like this VEP does
not currently report frequencies consistently.

The minor_allele and minor_allele_freq are just that; they represent the
least frequently observed allele and the frequency of that allele.

For the ExAC data (and other data sources like 1000 genomes), the
exac_allele represents the non-reference allele (regardless of its
minor/major status), and any frequencies refer to this allele.

The fields really should be named *_af rather than *_maf, and should
consistently refer to the non-reference allele, as this is typically the
one of interest to users. We are working on correcting these behaviours in
a future VEP release.
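
In the meantime, the fields can be re-expressed against the non-reference
allele on the client side. The following is only a minimal sketch in Python,
not something VEP provides: the function name alt_allele_frequencies and the
output keys are made up for illustration, it assumes a biallelic
allele_string of the form REF/ALT, and it uses the field values quoted below
for rs4784677.

```python
def alt_allele_frequencies(record):
    """Re-express VEP frequency fields as non-reference allele frequencies.

    `record` is a dict holding the colocated-variant fields discussed in
    this thread. Assumes a biallelic allele_string such as "C/T" (REF/ALT).
    """
    alleles = record["allele_string"].split("/")
    if len(alleles) != 2:
        raise ValueError("this sketch only handles biallelic variants")
    ref, alt = alleles

    result = {}

    # minor_allele_freq describes whichever allele is rarer, so flip it
    # when the minor allele is the reference allele.
    minor = record.get("minor_allele")
    maf = record.get("minor_allele_freq")
    if minor is not None and maf is not None:
        if minor == alt:
            result["global_alt_af"] = maf
        elif minor == ref:
            result["global_alt_af"] = round(1.0 - maf, 6)

    # exac_maf already refers to exac_allele, the non-reference allele.
    if record.get("exac_allele") == alt and record.get("exac_maf") is not None:
        result["exac_alt_af"] = record["exac_maf"]

    return result


# Field values as reported for rs4784677 in the original message.
record = {
    "allele_string": "C/T",
    "minor_allele": "C",
    "minor_allele_freq": 0.0036,
    "exac_allele": "T",
    "exac_maf": 0.994,
}
print(alt_allele_frequencies(record))
```

Once both values are expressed for the same allele (T here), the two
sources can be compared directly and agree that T is the common allele.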

Hopefully that clarifies things!

I can't really explain what's going on with the ClinVar page - possibly
this is a consequence of the RefSeq transcript sequence differing from the
reference genome.

Regards

Will McLaren
Ensembl Variation


On 31 August 2016 at 15:39, Wolf Beat <Beat.Wolf at hefr.ch> wrote:

> Hello,
>
> I stumbled upon an issue I don't really know how to solve right now.
>
> When looking at rs4784677:
> http://grch37.ensembl.org/Homo_sapiens/Variation/Explore?r=16:56548001-56549001;v=rs4784677;vdb=variation;vf=105727621
>
> We see an MAF of < 0.01 and it's pathogenic.
> The variant is C>T.
>
> Then we look at clinvar for this variant:
> http://www.ncbi.nlm.nih.gov/clinvar/variation/4576/
>
> The same variant, but now it's C>C, which looks like some reference
> problem/mismatch.
>
> Looking at ExAC:
> http://exac.broadinstitute.org/variant/16-56548501-C-T
>
> We get an MAF of 0.9938, so it's clearly a reference sequence problem.
>
> The problem arises now when using VEP:
>
> http://grch37.rest.ensembl.org/vep/human/id/rs4784677?content-type=application/json
>
> There we find:
> "minor_allele_freq":0.0036
> "allele_string":"C/T"
> as well as:
> "minor_allele":"C"
> "exac_allele":"T"
> "exac_maf":0.994
>
> So both the "general" MAF and the ExAC MAF appear under the same rs
> number, but in reality they describe different alleles.
>
> How do I solve this? Can I solve this? It's not the first time I have seen
> this, but I never really investigated the reasons before.
>
> Should I take the higher MAF of the two? Should I take the ExAC one?
> I'm afraid that by using such a workaround I will just run into trouble
> elsewhere, where the situation is reversed.
>
> Thank you for your help
>
> Kind regards
>
> Beat Wolf