[ensembl-dev] VEP: minimal representation

Will McLaren wm2 at ebi.ac.uk
Mon Aug 17 16:39:45 BST 2015


Hi Joey,

VEP treats each individual ALT allele from an input VCF separately; this
applies to calling consequence types and to handling the data in plugins
like the ExAC one.

The ExAC plugin will only report the frequency of the currently processed
allele (if available in the ExAC VCF). This can involve a bit of internal
gymnastics when insertions or deletions are being considered; see
http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf for how VEP
handles unbalanced variants in VCF. Within the ExAC plugin the same
allele/coordinate transformation is applied, so if for example your input is:

chr1 1000 ins1 A AC

and in the ExAC VCF you have

chr1 1000 var1 A AC,ACTT ....

VEP will convert both to an insertion of "C" internally and match the
alleles to report the frequencies.

However, I can foresee situations where this will slip up: VEP only does
the transformation if REF and every ALT allele on the line start with the
same base. So if one individual in ExAC had a SNP at that position, e.g.

chr1 1000 var1 A AC,G

the transformation wouldn't happen, and the alleles wouldn't get matched up.
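
As a rough illustration of the matching described above, and of the case
where it fails, here is a small Python sketch. It is not VEP's actual code
(VEP is written in Perl); the function names, the tuple layout and the use
of "-" for an empty allele are assumptions made for this example only.

```python
def strip_shared_first_base(pos, ref, alts):
    """Drop the leading base only if REF and ALL ALTs share it.

    Hypothetical helper: mimics the line-level transformation described
    in this message, not VEP's real implementation.
    """
    alleles = [ref, *alts]
    is_unbalanced = len({len(a) for a in alleles}) > 1
    shares_first_base = len({a[0] for a in alleles}) == 1
    if not (is_unbalanced and shares_first_base):
        # e.g. a SNP among the ALTs: the line is left untouched
        return pos, ref, list(alts)
    trimmed = [a[1:] or "-" for a in alleles]
    return pos + 1, trimmed[0], trimmed[1:]


def alt_is_matched(input_record, exac_record):
    """True if the (single) input ALT is found among the ExAC ALTs
    after both records have been transformed the same way."""
    in_pos, in_ref, in_alts = strip_shared_first_base(*input_record)
    ex_pos, ex_ref, ex_alts = strip_shared_first_base(*exac_record)
    return (in_pos, in_ref) == (ex_pos, ex_ref) and in_alts[0] in ex_alts


user = (1000, "A", ["AC"])
print(strip_shared_first_base(*user))                      # (1001, '-', ['C'])
print(alt_is_matched(user, (1000, "A", ["AC", "ACTT"])))   # True
print(alt_is_matched(user, (1000, "A", ["AC", "G"])))      # False
```

In the last call the "G" allele stops the ExAC line from being transformed,
so its insertion stays as A/AC at 1000 and no longer lines up with the
transformed input.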

We do actually have a command line flag to attempt to deal with this sort
of scenario, --minimal (see
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_minimal).
This converts each ALT allele to its minimal representation by comparing
it with the REF allele, independently of any other ALTs, before processing;
the results are then merged back together on the same VCF output line.
This option was created in collaboration with the guys from ExAC to
deal with some of the more horrendous VCF lines that appear from the
project. However, currently the ExAC plugin is not configured to take
advantage of this, though in theory I believe it could be.
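
To make the idea of a per-ALT minimal representation concrete, here is a
Python sketch under stated assumptions: it trims sequence shared between REF
and a single ALT from both ends, one REF/ALT pair at a time. The function
names and the "-" placeholder are hypothetical, and this is neither VEP's
implementation of --minimal nor something the ExAC plugin currently does.

```python
def minimal_representation(pos, ref, alt):
    """Trim bases shared by REF and one ALT, ignoring any other ALTs."""
    # Trim shared leading bases, shifting the start coordinate.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    # Trim shared trailing bases; the start coordinate is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"


def minimal_per_alt(pos, ref, alts):
    """Minimise each REF/ALT pair independently of the others."""
    return [minimal_representation(pos, ref, alt) for alt in alts]


exac = minimal_per_alt(1000, "A", ["AC", "G"])
print(exac)                                              # [(1001, '-', 'C'), (1000, 'A', 'G')]
print(minimal_representation(1000, "A", "AC") in exac)   # True
```

Because each ALT is reduced on its own, the SNP on the same line no longer
prevents the insertion from being matched.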

Anyway, hope all that is of some help.

Regards

Will McLaren
Ensembl Variation

On 17 August 2015 at 16:20, Joseph A Prinz <jp102 at duke.edu> wrote:

> Hi Ensembl-dev!
>
> I am curious to know how VEP handles annotating multiple alternative
> alleles when processing a VCF file. Specifically, I am interested in
> knowing more about the workings of the ExAC plugin in this regard; but more
> broadly, I am interested to know how VEP deals with the potential
> ambiguities inherent in dealing with VCF formatted multiple alternatives.
> Did I miss this in the documentation?
>
> My apologies if this subject has been discussed before.
>
> Thanks!
> Joey