[ensembl-dev] VEP: minimal representation

Joseph A Prinz jp102 at duke.edu
Mon Aug 17 17:16:28 BST 2015


Hi Will,

Thank you for your excellent explanation--this helps quite a bit. I hope to make the time to sit down with the plugin and the API to better understand their workings.

Best,
Joey
________________________________
From: dev-bounces at ensembl.org <dev-bounces at ensembl.org> on behalf of Will McLaren <wm2 at ebi.ac.uk>
Sent: Monday, August 17, 2015 11:39 AM
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP: minimal representation

Hi Joey,

VEP treats each individual ALT allele from an input VCF separately; this applies to calling consequence types and to handling the data in plugins like the ExAC one.

The ExAC plugin will only report the frequency of the currently processed allele (if available in the ExAC VCF). This can involve a bit of internal gymnastics when insertions or deletions are being considered; see http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf for how VEP handles VCF unbalanced variants. Within the ExAC plugin the same allele/coord transformation is applied, so if for example your input is:

chr1 1000 ins1 A AC

and in the ExAC VCF you have

chr1 1000 var1 A AC,ACTT ....

VEP will internally convert both your input allele and the matching ExAC allele (AC) to an insertion of "C", then match them up to report the frequency.

However, I can foresee situations where this will slip up; VEP only does the transformation if REF and every ALT allele on the line share the same first base. So if one individual in ExAC had a SNP at that position, e.g.

chr1 1000 var1 A AC,G

the transformation wouldn't happen, and the alleles wouldn't get matched up.
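
To make the two cases above concrete, here is a rough Python sketch of the rule as described in this message. It is not VEP's or the ExAC plugin's actual code (both are Perl); the function names, the tuple keys and the use of "-" for an empty allele are assumptions made for illustration only.

```python
def trim_shared_first_base(pos, ref, alts):
    """Strip the leading base from every allele, but only when the line is
    unbalanced (some ALT differs in length from REF) and REF plus all ALTs
    start with the same base. Hypothetical helper, not a VEP API call."""
    alleles = [ref] + list(alts)
    unbalanced = any(len(a) != len(ref) for a in alts)
    shared_first = all(a[:1] == ref[:1] for a in alleles)
    if unbalanced and shared_first:
        pos += 1
        alleles = [a[1:] or "-" for a in alleles]  # "-" marks an empty allele
    return pos, alleles[0], alleles[1:]


def allele_keys(pos, ref, alts):
    """One (pos, ref, alt) key per ALT allele, after the line-level trim."""
    pos, ref, alts = trim_shared_first_base(pos, ref, alts)
    return {(pos, ref, alt) for alt in alts}


# Input line: chr1 1000 ins1 A AC
query = allele_keys(1000, "A", ["AC"])

# ExAC line where all alleles share the first base: A -> AC,ACTT
exac_ok = allele_keys(1000, "A", ["AC", "ACTT"])

# ExAC line that also carries a SNP: A -> AC,G
exac_snp = allele_keys(1000, "A", ["AC", "G"])

matched_ok = query & exac_ok    # the insertion of "C" is found
matched_snp = query & exac_snp  # nothing matches: the ExAC line was not trimmed

print(matched_ok)
print(matched_snp)
```

In the second case the input is still trimmed to an insertion of "C" at 1001, while the ExAC alleles stay as A/AC and A/G at 1000, so the keys never coincide.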

We do actually have a command line flag to attempt to deal with this sort of scenario, --minimal (see http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_minimal). This compares each ALT allele with the REF allele, independently of any other ALTs, and converts it to its minimal representation before processing; the results are then merged back together on the same VCF output line. This option was created in collaboration with the guys from ExAC to deal with some of the more horrendous VCF lines that appear from the project. However, the ExAC plugin is currently not configured to take advantage of this, though in theory I believe it could be.
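
As a rough illustration of the per-allele idea behind --minimal, the Python sketch below trims each REF/ALT pair on its own and then compares the results. It is an assumption-laden sketch, not VEP's implementation: the function names are hypothetical, the trimming order (shared leading bases first, then trailing) is assumed, and, as noted above, the ExAC plugin does not currently work this way.

```python
def minimise(pos, ref, alt):
    """Reduce one REF/ALT pair to a minimal form, ignoring any other ALTs.
    Hypothetical helper; the trimming order is an assumption."""
    # Drop bases shared at the start, shifting the position along.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    # Drop bases shared at the end; the start position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"  # "-" marks an empty allele


def minimal_keys(pos, ref, alts):
    """One minimised (pos, ref, alt) key per ALT allele on the line."""
    return {minimise(pos, ref, alt) for alt in alts}


# Input line: chr1 1000 ins1 A AC
query = minimal_keys(1000, "A", ["AC"])

# ExAC line with an insertion and a SNP at the same position: A -> AC,G
exac = minimal_keys(1000, "A", ["AC", "G"])

matched = query & exac
print(matched)  # the insertion of "C" matches even though a SNP shares the line
```

Because each ALT is reduced against REF independently, the SNP allele G no longer blocks the trimming of AC.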

Anyway, hope all that is of some help.

Regards

Will McLaren
Ensembl Variation

On 17 August 2015 at 16:20, Joseph A Prinz <jp102 at duke.edu> wrote:
Hi Ensembl-dev!

I am curious to know how VEP handles annotating multiple alternative alleles when processing a VCF file. Specifically, I am interested in knowing more about the workings of the ExAC plugin in this regard; but more broadly, I am interested to know how VEP deals with the potential ambiguities inherent in dealing with VCF formatted multiple alternatives. Did I miss this in the documentation?

My apologies if this subject has been discussed before.

Thanks!
Joey
_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/