[ensembl-dev] VEP Alleles and ALTs

Katarzyna Hutnik katarzyna.hutnik at oncology.ox.ac.uk
Thu Jun 1 16:07:02 BST 2017


Hi
Could you please kindly remove my address from your mailing list?
Thank you
Katarzyna

Katarzyna Hutnik
University of Oxford
Department of Oncology
Old Road Campus Research Building
OX3 7DQ
Oxford
01865 617 423
________________________________
From: Dev [dev-bounces at ensembl.org] on behalf of Will McLaren [wm2 at ebi.ac.uk]
Sent: 01 June 2017 4:04 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] VEP Alleles and ALTs

Hi Nicolas,

This is a long-standing issue with converting between variants as they are described in VCF and how they are described in Ensembl and therefore VEP. It's discussed in part in [1].

By default, the leading base is trimmed from all alleles (with the start coordinate adjusted accordingly) if and only if it is the same across all REF and ALTs; otherwise it remains. You may force VEP to treat each REF/ALT pair as a separate variant and trim identical sequence from both (which may be more than one base) using --minimal [2]. This is not the default behaviour as it may lead to some confusing coordinate changes.
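To illustrate the per-pair trimming that --minimal is described as doing, here is a rough Python sketch. It is not VEP's code. The function names are hypothetical, and the order of trimming (shared prefix first, then shared suffix) is an assumption. The sketch runs on the second example line from the original question. Each ALT is trimmed against REF independently, which can shift the start coordinate by a different amount for each allele.

```python
from typing import List, Tuple


def minimise_pair(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim sequence shared by a single REF/ALT pair (illustrative only).

    Assumption: the shared prefix is trimmed first (each base moves the
    start coordinate one position right), then any shared suffix; an
    allele left empty is written as "-".
    """
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"


def minimise_record(pos: int, ref: str, alts: List[str]) -> List[Tuple[int, str, str]]:
    """Treat each ALT as a separate variant, as --minimal is described to do."""
    return [minimise_pair(pos, ref, alt) for alt in alts]


# Second line of the example VCF in this thread: 1 69469 ACAATT A,ACA,T
for variant in minimise_record(69469, "ACAATT", ["A", "ACA", "T"]):
    print(variant)
```

This prints (69470, 'CAATT', '-'), (69472, 'ATT', '-') and (69469, 'ACAAT', '-'). All three ALTs become deletions at three different start positions, and all three have "-" as their allele. This shows how the coordinate changes can be confusing.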

To track which allele ends up where, the best solution is to use --allele_number [3]; this adds the index of the relevant allele from your input to the output, regardless of how it is modified by VEP.
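Below is a minimal Python sketch of how the allele index could be used downstream to group CSQ entries by input ALT. It makes two assumptions. First, the pipe-separated CSQ Format string has already been extracted from the VCF header. Second, that format contains a column named ALLELE_NUM, where 1 is the first ALT, 2 the second, and so on. The function name and the CSQ values are hypothetical and are not real VEP output.

```python
from typing import Dict, List


def csq_by_alt(csq_format: str, csq_value: str, alts: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Group the CSQ entries of one VCF record by input ALT allele.

    Assumes csq_format contains an ALLELE_NUM column, where 1 is the
    first ALT, 2 the second, and so on (0 would refer to REF).
    """
    fields = csq_format.split("|")
    grouped: Dict[str, List[Dict[str, str]]] = {alt: [] for alt in alts}
    for entry in csq_value.split(","):
        record = dict(zip(fields, entry.split("|")))
        index = int(record["ALLELE_NUM"])
        if 1 <= index <= len(alts):  # skip REF (0) or out-of-range values
            grouped[alts[index - 1]].append(record)
    return grouped


# Hypothetical Format and CSQ strings, shaped like the first example line.
fmt = "Allele|Consequence|ALLELE_NUM"
csq = "-|consequence_a|1,CA|consequence_b|2"
for alt, records in csq_by_alt(fmt, csq, ["A", "ACA"]).items():
    print(alt, [r["Allele"] for r in records])
```

With these inputs, the "-" entry is attached to ALT "A" and the "CA" entry to ALT "ACA". This matching does not depend on how VEP rewrote the allele strings.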

HTH

Will McLaren
Ensembl Variation

[1]: http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf
[2]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_minimal
[3]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_allele_number

On 1 June 2017 at 15:53, Nicolas Thierry-Mieg <Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:
Hello,

I am trying to systematically match VEP consequences (based on the VEP "Allele" field) to the correct ALT allele. This is a lot harder than it sounds, and gets really tricky with indels and/or when the VCF has several ALT alleles on a single line.

Here is an example input VCF:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       69469   .       ACAATT  A,ACA   .       PASS
1       69469   .       ACAATT  A,ACA,T .       PASS

For the first line, VEP reports "-" and "CA" in the "Allele" field, but for the second line it reports "A", "ACA" and "T", although the first two ALT alleles are the same as in line 1! This shows that the content of the "Allele" field depends on the whole list of ALT alleles in the VCF...


To make a long story short, I ended up with the following rule for predicting the "Allele" field in VEP's CSQ:
if the first nucleotide of the REF allele and of all the ALT alleles is the same, then this nucleotide is omitted from VEP's "Allele" field.
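The rule can be checked against the two example lines above with a short Python sketch. The function name is hypothetical. The sketch encodes only the rule as stated in this thread: a shared leading base is dropped from every allele, an emptied allele becomes "-", and the start moves one base right. It does not try to handle symbolic or "*" alleles, and it has not been checked against any particular VEP release.

```python
from typing import List, Tuple


def vep_alleles(pos: int, ref: str, alts: List[str]) -> Tuple[int, str, List[str]]:
    """Apply the default leading-base rule described in this thread."""
    alleles = [ref] + list(alts)
    shared_first_base = all(alleles) and len({a[0] for a in alleles}) == 1
    if not shared_first_base:
        return pos, ref, list(alts)
    # Drop the shared base from every allele; empty alleles become "-".
    trimmed = [a[1:] or "-" for a in alleles]
    return pos + 1, trimmed[0], trimmed[1:]


print(vep_alleles(69469, "ACAATT", ["A", "ACA"]))
print(vep_alleles(69469, "ACAATT", ["A", "ACA", "T"]))
```

The first call gives (69470, 'CAATT', ['-', 'CA']) and the second gives (69469, 'ACAATT', ['A', 'ACA', 'T']). These match the "Allele" values reported for the two lines.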

Is this correct?

Note that for reverse-engineering and testing this, I used an older VEP release (v81). Perhaps my rule is no longer valid... I wanted to test with the latest VEP version, but I'm having issues installing it, as discussed in another thread.

Regards,
Nicolas


_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
