[ensembl-dev] VEP Alleles and ALTs

Will McLaren wm2 at ebi.ac.uk
Thu Jun 1 16:04:17 BST 2017


Hi Nicolas,

This is a long-standing issue in converting between the way variants are
described in VCF and the way they are described in Ensembl, and therefore
in VEP. It's discussed in part in [1].

By default, the leading base is trimmed from all alleles (with the start
coordinate adjusted accordingly) if and only if it is the same across all
REF and ALTs; otherwise it remains. You may force VEP to treat each REF/ALT
pair as a separate variant and trim identical sequence from both (which may
be more than one base) using --minimal [2]. This is not the default
behaviour as it may lead to some confusing coordinate changes.
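
As a rough illustration of the two behaviours, here is a minimal Python
sketch. It follows the rule as worded above and is not VEP's own code; the
function names are made up for this example. Writing an empty allele as
"-" matches what Nicolas reports for his first line. For the per-pair
trimming, removing shared leading sequence first and shared trailing
sequence second is an assumption on my part.

```python
def trim_shared_first_base(pos, ref, alts):
    """Default rule as described above: drop the first base from REF and
    every ALT only when all of them start with the same base."""
    alleles = [ref] + list(alts)
    if ref and all(a[:1] == ref[:1] for a in alleles):
        pos += 1  # the start coordinate moves past the trimmed base
        alleles = [a[1:] or "-" for a in alleles]  # empty allele -> "-"
    return pos, alleles[0], alleles[1:]


def trim_pair_minimal(pos, ref, alt):
    """Per-pair trimming in the spirit of --minimal: treat one REF/ALT pair
    on its own and remove sequence that is identical in both.
    Assumption: leading sequence is trimmed before trailing sequence."""
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"


if __name__ == "__main__":
    # The two VCF lines from the original question.
    print(trim_shared_first_base(69469, "ACAATT", ["A", "ACA"]))
    print(trim_shared_first_base(69469, "ACAATT", ["A", "ACA", "T"]))
    # Each REF/ALT pair of the second line handled separately.
    for alt in ["A", "ACA", "T"]:
        print(trim_pair_minimal(69469, "ACAATT", alt))
```

With the first input line the shared leading A is removed and the alleles
become "-" and "CA"; with the second line the T allele breaks the condition
and nothing is trimmed, which is the behaviour Nicolas observed.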

To track which allele ends up where, the best solution is to use
--allele_number [3]; this adds the index of the relevant allele from your
input to the output, regardless of how VEP modifies that allele.
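
A small Python sketch of how that index could be used to get back to the
original ALT strings. The assumptions here: the index appears in the CSQ
annotation as a field named ALLELE_NUM, it is 1-based for ALT alleles with
0 meaning the reference, and the field order is taken from the "Format:"
string in the CSQ header line of your own output. Please check these
against your VEP version. The function name and the two-field format used
in the example are hypothetical and cut down for illustration.

```python
def csq_entries_by_alt(alt_field, csq_value, csq_format):
    """Group CSQ entries under the original ALT allele they refer to.

    alt_field  -- ALT column of the VCF line, e.g. "A,ACA"
    csq_value  -- value of the CSQ INFO key (comma-separated entries)
    csq_format -- pipe-separated field names from the CSQ header line
    """
    names = csq_format.split("|")
    alts = alt_field.split(",")
    grouped = {alt: [] for alt in alts}
    for entry in csq_value.split(","):
        record = dict(zip(names, entry.split("|")))
        number = int(record["ALLELE_NUM"])  # assumed field name
        if number == 0:
            continue  # assumed to mean the reference allele
        grouped[alts[number - 1]].append(record)
    return grouped


if __name__ == "__main__":
    # Hypothetical, cut-down CSQ with only two fields per entry.
    result = csq_entries_by_alt("A,ACA", "-|1,CA|2", "Allele|ALLELE_NUM")
    for alt, records in result.items():
        print(alt, [r["Allele"] for r in records])
```

The point of the design is that the lookup never compares allele strings,
so it does not matter whether VEP reports "-" or "A" for the same ALT.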

HTH

Will McLaren
Ensembl Variation

[1]: http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf
[2]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_minimal
[3]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_allele_number

On 1 June 2017 at 15:53, Nicolas Thierry-Mieg <
Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:

> Hello,
>
> I am trying to systematically match VEP consequences (based on the VEP
> "Allele" field) to the correct ALT allele. This is a lot harder than it
> sounds, and gets really tricky with indels and/or when the VCF has several
> ALT alleles (on a single line).
>
> Here is an example input VCF:
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
> 1       69469   .       ACAATT  A,ACA   .       PASS
> 1       69469   .       ACAATT  A,ACA,T .       PASS
>
> For the first line, VEP uses "-" and "CA" for "Allele"; but for the second
> line they are "A", "ACA" and "T", although the first two ALT alleles are
> the same as in line 1! This shows that the content of the "Allele" field
> depends on the whole list of ALT alleles in the VCF...
>
>
> To make a long story short, I end up with the following rule to construct
> the "Allele" field from VEP's CSQ:
> if the first nucleotides of the REF allele and of all the ALT alleles are
> the same, then this nucleotide is omitted from VEP's "Allele" field.
>
> Is this correct?
>
> Note that for reverse-engineering and testing this, I used an older VEP
> release (v81). Perhaps my rule is no longer valid... I wanted to test with
> the latest VEP version, but I'm having issues installing it, as discussed
> in another thread.
>
> Regards,
> Nicolas