[ensembl-dev] VEP Alleles and ALTs

Nicolas Thierry-Mieg Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr
Wed Jun 14 14:36:33 BST 2017


Hi Will,

thanks for your explanations.

You confirmed that I had the default VEP Allele-generation process 
figured out, and more importantly you set me on the correct path with 
--allele_number.
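
For the record, a minimal Python sketch of that default rule as described in this thread. The function name is made up for illustration and is not part of VEP, and symbolic alleles such as <DEL> or * are not handled:

```python
def default_vep_alleles(ref, alts):
    """Return the expected "Allele" string for each ALT, in ALT order.

    Assumption (from this thread): the first base is stripped only if
    REF and every ALT start with the same base; an allele left empty
    after stripping is written as "-".
    """
    alleles = [ref] + list(alts)
    if ref and all(a and a[0] == ref[0] for a in alleles):
        alleles = [a[1:] or "-" for a in alleles]
    return alleles[1:]


# The two example VCF lines from the original question:
print(default_vep_alleles("ACAATT", ["A", "ACA"]))       # ['-', 'CA']
print(default_vep_alleles("ACAATT", ["A", "ACA", "T"]))  # ['A', 'ACA', 'T']
```

This reproduces the "Allele" values reported for both example lines, but it only covers the default behaviour, not --minimal.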

I am actually working on the gnomAD VCF and the above process wasn't 
working well. But it turns out that they apparently used --minimal 
--allele_number; they have an ALLELE_NUM field, and with that I believe 
I am all set.

Posting this to the list as a heads-up for anyone working with the 
gnomAD VCF: use ALLELE_NUM and you should be fine.
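
A rough Python sketch of that matching, assuming ALLELE_NUM is 1-based (1 = first ALT) and that the CSQ field names have already been read from the "Format:" part of the CSQ header line. The function name and the CSQ values below are made up for illustration; they are not real VEP or gnomAD output:

```python
from collections import defaultdict


def csq_by_alt(csq_value, csq_fields, alts):
    """Group the entries of one CSQ value by the ALT allele they refer to.

    csq_value:  the CSQ string from INFO (entries separated by ",",
                fields within an entry separated by "|")
    csq_fields: field names in header order, must include "ALLELE_NUM"
    alts:       ALT alleles of the VCF line, in order
    """
    num_idx = csq_fields.index("ALLELE_NUM")
    grouped = defaultdict(list)
    for entry in csq_value.split(","):
        values = entry.split("|")
        # Assumption: ALLELE_NUM counts ALT alleles starting at 1.
        alt = alts[int(values[num_idx]) - 1]
        grouped[alt].append(dict(zip(csq_fields, values)))
    return dict(grouped)


# Illustrative values only: both entries carry the same "Allele" string,
# so only ALLELE_NUM tells them apart.
fields = ["Allele", "Consequence", "ALLELE_NUM"]
csq = "-|frameshift_variant|1,-|inframe_deletion|2"
result = csq_by_alt(csq, fields, ["A", "ACA"])
print(result["ACA"][0]["Consequence"])  # inframe_deletion
```

The point of going through ALLELE_NUM is that the "Allele" string never needs to be reconstructed from REF and ALT.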

Thanks again!

Regards,
Nicolas



On 06/01/2017 05:04 PM, Will McLaren wrote:
> Hi Nicolas,
>
> This is a long-standing issue with converting between variants as they
> are described in VCF and how they are described in Ensembl and therefore
> VEP. It's discussed in part in [1].
>
> By default, the leading base is trimmed from all alleles (with the start
> coordinate adjusted accordingly) if and only if it is the same across
> all REF and ALTs; otherwise it remains. You may force VEP to treat each
> REF/ALT pair as a separate variant and trim identical sequence from both
> (which may be more than one base) using --minimal [2]. This is not the
> default behaviour as it may lead to some confusing coordinate changes.
>
> To track which allele ends up where, the best solution is to use
> --allele_number [3]; this adds the index for the relevant allele from
> your input to the output, regardless of how it is modified by VEP.
>
> HTH
>
> Will McLaren
> Ensembl Variation
>
> [1]: http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf
> [2]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_minimal
> [3]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_allele_number
>
> On 1 June 2017 at 15:53, Nicolas Thierry-Mieg
> <Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:
>
>     Hello,
>
>     I am trying to systematically match VEP consequences (based on the
>     VEP "Allele" field) to the correct ALT allele. This is a lot harder
>     than it sounds, and gets really tricky with indels and/or when the
>     VCF has several ALT alleles (on a single line).
>
>     Here is an example input VCF:
>
>     #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>     1       69469   .       ACAATT  A,ACA   .       PASS
>     1       69469   .       ACAATT  A,ACA,T .       PASS
>
>     For the first line, VEP uses "-" and "CA" for "Allele"; but for the
>     second line they are "A", "ACA" and "T", although the first two ALT
>     alleles are the same as in line 1! This shows that the content of
>     the "Allele" field depends on the whole list of ALT alleles in the
>     VCF...
>
>
>     To make a long story short, I ended up with the following rule to
>     construct the "Allele" field from VEP's CSQ:
>     if the first nucleotides of the REF allele and of all the ALT
>     alleles are the same, then this nucleotide is omitted from VEP's
>     "Allele" field.
>
>     Is this correct?
>
>     Note that for reverse-engineering and testing this, I used an older
>     VEP release (v81). Perhaps my rule is no longer valid... I wanted to
>     test with the latest VEP version, but I'm having issues installing
>     it, as discussed in another thread.
>
>     Regards,
>     Nicolas
>


