[ensembl-dev] VEP Alleles and ALTs
Nicolas Thierry-Mieg
Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr
Wed Jun 14 14:36:33 BST 2017
Hi Will,
thanks for your explanations.
You confirmed that I had the default VEP Allele-generation process
figured out, and more importantly you set me on the correct path with
--allele_number.
I am actually working on the gnomAD VCF, and the above process wasn't
working well. But it turns out that they apparently used --minimal
--allele_number: they have an ALLELE_NUM field, and with that I believe
I am all set.
Posting this to the list as a heads-up for anyone working with the
gnomAD VCF: use ALLELE_NUM and you should be fine.
Thanks again!
Regards,
Nicolas
On 06/01/2017 05:04 PM, Will McLaren wrote:
> Hi Nicolas,
>
> This is a long-standing issue with converting between variants as they
> are described in VCF and how they are described in Ensembl and therefore
> VEP. It's discussed in part in [1].
>
> By default, the leading base is trimmed from all alleles (with the start
> coordinate adjusted accordingly) if and only if it is the same across
> all REF and ALTs; otherwise it remains. You may force VEP to treat each
> REF/ALT pair as a separate variant and trim identical sequence from both
> (which may be more than one base) using --minimal [2]. This is not the
> default behaviour as it may lead to some confusing coordinate changes.
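The per-pair trimming described for --minimal can be illustrated with a short Python sketch. It assumes that identical sequence is trimmed from both ends of each REF/ALT pair (leading sequence first, then trailing) and that an allele trimmed to nothing is written as "-"; VEP's own implementation may differ in such details. `minimal_pair` is a hypothetical helper, not part of VEP.

```python
def minimal_pair(pos, ref, alt):
    """Trim sequence shared by one REF/ALT pair, roughly as --minimal is described.

    Assumptions (not VEP's code): the shared prefix is trimmed first, then the
    shared suffix; an allele trimmed to nothing is written as "-".
    """
    # Trim the common leading sequence, moving the start coordinate forward.
    i = 0
    while i < len(ref) and i < len(alt) and ref[i] == alt[i]:
        i += 1
    ref, alt, pos = ref[i:], alt[i:], pos + i
    # Trim the common trailing sequence; the start coordinate is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"


# Each ALT of the second example line in this thread, treated as its own variant.
for alt in ["A", "ACA", "T"]:
    print(alt, minimal_pair(69469, "ACAATT", alt))
```

Under these assumptions all three ALTs of that line become "-", each at a different start coordinate (69470, 69472 and 69469). The "Allele" value alone then no longer identifies the ALT, which is where --allele_number helps.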
>
> To track which allele ends up where, the best solution is to use
> --allele_number [3]; this adds the index for the relevant allele from your
> input to the output, regardless of how it is modified by VEP.
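A minimal Python sketch of using ALLELE_NUM to tie CSQ entries back to the VCF ALTs follows. It assumes the CSQ field names have already been parsed from the "Format:" part of the CSQ header line, that entries are comma-separated with "|"-separated fields, and that ALLELE_NUM is 1-based (1 = first ALT allele). `csq_by_alt` and the CSQ value below are hypothetical, for illustration only.

```python
def csq_by_alt(csq_value, csq_fields, alts):
    """Group the entries of one CSQ INFO value by the VCF ALT allele they annotate.

    Assumes the VCF was annotated with --allele_number, so csq_fields contains
    ALLELE_NUM, and that ALLELE_NUM is 1-based (1 = first ALT allele).
    """
    num_idx = csq_fields.index("ALLELE_NUM")
    grouped = {alt: [] for alt in alts}
    for entry in csq_value.split(","):
        values = entry.split("|")
        allele_num = int(values[num_idx])
        if not 1 <= allele_num <= len(alts):
            raise ValueError(f"ALLELE_NUM {allele_num} out of range for ALTs {alts}")
        # Map back to the original VCF ALT, however VEP rewrote the Allele column.
        grouped[alts[allele_num - 1]].append(dict(zip(csq_fields, values)))
    return grouped


# Made-up CSQ value: the "Feature" IDs are placeholders, not real annotations.
fields = ["Allele", "Feature", "ALLELE_NUM"]
print(csq_by_alt("-|TX1|1,CA|TX1|2", fields, ["A", "ACA"]))
```

Because the lookup goes through ALLELE_NUM rather than the "Allele" column, it works the same whether the alleles were trimmed by the default rule or by --minimal.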
>
> HTH
>
> Will McLaren
> Ensembl Variation
>
> [1]: http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf
> [2]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_minimal
> [3]: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_allele_number
>
> On 1 June 2017 at 15:53, Nicolas Thierry-Mieg
> <Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:
>
> Hello,
>
> I am trying to systematically match VEP consequences (based on the
> VEP "Allele" field) to the correct ALT allele. This is a lot harder
> than it sounds, and gets really tricky with indels and/or when the
> VCF has several ALT alleles (on a single line).
>
> Here is an example input VCF:
>
> #CHROM POS ID REF ALT QUAL FILTER INFO
> 1 69469 . ACAATT A,ACA . PASS
> 1 69469 . ACAATT A,ACA,T . PASS
>
> For the first line, VEP uses "-" and "CA" for "Allele"; but for the
> second line they are "A", "ACA" and "T", although the first two ALT
> alleles are the same as in line 1! This shows that the content of
> the "Allele" field depends on the whole list of ALT alleles in the
> VCF...
>
>
> To make a long story short, I ended up with the following rule to
> predict the "Allele" field in VEP's CSQ:
> if the REF allele and all the ALT alleles start with the same
> nucleotide, then this nucleotide is omitted from VEP's
> "Allele" field.
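The rule above, which Will's reply confirms for VEP's default behaviour, can be written as a short Python function. `vep_default_alleles` is a hypothetical helper that only handles plain sequence alleles; it is a sketch of the rule as stated in this thread, not VEP's code.

```python
def vep_default_alleles(pos, ref, alts):
    """Predict VEP's default "Allele" values for one VCF line (rule from this thread).

    If REF and every ALT share their first base, that base is dropped from all
    alleles and the start moves forward by one; an emptied allele becomes "-".
    Symbolic or missing ALTs (e.g. "<DEL>", "*", ".") are not handled.
    """
    alleles = [ref] + list(alts)
    if all(a[0] == ref[0] for a in alleles):
        pos += 1
        alleles = [a[1:] or "-" for a in alleles]
    return pos, alleles[0], alleles[1:]


# The two example lines from the question.
print(vep_default_alleles(69469, "ACAATT", ["A", "ACA"]))
print(vep_default_alleles(69469, "ACAATT", ["A", "ACA", "T"]))
```

On the two example lines this reproduces the "-"/"CA" and "A"/"ACA"/"T" values reported above: adding the "T" ALT breaks the shared first base, so nothing is trimmed.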
>
> Is this correct?
>
> Note that for reverse-engineering and testing this, I used an older
> VEP release (v81). Perhaps my rule is no longer valid... I wanted to
> test with the latest VEP version, but I'm having issues installing
> it, as discussed in another thread.
>
> Regards,
> Nicolas
>