[ensembl-dev] VEP: --minimal doesn't seem to do its thing

Will McLaren wm2 at ebi.ac.uk
Mon Sep 21 09:48:12 BST 2015


Hi Cyriac,

--minimal does not reformat your VCF, but rather uses the minimised alleles
and positions when calculating consequences. Apologies if this is not clear
in the documentation.

The VEP output for the first line (on GRCh37, using --minimal) is:

1       198498326       .       ATATAT  ATATATAT        .       .       CSQ=AT|intron_variant|MODIFIER.....

compared to not using --minimal

1       198498326       .       ATATAT  ATATATAT        .       .       CSQ=TATATAT|intron_variant|MODIFIER.....

so it has been reformatted internally to an insertion of AT (Ensembl treats
this as REF="-", ALT="AT", see
http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf).
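To make the difference concrete, here is a rough Python sketch of the two behaviours applied to the first line. It is not VEP's code: default_trim and minimal_trim are hypothetical helpers. The "drop the first base only if every allele shares it" rule is how VEP normally reads VCF alleles. The prefix-then-suffix trimming order, and advancing POS by the number of leading bases removed, are assumptions made for illustration.

```python
def default_trim(ref, alts):
    """Approximate default VCF handling: drop the first base only if
    REF and every ALT share it; an emptied allele becomes '-'."""
    alleles = [ref] + alts
    if all(a and a[0] == ref[0] for a in alleles):
        alleles = [a[1:] or "-" for a in alleles]
    return alleles[0], alleles[1:]


def minimal_trim(pos, ref, alt):
    """Hypothetical --minimal-style trimming of ONE REF/ALT pair.
    Assumption: shared leading bases are removed first (advancing pos),
    then shared trailing bases; an empty allele is written as '-'."""
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    pos, ref, alt = pos + i, ref[i:], alt[i:]
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    ref, alt = ref[:len(ref) - j], alt[:len(alt) - j]
    return pos, ref or "-", alt or "-"


print(default_trim("ATATAT", ["ATATATAT"]))           # ('TATAT', ['TATATAT'])
print(minimal_trim(198498326, "ATATAT", "ATATATAT"))  # (198498332, '-', 'AT')
```

The ALT alleles it produces, TATATAT and AT, match the CSQ allele strings shown above for the two runs.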

The reason we don't reformat the VCF is that each ALT is treated
separately, which can result in a different effective POS for each ALT.
This is quite common in the ExAC project data, which was in fact the reason
we introduced this flag in its current form.

You can track which consequence belongs to which ALT allele from the VCF
input using --allele_number.
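Applying the same kind of per-ALT trimming to the second (multi-allelic) line shows why a single rewritten VCF line isn't possible: the two ALTs end up at different positions. This is again only a sketch. trim_pair and minimise_record are hypothetical names, and prefix-first trimming is an assumption. Results are keyed by a 1-based allele number (1 = first ALT), the numbering that --allele_number reports.

```python
def trim_pair(pos, ref, alt):
    """Trim one REF/ALT pair: shared prefix first (advancing pos), then
    shared suffix; '-' marks an empty allele. Trimming order is assumed."""
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    pos, ref, alt = pos + i, ref[i:], alt[i:]
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    ref, alt = ref[:len(ref) - j], alt[:len(alt) - j]
    return pos, ref or "-", alt or "-"


def minimise_record(pos, ref, alts):
    """Minimise each ALT independently, keyed by 1-based allele number."""
    return {n: trim_pair(pos, ref, alt) for n, alt in enumerate(alts, start=1)}


print(minimise_record(198498325, "AATATAT", ["A", "AATATATAT"]))
# {1: (198498326, 'ATATAT', '-'), 2: (198498332, '-', 'AT')}
```

One input line yields a deletion starting at 198498326 and an insertion at 198498332, so there is no single POS/REF/ALT that could be written back for both alleles.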

Regards

Will McLaren
Ensembl Variation

On 18 September 2015 at 23:34, Cyriac Kandoth <kandoth at cbio.mskcc.org> wrote:

> This might be limited to VCFs; I haven't tested it on other input formats.
> Here are 2 sample variants that I ran through VEP in offline cached mode...
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR   NORMAL
> 1       198498326       .       ATATAT  ATATATAT        .       .       .       GT:AD:DP        0/1:10,10:20    0/0:30,5:35
> 1       198498325       .       AATATAT A,AATATATAT     .       .       .       GT:AD:DP        0/2:10,0,10:20  0/0:30,0,5:35
>
> Here is the full command used:
> perl variant_effect_predictor.pl --species homo_sapiens --assembly GRCh37 \
> --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs \
> --symbol --numbers --domains --regulatory --canonical --protein --biotype \
> --uniprot --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing \
> --check_alleles --check_ref --total_length --allele_number --no_escape \
> --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order \
> canonical,tsl,biotype,rank,ccds,length --dir <vep_data> --fasta <ref_fasta> \
> --input_file <input_vcf> --output_file <output_vcf>
>
> This is the way I would expect the output VCF to be re-formatted...
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR   NORMAL
> 1       198498331       .       T       TAT     .       .       .       GT:AD:DP        0/1:10,10:20    0/0:30,5:35
> 1       198498325       .       AATATAT A,AATATATAT     .       .       .       GT:AD:DP        0/2:10,0,10:20  0/0:30,0,5:35
>
> Thanks,
> Cyriac