[ensembl-dev] VEP: --minimal doesn't seem to do its thing
Will McLaren
wm2 at ebi.ac.uk
Mon Sep 21 09:48:12 BST 2015
Hi Cyriac,
--minimal does not reformat your VCF, but rather uses the minimised alleles
and positions when calculating consequences. Apologies if this is not clear
in the documentation.
The VEP output for the first line (on GRCh37, using --minimal) is:
1 198498326 . ATATAT ATATATAT . . CSQ=AT|intron_variant|MODIFIER.....
compared to not using --minimal:
1 198498326 . ATATAT ATATATAT . . CSQ=TATATAT|intron_variant|MODIFIER.....
so it has been reformatted internally to an insertion of AT (Ensembl treats
this as REF="-", ALT="AT", see
http://www.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf).
The reason we don't reformat the VCF is that each ALT is treated
separately, which might result in a different effective POS being used for
each ALT. This is quite common in the ExAC project data, which is in fact
why we introduced the flag in its current form.
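As a rough illustration only (this is not VEP's actual code), the trimming
can be sketched in Python. The function names minimise and minimise_line are
made up for this example, and the order of trimming (leading bases first,
then trailing) is an assumption. Positions are reported Ensembl-style, with
"-" standing for an empty allele.

```python
def minimise(pos, ref, alt):
    """Trim sequence shared by REF and ALT (hypothetical helper, not VEP code)."""
    # Assumption: strip matching leading bases first, shifting the position.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    # Then strip matching trailing bases; the position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # An empty allele is written as "-", as in the Ensembl default format.
    return pos, ref or "-", alt or "-"


def minimise_line(pos, ref, alts):
    """Minimise each comma-separated ALT of one VCF line independently."""
    return [minimise(pos, ref, alt) for alt in alts.split(",")]


print(minimise_line(198498326, "ATATAT", "ATATATAT"))
# [(198498332, '-', 'AT')]
print(minimise_line(198498325, "AATATAT", "A,AATATATAT"))
# [(198498326, 'ATATAT', '-'), (198498332, '-', 'AT')]
```

In the second line the two ALTs end up at different effective positions
(198498326 and 198498332), so no single POS/REF could represent both. The
insertion start of 198498332 corresponds to the anchored VCF form T > TAT at
198498331.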
You can track which allele is which from the VCF input using
--allele_number.
Regards
Will McLaren
Ensembl Variation
On 18 September 2015 at 23:34, Cyriac Kandoth <kandoth at cbio.mskcc.org>
wrote:
> This might be limited to VCFs. I haven't tested it on other input formats.
> Here are 2 sample variants that I ran through VEP in offline cached mode...
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL
> 1 198498326 . ATATAT ATATATAT . . . GT:AD:DP 0/1:10,10:20 0/0:30,5:35
> 1 198498325 . AATATAT A,AATATATAT . . . GT:AD:DP 0/2:10,0,10:20 0/0:30,0,5:35
>
> Here is the full command used:
> perl variant_effect_predictor.pl --species homo_sapiens --assembly GRCh37
> --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs
> --symbol --numbers --domains --regulatory --canonical --protein --biotype
> --uniprot --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing
> --check_alleles --check_ref --total_length --allele_number --no_escape
> --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order
> canonical,tsl,biotype,rank,ccds,length --dir <vep_data> --fasta <ref_fasta>
> --input_file <input_vcf> --output_file <output_vcf>
>
> This is the way I would expect the output VCF to be re-formatted...
>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL
> 1 198498331 . T TAT . . . GT:AD:DP 0/1:10,10:20 0/0:30,5:35
> 1 198498325 . AATATAT A,AATATATAT . . . GT:AD:DP 0/2:10,0,10:20 0/0:30,0,5:35
>
> Thanks,
> Cyriac