[ensembl-dev] VEP annotation discrepancy

Andrew Carson acarson at invivoscribe.com
Wed Apr 16 23:06:45 BST 2014


Hi,
I noticed some strange behavior when annotating variants produced by different variant-calling strategies. I call the same variant with two different strategies, but VEP annotates the two records differently. The only difference between the two input records is in the INFO, FORMAT, and genotype fields of the VCF file. For one of the records, the --pick option appears to have no effect, and there are other clear differences in the output.

Here is an example:

Input 1)
13      28602226        .     AAG     A .       PASS END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL     GT:AD   0/1:49

Input 2)
13      28602226        .     AAG     A .       PASS ADP=72;WT=0;HET=1;HOM=0;NC=0       GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:255:72:72:2:66:91.67%:4.0594E-37:33:32:1:1:40:26

As you can see, the two records differ only from column 8 (INFO) onward, and I don't think these fields should affect the annotation of the deletion.
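
To make the comparison concrete, here is a minimal Python sketch that reports which columns differ between the two input records and which INFO keys appear only in input 1. The helper names (differing_columns, info_keys) are hypothetical and not part of VEP. The records are split on any whitespace because the archive renders them with spaces; a real VCF file is tab-delimited.

```python
# Column names from the VCF specification, plus a single sample column.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "SAMPLE"]


def differing_columns(record_a, record_b):
    """Return the names of the columns whose values differ between two records."""
    fields_a = record_a.split()
    fields_b = record_b.split()
    return [name for name, a, b in zip(VCF_COLUMNS, fields_a, fields_b) if a != b]


def info_keys(record):
    """Return the set of INFO keys (column 8) of a VCF record."""
    info = record.split()[7]
    return {item.split("=", 1)[0] for item in info.split(";")}


# The two input records from this message.
input_1 = ("13 28602226 . AAG A . PASS "
           "END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL "
           "GT:AD 0/1:49")
input_2 = ("13 28602226 . AAG A . PASS "
           "ADP=72;WT=0;HET=1;HOM=0;NC=0 "
           "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR "
           "0/1:255:72:72:2:66:91.67%:4.0594E-37:33:32:1:1:40:26")

# Columns that differ between the two records.
print(differing_columns(input_1, input_2))
# INFO keys present in input 1 but not in input 2.
print(sorted(info_keys(input_1) - info_keys(input_2)))
```

The first seven columns (CHROM through FILTER) are identical, so any difference in annotation would have to come from the INFO, FORMAT, or sample columns.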

When I run these two inputs through VEP using the following command:

perl /path/to/vep --fork 4 --no_stats --everything --cache -i input.vcf -o output.VEP.vcf --format vcf --force_overwrite --check_existing --check_alleles --vcf --no_progress --pubmed --gmaf --maf_1kg --pick

I get the following:

Output 1)
13      28602226        .     AAG     A .       PASS END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL;CSQ=deletion|ENSG00000122025|ENST00000380987|Transcript|intron_variant&NMD_transcript_variant&feature_truncation||||||||||16/24||||||-1|||FLT3|HGNC||||nonsense_mediated_decay|ENSP00000370374|||||||||,deletion|ENSG00000122025|ENST00000241453|Transcript|intron_variant&feature_truncation||||||||||16/23||||||-1||YES|FLT3|HGNC||||protein_coding|ENSP00000241453||CCDS31953.1|||||||,deletion|ENSG00000122025|ENST00000537084|Transcript|intron_variant&feature_truncation||||||||||16/22||||||-1|||FLT3|HGNC||||protein_coding|ENSP00000438139|||||||||,deletion|ENSG00000122025|ENST00000380982|Transcript|intron_variant&feature_truncation||||||||||16/23||||||-1|||FLT3|HGNC||||protein_coding|ENSP00000370369|||||||||  GT:AD   0/1:49

Output 2)
13      28602226        .     AAG     A .       PASS ADP=72;WT=0;HET=1;HOM=0;NC=0;CSQ=-|ENSG00000122025|ENST00000241453|Transcript|intron_variant&feature_truncation||||||rs60462219||||16/23||||||-1||YES|FLT3|HGNC||||protein_coding|ENSP00000241453||CCDS31953.1|ENST00000241453.7:c.2053+87_2053+88delCT||||||   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:255:72:72:2:66:91.67%:4.0594E-37:33:32:1:1:40:26

As you can see, the first output does not appear to honor --pick, since it contains multiple annotations. In addition, its annotations differ slightly from the picked annotation in output 2: the first CSQ field changes from "deletion" to "-", and output 2 includes the HGVS coding notation "ENST00000241453.7:c.2053+87_2053+88delCT", which is missing from output 1.
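
A minimal Python sketch of how the number of CSQ entries per record can be checked. The helper name csq_entries is hypothetical and not part of VEP, and the CSQ entries below are abbreviated to their first five pipe-delimited fields for brevity; the remaining fields are in the full outputs above.

```python
def csq_entries(info):
    """Split the CSQ value of a VCF INFO string into per-transcript field lists."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            # Entries are comma-separated; fields within an entry are pipe-separated.
            return [entry.split("|") for entry in item[len("CSQ="):].split(",")]
    return []


# INFO columns of the two outputs, with each CSQ entry cut to five fields.
info_1 = ("END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL;"
          "CSQ=deletion|ENSG00000122025|ENST00000380987|Transcript|"
          "intron_variant&NMD_transcript_variant&feature_truncation,"
          "deletion|ENSG00000122025|ENST00000241453|Transcript|"
          "intron_variant&feature_truncation,"
          "deletion|ENSG00000122025|ENST00000537084|Transcript|"
          "intron_variant&feature_truncation,"
          "deletion|ENSG00000122025|ENST00000380982|Transcript|"
          "intron_variant&feature_truncation")
info_2 = ("ADP=72;WT=0;HET=1;HOM=0;NC=0;"
          "CSQ=-|ENSG00000122025|ENST00000241453|Transcript|"
          "intron_variant&feature_truncation")

for label, info in (("Output 1", info_1), ("Output 2", info_2)):
    entries = csq_entries(info)
    # Print the entry count, then the first field and transcript ID of each entry.
    print(label, len(entries), [(e[0], e[2]) for e in entries])
```

With --pick I would expect a single entry per record, which is what output 2 shows; output 1 still lists four transcripts.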

Is there a reason for this discrepancy? Is there something I can do to avoid these differences? Is my input or command incorrect in this instance?

Any help would be greatly appreciated.
Thank you!

Andrew R. Carson, Ph.D.
