[ensembl-dev] VEP annotation discrepancy
Andrew Carson
acarson at invivoscribe.com
Wed Apr 16 23:06:45 BST 2014
Hi,
I noticed some strange behavior when annotating variants from different variant-calling strategies. I call the same variant with two different strategies, but I get different annotations from VEP. The only difference between the two input records is in the INFO, FORMAT, and genotype fields of the VCF file. For one of the outputs, however, the --pick option has no effect, and there are other clear differences in the output.
Here is an example:
Input 1)
13 28602226 . AAG A . PASS END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL GT:AD 0/1:49
Input 2)
13 28602226 . AAG A . PASS ADP=72;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:72:72:2:66:91.67%:4.0594E-37:33:32:1:1:40:26
As you can see, the only differences are in column 8 (INFO) and the columns after it, and I don't think these should affect the annotation of the deletion.
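To make that comparison explicit, here is a minimal Python sketch that splits the two input records into their VCF columns and reports which columns differ, plus which INFO keys appear in only one record. The helper names (differing_columns, info_keys) are hypothetical and not part of VEP; the records are rebuilt as tab-separated lines from the fields shown above.

```python
# Standard VCF column names for a single-sample file.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL",
               "FILTER", "INFO", "FORMAT", "SAMPLE"]

# The two input records from this message, joined with tabs.
INPUT_1 = "\t".join([
    "13", "28602226", ".", "AAG", "A", ".", "PASS",
    "END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL",
    "GT:AD", "0/1:49",
])
INPUT_2 = "\t".join([
    "13", "28602226", ".", "AAG", "A", ".", "PASS",
    "ADP=72;WT=0;HET=1;HOM=0;NC=0",
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR",
    "0/1:255:72:72:2:66:91.67%:4.0594E-37:33:32:1:1:40:26",
])


def differing_columns(rec_a, rec_b):
    """Return the names of the VCF columns whose values differ."""
    fields_a = rec_a.split("\t")
    fields_b = rec_b.split("\t")
    return [name for name, a, b in zip(VCF_COLUMNS, fields_a, fields_b)
            if a != b]


def info_keys(rec):
    """Return the set of key names in the INFO column (column 8)."""
    info = rec.split("\t")[7]
    return {item.split("=", 1)[0] for item in info.split(";")}


if __name__ == "__main__":
    print(differing_columns(INPUT_1, INPUT_2))
    print(sorted(info_keys(INPUT_1) - info_keys(INPUT_2)))
    print(sorted(info_keys(INPUT_2) - info_keys(INPUT_1)))
```

Listing the INFO keys unique to each record narrows down which keys could plausibly influence how the record is interpreted. Whether any of them actually does is not something this sketch can tell you.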
When I run these two inputs through VEP using the following command:
perl /path/to/vep --fork 4 --no_stats --everything --cache -i input.vcf -o outpu.VEP.vcf --format vcf --force_overwrite --check_existing --check_alleles --vcf --no_progress --pubmed --gmaf --maf_1kg --pick
I get the following:
Output 1)
13 28602226 . AAG A . PASS END=28602228;HOMLEN=20;HOMSEQ=AGAGAGAGAGAGAGAGAGAG;SVLEN=-2;SVTYPE=DEL;CSQ=deletion|ENSG00000122025|ENST00000380987|Transcript|intron_variant&NMD_transcript_variant&feature_truncation||||||||||16/24||||||-1|||FLT3|HGNC||||nonsense_mediated_decay|ENSP00000370374|||||||||,deletion|ENSG00000122025|ENST00000241453|Transcript|intron_variant&feature_truncation||||||||||16/23||||||-1||YES|FLT3|HGNC||||protein_coding|ENSP00000241453||CCDS31953.1|||||||,deletion|ENSG00000122025|ENST00000537084|Transcript|intron_variant&feature_truncation||||||||||16/22||||||-1|||FLT3|HGNC||||protein_coding|ENSP00000438139|||||||||,deletion|ENSG00000122025|ENST00000380982|Transcript|intron_variant&feature_truncation||||||||||16/23||||||-1|||FLT3|HGNC||||protein_coding|ENSP00000370369||||||||| GT:AD 0/1:49
Output 2)
13 28602226 . AAG A . PASS ADP=72;WT=0;HET=1;HOM=0;NC=0;CSQ=-|ENSG00000122025|ENST00000241453|Transcript|intron_variant&feature_truncation||||||rs60462219||||16/23||||||-1||YES|FLT3|HGNC||||protein_coding|ENSP00000241453||CCDS31953.1|ENST00000241453.7:c.2053+87_2053+88delCT|||||| GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:72:72:2:66:91.67%:4.0594E-37:33:32:1:1:40:26
As you can see, --pick has no effect on the first output, which contains four annotations instead of one. In addition, its annotations differ slightly from the picked annotation in output 2. The first CSQ field changes from "deletion" to "-", while the consequence terms themselves are the same. And in output 2, I get the HGVS coding notation "ENST00000241453.7:c.2053+87_2053+88delCT", which is not provided in any annotation of output 1.
Is there a reason for this discrepancy? Is there something I can do to avoid these differences? Is my input or command incorrect in this instance?
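For checking this across a whole file rather than by eye, the following Python sketch counts the CSQ entries in an INFO string and pulls out the first pipe-delimited field of each. The function name csq_entries is hypothetical. The two INFO strings are shortened from the outputs above (only the first three fields of two entries from output 1, and of the single entry from output 2) purely to keep the example readable.

```python
def csq_entries(info):
    """Split the CSQ value of a VCF INFO string into a list of entries.

    Each entry is a list of pipe-delimited fields. Returns an empty
    list when the INFO string has no CSQ key.
    """
    for item in info.split(";"):
        if item.startswith("CSQ="):
            return [entry.split("|") for entry in item[len("CSQ="):].split(",")]
    return []


# Shortened INFO strings based on output 1 and output 2 (illustration only).
INFO_1_SHORT = ("SVLEN=-2;SVTYPE=DEL;"
                "CSQ=deletion|ENSG00000122025|ENST00000380987,"
                "deletion|ENSG00000122025|ENST00000241453")
INFO_2_SHORT = "NC=0;CSQ=-|ENSG00000122025|ENST00000241453"

if __name__ == "__main__":
    for label, info in (("output 1", INFO_1_SHORT), ("output 2", INFO_2_SHORT)):
        entries = csq_entries(info)
        first_fields = [fields[0] for fields in entries]
        print(label, len(entries), first_fields)
```

With --pick, a record carrying more than one CSQ entry is the unexpected case, so a count greater than one is a quick way to find the affected records.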
Any help would be greatly appreciated.
Thank you!
Andrew R. Carson, Ph.D.