[ensembl-dev] VEP breaking VCF format due to whitespace in INFO field
Noah Reinhardt
noah.reinhardt at sickkids.ca
Tue Oct 25 15:58:44 BST 2016
Hello,
I have a VCF file that I have annotated with Annovar, and I have been experimenting with adding VEP's annotations to it. I am running into trouble because Annovar occasionally introduces spaces into the INFO field, and VEP does not handle these correctly.
I am aware that VEP only supports VCF format 4.0 (source: http://useast.ensembl.org/info/docs/tools/vep/vep_formats.html), which does not allow spaces in the INFO field (alphanumeric string, see http://www.internationalgenome.org/wiki/Analysis/vcf4.0/). Thus, Annovar's annotations and VEP are technically incompatible, but I would like to explore using both.
I have attached a single-line VCF with very minimal information (space.vcf). In it, one of the annotations in the INFO field contains a space ("VT=nonsynonymous SNV"). If I attempt to decompose the VCF and annotate it with VEP (see attached script "vep.sh"), VEP will add its annotations to a "CSQ" field, but will interpret the space as the end of all INFO annotations, and will convert the space into a tab. Of course, this will introduce a new column, which breaks the rest of my tooling. See below:
Original variant:
1 877831 rs6672356 T C 1413.77 PASS VT=nonsynonymous SNV GT:AD:DP:GQ:PL 1/1:0,48:48:99:1442,144,0
Annotated variant:
1 877831 rs6672356 T C 1413.77 PASS VT=nonsynonymous;CSQ=<vep info here...><new TAB here>SNV GT:AD:DP:GQ:PL 1/1:0,48:48:99:1442,144,0
VEP introduces a tab before "SNV", which creates a new column. It looks as though VEP is splitting columns on any whitespace (which includes spaces) rather than on tabs only. Are there any solutions you can recommend? At the moment I am replacing spaces with underscores before running VEP, but this feels inelegant. Any suggestions are appreciated.
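
For reference, the underscore workaround can be sketched in Python roughly as follows. This is only an illustration: sanitize_info is a hypothetical helper name, not part of VEP or Annovar, and the record is rebuilt from the example above on the assumption that the real file is tab-delimited with 10 columns. The two column counts only show how tab-splitting and whitespace-splitting differ on this line; they are not taken from VEP's actual parser.

```python
def sanitize_info(line: str, replacement: str = "_") -> str:
    """Replace spaces in the INFO column (8th column) of one VCF line.

    Hypothetical helper for illustration; header lines are left untouched.
    """
    if line.startswith("#"):
        return line  # meta-information and header lines may legitimately contain spaces
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")  # split on tabs only
    if len(fields) >= 8:
        fields[7] = fields[7].replace(" ", replacement)
    return "\t".join(fields) + newline


# The example variant, assuming tab-delimited columns in the real file.
record = "\t".join([
    "1", "877831", "rs6672356", "T", "C", "1413.77", "PASS",
    "VT=nonsynonymous SNV", "GT:AD:DP:GQ:PL", "1/1:0,48:48:99:1442,144,0",
])

# Tab-splitting sees 10 columns; whitespace-splitting sees 11.
print(len(record.split("\t")), len(record.split()))

cleaned = sanitize_info(record)
print(cleaned.split("\t")[7])
```

Applied to every line before annotation, this keeps the column count stable under either splitting rule, at the cost of changing the annotation values (the underscore would have to be mapped back afterwards if the original text matters).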
Thanks,
Noah Reinhardt
Research Trainee in Computational Medicine
The Hospital for Sick Children
-------------- next part --------------
A non-text attachment was scrubbed...
Name: space.vcf
Type: text/x-vcard
Size: 30600 bytes
Desc: space.vcf
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161025/4e1fd768/attachment.vcf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: space-output.vcf
Type: text/x-vcard
Size: 32881 bytes
Desc: space-output.vcf
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161025/4e1fd768/attachment-0001.vcf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vep.sh
Type: application/octet-stream
Size: 597 bytes
Desc: vep.sh
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161025/4e1fd768/attachment.obj>