[ensembl-dev] VEP breaking VCF format due to whitespace in INFO field

Noah Reinhardt noah.reinhardt at sickkids.ca
Tue Oct 25 15:58:44 BST 2016


Hello,

I have a VCF file that I have annotated with Annovar, and I have been experimenting with adding VEP's annotations to it.  This is proving difficult because Annovar occasionally introduces spaces into the INFO field, and VEP does not handle them correctly.

I am aware that VEP only supports VCF format 4.0 (source: http://useast.ensembl.org/info/docs/tools/vep/vep_formats.html), which does not allow spaces in the INFO field (the spec defines INFO as an alphanumeric string; see http://www.internationalgenome.org/wiki/Analysis/vcf4.0/).  Annovar's annotations and VEP are therefore technically incompatible, but I would still like to explore using both.

I have attached a minimal single-line VCF (space.vcf) in which one of the INFO annotations contains a space ("VT=nonsynonymous SNV").  When I decompose the VCF and annotate it with VEP (see the attached script "vep.sh"), VEP adds its annotations in a "CSQ" field, but it treats the space as the end of the INFO field and converts the space into a tab.  This introduces a new column, which breaks the rest of my tooling.  See below:

Original variant:

1       877831  rs6672356       T       C       1413.77 PASS    VT=nonsynonymous SNV    GT:AD:DP:GQ:PL  1/1:0,48:48:99:1442,144,0

Annotated variant:

1       877831  rs6672356       T       C       1413.77 PASS    VT=nonsynonymous;CSQ=<vep info here...><new TAB here>SNV     GT:AD:DP:GQ:PL  1/1:0,48:48:99:1442,144,0
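
The extra column can be reproduced by splitting the original line on any whitespace instead of on TAB alone.  The Python sketch below only illustrates that suspected behaviour; it is not VEP's actual parsing code, and the variable names are made up for the example.

```python
# Hypothetical illustration only; this is NOT VEP's parsing code.
# The original variant, rebuilt with real TAB separators between columns.
line = "\t".join([
    "1", "877831", "rs6672356", "T", "C", "1413.77", "PASS",
    "VT=nonsynonymous SNV", "GT:AD:DP:GQ:PL", "1/1:0,48:48:99:1442,144,0",
])

by_tab = line.split("\t")  # VCF columns are TAB-delimited
by_any = line.split()      # no argument: splits on spaces as well as TABs

print(len(by_tab), repr(by_tab[7]))              # INFO stays in one column
print(len(by_any), repr(by_any[7]), by_any[8])   # INFO is cut at the space
```

Splitting on TAB keeps the ten expected columns, while splitting on any whitespace yields eleven, with "SNV" pushed into a column of its own.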


VEP introduces a tab before "SNV", which creates a new column.  It appears that VEP is splitting columns on any whitespace (including spaces) rather than on TAB only.  Are there any solutions you can recommend?  At the moment I am replacing spaces with underscores, but this seems inelegant.  Any suggestions are appreciated.
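
For reference, the underscore workaround could look roughly like the Python sketch below.  It assumes the input is TAB-delimited with INFO as the eighth column, and it leaves header lines untouched; the names `sanitize_info` and `INFO_COL` are made up for this illustration and are not part of Annovar or VEP.

```python
INFO_COL = 7  # zero-based index of the INFO column in a VCF data line


def sanitize_info(line: str, replacement: str = "_") -> str:
    """Replace spaces in the INFO column of one VCF line (hypothetical helper)."""
    if line.startswith("#"):
        # Header lines may legitimately contain spaces (e.g. in Description).
        return line
    body = line.rstrip("\n")
    newline = line[len(body):]   # preserve the trailing newline, if any
    fields = body.split("\t")    # split on TAB only, never on spaces
    if len(fields) > INFO_COL:
        fields[INFO_COL] = fields[INFO_COL].replace(" ", replacement)
    return "\t".join(fields) + newline


example = "1\t877831\trs6672356\tT\tC\t1413.77\tPASS\tVT=nonsynonymous SNV\tGT:AD:DP:GQ:PL\t1/1:0,48:48:99:1442,144,0\n"
print(sanitize_info(example), end="")
```

Restricting the replacement to the INFO column means spaces elsewhere, such as in header descriptions, are left as they were.  Applying the function to every line of the file before running VEP would be one way to use it.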

Thanks,
Noah Reinhardt
Research Trainee in Computational Medicine
The Hospital for Sick Children

-------------- next part --------------
A non-text attachment was scrubbed...
Name: space.vcf
Type: text/x-vcard
Size: 30600 bytes
Desc: space.vcf
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161025/4e1fd768/attachment.vcf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: space-output.vcf
Type: text/x-vcard
Size: 32881 bytes
Desc: space-output.vcf
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161025/4e1fd768/attachment-0001.vcf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vep.sh
Type: application/octet-stream
Size: 597 bytes
Desc: vep.sh
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20161025/4e1fd768/attachment.obj>

