[ensembl-dev] VEP breaking VCF format due to whitespace in INFO field

Will McLaren wm2 at ebi.ac.uk
Tue Oct 25 16:26:41 BST 2016


Hi Noah,

Thanks for the report.

As you've noted, VEP currently treats all whitespace in a VCF as
column-separators. This issue will be resolved in a future version of VEP
[1], but for now your solution of translating spaces to underscores is
probably the best.
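
For anyone who wants to script that workaround, here is a minimal Python sketch. It is not part of VEP or Annovar. The function name sanitize_vcf_line and the choice of "_" as the replacement character are assumptions made for illustration. It splits each data line on TAB only, replaces spaces in the INFO column (the eighth column), and leaves header lines and all other columns untouched.

```python
import sys

INFO_COLUMN = 7  # zero-based index of the INFO column in a VCF data line


def sanitize_vcf_line(line: str, replacement: str = "_") -> str:
    """Return the line with spaces in its INFO column replaced.

    Hypothetical helper for illustration; not part of VEP or Annovar.
    """
    # Header lines (## meta-information and the #CHROM line) pass through unchanged.
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    # Split on TAB only, so spaces inside a column do not create new columns.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > INFO_COLUMN:
        fields[INFO_COLUMN] = fields[INFO_COLUMN].replace(" ", replacement)
    return "\t".join(fields) + newline


if __name__ == "__main__":
    # Read a VCF on stdin and write the sanitized VCF to stdout.
    for raw_line in sys.stdin:
        sys.stdout.write(sanitize_vcf_line(raw_line))
```

Saved under a name of your choosing (sanitize_info.py is only an example), it could be run as "python sanitize_info.py < input.vcf > output.vcf" before passing the output to VEP.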

Regards

Will McLaren
Ensembl Variation

[1] If you are interested, the new version is available for beta testing at
https://github.com/willmclaren/ensembl-vep , though note that it is currently
only suitable for experienced VEP users or those competent on the UNIX
command line.

On 25 October 2016 at 15:58, Noah Reinhardt <noah.reinhardt at sickkids.ca>
wrote:

> Hello,
>
> I have a VCF file that I have annotated with Annovar, and I have been
> experimenting with adding VEP's annotations to this VCF.  I am having some
> difficulty since Annovar will occasionally introduce spaces into the INFO
> field, and VEP is having a hard time handling these.
>
> I am aware that VEP only supports VCF format 4.0 (source:
> http://useast.ensembl.org/info/docs/tools/vep/vep_formats.html), which
> does not allow spaces in the INFO field (alphanumeric string, see
> http://www.internationalgenome.org/wiki/Analysis/vcf4.0/).  Thus,
> Annovar's annotations and VEP are technically incompatible, but I would
> like to explore using both.
>
> I have attached a single-line VCF with very minimal information
> (space.vcf).  In it, one of the annotations in the INFO field contains a
> space ("VT=nonsynonymous SNV").  If I attempt to decompose the VCF and
> annotate it with VEP (see attached script "vep.sh"), VEP will add its
> annotations to a "CSQ" field, but will interpret the space as the end of
> all INFO annotations, and will convert the space into a tab.  Of course,
> this will introduce a new column, which breaks the rest of my tooling.  See
> below:
>
> Original variant:
>
> 1       877831  rs6672356       T       C       1413.77 PASS    VT=nonsynonymous SNV    GT:AD:DP:GQ:PL  1/1:0,48:48:99:1442,144,0
>
> Annotated variant:
>
> 1       877831  rs6672356       T       C       1413.77 PASS    VT=nonsynonymous;CSQ=<vep info here...><new TAB here>SNV     GT:AD:DP:GQ:PL  1/1:0,48:48:99:1442,144,0
>
>
> VEP introduces a tab before "SNV", which creates a new column.  It appears
> as though VEP is scanning the columns by whitespace (which includes spaces)
> instead of by TAB only.  Are there any solutions you can recommend?  At the
> moment, I am replacing spaces with underscores, but this appears to be an
> inelegant solution.  Any suggestions are appreciated.
>
> Thanks,
> Noah Reinhardt
> Research Trainee in Computational Medicine
> The Hospital for Sick Children