[ensembl-dev] VEP 93 some string fields look like numbers in the json output

Michael Yourshaw myourshaw at gmail.com
Tue Dec 18 23:46:12 GMT 2018


I'm using VEP version 93, locally installed with the cache.

This VCF input in the NANS gene:
        chr9 98056788 rs3199064 T G ...

yields the VEP json output:
        ...,"gene_symbol":NaN,...

This breaks a Python script that expects a string: Python's json module
parses an unquoted NaN as the float nan, and casting that to str gives
"nan" rather than the gene symbol.
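
A minimal reproduction of the Python side, assuming the standard library
json module with default settings. The input string is a one-key fragment
modeled on the output quoted above, not a complete VEP record:

```python
import json
import math

# Fragment modeled on the quoted output; all other keys are omitted.
raw = '{"gene_symbol":NaN}'

# By default json.loads accepts the bare token NaN, although it is not
# valid JSON according to the JSON specification.
record = json.loads(raw)
symbol = record["gene_symbol"]

print(type(symbol).__name__)   # float
print(math.isnan(symbol))      # True
print(repr(str(symbol)))       # 'nan'
```

The parser raises no error, so the bad value only shows up later, wherever
the script first treats gene_symbol as text.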

A similar problem can happen when the transcript_id looks like a number:
        chr9 98056788 rs3199064 T G
yields
        ...,"transcript_id":3494,...
In this latter case, casting transcript_id to a string recovers the value,
but having to cast at all is a trap for the unwary.
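
Until the writer is fixed, one defensive sketch on the reading side is to
reject bare NaN outright and cast known string fields back to str. The names
STRING_FIELDS, reject_constant, coerce_strings and load_vep_line are
hypothetical helpers for illustration, not part of VEP, and the field list
is an assumption to adjust to your own schema:

```python
import json

# Assumption: fields that should always be strings in the parsed record.
STRING_FIELDS = {"gene_symbol", "transcript_id"}


def reject_constant(name):
    # json.loads calls this hook for the bare tokens NaN, Infinity, -Infinity.
    raise ValueError(f"non-standard JSON constant: {name}")


def coerce_strings(obj):
    """Recursively cast integer values of STRING_FIELDS to str."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if key in STRING_FIELDS and is_int:
                out[key] = str(value)
            else:
                out[key] = coerce_strings(value)
        return out
    if isinstance(obj, list):
        return [coerce_strings(item) for item in obj]
    return obj


def load_vep_line(line):
    """Parse one JSON record, failing loudly on NaN and fixing numeric IDs."""
    return coerce_strings(json.loads(line, parse_constant=reject_constant))


fixed = load_vep_line('{"transcript_id":3494,"start":98056788}')
print(fixed)
```

This only helps with the numeric transcript_id case. For the NaN case the
original symbol is not present in the JSON shown above, so the sketch raises
an error instead of guessing; the symbol would have to be recovered from
another field or from the input.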

The JSON writer seems to be too clever for its own good: values that don't
look like numbers, such as "ENST00000495319" and "IGHA2", get the expected
double-quoting in the JSON, so quoting appears to depend on what a value
looks like rather than on the field it belongs to.

Indeed it seems odd that NANS becomes float NaN, whereas NANP and other
NAN* genes remain proper json strings.

ॐ
Michael Yourshaw
myourshaw at gmail.com <myourshaw at ucla.edu>
