[ensembl-dev] VCFtools parsing error in Ensembl Homo sapiens VCF files.

Anja Thormann anja at ebi.ac.uk
Thu Oct 10 14:04:01 BST 2013


Hi Tjaart,

The VCF specification does not provide a way to represent variant consequences in a VCF file.
Until it does, we decided to store the consequence data as a list of strings, similar to how
we store it in our GVF files. The GVF format does specify how to store consequence data
(http://www.sequenceontology.org/resources/gvf.html).
We are aware that this can cause problems with VCF parsers. In future releases we could
store only the most severe consequence for each variant, which should be easier
to model with the current VCF specification.
In the meantime you could use the file Homo_sapiens.vcf instead, which includes the same data as
the file Homo_sapiens_incl_consequences.vcf except for the consequence information.
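
If the consequence data is needed, one possible workaround (not something tested in
this thread, so treat it as a sketch) is to rewrite the header so that any INFO Type
outside the five listed on the vcftools page is declared as String. The names
fix_info_line and rewrite_vcf are hypothetical helpers, and whether vcftools then
accepts the VE values themselves is an assumption that would need checking.

```python
import gzip
import re

# INFO types listed as valid on the vcftools page quoted in this thread.
VALID_INFO_TYPES = {"Integer", "Float", "Flag", "Character", "String"}

# Captures the value of the first Type=... key in a meta-information line.
_TYPE_RE = re.compile(r"Type=([^,>]+)")


def fix_info_line(line):
    """Return the line with an unsupported INFO Type replaced by String.

    Lines that are not ##INFO headers, or that already use a valid Type,
    are returned unchanged.
    """
    if not line.startswith("##INFO="):
        return line
    match = _TYPE_RE.search(line)
    if match is None or match.group(1) in VALID_INFO_TYPES:
        return line
    return line[:match.start(1)] + "String" + line[match.end(1):]


def rewrite_vcf(src_path, dst_path):
    """Copy a VCF (plain or gzipped) to dst_path, patching only header lines."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        for line in src:
            # Data lines are copied verbatim; only "##" lines are inspected.
            dst.write(fix_info_line(line) if line.startswith("##") else line)


# Example call (file names as used in this thread; the output name is made up):
# rewrite_vcf("Homo_sapiens_incl_consequences.vcf.gz", "patched.vcf")
```

The file is streamed line by line, so the whole VCF never has to fit in memory.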

Best regards,
Anja

On 10 Oct 2013, at 13:20, Tjaart de Beer wrote:

> Hi,
> 
> I am trying to look for specific rsids in the latest release of human vcf
> files from
> 
> ftp://ftp.ensembl.org/pub/release-73/variation/vcf/homo_sapiens/
> 
> I am using this file
> 
> Homo_sapiens_incl_consequences.vcf.gz
> 
> I installed the latest vcftools (0.1.11) and when I run the following command
> 
> vcftools --vcf Homo_sapiens_incl_consequences.vcf --snps test.dat
> 
> I get this error:
> 
> VCFtools - v0.1.11
> (C) Adam Auton 2009
> 
> Parameters as interpreted:
>        --vcf Homo_sapiens_incl_consequences.vcf
>        --snps test.dat
> 
> Reading Index file.
> Building new index file.
> Error:Unknown Type in INFO meta-information:
> ##INFO=<ID=VE,Number=.,Type=ListOfString,Description="Effect that a
> sequence alteration has on a sequence feature that overlaps
> it.Format=SV|IDX|FT|FID">
> 
> According to the vcftools page, the only valid options for Type are
> Integer, Float, Flag, Character, and String; ListOfString is not among them.
> 
> This thread from the vcftools mailing list seems to confirm that
> ListOfString is an invalid option.
> 
> http://sourceforge.net/mailarchive/message.php?msg_id=31150267
> 
> Could this perhaps be a bug in the way the Ensembl vcf files are
> generated? Or am I missing something?
> 
> --
> Dr. Tjaart de Beer
> Thornton group
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> United Kingdom