[ensembl-dev] VCFtools parsing error in Ensembl Homo sapiens VCF files.
Anja Thormann
anja at ebi.ac.uk
Thu Oct 10 14:04:01 BST 2013
Hi Tjaart,
The VCF specification doesn't provide a way of representing variant consequences in a VCF file.
Until the specification supports storing consequence information, we have decided to store the
data as a list of strings. We store consequence data in our VCF files in a similar way to how
we store it in our GVF files; the GVF specification defines how consequence data should be
stored in that format (http://www.sequenceontology.org/resources/gvf.html).
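To make the list-of-strings layout concrete, here is a minimal Python sketch for reading a VE value. It assumes that entries are comma-separated (the header declares Number=.) and that each entry's fields are pipe-separated in the order given by `Format=SV|IDX|FT|FID` in the header line quoted further down. The names `VariantEffect`, `info_value` and `parse_ve` are hypothetical, and the field meanings are guesses based on the GVF Variant_effect attribute, not something the VCF header states.

```python
from typing import List, NamedTuple, Optional


# Field order as declared in the header: Format=SV|IDX|FT|FID.
# The meanings below are assumptions borrowed from GVF's Variant_effect.
class VariantEffect(NamedTuple):
    sv: str   # consequence term (assumed: Sequence Ontology term)
    idx: str  # allele index (assumed)
    ft: str   # feature type (assumed)
    fid: str  # feature ID (assumed)


def info_value(info: str, key: str) -> Optional[str]:
    """Return the raw value for `key` in a VCF INFO column, or None if absent."""
    for item in info.split(";"):
        name, has_value, value = item.partition("=")
        if name == key:
            # Flag entries have no "=value" part; report them as "".
            return value if has_value else ""
    return None


def parse_ve(value: str) -> List[VariantEffect]:
    """Split a VE value into one record per consequence entry.

    Assumes comma-separated entries whose fields are pipe-separated
    in SV|IDX|FT|FID order.
    """
    effects = []
    for entry in value.split(","):
        fields = entry.split("|")
        if len(fields) != len(VariantEffect._fields):
            raise ValueError(f"unexpected VE entry: {entry!r}")
        effects.append(VariantEffect(*fields))
    return effects
```

For a made-up INFO column such as `DB;VE=missense_variant|0|mRNA|TRANSCRIPT_1`, calling `parse_ve(info_value(info, "VE"))` gives one record whose `sv` is `missense_variant`. If the real files delimit entries differently, only the two `split` calls need to change.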
We are aware that this can cause problems with VCF parsers. In future releases we may change
this, for example by storing only the most severe consequence for each variant, which should be
easier to model with the current VCF specification.
In the meantime you could use the file Homo_sapiens.vcf instead, which includes the same data as
the file Homo_sapiens_incl_consequences.vcf except for the consequence information.
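The vcftools error quoted below is triggered by the Type declared in the VE header line. As a rough sketch, assuming Python 3 and INFO lines that list ID first and Type before Description (as in the quoted header), the check below lists INFO declarations whose Type falls outside the values the VCF 4.1 specification allows (Integer, Float, Flag, Character, String). It can be used to screen a file before handing it to a strict parser. `read_meta_lines` and `invalid_info_types` are hypothetical helper names, not part of vcftools.

```python
import gzip
import re
from typing import Iterable, List, Tuple

# INFO Type values permitted by the VCF 4.1 specification.
ALLOWED_INFO_TYPES = {"Integer", "Float", "Flag", "Character", "String"}

# Capture the ID and the first Type=... in an ##INFO=<...> line.
# Assumes ID comes first and Type precedes Description.
INFO_RE = re.compile(r"^##INFO=<ID=([^,>]+),.*?Type=([^,>]+)")


def read_meta_lines(path: str) -> List[str]:
    """Return the leading ## meta-information lines of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    lines = []
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # reached the #CHROM header line or the data
            lines.append(line.rstrip("\n"))
    return lines


def invalid_info_types(meta_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """List (ID, Type) pairs for INFO declarations with a non-standard Type."""
    bad = []
    for line in meta_lines:
        match = INFO_RE.match(line)
        if match and match.group(2) not in ALLOWED_INFO_TYPES:
            bad.append((match.group(1), match.group(2)))
    return bad
```

Running `invalid_info_types(read_meta_lines("Homo_sapiens_incl_consequences.vcf.gz"))` on the release-73 file would be expected to include `("VE", "ListOfString")`, matching the vcftools error.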
Best regards,
Anja
On 10 Oct 2013, at 13:20, Tjaart de Beer wrote:
> Hi,
>
> I am trying to look for specific rsids in the latest release of human vcf
> files from
>
> ftp://ftp.ensembl.org/pub/release-73/variation/vcf/homo_sapiens/
>
> I am using this file
>
> Homo_sapiens_incl_consequences.vcf.gz
>
> I installed the latest vcftools (0.1.11) and when I run the following command
>
> vcftools --vcf Homo_sapiens_incl_consequences.vcf --snps test.dat
>
> I get this error:
>
> VCFtools - v0.1.11
> (C) Adam Auton 2009
>
> Parameters as interpreted:
> --vcf Homo_sapiens_incl_consequences.vcf
> --snps test.dat
>
> Reading Index file.
> Building new index file.
> Error:Unknown Type in INFO meta-information:
> ##INFO=<ID=VE,Number=.,Type=ListOfString,Description="Effect that a sequence alteration has on a sequence feature that overlaps it.Format=SV|IDX|FT|FID">
>
> According to the vcftools page, the only valid options for Type are
> Integer, Float, Flag, Character and String, not ListOfString.
>
> This thread from the vcftools mailing list seems to confirm that
> ListOfString is not a valid option.
>
> http://sourceforge.net/mailarchive/message.php?msg_id=31150267
>
> Could this perhaps be a bug in the way the Ensembl vcf files are
> generated? Or am I missing something?
>
> --
> Dr. Tjaart de Beer
> Thornton group
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> United Kingdom
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/