[ensembl-dev] VEP incorrectly detects pileup, if the first line of a VCF is missing an ID

Will McLaren wm2 at ebi.ac.uk
Thu Oct 6 11:36:01 BST 2016


Hi Cyriac,

Thanks for the report.

Obviously, a quick workaround for now is to specify the input format
explicitly with "--format vcf". We'll fix the format detection in a future
version.

Regards

Will McLaren
Ensembl Variation

On 5 October 2016 at 19:17, Cyriac Kandoth <kandothc at mskcc.org> wrote:

> Hi Dev,
>
> This situation shouldn't ever arise, because the VCF spec requires the ID
> field to always be non-empty. Specifically: "If there is no identifier
> available, then the missing value '.' should be used".
>
> However, we all deal with crappy VCFs, so it's worthwhile to support them in
> code, or at least to fail gracefully. Attached is a sample VCF, and here is
> the command I used to reproduce the bug:
>
> /home/kandoth/perl/perl-5.22.2/bin/perl /home/kandoth/vep/variant_effect_predictor.pl \
>   --species homo_sapiens --assembly GRCh37 --offline --no_progress --no_stats \
>   --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype \
>   --canonical --protein --biotype --uniprot --polyphen b --gmaf --maf_1kg --maf_esp \
>   --regulatory --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing \
>   --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal \
>   --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length \
>   --dir /home/kandoth/.vep \
>   --fasta /home/kandoth/.vep/homo_sapiens/85_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz \
>   --input_file test.vcf --output_file test.vep.vcf
>
> This is the terminal output I get:
>
> 2016-10-05 18:01:37 - Read existing cache info
> 2016-10-05 18:01:37 - Starting...
> 2016-10-05 18:01:37 - *Detected format of input file as pileup*
> WARNING: Length of reference allele (A length 1) does not match co-ordinates 112175600-112175599 on line 3
> 2016-10-05 18:01:37 - Read 2 variants into buffer
> 2016-10-05 18:01:37 - Checking for existing variations
> 2016-10-05 18:01:37 - Reading transcript data from cache and/or database
> 2016-10-05 18:01:38 - Retrieved 653 transcripts (0 mem, 653 cached, 0 DB, 0 duplicates)
> 2016-10-05 18:01:38 - Reading regulatory data from cache and/or database
> 2016-10-05 18:01:38 - Retrieved 1940 regulatory features (0 mem, 1940 cached, 0 DB, 0 duplicates)
> 2016-10-05 18:01:38 - Analyzing chromosome 12
> 2016-10-05 18:01:38 - Analyzing variants
> 2016-10-05 18:01:38 - Analyzing RegulatoryFeatures
> 2016-10-05 18:01:38 - Analyzing MotifFeatures
> 2016-10-05 18:01:38 - Calculating consequences
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> 2016-10-05 18:01:44 - Analyzing chromosome 17
> 2016-10-05 18:01:44 - Analyzing variants
> 2016-10-05 18:01:44 - Analyzing RegulatoryFeatures
> 2016-10-05 18:01:44 - Analyzing MotifFeatures
> 2016-10-05 18:01:44 - Calculating consequences
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> Use of uninitialized value in string eq at /home/kandoth/vep/Bio/EnsEMBL/Variation/TranscriptVariationAllele.pm line 1269.
> Use of uninitialized value in uc at /home/kandoth/vep/Bio/SeqUtils.pm line 290.
> 2016-10-05 18:01:52 - Processed 2 total variants (0 vars/sec, 0 vars/sec total)
> 2016-10-05 18:01:52 - See test.vep.vcf_warnings.txt for details of 1 warnings
> 2016-10-05 18:01:52 - Finished!
>
> The output has errors too: VEP treats the rsIDs as the reference alleles.
>
> ~Cyriac
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
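Until the detection is fixed, the VCF rule quoted in the report (an unknown ID must be written as '.') also suggests a pre-processing workaround that can be combined with "--format vcf": fill any empty ID column with '.' before running VEP. The sketch below is a minimal illustration and not part of VEP. It assumes an uncompressed, tab-delimited VCF in which the missing ID appears as an empty third field (two consecutive tabs). The helper names `fill_missing_id` and `repair_vcf` are hypothetical.

```python
def fill_missing_id(line: str) -> str:
    """Return a VCF line whose empty ID column (column 3) is replaced by '.'.

    Header lines (starting with '#') and lines that already have an ID are
    returned unchanged. Assumes tab-delimited columns, as the VCF spec requires.
    """
    if line.startswith("#"):
        return line
    ending = "\n" if line.endswith("\n") else ""
    body = line[:-1] if ending else line
    fields = body.split("\t")
    # Column 3 (index 2) is ID; '.' is the spec's missing-value placeholder.
    if len(fields) >= 3 and fields[2] == "":
        fields[2] = "."
    return "\t".join(fields) + ending


def repair_vcf(in_path: str, out_path: str) -> int:
    """Copy an uncompressed VCF, filling empty IDs; return how many lines changed."""
    fixed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            repaired = fill_missing_id(line)
            if repaired != line:
                fixed += 1
            dst.write(repaired)
    return fixed
```

For example, `repair_vcf("test.vcf", "test.fixed.vcf")` writes a repaired copy, which can then be passed to VEP with `--input_file test.fixed.vcf --format vcf`. Repairing the file rather than relying on auto-detection alone also keeps the output valid for other VCF tools.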

