[ensembl-dev] Possible problems using the Variant Effect Predictor

Philip Appleby p.appleby at dundee.ac.uk
Thu Apr 3 11:01:16 BST 2014


Hi Will,


Thanks for your response.


I ran VEP separately to produce an updated VCF and then tried the Gemini load.


The following is the content of the .sh wrapper I used:


#!/bin/sh
# Annotate ${DATADIR}/$1.vcf with VEP using the local cache;
# the annotated copy is written to ${DATADIR}/$1_vep.vcf
export DATADIR=/home/hicadmin/data/cpgenome
export BINDIR=/home/hicadmin/src/ensembl-tools/scripts/variant_effect_predictor
perl "${BINDIR}/variant_effect_predictor.pl" -i "${DATADIR}/$1.vcf" \
   --cache \
   --sift b \
   --polyphen b \
   --symbol \
   --numbers \
   --total_length \
   -o "${DATADIR}/$1_vep.vcf" \
   --vcf \
   --fields Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,PolyPhen,SIFT,Protein_position

And here is the CSQ INFO header line you asked for:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position">

Looking at the original VCF file, that string is already present, so that is where it originates. I think I have run VEP against an already-annotated file, which means I need to learn more and investigate further before posting again!

The original file had this:

##INFO=<ID=CSQ,Number=A,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE">
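The original header shows that the input already carried a 29-field CSQ definition from an earlier VEP run, separate from the 10-field one produced by the wrapper above. A check for this before re-running VEP can be sketched in Python as below. This is a minimal sketch, assuming an uncompressed VCF whose CSQ description ends in "Format: ..." as in the lines quoted in this thread; read_header and csq_formats are hypothetical helpers, not part of VEP or Gemini.

```python
import re
from itertools import takewhile


def read_header(path):
    """Return the '#'-prefixed header lines of an uncompressed VCF file."""
    with open(path) as vcf:
        return [line.rstrip("\n") for line in takewhile(lambda l: l.startswith("#"), vcf)]


def csq_formats(header_lines):
    """Return the Format field list of every ##INFO=<ID=CSQ,...> definition found."""
    formats = []
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^"]*)"', line)
            formats.append(match.group(1).split("|") if match else [])
    return formats


# The two CSQ definitions quoted in this thread: the one written by the new run
# and the one that was already present in the original input file.
NEW = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. '
       'Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position">')
ORIGINAL = ('##INFO=<ID=CSQ,Number=A,Type=String,Description="Consequence type as predicted by VEP. '
            'Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|'
            'Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|'
            'MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|'
            'DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE">')

# In practice the list would come from read_header() on the VEP input file.
input_formats = csq_formats([ORIGINAL])
if input_formats:
    print("input already carries CSQ annotation with", len(input_formats[0]), "fields")
```

Refusing to run, or stripping the old CSQ definition and values first, when csq_formats returns a non-empty list would avoid stacking two differently formatted CSQ annotations in one file.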

Thanks,
Phil

Phil Appleby
Programmer
Health Informatics Centre (HIC)
University of Dundee
Mackenzie Building
Ninewells
DD2 4BF
+44 (0) 1382 383971
________________________________
From: dev-bounces at ensembl.org <dev-bounces at ensembl.org> on behalf of Will McLaren <wm2 at ebi.ac.uk>
Sent: 03 April 2014 10:44
To: Ensembl developers list
Subject: Re: [ensembl-dev] Possible problems using the Variant Effect Predictor

Hi Phil,

I don't believe that the string "    infant  death   syndrome,       association  " is being added to the output by the VEP - there aren't any options in the core VEP code that could add such a string to the output.

I don't know exactly what Gemini does but it may be that it is adding phenotype-based information somehow, possibly using a VEP plugin?

Could you send me the header line from your results file that contains the CSQ INFO description? It will look something like:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND">

I'd also suggest contacting the Gemini developers if you haven't already.

Regards

Will McLaren
Ensembl Variation


On 3 April 2014 10:32, Philip Appleby <p.appleby at dundee.ac.uk> wrote:
Hi,

This question relates to problems I found while using the script 'variant_effect_predictor.pl', version 75.

I have been experimenting with using VEP, as recommended in the Gemini documentation, to annotate a VCF file (output from a personal genome sequencing run on an Illumina HiSeq 2000) for loading into Gemini (http://gemini.readthedocs.org/en/latest/index.html).

I find that the Gemini parse and load fails, seemingly because of tabs embedded in one of the annotation strings (where the string "    infant  death   syndrome,       association  " appears): the simple split('\t') statement in the parser is fooled by them.
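A parser built on split('\t') assumes every record has exactly as many tab-separated columns as the #CHROM header line. A pre-load check for that assumption can be sketched in Python as below. This is a minimal sketch: find_problems is a hypothetical helper, and the example records are made-up toy lines, not taken from this VCF. It also flags whitespace inside the INFO column, which the VCF specification does not allow.

```python
def find_problems(lines):
    """Return (line_number, message) pairs for VCF data lines that a
    parser splitting on tabs would misread."""
    problems = []
    expected = None  # column count taken from the #CHROM header line
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information lines are not tab-delimited records
        columns = line.split("\t")
        if line.startswith("#CHROM"):
            expected = len(columns)
        elif expected is not None and len(columns) != expected:
            problems.append((number, f"{len(columns)} columns, header has {expected}"))
        elif len(columns) > 7 and any(ch.isspace() for ch in columns[7]):
            # INFO is the 8th column; the VCF spec forbids whitespace in it
            problems.append((number, "whitespace inside the INFO column"))
    return problems


# Toy records for illustration only (hypothetical, not from the thread's VCF).
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10\tGT\t0/1",
    "chr1\t200\t.\tC\tT\t50\tPASS\tNOTE=has\tembedded\ttabs\tGT\t0/1",
    "chr1\t300\t.\tG\tA\t50\tPASS\tNOTE=has spaces\tGT\t0/1",
]
for number, message in find_problems(example):
    print(number, message)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly to check a real (uncompressed) VCF before loading it.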

I have reproduced one of the offending lines below. Apologies for not knowing more about VEP; it's my first time using it and I don't understand the vertical-bar-delimited text yet.

chr18   907710  rs2856966       A       G       108     PASS    VARTYPE_SNV;hgmd_alleles=A/G;hgmd_id=CM092913;hgmd_disease=Sudden;CSQ=upstream_gene_variant|||ENSG00000265179|RP11-672L10.2|ENST00000582921||||,missense_variant|gAt/gGt|D/G|ENSG00000141433|ADCYAP1|ENST00000579794|2/4|benign(0.001)|tolerated(0.33)|54/176,missense_variant|gAt/gGt|D/G|ENSG00000141433|ADCYAP1|ENST00000450565|3/5|benign(0.001)|tolerated(0.33)|54/176,upstream_gene_variant|||ENSG00000265179|RP11-672L10.2|ENST00000580612||||,upstream_gene_variant|||ENSG00000265179|RP11-672L10.2|ENST00000577358||||,upstream_gene_variant|||ENSG00000265179|RP11-672L10.2|ENST00000581719||||,upstream_gene_variant|||ENSG00000265671|RP11-672L10.3|ENST00000582554||||,non_coding_exon_variant&nc_transcript_variant|||ENSG00000141433|ADCYAP1|ENST00000269200|1/3|||,upstream_gene_variant|||ENSG00000141433|ADCYAP1|ENST00000581602||||  infant  death   syndrome,       association     with;hgmd_gene=ADCYAP1;AA=A;EUR_AF=G:0.22;AMR_AF=G:0.17;AF=G:0.15;AFR_AF=G:0.11;ASN_AF=G:0.06;CSQ=G||NM_001099733.1|Transcript|missense_variant|280|161|54|D/G|gAt/gGt||3/5||ADCYAP1||||||YES|tolerated(0.6)|benign(0.001)|G:0.147|NP_001093203.1|||NM_001099733.1:c.161A>G|NP_001093203.1:p.Asp54Gly|,G||NM_001117.3|Transcript|missense_variant|252|161|54|D/G|gAt/gGt||2/4||ADCYAP1|||||||tolerated(0.6)|benign(0.001)|G:0.147|NP_001108.2|||NM_001117.3:c.161A>G|NP_001108.2:p.Asp54Gly|    GT:GQX:DPU:DPF:AU       0/1:108:25:2:11,0,10,0
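Each CSQ value is a comma-separated list of consequence blocks, one per affected feature (here, transcripts), and within a block the values are separated by '|' in the order given by the Format list in the CSQ header. The record above also contains two CSQ= entries, one in each of the two formats quoted earlier, which is consistent with re-annotating an already-annotated file. Reading both structures can be sketched in Python as below. This is a minimal sketch: info_items, parse_csq and the toy INFO string are hypothetical illustrations, while the field list and the two CSQ blocks are copied from this thread.

```python
def info_items(info):
    """Split a VCF INFO column into (key, value) pairs in order, keeping
    duplicate keys; flag entries without '=' get the value None."""
    items = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        items.append((key, value if sep else None))
    return items


def parse_csq(value, fields):
    """Turn one CSQ value into a list of dicts, one per comma-separated
    consequence block, keyed by the header's Format field names."""
    records = []
    for block in value.split(","):
        parts = block.split("|")
        if len(parts) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(parts)}: {block!r}")
        records.append(dict(zip(fields, parts)))
    return records


# Format list from the CSQ header written by the wrapper script.
FIELDS = ("Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|"
          "EXON|PolyPhen|SIFT|Protein_position").split("|")

# Two consequence blocks copied from the offending record.
csq = ("missense_variant|gAt/gGt|D/G|ENSG00000141433|ADCYAP1|ENST00000579794|"
       "2/4|benign(0.001)|tolerated(0.33)|54/176,"
       "upstream_gene_variant|||ENSG00000265179|RP11-672L10.2|ENST00000582921||||")

for record in parse_csq(csq, FIELDS):
    print(record["Feature"], record["Consequence"], record["SIFT"] or "-")

# Toy INFO string (hypothetical) showing that duplicate CSQ keys survive.
keys = [key for key, _ in info_items("VARTYPE_SNV;AA=A;CSQ=first;CSQ=second")]
print(keys.count("CSQ"))  # 2
```

Keeping INFO as an ordered list of pairs rather than a dict is the design choice here: a dict would silently keep only one of the two CSQ entries, while the length check in parse_csq raises as soon as a block does not match the header's field count.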

I was wondering if you'd seen this kind of problem before.

Thanks
Phil

Phil Appleby
Programmer
Health Informatics Centre (HIC)
University of Dundee
Mackenzie Building
Ninewells
DD2 4BF
+44 (0) 1382 383971

The University of Dundee is a registered Scottish Charity, No: SC015096




_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/




