[ensembl-dev] Variation Effect Predictor 2.5 - Release 67

Ricardo Parolin Schnekenberg ricardos at well.ox.ac.uk
Mon May 14 12:21:09 BST 2012


Hello,

I am trying to annotate some VCF files, called from whole-exome sequencing
data, using the latest version of VEP (2.5) and the downloaded release 67
cache (with PolyPhen and SIFT).

The issue is that gene names, and most other annotation columns, are not
being populated in the output.

So with this command:

perl /home/ricardos/variant_effect_predictor.pl -i 990_06.vcf --format vcf \
  --verbose --hgnc --protein --most_severe --no_intergenic --coding_only \
  -o 990_06.vep --force_overwrite --cache --sift=b --polyphen=b

I get the output below, regardless of which combination of --gene, --cache,
--offline and --registry_file I try (we also have a MySQL server with
release 67):


## ENSEMBL VARIANT EFFECT PREDICTOR v2.5
## Output produced at 2012-05-14 12:13:40
## Connected to homo_sapiens_core_67_37 on ensembldb.ensembl.org
## Using cache in /home/ricardos/.vep/homo_sapiens/67
## Using API version 67, DB version 67
## Extra column keys:
## CANONICAL    : Indicates if transcript is canonical for this gene
## CCDS         : Indicates if transcript is a CCDS transcript
## HGNC         : HGNC gene identifier
## ENSP         : Ensembl protein identifer
## HGVSc        : HGVS coding sequence name
## HGVSp        : HGVS protein sequence name
## SIFT         : SIFT prediction
## PolyPhen     : PolyPhen prediction
## EXON         : Exon number
## INTRON       : Intron number
## DOMAINS      : The source and identifer of any overlapping protein domains
## MOTIF_NAME   : The source and identifier of a transcription factor binding profile (TFBP) aligned at this position
## MOTIF_POS    : The relative position of the variation in the aligned TFBP
## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of the TFBP
## MOTIF_SCORE_CHANGE : The difference in motif score of the reference and variant sequences for the TFBP
## CELL_TYPE    : List of cell types and classifications for regulatory feature
## IND          : Individual name
## SV           : IDs of overlapping structural variants
## FREQS        : Frequencies of overlapping variants used in filtering
#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
1_14653_C/T     1:14653 -       -       -       -       INTRONIC        -       -       -       -       -       -
1_14907_A/G     1:14907 -       -       -       -       INTRONIC        -       -       -       -       -       -
1_14930_A/G     1:14930 -       -       -       -       INTRONIC        -       -       -       -       -       -
1_15118_A/G     1:15118 -       -       -       -       INTRONIC        -       -       -       -       -       -
1_15211_T/G     1:15211 -       -       -       -       INTRONIC        -       -       -       -       -       -
1_17538_C/A     1:17538 -       -       -       -       INTRONIC        -       -       -       -       -       -
1_63736_CTA/-   1:63736-63738   -       -       -       -       WITHIN_NON_CODING_GENE  -       -       -       -       -       -
1_69270_A/G     1:69270 -       -       -       -       SYNONYMOUS_CODING       -       -       -       -       -       -
1_69511_A/G     1:69511 -       -       -       -       NON_SYNONYMOUS_CODING   -       -       -       -       -       -
1_69897_T/C     1:69897 -       -       -       -       SYNONYMOUS_CODING       -       -       -       -       -       -
1_5021170_A/G   1:5021170       -       -       -       -       INTERGENIC      -       -       -       -       -       -
1_5025179_A/G   1:5025179       -       -       -       -       INTERGENIC      -       -       -       -       -       -
1_5206445_G/A   1:5206445       -       -       -       -       INTERGENIC      -       -       -       -       -       -
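To put numbers on the symptom, a short script can tally the Consequence column and count how many rows have an empty Gene column. This is only a minimal sketch: it assumes the real output file is tab-delimited with unwrapped lines (unlike the pasted excerpt above), and the function name summarize_vep is hypothetical, not part of VEP.

```python
from collections import Counter


def summarize_vep(lines):
    """Tally Consequence terms and count rows whose Gene column is '-'.

    Assumes tab-delimited VEP text output: '##' lines are metadata and the
    single '#'-prefixed line holds the column names.
    """
    header = None
    consequences = Counter()
    missing_gene = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata lines
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("data row seen before the column header line")
        row = dict(zip(header, line.split("\t")))
        # A row may carry several comma-separated consequence terms.
        for term in row.get("Consequence", "-").split(","):
            consequences[term] += 1
        if row.get("Gene", "-") == "-":
            missing_gene += 1
    return consequences, missing_gene
```

It could be called as summarize_vep(open("990_06.vep")) and the two return values printed; if every row is counted as missing a gene, the problem affects the whole run, not just a few variants.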


What I am trying to achieve is output like this:

#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
2_69565157_C/T  2:69565157      T       ENSG00000198380 ENST00000361060 Transcript      NON_SYNONYMOUS_CODING   1478    1301    434     R/H     cGt/cAt -       PolyPhen=benign(0.028);Condel=neutral(0.407);SIFT=deleterious(0.01);HGNC=GFPT1
2_69565157_C/T  2:69565157      T       ENSG00000252250 ENST00000516441 Transcript      DOWNSTREAM      -       -       -       -       -       -       -
2_69565157_C/T  2:69565157      T       ENSG00000198380 ENST00000357308 Transcript      NON_SYNONYMOUS_CODING   1534    1355    452     R/H     cGt/cAt -       PolyPhen=benign(0.014);Condel=neutral(0.406);SIFT=deleterious(0.01);HGNC=GFPT1
2_69565693_C/T  2:69565693      T       ENSG00000198380 ENST00000361060 Transcript      NON_SYNONYMOUS_CODING   1331    1154    385     R/H     cGt/cAt -       PolyPhen=probably_damaging(1);Condel=deleterious(0.945);SIFT=deleterious(0);HGNC=GFPT1
2_69565693_C/T  2:69565693      T       ENSG00000252250 ENST00000516441 Transcript      DOWNSTREAM      -       -       -       -       -       -       -
2_69565693_C/T  2:69565693      T       ENSG00000198380 ENST00000357308 Transcript      NON_SYNONYMOUS_CODING   1387    1208    403     R/H     cGt/cAt -       PolyPhen=probably_damaging(1);Condel=deleterious(0.945);SIFT=deleterious(0);HGNC=GFPT1


So, what am I doing wrong? Even though I used the --no_intergenic and
--coding_only options, the output still contains intronic and intergenic
variants.
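As a stopgap while the options misbehave, the unwanted rows could be dropped after the fact. The sketch below is a post-hoc filter, not a fix for the underlying problem: it assumes tab-delimited, unwrapped VEP output, and both the function name drop_consequences and the default set of excluded terms are hypothetical choices made for illustration.

```python
# Consequence terms to discard; an illustrative choice, adjust as needed.
EXCLUDED = frozenset({"INTRONIC", "INTERGENIC"})


def drop_consequences(lines, excluded=EXCLUDED):
    """Yield VEP output lines, skipping rows whose terms are all excluded.

    Metadata ('##') lines and the '#' column header are passed through
    unchanged so the result is still a valid VEP-style file.
    """
    consequence_idx = None
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped or stripped.startswith("##"):
            yield line
            continue
        if stripped.startswith("#"):
            columns = stripped.lstrip("#").split("\t")
            consequence_idx = columns.index("Consequence")
            yield line
            continue
        if consequence_idx is None:
            raise ValueError("data row seen before the column header line")
        fields = stripped.split("\t")
        if len(fields) <= consequence_idx:
            yield line  # malformed row: keep it rather than guess
            continue
        terms = set(fields[consequence_idx].split(","))
        if terms <= excluded:
            continue  # every term on this row is unwanted
        yield line
```

Writing the result back out is then a matter of passing the generator to writelines() on an output file. A row that mixes an excluded term with a wanted one is kept.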

Any help is appreciated!

Thanks!
-- 
Ricardo Parolin Schnekenberg
Genomics Research
Wellcome Trust Centre for Human Genetics
University of Oxford




