[ensembl-dev] VEP ignoring SNVs when called alongside an insertion or deletion

David Parry D.A.Parry at leeds.ac.uk
Tue Sep 17 10:22:24 BST 2013


Hi,

I apologize if I have misunderstood the caveats given regarding the VCF 
input format for the VEP, but I am observing unexpected behaviour that I 
don't think is covered by the documentation. If I provide a multiallelic 
variant with both an insertion and a deletion call at the same site, the 
VEP correctly outputs both consequences. However, if a variant contains 
either an insertion or a deletion alongside a substitution, the VEP 
ignores the substitution allele.  For example, the following variant in 
a VCF:

6       32634300        .       G       C,CTA

gives the output:

## ENSEMBL VARIANT EFFECT PREDICTOR v73
## Output produced at 2013-09-17 09:57:41
## Connected to
## Using cache in /home/davidparry/.vep/homo_sapiens/73
## Using API version 73, DB version ?
## Extra column keys:
## DISTANCE : Shortest distance from variant to transcript
#Uploaded_variation     Location        Allele  Gene    Feature 
Feature_type    Consequence     cDNA_position   CDS_position    
Protein_position        Amino_acids     Codons  Existing_variation      
Extra
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000484729 Transcript      
frameshift_variant,NMD_transcript_variant,feature_elongation    115-116 
84-85   28-29   -       -       -
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000399082 Transcript      frameshift_variant,feature_elongation   
129-130 84-85   28-29   -       -       -
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000399084 Transcript      frameshift_variant,feature_elongation   
263-264 84-85   28-29   -       -       -
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000434651 Transcript      frameshift_variant,feature_elongation   
171-172 84-85   28-29   -       -       -
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000399079 Transcript      frameshift_variant,feature_elongation   
141-142 84-85   28-29   -       -       -
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000374943 Transcript      frameshift_variant,feature_elongation   
161-162 84-85   28-29   -       -       -
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000241287 
ENST00000443574 Transcript      upstream_gene_variant   -       -       
-       -       -       -       DISTANCE=4073
6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344 
ENST00000487676 Transcript      
non_coding_exon_variant,nc_transcript_variant,feature_elongation        
115-116 -       -       -       -  -

In this case the substitution allele (C) is ignored and we only get 
consequences for the insertion (TA).  Similarly, for a deletion at the 
same site as a substitution:

6       32634300        .       GTA     G,CTA

gives:

## ENSEMBL VARIANT EFFECT PREDICTOR v73
## Output produced at 2013-09-17 09:51:08
## Connected to
## Using cache in /home/davidparry/.vep/homo_sapiens/73
## Using API version 73, DB version ?
## Extra column keys:
## DISTANCE : Shortest distance from variant to transcript
#Uploaded_variation     Location        Allele  Gene    Feature 
Feature_type    Consequence     cDNA_position   CDS_position    
Protein_position        Amino_acids     Codons  Existing_variation      
Extra
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000484729 Transcript      
frameshift_variant,NMD_transcript_variant,feature_truncation    114-115 
83-84   28      -       -       -
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000399082 Transcript      frameshift_variant,feature_truncation   
128-129 83-84   28      -       -       -
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000399084 Transcript      frameshift_variant,feature_truncation   
262-263 83-84   28      -       -       -
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000434651 Transcript      frameshift_variant,feature_truncation   
170-171 83-84   28      -       -       -
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000399079 Transcript      frameshift_variant,feature_truncation   
140-141 83-84   28      -       -       -
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000374943 Transcript      frameshift_variant,feature_truncation   
160-161 83-84   28      -       -       -
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000241287 
ENST00000443574 Transcript      upstream_gene_variant   -       -       
-       -       -       -       DISTANCE=4074
6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344 
ENST00000487676 Transcript      
non_coding_exon_variant,nc_transcript_variant,feature_truncation        
114-115 -       -       -       -  -

...we only get the consequence for the deletion.
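
For what it's worth, the Uploaded_variation values in the two outputs 
(6_32634301_-/-/TA and 6_32634301_TA/-/TA) are what you would get if the 
first base were removed from every allele whenever any ALT differs in 
length from REF. That is only an inference from the output, not something 
confirmed from the VEP code. A minimal Python sketch of that assumed 
trimming (both function names are hypothetical):

```python
def trim_first_base_if_indel(pos, ref, alts):
    """Drop the first base of every allele when any ALT differs in length from REF.

    ASSUMPTION: this mimics what the VEP output above suggests; it is not
    behaviour confirmed from the VEP source code.
    """
    if all(len(alt) == len(ref) for alt in alts):
        return pos, ref, list(alts)  # no indel present: nothing to trim
    # An allele that becomes empty after trimming is written as "-"
    trimmed = [allele[1:] or "-" for allele in [ref] + list(alts)]
    return pos + 1, trimmed[0], trimmed[1:]


def uploaded_variation(chrom, pos, ref, alts):
    """Build an identifier in the chrom_start_ref/alt1/alt2 style seen above."""
    start, ref, alts = trim_first_base_if_indel(pos, ref, alts)
    return f"{chrom}_{start}_{'/'.join([ref] + alts)}"


print(uploaded_variation("6", 32634300, "G", ["C", "CTA"]))    # 6_32634301_-/-/TA
print(uploaded_variation("6", 32634300, "GTA", ["G", "CTA"]))  # 6_32634301_TA/-/TA
```

Under that assumption the substitution allele ends up identical to the 
trimmed reference in both cases ("-" vs "-", and "TA" vs "TA"), which 
would explain why no consequence is reported for it.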

Generally I am processing multisample VCF files with VEP and outputting 
in VCF format.  I want to be able to assess the consequences for a given 
sample's genotype but this sometimes fails at sites like this where my 
script can't find an allele corresponding to the substitution in the VEP 
output.  A workaround would be to separate my indel and my substitution 
calls before running the VEP, but I wondered whether this is 
known/desired behaviour for this tool?
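
The workaround I have in mind would look roughly like the Python sketch 
below. It is only a sketch: the function name is made up, it assumes 
tab-separated VCF data lines, and it rewrites the ALT column only, so 
per-allele INFO values and sample genotypes would still need recoding 
before the split lines could be used with a multisample file.

```python
def split_indels_from_substitutions(vcf_line):
    """Split one VCF data line so substitutions and indels sit on separate lines.

    Only the ALT column is rewritten. Per-allele INFO fields and sample
    genotypes are copied unchanged and are NOT recoded (known limitation).
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    ref, alts = fields[3], fields[4].split(",")

    # Same length as REF: substitution; different length: insertion or deletion
    substitutions = [alt for alt in alts if len(alt) == len(ref)]
    indels = [alt for alt in alts if len(alt) != len(ref)]

    # Nothing to separate unless both kinds are present at this site
    if not substitutions or not indels:
        return [line]

    split_lines = []
    for group in (substitutions, indels):
        new_fields = list(fields)
        new_fields[4] = ",".join(group)
        split_lines.append("\t".join(new_fields))
    return split_lines


for record in ("6\t32634300\t.\tG\tC,CTA", "6\t32634300\t.\tGTA\tG,CTA"):
    for out_line in split_indels_from_substitutions(record):
        print(out_line)
```

Sites with only indels or only substitutions pass through unchanged, so 
the insertion-plus-deletion case that already works is left alone.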

The VEP is a really great tool, so it would be brilliant if there were a 
fix for this.

Cheers,

Dave



