[ensembl-dev] VEP ignoring SNVs when called alongside an insertion or deletion
David Parry
D.A.Parry at leeds.ac.uk
Tue Sep 17 10:22:24 BST 2013
Hi,
I apologize if I have misunderstood the caveats given regarding the VCF
input format for the VEP, but I am observing unexpected behaviour that I
don't think is covered by the documentation. If I provide a multiallelic
variant with both an insertion and a deletion call at the same site, the
VEP correctly outputs both consequences. However, if a variant contains
either an insertion or a deletion alongside a substitution, the VEP
ignores the substitution. For example, the following variant in a
VCF:
6 32634300 . G C,CTA
gives the output:
## ENSEMBL VARIANT EFFECT PREDICTOR v73
## Output produced at 2013-09-17 09:57:41
## Connected to
## Using cache in /home/davidparry/.vep/homo_sapiens/73
## Using API version 73, DB version ?
## Extra column keys:
## DISTANCE : Shortest distance from variant to transcript
#Uploaded_variation Location Allele Gene Feature
Feature_type Consequence cDNA_position CDS_position
Protein_position Amino_acids Codons Existing_variation
Extra
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000484729 Transcript
frameshift_variant,NMD_transcript_variant,feature_elongation 115-116
84-85 28-29 - - -
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000399082 Transcript frameshift_variant,feature_elongation
129-130 84-85 28-29 - - -
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000399084 Transcript frameshift_variant,feature_elongation
263-264 84-85 28-29 - - -
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000434651 Transcript frameshift_variant,feature_elongation
171-172 84-85 28-29 - - -
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000399079 Transcript frameshift_variant,feature_elongation
141-142 84-85 28-29 - - -
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000374943 Transcript frameshift_variant,feature_elongation
161-162 84-85 28-29 - - -
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000241287
ENST00000443574 Transcript upstream_gene_variant - -
- - - - DISTANCE=4073
6_32634301_-/-/TA 6:32634300-32634301 TA ENSG00000179344
ENST00000487676 Transcript
non_coding_exon_variant,nc_transcript_variant,feature_elongation
115-116 - - - - -
In this case the substitution variant is ignored and we only get a
consequence for the insertion. Similarly, for a deletion at the same
site as a substitution:
6 32634300 . GTA G,CTA
gives:
## ENSEMBL VARIANT EFFECT PREDICTOR v73
## Output produced at 2013-09-17 09:51:08
## Connected to
## Using cache in /home/davidparry/.vep/homo_sapiens/73
## Using API version 73, DB version ?
## Extra column keys:
## DISTANCE : Shortest distance from variant to transcript
#Uploaded_variation Location Allele Gene Feature
Feature_type Consequence cDNA_position CDS_position
Protein_position Amino_acids Codons Existing_variation
Extra
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000484729 Transcript
frameshift_variant,NMD_transcript_variant,feature_truncation 114-115
83-84 28 - - -
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000399082 Transcript frameshift_variant,feature_truncation
128-129 83-84 28 - - -
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000399084 Transcript frameshift_variant,feature_truncation
262-263 83-84 28 - - -
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000434651 Transcript frameshift_variant,feature_truncation
170-171 83-84 28 - - -
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000399079 Transcript frameshift_variant,feature_truncation
140-141 83-84 28 - - -
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000374943 Transcript frameshift_variant,feature_truncation
160-161 83-84 28 - - -
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000241287
ENST00000443574 Transcript upstream_gene_variant - -
- - - - DISTANCE=4074
6_32634301_TA/-/TA 6:32634301-32634302 - ENSG00000179344
ENST00000487676 Transcript
non_coding_exon_variant,nc_transcript_variant,feature_truncation
114-115 - - - - -
...we only get the consequence for the deletion.
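To find the affected sites in a VCF, something like the following rough sketch could flag records whose ALT alleles mix a substitution with an insertion or deletion. The function names are hypothetical, and the classification (trim shared trailing bases, then compare lengths) is an assumption for illustration, not how the VEP itself classifies alleles.

```python
def classify_alt(ref, alt):
    """Classify one ALT allele relative to REF (illustrative helper, not a VEP API)."""
    # Trim bases shared at the 3' end so that e.g. GTA>CTA is seen as G>C.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    if len(ref) == len(alt):
        return "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def mixes_substitution_and_indel(ref, alts):
    """Return True if the comma-separated ALT field holds a substitution plus an indel."""
    kinds = {classify_alt(ref, alt) for alt in alts.split(",")}
    return "substitution" in kinds and bool(kinds & {"insertion", "deletion"})


# The two sites from this message
print(mixes_substitution_and_indel("G", "C,CTA"))
print(mixes_substitution_and_indel("GTA", "G,CTA"))
```

Both example sites above are flagged, whereas a site carrying only an insertion and a deletion is not.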
Generally I am processing multisample VCF files with the VEP and
outputting in VCF format. I want to be able to assess the consequences
for a given sample's genotype, but this sometimes fails at sites like
these, where my script can't find an allele corresponding to the
substitution in the VEP output. A workaround would be to separate my
indel and substitution calls before running the VEP, but I wondered
whether this is known/desired behaviour for this tool?
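For illustration, the workaround could be sketched roughly as below: each multiallelic record is split into one record per ALT allele before being passed to the VEP. The function name is hypothetical. As simplifying assumptions, the sketch replaces INFO with "." and drops the FORMAT and sample columns, since per-allele annotations and genotypes would need remapping that is not attempted here. It also trims shared trailing bases so that GTA>CTA becomes G>C.

```python
def split_multiallelic(line):
    """Split one VCF data line into one sites-only record per ALT allele (sketch)."""
    # Accept tab-separated VCF lines as well as the space-separated examples above.
    fields = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    chrom, pos, vid, ref, alts = fields[:5]
    # Keep QUAL and FILTER if present; INFO is replaced by "." because
    # per-allele values (e.g. AC, AF) would no longer match a single ALT.
    tail = (fields[5:7] + ["."]) if len(fields) >= 8 else []
    records = []
    for alt in alts.split(","):
        r, a = ref, alt
        # Trim bases shared at the 3' end; POS is unaffected by right-trimming.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        records.append("\t".join([chrom, pos, vid, r, a] + tail))
    return records


for record in split_multiallelic("6 32634300 . GTA G,CTA"):
    print(record)
```

Consequences reported for the split records would then have to be matched back to the original allele indices in order to interpret each sample's genotype.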
The VEP is a really great tool, so it would be brilliant if there were a
fix for this.
Cheers,
Dave