[ensembl-dev] Variation Effect Predictor 2.5 - Release 67

Will McLaren wm2 at ebi.ac.uk
Mon May 14 13:07:44 BST 2012


Hello,

This is because you are using the --most_severe flag.

This flag determines the most severe consequence for each variant
across all genes/transcripts and outputs ONLY that consequence type.
As the documentation states, transcript-specific columns are left
blank, which is why the Gene, Feature and related columns in your
output show "-".

--most_severe also takes precedence over --coding_only and
--no_intergenic, so intronic and intergenic variants still appear in
the output. I should change the script to warn about this!

Double check what it is that you want from the output; it may be that
using --per_gene (which outputs the most severe consequence per gene
amongst all of the transcripts in that gene) would suit your use case.
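
As a rough sketch of that suggestion, here is the original command
with --most_severe swapped for --per_gene. The file names and script
path are the ones from the original message; that the remaining
options combine cleanly with --per_gene is an assumption to check
against the documentation for your VEP version. The snippet only
prints the command so it can be reviewed first.

```shell
# Options from the original command, with --most_severe replaced by --per_gene.
# Assumption: the other flags behave as before once --most_severe is gone.
VEP_OPTS="-i 990_06.vcf --format vcf --verbose --hgnc --protein \
--per_gene --no_intergenic --coding_only \
-o 990_06.vep --force_overwrite --cache --sift=b --polyphen=b"

# Print the full command for review; remove the leading "echo" to run it.
echo perl /home/ricardos/variant_effect_predictor.pl $VEP_OPTS
```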

However, I would always recommend that you output all of the
consequence types as a matter of course, since otherwise you are
relying on our designation of what is the most severe, and you may
miss something that might be biologically significant.
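
If you do output everything, one possible way to narrow the results
afterwards is to filter on the Consequence column yourself. The
sketch below assumes the default tab-delimited output, in which
Consequence is the 7th column and header lines start with "#". The
function name keep_consequences is made up for illustration and is
not part of VEP.

```shell
# keep_consequences FILE REGEX
# Prints header lines (starting with "#") unchanged, plus every data row
# whose 7th tab-separated column (Consequence) matches REGEX.
# A regex match is used rather than equality because the column may hold
# more than one consequence term.
keep_consequences() {
  awk -F '\t' -v pat="$2" '
    /^#/ { print; next }   # keep header and comment lines
    $7 ~ pat               # keep rows with a matching consequence
  ' "$1"
}

# Example (hypothetical invocation on the output file from this thread):
#   keep_consequences 990_06.vep 'NON_SYNONYMOUS_CODING' > 990_06.filtered.vep
```

Because the full output is kept on disk, the filter can be changed
later without re-running the annotation.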

Hope this helps!

Cheers

Will



On 14 May 2012 12:21, Ricardo Parolin Schnekenberg
<ricardos at well.ox.ac.uk> wrote:
> Hello,
>
> I am trying to annotate some VCF files called from whole exome sequencing
> data using the latest version of VEP (2.5) and the downloaded release 67
> cache (with polyphen and sift).
>
> The issue is that gene names and other information are not being
> populated in the output.
>
> So with this command:
>
> perl /home/ricardos/variant_effect_predictor.pl -i 990_06.vcf --format vcf
> --verbose --hgnc --protein --most_severe --no_intergenic --coding_only -o
> 990_06.vep  --force_overwrite --cache --sift=b --polyphen=b
>
> I get this output (no matter how much I play around with --gene --cache
> --offline --registry_file [we also have a mysql server with release 67]):
>
>
> ## ENSEMBL VARIANT EFFECT PREDICTOR v2.5
> ## Output produced at 2012-05-14 12:13:40
> ## Connected to homo_sapiens_core_67_37 on ensembldb.ensembl.org
> ## Using cache in /home/ricardos/.vep/homo_sapiens/67
> ## Using API version 67, DB version 67
> ## Extra column keys:
> ## CANONICAL    : Indicates if transcript is canonical for this gene
> ## CCDS         : Indicates if transcript is a CCDS transcript
> ## HGNC         : HGNC gene identifier
> ## ENSP         : Ensembl protein identifer
> ## HGVSc        : HGVS coding sequence name
> ## HGVSp        : HGVS protein sequence name
> ## SIFT         : SIFT prediction
> ## PolyPhen     : PolyPhen prediction
> ## EXON         : Exon number
> ## INTRON       : Intron number
> ## DOMAINS      : The source and identifer of any overlapping protein domains
> ## MOTIF_NAME   : The source and identifier of a transcription factor
> binding profile (TFBP) aligned at this position
> ## MOTIF_POS    : The relative position of the variation in the aligned TFBP
> ## HIGH_INF_POS : A flag indicating if the variant falls in a high
> information position of the TFBP
> ## MOTIF_SCORE_CHANGE : The difference in motif score of the reference and
> variant sequences for the TFBP
> ## CELL_TYPE    : List of cell types and classifications for regulatory
> feature
> ## IND          : Individual name
> ## SV           : IDs of overlapping structural variants
> ## FREQS        : Frequencies of overlapping variants used in filtering
> #Uploaded_variation     Location        Allele  Gene    Feature
> Feature_type    Consequence     cDNA_position   CDS_position
> Protein_position        Amino_acids     Codons  Existing_variation
> Extra
> 1_14653_C/T     1:14653 -       -       -       -       INTRONIC        -
>     -       -       -       -       -
> 1_14907_A/G     1:14907 -       -       -       -       INTRONIC        -
>     -       -       -       -       -
> 1_14930_A/G     1:14930 -       -       -       -       INTRONIC        -
>     -       -       -       -       -
> 1_15118_A/G     1:15118 -       -       -       -       INTRONIC        -
>     -       -       -       -       -
> 1_15211_T/G     1:15211 -       -       -       -       INTRONIC        -
>     -       -       -       -       -
> 1_17538_C/A     1:17538 -       -       -       -       INTRONIC        -
>     -       -       -       -       -
> 1_63736_CTA/-   1:63736-63738   -       -       -       -
> WITHIN_NON_CODING_GENE  -       -       -       -       -       -
> 1_69270_A/G     1:69270 -       -       -       -       SYNONYMOUS_CODING
>     -       -       -       -       -       -
> 1_69511_A/G     1:69511 -       -       -       -
> NON_SYNONYMOUS_CODING   -       -       -       -       -       -
> 1_69897_T/C     1:69897 -       -       -       -       SYNONYMOUS_CODING
>     -       -       -       -       -       -
> 1_5021170_A/G   1:5021170       -       -       -       -       INTERGENIC
>     -       -       -       -       -       -
> 1_5025179_A/G   1:5025179       -       -       -       -       INTERGENIC
>     -       -       -       -       -       -
> 1_5206445_G/A   1:5206445       -       -       -       -       INTERGENIC
>     -       -       -       -       -       -
>
>
> Obviously what I am trying to achieve is this:
>
> #Uploaded_variation     Location        Allele  Gene    Feature
> Feature_type    Consequence     cDNA_position   CDS_position
> Protein_position        Amino_acids     Codons  Existing_variation
> Extra
> 2_69565157_C/T  2:69565157      T       ENSG00000198380 ENST00000361060
> Transcript      NON_SYNONYMOUS_CODING   1478    1301    434     R/H
> cGt/cAt -
>       PolyPhen=benign(0.028);Condel=neutral(0.407);SIFT=deleterious(0.01);HGNC=GFPT1
> 2_69565157_C/T  2:69565157      T       ENSG00000252250 ENST00000516441
> Transcript      DOWNSTREAM      -       -       -       -       -       -
>     -
> 2_69565157_C/T  2:69565157      T       ENSG00000198380 ENST00000357308
> Transcript      NON_SYNONYMOUS_CODING   1534    1355    452     R/H
> cGt/cAt -
>       PolyPhen=benign(0.014);Condel=neutral(0.406);SIFT=deleterious(0.01);HGNC=GFPT1
> 2_69565693_C/T  2:69565693      T       ENSG00000198380 ENST00000361060
> Transcript      NON_SYNONYMOUS_CODING   1331    1154    385     R/H
> cGt/cAt -
>       PolyPhen=probably_damaging(1);Condel=deleterious(0.945);SIFT=deleterious(0);HGNC=GFPT1
> 2_69565693_C/T  2:69565693      T       ENSG00000252250 ENST00000516441
> Transcript      DOWNSTREAM      -       -       -       -       -       -
>     -
> 2_69565693_C/T  2:69565693      T       ENSG00000198380 ENST00000357308
> Transcript      NON_SYNONYMOUS_CODING   1387    1208    403     R/H
> cGt/cAt -
>       PolyPhen=probably_damaging(1);Condel=deleterious(0.945);SIFT=deleterious(0);HGNC=GFPT1
>
>
> So, what am I doing wrong? It seems that even though I used the options
> --no_intergenic and --coding_only, it still outputs intronic and intergenic
> variants.
>
> Help is appreciated!
>
> Thanks!
> --
> Ricardo Parolin Schnekenberg
> Genomics Research
> Wellcome Trust Centre for Human Genetics
> University of Oxford
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/



