[ensembl-dev] VEP plugins for intergenic variants

Genomeo Dev genomeodev at gmail.com
Wed May 7 17:59:19 BST 2014


Thanks Will.

I am working with non-coding and intergenic variants and wanted to run VEP
with the following plugins:

--plugin UpDownDistance,100000 \
--plugin TSSDistance \
--plugin Condel,/media/sf_D_DRIVE/Projects/Databases/ensembl/Plugins/Condel/config,b \
--plugin CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz \
--plugin Gwava,tss,/media/sf_D_DRIVE/Projects/Databases/gwava/gwava_scores.bed.gz \
--plugin Conservation,GERP_CONSERVATION_SCORE,mammals \
--plugin dbNSFP,/media/sf/data/dbNSFP/dbNSFP2.4.gz,GERP++_NR,GERP++_RS,LRT_score,LRT_pred,MutationTaster_score,MutationTaster_pred,MutationAssessor_score,MutationAssessor_pred,FATHMM_score,FATHMM_pred,RadialSVM_score,RadialSVM_pred,LR_score,LR_pred,Reliability_index,SiPhy_29way_logOdds,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,SIFT_score,SIFT_pred,CADD_raw,CADD_phred


As shown in the output below, apart from CADD.pm and Gwava.pm, none of the
other plugins return any scores. dbNSFP.pm should at least return CADD
scores, because these exist. As recommended, I tried using:

sub feature_types {
    return ['Feature', 'Intergenic'];
}

or

sub feature_types {
    return ['Transcript', 'Intergenic'];
}

in dbNSFP.pm, but it does not help. When I tried the same change in
TSSDistance.pm, I got this error:

Plugin 'TSSDistance' went wrong: Can't locate object method "transcript" via package "Bio::EnsEMBL::Variation::IntergenicVariationAllele" at /media/sf_D_DRIVE/Projects/Databases/ensembl/Plugins//TSSDistance.pm line 56.

UpDownDistance.pm does not seem to work either: for instance, rs140931361
is 58298 bp from ENSG00000198822, but this gene is not returned.
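For reference, the distance test that UpDownDistance,100000 is expected to apply can be sketched as below, assuming a transcript is reported as upstream or downstream when the gap between it and the variant is at most the configured distance. This is a minimal POSIX sh sketch: the helpers `interval_gap` and `within_window` and the coordinates in the usage line are hypothetical (not taken from the plugin or from the Ensembl gene model), and the off-by-one convention chosen here may differ from VEP's.

```shell
#!/bin/sh
# interval_gap S1 E1 S2 E2
# Prints the number of bases strictly between two 1-based, inclusive
# intervals, or 0 if they overlap or are adjacent.
# Hypothetical helper; VEP's own distance convention may differ by one.
interval_gap() {
  if [ "$2" -lt "$3" ]; then
    echo $(( $3 - $2 - 1 ))    # first interval ends before the second starts
  elif [ "$4" -lt "$1" ]; then
    echo $(( $1 - $4 - 1 ))    # first interval starts after the second ends
  else
    echo 0                     # intervals overlap
  fi
}

# within_window GAP MAX
# Succeeds (exit status 0) when GAP <= MAX, e.g. MAX=100000 as passed
# to UpDownDistance in the command above.
within_window() {
  [ "$1" -le "$2" ]
}

# Hypothetical coordinates: a 5 bp variant and a feature ~59 kb downstream
gap=$(interval_gap 1000 1004 60000 70000)
within_window "$gap" 100000 && echo "within window: $gap bp"   # prints: within window: 58995 bp
```

With a 100000 bp window, any feature whose gap is below that threshold, such as the 58298 bp case above, would be expected to pass this test.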


OUTPUT:

## ENSEMBL VARIANT EFFECT PREDICTOR v75
## Output produced at 2014-05-07 17:28:44
## Connected to homo_sapiens_core_75_37 on ensembldb.ensembl.org
## Using cache in /media/sf_D_DRIVE/Projects/Databases/ensembl//homo_sapiens/75
## Using API version 75, DB version 75
## sift version sift5.0.2
## polyphen version 2.2.2
## Extra column keys:
## BIOTYPE : Biotype of transcript
## CANONICAL : Indicates if transcript is canonical for this gene
## CELL_TYPE : List of cell types and classifications for regulatory feature
## CLIN_SIG : Clinical significance of variant from dbSNP
## DISTANCE : Shortest distance from variant to transcript
## DOMAINS : The source and identifer of any overlapping protein domains
## ENSP : Ensembl protein identifer
## EXON : Exon number(s) / total
## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of the TFBP
## INTRON : Intron number(s) / total
## MOTIF_NAME : The source and identifier of a transcription factor binding profile (TFBP) aligned at this position
## MOTIF_POS : The relative position of the variation in the aligned TFBP
## MOTIF_SCORE_CHANGE : The difference in motif score of the reference and variant sequences for the TFBP
## PUBMED : Pubmed ID(s) of publications that cite existing variant
## PolyPhen : PolyPhen prediction and/or score
## SIFT : SIFT prediction and/or score
## SYMBOL : Gene symbol (e.g. HGNC)
## SYMBOL_SOURCE : Source of gene symbol
## TSSDistance : Distance from the transcription start site
## Condel : Consensus deleteriousness score for an amino acid substitution based on SIFT and PolyPhen-2
## CADD_RAW : Raw CADD score
## CADD_PHRED : PHRED-like scaled CADD score
## GWAVA : Genome Wide Annotation of VAriants score (tss model)
## Conservation : The conservation score for this site (method_link_type="GERP_CONSERVATION_SCORE", species_set="mammals")
## MutationTaster_score : MutationTaster_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## Polyphen2_HVAR_score : Polyphen2_HVAR_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## LRT_pred : LRT_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## MutationAssessor_score : MutationAssessor_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## FATHMM_pred : FATHMM_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## LR_score : LR_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## MutationTaster_pred : MutationTaster_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## SiPhy_29way_logOdds : SiPhy_29way_logOdds from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## CADD_phred : CADD_phred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## Polyphen2_HVAR_pred : Polyphen2_HVAR_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## RadialSVM_pred : RadialSVM_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## Reliability_index : Reliability_index from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## GERP++_NR : GERP++_NR from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## MutationAssessor_pred : MutationAssessor_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## LRT_score : LRT_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## CADD_raw : CADD_raw from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## LR_pred : LR_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## FATHMM_score : FATHMM_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## SIFT_score : SIFT_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## GERP++_RS : GERP++_RS from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## SIFT_pred : SIFT_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
## RadialSVM_score : RadialSVM_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
#Uploaded_variation Location Allele Existing_variation SYMBOL SYMBOL_SOURCE Gene ENSP Feature Feature_type BIOTYPE STRAND CANONICAL EXON INTRON DISTANCE TSSDistance Consequence cDNA_position CDS_position Protein_position Amino_acids Codons PolyPhen SIFT Condel CELL_TYPE SV PUBMED CLIN_SIG HIGH_INF_POS MOTIF_NAME MOTIF_POS MOTIF_SCORE_CHANGE TSSDistance CADD_RAW CADD_PHRED GWAVA Conservation GERP++_NR GERP++_RS LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred FATHMM_score FATHMM_pred RadialSVM_score RadialSVM_pred LR_score LR_pred Reliability_index SiPhy_29way_logOdds Polyphen2_HVAR_score Polyphen2_HVAR_pred SIFT_score SIFT_pred CADD_raw CADD_phred Extra
rs13247133 7:86199080 A rs13247133 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.25769 2.762 0.11 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.257691;CADD_PHRED=2.762;GWAVA=0.11
rs13244782 7:86202665 T rs13244782 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 1.957591 12.5 0.15 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=1.957591;CADD_PHRED=12.50;GWAVA=0.15
rs12704267 7:86206830 T rs12704267 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.111018 4.597 0.16 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.111018;CADD_PHRED=4.597;GWAVA=0.16
rs140931361 7:86214933-86214937 - rs140931361 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.42024 2.04 - - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.420243;CADD_PHRED=2.040
rs34536358 7:86222651 G rs34536358 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.31002 2.524 0.18 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.310016;CADD_PHRED=2.524;GWAVA=0.18
rs36006360 7:86224933 T rs36006360 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 2.513017 14.36 0.36 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=2.513017;CADD_PHRED=14.36;GWAVA=0.36
rs13244678 7:86232583 T rs13244678 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.52024 1.626 0.05 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.520238;CADD_PHRED=1.626;GWAVA=0.05
rs12704279 7:86238294 T rs12704279 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.454708 6.469 0.16 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.454708;CADD_PHRED=6.469;GWAVA=0.16
rs13228078 7:86240691 C rs13228078 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.980262 9.002 0.1 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.980262;CADD_PHRED=9.002;GWAVA=0.1
rs140931361 7:86214933-86214937 - rs140931361 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.42024 2.04 - - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.420243;CADD_PHRED=2.040
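The output lists rs140931361 as 7:86214933-86214937 with allele "-", whereas 1000G.tsv.gz stores the same deletion in VCF style at position 86214932 (TTACTC to T). A minimal POSIX sh sketch of that conversion follows. `vcf_to_ensembl` is a hypothetical helper, not code from VEP or the CADD plugin, and it only handles SNVs and simple indels whose REF and ALT share exactly one leading anchor base.

```shell
#!/bin/sh
# vcf_to_ensembl POS REF ALT
# Prints "start end ref_allele alt_allele" in the style VEP reports, where an
# empty allele is written as "-". Hypothetical helper: handles SNVs and simple
# indels whose REF and ALT share exactly one leading anchor base only.
vcf_to_ensembl() {
  local pos="$1" ref="$2" alt="$3"
  local r a start end

  # SNV: both alleles are one base, coordinates unchanged
  if [ "${#ref}" -eq 1 ] && [ "${#alt}" -eq 1 ]; then
    echo "$pos $pos $ref $alt"
    return 0
  fi

  # Indel: REF and ALT must start with the same anchor base
  if [ "${ref%"${ref#?}"}" != "${alt%"${alt#?}"}" ]; then
    echo "unsupported: REF and ALT do not share a first base" >&2
    return 1
  fi

  r="${ref#?}"                 # REF without the anchor base
  a="${alt#?}"                 # ALT without the anchor base
  start=$(( pos + 1 ))         # first base after the anchor
  end=$(( pos + ${#r} ))       # for an insertion r is empty, so end = start - 1
  echo "$start $end ${r:--} ${a:--}"
}

# rs140931361 as stored in 1000G.tsv.gz (VCF style)
vcf_to_ensembl 86214932 TTACTC T   # prints: 86214933 86214937 TACTC -
```

Under this convention, a lookup that starts from VEP's coordinates would have to query one base earlier (86214932 here) and compare the full anchored REF/ALT strings against the file.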

Thanks,

G.

On 7 May 2014 16:13, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hello,
>
> Correct, the plugin was intended to work with the whole_genome_SNVs.tsv
> file, which only contains data for SNVs.
>
> I've modified the plugin so that it should be able to cope with indel data
> files such as you have; please do let me know if you have any problems as
> I've only sparingly tested it on made-up data!
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
> On 7 May 2014 15:37, Genomeo Dev <genomeodev at gmail.com> wrote:
>
>> Hi,
>>
>> There seems to be a discrepancy between the CADD score calculated by
>> VEP with the CADD.pm plugin and the score returned by tabix directly:
>>
>> For example using this 1000G variant:
>>
>> #CHROM POS ID REF ALT QUAL FILTER INFO
>> 7 86214932 rs140931361 TTACTC T . PASS .
>>
>> variant_effect_predictor.pl -i input.txt --format vcf \
>>     --plugin CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz
>> does not return any CADD score
>>
>> whereas
>> $ tabix -p vcf 1000G.tsv.gz 7:86214932-86214932
>> 7 86214932 TTACTC T -0.420243 2.040
>>
>> This seems to affect indels but not SNVs. I could see in the plugin that
>> there is a rule to ignore indels. Any suggestions on how to safely
>> change that?
>>
>> Also, in the plugin, I assume there is a test to ensure the alleles are
>> identical between the input file and the 1000G.tsv.gz file. Is this correct?
>>
>> Thanks.
>>
>> --
>> G.
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>


-- 
G.