[ensembl-dev] VEP plugins for intergenic variants

Genomeo Dev genomeodev at gmail.com
Tue May 27 11:03:55 BST 2014


Sorry, it seems the plugin already does that. Thanks!

G.


On 23 May 2014 19:14, Genomeo Dev <genomeodev at gmail.com> wrote:

> Hi Will,
>
> Thanks very much. That worked nicely.
>
> I am working with a set of variants within a locus where I know that they
> are LD-independent of other genes outside this locus. Therefore, I
> want to focus only on genes inside this physically defined locus.
>
> Rarely do these variants fall exactly at the centre of the locus, so the
> distances to the left and right boundaries are not equal. Would it be
> possible to alter UpDownDistance.pm so that I can specify start and end
> coordinates within which VEP should be constrained, instead of the current
> distance cutoff?
>
> Many thanks,
>
> G.
>
>
> On 8 May 2014 16:12, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Hello again,
>>
>> I've fixed a bug that prevented UpDownDistance from functioning correctly;
>> it hadn't been tested with distances as large as the one you specified,
>> which broke some assumptions in the core VEP code.
>> You will need to update your ensembl-variation module or re-run the VEP
>> INSTALL.pl script to pick up the new API code.
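Roughly, a distance cutoff decides whether a transcript that does not overlap the variant is reported at all, and whether it counts as upstream or downstream depends on the transcript's strand. A simplified Python classifier follows; the function name, the distance convention (bases between the nearest ends, no off-by-one adjustment) and the 5000 bp default (VEP's standard up/downstream distance) are assumptions for illustration, not the core VEP code. The 58298 bp figure echoes the rs140931361 / ENSG00000198822 report in this thread, but the coordinates are made up.

```python
from typing import Optional

UPSTREAM = "upstream_gene_variant"
DOWNSTREAM = "downstream_gene_variant"


def up_down_consequence(var_start: int, var_end: int, tr_start: int, tr_end: int,
                        strand: int, max_dist: int = 5000) -> Optional[str]:
    """Classify a variant relative to one transcript (1-based coordinates).

    Returns None when the variant overlaps the transcript (other consequence
    types would apply) or lies further away than max_dist.
    """
    if var_end >= tr_start and var_start <= tr_end:
        return None
    if var_end < tr_start:
        # Variant is at lower coordinates: 5' of a forward-strand transcript.
        dist = tr_start - var_end
        is_five_prime = strand == 1
    else:
        # Variant is at higher coordinates: 5' of a reverse-strand transcript.
        dist = var_start - tr_end
        is_five_prime = strand == -1
    if dist > max_dist:
        return None
    return UPSTREAM if is_five_prime else DOWNSTREAM


# A transcript 58298 bp away is ignored at the default cutoff but
# reported once the cutoff is raised to 100000.
print(up_down_consequence(100, 100, 58398, 70000, 1))                   # None
print(up_down_consequence(100, 100, 58398, 70000, 1, max_dist=100000))  # upstream_gene_variant
```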
>>
>> As far as the other plugins go, I think you are misunderstanding how some
>> of them work:
>>
>> TSSDistance: this gives the distance between a variant and the annotated
>> transcript start site. If a variant is annotated as intergenic, there is no
>> transcript to give the distance to! Changing the code to force it to assess
>> intergenic variants will of course break here. However, if you increase the
>> up/downstream distance using UpDownDistance so that a transcript is then
>> found in range, the plugin will work as expected without modification. It
>> seems to me that you are expecting this plugin to find the shortest
>> distance to _any_ transcript start site, which is not the intended purpose
>> of the code.
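To make the distinction concrete: a per-transcript TSS distance needs a transcript to measure against, so an intergenic annotation has nothing to report. A Python sketch follows (the real plugin is Perl); the `Transcript` record and the function name are hypothetical, and the unsigned distance is an assumption that may not match the sign convention TSSDistance actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    # Hypothetical record; 1-based inclusive coordinates, strand is 1 or -1.
    start: int
    end: int
    strand: int


def tss_distance(var_start: int, var_end: int,
                 transcript: Optional[Transcript]) -> Optional[int]:
    """Unsigned distance from a variant to one transcript's start site."""
    if transcript is None:
        return None  # intergenic: no transcript, so no TSS to measure against
    # The TSS is the lowest coordinate on the forward strand, the highest on the reverse.
    tss = transcript.start if transcript.strand == 1 else transcript.end
    if var_end < tss:
        return tss - var_end
    if var_start > tss:
        return var_start - tss
    return 0  # the variant spans the TSS


print(tss_distance(100, 100, None))                        # None
print(tss_distance(100, 100, Transcript(1100, 2000, 1)))   # 1000
print(tss_distance(100, 100, Transcript(1100, 2000, -1)))  # 1900
```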
>>
>> Condel & dbNSFP: these two plugins work exclusively on missense (AKA
>> non-synonymous) SNVs, hence the NS in the name dbNSFP. While dbNSFP carries
>> CADD scores, and CADD gives scores for any genomic position, the CADD
>> scores in dbNSFP are only for missense variants.
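In effect, the lookup is gated on the consequence type, so an intergenic variant never reaches the score table. A toy Python illustration; the function, the key layout and the score value are hypothetical and are not the dbNSFP plugin's code or data.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt): hypothetical key layout


def missense_only_scores(consequences: Iterable[str], key: Key,
                         table: Dict[Key, Dict[str, float]]) -> Dict[str, float]:
    """Return scores only for missense annotations; anything else gets nothing."""
    if "missense_variant" not in consequences:
        return {}
    return table.get(key, {})


table = {("7", 1000, "A", "G"): {"CADD_phred": 15.0}}  # made-up entry
print(missense_only_scores(["missense_variant"], ("7", 1000, "A", "G"), table))    # {'CADD_phred': 15.0}
print(missense_only_scores(["intergenic_variant"], ("7", 1000, "A", "G"), table))  # {}
```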
>>
>> The feature_types() subroutine should be used when writing your own
>> plugin to determine which kinds of variant/feature combinations the plugin
>> considers, since the run() sub is executed once for each variant/feature
>> overlap found by the core VEP code. Existing plugins should be modified in
>> this way only if you are confident that the modification achieves what you
>> intend.
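The dispatch described here can be modelled in a few lines of Python. This is a toy model, not the VEP API: `Overlap`, `TranscriptOnlyPlugin` and `annotate` are hypothetical names. Real plugins are Perl modules whose `run()` receives allele objects; for intergenic variants that is a `Bio::EnsEMBL::Variation::IntergenicVariationAllele`, which has no `transcript` method, as the error reported further down the thread shows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Overlap:
    # One variant/feature pair found by the (simulated) core code.
    variant_id: str
    feature_type: str  # e.g. "Transcript" or "Intergenic"
    feature: Dict[str, int] = field(default_factory=dict)


class TranscriptOnlyPlugin:
    def feature_types(self) -> List[str]:
        return ["Transcript"]

    def run(self, overlap: Overlap) -> Dict[str, int]:
        # Safe only because feature_types() guarantees a transcript is present.
        return {"TSS": overlap.feature["tss"]}


def annotate(overlaps: List[Overlap], plugin) -> List[Tuple[str, Dict[str, int]]]:
    """Call plugin.run() once per overlap whose type the plugin declared."""
    wanted = set(plugin.feature_types())
    return [(ov.variant_id, plugin.run(ov)) for ov in overlaps if ov.feature_type in wanted]


overlaps = [Overlap("rs1", "Transcript", {"tss": 1000}), Overlap("rs2", "Intergenic")]
print(annotate(overlaps, TranscriptOnlyPlugin()))  # [('rs1', {'TSS': 1000})]
```

Adding "Intergenic" to `feature_types()` without changing `run()` makes the `rs2` call fail with a `KeyError`, the toy counterpart of the "Can't locate object method" error.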
>>
>> Hope that all helps
>>
>> Will
>>
>>
>> On 7 May 2014 17:59, Genomeo Dev <genomeodev at gmail.com> wrote:
>>
>>> Thanks Will.
>>>
>>> I am working with non-coding and intergenic variants and wanted to run
>>> VEP with the following plugins:
>>>
>>> --plugin UpDownDistance,100000 \
>>> --plugin TSSDistance \
>>> --plugin Condel,/media/sf_D_DRIVE/Projects/Databases/ensembl/Plugins/Condel/config,b \
>>> --plugin CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz \
>>> --plugin Gwava,tss,/media/sf_D_DRIVE/Projects/Databases/gwava/gwava_scores.bed.gz \
>>> --plugin Conservation,GERP_CONSERVATION_SCORE,mammals \
>>> --plugin dbNSFP,/media/sf/data/dbNSFP/dbNSFP2.4.gz,GERP++_NR,GERP++_RS,LRT_score,LRT_pred,MutationTaster_score,MutationTaster_pred,MutationAssessor_score,MutationAssessor_pred,FATHMM_score,FATHMM_pred,RadialSVM_score,RadialSVM_pred,LR_score,LR_pred,Reliability_index,SiPhy_29way_logOdds,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,SIFT_score,SIFT_pred,CADD_raw,CADD_phred
>>>
>>>
>>> As shown in the output below, apart from CADD.pm and Gwava.pm, no scores
>>> are returned by the other plugins. dbNSFP.pm should get at least CADD
>>> scores, because these exist. As recommended, I tried using:
>>>
>>> sub feature_types {
>>>     return ['Feature', 'Intergenic'];
>>> }
>>>
>>> or
>>>
>>> sub feature_types {
>>>    return ['Transcript', 'Intergenic'];
>>> }
>>>
>>> in dbNSFP.pm, but it does not help. When I tried that in TSSDistance.pm, I
>>> got this error:
>>>
>>> Plugin 'TSSDistance' went wrong: Can't locate object method "transcript" via package "Bio::EnsEMBL::Variation::IntergenicVariationAllele" at /media/sf_D_DRIVE/Projects/Databases/ensembl/Plugins//TSSDistance.pm line 56.
>>>
>>> UpDownDistance.pm does not seem to work either: for instance, rs140931361
>>> is 58298 bp from ENSG00000198822, but this gene is not returned.
>>>
>>>
>>> OUTPUT:
>>>
>>> ## ENSEMBL VARIANT EFFECT PREDICTOR v75
>>> ## Output produced at 2014-05-07 17:28:44
>>> ## Connected to homo_sapiens_core_75_37 on ensembldb.ensembl.org
>>> ## Using cache in /media/sf_D_DRIVE/Projects/Databases/ensembl//homo_sapiens/75
>>> ## Using API version 75, DB version 75
>>> ## sift version sift5.0.2
>>> ## polyphen version 2.2.2
>>> ## Extra column keys:
>>> ## BIOTYPE : Biotype of transcript
>>> ## CANONICAL : Indicates if transcript is canonical for this gene
>>> ## CELL_TYPE : List of cell types and classifications for regulatory feature
>>> ## CLIN_SIG : Clinical significance of variant from dbSNP
>>> ## DISTANCE : Shortest distance from variant to transcript
>>> ## DOMAINS : The source and identifer of any overlapping protein domains
>>> ## ENSP : Ensembl protein identifer
>>> ## EXON : Exon number(s) / total
>>> ## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of the TFBP
>>> ## INTRON : Intron number(s) / total
>>> ## MOTIF_NAME : The source and identifier of a transcription factor binding profile (TFBP) aligned at this position
>>> ## MOTIF_POS : The relative position of the variation in the aligned TFBP
>>> ## MOTIF_SCORE_CHANGE : The difference in motif score of the reference and variant sequences for the TFBP
>>> ## PUBMED : Pubmed ID(s) of publications that cite existing variant
>>> ## PolyPhen : PolyPhen prediction and/or score
>>> ## SIFT : SIFT prediction and/or score
>>> ## SYMBOL : Gene symbol (e.g. HGNC)
>>> ## SYMBOL_SOURCE : Source of gene symbol
>>> ## TSSDistance : Distance from the transcription start site
>>> ## Condel : Consensus deleteriousness score for an amino acid substitution based on SIFT and PolyPhen-2
>>> ## CADD_RAW : Raw CADD score
>>> ## CADD_PHRED : PHRED-like scaled CADD score
>>> ## GWAVA : Genome Wide Annotation of VAriants score (tss model)
>>> ## Conservation : The conservation score for this site (method_link_type="GERP_CONSERVATION_SCORE", species_set="mammals")
>>> ## MutationTaster_score : MutationTaster_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## Polyphen2_HVAR_score : Polyphen2_HVAR_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## LRT_pred : LRT_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## MutationAssessor_score : MutationAssessor_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## FATHMM_pred : FATHMM_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## LR_score : LR_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## MutationTaster_pred : MutationTaster_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## SiPhy_29way_logOdds : SiPhy_29way_logOdds from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## CADD_phred : CADD_phred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## Polyphen2_HVAR_pred : Polyphen2_HVAR_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## RadialSVM_pred : RadialSVM_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## Reliability_index : Reliability_index from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## GERP++_NR : GERP++_NR from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## MutationAssessor_pred : MutationAssessor_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## LRT_score : LRT_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## CADD_raw : CADD_raw from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## LR_pred : LR_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## FATHMM_score : FATHMM_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## SIFT_score : SIFT_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## GERP++_RS : GERP++_RS from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## SIFT_pred : SIFT_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> ## RadialSVM_score : RadialSVM_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>> #Uploaded_variation Location Allele Existing_variation SYMBOL SYMBOL_SOURCE Gene ENSP Feature Feature_type BIOTYPE STRAND CANONICAL EXON INTRON DISTANCE TSSDistance Consequence cDNA_position CDS_position Protein_position Amino_acids Codons PolyPhen SIFT Condel CELL_TYPE SV PUBMED CLIN_SIG HIGH_INF_POS MOTIF_NAME MOTIF_POS MOTIF_SCORE_CHANGE TSSDistance CADD_RAW CADD_PHRED GWAVA Conservation GERP++_NR GERP++_RS LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred FATHMM_score FATHMM_pred RadialSVM_score RadialSVM_pred LR_score LR_pred Reliability_index SiPhy_29way_logOdds Polyphen2_HVAR_score Polyphen2_HVAR_pred SIFT_score SIFT_pred CADD_raw CADD_phred Extra
>>> rs13247133 7:86199080 A rs13247133 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.25769 2.762 0.11 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.257691;CADD_PHRED=2.762;GWAVA=0.11
>>> rs13244782 7:86202665 T rs13244782 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 1.957591 12.5 0.15 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=1.957591;CADD_PHRED=12.50;GWAVA=0.15
>>> rs12704267 7:86206830 T rs12704267 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.111018 4.597 0.16 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.111018;CADD_PHRED=4.597;GWAVA=0.16
>>> rs140931361 7:86214933-86214937 - rs140931361 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.42024 2.04 - - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.420243;CADD_PHRED=2.040
>>> rs34536358 7:86222651 G rs34536358 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.31002 2.524 0.18 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.310016;CADD_PHRED=2.524;GWAVA=0.18
>>> rs36006360 7:86224933 T rs36006360 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 2.513017 14.36 0.36 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=2.513017;CADD_PHRED=14.36;GWAVA=0.36
>>> rs13244678 7:86232583 T rs13244678 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.52024 1.626 0.05 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.520238;CADD_PHRED=1.626;GWAVA=0.05
>>> rs12704279 7:86238294 T rs12704279 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.454708 6.469 0.16 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.454708;CADD_PHRED=6.469;GWAVA=0.16
>>> rs13228078 7:86240691 C rs13228078 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.980262 9.002 0.1 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.980262;CADD_PHRED=9.002;GWAVA=0.1
>>> rs140931361 7:86214933-86214937 - rs140931361 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.42024 2.04 - - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.420243;CADD_PHRED=2.040
>>>
>>> Thanks,
>>>
>>> G.
>>>
>>> On 7 May 2014 16:13, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>>> Hello,
>>>>
>>>> Correct, the plugin was intended to work with the whole_genome_SNVs.tsv
>>>> file, which only contains data for SNVs.
>>>>
>>>> I've modified the plugin so that it should be able to cope with indel
>>>> data files such as you have; please do let me know if you have any problems
>>>> as I've only sparingly tested it on made-up data!
>>>>
>>>> Regards
>>>>
>>>> Will McLaren
>>>> Ensembl Variation
>>>>
>>>>
>>>> On 7 May 2014 15:37, Genomeo Dev <genomeodev at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> There seems to be a discrepancy between the CADD score calculated using
>>>>> VEP with the CADD.pm plugin and the tabix direct output:
>>>>>
>>>>> For example using this 1000G variant:
>>>>>
>>>>> #CHROM POS ID REF ALT QUAL FILTER INFO
>>>>> 7 86214932 rs140931361 TTACTC T . PASS .
>>>>>
>>>>> variant_effect_predictor.pl -i input.txt --format vcf --plugin CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz
>>>>>
>>>>> does not return any CADD score,
>>>>>
>>>>> whereas
>>>>> $ tabix -p vcf 1000G.tsv.gz 7:86214932-86214932
>>>>> 7 86214932 TTACTC T -0.420243 2.040
>>>>>
>>>>> This seems to affect indels and not SNVs. I could see in the plugin
>>>>> that there is a rule to ignore indels. Do you have any suggestions on how
>>>>> to change that safely?
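The two tools write the same deletion differently: VCF (and the CADD indel file) keeps a padding base, giving 7 86214932 TTACTC T, whereas VEP trims it and reports 7:86214933-86214937 with allele "-" in the output shown earlier in the thread. A minimal Python sketch of that conversion, assuming a single shared leading base; the function name is hypothetical and this is not the CADD plugin's code.

```python
from typing import Tuple


def vcf_to_vep(pos: int, ref: str, alt: str) -> Tuple[int, int, str, str]:
    """Convert a VCF-style allele pair to VEP-style (start, end, ref, alt).

    Only a single shared leading (padding) base is trimmed; multi-allelic
    or more complex records would need extra handling. Empty alleles become "-".
    """
    if len(ref) > 1 or len(alt) > 1:
        if ref[0] == alt[0]:
            ref, alt, pos = ref[1:], alt[1:], pos + 1
    start = pos
    end = pos + len(ref) - 1  # for an insertion (empty ref) this gives end = start - 1
    return start, end, ref or "-", alt or "-"


print(vcf_to_vep(86214932, "TTACTC", "T"))  # (86214933, 86214937, 'TACTC', '-')
print(vcf_to_vep(100, "A", "AT"))           # (101, 100, '-', 'T')
print(vcf_to_vep(100, "A", "G"))            # (100, 100, 'A', 'G')
```

For insertions this yields start = end + 1, which is how VEP's default input format denotes an insertion.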
>>>>>
>>>>> Also, in the plugin, I assume there is a test to ensure the alleles
>>>>> are identical between the input file and the 1000G.tsv.gz file. Is this
>>>>> correct?
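An allele check of that sort could look like the following Python sketch: given the lines tabix returns for a position, accept a score only when chromosome, position, reference and alternate allele all match. The six-column tab-separated layout mirrors the tabix output quoted above; the function name is hypothetical, and whether the plugin performs exactly this check is for the plugin's author to confirm.

```python
from typing import Iterable, Optional, Tuple


def cadd_scores(tabix_lines: Iterable[str], chrom: str, pos: int,
                ref: str, alt: str) -> Optional[Tuple[float, float]]:
    """Return (raw, phred) from the first line whose alleles match exactly.

    Lines are assumed to be CHROM, POS, REF, ALT, RawScore, PHRED separated by tabs.
    """
    for line in tabix_lines:
        c, p, r, a, raw, phred = line.rstrip("\n").split("\t")
        if c == chrom and int(p) == pos and r == ref and a == alt:
            return float(raw), float(phred)
    return None  # position present but alleles differ, or no lines at all


lines = ["7\t86214932\tTTACTC\tT\t-0.420243\t2.040\n"]
print(cadd_scores(lines, "7", 86214932, "TTACTC", "T"))   # (-0.420243, 2.04)
print(cadd_scores(lines, "7", 86214932, "TTACTC", "TT"))  # None
```

Because the match is on VCF-style alleles, the query has to use the same padded representation as the file.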
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --
>>>>> G.
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> G.
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> G.
>



-- 
G.