[ensembl-dev] UpDownDistance using

Genomeo Dev genomeodev at gmail.com
Fri May 30 17:49:31 BST 2014


I did eventually figure out the answer to the first question: inside run,
unpack the arguments with my ($self, $tva) = @_; and read the plugin
arguments from $self->params().
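
A minimal sketch of that pattern follows. StubBase is a made-up stand-in
so the snippet runs without the Ensembl API installed; in the real plugin
the base class is Bio::EnsEMBL::Variation::Utils::BaseVepPlugin, and its
constructor takes different arguments. The package name
UpDownDistanceSketch is also hypothetical. The 5000 default mirrors the
one used in new in the quoted plugin code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::EnsEMBL::Variation::Utils::BaseVepPlugin,
# present only so this sketch runs without the Ensembl API installed.
package StubBase;

sub new {
    my ($class, @params) = @_;
    return bless { params => [@params] }, $class;
}

# Returns an array ref of the comma-separated plugin arguments.
sub params { return $_[0]->{params} }

package UpDownDistanceSketch;
our @ISA = ('StubBase');

# run is called once per variant/feature overlap; $tva is not needed
# just to echo the cutoffs, so it is ignored here.
sub run {
    my ($self, $tva) = @_;
    my $up   = $self->params->[0] || 5000;
    my $down = $self->params->[1] || $up;
    return {
        UPDIST_CUTOFF   => $up,
        DOWNDIST_CUTOFF => $down,
    };
}

package main;

# Equivalent of --plugin UpDownDistance,5000,100000
my $plugin = UpDownDistanceSketch->new(5000, 100000);
my $out    = $plugin->run(undef);
print "$out->{UPDIST_CUTOFF}\t$out->{DOWNDIST_CUTOFF}\n";
```

The keys returned from run have to match the keys declared in
get_header_info, otherwise the column names and the values will not line up.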

For my second question, more specifically, what I want is to use the
original input coordinate of each individual input variant to set the
UPSTREAM_DISTANCE and DOWNSTREAM_DISTANCE limits per variant in
UpDownDistance.pm. The reason is that I have a large group of variants for
which I want to consider consequences within the same physical range, and I
can already pass the boundaries of that range to the plugin as arguments.
Running VEP once per variant is not efficient, hence the question.
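
To make the arithmetic concrete, here is a rough sketch in plain Perl. Both
helper names are made up for illustration and are not part of the VEP API,
and the locus boundaries in the example call are invented values. Note that
left/right here are genomic directions, whereas VEP's upstream/downstream
are relative to transcript strand, so the mapping between the two is not
one-to-one. This thread also does not establish whether the global
distances can be changed per variant; since run is only called for overlaps
the core code has already found, one possible workaround (an assumption,
untested) is a generous global cutoff combined with a filter such as
feature_in_locus inside run.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the VEP API): distance from a variant
# to the left and right boundaries of the locus that contains it.
sub window_for_variant {
    my ($pos, $locus_start, $locus_end) = @_;
    die "variant $pos lies outside $locus_start-$locus_end\n"
        if $pos < $locus_start || $pos > $locus_end;
    return ($pos - $locus_start, $locus_end - $pos);
}

# Hypothetical helper: 1 if a feature overlaps the locus at all, else 0.
sub feature_in_locus {
    my ($feat_start, $feat_end, $locus_start, $locus_end) = @_;
    return ($feat_start <= $locus_end && $feat_end >= $locus_start) ? 1 : 0;
}

# The variant position is the one from the example further down the
# thread; the locus boundaries are made-up example values.
my ($left, $right) = window_for_variant(208_228_309, 208_200_000, 208_300_000);
print "left=$left right=$right\n";
```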

Regards,

G.


On 30 May 2014 15:48, Genomeo Dev <genomeodev at gmail.com> wrote:

> A related question is how to get the input variant's attributes
> (e.g. position, reference ID) so as to process them within the subroutine.
>
> Thanks,
>
> G.
>
>
> On 30 May 2014 13:01, Genomeo Dev <genomeodev at gmail.com> wrote:
>
>> Thanks Will. It is working fine now.
>>
>> I wanted to modify UpDownDistance.pm to produce two separate columns
>> in the VEP output showing the UPDIST_CUTOFF and DOWNDIST_CUTOFF parameters
>> (see below). How do I fetch the plugin arguments in the run
>> subroutine?
>>
>> Thanks,
>>
>> G.
>>
>>
>> use strict;
>> use warnings;
>> use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepPlugin);
>>
>> sub feature_types {
>>     return ['Feature', 'Intergenic'];
>> }
>>
>> sub get_header_info {
>>     return {
>>         UPDIST_CUTOFF   => "distance cutoff upstream of the variant within which consequences are calculated",
>>         DOWNDIST_CUTOFF => "distance cutoff downstream of the variant within which consequences are calculated"
>>     };
>> }
>>
>> sub new {
>>     my $class = shift;
>>     my $self  = $class->SUPER::new(@_);
>>
>>     # change up/down
>>     my $up   = $self->params->[0] || 5000;
>>     my $down = $self->params->[1] || $up;
>>     $Bio::EnsEMBL::Variation::Utils::VariationEffect::UPSTREAM_DISTANCE   = $up;
>>     $Bio::EnsEMBL::Variation::Utils::VariationEffect::DOWNSTREAM_DISTANCE = $down;
>>
>>     return $self;
>> }
>>
>> sub run {
>>     my $upstream_distance = ?
>>     my $downstream_distance = ?
>>     return {
>>         UPDIST_CUTOFF   => $upstream_distance,
>>         DOWNDIST_CUTOFF => $downstream_distance
>>     };
>> }
>>
>> 1;
>>
>>
>>
>>
>>
>>
>>
>> On 29 May 2014 09:57, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>>> Hi,
>>>
>>> I've patched a fix in for the UpDownDistance issue, the fix is in the
>>> main ensembl-variation API.
>>>
>>> Regarding the DISTANCE field, perhaps you could write a plugin that does
>>> exactly what you want? Changing the behaviour of this field may not be
>>> compatible with other people's pipelines, and the plugin system is the
>>> perfect way for you to have annotations customised to your requirements.
>>>
>>> Regards
>>>
>>> Will
>>>
>>>
>>> On 28 May 2014 18:58, Genomeo Dev <genomeodev at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> When using different up and down arguments in UpDownDistance.pm, VEP
>>>> returns genes outside the specified range as shown in the example below (MIR1302-4
>>>> is 94161 upstream of rs17808606 but is still reported using
>>>> UpDownDistance,5000,100000). For the genes which are outside the
>>>> range, the DISTANCE and Consequence columns are empty while for example
>>>> TSSDistance is not empty which might indicate the up and down arguments may
>>>> not be processed correctly.
>>>>
>>>> It would be helpful to only return genes whose coordinates satisfy
>>>> the specified range. Also, it would immensely help as well if DISTANCE is
>>>> set to 0 for variants falling within genes and is otherwise calculated even
>>>> for non-transcript feature types.
>>>>
>>>> Note that I am using Ensembl 75 with the recently updated
>>>> ensembl-variation module, which allowed UpDownDistance.pm to work for
>>>> distances beyond 5kb.
>>>>
>>>> Thanks,
>>>>
>>>> G.
>>>>
>>>> ##UpDownDistance,5000,100000
>>>> ##TSSDistance
>>>> #Uploaded_variation Location Allele Existing_variation SYMBOL SYMBOL_SOURCE Gene ENSP Feature Feature_type BIOTYPE STRAND CANONICAL EXON INTRON DISTANCE TSSDistance Consequence
>>>> rs17808606 2:208228309 T rs17808606 AC007879.5 Clone_based_vega_gene ENSG00000223725 - ENST00000412387 Transcript antisense -1 - - 3/4 - - intron_variant,nc_transcript_variant
>>>> rs17808606 2:208228309 T rs17808606 MIR1302-4 HGNC ENSG00000221628 - ENST00000408701 Transcript miRNA -1 YES - - - 94161
>>>> rs17808606 2:208228309 T rs17808606 AC007879.6 Clone_based_vega_gene ENSG00000225064 - ENST00000438824 Transcript lincRNA 1 YES - - 92895 - downstream_gene_variant
>>>> rs17808606 2:208228309 T rs17808606 AC007879.5 Clone_based_vega_gene ENSG00000223725 - ENST00000418850 Transcript antisense -1 YES - 4/5 - - intron_variant,nc_transcript_variant
>>>>
>>>> ##UpDownDistance,100000
>>>> ##TSSDistance
>>>> #Uploaded_variation Location Allele Existing_variation SYMBOL SYMBOL_SOURCE Gene ENSP Feature Feature_type BIOTYPE STRAND CANONICAL EXON INTRON DISTANCE TSSDistance Consequence
>>>> rs17808606 2:208228309 T rs17808606 AC007879.5 Clone_based_vega_gene ENSG00000223725 - ENST00000412387 Transcript antisense -1 - - 3/4 - - intron_variant,nc_transcript_variant
>>>> rs17808606 2:208228309 T rs17808606 MIR1302-4 HGNC ENSG00000221628 - ENST00000408701 Transcript miRNA -1 YES - - 94161 94161 upstream_gene_variant
>>>> rs17808606 2:208228309 T rs17808606 AC007879.6 Clone_based_vega_gene ENSG00000225064 - ENST00000438824 Transcript lincRNA 1 YES - - 92895 - downstream_gene_variant
>>>> rs17808606 2:208228309 T rs17808606 AC007879.5 Clone_based_vega_gene ENSG00000223725 - ENST00000418850 Transcript antisense -1 YES - 4/5 - - intron_variant,nc_transcript_variant
>>>>
>>>>
>>>> On 27 May 2014 11:03, Genomeo Dev <genomeodev at gmail.com> wrote:
>>>>
>>>>> Sorry, it seems the plug-in already does that. Thanks!
>>>>>
>>>>> G.
>>>>>
>>>>>
>>>>> On 23 May 2014 19:14, Genomeo Dev <genomeodev at gmail.com> wrote:
>>>>>
>>>>>> Hi Will,
>>>>>>
>>>>>> Thanks very much. That worked nicely.
>>>>>>
>>>>>> I am working with a set of variants within a locus where I know that
>>>>>> they are LD-independent with other genes from outside this locus.
>>>>>> Therefore, I want only to focus on genes inside this physically defined
>>>>>> locus.
>>>>>>
>>>>>> Rarely do these variants fall exactly at the centre of the locus so
>>>>>> distances to the right and left boundaries are not equal. Would it be
>>>>>> possible to alter UpDownDistance.pm to be able to specify a start
>>>>>> and end coordinate within which VEP should be constrained instead of the
>>>>>> current distance cutoff?
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> G.
>>>>>>
>>>>>>
>>>>>> On 8 May 2014 16:12, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>
>>>>>>> Hello again,
>>>>>>>
>>>>>>> I've fixed a bug that prevented UpDownDistance functioning correctly
>>>>>>> - it hadn't been tested with larger distances such as you specified which
>>>>>>> broke some assumptions in the core VEP code.
>>>>>>> You will need to update your ensembl-variation module or re-run the
>>>>>>> VEP INSTALL.pl script to pick up the new API code.
>>>>>>>
>>>>>>> As far as the other plugins go, I think you are misunderstanding how
>>>>>>> some of them work:
>>>>>>>
>>>>>>> TSSDistance - this gives the distance between a variant and the
>>>>>>> annotated transcript start site. If a variant is annotated as intergenic,
>>>>>>> there is no transcript to give the distance to! Changing the code to force
>>>>>>> it to assess intergenic variants will of course break here. If you
>>>>>>> alter the up/down-stream distance using UpDownDistance such that VEP
>>>>>>> then finds a transcript in range, the plugin will work as expected
>>>>>>> without modification. It seems to me that you are expecting that this
>>>>>>> plugin will find the shortest distance to _any_ transcript start site,
>>>>>>> which is not the intended purpose of the code.
>>>>>>>
>>>>>>> Condel & dbNSFP - these two plugins work exclusively on missense AKA
>>>>>>> non-synonymous SNVs (hence the NS in the name dbNSFP). While dbNSFP carries
>>>>>>> scores for CADD, and CADD gives scores for any genomic position, the CADD
>>>>>>> scores in dbNSFP are only for missense variants.
>>>>>>>
>>>>>>> The feature_types() subroutine should be used when writing your own
>>>>>>> plugin to determine which kind of variant/feature combinations are
>>>>>>> considered by the plugin, since the run() sub is executed once for each
>>>>>>> variant/feature overlap found by the core VEP code. Modifying existing
>>>>>>> plugins like this should be done only if you are confident that the
>>>>>>> modification achieves what you intend.
>>>>>>>
>>>>>>> Hope that all helps
>>>>>>>
>>>>>>> Will
>>>>>>>
>>>>>>>
>>>>>>> On 7 May 2014 17:59, Genomeo Dev <genomeodev at gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Will.
>>>>>>>>
>>>>>>>> I am working with non-coding and intergenic variants and wanted to
>>>>>>>> run VEP with the following plugins:
>>>>>>>>
>>>>>>>> --plugin UpDownDistance,100000 \
>>>>>>>> --plugin TSSDistance \
>>>>>>>> --plugin Condel,/media/sf_D_DRIVE/Projects/Databases/ensembl/Plugins/Condel/config,b \
>>>>>>>> --plugin CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz \
>>>>>>>> --plugin Gwava,tss,/media/sf_D_DRIVE/Projects/Databases/gwava/gwava_scores.bed.gz \
>>>>>>>> --plugin Conservation,GERP_CONSERVATION_SCORE,mammals \
>>>>>>>> --plugin dbNSFP,/media/sf/data/dbNSFP/dbNSFP2.4.gz,GERP++_NR,GERP++_RS,LRT_score,LRT_pred,MutationTaster_score,MutationTaster_pred,MutationAssessor_score,MutationAssessor_pred,FATHMM_score,FATHMM_pred,RadialSVM_score,RadialSVM_pred,LR_score,LR_pred,Reliability_index,SiPhy_29way_logOdds,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,SIFT_score,SIFT_pred,CADD_raw,CADD_phred
>>>>>>>>
>>>>>>>>
>>>>>>>> As shown in the output below, apart from CADD.pm and Gwava.pm, no
>>>>>>>> scores are returned by the other plugins. dbNSFP.pm should at least
>>>>>>>> return CADD scores because these exist. As recommended, I tried using:
>>>>>>>>
>>>>>>>> sub feature_types {
>>>>>>>>     return ['Feature', 'Intergenic'];
>>>>>>>> }
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> sub feature_types {
>>>>>>>>    return ['Transcript', 'Intergenic'];
>>>>>>>> }
>>>>>>>>
>>>>>>>> in dbNSFP.pm, but that does not help. When I tried that in TSSDistance.pm
>>>>>>>> I got this error:
>>>>>>>>
>>>>>>>> Plugin 'TSSDistance' went wrong: Can't locate object method
>>>>>>>> "transcript" via package
>>>>>>>> "Bio::EnsEMBL::Variation::IntergenicVariationAllele" at
>>>>>>>> /media/sf_D_DRIVE/Projects/Databases/ensembl/Plugins//TSSDistance.pm line
>>>>>>>> 56.
>>>>>>>>
>>>>>>>> For UpDownDistance.pm, it does not seem to work: for instance, rs140931361
>>>>>>>> is 58298 bp from ENSG00000198822 but this gene is not returned.
>>>>>>>>
>>>>>>>>
>>>>>>>> OUTPUT:
>>>>>>>>
>>>>>>>> ## ENSEMBL VARIANT EFFECT PREDICTOR v75
>>>>>>>> ## Output produced at 2014-05-07 17:28:44
>>>>>>>> ## Connected to homo_sapiens_core_75_37 on ensembldb.ensembl.org
>>>>>>>> ## Using cache in /media/sf_D_DRIVE/Projects/Databases/ensembl//homo_sapiens/75
>>>>>>>> ## Using API version 75, DB version 75
>>>>>>>> ## sift version sift5.0.2
>>>>>>>> ## polyphen version 2.2.2
>>>>>>>> ## Extra column keys:
>>>>>>>> ## BIOTYPE : Biotype of transcript
>>>>>>>> ## CANONICAL : Indicates if transcript is canonical for this gene
>>>>>>>> ## CELL_TYPE : List of cell types and classifications for regulatory feature
>>>>>>>> ## CLIN_SIG : Clinical significance of variant from dbSNP
>>>>>>>> ## DISTANCE : Shortest distance from variant to transcript
>>>>>>>> ## DOMAINS : The source and identifer of any overlapping protein domains
>>>>>>>> ## ENSP : Ensembl protein identifer
>>>>>>>> ## EXON : Exon number(s) / total
>>>>>>>> ## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of the TFBP
>>>>>>>> ## INTRON : Intron number(s) / total
>>>>>>>> ## MOTIF_NAME : The source and identifier of a transcription factor binding profile (TFBP) aligned at this position
>>>>>>>> ## MOTIF_POS : The relative position of the variation in the aligned TFBP
>>>>>>>> ## MOTIF_SCORE_CHANGE : The difference in motif score of the reference and variant sequences for the TFBP
>>>>>>>> ## PUBMED : Pubmed ID(s) of publications that cite existing variant
>>>>>>>> ## PolyPhen : PolyPhen prediction and/or score
>>>>>>>> ## SIFT : SIFT prediction and/or score
>>>>>>>> ## SYMBOL : Gene symbol (e.g. HGNC)
>>>>>>>> ## SYMBOL_SOURCE : Source of gene symbol
>>>>>>>> ## TSSDistance : Distance from the transcription start site
>>>>>>>> ## Condel : Consensus deleteriousness score for an amino acid substitution based on SIFT and PolyPhen-2
>>>>>>>> ## CADD_RAW : Raw CADD score
>>>>>>>> ## CADD_PHRED : PHRED-like scaled CADD score
>>>>>>>> ## GWAVA : Genome Wide Annotation of VAriants score (tss model)
>>>>>>>> ## Conservation : The conservation score for this site (method_link_type="GERP_CONSERVATION_SCORE", species_set="mammals")
>>>>>>>> ## MutationTaster_score : MutationTaster_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## Polyphen2_HVAR_score : Polyphen2_HVAR_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## LRT_pred : LRT_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## MutationAssessor_score : MutationAssessor_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## FATHMM_pred : FATHMM_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## LR_score : LR_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## MutationTaster_pred : MutationTaster_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## SiPhy_29way_logOdds : SiPhy_29way_logOdds from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## CADD_phred : CADD_phred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## Polyphen2_HVAR_pred : Polyphen2_HVAR_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## RadialSVM_pred : RadialSVM_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## Reliability_index : Reliability_index from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## GERP++_NR : GERP++_NR from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## MutationAssessor_pred : MutationAssessor_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## LRT_score : LRT_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## CADD_raw : CADD_raw from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## LR_pred : LR_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## FATHMM_score : FATHMM_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## SIFT_score : SIFT_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## GERP++_RS : GERP++_RS from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## SIFT_pred : SIFT_pred from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> ## RadialSVM_score : RadialSVM_score from dbNSFP file /media/sf_Psychiatric_Genetics_yo2/data/dbNSFP/dbNSFP2.4.gz
>>>>>>>> #Uploaded_variation Location Allele Existing_variation SYMBOL SYMBOL_SOURCE Gene ENSP Feature Feature_type BIOTYPE STRAND CANONICAL EXON INTRON DISTANCE TSSDistance Consequence cDNA_position CDS_position Protein_position Amino_acids Codons PolyPhen SIFT Condel CELL_TYPE SV PUBMED CLIN_SIG HIGH_INF_POS MOTIF_NAME MOTIF_POS MOTIF_SCORE_CHANGE TSSDistance CADD_RAW CADD_PHRED GWAVA Conservation GERP++_NR GERP++_RS LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred FATHMM_score FATHMM_pred RadialSVM_score RadialSVM_pred LR_score LR_pred Reliability_index SiPhy_29way_logOdds Polyphen2_HVAR_score Polyphen2_HVAR_pred SIFT_score SIFT_pred CADD_raw CADD_phred Extra
>>>>>>>> rs13247133 7:86199080 A rs13247133 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.25769 2.762 0.11 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.257691;CADD_PHRED=2.762;GWAVA=0.11
>>>>>>>> rs13244782 7:86202665 T rs13244782 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 1.957591 12.5 0.15 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=1.957591;CADD_PHRED=12.50;GWAVA=0.15
>>>>>>>> rs12704267 7:86206830 T rs12704267 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.111018 4.597 0.16 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.111018;CADD_PHRED=4.597;GWAVA=0.16
>>>>>>>> rs140931361 7:86214933-86214937 - rs140931361 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.42024 2.04 - - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.420243;CADD_PHRED=2.040
>>>>>>>> rs34536358 7:86222651 G rs34536358 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.31002 2.524 0.18 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.310016;CADD_PHRED=2.524;GWAVA=0.18
>>>>>>>> rs36006360 7:86224933 T rs36006360 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 2.513017 14.36 0.36 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=2.513017;CADD_PHRED=14.36;GWAVA=0.36
>>>>>>>> rs13244678 7:86232583 T rs13244678 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.52024 1.626 0.05 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.520238;CADD_PHRED=1.626;GWAVA=0.05
>>>>>>>> rs12704279 7:86238294 T rs12704279 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.454708 6.469 0.16 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.454708;CADD_PHRED=6.469;GWAVA=0.16
>>>>>>>> rs13228078 7:86240691 C rs13228078 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - 0.980262 9.002 0.1 - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=0.980262;CADD_PHRED=9.002;GWAVA=0.1
>>>>>>>> rs140931361 7:86214933-86214937 - rs140931361 - - - - - - - - - - - - - intergenic_variant - - - - - - - - - - - - - - - - - -0.42024 2.04 - - - - - - - - - - - - - - - - - - - - - - - - CADD_RAW=-0.420243;CADD_PHRED=2.040
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> G.
>>>>>>>>
>>>>>>>> On 7 May 2014 16:13, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Correct, the plugin was intended to work with
>>>>>>>>> the whole_genome_SNVs.tsv file, which only contains data for SNVs.
>>>>>>>>>
>>>>>>>>> I've modified the plugin so that it should be able to cope with
>>>>>>>>> indel data files such as you have; please do let me know if you have any
>>>>>>>>> problems as I've only sparingly tested it on made-up data!
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Will McLaren
>>>>>>>>> Ensembl Variation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7 May 2014 15:37, Genomeo Dev <genomeodev at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> There seems to be a discrepancy between the CADD score calculated
>>>>>>>>>> using VEP with the CADD.pm plugin and the direct tabix output:
>>>>>>>>>>
>>>>>>>>>> For example using this 1000G variant:
>>>>>>>>>>
>>>>>>>>>> #CHROM POS ID REF ALT QUAL FILTER INFO
>>>>>>>>>> 7 86214932 rs140931361 TTACTC T . PASS .
>>>>>>>>>>
>>>>>>>>>> variant_effect_predictor.pl -i input.txt --format vcf --plugin
>>>>>>>>>> CADD,/media/sf_D_DRIVE/Projects/Databases/CADD/v1.0/1000G.tsv.gz
>>>>>>>>>> does not return any CADD score
>>>>>>>>>>
>>>>>>>>>> whereas
>>>>>>>>>> $ tabix -p vcf 1000G.tsv.gz 7:86214932-86214932
>>>>>>>>>> 7 86214932 TTACTC T -0.420243 2.040
>>>>>>>>>>
>>>>>>>>>> This seems to affect indels and not SNVs. I can see in the
>>>>>>>>>> plugin that there is a rule to ignore indels. Any suggestions on how to
>>>>>>>>>> safely change that?
>>>>>>>>>>
>>>>>>>>>> Also, in the plugin, I assume there is a test to ensure the
>>>>>>>>>> alleles are identical between the input file and the 1000G.tsv.gz file. Is
>>>>>>>>>> this correct?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> G.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>



-- 
G.