[ensembl-dev] VEP ClinVar information

Will McLaren wm2 at ebi.ac.uk
Thu Mar 26 11:57:38 GMT 2015


You will probably see odd behaviour if you mix versions of the script / API
/ caches / databases. Feel free to experiment, though we can't support such
setups obviously.

It works fine for me on 75 or 79 (i.e. using "git checkout release/75" in
ensembl-tools, ensembl-variation, ensembl-core)

Will

On 26 March 2015 at 11:26, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  I guess it's a problem with my API installation then.
>
> If I install latest API(79) and continue to use
> variant_effect_predictor.pl from version 75 will it continue working or
> I'll get conflicts/weird behaviours?
>
> Thanks!
>
> Regards,
> Guillermo.
>
>
>
> On 26/03/15 11:53, Will McLaren wrote:
>
> Example output (I set $Data::Dumper::Maxdepth = 1;):
>
>  > echo "rs699" | perl -I ~/Git/guillermo/vep/ variant_effect_predictor.pl
> -plugin Clinvar -data -force -db 75
>  2015-03-26 10:51:38 - Reading input from STDIN (or maybe you forgot to
> specify an input file?)...
> 2015-03-26 10:51:38 - Starting...
> 2015-03-26 10:51:38 - Detected format of input file as id
> 2015-03-26 10:51:38 - Read 1 variants into buffer
> 2015-03-26 10:51:38 - Checking for existing variations
> [===================================================================================================================================================================================================]
>  [ 100% ]
> 2015-03-26 10:51:38 - Reading transcript data from cache and/or database
> [===================================================================================================================================================================================================]
>  [ 100% ]
> 2015-03-26 10:51:38 - Retrieved 4 transcripts (0 mem, 0 cached, 4 DB, 0
> duplicates)
> 2015-03-26 10:51:38 - Analyzing chromosome 1
> 2015-03-26 10:51:38 - Analyzing variants
> [===================================================================================================================================================================================================]
>  [ 100% ]
>  2015-03-26 10:51:38 - Calculating consequences
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '2853',
>                  'name' => undef,
>                  'description' => 'HYPERTENSION, ESSENTIAL, SUSCEPTIBILITY
> TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20384',
>                  'name' => undef,
>                  'description' =>
> 'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20369',
>                  'name' => undef,
>                  'description' => 'Preeclampsia,_susceptibility_to'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '26451',
>                  'name' => undef,
>                  'description' =>
> 'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '1',
>                  'name' => 'HGMD_MUTATION',
>                  'description' => 'Annotated by HGMD but no phenotype
> description is publicly available'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6522',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:large_intestine'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6529',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:breast'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '2853',
>                  'name' => undef,
>                  'description' => 'HYPERTENSION, ESSENTIAL, SUSCEPTIBILITY
> TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20384',
>                  'name' => undef,
>                  'description' =>
> 'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20369',
>                  'name' => undef,
>                  'description' => 'Preeclampsia,_susceptibility_to'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '26451',
>                  'name' => undef,
>                  'description' =>
> 'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '1',
>                  'name' => 'HGMD_MUTATION',
>                  'description' => 'Annotated by HGMD but no phenotype
> description is publicly available'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6522',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:large_intestine'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6529',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:breast'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> 2015-03-26 10:51:38 - Processed 1 total variants (1 vars/sec, 1 vars/sec
> total)
> 2015-03-26 10:51:38 - Wrote stats summary to
> variant_effect_output.txt_summary.html
> 2015-03-26 10:51:38 - Finished!
>
> On 26 March 2015 at 10:43, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
>>  Hello Will,
>>
>> I already had enabled "check_existing" on my VEP config template, however
>> I followed your advice and updated code to force in the new() method with
>> your code.
>> I'm still getting no prints of line 64:
>>
>> print Dumper($pf->phenotype());
>>
>> Are you getting any output printed? As I said, I get no errors, but nothing
>> is printed either. This Data::Dumper call should be printing the result of
>> the phenotype() method call.
>>
>> Regards,
>> Guillermo.
>>
>>
>>
>> On 26/03/15 11:05, Will McLaren wrote:
>>
>> I think perhaps you haven't enabled --check_existing; this is required
>> for $vf->{existing} to get populated.
>>
>>  You can force it on in the new() method of your plugin:
>>
>>  $self->{config}->{check_existing} = 1;
>>
>>  It then works for me on release/75 and release/79.
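
A minimal sketch of what forcing the flag in new() might look like. So that it
runs without an Ensembl installation, BasePluginStub stands in for
Bio::EnsEMBL::Variation::Utils::BaseVepPlugin, and both it and the
ClinvarSketch package name are hypothetical. It assumes the real base class
keeps the VEP config hashref in $self->{config}, as the line above implies.

```perl
use strict;
use warnings;

# Stand-in for Bio::EnsEMBL::Variation::Utils::BaseVepPlugin so the sketch
# runs on its own; assumed to store the config hashref in $self->{config}.
package BasePluginStub;

sub new {
    my ($class, $config, @params) = @_;
    return bless { config => $config, params => \@params }, $class;
}

# Hypothetical plugin class; a real plugin would inherit from
# Bio::EnsEMBL::Variation::Utils::BaseVepPlugin instead of the stub.
package ClinvarSketch;
our @ISA = ('BasePluginStub');

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);

    # Switch on the co-located variant lookup regardless of the command
    # line, so that $vf->{existing} is populated before run() is called.
    $self->{config}->{check_existing} = 1;

    return $self;
}

package main;

# Even if the user config has the option switched off, the plugin enables it.
my $plugin = ClinvarSketch->new({ check_existing => 0 });
print "check_existing = $plugin->{config}->{check_existing}\n";
```

Setting the option in the constructor means the plugin does not depend on the
user remembering to pass --check_existing.
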
>>
>>  Will
>>
>> On 25 March 2015 at 17:35, Guillermo Marco Puche <
>> guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>>  Hello Will,
>>>
>>> With your explanations I'm trying to call phenotype (as you said I was
>>> accessing the hashref directly).
>>> I'm using input set you linked. However my local Ensembl installation is
>>> v75.
>>>
>>> This is the code of the plugin:
>>> https://github.com/guillermomarco/vep/blob/master/Clinvar.pm
>>>
>>> I'm getting absolutely no output and no errors. I've no idea if this is an
>>> issue with my database/API version or with the plugin code itself.
>>>
>>> Regards,
>>> Guillermo.
>>>
>>>
>>>
>>> On 16/03/15 17:50, Will McLaren wrote:
>>>
>>> The "is_significant" field is an internal flag that doesn't necessarily
>>> have the meaning you expect; it is used to distinguish between genuine
>>> reported associations and e.g. non-significant associations reported from
>>> genome-wide studies.
>>>
>>>  You should not see undef for phenotype; I suspect you are accessing
>>> the hashref directly ($pf->{phenotype}) rather than making the method call
>>> ($pf->phenotype()).
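
To illustrate why the direct hash access can come back undef while the method
call does not, here is a toy class with a lazily loading accessor. It is NOT
the Ensembl implementation; the class name ToyPhenotypeFeature, the
_phenotype_id key and the returned string are all made up for the example. The
assumption is only that phenotype() fills in the value on first use.

```perl
use strict;
use warnings;

# Toy class, not the Ensembl one: it only mimics an accessor that loads
# its value the first time it is requested.
package ToyPhenotypeFeature;

sub new {
    my ($class, %args) = @_;
    # Only an ID is stored at construction time; 'phenotype' is absent.
    return bless { _phenotype_id => $args{phenotype_id} }, $class;
}

sub phenotype {
    my $self = shift;
    # Lazy load: resolve the ID into a value on first call and keep it.
    # A real adaptor would query the database at this point.
    $self->{phenotype} //= "phenotype #$self->{_phenotype_id}";
    return $self->{phenotype};
}

package main;

my $pf = ToyPhenotypeFeature->new(phenotype_id => 2853);

my $direct = $pf->{phenotype};    # undef: nothing has loaded it yet
my $called = $pf->phenotype();    # triggers the load

print defined $direct ? "direct: $direct\n" : "direct: undef\n";
print "method: $called\n";
```

Going through the accessor also keeps plugin code independent of how the
object stores its data internally.
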
>>>
>>>  You could try
>>> ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
>>> as a test input set.
>>>
>>>  Will
>>>
>>> On 16 March 2015 at 16:39, Guillermo Marco Puche <
>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>  Hi Will,
>>>>
>>>> Thank you for your quick response! Very clarifying.
>>>>
>>>> I guess that the way to retrieve ClinVar data I posted is correct. With
>>>> my test dataset I've only seen "is_significant" values of "1" and undef
>>>> 'phenotype' values. I think I need a synthetic vcf with ClinVar annotation
>>>> variants to verify that the plugin is working.
>>>>
>>>> I've been looking on the Ensembl website for a test dataset. I don't think
>>>> you provide one, right? Correct me if I'm wrong.
>>>>
>>>> Thanks!
>>>>
>>>> Regards,
>>>> Guillermo.
>>>>
>>>>
>>>> On 16/03/15 16:16, Will McLaren wrote:
>>>>
>>>> Hi Guillermo,
>>>>
>>>>  To get the rest of that data in the table you need to access the
>>>> additional attributes of the PhenotypeFeature object, something like:
>>>>
>>>>  my $attr = $pfs->[0]->get_all_attributes;
>>>>  print "$_:".$attr->{$_}."\t" for keys %$attr;
>>>>  print "\n";
>>>>
>>>>  Regards
>>>>
>>>>  Will
>>>>
>>>>  More info: the reason these data are stored as attributes is due to
>>>> the diverse data sources and types that we import into our phenotype
>>>> schema; to create a database column and corresponding API method for each
>>>> data type (p-value, review status, risk allele, external ID etc etc) would
>>>> be cumbersome and inefficient. To this end we provide a few methods that
>>>> shortcut the attribute approach for the most common data types; everything
>>>> else must be accessed through the attributes method. This is a common theme
>>>> across the Ensembl API.
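
As a small sketch of the attribute approach described above: the hashref below
stands in for whatever get_all_attributes() returns, and its keys and values
are invented for illustration (the real keys depend on the data source and
release). Sorting the keys simply keeps the output order stable between runs.

```perl
use strict;
use warnings;

# Plain hashref standing in for the return value of get_all_attributes();
# these keys and values are invented and are not real Ensembl attribute names.
my $attr = {
    example_rating => 'value_a',
    example_status => 'value_b',
};

# Build one tab-separated "key:value" line, with keys in sorted order so
# the output does not change from run to run.
my $line = join "\t",
    map { sprintf '%s:%s', $_, $attr->{$_} } sort keys %$attr;

print "$line\n";
```

Dumping every key like this is a quick way to discover which attributes a
given source actually provides before picking out individual ones.
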
>>>>
>>>> On 13 March 2015 at 12:03, Guillermo Marco Puche <
>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>> I'm trying to retrieve ClinVar information with the code example you
>>>>> provided.
>>>>>
>>>>>     my $self = shift;
>>>>>     my $tva = shift;
>>>>>     my $vf = $tva->variation_feature;
>>>>>     my $pfa =
>>>>> $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>>
>>>>>     foreach my $known_var(@{$vf->{existing} || []}) {
>>>>>         foreach my
>>>>> $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>>              if ($pf->{'source'} eq "dbSNP_ClinVar"){
>>>>>                 print
>>>>> "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n",
>>>>> ;
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>
>>>>> As you can see, I'm "filtering" the results to only output a phenotype
>>>>> feature when the source is dbSNP_ClinVar. I'm not sure, but I suspect the
>>>>> filtering should really happen in the "fetch_all" call itself.
>>>>>
>>>>> On the other hand I'm trying to retrieve Disease, Source and Clinical
>>>>> Significance from this example table:
>>>>> http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>>>>
>>>>> I think I'm doing something wrong; I got totally lost in
>>>>> PhenotypeFeature.
>>>>>
>>>>> Regards,
>>>>> Guillermo.
>>>>>
>>>>>
>>>>> On 02/03/15 16:05, Will McLaren wrote:
>>>>>
>>>>> If you enable the --check_existing flag when you run the VEP, you'll
>>>>> be able to see any known co-located variants attached to the
>>>>> VariationFeature object in your plugin:
>>>>>
>>>>>  sub run {
>>>>>   my $self = shift;
>>>>>   my $tva = shift;
>>>>>   my $vf = $tva->variation_feature;
>>>>>
>>>>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>>>>      # do stuff
>>>>>   }
>>>>> }
>>>>>
>>>>>  The $known_var is not an API object but a simple hashref with a
>>>>> number of fields; you're probably interested in $known_var->{clin_sig}
>>>>>
>>>>>  However, as I mentioned, this is the only data that is stored in the
>>>>> cache. To access the rating and the specific disease association, you'll
>>>>> need to make calls to the database by getting an adaptor, something like:
>>>>>
>>>>>  sub run {
>>>>>   my $self = shift;
>>>>>   my $tva = shift;
>>>>>   my $vf = $tva->variation_feature;
>>>>>   my $pfa =
>>>>> $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>>
>>>>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>>>>      foreach my
>>>>> $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>>        # do stuff
>>>>>      }
>>>>>   }
>>>>> }
>>>>>
>>>>>  Be aware that this will access the database, so unless you have a
>>>>> local copy please don't run this sort of code on genome-wide VCFs using our
>>>>> public DB server.
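
One way to reduce the number of database calls is to remember results per
variant name, since run() may be invoked more than once for the same variant.
This is only a sketch: fetch_from_db() is a hypothetical stand-in for
$pfa->fetch_all_by_object_id($name) that just counts calls, and
cached_features() is an invented helper name.

```perl
use strict;
use warnings;

my $db_calls = 0;

# Hypothetical stand-in for $pfa->fetch_all_by_object_id($name); it only
# counts how often it is called and returns a dummy arrayref.
sub fetch_from_db {
    my ($name) = @_;
    $db_calls++;
    return [ "feature for $name" ];
}

my %cache;

# Return the features for a variant name, querying only on first use.
sub cached_features {
    my ($name) = @_;
    $cache{$name} //= fetch_from_db($name);
    return $cache{$name};
}

# Four lookups, but only two distinct names, so only two "queries" happen.
cached_features($_) for qw(rs699 rs699 rs268 rs699);

print "lookups: 4, database calls: $db_calls\n";
```

The cache grows with the number of distinct variants, so for very large inputs
it would need clearing periodically; it also does not remove the need for a
local database copy on genome-wide runs.
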
>>>>>
>>>>>  Regards
>>>>>
>>>>>  Will
>>>>>
>>>>> On 2 March 2015 at 14:47, Guillermo Marco Puche <
>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>>  Hi Will,
>>>>>>
>>>>>> Indeed, I'm looking to retrieve this information from a VEP plugin.
>>>>>>
>>>>>> Regards,
>>>>>> Guillermo.
>>>>>>
>>>>>>
>>>>>> On 02/03/15 15:25, Will McLaren wrote:
>>>>>>
>>>>>> Hi Guillermo,
>>>>>>
>>>>>>  The detailed ClinVar information is stored against PhenotypeFeature
>>>>>> objects (each SNP/disease pairing gets its own entry in ClinVar, e.g.
>>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for rs699).
>>>>>>
>>>>>>  The rating (and indeed the clinical significance) is stored as an
>>>>>> attribute on the PhenotypeFeature object; you can retrieve this with the
>>>>>> get_all_attributes() method.
>>>>>>
>>>>>>  See
>>>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>>>> and
>>>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>>>> for more info.
>>>>>>
>>>>>>  Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is an internal
>>>>>> method that you should not use.
>>>>>>
>>>>>>  The VEP cache contains the list of clinical significance states for
>>>>>> each variant, but neither the disease association nor the rating. If you
>>>>>> want help getting access to this data via a plugin, let me know as it's a
>>>>>> little more involved than the API methods above (though it is faster as no
>>>>>> database access is required).
>>>>>>
>>>>>>  Regards
>>>>>>
>>>>>>  Will McLaren
>>>>>> Ensembl Variation
>>>>>>
>>>>>> On 2 March 2015 at 14:06, Guillermo Marco Puche <
>>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>>
>>>>>>>  Dear devs,
>>>>>>>
>>>>>>> I'm looking to retrieve ClinVar information and add it to
>>>>>>> VEP annotation. From my understanding I should be able to retrieve
>>>>>>> "Clinical significance" and "ClinVar Rating".
>>>>>>>
>>>>>>> I've been looking at the Variation API, and I'm confused. I guess for
>>>>>>> significance I should use
>>>>>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() or
>>>>>>> Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>>>>
>>>>>>> What about ClinVar rating? Is it possible to retrieve it from API?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Guillermo.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>
>>>>>>>