[ensembl-dev] VEP ClinVar information

Will McLaren wm2 at ebi.ac.uk
Thu Mar 26 10:53:02 GMT 2015


Example output (I set $Data::Dumper::Maxdepth = 1;):

> echo "rs699" | perl -I ~/Git/guillermo/vep/ variant_effect_predictor.pl
-plugin Clinvar -data -force -db 75
2015-03-26 10:51:38 - Reading input from STDIN (or maybe you forgot to
specify an input file?)...
2015-03-26 10:51:38 - Starting...
2015-03-26 10:51:38 - Detected format of input file as id
2015-03-26 10:51:38 - Read 1 variants into buffer
2015-03-26 10:51:38 - Checking for existing variations
[===================================================================================================================================================================================================]
 [ 100% ]
2015-03-26 10:51:38 - Reading transcript data from cache and/or database
[===================================================================================================================================================================================================]
 [ 100% ]
2015-03-26 10:51:38 - Retrieved 4 transcripts (0 mem, 0 cached, 4 DB, 0
duplicates)
2015-03-26 10:51:38 - Analyzing chromosome 1
2015-03-26 10:51:38 - Analyzing variants
[===================================================================================================================================================================================================]
 [ 100% ]
2015-03-26 10:51:38 - Calculating consequences
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '2853',
                 'name' => undef,
                 'description' => 'HYPERTENSION, ESSENTIAL, SUSCEPTIBILITY
TO'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '20384',
                 'name' => undef,
                 'description' =>
'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '20369',
                 'name' => undef,
                 'description' => 'Preeclampsia,_susceptibility_to'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '26451',
                 'name' => undef,
                 'description' =>
'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '1',
                 'name' => 'HGMD_MUTATION',
                 'description' => 'Annotated by HGMD but no phenotype
description is publicly available'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '6522',
                 'name' => undef,
                 'description' => 'COSMIC:tumour_site:large_intestine'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '6529',
                 'name' => undef,
                 'description' => 'COSMIC:tumour_site:breast'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '2853',
                 'name' => undef,
                 'description' => 'HYPERTENSION, ESSENTIAL, SUSCEPTIBILITY
TO'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '20384',
                 'name' => undef,
                 'description' =>
'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '20369',
                 'name' => undef,
                 'description' => 'Preeclampsia,_susceptibility_to'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '26451',
                 'name' => undef,
                 'description' =>
'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '1',
                 'name' => 'HGMD_MUTATION',
                 'description' => 'Annotated by HGMD but no phenotype
description is publicly available'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '6522',
                 'name' => undef,
                 'description' => 'COSMIC:tumour_site:large_intestine'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
$VAR1 = bless( {
                 'adaptor' =>
'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
                 'dbID' => '6529',
                 'name' => undef,
                 'description' => 'COSMIC:tumour_site:breast'
               }, 'Bio::EnsEMBL::Variation::Phenotype' );
2015-03-26 10:51:38 - Processed 1 total variants (1 vars/sec, 1 vars/sec
total)
2015-03-26 10:51:38 - Wrote stats summary to
variant_effect_output.txt_summary.html
2015-03-26 10:51:38 - Finished!
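
A minimal sketch of how the pieces discussed in this thread could be combined into one line of output per PhenotypeFeature, using only the method calls mentioned here (phenotype(), get_all_attributes()) plus description() on the Phenotype object. The helper name format_phenotype_feature, the My::Fake* stand-in packages, and the attribute names and values are all made up for illustration so the block runs without an Ensembl installation; in a real plugin the objects would come from $pfa->fetch_all_by_object_id(...).

```perl
use strict;
use warnings;

# Format one PhenotypeFeature-like object as a tab-delimited line.
# Only method calls are used; reading $pf->{phenotype} directly can give
# undef (assumption: the Phenotype object is only fetched by the method).
sub format_phenotype_feature {
    my ($pf) = @_;

    my $phenotype   = $pf->phenotype;
    my $description = $phenotype ? $phenotype->description : undef;
    $description = '-' unless defined $description;

    # get_all_attributes returns a hashref of attribute name => value
    my $attr  = $pf->get_all_attributes || {};
    my @pairs = map { "$_:" . (defined $attr->{$_} ? $attr->{$_} : '') }
                sort keys %$attr;

    return join("\t", $description, @pairs);
}

# Hypothetical stand-ins for the Ensembl objects, for demonstration only.
{
    package My::FakePhenotype;
    sub new         { my ($class, %args) = @_; return bless {%args}, $class }
    sub description { return $_[0]->{description} }

    package My::FakePhenotypeFeature;
    sub new                { my ($class, %args) = @_; return bless {%args}, $class }
    sub phenotype          { return $_[0]->{_phenotype} }
    sub get_all_attributes { return $_[0]->{_attribs} }
}

# Illustrative data only; real attribute names depend on the data source.
my $pf = My::FakePhenotypeFeature->new(
    _phenotype => My::FakePhenotype->new(
        description => 'Preeclampsia,_susceptibility_to',
    ),
    _attribs => {
        clinvar_clin_sig => 'risk factor',
        review_status    => 'example value',
    },
);

print format_phenotype_feature($pf), "\n";
```

Sorting the attribute keys keeps the column order stable between runs, which a plain "keys %$attr" does not guarantee.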

On 26 March 2015 at 10:43, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Hello Will,
>
> I had already enabled "check_existing" in my VEP config template; even so,
> I followed your advice and updated the plugin to force it on in the new()
> method using your code.
> I'm still getting no output from line 64:
>
> print Dumper($pf->phenotype());
>
> Are you getting any output printed? As I said, I get no errors, but nothing
> is printed either. This Data::Dumper call should be printing the result of
> the phenotype() method call.
>
> Regards,
> Guillermo.
>
>
>
> On 26/03/15 11:05, Will McLaren wrote:
>
> I think perhaps you haven't enabled --check_existing; this is required for
> $vf->{existing} to get populated.
>
>  You can force it on in the new() method of your plugin:
>
>  $self->{config}->{check_existing} = 1;
>
>  It then works for me on release/75 and release/79.
>
>  Will
>
> On 25 March 2015 at 17:35, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
>>  Hello Will,
>>
>> Following your explanation, I'm now trying to call phenotype() (as you
>> said, I was accessing the hashref directly).
>> I'm using the input set you linked. However, my local Ensembl installation
>> is v75.
>>
>> This is the code of the plugin:
>> https://github.com/guillermomarco/vep/blob/master/Clinvar.pm
>>
>> I'm getting absolutely no output and no errors. I've no idea if this is an
>> issue with my database/API version or with the plugin code itself.
>>
>> Regards,
>> Guillermo.
>>
>>
>>
>> On 16/03/15 17:50, Will McLaren wrote:
>>
>> The "is_significant" field is an internal flag that doesn't necessarily
>> have the meaning you expect; it is used to distinguish between genuine
>> reported associations and e.g. non-significant associations reported from
>> genome-wide studies.
>>
>>  You should not see undef for phenotype; I suspect you are accessing the
>> hashref directly ($pf->{phenotype}) rather than making the method call
>> ($pf->phenotype()).
>>
>>  You could try
>> ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
>> as a test input set.
>>
>>  Will
>>
>> On 16 March 2015 at 16:39, Guillermo Marco Puche <
>> guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>>  Hi Will,
>>>
>>> Thank you for your quick response! Very clarifying.
>>>
>>> I guess that the way I posted to retrieve ClinVar data is correct. With
>>> my test dataset I've only seen "is_significant" values of "1" and undef
>>> 'phenotype' values. I think I need a synthetic VCF containing variants
>>> with ClinVar annotation to verify that the plugin is working.
>>>
>>> I've been looking on the Ensembl website for a test dataset. I don't think
>>> you provide one, right? Correct me if I'm wrong.
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Guillermo.
>>>
>>>
>>> On 16/03/15 16:16, Will McLaren wrote:
>>>
>>> Hi Guillermo,
>>>
>>>  To get the rest of that data in the table you need to access the
>>> additional attributes of the PhenotypeFeature object, something like:
>>>
>>>  my $attr = $pfs->[0]->get_all_attributes;
>>>  print "$_:".$attr->{$_}."\t" for keys %$attr;
>>>  print "\n";
>>>
>>>  Regards
>>>
>>>  Will
>>>
>>>  More info: the reason these data are stored as attributes is due to
>>> the diverse data sources and types that we import into our phenotype
>>> schema; to create a database column and corresponding API method for each
>>> data type (p-value, review status, risk allele, external ID etc etc) would
>>> be cumbersome and inefficient. To this end we provide a few methods that
>>> shortcut the attribute approach for the most common data types; everything
>>> else must be accessed through the attributes method. This is a common theme
>>> across the Ensembl API.
>>>
>>> On 13 March 2015 at 12:03, Guillermo Marco Puche <
>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>  Hi,
>>>>
>>>> I'm trying to retrieve ClinVar information with the code example you
>>>> provided.
>>>>
>>>>     my $self = shift;
>>>>     my $tva = shift;
>>>>     my $vf = $tva->variation_feature;
>>>>     my $pfa =
>>>> $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>
>>>>     foreach my $known_var(@{$vf->{existing} || []}) {
>>>>         foreach my
>>>> $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>              if ($pf->{'source'} eq "dbSNP_ClinVar"){
>>>>                 print
>>>> "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n",
>>>> ;
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>> As you can see, I'm "filtering" the results to only output a phenotype
>>>> feature when its source is dbSNP_ClinVar. I'm not sure, but I guess this
>>>> filtering should be done as part of the "fetch_all" call.
>>>>
>>>> On the other hand I'm trying to retrieve Disease, Source and Clinical
>>>> Significance from this example table:
>>>> http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>>>
>>>> I think I'm doing something wrong; I got totally lost in
>>>> PhenotypeFeature.
>>>>
>>>> Regards,
>>>> Guillermo.
>>>>
>>>>
>>>> On 02/03/15 16:05, Will McLaren wrote:
>>>>
>>>> If you enable the --check_existing flag when you run the VEP, you'll be
>>>> able to see any known co-located variants attached to the VariationFeature
>>>> object in your plugin:
>>>>
>>>>  sub run {
>>>>   my $self = shift;
>>>>   my $tva = shift;
>>>>   my $vf = $tva->variation_feature;
>>>>
>>>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>>>      # do stuff
>>>>   }
>>>> }
>>>>
>>>>  The $known_var is not an API object but a simple hashref with a
>>>> number of fields; you're probably interested in $known_var->{clin_sig}
>>>>
>>>>  However, as I mentioned, this is the only data that is stored in the
>>>> cache. To access the rating and the specific disease association, you'll
>>>> need to make calls to the database by getting an adaptor, something like:
>>>>
>>>>  sub run {
>>>>   my $self = shift;
>>>>   my $tva = shift;
>>>>   my $vf = $tva->variation_feature;
>>>>   my $pfa =
>>>> $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>
>>>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>>>      foreach my
>>>> $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>        # do stuff
>>>>      }
>>>>   }
>>>> }
>>>>
>>>>  Be aware that this will access the database, so unless you have a
>>>> local copy please don't run this sort of code on genome-wide VCFs using our
>>>> public DB server.
>>>>
>>>>  Regards
>>>>
>>>>  Will
>>>>
>>>> On 2 March 2015 at 14:47, Guillermo Marco Puche <
>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>>  Hi Will,
>>>>>
>>>>> Indeed, I'm looking to retrieve this information from a VEP plugin.
>>>>>
>>>>> Regards,
>>>>> Guillermo.
>>>>>
>>>>>
>>>>> On 02/03/15 15:25, Will McLaren wrote:
>>>>>
>>>>> Hi Guillermo,
>>>>>
>>>>>  The detailed ClinVar information is stored against PhenotypeFeature
>>>>> objects (each SNP/disease pairing gets its own entry in ClinVar, e.g.
>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for rs699).
>>>>>
>>>>>  The rating (and indeed the clinical significance) is stored as an
>>>>> attribute on the PhenotypeFeature object; you can retrieve this with the
>>>>> get_all_attributes() method.
>>>>>
>>>>>  See
>>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>>> and
>>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>>> for more info.
>>>>>
>>>>>  Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is an internal
>>>>> method that you should not use.
>>>>>
>>>>>  The VEP cache contains the list of clinical significance states for
>>>>> each variant, but neither the disease association nor the rating. If you
>>>>> want help getting access to this data via a plugin, let me know as it's a
>>>>> little more involved than the API methods above (though it is faster as no
>>>>> database access is required).
>>>>>
>>>>>  Regards
>>>>>
>>>>>  Will McLaren
>>>>> Ensembl Variation
>>>>>
>>>>> On 2 March 2015 at 14:06, Guillermo Marco Puche <
>>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>>  Dear devs,
>>>>>>
>>>>>> I'm looking to retrieve ClinVar information and add it to the VEP
>>>>>> annotation. From my understanding I should be able to retrieve "Clinical
>>>>>> significance" and "ClinVar Rating".
>>>>>>
>>>>>> I've been looking at the Variation API, and I'm confused. I guess for
>>>>>> significance I should use
>>>>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() or
>>>>>> Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>>>
>>>>>> What about ClinVar rating? Is it possible to retrieve it from API?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Regards,
>>>>>> Guillermo.
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>>>
>>>>>