[ensembl-dev] VEP ClinVar information

Will McLaren wm2 at ebi.ac.uk
Thu Mar 26 10:05:32 GMT 2015


I think perhaps you haven't enabled --check_existing; this is required for
$vf->{existing} to get populated.

You can force it on in the new() method of your plugin:

$self->{config}->{check_existing} = 1;

It then works for me on release/75 and release/79.
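
As a rough sketch of what forcing the flag in new() might look like: StubBasePlugin below is a hypothetical stand-in so the snippet runs without an Ensembl installation. A real plugin would inherit from Bio::EnsEMBL::Variation::Utils::BaseVepPlugin instead, and this assumes the base constructor keeps the VEP config under $self->{config}, as the line above implies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::EnsEMBL::Variation::Utils::BaseVepPlugin,
# present only so this sketch runs without Ensembl installed.
{
    package StubBasePlugin;

    sub new {
        my ($class, $config, @params) = @_;
        my $self = { config => $config, params => \@params };
        return bless $self, $class;
    }
}

{
    package Clinvar;
    our @ISA = ('StubBasePlugin');

    sub new {
        my $class = shift;
        my $self  = $class->SUPER::new(@_);

        # Behave as if --check_existing had been given on the command line,
        # so that $vf->{existing} gets populated before run() is called.
        $self->{config}->{check_existing} = 1;

        return $self;
    }
}

package main;

# The config hashref is normally supplied by the VEP; an empty one is
# used here purely for illustration.
my $plugin = Clinvar->new({});
print "check_existing = $plugin->{config}->{check_existing}\n";
```

Setting the flag in the constructor means the plugin does not depend on the user remembering to pass --check_existing.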

Will

On 25 March 2015 at 17:35, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Hello Will,
>
> With your explanations I'm trying to call phenotype (as you said I was
> accessing the hashref directly).
> I'm using the input set you linked. However, my local Ensembl installation
> is v75.
>
> This is the code of the plugin:
> https://github.com/guillermomarco/vep/blob/master/Clinvar.pm
>
> I'm getting absolutely no info and no errors. I have no idea whether this is
> an issue with my database/API version or with the plugin code itself.
>
> Regards,
> Guillermo.
>
>
>
> On 16/03/15 17:50, Will McLaren wrote:
>
> The "is_significant" field is an internal flag that doesn't necessarily
> have the meaning you expect; it is used to distinguish between genuine
> reported associations and e.g. non-significant associations reported from
> genome-wide studies.
>
>  You should not see undef for phenotype; I suspect you are accessing the
> hashref directly ($pf->{phenotype}) rather than making the method call
> ($pf->phenotype()).
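
To illustrate the difference between the two access styles, here is a small sketch. MockPhenotype and MockPhenotypeFeature are hypothetical stand-ins, not the Ensembl classes, and the lazy loading they model is only an assumption about why the raw hash slot can be empty until the method is called.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: they only mimic a field that is loaded on demand.
{
    package MockPhenotype;

    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }

    sub description { return $_[0]->{description} }
}

{
    package MockPhenotypeFeature;

    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }

    # The accessor fills the 'phenotype' slot the first time it is called.
    sub phenotype {
        my $self = shift;
        $self->{phenotype} //= MockPhenotype->new(
            description => $self->{_lookup}->{ $self->{_phenotype_id} },
        );
        return $self->{phenotype};
    }
}

package main;

my $pf = MockPhenotypeFeature->new(
    _phenotype_id => 1,
    _lookup       => { 1 => 'Example disease' },
);

my $direct = $pf->{phenotype};             # raw slot, nothing loaded yet
my $called = $pf->phenotype->description;  # method call triggers the load

print 'direct access: ', (defined $direct ? $direct : 'undef'), "\n";
print "method call:   $called\n";
```

Going through the method keeps the plugin independent of how the object stores its data internally.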
>
>  You could try
> ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
> as a test input set.
>
>  Will
>
> On 16 March 2015 at 16:39, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
>>  Hi Will,
>>
>> Thank you for your quick response! Very clarifying.
>>
>> I guess that the way to retrieve ClinVar data I posted is correct. With
>> my test dataset I've only seen "is_significant" values of "1" and undef
>> 'phenotype' values. I think I need a synthetic VCF with ClinVar-annotated
>> variants to verify that the plugin is working.
>>
>> I've been looking on the Ensembl website for a test dataset. I think you
>> don't provide any, right? Correct me if I'm wrong.
>>
>> Thanks!
>>
>> Regards,
>> Guillermo.
>>
>>
>> On 16/03/15 16:16, Will McLaren wrote:
>>
>> Hi Guillermo,
>>
>>  To get the rest of that data in the table you need to access the
>> additional attributes of the PhenotypeFeature object, something like:
>>
>>  my $attr = $pfs->[0]->get_all_attributes;
>>  print "$_:".$attr->{$_}."\t" for keys %$attr;
>>  print "\n";
>>
>>  Regards
>>
>>  Will
>>
>>  More info: the reason these data are stored as attributes is due to the
>> diverse data sources and types that we import into our phenotype schema; to
>> create a database column and corresponding API method for each data type
>> (p-value, review status, risk allele, external ID etc etc) would be
>> cumbersome and inefficient. To this end we provide a few methods that
>> shortcut the attribute approach for the most common data types; everything
>> else must be accessed through the attributes method. This is a common theme
>> across the Ensembl API.
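
As a rough illustration of the attribute approach, the sketch below formats whatever get_all_attributes() returns as key:value pairs. StubFeature is a hypothetical stand-in for a PhenotypeFeature, and the attribute names and values are made up for the example; only the get_all_attributes() call itself comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in: models only get_all_attributes(), which is assumed
# to return a hashref of attribute name => value.
{
    package StubFeature;

    sub new {
        my ($class, $attribs) = @_;
        my $self = { attribs => $attribs };
        return bless $self, $class;
    }

    sub get_all_attributes { return $_[0]->{attribs} }
}

package main;

# Join every attribute as key:value, sorted by key so the output is stable.
sub format_attributes {
    my ($pf) = @_;
    my $attr = $pf->get_all_attributes;

    my @pairs;
    for my $key (sort keys %$attr) {
        my $value = defined $attr->{$key} ? $attr->{$key} : '';
        push @pairs, "$key:$value";
    }
    return join("\t", @pairs);
}

# Made-up attribute names and values, for illustration only.
my $pf   = StubFeature->new({ example_key => 'example value', another_key => 42 });
my $line = format_attributes($pf);
print "$line\n";
```

Sorting the keys is a design choice for readable, repeatable output; Perl does not guarantee any particular order for hash keys.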
>>
>> On 13 March 2015 at 12:03, Guillermo Marco Puche <
>> guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>>  Hi,
>>>
>>> I'm trying to retrieve ClinVar information with the code example you
>>> provided.
>>>
>>>     my $self = shift;
>>>     my $tva = shift;
>>>     my $vf = $tva->variation_feature;
>>>     my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>
>>>     foreach my $known_var (@{$vf->{existing} || []}) {
>>>         foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>             if ($pf->{'source'} eq "dbSNP_ClinVar") {
>>>                 print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>>>             }
>>>         }
>>>     }
>>>
>>> As you can see, I'm "filtering" the results to only output a phenotype
>>> feature when the source is dbSNP_ClinVar. I'm not sure why, but I suspect
>>> the filtering should be done in the "fetch_all" call itself.
>>>
>>> On the other hand I'm trying to retrieve Disease, Source and Clinical
>>> Significance from this example table:
>>> http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>>
>>> I think I'm doing something wrong; I got totally lost in PhenotypeFeature.
>>>
>>> Regards,
>>> Guillermo.
>>>
>>>
>>> On 02/03/15 16:05, Will McLaren wrote:
>>>
>>> If you enable the --check_existing flag when you run the VEP, you'll be
>>> able to see any known co-located variants attached to the VariationFeature
>>> object in your plugin:
>>>
>>>  sub run {
>>>   my $self = shift;
>>>   my $tva = shift;
>>>   my $vf = $tva->variation_feature;
>>>
>>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>>      # do stuff
>>>   }
>>> }
>>>
>>>  The $known_var is not an API object but a simple hashref with a number
>>> of fields; you're probably interested in $known_var->{clin_sig}
>>>
>>>  However, as I mentioned, this is the only data that is stored in the
>>> cache. To access the rating and the specific disease association, you'll
>>> need to make calls to the database by getting an adaptor, something like:
>>>
>>>  sub run {
>>>    my $self = shift;
>>>    my $tva = shift;
>>>    my $vf = $tva->variation_feature;
>>>    my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>
>>>    foreach my $known_var (@{$vf->{existing} || []}) {
>>>      foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>        # do stuff
>>>      }
>>>    }
>>>  }
>>>
>>>  Be aware that this will access the database, so unless you have a
>>> local copy please don't run this sort of code on genome-wide VCFs using our
>>> public DB server.
>>>
>>>  Regards
>>>
>>>  Will
>>>
>>> On 2 March 2015 at 14:47, Guillermo Marco Puche <
>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>  Hi Will,
>>>>
>>>> Indeed, I'm looking to retrieve this information from a VEP plugin.
>>>>
>>>> Regards,
>>>> Guillermo.
>>>>
>>>>
>>>> On 02/03/15 15:25, Will McLaren wrote:
>>>>
>>>> Hi Guillermo,
>>>>
>>>>  The detailed ClinVar information is stored against PhenotypeFeature
>>>> objects (each SNP/disease pairing gets its own entry in ClinVar, e.g.
>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for rs699).
>>>>
>>>>  The rating (and indeed the clinical significance) is stored as an
>>>> attribute on the PhenotypeFeature object; you can retrieve this with the
>>>> get_all_attributes() method.
>>>>
>>>>  See
>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>> and
>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>> for more info.
>>>>
>>>>  Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is an internal
>>>> method that you should not use.
>>>>
>>>>  The VEP cache contains the list of clinical significance states for
>>>> each variant, but neither the disease association nor the rating. If you
>>>> want help getting access to this data via a plugin, let me know as it's a
>>>> little more involved than the API methods above (though it is faster as no
>>>> database access is required).
>>>>
>>>>  Regards
>>>>
>>>>  Will McLaren
>>>> Ensembl Variation
>>>>
>>>> On 2 March 2015 at 14:06, Guillermo Marco Puche <
>>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>>  Dear devs,
>>>>>
>>>>> I'd like to retrieve ClinVar information and add it to the VEP
>>>>> annotation. From my understanding, I should be able to retrieve "Clinical
>>>>> significance" and "ClinVar Rating".
>>>>>
>>>>> I've been looking at the Variation API, and I'm confused. I guess for
>>>>> significance I should use
>>>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() or
>>>>> Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>>
>>>>> What about the ClinVar rating? Is it possible to retrieve it from the API?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Regards,
>>>>> Guillermo.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>
>>>>