[ensembl-dev] VEP ClinVar information
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Mon Mar 16 16:39:03 GMT 2015
Hi Will,
Thank you for your quick response! Very clarifying.
I guess that the way to retrieve ClinVar data I posted is correct. With
my test dataset I've only seen "is_significant" values of "1" and undef
'phenotype' values. I think I need a synthetic VCF with ClinVar-annotated
variants to verify that the plugin is working.
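Until such a dataset is available, the filtering and attribute-printing logic can at least be exercised without a database. The following is a minimal sketch under stated assumptions: MockPhenotypeFeature is a hypothetical stand-in (not the Ensembl class) that mimics only the 'source' hash field and the get_all_attributes() method discussed in this thread; clinvar_lines is a made-up helper name, and the attribute keys and values are placeholders, not real ClinVar data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a PhenotypeFeature object. It is NOT the Ensembl
# class; it only mimics the 'source' field and get_all_attributes().
package MockPhenotypeFeature;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Returns a hashref of attribute name => value (empty if none were given).
sub get_all_attributes {
    my $self = shift;
    return $self->{attribs} || {};
}

package main;

# Hypothetical helper: keep only dbSNP_ClinVar features and render each
# one's attributes as a single tab-separated "key:value" line.
sub clinvar_lines {
    my ($pfs) = @_;
    my @lines;
    foreach my $pf (@$pfs) {
        next unless defined $pf->{source} && $pf->{source} eq 'dbSNP_ClinVar';
        my $attr = $pf->get_all_attributes;
        # Sort the keys so the output order is stable between runs.
        push @lines, join("\t", map { "$_:" . ($attr->{$_} // '') } sort keys %$attr);
    }
    return @lines;
}

# Placeholder data: the keys and values below are invented for illustration.
my @pfs = (
    MockPhenotypeFeature->new(
        source  => 'dbSNP_ClinVar',
        attribs => {
            clinical_significance => 'placeholder_value',
            review_status         => 'placeholder_rating',
        },
    ),
    MockPhenotypeFeature->new(
        source  => 'some_other_source',
        attribs => { p_value => 'placeholder' },
    ),
);

print "$_\n" for clinvar_lines(\@pfs);
```

In a real plugin the list passed to clinvar_lines would come from fetch_all_by_object_id() instead of the mock objects; the real attribute names depend on what the database returns.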
I've been looking on the Ensembl website for a test dataset. I think you
don't provide one, right? Correct me if I'm wrong.
Thanks!
Regards,
Guillermo.
On 16/03/15 16:16, Will McLaren wrote:
> Hi Guillermo,
>
> To get the rest of that data in the table you need to access the
> additional attributes of the PhenotypeFeature object, something like:
>
> my $attr = $pfs->[0]->get_all_attributes;
> print "$_:".$attr->{$_}."\t" for keys %$attr;
> print "\n";
>
> Regards
>
> Will
>
> More info: the reason these data are stored as attributes is due to
> the diverse data sources and types that we import into our phenotype
> schema; to create a database column and corresponding API method for
> each data type (p-value, review status, risk allele, external ID etc
> etc) would be cumbersome and inefficient. To this end we provide a few
> methods that shortcut the attribute approach for the most common data
> types; everything else must be accessed through the attributes method.
> This is a common theme across the Ensembl API.
>
> On 13 March 2015 at 12:03, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hi,
>
> I'm trying to retrieve ClinVar information with the code example
> you provided.
>
> my $self = shift;
> my $tva = shift;
> my $vf = $tva->variation_feature;
> my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>
> foreach my $known_var (@{$vf->{existing} || []}) {
>     foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>         if ($pf->{'source'} eq "dbSNP_ClinVar") {
>             print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>         }
>     }
> }
>
> As you can see I'm "filtering" the results to only output
> phenotype features whose source is dbSNP_ClinVar. I'm not sure why,
> but I suspect the filtering should be done in the "fetch_all" call itself.
>
> On the other hand I'm trying to retrieve Disease, Source and
> Clinical Significance from this example table:
> http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>
> I think I'm doing something wrong; I got totally lost in
> PhenotypeFeature.
>
> Regards,
> Guillermo.
>
>
> On 02/03/15 16:05, Will McLaren wrote:
>> If you enable the --check_existing flag when you run the VEP,
>> you'll be able to see any known co-located variants attached to
>> the VariationFeature object in your plugin:
>>
>> sub run {
>>     my $self = shift;
>>     my $tva = shift;
>>     my $vf = $tva->variation_feature;
>>
>>     foreach my $known_var (@{$vf->{existing} || []}) {
>>         # do stuff
>>     }
>> }
>>
>> The $known_var is not an API object but a simple hashref with a
>> number of fields; you're probably interested in
>> $known_var->{clin_sig}
>>
>> However, as I mentioned, this is the only data that is stored in
>> the cache. To access the rating and the specific disease
>> association, you'll need to make calls to the database by getting
>> an adaptor, something like:
>>
>> sub run {
>>     my $self = shift;
>>     my $tva = shift;
>>     my $vf = $tva->variation_feature;
>>     my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>
>>     foreach my $known_var (@{$vf->{existing} || []}) {
>>         foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>             # do stuff
>>         }
>>     }
>> }
>>
>> Be aware that this will access the database, so unless you have a
>> local copy please don't run this sort of code on genome-wide VCFs
>> using our public DB server.
>>
>> Regards
>>
>> Will
>>
>> On 2 March 2015 at 14:47, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hi Will,
>>
>> Indeed, I'm looking to retrieve this information from a VEP plugin.
>>
>> Regards,
>> Guillermo.
>>
>>
>> On 02/03/15 15:25, Will McLaren wrote:
>>> Hi Guillermo,
>>>
>>> The detailed ClinVar information is stored against
>>> PhenotypeFeature objects (each SNP/disease pairing gets its
>>> own entry in ClinVar, e.g.
>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for rs699).
>>>
>>> The rating (and indeed the clinical significance) is stored
>>> as an attribute on the PhenotypeFeature object; you can
>>> retrieve this with the get_all_attributes() method.
>>>
>>> See
>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>> and
>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>> for more info.
>>>
>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is an
>>> internal method that you should not use.
>>>
>>> The VEP cache contains the list of clinical significance
>>> states for each variant, but neither the disease association
>>> nor the rating. If you want help getting access to this data
>>> via a plugin, let me know as it's a little more involved
>>> than the API methods above (though it is faster as no
>>> database access is required).
>>>
>>> Regards
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>> On 2 March 2015 at 14:06, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Dear devs,
>>>
>>> I'm looking to retrieve ClinVar information and
>>> add it to VEP annotation. From my understanding I should
>>> be able to retrieve "Clinical significance" and "ClinVar
>>> Rating".
>>>
>>> I've been looking at the Variation API, and I'm confused. I
>>> guess for significance I should use
>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() or
>>> Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>
>>> What about ClinVar rating? Is it possible to retrieve it
>>> from API?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Guillermo.
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>>