[ensembl-dev] VEP ClinVar information

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed Mar 25 17:35:51 GMT 2015


Hello Will,

Following your explanation, I'm now calling phenotype() as a method (as 
you said, I was accessing the hashref directly).
I'm using the input set you linked, but my local Ensembl installation is 
v75.
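
As a minimal sketch of why the two access styles can differ, assuming 
the accessor fills in the value lazily (MockPhenotypeFeature and its 
fields are made-up stand-ins, not the real 
Bio::EnsEMBL::Variation::PhenotypeFeature):

```perl
use strict;
use warnings;

# Hypothetical stand-in for an API object whose 'phenotype' slot is
# populated lazily. This is NOT the real Ensembl class.
package MockPhenotypeFeature;

sub new {
    my ($class, %args) = @_;
    # Only an internal ID is stored when the object is created.
    return bless { _phenotype_id => $args{phenotype_id} }, $class;
}

# The accessor fills in the 'phenotype' slot on first use.
sub phenotype {
    my $self = shift;
    $self->{phenotype} //= "phenotype #" . $self->{_phenotype_id};
    return $self->{phenotype};
}

package main;

my $pf = MockPhenotypeFeature->new(phenotype_id => 42);

# Direct hash access before any method call: the slot is still empty.
my $direct = $pf->{phenotype};

# Method call: triggers the lazy load and returns the value.
my $via_method = $pf->phenotype();

print "direct:     ", (defined $direct ? $direct : 'undef'), "\n";
print "via method: $via_method\n";
```

Going through the method keeps the code independent of how the object 
stores its data internally.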

This is the code of the plugin:
https://github.com/guillermomarco/vep/blob/master/Clinvar.pm

I'm getting absolutely no output and no errors. I don't know whether this 
is an issue with my database/API version or with the plugin code itself.
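
To help tell an empty result apart from a bug in the plugin, a sketch 
along these lines may be useful. It runs without an Ensembl installation 
because MockPF is a made-up stand-in for the objects returned by 
fetch_all_by_object_id(); the attribute keys and values are invented for 
illustration, and in the real API phenotype() returns an object rather 
than a plain string:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a phenotype feature object. Only the method
# names mirror the ones discussed in this thread.
package MockPF;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub source             { return $_[0]->{source} }
sub phenotype          { return $_[0]->{phenotype} }   # plain string here (assumption)
sub get_all_attributes { return $_[0]->{attributes} }

package main;

# Build one tab-separated line per ClinVar phenotype feature,
# always going through method calls instead of hash keys.
sub clinvar_lines {
    my ($pfs) = @_;
    my @lines;
    foreach my $pf (@$pfs) {
        next unless $pf->source eq 'dbSNP_ClinVar';
        my $attr   = $pf->get_all_attributes;
        my @fields = map { "$_:$attr->{$_}" } sort keys %$attr;
        push @lines, join("\t", $pf->phenotype, @fields);
    }
    # A silent plugin is hard to debug, so report an empty result explicitly.
    warn "no dbSNP_ClinVar phenotype features found\n" unless @lines;
    return @lines;
}

# Made-up example data, not real ClinVar records.
my @pfs = (
    MockPF->new(
        source     => 'dbSNP_ClinVar',
        phenotype  => 'Example disease',
        attributes => { clinvar_clin_sig => 'pathogenic', review_status => 'single submitter' },
    ),
    MockPF->new(source => 'Other_source', phenotype => 'Another trait', attributes => {}),
);

my @lines = clinvar_lines(\@pfs);
print "$_\n" for @lines;
```

If the warning fires for every variant in the test VCF, the database or 
API version is the more likely suspect; if lines are produced here but 
not by the plugin, the problem is probably in how the plugin returns or 
prints its results.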

Regards,
Guillermo.


On 16/03/15 17:50, Will McLaren wrote:
> The "is_significant" field is an internal flag that doesn't 
> necessarily have the meaning you expect; it is used to distinguish 
> between genuine reported associations and e.g. non-significant 
> associations reported from genome-wide studies.
>
> You should not see undef for phenotype; I suspect you are accessing 
> the hashref directly ($pf->{phenotype}) rather than making the method 
> call ($pf->phenotype()).
>
> You could try 
> ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz 
> as a test input set.
>
> Will
>
> On 16 March 2015 at 16:39, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hi Will,
>
>     Thank you for your quick response! Very clarifying.
>
>     I guess that the way to retrieve ClinVar data I posted is correct.
>     With my test dataset I've only seen "is_significant" values of "1"
>     and undef 'phenotype' values. I think I need a synthetic VCF with
>     ClinVar-annotated variants to verify that the plugin is working.
>
>     I've been looking on the Ensembl website for a test dataset. I think
>     you don't provide one, right? Correct me if I'm wrong.
>
>     Thanks!
>
>     Regards,
>     Guillermo.
>
>
>     On 16/03/15 16:16, Will McLaren wrote:
>>     Hi Guillermo,
>>
>>     To get the rest of that data in the table you need to access the
>>     additional attributes of the PhenotypeFeature object, something like:
>>
>>     my $attr = $pfs->[0]->get_all_attributes;
>>     print "$_:".$attr->{$_}."\t" for keys %$attr;
>>     print "\n";
>>
>>     Regards
>>
>>     Will
>>
>>     More info: the reason these data are stored as attributes is due
>>     to the diverse data sources and types that we import into our
>>     phenotype schema; to create a database column and corresponding
>>     API method for each data type (p-value, review status, risk
>>     allele, external ID, etc.) would be cumbersome and inefficient.
>>     To this end we provide a few methods that shortcut the attribute
>>     approach for the most common data types; everything else must be
>>     accessed through the attributes method. This is a common theme
>>     across the Ensembl API.
>>
>>     On 13 March 2015 at 12:03, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Hi,
>>
>>         I'm trying to retrieve ClinVar information with the code
>>         example you provided.
>>
>>             my $self = shift;
>>             my $tva = shift;
>>             my $vf = $tva->variation_feature;
>>             my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>
>>             foreach my $known_var (@{$vf->{existing} || []}) {
>>                 foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>                     if ($pf->{'source'} eq "dbSNP_ClinVar") {
>>                         print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>>                     }
>>                 }
>>             }
>>
>>         As you can see, I'm "filtering" the results to only output
>>         phenotype features whose source is dbSNP_ClinVar. I'm not
>>         sure, but I guess the filtering should be done as part of the
>>         "fetch_all" call.
>>
>>         On the other hand I'm trying to retrieve Disease, Source and
>>         Clinical Significance from this example table:
>>         http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>
>>         I think I'm doing something wrong; I got totally lost in
>>         PhenotypeFeature.
>>
>>         Regards,
>>         Guillermo.
>>
>>
>>         On 02/03/15 16:05, Will McLaren wrote:
>>>         If you enable the --check_existing flag when you run the
>>>         VEP, you'll be able to see any known co-located variants
>>>         attached to the VariationFeature object in your plugin:
>>>
>>>         sub run {
>>>           my $self = shift;
>>>           my $tva = shift;
>>>           my $vf = $tva->variation_feature;
>>>
>>>           foreach my $known_var(@{$vf->{existing} || []}) {
>>>              # do stuff
>>>           }
>>>         }
>>>
>>>         The $known_var is not an API object but a simple hashref
>>>         with a number of fields; you're probably interested in
>>>         $known_var->{clin_sig}
>>>
>>>         However, as I mentioned, this is the only data that is
>>>         stored in the cache. To access the rating and the specific
>>>         disease association, you'll need to make calls to the
>>>         database by getting an adaptor, something like:
>>>
>>>         sub run {
>>>           my $self = shift;
>>>           my $tva = shift;
>>>           my $vf = $tva->variation_feature;
>>>           my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>
>>>           foreach my $known_var(@{$vf->{existing} || []}) {
>>>              foreach my $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>                # do stuff
>>>              }
>>>           }
>>>         }
>>>
>>>         Be aware that this will access the database, so unless you
>>>         have a local copy please don't run this sort of code on
>>>         genome-wide VCFs using our public DB server.
>>>
>>>         Regards
>>>
>>>         Will
>>>
>>>         On 2 March 2015 at 14:47, Guillermo Marco Puche
>>>         <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>             Hi Will,
>>>
>>>             Indeed I'm looking to retrieve this information from VEP
>>>             plugin.
>>>
>>>             Regards,
>>>             Guillermo.
>>>
>>>
>>>             On 02/03/15 15:25, Will McLaren wrote:
>>>>             Hi Guillermo,
>>>>
>>>>             The detailed ClinVar information is stored against
>>>>             PhenotypeFeature objects (each SNP/disease pairing gets
>>>>             its own entry in ClinVar, e.g.
>>>>             http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>>             http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>>             http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for
>>>>             rs699).
>>>>
>>>>             The rating (and indeed the clinical significance) is
>>>>             stored as an attribute on the PhenotypeFeature object;
>>>>             you can retrieve this with the get_all_attributes() method.
>>>>
>>>>             See
>>>>             http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>>             and
>>>>             http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>>             for more info.
>>>>
>>>>             Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is
>>>>             an internal method that you should not use.
>>>>
>>>>             The VEP cache contains the list of clinical
>>>>             significance states for each variant, but neither the
>>>>             disease association nor the rating. If you want help
>>>>             getting access to this data via a plugin, let me know
>>>>             as it's a little more involved than the API methods
>>>>             above (though it is faster as no database access is
>>>>             required).
>>>>
>>>>             Regards
>>>>
>>>>             Will McLaren
>>>>             Ensembl Variation
>>>>
>>>>             On 2 March 2015 at 14:06, Guillermo Marco Puche
>>>>             <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>                 Dear devs,
>>>>
>>>>                 I'm looking to retrieve ClinVar information
>>>>                 and add it to the VEP annotation. From my
>>>>                 understanding, I should be able to retrieve "Clinical
>>>>                 significance" and "ClinVar Rating".
>>>>
>>>>                 I've been looking at the Variation API, and I'm
>>>>                 confused. I guess for significance I should use
>>>>                 Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig()
>>>>                 or
>>>>                 Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>
>>>>                 What about ClinVar rating? Is it possible to
>>>>                 retrieve it from API?
>>>>
>>>>                 Thanks!
>>>>
>>>>                 Regards,
>>>>                 Guillermo.
>>>>
>>>>
>>>>
>>>>                 _______________________________________________
>>>>                 Dev mailing list Dev at ensembl.org
>>>>                 <mailto:Dev at ensembl.org>
>>>>                 Posting guidelines and subscribe/unsubscribe info:
>>>>                 http://lists.ensembl.org/mailman/listinfo/dev
>>>>                 Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>>