[ensembl-dev] VEP ClinVar information

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Mon Mar 16 16:39:03 GMT 2015


Hi Will,

Thank you for your quick response! Very clarifying.

I guess that the way I posted to retrieve ClinVar data is correct. With 
my test dataset I've only seen "is_significant" values of "1" and undef 
'phenotype' values. I think I need a synthetic VCF with ClinVar-annotated 
variants to verify that the plugin is working.

I've been looking on the Ensembl website for a test dataset. I think you 
don't provide one, right? Correct me if I'm wrong.

Thanks!

Regards,
Guillermo.

On 16/03/15 16:16, Will McLaren wrote:
> Hi Guillermo,
>
> To get the rest of that data in the table you need to access the 
> additional attributes of the PhenotypeFeature object, something like:
>
> my $attr = $pfs->[0]->get_all_attributes;
> print "$_:".$attr->{$_}."\t" for keys %$attr;
> print "\n";
>
> Regards
>
> Will
>
> More info: the reason these data are stored as attributes is due to 
> the diverse data sources and types that we import into our phenotype 
> schema; to create a database column and corresponding API method for 
> each data type (p-value, review status, risk allele, external ID etc 
> etc) would be cumbersome and inefficient. To this end we provide a few 
> methods that shortcut the attribute approach for the most common data 
> types; everything else must be accessed through the attributes method. 
> This is a common theme across the Ensembl API.
>
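
A minimal standalone sketch of the attribute pattern described above. It uses a plain hashref in place of the one returned by get_all_attributes(), so it runs without the Ensembl API. The attribute names and values are invented placeholders, not real ClinVar data, and format_attributes is a hypothetical helper. Sorting the keys keeps the column order stable between runs, which the one-liner above does not guarantee.

```perl
use strict;
use warnings;

# Stand-in for the hashref returned by $pf->get_all_attributes.
# Keys and values are placeholders (assumptions), not real ClinVar data.
my $attr = {
    clinvar_clin_sig => 'placeholder significance',
    review_status    => 'placeholder status',
    external_id      => 'placeholder ID',
};

# Hypothetical helper: render attributes as tab-separated "key:value" pairs.
# Keys are sorted so the output order is stable; undef values become ''.
sub format_attributes {
    my ($attributes) = @_;
    return join("\t",
        map { "$_:" . ($attributes->{$_} // '') } sort keys %$attributes);
}

print format_attributes($attr), "\n";
```
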
> On 13 March 2015 at 12:03, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hi,
>
>     I'm trying to retrieve ClinVar information with the code example
>     you provided.
>
>         my $self = shift;
>         my $tva = shift;
>         my $vf = $tva->variation_feature;
>         my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>
>         foreach my $known_var (@{$vf->{existing} || []}) {
>             foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>                 if ($pf->{'source'} eq "dbSNP_ClinVar") {
>                     print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>                 }
>             }
>         }
>
>     As you can see, I'm "filtering" the results to only output
>     phenotype features whose source is dbSNP_ClinVar. I'm not sure,
>     but I suspect the filtering should happen in the "fetch_all" call.
>
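
A small sketch of the source filtering discussed above, written against plain hashrefs rather than real PhenotypeFeature objects so that it runs without the Ensembl API. The records and the helper filter_by_source are hypothetical; only the source name dbSNP_ClinVar comes from this thread. It also guards against the undef 'phenotype' values, which would otherwise trigger "uninitialized value" warnings when printed.

```perl
use strict;
use warnings;

# Stand-ins for PhenotypeFeature records: plain hashrefs with placeholder
# values (assumptions), so the sketch runs without the Ensembl API.
my @features = (
    { source => 'dbSNP_ClinVar', phenotype => 'placeholder phenotype A' },
    { source => 'other_source',  phenotype => 'placeholder phenotype B' },
    { source => 'dbSNP_ClinVar', phenotype => undef },
);

# Hypothetical helper: keep only records whose source matches exactly.
sub filter_by_source {
    my ($records, $wanted) = @_;
    return [ grep { defined $_->{source} && $_->{source} eq $wanted } @$records ];
}

my $clinvar = filter_by_source(\@features, 'dbSNP_ClinVar');
for my $pf (@$clinvar) {
    # Substitute 'NA' for an undef phenotype instead of printing undef.
    my $phenotype = $pf->{phenotype} // 'NA';
    print join("\t", $pf->{source}, $phenotype), "\n";
}
```
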
>     On the other hand I'm trying to retrieve Disease, Source and
>     Clinical Significance from this example table:
>     http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>
>     I think I'm doing something wrong; I got totally lost in
>     PhenotypeFeature.
>
>     Regards,
>     Guillermo.
>
>
>     On 02/03/15 16:05, Will McLaren wrote:
>>     If you enable the --check_existing flag when you run the VEP,
>>     you'll be able to see any known co-located variants attached to
>>     the VariationFeature object in your plugin:
>>
>>     sub run {
>>       my $self = shift;
>>       my $tva = shift;
>>       my $vf = $tva->variation_feature;
>>
>>       foreach my $known_var(@{$vf->{existing} || []}) {
>>          # do stuff
>>       }
>>     }
>>
>>     The $known_var is not an API object but a simple hashref with a
>>     number of fields; you're probably interested in
>>     $known_var->{clin_sig}
>>
>>     However, as I mentioned, this is the only data that is stored in
>>     the cache. To access the rating and the specific disease
>>     association, you'll need to make calls to the database by getting
>>     an adaptor, something like:
>>
>>     sub run {
>>       my $self = shift;
>>       my $tva = shift;
>>       my $vf = $tva->variation_feature;
>>       my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>
>>       foreach my $known_var (@{$vf->{existing} || []}) {
>>          foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>            # do stuff
>>          }
>>       }
>>     }
>>
>>     Be aware that this will access the database, so unless you have a
>>     local copy please don't run this sort of code on genome-wide VCFs
>>     using our public DB server.
>>
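
One way to keep database traffic down, sketched under assumptions: since run() receives one $tva at a time, the same co-located variant may be seen more than once, so results can be cached by variant name. The database call is replaced by a stub (fetch_from_db) so the sketch runs without the Ensembl API; cached_fetch is a hypothetical helper, and in a real plugin the stub would be the fetch_all_by_object_id() call shown above.

```perl
use strict;
use warnings;

my %cache;         # variant name -> arrayref of results
my $db_calls = 0;  # counts how often the (stubbed) database is queried

# Stub standing in for $pfa->fetch_all_by_object_id($name); it returns an
# empty list for illustration and only counts the calls made.
sub fetch_from_db {
    my ($name) = @_;
    $db_calls++;
    return [];
}

# Hypothetical helper: query each variant name at most once.
sub cached_fetch {
    my ($name) = @_;
    $cache{$name} //= fetch_from_db($name);
    return $cache{$name};
}

# Three lookups, but rs699 is repeated and served from the cache.
cached_fetch($_) for qw(rs699 rs268 rs699);
print "lookups: 3, database calls: $db_calls\n";
```

This does not remove the need for a local database copy on genome-wide input; it only avoids repeating identical queries.
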
>>     Regards
>>
>>     Will
>>
>>     On 2 March 2015 at 14:47, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Hi Will,
>>
>>         Indeed, I'm looking to retrieve this information from a VEP plugin.
>>
>>         Regards,
>>         Guillermo.
>>
>>
>>         On 02/03/15 15:25, Will McLaren wrote:
>>>         Hi Guillermo,
>>>
>>>         The detailed ClinVar information is stored against
>>>         PhenotypeFeature objects (each SNP/disease pairing gets its
>>>         own entry in ClinVar, e.g.
>>>         http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>         http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>         http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for rs699).
>>>
>>>         The rating (and indeed the clinical significance) is stored
>>>         as an attribute on the PhenotypeFeature object; you can
>>>         retrieve this with the get_all_attributes() method.
>>>
>>>         See
>>>         http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>         and
>>>         http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>         for more info.
>>>
>>>         Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is an
>>>         internal method that you should not use.
>>>
>>>         The VEP cache contains the list of clinical significance
>>>         states for each variant, but neither the disease association
>>>         nor the rating. If you want help getting access to this data
>>>         via a plugin, let me know as it's a little more involved
>>>         than the API methods above (though it is faster as no
>>>         database access is required).
>>>
>>>         Regards
>>>
>>>         Will McLaren
>>>         Ensembl Variation
>>>
>>>         On 2 March 2015 at 14:06, Guillermo Marco Puche
>>>         <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>             Dear devs,
>>>
>>>             I'd like to retrieve ClinVar information and add it
>>>             to the VEP annotation. From my understanding I should
>>>             be able to retrieve "Clinical significance" and "ClinVar
>>>             Rating".
>>>
>>>             I've been looking at the Variation API, and I'm confused. I
>>>             guess for significance I should use
>>>             Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() or
>>>             Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>
>>>             What about ClinVar rating? Is it possible to retrieve it
>>>             from API?
>>>
>>>             Thanks!
>>>
>>>             Regards,
>>>             Guillermo.
>>>
>>>
>>>
>>>             _______________________________________________
>>>             Dev mailing list: Dev at ensembl.org
>>>             Posting guidelines and subscribe/unsubscribe info:
>>>             http://lists.ensembl.org/mailman/listinfo/dev
>>>             Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>>