[ensembl-dev] VEP ClinVar information

Will McLaren wm2 at ebi.ac.uk
Mon Mar 16 16:50:44 GMT 2015


The "is_significant" field is an internal flag that doesn't necessarily
have the meaning you expect; it is used to distinguish between genuine
reported associations and e.g. non-significant associations reported from
genome-wide studies.

You should not see undef for phenotype; I suspect you are accessing the
hashref directly ($pf->{phenotype}) rather than making the method call
($pf->phenotype()).
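
To illustrate the difference, here is a minimal, self-contained sketch. Note
that MockFeature is a hypothetical stand-in, not the real
Bio::EnsEMBL::Variation::PhenotypeFeature class. It assumes the accessor
fills in its value lazily on first call, which is one plausible reason a
direct hash lookup would come back undef:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an API object; NOT the real Ensembl class.
package MockFeature;

sub new {
    my ($class, %args) = @_;
    # Assumption: only an internal ID is stored at construction time.
    return bless { _phenotype_id => $args{phenotype_id} }, $class;
}

# Accessor that populates the 'phenotype' slot the first time it is called.
sub phenotype {
    my $self = shift;
    $self->{phenotype} //= "phenotype #$self->{_phenotype_id}";
    return $self->{phenotype};
}

package main;

my $pf = MockFeature->new(phenotype_id => 42);

my $direct = $pf->{phenotype};   # undef: nothing has populated the slot yet
my $called = $pf->phenotype();   # the accessor populates and returns the value

print defined $direct ? "direct: $direct\n" : "direct: undef\n";
print "method: $called\n";
```

The same reasoning applies to any field on an API object: go through the
documented method rather than relying on how the hash happens to be laid out.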

You could try
ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
as a test input set.

Will

On 16 March 2015 at 16:39, Guillermo Marco Puche <
guillermo.marco at sistemasgenomicos.com> wrote:

>  Hi Will,
>
> Thank you for your quick response! Very clarifying.
>
> I guess that the way to retrieve ClinVar data I posted is correct. With my
> test dataset I've only seen "is_significant" values of "1" and undef
> 'phenotype' values. I think I need a synthetic VCF with ClinVar-annotated
> variants to verify that the plugin is working.
>
> I've been looking on the Ensembl website for a test dataset. I think you
> don't provide any, right? Correct me if I'm wrong.
>
> Thanks!
>
> Regards,
> Guillermo.
>
>
> On 16/03/15 16:16, Will McLaren wrote:
>
> Hi Guillermo,
>
>  To get the rest of that data in the table you need to access the
> additional attributes of the PhenotypeFeature object, something like:
>
>  my $attr = $pfs->[0]->get_all_attributes;
>  print "$_:".$attr->{$_}."\t" for keys %$attr;
> print "\n";
>
>  Regards
>
>  Will
>
>  More info: the reason these data are stored as attributes is due to the
> diverse data sources and types that we import into our phenotype schema; to
> create a database column and corresponding API method for each data type
> (p-value, review status, risk allele, external ID, etc.) would be
> cumbersome and inefficient. To this end we provide a few methods that
> shortcut the attribute approach for the most common data types; everything
> else must be accessed through the attributes method. This is a common theme
> across the Ensembl API.
>
> On 13 March 2015 at 12:03, Guillermo Marco Puche <
> guillermo.marco at sistemasgenomicos.com> wrote:
>
>>  Hi,
>>
>> I'm trying to retrieve ClinVar information with the code example you
>> provided.
>>
>>     my $self = shift;
>>     my $tva = shift;
>>     my $vf = $tva->variation_feature;
>>     my $pfa =
>> $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>
>>     foreach my $known_var(@{$vf->{existing} || []}) {
>>         foreach my
>> $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>              if ($pf->{'source'} eq "dbSNP_ClinVar"){
>>                 print
>> "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>>             }
>>         }
>>     }
>>
>> As you can see, I'm filtering the results so that a phenotype feature is
>> only output when its source is dbSNP_ClinVar. I'm not sure, but I guess the
>> filtering should really be done as part of the "fetch_all" call.
>>
>> On the other hand I'm trying to retrieve Disease, Source and Clinical
>> Significance from this example table:
>> http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>
>> I think I'm doing something wrong; I got totally lost in PhenotypeFeature.
>>
>> Regards,
>> Guillermo.
>>
>>
>> On 02/03/15 16:05, Will McLaren wrote:
>>
>> If you enable the --check_existing flag when you run the VEP, you'll be
>> able to see any known co-located variants attached to the VariationFeature
>> object in your plugin:
>>
>>  sub run {
>>   my $self = shift;
>>   my $tva = shift;
>>   my $vf = $tva->variation_feature;
>>
>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>      # do stuff
>>   }
>> }
>>
>>  The $known_var is not an API object but a simple hashref with a number
>> of fields; you're probably interested in $known_var->{clin_sig}
>>
>>  However, as I mentioned, this is the only data that is stored in the
>> cache. To access the rating and the specific disease association, you'll
>> need to make calls to the database by getting an adaptor, something like:
>>
>>  sub run {
>>   my $self = shift;
>>   my $tva = shift;
>>   my $vf = $tva->variation_feature;
>>   my $pfa =
>> $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>
>>    foreach my $known_var(@{$vf->{existing} || []}) {
>>      foreach my
>> $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>        # do stuff
>>      }
>>   }
>> }
>>
>>  Be aware that this will access the database, so unless you have a local
>> copy please don't run this sort of code on genome-wide VCFs using our
>> public DB server.
>>
>>  Regards
>>
>>  Will
>>
>> On 2 March 2015 at 14:47, Guillermo Marco Puche <
>> guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>>  Hi Will,
>>>
>>> Indeed I'm looking to retrieve this information from VEP plugin.
>>>
>>> Regards,
>>> Guillermo.
>>>
>>>
>>> On 02/03/15 15:25, Will McLaren wrote:
>>>
>>> Hi Guillermo,
>>>
>>>  The detailed ClinVar information is stored against PhenotypeFeature
>>> objects (each SNP/disease pairing gets its own entry in ClinVar, e.g.
>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for rs699).
>>>
>>>  The rating (and indeed the clinical significance) is stored as an
>>> attribute on the PhenotypeFeature object; you can retrieve this with the
>>> get_all_attributes() method.
>>>
>>>  See
>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>> and
>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>> for more info.
>>>
>>>  Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is an internal
>>> method that you should not use.
>>>
>>>  The VEP cache contains the list of clinical significance states for
>>> each variant, but neither the disease association nor the rating. If you
>>> want help getting access to this data via a plugin, let me know as it's a
>>> little more involved than the API methods above (though it is faster as no
>>> database access is required).
>>>
>>>  Regards
>>>
>>>  Will McLaren
>>> Ensembl Variation
>>>
>>> On 2 March 2015 at 14:06, Guillermo Marco Puche <
>>> guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>>  Dear devs,
>>>>
>>>> I'm looking to retrieve ClinVar information and add it to VEP
>>>> annotation. From my understanding I should be able to retrieve "Clinical
>>>> significance" and "ClinVar Rating".
>>>>
>>>> I've been looking at the Variation API, and I'm confused. I guess for
>>>> significance I should use
>>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() or
>>>> Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>
>>>> What about ClinVar rating? Is it possible to retrieve it from API?
>>>>
>>>> Thanks!
>>>>
>>>> Regards,
>>>> Guillermo.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>>
>>>
>>>