[ensembl-dev] VEP ClinVar information

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Mar 26 10:43:21 GMT 2015


Hello Will,

I had already enabled "check_existing" in my VEP config template;
however, I followed your advice and updated the code to force it on in
the new() method as you suggested.
I'm still getting no output from line 64:

print Dumper($pf->phenotype());

Are you getting any output printed? As I said, I get no errors, but
nothing is printed either. This Data::Dumper call should print the
result of the phenotype() method call.

Regards,
Guillermo.


On 26/03/15 11:05, Will McLaren wrote:
> I think perhaps you haven't enabled --check_existing; this is required 
> for $vf->{existing} to get populated.
>
> You can force it on in the new() method of your plugin:
>
> $self->{config}->{check_existing} = 1;
>
> It then works for me on release/75 and release/79.
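A minimal sketch of what Will describes, assuming the usual plugin constructor pattern (call the parent new(), then adjust the shared config hashref). So that it runs without Ensembl installed, StubBaseVepPlugin is a hypothetical stand-in for Bio::EnsEMBL::Variation::Utils::BaseVepPlugin and ClinvarSketch is a made-up plugin name; only the check_existing line comes from the thread.

```perl
use strict;
use warnings;

# Minimal stand-in for Bio::EnsEMBL::Variation::Utils::BaseVepPlugin so the
# sketch runs without an Ensembl install; the real base class does more.
package StubBaseVepPlugin;

sub new {
    my ($class, $config, @params) = @_;
    return bless { config => $config, params => \@params }, $class;
}

# Hypothetical plugin class; only the new() body mirrors the advice above.
package ClinvarSketch;
use parent -norequire, 'StubBaseVepPlugin';

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);

    # Force --check_existing on so that $vf->{existing} gets populated.
    $self->{config}->{check_existing} = 1;

    return $self;
}

package main;

my $config = {};
my $plugin = ClinvarSketch->new($config);
print "check_existing = $plugin->{config}->{check_existing}\n";
```

Because $self->{config} is a reference, the flag is set on the caller's hashref rather than on a copy.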
>
> Will
>
> On 25 March 2015 at 17:35, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com 
> <mailto:guillermo.marco at sistemasgenomicos.com>> wrote:
>
>     Hello Will,
>
>     Following your explanation, I'm now calling phenotype() as a method
>     (as you said, I was accessing the hashref directly).
>     I'm using the input set you linked. However, my local Ensembl
>     installation is v75.
>
>     This is the code of the plugin:
>     https://github.com/guillermomarco/vep/blob/master/Clinvar.pm
>
>     I'm getting absolutely no output and no errors. I have no idea
>     whether this is an issue with my database/API version or with the
>     plugin code itself.
>
>     Regards,
>     Guillermo.
>
>
>
>     On 16/03/15 17:50, Will McLaren wrote:
>>     The "is_significant" field is an internal flag that doesn't
>>     necessarily have the meaning you expect; it is used to
>>     distinguish between genuine reported associations and e.g.
>>     non-significant associations reported from genome-wide studies.
>>
>>     You should not see undef for phenotype; I suspect you are
>>     accessing the hashref directly ($pf->{phenotype}) rather than
>>     making the method call ($pf->phenotype()).
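Why direct hash access can give undef while the method call works: a common pattern is lazy loading, where the object stores only an ID and the accessor fetches the full record on first use. The toy class below is hypothetical and only illustrates that pattern; it is an assumption that PhenotypeFeature behaves this way, and its real internals may differ.

```perl
use strict;
use warnings;

# Toy class (hypothetical, not the Ensembl implementation): it stores only an
# ID and fills in the full phenotype record the first time phenotype() runs.
package ToyFeature;

# Made-up lookup table standing in for a database query.
my %PHENOTYPE_DB = (42 => { description => 'Example disease' });

sub new {
    my ($class, %args) = @_;
    return bless { _phenotype_id => $args{phenotype_id} }, $class;
}

sub phenotype {
    my $self = shift;
    # Lazy load: fetch and cache the record on the first call only.
    $self->{phenotype} //= $PHENOTYPE_DB{ $self->{_phenotype_id} };
    return $self->{phenotype};
}

package main;

my $pf = ToyFeature->new(phenotype_id => 42);
my $before     = defined $pf->{phenotype} ? 'defined' : 'undef';
my $via_method = $pf->phenotype()->{description};
print "direct hash access before the call: $before\n";
print "method call: $via_method\n";
```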
>>
>>     You could try
>>     ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
>>     as a test input set.
>>
>>     Will
>>
>>     On 16 March 2015 at 16:39, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com
>>     <mailto:guillermo.marco at sistemasgenomicos.com>> wrote:
>>
>>         Hi Will,
>>
>>         Thank you for your quick response! Very clarifying.
>>
>>         I guess the way I posted to retrieve ClinVar data is correct.
>>         With my test dataset I've only seen "is_significant" values of
>>         "1" and undef 'phenotype' values. I think I need a synthetic VCF
>>         with ClinVar-annotated variants to verify that the plugin is
>>         working.
>>
>>         I've been looking on the Ensembl website for a test dataset. I
>>         don't think you provide one, right? Correct me if I'm wrong.
>>
>>         Thanks!
>>
>>         Regards,
>>         Guillermo.
>>
>>
>>         On 16/03/15 16:16, Will McLaren wrote:
>>>         Hi Guillermo,
>>>
>>>         To get the rest of that data in the table you need to access
>>>         the additional attributes of the PhenotypeFeature object,
>>>         something like:
>>>
>>>         my $attr = $pfs->[0]->get_all_attributes;
>>>         print "$_:".$attr->{$_}."\t" for keys %$attr;
>>>         print "\n";
>>>
>>>         Regards
>>>
>>>         Will
>>>
>>>         More info: the reason these data are stored as attributes is
>>>         due to the diverse data sources and types that we import
>>>         into our phenotype schema; to create a database column and
>>>         corresponding API method for each data type (p-value, review
>>>         status, risk allele, external ID, etc.) would be
>>>         cumbersome and inefficient. To this end we provide a few
>>>         methods that shortcut the attribute approach for the most
>>>         common data types; everything else must be accessed through
>>>         the attributes method. This is a common theme across the
>>>         Ensembl API.
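The three-line snippet above prints attributes in Perl's hash order, which can change between runs. A self-contained variant that sorts the keys is sketched below; the hashref stands in for what get_all_attributes returns, and its keys and values are illustrative assumptions (loosely named after the data types listed above), not real ClinVar data.

```perl
use strict;
use warnings;

# Stand-in for the hashref returned by $pf->get_all_attributes; the keys and
# values here are illustrative only, not real ClinVar data.
my $attr = {
    external_id   => 'RCV_EXAMPLE',
    review_status => 'example status',
    risk_allele   => 'A',
};

# Join key:value pairs with tabs; sort the keys so the output order is stable.
sub format_attributes {
    my ($attributes) = @_;
    return join("\t", map { "$_:$attributes->{$_}" } sort keys %$attributes) . "\n";
}

print format_attributes($attr);
```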
>>>
>>>         On 13 March 2015 at 12:03, Guillermo Marco Puche
>>>         <guillermo.marco at sistemasgenomicos.com
>>>         <mailto:guillermo.marco at sistemasgenomicos.com>> wrote:
>>>
>>>             Hi,
>>>
>>>             I'm trying to retrieve ClinVar information with the code
>>>             example you provided.
>>>
>>>                 my $self = shift;
>>>                 my $tva = shift;
>>>                 my $vf = $tva->variation_feature;
>>>                 my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>
>>>                 foreach my $known_var(@{$vf->{existing} || []}) {
>>>                     foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>                         if ($pf->{'source'} eq "dbSNP_ClinVar"){
>>>                             print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>>>                         }
>>>                     }
>>>                 }
>>>
>>>             As you can see, I'm "filtering" the results to only
>>>             output phenotype features whose source is dbSNP_ClinVar. I
>>>             don't know exactly why, but I guess the filtering should be
>>>             done in the "fetch_all" call instead.
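Whether fetch_all_by_object_id can filter by source on its own is not settled in this thread, so this sketch filters after fetching. ToyPF and its source_name() accessor are hypothetical stand-ins; check the PhenotypeFeature documentation for your release for the real accessor name, and call it as a method rather than reading a hash key.

```perl
use strict;
use warnings;

# Toy stand-in for PhenotypeFeature objects (hypothetical class). The accessor
# name source_name() is an assumption made for illustration.
package ToyPF;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub source_name { return $_[0]->{source_name} }

package main;

# Keep only features whose source matches the requested name.
sub filter_by_source {
    my ($source, @features) = @_;
    return grep { defined $_->source_name && $_->source_name eq $source } @features;
}

my @pfs = (
    ToyPF->new(source_name => 'dbSNP_ClinVar'),
    ToyPF->new(source_name => 'Some_other_source'),
    ToyPF->new(),
);
my @clinvar = filter_by_source('dbSNP_ClinVar', @pfs);
print scalar(@clinvar), " ClinVar feature(s)\n";
```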
>>>
>>>             On the other hand I'm trying to retrieve Disease, Source
>>>             and Clinical Significance from this example table:
>>>             http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>>
>>>             I think I'm doing something wrong; I got totally lost in
>>>             PhenotypeFeature.
>>>
>>>             Regards,
>>>             Guillermo.
>>>
>>>
>>>             On 02/03/15 16:05, Will McLaren wrote:
>>>>             If you enable the --check_existing flag when you run
>>>>             the VEP, you'll be able to see any known co-located
>>>>             variants attached to the VariationFeature object in
>>>>             your plugin:
>>>>
>>>>             sub run {
>>>>               my $self = shift;
>>>>               my $tva = shift;
>>>>               my $vf = $tva->variation_feature;
>>>>
>>>>               foreach my $known_var(@{$vf->{existing} || []}) {
>>>>                  # do stuff
>>>>               }
>>>>             }
>>>>
>>>>             The $known_var is not an API object but a simple
>>>>             hashref with a number of fields; you're probably
>>>>             interested in $known_var->{clin_sig}
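A runnable sketch of the loop above, using mock data in place of a real VariationFeature: $vf->{existing} is an arrayref of plain hashrefs. The variant names and values are made up, and it is an assumption here that clin_sig holds a comma-separated string that may be missing for some co-located variants.

```perl
use strict;
use warnings;

# Mock of the structure described above; values are invented for illustration.
my $vf = {
    existing => [
        { variation_name => 'rs_example_1', clin_sig => 'pathogenic,other' },
        { variation_name => 'rs_example_2' },
    ],
};

# Collect variation_name => [clinical significance states] for known variants.
sub clin_sig_by_variant {
    my ($variation_feature) = @_;
    my %result;
    foreach my $known_var (@{ $variation_feature->{existing} || [] }) {
        next unless defined $known_var->{clin_sig};
        $result{ $known_var->{variation_name} } = [ split /,/, $known_var->{clin_sig} ];
    }
    return \%result;
}

my $sigs = clin_sig_by_variant($vf);
print "$_: @{ $sigs->{$_} }\n" for sort keys %$sigs;
```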
>>>>
>>>>             However, as I mentioned, this is the only data that is
>>>>             stored in the cache. To access the rating and the
>>>>             specific disease association, you'll need to make calls
>>>>             to the database by getting an adaptor, something like:
>>>>
>>>>             sub run {
>>>>               my $self = shift;
>>>>               my $tva = shift;
>>>>               my $vf = $tva->variation_feature;
>>>>               my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>
>>>>               foreach my $known_var(@{$vf->{existing} || []}) {
>>>>                  foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>                    # do stuff
>>>>                  }
>>>>               }
>>>>             }
>>>>
>>>>             Be aware that this will access the database, so unless
>>>>             you have a local copy please don't run this sort of
>>>>             code on genome-wide VCFs using our public DB server.
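One way to cut repeated queries when the same known variant appears many times is to cache results per variation name. This is a sketch, not something suggested in the thread: make_cached_fetcher is a hypothetical helper, and the stub fetcher stands in for a closure around $pfa->fetch_all_by_object_id. It reduces duplicate lookups but does not change the advice to use a local database copy for genome-wide runs.

```perl
use strict;
use warnings;

# Wrap a fetch function so each variation name is looked up at most once.
sub make_cached_fetcher {
    my ($fetch) = @_;
    my %cache;
    return sub {
        my ($name) = @_;
        $cache{$name} //= $fetch->($name);
        return $cache{$name};
    };
}

# Stub backend that counts how often it is actually called.
my $calls      = 0;
my $stub_fetch = sub { $calls++; return [ "feature for $_[0]" ] };
my $cached     = make_cached_fetcher($stub_fetch);

$cached->($_) for qw(rs_example rs_example rs_other);
print "backend calls: $calls\n";
```

The cache lives in memory for the whole run and is never pruned, which is fine for a test set but worth bounding on very large inputs.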
>>>>
>>>>             Regards
>>>>
>>>>             Will
>>>>
>>>>             On 2 March 2015 at 14:47, Guillermo Marco Puche
>>>>             <guillermo.marco at sistemasgenomicos.com
>>>>             <mailto:guillermo.marco at sistemasgenomicos.com>> wrote:
>>>>
>>>>                 Hi Will,
>>>>
>>>>                 Indeed I'm looking to retrieve this information
>>>>                 from VEP plugin.
>>>>
>>>>                 Regards,
>>>>                 Guillermo.
>>>>
>>>>
>>>>                 On 02/03/15 15:25, Will McLaren wrote:
>>>>>                 Hi Guillermo,
>>>>>
>>>>>                 The detailed ClinVar information is stored against
>>>>>                 PhenotypeFeature objects (each SNP/disease pairing
>>>>>                 gets its own entry in ClinVar, e.g.
>>>>>                 http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2, http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>>>                 http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/ for
>>>>>                 rs699).
>>>>>
>>>>>                 The rating (and indeed the clinical significance)
>>>>>                 is stored as an attribute on the PhenotypeFeature
>>>>>                 object; you can retrieve this with the
>>>>>                 get_all_attributes() method.
>>>>>
>>>>>                 See
>>>>>                 http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>>>                 and
>>>>>                 http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>>>                 for more info.
>>>>>
>>>>>                 Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig() is
>>>>>                 an internal method that you should not use.
>>>>>
>>>>>                 The VEP cache contains the list of clinical
>>>>>                 significance states for each variant, but neither
>>>>>                 the disease association nor the rating. If you want
>>>>>                 help getting access to this data via a plugin, let
>>>>>                 me know as it's a little more involved than the
>>>>>                 API methods above (though it is faster as no
>>>>>                 database access is required).
>>>>>
>>>>>                 Regards
>>>>>
>>>>>                 Will McLaren
>>>>>                 Ensembl Variation
>>>>>
>>>>>                 On 2 March 2015 at 14:06, Guillermo Marco Puche
>>>>>                 <guillermo.marco at sistemasgenomicos.com
>>>>>                 <mailto:guillermo.marco at sistemasgenomicos.com>> wrote:
>>>>>
>>>>>                     Dear devs,
>>>>>
>>>>>                     I'd like to retrieve ClinVar
>>>>>                     information and add it to the VEP annotation. From
>>>>>                     my understanding I should be able to retrieve
>>>>>                     "Clinical significance" and "ClinVar Rating".
>>>>>
>>>>>                     I've been looking at the Variation API, and I'm
>>>>>                     confused. I guess for significance I should
>>>>>                     use
>>>>>                     Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig()
>>>>>                     or
>>>>>                     Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>>
>>>>>                     What about ClinVar rating? Is it possible to
>>>>>                     retrieve it from API?
>>>>>
>>>>>                     Thanks!
>>>>>
>>>>>                     Regards,
>>>>>                     Guillermo.
>>>>>
>>>>>
>>>>>
>>>>>                     _______________________________________________
>>>>>                     Dev mailing list Dev at ensembl.org
>>>>>                     <mailto:Dev at ensembl.org>
>>>>>                     Posting guidelines and subscribe/unsubscribe
>>>>>                     info:
>>>>>                     http://lists.ensembl.org/mailman/listinfo/dev
>>>>>                     Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>>
>>
>

