[ensembl-dev] VEP ClinVar information

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Thu Mar 26 11:26:16 GMT 2015


I guess it's a problem with my API installation then.

If I install the latest API (79) but keep using 
variant_effect_predictor.pl from version 75, will it continue to work, or 
will I get conflicts or odd behaviour?

Thanks!

Regards,
Guillermo.


On 26/03/15 11:53, Will McLaren wrote:
> Example output (I set $Data::Dumper::Maxdepth = 1;):
>
> > echo "rs699" | perl -I ~/Git/guillermo/vep/ variant_effect_predictor.pl -plugin Clinvar -data -force -db 75
> 2015-03-26 10:51:38 - Reading input from STDIN (or maybe you forgot to specify an input file?)...
> 2015-03-26 10:51:38 - Starting...
> 2015-03-26 10:51:38 - Detected format of input file as id
> 2015-03-26 10:51:38 - Read 1 variants into buffer
> 2015-03-26 10:51:38 - Checking for existing variations
> [===================================================================================================================================================================================================] 
>  [ 100% ]
> 2015-03-26 10:51:38 - Reading transcript data from cache and/or database
> [===================================================================================================================================================================================================] 
>  [ 100% ]
> 2015-03-26 10:51:38 - Retrieved 4 transcripts (0 mem, 0 cached, 4 DB, 
> 0 duplicates)
> 2015-03-26 10:51:38 - Analyzing chromosome 1
> 2015-03-26 10:51:38 - Analyzing variants
> [===================================================================================================================================================================================================] 
>  [ 100% ]
> 2015-03-26 10:51:38 - Calculating consequences
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '2853',
>                  'name' => undef,
>                  'description' => 'HYPERTENSION, ESSENTIAL, 
> SUSCEPTIBILITY TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20384',
>                  'name' => undef,
>                  'description' => 
> 'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20369',
>                  'name' => undef,
>                  'description' => 'Preeclampsia,_susceptibility_to'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '26451',
>                  'name' => undef,
>                  'description' => 
> 'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '1',
>                  'name' => 'HGMD_MUTATION',
>                  'description' => 'Annotated by HGMD but no phenotype 
> description is publicly available'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6522',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:large_intestine'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6529',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:breast'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '2853',
>                  'name' => undef,
>                  'description' => 'HYPERTENSION, ESSENTIAL, 
> SUSCEPTIBILITY TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20384',
>                  'name' => undef,
>                  'description' => 
> 'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '20369',
>                  'name' => undef,
>                  'description' => 'Preeclampsia,_susceptibility_to'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '26451',
>                  'name' => undef,
>                  'description' => 
> 'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '1',
>                  'name' => 'HGMD_MUTATION',
>                  'description' => 'Annotated by HGMD but no phenotype 
> description is publicly available'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6522',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:large_intestine'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
>                  'adaptor' => 
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
>                  'dbID' => '6529',
>                  'name' => undef,
>                  'description' => 'COSMIC:tumour_site:breast'
>                }, 'Bio::EnsEMBL::Variation::Phenotype' );
> 2015-03-26 10:51:38 - Processed 1 total variants (1 vars/sec, 1 
> vars/sec total)
> 2015-03-26 10:51:38 - Wrote stats summary to 
> variant_effect_output.txt_summary.html
> 2015-03-26 10:51:38 - Finished!
>
> On 26 March 2015 at 10:43, Guillermo Marco Puche 
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
>     Hello Will,
>
>     I already had "check_existing" enabled in my VEP config template;
>     however, I followed your advice and updated the plugin to force it
>     on in the new() method with your code.
>     I'm still getting no output from line 64:
>
>     print Dumper($pf->phenotype());
>
>     Are you getting any output printed? As I said, I get no errors, but
>     nothing is printed either. This Data::Dumper call should print the
>     result of the phenotype() method call.
>
>     Regards,
>     Guillermo.
>
>
>
>     On 26/03/15 11:05, Will McLaren wrote:
>>     I think perhaps you haven't enabled --check_existing; this is
>>     required for $vf->{existing} to get populated.
>>
>>     You can force it on in the new() method of your plugin:
>>
>>     $self->{config}->{check_existing} = 1;
>>
>>     It then works for me on release/75 and release/79.
>>
>>     Will
>>
>>     On 25 March 2015 at 17:35, Guillermo Marco Puche
>>     <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>>         Hello Will,
>>
>>         Following your explanation, I'm now calling the phenotype()
>>         method (as you said, I was accessing the hashref directly).
>>         I'm using the input set you linked; however, my local Ensembl
>>         installation is v75.
>>
>>         This is the code of the plugin:
>>         https://github.com/guillermomarco/vep/blob/master/Clinvar.pm
>>
>>         I'm getting absolutely no output and no errors. I have no idea
>>         whether this is an issue with my database/API version or with
>>         the plugin code itself.
>>
>>         Regards,
>>         Guillermo.
>>
>>
>>
>>         On 16/03/15 17:50, Will McLaren wrote:
>>>         The "is_significant" field is an internal flag that doesn't
>>>         necessarily have the meaning you expect; it is used to
>>>         distinguish between genuine reported associations and e.g.
>>>         non-significant associations reported from genome-wide studies.
>>>
>>>         You should not see undef for phenotype; I suspect you are
>>>         accessing the hashref directly ($pf->{phenotype}) rather
>>>         than making the method call ($pf->phenotype()).
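A minimal, self-contained sketch of why the two forms can give different results. It does not use the Ensembl API: the LazyFeature class, its _phenotype_id field and the %PHENOTYPES table are hypothetical stand-ins, and the assumption being illustrated is an accessor that only fills in the hash entry the first time it is called. The ID and description are taken from the Dumper output earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the Ensembl PhenotypeFeature code.
package LazyFeature;

# Hypothetical lookup table standing in for a database query.
my %PHENOTYPES = (2853 => 'HYPERTENSION, ESSENTIAL, SUSCEPTIBILITY TO');

sub new {
    my ($class, %args) = @_;
    # Only the ID is stored at construction; 'phenotype' is not set yet.
    return bless { _phenotype_id => $args{phenotype_id} }, $class;
}

# Accessor that fills in $self->{phenotype} the first time it is called.
sub phenotype {
    my $self = shift;
    $self->{phenotype} = $PHENOTYPES{ $self->{_phenotype_id} }
        unless defined $self->{phenotype};
    return $self->{phenotype};
}

package main;

my $pf = LazyFeature->new(phenotype_id => 2853);

# Direct hash access bypasses the accessor, so nothing has been loaded.
my $direct = $pf->{phenotype};
print "direct: ", (defined $direct ? $direct : 'undef'), "\n";

# The method call triggers the load and returns the value.
my $via_method = $pf->phenotype();
print "method: $via_method\n";
```

In this sketch the direct access yields undef while the method call returns the description, which matches the symptom described above.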
>>>
>>>         You could try
>>>         ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
>>>         as a test input set.
>>>
>>>         Will
>>>
>>>         On 16 March 2015 at 16:39, Guillermo Marco Puche
>>>         <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>>             Hi Will,
>>>
>>>             Thank you for your quick response! Very clarifying.
>>>
>>>             I guess that the way to retrieve ClinVar data I posted
>>>             is correct. With my test dataset I've only seen
>>>             "is_significant" values of "1" and undef 'phenotype'
>>>             values. I think I need a synthetic VCF of variants with
>>>             ClinVar annotation to verify that the plugin is working.
>>>
>>>             I've been looking on the Ensembl website for a test
>>>             dataset. I don't think you provide one, right? Correct me
>>>             if I'm wrong.
>>>
>>>             Thanks!
>>>
>>>             Regards,
>>>             Guillermo.
>>>
>>>
>>>             On 16/03/15 16:16, Will McLaren wrote:
>>>>             Hi Guillermo,
>>>>
>>>>             To get the rest of that data in the table you need to
>>>>             access the additional attributes of the
>>>>             PhenotypeFeature object, something like:
>>>>
>>>>             my $attr = $pfs->[0]->get_all_attributes;
>>>>             print "$_:".$attr->{$_}."\t" for keys %$attr;
>>>>             print "\n";
>>>>
>>>>             Regards
>>>>
>>>>             Will
>>>>
>>>>             More info: the reason these data are stored as
>>>>             attributes is due to the diverse data sources and types
>>>>             that we import into our phenotype schema; to create a
>>>>             database column and corresponding API method for each
>>>>             data type (p-value, review status, risk allele,
>>>>             external ID etc etc) would be cumbersome and
>>>>             inefficient. To this end we provide a few methods that
>>>>             shortcut the attribute approach for the most common
>>>>             data types; everything else must be accessed through
>>>>             the attributes method. This is a common theme across
>>>>             the Ensembl API.
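A small sketch of printing such an attribute hash in a stable, readable way. It assumes, as in the snippet above, that get_all_attributes returns a hashref; the format_attributes helper is hypothetical, and the key names and values below are illustrative only, not a list of what the database actually stores.

```perl
use strict;
use warnings;

# Format an attribute hashref as one tab-separated "key:value" line.
# Keys are sorted so the column order is stable between runs;
# undefined values are printed as an empty string instead of warning.
sub format_attributes {
    my ($attr) = @_;
    return join("\t",
        map { "$_:" . (defined $attr->{$_} ? $attr->{$_} : '') }
        sort keys %$attr
    );
}

# Hypothetical attribute hash; in a plugin this would come from
# $pf->get_all_attributes. Key names and values are illustrative only.
my $attr = {
    external_id   => 'RCV000019691',
    review_status => 'example_status',
    risk_allele   => undef,
};

my $line = format_attributes($attr);
print "$line\n";
```

Sorting the keys is a design choice for reproducible output; plain "keys %$attr" returns them in no guaranteed order.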
>>>>
>>>>             On 13 March 2015 at 12:03, Guillermo Marco Puche
>>>>             <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>>                 Hi,
>>>>
>>>>                 I'm trying to retrieve ClinVar information with the
>>>>                 code example you provided.
>>>>
>>>>                     my $self = shift;
>>>>                     my $tva = shift;
>>>>                     my $vf = $tva->variation_feature;
>>>>                     my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>
>>>>                     foreach my $known_var(@{$vf->{existing} || []}) {
>>>>                         foreach my $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>                             if ($pf->{'source'} eq "dbSNP_ClinVar"){
>>>>                                 print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>>>>                             }
>>>>                         }
>>>>                     }
>>>>
>>>>                 As you can see, I'm filtering the results to only
>>>>                 output phenotype features whose source is
>>>>                 dbSNP_ClinVar. I'm not sure how, but I guess the
>>>>                 filtering should be done in the "fetch_all" call itself.
>>>>                 On the other hand I'm trying to retrieve Disease,
>>>>                 Source and Clinical Significance from this example
>>>>                 table:
>>>>                 http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>>>
>>>>                 I think I'm doing something wrong; I got totally
>>>>                 lost in PhenotypeFeature.
>>>>
>>>>                 Regards,
>>>>                 Guillermo.
>>>>
>>>>
>>>>                 On 02/03/15 16:05, Will McLaren wrote:
>>>>>                 If you enable the --check_existing flag when you
>>>>>                 run the VEP, you'll be able to see any known
>>>>>                 co-located variants attached to the
>>>>>                 VariationFeature object in your plugin:
>>>>>
>>>>>                 sub run {
>>>>>                   my $self = shift;
>>>>>                   my $tva = shift;
>>>>>                   my $vf = $tva->variation_feature;
>>>>>
>>>>>                   foreach my $known_var(@{$vf->{existing} || []}) {
>>>>>                      # do stuff
>>>>>                   }
>>>>>                 }
>>>>>
>>>>>                 The $known_var is not an API object but a simple
>>>>>                 hashref with a number of fields; you're probably
>>>>>                 interested in $known_var->{clin_sig}
>>>>>
>>>>>                 However, as I mentioned, this is the only data
>>>>>                 that is stored in the cache. To access the rating
>>>>>                 and the specific disease association, you'll need
>>>>>                 to make calls to the database by getting an
>>>>>                 adaptor, something like:
>>>>>
>>>>>                 sub run {
>>>>>                   my $self = shift;
>>>>>                   my $tva = shift;
>>>>>                   my $vf = $tva->variation_feature;
>>>>>                   my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>>
>>>>>                   foreach my $known_var(@{$vf->{existing} || []}) {
>>>>>                     foreach my $pf(@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>>                       # do stuff
>>>>>                     }
>>>>>                   }
>>>>>                 }
>>>>>
>>>>>                 Be aware that this will access the database, so
>>>>>                 unless you have a local copy please don't run this
>>>>>                 sort of code on genome-wide VCFs using our public
>>>>>                 DB server.
>>>>>
>>>>>                 Regards
>>>>>
>>>>>                 Will
>>>>>
>>>>>                 On 2 March 2015 at 14:47, Guillermo Marco Puche
>>>>>                 <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>>                     Hi Will,
>>>>>
>>>>>                     Indeed, I'm looking to retrieve this
>>>>>                     information from a VEP plugin.
>>>>>
>>>>>                     Regards,
>>>>>                     Guillermo.
>>>>>
>>>>>
>>>>>                     On 02/03/15 15:25, Will McLaren wrote:
>>>>>>                     Hi Guillermo,
>>>>>>
>>>>>>                     The detailed ClinVar information is stored
>>>>>>                     against PhenotypeFeature objects (each
>>>>>>                     SNP/disease pairing gets its own entry in
>>>>>>                     ClinVar, e.g.
>>>>>>                     http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>>>>                     http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>>>>                     http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/
>>>>>>                     for rs699).
>>>>>>
>>>>>>                     The rating (and indeed the clinical
>>>>>>                     significance) is stored as an attribute on
>>>>>>                     the PhenotypeFeature object; you can retrieve
>>>>>>                     this with the get_all_attributes() method.
>>>>>>
>>>>>>                     See
>>>>>>                     http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>>>>                     and
>>>>>>                     http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>>>>                     for more info.
>>>>>>
>>>>>>                     Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig()
>>>>>>                     is an internal method that you should not use.
>>>>>>
>>>>>>                     The VEP cache contains the list of clinical
>>>>>>                     significance states for each variant, but
>>>>>>                     neither the disease association nor the
>>>>>>                     rating. If you want help getting access to
>>>>>>                     this data via a plugin, let me know as it's a
>>>>>>                     little more involved than the API methods
>>>>>>                     above (though it is faster as no database
>>>>>>                     access is required).
>>>>>>
>>>>>>                     Regards
>>>>>>
>>>>>>                     Will McLaren
>>>>>>                     Ensembl Variation
>>>>>>
>>>>>>                     On 2 March 2015 at 14:06, Guillermo Marco
>>>>>>                     Puche <guillermo.marco at sistemasgenomicos.com>
>>>>>>                     wrote:
>>>>>>
>>>>>>                         Dear devs,
>>>>>>
>>>>>>                         I'm looking to retrieve ClinVar
>>>>>>                         information and add it to VEP annotation.
>>>>>>                         From my understanding I should be able to
>>>>>>                         retrieve "Clinical significance" and
>>>>>>                         "ClinVar Rating".
>>>>>>
>>>>>>                         I've been looking at the Variation API, and
>>>>>>                         I'm confused. I guess for significance I
>>>>>>                         should use
>>>>>>                         Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig()
>>>>>>                         or
>>>>>>                         Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>>>
>>>>>>                         What about ClinVar rating? Is it possible
>>>>>>                         to retrieve it from API?
>>>>>>
>>>>>>                         Thanks!
>>>>>>
>>>>>>                         Regards,
>>>>>>                         Guillermo.
>>>>>>
>>>>>>
>>>>>>
>>>>>>                         _______________________________________________
>>>>>>                         Dev mailing list Dev at ensembl.org
>>>>>>                         Posting guidelines and
>>>>>>                         subscribe/unsubscribe info:
>>>>>>                         http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>                         Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>>>>