[ensembl-dev] VEP ClinVar information
Guillermo Marco Puche
guillermo.marco at sistemasgenomicos.com
Thu Mar 26 11:26:16 GMT 2015
I guess it's a problem with my API installation then.
If I install the latest API (79) and continue to use
variant_effect_predictor.pl from version 75, will it continue working or
will I get conflicts or weird behaviour?
Thanks!
Regards,
Guillermo.
On 26/03/15 11:53, Will McLaren wrote:
> Example output (I set $Data::Dumper::Maxdepth = 1;):
>
> > echo "rs699" | perl -I ~/Git/guillermo/vep/
> variant_effect_predictor.pl
> -plugin Clinvar -data -force -db 75
> 2015-03-26 10:51:38 - Reading input from STDIN (or maybe you forgot to
> specify an input file?)...
> 2015-03-26 10:51:38 - Starting...
> 2015-03-26 10:51:38 - Detected format of input file as id
> 2015-03-26 10:51:38 - Read 1 variants into buffer
> 2015-03-26 10:51:38 - Checking for existing variations
> [===================================================================================================================================================================================================]
> [ 100% ]
> 2015-03-26 10:51:38 - Reading transcript data from cache and/or database
> [===================================================================================================================================================================================================]
> [ 100% ]
> 2015-03-26 10:51:38 - Retrieved 4 transcripts (0 mem, 0 cached, 4 DB,
> 0 duplicates)
> 2015-03-26 10:51:38 - Analyzing chromosome 1
> 2015-03-26 10:51:38 - Analyzing variants
> [===================================================================================================================================================================================================]
> [ 100% ]
> 2015-03-26 10:51:38 - Calculating consequences
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '2853',
> 'name' => undef,
> 'description' => 'HYPERTENSION, ESSENTIAL,
> SUSCEPTIBILITY TO'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '20384',
> 'name' => undef,
> 'description' =>
> 'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '20369',
> 'name' => undef,
> 'description' => 'Preeclampsia,_susceptibility_to'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '26451',
> 'name' => undef,
> 'description' =>
> 'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '1',
> 'name' => 'HGMD_MUTATION',
> 'description' => 'Annotated by HGMD but no phenotype
> description is publicly available'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '6522',
> 'name' => undef,
> 'description' => 'COSMIC:tumour_site:large_intestine'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '6529',
> 'name' => undef,
> 'description' => 'COSMIC:tumour_site:breast'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '2853',
> 'name' => undef,
> 'description' => 'HYPERTENSION, ESSENTIAL,
> SUSCEPTIBILITY TO'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '20384',
> 'name' => undef,
> 'description' =>
> 'Susceptibility_to_progression_to_renal_failure_in_IgA_nephropathy'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '20369',
> 'name' => undef,
> 'description' => 'Preeclampsia,_susceptibility_to'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '26451',
> 'name' => undef,
> 'description' =>
> 'HYPERTENSION,_ESSENTIAL,_SUSCEPTIBILITY_TO'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '1',
> 'name' => 'HGMD_MUTATION',
> 'description' => 'Annotated by HGMD but no phenotype
> description is publicly available'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '6522',
> 'name' => undef,
> 'description' => 'COSMIC:tumour_site:large_intestine'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> $VAR1 = bless( {
> 'adaptor' =>
> 'Bio::EnsEMBL::Variation::DBSQL::PhenotypeAdaptor=HASH(0x43c39a8)',
> 'dbID' => '6529',
> 'name' => undef,
> 'description' => 'COSMIC:tumour_site:breast'
> }, 'Bio::EnsEMBL::Variation::Phenotype' );
> 2015-03-26 10:51:38 - Processed 1 total variants (1 vars/sec, 1
> vars/sec total)
> 2015-03-26 10:51:38 - Wrote stats summary to
> variant_effect_output.txt_summary.html
> 2015-03-26 10:51:38 - Finished!
>
> On 26 March 2015 at 10:43, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello Will,
>
> I had already enabled "check_existing" in my VEP config template;
> however, I followed your advice and updated the code to force it on
> in the new() method.
> I'm still getting no output from line 64:
>
> print Dumper($pf->phenotype());
>
> Are you getting any output printed? As I said, I get no errors, but
> nothing is printed either. This Data::Dumper call should be printing
> the result of the phenotype() method call.
>
> Regards,
> Guillermo.
>
>
>
> On 26/03/15 11:05, Will McLaren wrote:
>> I think perhaps you haven't enabled --check_existing; this is
>> required for $vf->{existing} to get populated.
>>
>> You can force it on in the new() method of your plugin:
>>
>> $self->{config}->{check_existing} = 1;
>>
>> It then works for me on release/75 and release/79.
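A minimal sketch of forcing the option on in the plugin constructor. MockBasePlugin and ClinvarSketch are hypothetical stand-ins so the example runs without an Ensembl installation; a real plugin would inherit from the VEP plugin base class instead, and only the check_existing assignment comes from the message above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real VEP plugin base class, so the sketch
# runs without an Ensembl installation.
package MockBasePlugin;

sub new {
    my ($class, $config) = @_;
    return bless { config => $config }, $class;
}

package ClinvarSketch;
our @ISA = ('MockBasePlugin');

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);

    # Force the option on regardless of what the caller's config says.
    $self->{config}->{check_existing} = 1;

    return $self;
}

package main;

# Even when the incoming config has the option off, the plugin turns it on.
my $plugin = ClinvarSketch->new({ check_existing => 0 });
print "check_existing = $plugin->{config}->{check_existing}\n";
```

Setting the flag in new() means the plugin no longer depends on the user remembering to pass --check_existing.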
>>
>> Will
>>
>> On 25 March 2015 at 17:35, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hello Will,
>>
>> With your explanations I'm trying to call phenotype() (as you
>> said, I was accessing the hashref directly).
>> I'm using the input set you linked. However, my local Ensembl
>> installation is v75.
>>
>> This is the code of the plugin:
>> https://github.com/guillermomarco/vep/blob/master/Clinvar.pm
>>
>> I'm getting absolutely no info or errors. I have no idea whether
>> this is an issue with my database/API version or with the
>> plugin code itself.
>>
>> Regards,
>> Guillermo.
>>
>>
>>
>> On 16/03/15 17:50, Will McLaren wrote:
>>> The "is_significant" field is an internal flag that doesn't
>>> necessarily have the meaning you expect; it is used to
>>> distinguish between genuine reported associations and e.g.
>>> non-significant associations reported from genome-wide studies.
>>>
>>> You should not see undef for phenotype; I suspect you are
>>> accessing the hashref directly ($pf->{phenotype}) rather
>>> than making the method call ($pf->phenotype()).
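A minimal sketch of why direct hash access can give undef while the method call works, assuming the object loads its phenotype lazily. MockPhenotypeFeature is a hypothetical mock written for this illustration, not the real Bio::EnsEMBL::Variation::PhenotypeFeature, and the _phenotype_id field name is an assumption.

```perl
use strict;
use warnings;

# Hypothetical mock of a lazily loaded object; not the real Ensembl class.
package MockPhenotypeFeature;

sub new {
    my ($class, %args) = @_;
    # Only the ID is stored up front; the phenotype itself is not loaded yet.
    return bless { _phenotype_id => $args{phenotype_id} }, $class;
}

sub phenotype {
    my $self = shift;
    # Load on first use (a real API would query the database here).
    $self->{phenotype} //= { description => "phenotype #$self->{_phenotype_id}" };
    return $self->{phenotype};
}

package main;

my $pf = MockPhenotypeFeature->new(phenotype_id => 2853);

# Direct hash access sees only what has been loaded so far.
print defined $pf->{phenotype} ? "hash: defined\n" : "hash: undef\n";

# The method call triggers the load and returns the object.
print "method: ", $pf->phenotype()->{description}, "\n";
```

Going through the accessor keeps the plugin independent of how and when the object fills in its internal fields.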
>>>
>>> You could try
>>> ftp://ftp.ensembl.org/pub/release-79/variation/vcf/homo_sapiens/Homo_sapiens_clinically_associated.vcf.gz
>>> as a test input set.
>>>
>>> Will
>>>
>>> On 16 March 2015 at 16:39, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hi Will,
>>>
>>> Thank you for your quick response! Very helpful.
>>>
>>> I guess that the way I posted to retrieve ClinVar data
>>> is correct. With my test dataset I've only seen
>>> "is_significant" values of "1" and undef 'phenotype'
>>> values. I think I need a synthetic VCF with ClinVar-annotated
>>> variants to verify that the plugin is working.
>>>
>>> I've been looking on the Ensembl website for a test dataset.
>>> I think you don't provide one, right? Correct me if I'm
>>> wrong.
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Guillermo.
>>>
>>>
>>> On 16/03/15 16:16, Will McLaren wrote:
>>>> Hi Guillermo,
>>>>
>>>> To get the rest of that data in the table you need to
>>>> access the additional attributes of the
>>>> PhenotypeFeature object, something like:
>>>>
>>>> my $attr = $pfs->[0]->get_all_attributes;
>>>> print "$_:".$attr->{$_}."\t" for keys %$attr;
>>>> print "\n";
>>>>
>>>> Regards
>>>>
>>>> Will
>>>>
>>>> More info: the reason these data are stored as
>>>> attributes is due to the diverse data sources and types
>>>> that we import into our phenotype schema; to create a
>>>> database column and corresponding API method for each
>>>> data type (p-value, review status, risk allele,
>>>> external ID etc etc) would be cumbersome and
>>>> inefficient. To this end we provide a few methods that
>>>> shortcut the attribute approach for the most common
>>>> data types; everything else must be accessed through
>>>> the attributes method. This is a common theme across
>>>> the Ensembl API.
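The attribute design described above can be sketched roughly as follows. MockFeature is a hypothetical class, and the attribute names and values are made up for illustration; only the idea of a generic attributes hashref plus a few shortcut methods comes from the message.

```perl
use strict;
use warnings;

# Hypothetical mock: generic key/value attributes plus one shortcut method.
package MockFeature;

sub new {
    my ($class, %attribs) = @_;
    return bless { attribs => {%attribs} }, $class;
}

# Generic accessor: returns a hashref holding every attribute.
sub get_all_attributes { return $_[0]->{attribs} }

# Shortcut for one commonly used attribute.
sub p_value { return $_[0]->{attribs}->{p_value} }

package main;

# Attribute names and values below are invented for this example.
my $pf = MockFeature->new(
    p_value       => '1e-8',
    review_status => 'example',
    external_id   => 'EX0001',
);

# Print every attribute as key:value, sorted so the order is stable.
my $attr = $pf->get_all_attributes;
print join("\t", map { sprintf('%s:%s', $_, $attr->{$_}) } sort keys %$attr), "\n";

# The shortcut reads the same underlying data.
print "p_value via shortcut: ", $pf->p_value, "\n";
```

Sorting the keys is a small addition to the loop quoted above; Perl does not guarantee hash key order, so unsorted output can differ between runs.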
>>>>
>>>> On 13 March 2015 at 12:03, Guillermo Marco Puche
>>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to retrieve ClinVar information with the
>>>> code example you provided.
>>>>
>>>> my $self = shift;
>>>> my $tva = shift;
>>>> my $vf = $tva->variation_feature;
>>>> my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>
>>>> foreach my $known_var (@{$vf->{existing} || []}) {
>>>>   foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>     if ($pf->{'source'} eq "dbSNP_ClinVar") {
>>>>       print "$pf->{'source'}\t$pf->{'external_id'}\t$pf->{'is_significant'}\t$pf->{'phenotype'}\n";
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> As you can see, I'm "filtering" the results to only
>>>> output a phenotype feature when its source is
>>>> dbSNP_ClinVar. I'm not sure, but I guess the
>>>> filtering should be done in the "fetch_all" call.
>>>>
>>>> On the other hand, I'm trying to retrieve Disease,
>>>> Source and Clinical Significance from this example
>>>> table:
>>>> http://www.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=8:19955518-19956518;v=rs268;vdb=variation;vf=266
>>>>
>>>> I think I'm doing something wrong; I got totally
>>>> lost in PhenotypeFeature.
>>>>
>>>> Regards,
>>>> Guillermo.
>>>>
>>>>
>>>> On 02/03/15 16:05, Will McLaren wrote:
>>>>> If you enable the --check_existing flag when you
>>>>> run the VEP, you'll be able to see any known
>>>>> co-located variants attached to the
>>>>> VariationFeature object in your plugin:
>>>>>
>>>>> sub run {
>>>>>   my $self = shift;
>>>>>   my $tva = shift;
>>>>>   my $vf = $tva->variation_feature;
>>>>>
>>>>>   foreach my $known_var (@{$vf->{existing} || []}) {
>>>>>     # do stuff
>>>>>   }
>>>>> }
>>>>>
>>>>> The $known_var is not an API object but a simple
>>>>> hashref with a number of fields; you're probably
>>>>> interested in $known_var->{clin_sig}
>>>>>
>>>>> However, as I mentioned, this is the only data
>>>>> that is stored in the cache. To access the rating
>>>>> and the specific disease association, you'll need
>>>>> to make calls to the database by getting an
>>>>> adaptor, something like:
>>>>>
>>>>> sub run {
>>>>>   my $self = shift;
>>>>>   my $tva = shift;
>>>>>   my $vf = $tva->variation_feature;
>>>>>   my $pfa = $self->{config}->{reg}->get_adaptor('human','variation','phenotypefeature');
>>>>>
>>>>>   foreach my $known_var (@{$vf->{existing} || []}) {
>>>>>     foreach my $pf (@{$pfa->fetch_all_by_object_id($known_var->{variation_name})}) {
>>>>>       # do stuff
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> Be aware that this will access the database, so
>>>>> unless you have a local copy please don't run this
>>>>> sort of code on genome-wide VCFs using our public
>>>>> DB server.
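One way to reduce repeated lookups is to cache results per variant name, since the same co-located variant can reach run() more than once. This is a minimal sketch under that assumption: slow_fetch and cached_fetch are hypothetical helpers, with slow_fetch standing in for a database-backed call such as fetch_all_by_object_id. Caching only trims duplicate queries; it does not make genome-wide runs against the public server appropriate.

```perl
use strict;
use warnings;

my %cache;          # variant name -> arrayref of results
my $lookups = 0;    # counts how often the expensive call actually runs

# Hypothetical stand-in for a database-backed fetch.
sub slow_fetch {
    my ($name) = @_;
    $lookups++;
    return ["feature for $name"];
}

# Return cached results when the same name has been seen before.
sub cached_fetch {
    my ($name) = @_;
    $cache{$name} //= slow_fetch($name);
    return $cache{$name};
}

# Four requests for the same variant trigger a single underlying fetch.
cached_fetch('rs699') for 1 .. 4;
print "lookups performed: $lookups\n";
```

In a plugin the cache could live on the plugin object instead of a file-level hash, so it is built once per run.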
>>>>>
>>>>> Regards
>>>>>
>>>>> Will
>>>>>
>>>>> On 2 March 2015 at 14:47, Guillermo Marco Puche
>>>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>>>
>>>>> Hi Will,
>>>>>
>>>>> Indeed, I'm looking to retrieve this
>>>>> information from a VEP plugin.
>>>>>
>>>>> Regards,
>>>>> Guillermo.
>>>>>
>>>>>
>>>>> On 02/03/15 15:25, Will McLaren wrote:
>>>>>> Hi Guillermo,
>>>>>>
>>>>>> The detailed ClinVar information is stored
>>>>>> against PhenotypeFeature objects (each
>>>>>> SNP/disease pairing gets its own entry in
>>>>>> ClinVar, e.g.
>>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019691.2,
>>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019692.2/,
>>>>>> http://www.ncbi.nlm.nih.gov/clinvar/RCV000019693.2/
>>>>>> for rs699).
>>>>>>
>>>>>> The rating (and indeed the clinical
>>>>>> significance) is stored as an attribute on
>>>>>> the PhenotypeFeature object; you can retrieve
>>>>>> this with the get_all_attributes() method.
>>>>>>
>>>>>> See
>>>>>> http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1PhenotypeFeature.html
>>>>>> and
>>>>>> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#phenotype
>>>>>> for more info.
>>>>>>
>>>>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig()
>>>>>> is an internal method that you should not use.
>>>>>>
>>>>>> The VEP cache contains the list of clinical
>>>>>> significance states for each variant, but
>>>>>> neither the disease association nor the
>>>>>> rating. If you want help getting access to
>>>>>> this data via a plugin, let me know as it's a
>>>>>> little more involved than the API methods
>>>>>> above (though it is faster as no database
>>>>>> access is required).
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Will McLaren
>>>>>> Ensembl Variation
>>>>>>
>>>>>> On 2 March 2015 at 14:06, Guillermo Marco
>>>>>> Puche <guillermo.marco at sistemasgenomicos.com>
>>>>>> wrote:
>>>>>>
>>>>>> Dear devs,
>>>>>>
>>>>>> I'm looking to retrieve ClinVar
>>>>>> information and add it to VEP annotation.
>>>>>> From my understanding I should be able to
>>>>>> retrieve "Clinical significance" and
>>>>>> "ClinVar Rating".
>>>>>>
>>>>>> I've been looking at the Variation API, and
>>>>>> I'm confused. I guess for significance I
>>>>>> should use
>>>>>> Bio::EnsEMBL::Variation::Utils::VEP::get_clin_sig()
>>>>>> or
>>>>>> Bio::EnsEMBL::Variation::VariationFeature::get_all_clinical_significance_states().
>>>>>>
>>>>>> What about the ClinVar rating? Is it possible
>>>>>> to retrieve it from the API?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Regards,
>>>>>> Guillermo.
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list Dev at ensembl.org
>>>>>> <mailto:Dev at ensembl.org>
>>>>>> Posting guidelines and
>>>>>> subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>>>>