[ensembl-dev] trouble fetching all phenotypic variants
Nicole Washington
nlwashington at lbl.gov
Fri Aug 30 15:56:30 BST 2013
thanks so much, sarah. i will update and give it a try in a bit (though you'll probably be asleep).
On Aug 30, 2013, at 7:45 AM, Sarah Hunt wrote:
> Hi Nicole,
>
> Sorry about the delay in getting back to you.
>
> You are right there was a problem with the iteration. I've just
> checked a fix to VariationAdaptor.pm
> into branch 72 CVS. Can you update your version of this file?
>
>
> In my test script I was ending the loop using :
>
> while ($fetched < $limit ) {
>
> my $var = $it->next();
> last unless defined $var;
> print $var->name() . "\n";
>
>
> Best wishes,
>
>
> Sarah
>
> On Fri, Aug 30, 2013 at 3:28 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>> Hi, I just wanted to check back to see if there was anything possibly erroneous with the iterator on the server side?
>>
>> I am wondering if the problem is that i never reach the limit so therefore it just goes into an infinite loop? How in the world do I break out of the loop if i'm not allowed to use $it->has_next() in the while statement?
>>
>> On Aug 28, 2013, at 12:26 PM, Nicole Washington wrote:
>>
>>> Ah, getting better, but....
>>>
>>> it seems to hang after fetching 792,554 features, never breaking out of the loop and never printing the last print statement.
>>>
>>> here's my revision based on your suggestion:
>>>
>>> while ($fetched < $limit) {
>>> $it_counter++;
>>> next unless $it->has_next();
>>> my $var = $it->next();
>>> print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>> print STDOUT "." if ($fetched % 100 == 0);
>>> print STDOUT "$it_counter($fetched)." if ($it_counter % 1000 == 0);
>>> $fetched++;
>>> }
>>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>>
>>>
>>> my last bit of output looked like:
>>>
>>> 794000(790554)...........795000(791554)...........796000(792554).......
>>>
>>> and then just hung for a long time.
>>>
>>> is that expected? the above is completed in under an hour. should i let it run for several hours?
>>>
>>> nicole
>>>
>>>
>>> On Aug 28, 2013, at 3:08 AM, Sarah Hunt wrote:
>>>
>>>> Hi Nicole,
>>>>
>>>> It looks as if the problem is in the line :
>>>>
>>>> while ($fetched < $limit && $it->has_next()) {
>>>>
>>>> The iterator logic finds the first and last relevant variant ids and
>>>> divides the range into equal chunks to return. If one of these chunks
>>>> has no variants, $it->has_next() returns false.
>>>>
>>>> Try replacing the line with this::
>>>>
>>>> while ($fetched < $limit ){
>>>> next unless $it->has_next();
>>>>
>>>>
>>>> Best wishes,
>>>>
>>>> Sarah
>>>>
>>>> On Tue, Aug 27, 2013 at 5:56 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>>>>> hi,
>>>>>
>>>>> i'm having a bit of trouble fetching all of the phenotypic variants.
>>>>>
>>>>> i've written a script, following the example in your documentation, to fetch
>>>>> all the variants in 'ph_variants' set using an iterator from VariantSets.
>>>>> but i don't think it's enough variants...i only seem to fetch ~5068.
>>>>>
>>>>> in particular, i've tried looking for some specific rs ids in my output that
>>>>> have phenotypes, and they don't show up. for example, take rs10757274.
>>>>> http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
>>>>> if i look at my output for rs10757274, it doesn't show up. however, if i
>>>>> specifically query for that variant (using a variant adaptor directly), i
>>>>> can fetch it and it's VariantFeatures, and it says it is in the ph_variants
>>>>> set.
>>>>>
>>>>> not sure i understand this discrepancy. perhaps the recursion (fetching of
>>>>> the subfeatures) in the API isn't working?
>>>>>
>>>>> here's a simple version of my code that just counts the variations found.
>>>>> it ought to print a statement if it finds the rsid, but it never prints it.
>>>>> i also include the code that fetches that rsid directly, where it shows all
>>>>> the variationsets it's found in.
>>>>>
>>>>> note, i'm using ensembl v72.
>>>>>
>>>>> any ideas?
>>>>>
>>>>> Nicole
>>>>>
>>>>>
>>>>> ########## OUTPUT ################
>>>>>
>>>>> Fetching ensembl variants.
>>>>> Started: Tue Aug 27 09:30:59 2013
>>>>> Initializing...
>>>>> Connecting to Ensembl Variation DB...0 but trueDone.
>>>>> ...................................................There are 5068 variants
>>>>> in the 'ph_variants'.
>>>>>
>>>>> rs10757274 found in the following sets:
>>>>> 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
>>>>>
>>>>>
>>>>> ########## CODE FOR VARIANTSETS ################
>>>>>
>>>>> $registry->load_registry_from_db(
>>>>> -host => 'useastdb.ensembl.org',
>>>>> #-host => 'ensembldb.ensembl.org', #this is ~4x slower
>>>>> -user => 'anonymous',
>>>>> -port => '5306'
>>>>> );
>>>>> print STDOUT "Done.\n";
>>>>>
>>>>> my $species = 'homo_sapiens';
>>>>>
>>>>> my $vs_adaptor =
>>>>> $registry->get_adaptor($species,'variation','variationset');
>>>>>
>>>>> my $variant_set = "ph_variants"; #variants from all sources
>>>>> my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
>>>>>
>>>>> my $limit = 999999; #
>>>>> my $fetched = 0;
>>>>> my $it = $vs->get_Variation_Iterator();
>>>>>
>>>>> #for testing, only print the first $limit
>>>>> while ($fetched < $limit && $it->has_next()) {
>>>>> my $var = $it->next();
>>>>> print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>>>> print STDOUT "." if ($fetched % 100 == 0);
>>>>> $fetched++;
>>>>> }
>>>>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>>>>
>>>>> ########## CODE FOR VARIANT ################
>>>>>
>>>>> my $rsid = 'rs10757274';
>>>>> my $variation = $variation_adaptor->fetch_by_name($rsid);
>>>>>
>>>>> my @sets = ();
>>>>> for my $vf (@{$variation->get_all_VariationFeatures()}) {
>>>>> for my $vs (@{$vf->get_all_VariationSets()}) {
>>>>> push(@sets,$vs->short_name());
>>>>> }
>>>>> }
>>>>> print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
More information about the Dev
mailing list