[ensembl-dev] trouble fetching all phenotypic variants

Nicole Washington nlwashington at lbl.gov
Fri Aug 30 15:28:17 BST 2013


Hi, I just wanted to check back to see if there was anything possibly erroneous with the iterator on the server side?

I am wondering if the problem is that i never reach the limit so therefore it just goes into an infinite loop?  How in the world do I break out of the loop if i'm not allowed to use $it->has_next() in the while statement?

On Aug 28, 2013, at 12:26 PM, Nicole Washington wrote:

> Ah, getting better, but....
> 
> it seems to hang after fetching 792,554 features, never breaking out of the loop and never printing the last print statement. 
> 
> here's my revision based on your suggestion:
> 
> while ($fetched < $limit) {
>  $it_counter++;
>  next unless $it->has_next();
>  my $var = $it->next();
>  print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>  print STDOUT "." if ($fetched % 100 == 0);
>  print STDOUT "$it_counter($fetched)." if ($it_counter % 1000 == 0);
>  $fetched++;
> }
> print "There are $fetched variants in the \'$variant_set\'.\n";
> 
> 
> my last bit of output looked like:
> 
> 794000(790554)...........795000(791554)...........796000(792554).......
> 
> and then just hung for a long time.
> 
> is that expected?  the above is completed in under an hour.  should i let it run for several hours?
> 
> nicole
> 
> 
> On Aug 28, 2013, at 3:08 AM, Sarah Hunt wrote:
> 
>> Hi Nicole,
>> 
>> It looks as if the problem is in the line :
>> 
>> while ($fetched < $limit  && $it->has_next()) {
>> 
>> The iterator logic finds the first and last relevant variant ids and
>> divides the range into equal chunks to return. If one of these chunks
>> has no variants, $it->has_next() returns false.
>> 
>> Try replacing the line with this::
>> 
>> while ($fetched < $limit ){
>>   next unless $it->has_next();
>> 
>> 
>> Best wishes,
>> 
>> Sarah
>> 
>> On Tue, Aug 27, 2013 at 5:56 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>>> hi,
>>> 
>>> i'm having a bit of trouble fetching all of the phenotypic variants.
>>> 
>>> i've written a script, following the example in your documentation, to fetch
>>> all the variants in 'ph_variants' set using an iterator from VariantSets.
>>> but i don't think it's enough variants...i only seem to fetch ~5068.
>>> 
>>> in particular, i've tried looking for some specific rs ids in my output that
>>> have phenotypes, and they don't show up.  for example, take rs10757274.
>>> http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
>>> if i look at my output for rs10757274, it doesn't show up.  however, if i
>>> specifically query for that variant (using a variant adaptor directly), i
>>> can fetch it and it's VariantFeatures, and it says it is in the ph_variants
>>> set.
>>> 
>>> not sure i understand this discrepancy.  perhaps the recursion (fetching of
>>> the subfeatures) in the API isn't working?
>>> 
>>> here's a simple version of my code that just counts the variations found.
>>> it ought to print a statement if it finds the rsid, but it never prints it.
>>> i also include the code that fetches that rsid directly, where it shows all
>>> the variationsets it's found in.
>>> 
>>> note, i'm using ensembl v72.
>>> 
>>> any ideas?
>>> 
>>> Nicole
>>> 
>>> 
>>> ##########  OUTPUT  ################
>>> 
>>> Fetching ensembl variants.
>>> Started: Tue Aug 27 09:30:59 2013
>>> Initializing...
>>> Connecting to Ensembl Variation DB...0 but trueDone.
>>> ...................................................There are 5068 variants
>>> in the 'ph_variants'.
>>> 
>>> rs10757274 found in the following sets:
>>> 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
>>> 
>>> 
>>> ##########  CODE FOR VARIANTSETS  ################
>>> 
>>> $registry->load_registry_from_db(
>>>       -host => 'useastdb.ensembl.org',
>>> #-host => 'ensembldb.ensembl.org', #this is ~4x slower
>>>       -user => 'anonymous',
>>>       -port => '5306'
>>>       );
>>> print STDOUT "Done.\n";
>>> 
>>> my $species = 'homo_sapiens';
>>> 
>>> my $vs_adaptor =
>>> $registry->get_adaptor($species,'variation','variationset');
>>> 
>>> my $variant_set = "ph_variants";  #variants from all sources
>>> my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
>>> 
>>> my $limit = 999999;  #
>>> my $fetched = 0;
>>> my $it = $vs->get_Variation_Iterator();
>>> 
>>> #for testing, only print the first $limit
>>> while ($fetched < $limit && $it->has_next()) {
>>> my $var = $it->next();
>>> print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>> print STDOUT "." if ($fetched % 100 == 0);
>>> $fetched++;
>>> }
>>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>> 
>>> ##########  CODE FOR VARIANT ################
>>> 
>>> my $rsid = 'rs10757274';
>>> my $variation = $variation_adaptor->fetch_by_name($rsid);
>>> 
>>> my @sets = ();
>>> for my $vf (@{$variation->get_all_VariationFeatures()}) {
>>> for my $vs (@{$vf->get_all_VariationSets()}) {
>>>   push(@sets,$vs->short_name());
>>> }
>>> }
>>> print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 





More information about the Dev mailing list