[ensembl-dev] trouble fetching all phenotypic variants

Nicole Washington nlwashington at lbl.gov
Fri Aug 30 15:56:30 BST 2013


thanks so much, sarah.  i will update and give it a try in a bit (though you'll probably be asleep).


On Aug 30, 2013, at 7:45 AM, Sarah Hunt wrote:

> Hi Nicole,
> 
> Sorry about the delay in getting back to you.
> 
> You are right there was a problem with the iteration. I've just
> checked a fix to VariationAdaptor.pm
> into branch 72 CVS. Can you update your version of this file?
> 
> 
> In my test script I was ending the loop using :
> 
> while ($fetched < $limit ) {
> 
>    my $var = $it->next();
>    last unless defined $var;
>  print $var->name() . "\n";
> 
> 
> Best wishes,
> 
> 
> Sarah
> 
> On Fri, Aug 30, 2013 at 3:28 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>> Hi, I just wanted to check back to see if there was anything possibly erroneous with the iterator on the server side?
>> 
>> I am wondering if the problem is that i never reach the limit so therefore it just goes into an infinite loop?  How in the world do I break out of the loop if i'm not allowed to use $it->has_next() in the while statement?
>> 
>> On Aug 28, 2013, at 12:26 PM, Nicole Washington wrote:
>> 
>>> Ah, getting better, but....
>>> 
>>> it seems to hang after fetching 792,554 features, never breaking out of the loop and never printing the last print statement.
>>> 
>>> here's my revision based on your suggestion:
>>> 
>>> while ($fetched < $limit) {
>>> $it_counter++;
>>> next unless $it->has_next();
>>> my $var = $it->next();
>>> print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>> print STDOUT "." if ($fetched % 100 == 0);
>>> print STDOUT "$it_counter($fetched)." if ($it_counter % 1000 == 0);
>>> $fetched++;
>>> }
>>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>> 
>>> 
>>> my last bit of output looked like:
>>> 
>>> 794000(790554)...........795000(791554)...........796000(792554).......
>>> 
>>> and then just hung for a long time.
>>> 
>>> is that expected?  the above is completed in under an hour.  should i let it run for several hours?
>>> 
>>> nicole
>>> 
>>> 
>>> On Aug 28, 2013, at 3:08 AM, Sarah Hunt wrote:
>>> 
>>>> Hi Nicole,
>>>> 
>>>> It looks as if the problem is in the line :
>>>> 
>>>> while ($fetched < $limit  && $it->has_next()) {
>>>> 
>>>> The iterator logic finds the first and last relevant variant ids and
>>>> divides the range into equal chunks to return. If one of these chunks
>>>> has no variants, $it->has_next() returns false.
>>>> 
>>>> Try replacing the line with this::
>>>> 
>>>> while ($fetched < $limit ){
>>>>  next unless $it->has_next();
>>>> 
>>>> 
>>>> Best wishes,
>>>> 
>>>> Sarah
>>>> 
>>>> On Tue, Aug 27, 2013 at 5:56 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>>>>> hi,
>>>>> 
>>>>> i'm having a bit of trouble fetching all of the phenotypic variants.
>>>>> 
>>>>> i've written a script, following the example in your documentation, to fetch
>>>>> all the variants in 'ph_variants' set using an iterator from VariantSets.
>>>>> but i don't think it's enough variants...i only seem to fetch ~5068.
>>>>> 
>>>>> in particular, i've tried looking for some specific rs ids in my output that
>>>>> have phenotypes, and they don't show up.  for example, take rs10757274.
>>>>> http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
>>>>> if i look at my output for rs10757274, it doesn't show up.  however, if i
>>>>> specifically query for that variant (using a variant adaptor directly), i
>>>>> can fetch it and it's VariantFeatures, and it says it is in the ph_variants
>>>>> set.
>>>>> 
>>>>> not sure i understand this discrepancy.  perhaps the recursion (fetching of
>>>>> the subfeatures) in the API isn't working?
>>>>> 
>>>>> here's a simple version of my code that just counts the variations found.
>>>>> it ought to print a statement if it finds the rsid, but it never prints it.
>>>>> i also include the code that fetches that rsid directly, where it shows all
>>>>> the variationsets it's found in.
>>>>> 
>>>>> note, i'm using ensembl v72.
>>>>> 
>>>>> any ideas?
>>>>> 
>>>>> Nicole
>>>>> 
>>>>> 
>>>>> ##########  OUTPUT  ################
>>>>> 
>>>>> Fetching ensembl variants.
>>>>> Started: Tue Aug 27 09:30:59 2013
>>>>> Initializing...
>>>>> Connecting to Ensembl Variation DB...0 but trueDone.
>>>>> ...................................................There are 5068 variants
>>>>> in the 'ph_variants'.
>>>>> 
>>>>> rs10757274 found in the following sets:
>>>>> 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
>>>>> 
>>>>> 
>>>>> ##########  CODE FOR VARIANTSETS  ################
>>>>> 
>>>>> $registry->load_registry_from_db(
>>>>>      -host => 'useastdb.ensembl.org',
>>>>> #-host => 'ensembldb.ensembl.org', #this is ~4x slower
>>>>>      -user => 'anonymous',
>>>>>      -port => '5306'
>>>>>      );
>>>>> print STDOUT "Done.\n";
>>>>> 
>>>>> my $species = 'homo_sapiens';
>>>>> 
>>>>> my $vs_adaptor =
>>>>> $registry->get_adaptor($species,'variation','variationset');
>>>>> 
>>>>> my $variant_set = "ph_variants";  #variants from all sources
>>>>> my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
>>>>> 
>>>>> my $limit = 999999;  #
>>>>> my $fetched = 0;
>>>>> my $it = $vs->get_Variation_Iterator();
>>>>> 
>>>>> #for testing, only print the first $limit
>>>>> while ($fetched < $limit && $it->has_next()) {
>>>>> my $var = $it->next();
>>>>> print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>>>> print STDOUT "." if ($fetched % 100 == 0);
>>>>> $fetched++;
>>>>> }
>>>>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>>>> 
>>>>> ##########  CODE FOR VARIANT ################
>>>>> 
>>>>> my $rsid = 'rs10757274';
>>>>> my $variation = $variation_adaptor->fetch_by_name($rsid);
>>>>> 
>>>>> my @sets = ();
>>>>> for my $vf (@{$variation->get_all_VariationFeatures()}) {
>>>>> for my $vs (@{$vf->get_all_VariationSets()}) {
>>>>>  push(@sets,$vs->short_name());
>>>>> }
>>>>> }
>>>>> print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/





More information about the Dev mailing list