[ensembl-dev] trouble fetching all phenotypic variants

Sarah Hunt seh at ebi.ac.uk
Fri Aug 30 15:45:48 BST 2013


Hi Nicole,

Sorry about the delay in getting back to you.

You are right there was a problem with the iteration. I've just
checked a fix to VariationAdaptor.pm
into branch 72 CVS. Can you update your version of this file?


In my test script I was ending the loop using :

while ($fetched < $limit ) {

    my $var = $it->next();
    last unless defined $var;
  print $var->name() . "\n";


Best wishes,


Sarah

On Fri, Aug 30, 2013 at 3:28 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
> Hi, I just wanted to check back to see if there was anything possibly erroneous with the iterator on the server side?
>
> I am wondering if the problem is that i never reach the limit so therefore it just goes into an infinite loop?  How in the world do I break out of the loop if i'm not allowed to use $it->has_next() in the while statement?
>
> On Aug 28, 2013, at 12:26 PM, Nicole Washington wrote:
>
>> Ah, getting better, but....
>>
>> it seems to hang after fetching 792,554 features, never breaking out of the loop and never printing the last print statement.
>>
>> here's my revision based on your suggestion:
>>
>> while ($fetched < $limit) {
>>  $it_counter++;
>>  next unless $it->has_next();
>>  my $var = $it->next();
>>  print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>  print STDOUT "." if ($fetched % 100 == 0);
>>  print STDOUT "$it_counter($fetched)." if ($it_counter % 1000 == 0);
>>  $fetched++;
>> }
>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>
>>
>> my last bit of output looked like:
>>
>> 794000(790554)...........795000(791554)...........796000(792554).......
>>
>> and then just hung for a long time.
>>
>> is that expected?  the above is completed in under an hour.  should i let it run for several hours?
>>
>> nicole
>>
>>
>> On Aug 28, 2013, at 3:08 AM, Sarah Hunt wrote:
>>
>>> Hi Nicole,
>>>
>>> It looks as if the problem is in the line :
>>>
>>> while ($fetched < $limit  && $it->has_next()) {
>>>
>>> The iterator logic finds the first and last relevant variant ids and
>>> divides the range into equal chunks to return. If one of these chunks
>>> has no variants, $it->has_next() returns false.
>>>
>>> Try replacing the line with this::
>>>
>>> while ($fetched < $limit ){
>>>   next unless $it->has_next();
>>>
>>>
>>> Best wishes,
>>>
>>> Sarah
>>>
>>> On Tue, Aug 27, 2013 at 5:56 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>>>> hi,
>>>>
>>>> i'm having a bit of trouble fetching all of the phenotypic variants.
>>>>
>>>> i've written a script, following the example in your documentation, to fetch
>>>> all the variants in 'ph_variants' set using an iterator from VariantSets.
>>>> but i don't think it's enough variants...i only seem to fetch ~5068.
>>>>
>>>> in particular, i've tried looking for some specific rs ids in my output that
>>>> have phenotypes, and they don't show up.  for example, take rs10757274.
>>>> http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
>>>> if i look at my output for rs10757274, it doesn't show up.  however, if i
>>>> specifically query for that variant (using a variant adaptor directly), i
>>>> can fetch it and it's VariantFeatures, and it says it is in the ph_variants
>>>> set.
>>>>
>>>> not sure i understand this discrepancy.  perhaps the recursion (fetching of
>>>> the subfeatures) in the API isn't working?
>>>>
>>>> here's a simple version of my code that just counts the variations found.
>>>> it ought to print a statement if it finds the rsid, but it never prints it.
>>>> i also include the code that fetches that rsid directly, where it shows all
>>>> the variationsets it's found in.
>>>>
>>>> note, i'm using ensembl v72.
>>>>
>>>> any ideas?
>>>>
>>>> Nicole
>>>>
>>>>
>>>> ##########  OUTPUT  ################
>>>>
>>>> Fetching ensembl variants.
>>>> Started: Tue Aug 27 09:30:59 2013
>>>> Initializing...
>>>> Connecting to Ensembl Variation DB...0 but trueDone.
>>>> ...................................................There are 5068 variants
>>>> in the 'ph_variants'.
>>>>
>>>> rs10757274 found in the following sets:
>>>> 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
>>>>
>>>>
>>>> ##########  CODE FOR VARIANTSETS  ################
>>>>
>>>> $registry->load_registry_from_db(
>>>>       -host => 'useastdb.ensembl.org',
>>>> #-host => 'ensembldb.ensembl.org', #this is ~4x slower
>>>>       -user => 'anonymous',
>>>>       -port => '5306'
>>>>       );
>>>> print STDOUT "Done.\n";
>>>>
>>>> my $species = 'homo_sapiens';
>>>>
>>>> my $vs_adaptor =
>>>> $registry->get_adaptor($species,'variation','variationset');
>>>>
>>>> my $variant_set = "ph_variants";  #variants from all sources
>>>> my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
>>>>
>>>> my $limit = 999999;  #
>>>> my $fetched = 0;
>>>> my $it = $vs->get_Variation_Iterator();
>>>>
>>>> #for testing, only print the first $limit
>>>> while ($fetched < $limit && $it->has_next()) {
>>>> my $var = $it->next();
>>>> print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>>> print STDOUT "." if ($fetched % 100 == 0);
>>>> $fetched++;
>>>> }
>>>> print "There are $fetched variants in the \'$variant_set\'.\n";
>>>>
>>>> ##########  CODE FOR VARIANT ################
>>>>
>>>> my $rsid = 'rs10757274';
>>>> my $variation = $variation_adaptor->fetch_by_name($rsid);
>>>>
>>>> my @sets = ();
>>>> for my $vf (@{$variation->get_all_VariationFeatures()}) {
>>>> for my $vs (@{$vf->get_all_VariationSets()}) {
>>>>   push(@sets,$vs->short_name());
>>>> }
>>>> }
>>>> print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




More information about the Dev mailing list