[ensembl-dev] trouble fetching all phenotypic variants

Nicole Washington nlwashington at lbl.gov
Wed Aug 28 20:26:43 BST 2013


Ah, getting better, but....

it seems to hang after fetching 792,554 features, never breaking out of the loop and never printing the last print statement. 

here's my revision based on your suggestion:

while ($fetched < $limit) {
  $it_counter++;
  next unless $it->has_next();
  my $var = $it->next();
  print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
  print STDOUT "." if ($fetched % 100 == 0);
  print STDOUT "$it_counter($fetched)." if ($it_counter % 1000 == 0);
  $fetched++;
}
print "There are $fetched variants in the \'$variant_set\'.\n";


my last bit of output looked like:

794000(790554)...........795000(791554)...........796000(792554).......

and then just hung for a long time.

 is that expected?  the above is completed in under an hour.  should i let it run for several hours?

nicole


On Aug 28, 2013, at 3:08 AM, Sarah Hunt wrote:

> Hi Nicole,
> 
> It looks as if the problem is in the line :
> 
> while ($fetched < $limit  && $it->has_next()) {
> 
> The iterator logic finds the first and last relevant variant ids and
> divides the range into equal chunks to return. If one of these chunks
> has no variants, $it->has_next() returns false.
> 
> Try replacing the line with this::
> 
> while ($fetched < $limit ){
>    next unless $it->has_next();
> 
> 
> Best wishes,
> 
> Sarah
> 
> On Tue, Aug 27, 2013 at 5:56 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
>> hi,
>> 
>> i'm having a bit of trouble fetching all of the phenotypic variants.
>> 
>> i've written a script, following the example in your documentation, to fetch
>> all the variants in 'ph_variants' set using an iterator from VariantSets.
>> but i don't think it's enough variants...i only seem to fetch ~5068.
>> 
>> in particular, i've tried looking for some specific rs ids in my output that
>> have phenotypes, and they don't show up.  for example, take rs10757274.
>> http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
>> if i look at my output for rs10757274, it doesn't show up.  however, if i
>> specifically query for that variant (using a variant adaptor directly), i
>> can fetch it and it's VariantFeatures, and it says it is in the ph_variants
>> set.
>> 
>> not sure i understand this discrepancy.  perhaps the recursion (fetching of
>> the subfeatures) in the API isn't working?
>> 
>> here's a simple version of my code that just counts the variations found.
>> it ought to print a statement if it finds the rsid, but it never prints it.
>> i also include the code that fetches that rsid directly, where it shows all
>> the variationsets it's found in.
>> 
>> note, i'm using ensembl v72.
>> 
>> any ideas?
>> 
>> Nicole
>> 
>> 
>> ##########  OUTPUT  ################
>> 
>> Fetching ensembl variants.
>> Started: Tue Aug 27 09:30:59 2013
>> Initializing...
>>  Connecting to Ensembl Variation DB...0 but trueDone.
>> ...................................................There are 5068 variants
>> in the 'ph_variants'.
>> 
>> rs10757274 found in the following sets:
>> 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
>> 
>> 
>> ##########  CODE FOR VARIANTSETS  ################
>> 
>> $registry->load_registry_from_db(
>>        -host => 'useastdb.ensembl.org',
>> #-host => 'ensembldb.ensembl.org', #this is ~4x slower
>>        -user => 'anonymous',
>>        -port => '5306'
>>        );
>> print STDOUT "Done.\n";
>> 
>> my $species = 'homo_sapiens';
>> 
>> my $vs_adaptor =
>> $registry->get_adaptor($species,'variation','variationset');
>> 
>> my $variant_set = "ph_variants";  #variants from all sources
>> my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
>> 
>> my $limit = 999999;  #
>> my $fetched = 0;
>> my $it = $vs->get_Variation_Iterator();
>> 
>> #for testing, only print the first $limit
>> while ($fetched < $limit && $it->has_next()) {
>>  my $var = $it->next();
>>  print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>>  print STDOUT "." if ($fetched % 100 == 0);
>>  $fetched++;
>> }
>> print "There are $fetched variants in the \'$variant_set\'.\n";
>> 
>> ##########  CODE FOR VARIANT ################
>> 
>> my $rsid = 'rs10757274';
>> my $variation = $variation_adaptor->fetch_by_name($rsid);
>> 
>> my @sets = ();
>> for my $vf (@{$variation->get_all_VariationFeatures()}) {
>>  for my $vs (@{$vf->get_all_VariationSets()}) {
>>    push(@sets,$vs->short_name());
>>  }
>> }
>> print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";
>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/





More information about the Dev mailing list