[ensembl-dev] trouble fetching all phenotypic variants

Sarah Hunt seh at ebi.ac.uk
Wed Aug 28 11:08:37 BST 2013


Hi Nicole,

It looks as if the problem is in the line :

while ($fetched < $limit  && $it->has_next()) {

The iterator logic finds the first and last relevant variant ids and
divides the range into equal chunks to return. If one of these chunks
has no variants, $it->has_next() returns false.

Try replacing the line with this::

while ($fetched < $limit ){
    next unless $it->has_next();


Best wishes,

Sarah

On Tue, Aug 27, 2013 at 5:56 PM, Nicole Washington <nlwashington at lbl.gov> wrote:
> hi,
>
> i'm having a bit of trouble fetching all of the phenotypic variants.
>
> i've written a script, following the example in your documentation, to fetch
> all the variants in 'ph_variants' set using an iterator from VariantSets.
> but i don't think it's enough variants...i only seem to fetch ~5068.
>
> in particular, i've tried looking for some specific rs ids in my output that
> have phenotypes, and they don't show up.  for example, take rs10757274.
> http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
> if i look at my output for rs10757274, it doesn't show up.  however, if i
> specifically query for that variant (using a variant adaptor directly), i
> can fetch it and it's VariantFeatures, and it says it is in the ph_variants
> set.
>
> not sure i understand this discrepancy.  perhaps the recursion (fetching of
> the subfeatures) in the API isn't working?
>
> here's a simple version of my code that just counts the variations found.
> it ought to print a statement if it finds the rsid, but it never prints it.
> i also include the code that fetches that rsid directly, where it shows all
> the variationsets it's found in.
>
> note, i'm using ensembl v72.
>
> any ideas?
>
> Nicole
>
>
> ##########  OUTPUT  ################
>
> Fetching ensembl variants.
> Started: Tue Aug 27 09:30:59 2013
> Initializing...
>   Connecting to Ensembl Variation DB...0 but trueDone.
> ...................................................There are 5068 variants
> in the 'ph_variants'.
>
> rs10757274 found in the following sets:
> 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
>
>
> ##########  CODE FOR VARIANTSETS  ################
>
> $registry->load_registry_from_db(
>         -host => 'useastdb.ensembl.org',
> #-host => 'ensembldb.ensembl.org', #this is ~4x slower
>         -user => 'anonymous',
>         -port => '5306'
>         );
> print STDOUT "Done.\n";
>
> my $species = 'homo_sapiens';
>
> my $vs_adaptor =
> $registry->get_adaptor($species,'variation','variationset');
>
> my $variant_set = "ph_variants";  #variants from all sources
> my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
>
> my $limit = 999999;  #
> my $fetched = 0;
> my $it = $vs->get_Variation_Iterator();
>
> #for testing, only print the first $limit
> while ($fetched < $limit && $it->has_next()) {
>   my $var = $it->next();
>   print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
>   print STDOUT "." if ($fetched % 100 == 0);
>   $fetched++;
> }
> print "There are $fetched variants in the \'$variant_set\'.\n";
>
> ##########  CODE FOR VARIANT ################
>
> my $rsid = 'rs10757274';
> my $variation = $variation_adaptor->fetch_by_name($rsid);
>
> my @sets = ();
> for my $vf (@{$variation->get_all_VariationFeatures()}) {
>   for my $vs (@{$vf->get_all_VariationSets()}) {
>     push(@sets,$vs->short_name());
>   }
> }
> print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>




More information about the Dev mailing list