[ensembl-dev] trouble fetching all phenotypic variants

Nicole Washington nlwashington at lbl.gov
Tue Aug 27 17:56:30 BST 2013


hi,

i'm having a bit of trouble fetching all of the phenotypic variants.

i've written a script, following the example in your documentation, to fetch all the variants in 'ph_variants' set using an iterator from VariantSets.  but i don't think it's enough variants...i only seem to fetch ~5068.

in particular, i've tried looking for some specific rs ids in my output that have phenotypes, and they don't show up.  for example, take rs10757274.
http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
if i look at my output for rs10757274, it doesn't show up.  however, if i specifically query for that variant (using a variant adaptor directly), i can fetch it and it's VariantFeatures, and it says it is in the ph_variants set.

not sure i understand this discrepancy.  perhaps the recursion (fetching of the subfeatures) in the API isn't working?

here's a simple version of my code that just counts the variations found.  it ought to print a statement if it finds the rsid, but it never prints it.  i also include the code that fetches that rsid directly, where it shows all the variationsets it's found in.

note, i'm using ensembl v72. 

any ideas?

Nicole


##########  OUTPUT  ################

Fetching ensembl variants.
Started: Tue Aug 27 09:30:59 2013 
Initializing...
  Connecting to Ensembl Variation DB...0 but trueDone.
...................................................There are 5068 variants in the 'ph_variants'.

rs10757274 found in the following sets: 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo


##########  CODE FOR VARIANTSETS  ################

$registry->load_registry_from_db(
        -host => 'useastdb.ensembl.org',
#-host => 'ensembldb.ensembl.org', #this is ~4x slower
        -user => 'anonymous',
        -port => '5306'
        );
print STDOUT "Done.\n";

my $species = 'homo_sapiens';

my $vs_adaptor = $registry->get_adaptor($species,'variation','variationset');

my $variant_set = "ph_variants";  #variants from all sources
my $vs = $vs_adaptor->fetch_by_short_name($variant_set);

my $limit = 999999;  #
my $fetched = 0;
my $it = $vs->get_Variation_Iterator();

#for testing, only print the first $limit
while ($fetched < $limit && $it->has_next()) {
  my $var = $it->next();
  print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
  print STDOUT "." if ($fetched % 100 == 0);
  $fetched++;
}
print "There are $fetched variants in the \'$variant_set\'.\n";

##########  CODE FOR VARIANT ################

my $rsid = 'rs10757274';
my $variation = $variation_adaptor->fetch_by_name($rsid);

my @sets = ();
for my $vf (@{$variation->get_all_VariationFeatures()}) {
  for my $vs (@{$vf->get_all_VariationSets()}) {
    push(@sets,$vs->short_name());
  }
}
print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20130827/5764564a/attachment.html>


More information about the Dev mailing list