[ensembl-dev] trouble fetching all phenotypic variants
Nicole Washington
nlwashington at lbl.gov
Tue Aug 27 17:56:30 BST 2013
hi,
i'm having a bit of trouble fetching all of the phenotypic variants.
i've written a script, following the example in your documentation, to fetch all the variants in the 'ph_variants' set using an iterator from the VariationSet. but the count looks too low: i only fetch 5068.
in particular, i've tried looking for some specific rs ids in my output that have phenotypes, and they don't show up. for example, take rs10757274.
http://uswest.ensembl.org/Homo_sapiens/Variation/Phenotype?db=core;r=9:22095555-22096555;v=rs10757274;vdb=variation;vf=7132897
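to rule out a one-off, several rs ids could be checked in a single pass over the iterator. a minimal sketch, assuming only that the iterator offers has_next()/next() and that each variation has name(), as in my script; find_missing is a hypothetical helper name, not part of the ensembl API:

```perl
use strict;
use warnings;

# Return the wanted names that the iterator never produced.
# $it is assumed to behave like the object returned by
# get_Variation_Iterator(): has_next() and next(), with items offering name().
sub find_missing {
    my ($it, @wanted) = @_;
    my %pending = map { $_ => 1 } @wanted;

    # Stop early once every wanted name has been seen.
    while (%pending && $it->has_next()) {
        my $var = $it->next();
        delete $pending{ $var->name() };
    }

    my @missing = sort keys %pending;
    return @missing;
}

# Hypothetical usage, given an iterator $it over the set:
# my @missing = find_missing($it, 'rs10757274');
# print "Missing: @missing\n" if @missing;
```

note that the helper consumes the iterator, so a fresh one is needed for any later counting.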
if i look for rs10757274 in my output, it doesn't show up. however, if i query for that variant specifically (using a variation adaptor directly), i can fetch it and its VariationFeatures, and they report that it is in the ph_variants set.
not sure i understand this discrepancy. perhaps the recursion (fetching of the subfeatures) in the API isn't working?
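one way to test that guess might be to count each subset of ph_variants separately and compare with what the parent set returns. this is only a sketch: it assumes "recursion" means walking the subsets returned by get_all_sub_VariationSets(), and count_variations / count_by_subset are hypothetical helper names of my own.

```perl
use strict;
use warnings;

# Count the variations that one set's iterator yields.
# $set is assumed to respond to get_Variation_Iterator().
sub count_variations {
    my ($set) = @_;
    my $it = $set->get_Variation_Iterator();
    my $n  = 0;
    while ($it->has_next()) {
        $it->next();
        $n++;
    }
    return $n;
}

# Return a hash ref { short_name => count } for a set and each of its subsets.
sub count_by_subset {
    my ($set) = @_;
    my %counts = ($set->short_name() => count_variations($set));
    for my $sub (@{ $set->get_all_sub_VariationSets() }) {
        $counts{ $sub->short_name() } = count_variations($sub);
    }
    return \%counts;
}

# Hypothetical usage, with $vs fetched via fetch_by_short_name('ph_variants'):
# my $counts = count_by_subset($vs);
# printf "%-12s %d\n", $_, $counts->{$_} for sort keys %$counts;
```

if a subset such as ph_nhgri on its own yields variations that the parent iterator never returns, that would suggest the parent is not descending into its subsets.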
here's a simple version of my code that just counts the variations found. it ought to print a statement if it finds the rsid, but it never prints it. i also include the code that fetches that rsid directly, where it shows all the variationsets it's found in.
note, i'm using ensembl v72.
any ideas?
Nicole
########## OUTPUT ################
Fetching ensembl variants.
Started: Tue Aug 27 09:30:59 2013
Initializing...
Connecting to Ensembl Variation DB...0 but trueDone.
...................................................There are 5068 variants in the 'ph_variants'.
rs10757274 found in the following sets: 1kg_amr,Cardio-Metabo_Chip,1kg_amr_com,hapmap_hcb,1kg_asn,1kg_eur,ind_watson,hapmap_yri,1kg_afr_com,1kg_asn_com,ind_angrist,1kg_eur_com,ind_yh,hapmap,ph_variants,ph_nhgri,ind_gill,ind_venter,1kg_afr,HumanOmni1-Quad,HumanOmni2.5,ind_sjk,1kg_com,hapmap_jpt,ph_omim,1kg,Illumina_1M-duo
########## CODE FOR VARIANTSETS ################
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'useastdb.ensembl.org',
    #-host => 'ensembldb.ensembl.org', #this is ~4x slower
    -user => 'anonymous',
    -port => '5306'
);
print STDOUT "Done.\n";

my $species = 'homo_sapiens';
my $rsid = 'rs10757274'; #the rsid i expect to find in the set
my $vs_adaptor = $registry->get_adaptor($species,'variation','variationset');
my $variant_set = "ph_variants"; #variants from all sources
my $vs = $vs_adaptor->fetch_by_short_name($variant_set);
my $limit = 999999; #effectively no limit
my $fetched = 0;
my $it = $vs->get_Variation_Iterator();
#for testing, only look at the first $limit variations
while ($fetched < $limit && $it->has_next()) {
    my $var = $it->next();
    print STDOUT "Found $rsid, number $fetched.\n" if ($var->name() eq $rsid);
    print STDOUT "." if ($fetched % 100 == 0); #progress dot every 100 variations
    $fetched++;
}
print "There are $fetched variants in the '$variant_set'.\n";
########## CODE FOR VARIANT ################
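use List::MoreUtils qw(uniq); #assumed source of uniq; newer List::Util also exports it

my $rsid = 'rs10757274';
my $variation_adaptor = $registry->get_adaptor($species,'variation','variation');
my $variation = $variation_adaptor->fetch_by_name($rsid);
my @sets = ();
#collect the short names of every set that any feature of this variation belongs to
for my $vf (@{$variation->get_all_VariationFeatures()}) {
    for my $vs (@{$vf->get_all_VariationSets()}) {
        push(@sets,$vs->short_name());
    }
}
print "$rsid found in the following sets: " . join(",",uniq @sets) . "\n";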