[ensembl-dev] (more) memory efficient LD calculation possible via API?
andrew126 at mac.com
Wed Sep 21 22:36:37 BST 2016
Hi,
I'm using version 84 of the API on 64-bit Ubuntu.
I'm calling the $ldFeatureContainerAdaptor->fetch_by_VariationFeature() method on particular index SNPs and particular human populations/datasets (e.g. 1000GENOMES:phase_3:EUR), with these relevant options:
$ldFeatureContainerAdaptor->db->use_vcf(1);
$ldFeatureContainerAdaptor->max_snp_distance(300000);
I've noticed that the memory usage of $ldFeatureContainerAdaptor->fetch_by_VariationFeature() appears to scale roughly linearly with max_snp_distance: roughly 7 MB of RAM per kb of max_snp_distance against the 1000GENOMES:phase_3:EUR population/dataset.
I assume all the data in the window get loaded and processed at once; is that correct?
The problem I'm facing is that a max_snp_distance of 500 kb (less usual, but it can be meaningful) requires ~3 GB of RAM to process, which can become prohibitive.
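To put numbers on the scaling, here is a quick back-of-envelope sketch. It assumes the rough 7 MB per kb figure holds linearly across the whole range, which I have only observed approximately. The helper name estimated_ram_mb is mine for illustration and is not part of the Ensembl API.

```perl
use strict;
use warnings;

# Rough rate observed in my runs (an approximation, not a measured constant).
my $MB_PER_KB = 7;

# Linear estimate of RAM (in MB) for a given max_snp_distance in base pairs.
sub estimated_ram_mb {
    my ($max_snp_distance_bp) = @_;
    return $MB_PER_KB * $max_snp_distance_bp / 1000;
}

# Print the estimate for a few window sizes, including the 500 kb case.
for my $distance_bp (100_000, 300_000, 500_000) {
    printf "%6d bp -> ~%d MB\n", $distance_bp, estimated_ram_mb($distance_bp);
}
```

At 500 kb this linear estimate comes to about 3500 MB, which is in the same range as the ~3 GB I actually see.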
Is there a way to reduce the method's memory usage, for example by not loading everything at once, even at the cost of some CPU efficiency?
Thanks for any suggestions.
Best regards,
Andrew