[ensembl-dev] (more) memory efficient LD calculation possible via API?

andrew126 at mac.com andrew126 at mac.com
Wed Sep 21 22:36:37 BST 2016


Hi,

I'm using version 84 of the API on 64-bit Ubuntu.

I'm calling the $ldFeatureContainerAdaptor->fetch_by_VariationFeature() method for particular index SNPs and particular human populations (e.g. 1000GENOMES:phase_3:EUR).

I'm using these relevant options:
	$ldFeatureContainerAdaptor->db->use_vcf(1);
	$ldFeatureContainerAdaptor->max_snp_distance(300000);

I've noticed that the memory usage of $ldFeatureContainerAdaptor->fetch_by_VariationFeature() appears to scale roughly linearly with max_snp_distance:
	roughly 7 MB of RAM per kb of max_snp_distance for the 1000GENOMES:phase_3:EUR population.

I assume this means all the data are loaded and processed at once?

The problem I'm facing is that a max_snp_distance of 500 kb (less common, but LD over that range is not unheard of and can be meaningful) requires ~3 GB of RAM to process, which can become prohibitive.
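
As a rough illustration of that linear scaling, the following is a minimal sketch that just multiplies the approximate rate quoted above (about 7 MB per kb) by the window size. The helper name estimate_mb is hypothetical and not part of the Ensembl API, and the rate is an approximate observation, so the results are ballpark figures only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Approximate rate quoted above: ~7 MB of RAM per kb of max_snp_distance
# (1000GENOMES:phase_3:EUR). This is an observation, not a documented constant.
my $MB_PER_KB = 7;

# Hypothetical helper (not part of the Ensembl API): linear estimate of
# memory use in MB for a given max_snp_distance in base pairs.
sub estimate_mb {
    my ($max_snp_distance_bp) = @_;
    return $MB_PER_KB * $max_snp_distance_bp / 1000;
}

# Print the estimate for a few window sizes, in decimal GB (1 GB = 1000 MB).
for my $bp (100_000, 300_000, 500_000) {
    printf "%d kb -> ~%.1f GB\n", $bp / 1000, estimate_mb($bp) / 1000;
}
```

At that rate a 300 kb window comes to about 2.1 GB and a 500 kb window to about 3.5 GB, which is in the same range as the ~3 GB figure mentioned above.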

Is there a way to reduce the method's memory usage, for example by not loading everything at once, even at the cost of some CPU efficiency?

Thanks for any suggestions.

Best regards,

Andrew


