[ensembl-dev] (more) memory efficient LD calculation possible via API?

Anja Thormann anja at ebi.ac.uk
Mon Sep 26 11:47:03 BST 2016


Hi Andrew,

Try using $ldFeatureContainer->get_all_ld_values(1). Passing 1 as the argument to get_all_ld_values() prevents the API from fetching objects for all the variants in the result set. Instead you get the variant names as strings, and if you are interested in more attributes you need to create the objects yourself.

You could do the following:

foreach my $ld_hash (@{$ldFeatureContainer->get_all_ld_values(1)}) {
    # With argument 1 the variants come back as name strings, not objects
    my $d_prime = $ld_hash->{d_prime};
    my $r2 = $ld_hash->{r2};
    my $variation_name1 = $ld_hash->{variation_name1};
    my $variation_name2 = $ld_hash->{variation_name2};

    ...
}
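
If you only need objects for a subset of the variants, one option is to filter on the names first and create objects just for those. The following is a minimal sketch under some assumptions: the helper name names_in_strong_ld, the r2 cutoff of 0.8 and the mock data (with made-up names rsA, rsB, rsC) are hypothetical and only stand in for the hashes returned by get_all_ld_values(1). It uses only the hash keys shown above.

```perl
use strict;
use warnings;

# Collect the unique variant names from all pairs whose r2 meets a cutoff.
# $ld_values: array ref of hashes shaped like those from get_all_ld_values(1).
# $min_r2:    hypothetical threshold chosen by the caller.
sub names_in_strong_ld {
    my ($ld_values, $min_r2) = @_;
    my %seen;
    foreach my $ld_hash (@{$ld_values}) {
        next if $ld_hash->{r2} < $min_r2;
        $seen{ $ld_hash->{variation_name1} } = 1;
        $seen{ $ld_hash->{variation_name2} } = 1;
    }
    return [ sort keys %seen ];
}

# Mock data standing in for $ldFeatureContainer->get_all_ld_values(1);
# the variant names are invented for illustration.
my $mock = [
    { variation_name1 => 'rsA', variation_name2 => 'rsB', r2 => 0.95, d_prime => 1.0 },
    { variation_name1 => 'rsA', variation_name2 => 'rsC', r2 => 0.10, d_prime => 0.4 },
];

my $names = names_in_strong_ld($mock, 0.8);
print join(',', @{$names}), "\n";
```

With real data you would pass the result of get_all_ld_values(1) instead of $mock, and then create objects only for the names that are returned.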

HTH,
Anja


> On 21 Sep 2016, at 22:36, andrew126 at mac.com wrote:
> 
> Hi,
> 
> I'm using version 84 of the API on 64-bit Ubuntu.
> 
> I'm using the $ldFeatureContainerAdaptor->fetch_by_VariationFeature() method against particular index SNPs and particular human populations/datasets (e.g. 1000GENOMES:phase_3:EUR)
> 
> I'm using these relevant options:
> 	$ldFeatureContainerAdaptor->db->use_vcf(1);
> 	$ldFeatureContainerAdaptor->max_snp_distance(300000);
> 
> I've noticed that memory usage of $ldFeatureContainerAdaptor->fetch_by_VariationFeature() appears to scale roughly linearly with max_snp_distance:
> 	roughly 7Mb RAM per kb of max_snp_distance against the 1000GENOMES:phase_3:EUR population/datasets.
> 
> I'm assuming all data get loaded and processed simultaneously?
> 
> The problem I'm facing is that for a max_snp_distance of 500kb (less usual, but not unheard of to be meaningful) it requires ~3 Gb of RAM to process, which can get prohibitive.
> 
> Is there a way for the method to decrease its memory usage somehow?  Not trying to load everything simultaneously etc., even at the cost of a bit of CPU efficiency?
> 
> Thanks for any suggestions.
> 
> Best regards,
> 
> Andrew
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/




