Speeding "Bayesian Power Analysis t-test" up with Snowfall

2016-01-15 R

This is a direct (though minor) answer to Daniel’s blogpost http://daniellakens.blogspot.de/2016/01/power-analysis-for-default-bayesian-t.html, which I found very interesting, as I have been trying to get my head around Bayesian statistics for quite a while now. However, one thing that bugs me, is the time needed for the simulation. On my machine it took around 22 minutes. Depending on the task, 22 minutes for a signle test can be way too long (especially if the tests are done in a business environment where many tests are needed - yesterday) and a simple t-test might be more appealing only because it takes a shorter computing time. Here is my solution to speed-up the code using snowfall’s load-balancing parallel structures to reduce the time to 8.5 minutes.

The initial code (which you can find in the original post) uses a for loop, snowfall, however, uses a function that we need to export, which is simply called simFun. The function takes the sample size n, the true effect size D, as well as the effect size of the alternative hypothesis rscaleBF as arguments (note, the names are equivalent to Daniel’s, as well as the values of the variables (i.e., n = 50 etc.)).

simFun <- function(n, D, rscaleBF){
  library(BayesFactor)
  x <- rnorm(n = n, mean = 0, sd = 1)
  y <- rnorm(n = n, mean = D, sd = 1)
  
  return(exp((ttestBF(x, y, rscale = rscaleBF))@bayesFactor$bf))
}

To load, initiate, and finally execute the snowfall-code, we can use the following:

# load the library
library(snowfall)
n <- 50; D <- 0.0; nSim <- 100000; rscaleBF <- sqrt(2)/2; threshold <- 3

# for time-keeping
t0 <- Sys.time()

# initiate a parallel cluster with 4 cpus
sfInit(parallel = T, cpus = 4)

# export the function to the clusters
sfExport("simFun", "n", "D", "rscaleBF")

# execute the code
bf <- sfClusterApplyLB(1:nSim, function(i) simFun(n = n, D = D, rscaleBF = rscaleBF))

# stop the clusters
sfStop()

# print the time it took for the calculation
Sys.time() - t0
# Time difference of 8.490639 mins

# and finally the result
sum(bf < (1/threshold))/nSim
#[1]  0.68578

With Daniel’s code I got supportH0 = 0.6904, the snowfall-code almost reproduces the same number 0.68578, but reduces the time substantially.

I hope you found this post interesting/helpful, should you have any questions, you are more then welcome to ask them here, otherwise send me an e-mail.

Lastly, thanks to Daniel, who was able to find some bugs in an earlier version of the code.

Note, the earlier version stated that the code was improved by 60x (without giving a result), the numbers are now corrected.

R monte-carlo snowfall parallel

David Zimmermann, PhD

Data Scientist

I am an economist by training, turned programmer/data scientist who loves to program with R, Python, and C++.

Speeding "Bayesian Power Analysis t-test" up with Snowfall

David Zimmermann, PhD

Data Scientist

Related