Performance Improvements in R: Vectorization & Memoisation

Efficient R Programming: A Practical Guide to Smarter Programming is full of potential coding improvements, and two of its suggestions stand out. Vectorization replaces an explicit loop with a single operation over whole vectors, and memoisation caches prior results so that repeated calls with the same arguments can skip the computation, at the cost of additional memory.

What follows is a demonstration of the speed improvements that might be achieved using these concepts.


 ################################  
 # performance  
 # vectorization and memoization  
 ################################
 
 # clear memory between changes
 rm(list = ls())  

 #load memoise
 #install.packages('memoise')
 library(memoise)  

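 # create test function: estimate P(u1^2 > u2) with an explicit loop
 monte_carlo <- function(N) {
   hits <- 0
   for (i in seq_len(N)) {
     u1 <- runif(1)  # one uniform draw per iteration
     u2 <- runif(1)
     if (u1 ^ 2 > u2)
       hits <- hits + 1
   }
   return(hits / N)  # proportion of hits
 }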
 
 # memoise test function 
 monte_carlo_memo <- memoise(monte_carlo)  
  
 # vectorize function
 monte_carlo_vec <- function(N) mean(runif(N) ^ 2 > runif(N))  
 
 # memoise vectorized function
 monte_carlo_vec_memo <- memoise(monte_carlo_vec)  

 # run test - pass 1   
 n <- 999999  
 plainFor <- system.time(monte_carlo(n))  
 memoised <- system.time(monte_carlo_memo(n))  
 vectorised <- system.time(monte_carlo_vec(n))  
 both <- system.time(monte_carlo_vec_memo(n))  
   
 # results - pass 1   
 result <- cbind(plainFor, memoised, vectorised, both)  
 display <- format(result, digits = 3, nsmall = 3)  
 View(display)     


The first pass shows that vectorization is vastly faster than a standard for loop, while memoisation adds little on top of either version: with nothing cached yet, each memoised function still has to compute its result once.
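Before trusting the timings, it may be worth checking that the loop and the vectorized one-liner estimate the same quantity. Since P(u1^2 > u2) for independent uniforms equals the integral of x^2 over [0, 1], both should land near 1/3. The sketch below redefines the two functions in base R so it runs on its own; the seed, the smaller sample size `n_small`, and the 0.01 tolerance are assumptions made for this check and are not part of the original test.

```r
# Loop and vectorized versions of the estimator from the post
monte_carlo <- function(N) {
  hits <- 0
  for (i in seq_len(N)) {
    if (runif(1) ^ 2 > runif(1)) hits <- hits + 1
  }
  hits / N
}
monte_carlo_vec <- function(N) mean(runif(N) ^ 2 > runif(N))

set.seed(42)      # arbitrary seed, only for reproducibility (assumption)
n_small <- 1e5    # smaller than the post's n, to keep the check quick

est_loop <- monte_carlo(n_small)
est_vec  <- monte_carlo_vec(n_small)

# Both estimates should sit close to 1/3; the tolerance is deliberately loose
abs(est_loop - 1 / 3) < 0.01
abs(est_vec - 1 / 3) < 0.01
```

The two functions consume the random stream in a different order, so their estimates will not match exactly even with the same seed. They only need to agree up to Monte Carlo error.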



 # run test - pass 2   
 plainFor <- system.time(monte_carlo(n))  
 memoised <- system.time(monte_carlo_memo(n))  
 vectorised <- system.time(monte_carlo_vec(n))  
 both <- system.time(monte_carlo_vec_memo(n))  
   
 # results - pass 2   
 result <- cbind(plainFor, memoised, vectorised, both)  
 display <- format(result, digits = 3, nsmall = 3)  
 View(display)   


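That said, on a second pass over the same functions with the same argument, the memoised versions are vastly faster, returning their cached values in effectively zero (0) seconds at the precision reported by system.time.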
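The reason the second pass is nearly free can be sketched with a hand-rolled cache. The helper `simple_memo` below is hypothetical and written in base R purely for illustration; it is not how the memoise package is implemented, and it only handles a single argument. It also shows a side effect worth keeping in mind when memoising a random function: the cached call returns the earlier estimate rather than drawing new numbers.

```r
# Hypothetical helper: a minimal memoiser keyed on a single argument.
# Illustration only; the memoise package works differently internally.
simple_memo <- function(f) {
  cache <- new.env()
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)          # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE)   # later calls: lookup only
  }
}

monte_carlo_vec <- function(N) mean(runif(N) ^ 2 > runif(N))
mc_cached <- simple_memo(monte_carlo_vec)

first  <- mc_cached(1e5)
second <- mc_cached(1e5)   # served from the cache, no new random draws
identical(first, second)

# The unmemoised function redraws on every call, so repeated calls
# will generally give slightly different estimates
monte_carlo_vec(1e5) == monte_carlo_vec(1e5)
```

For a simulation this means the speed-up on the second pass comes from skipping the work entirely, which is only appropriate when reusing the same estimate is acceptable.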

