Performance Improvements in R: Vectorization & Memoisation

- May 26, 2017

Efficient R Programming: A Practical Guide to Smarter Programming is full of potential coding improvements, and two of its suggestions stand out as relevant and significant: vectorization, explained here and here, and memoisation, which caches prior results at the cost of additional memory use.

What follows is a demonstration of the speed improvements that might be achieved using these concepts.


 ################################  
 # performance  
 # vectorization and memoization  
 ################################
 
 # clear memory between changes
 rm(list = ls())  

 # load memoise (uncomment the next line to install it first)
 # install.packages('memoise')
 library(memoise)  

 # create test function
 monte_carlo = function(N) {  
   hits = 0  
   for (i in seq_len(N)) {  
     u1 = runif(1)  
     u2 = runif(1)  
     if (u1 ^ 2 > u2)  
       hits = hits + 1  
   }  
   return(hits / N)  
 }  
 
 # memoise test function 
 monte_carlo_memo <- memoise(monte_carlo)  
  
 # vectorize function
 monte_carlo_vec <- function(N) mean(runif(N) ^ 2 > runif(N))  
 
 # memoise vectorized function
 monte_carlo_vec_memo <- memoise(monte_carlo_vec)  

 # run test - pass 1   
 n <- 999999  
 plainFor <- system.time(monte_carlo(n))  
 memoised <- system.time(monte_carlo_memo(n))  
 vectorised <- system.time(monte_carlo_vec(n))  
 both <- system.time(monte_carlo_vec_memo(n))  
   
 # results - pass 1   
 result <- cbind(plainFor, memoised, vectorised, both)  
 display <- format(result, digits = 3, nsmall = 3)  
 View(display)

The first pass shows that vectorization provides a vast improvement over a standard for loop, while memoisation adds little or nothing on top of that, because the cache is still empty and each memoised function has to compute its result once.
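As a quick sanity check that the loop and the vectorized version compute the same thing, the sketch below runs both and compares them to the exact answer. It assumes a smaller sample size than the benchmark (1e5) and a fixed seed, both chosen here only for illustration. Each function estimates the probability that u1 ^ 2 > u2 for independent uniform draws, which is 1/3.

```r
# Loop version: one pair of random draws per iteration.
monte_carlo_loop <- function(N) {
  hits <- 0
  for (i in seq_len(N)) {
    if (runif(1) ^ 2 > runif(1)) hits <- hits + 1
  }
  hits / N
}

# Vectorized version: draw all N pairs at once and average the comparison.
monte_carlo_vec <- function(N) mean(runif(N) ^ 2 > runif(N))

set.seed(42)  # illustrative seed, not part of the original benchmark
est_loop <- monte_carlo_loop(1e5)
est_vec <- monte_carlo_vec(1e5)

# Both estimates should land close to 1/3.
print(c(loop = est_loop, vectorized = est_vec, exact = 1 / 3))
```

The two estimates differ slightly from each other because they consume the random number stream in a different order, but both converge to the same value.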


 # run test - pass 2   
 plainFor <- system.time(monte_carlo(n))  
 memoised <- system.time(monte_carlo_memo(n))  
 vectorised <- system.time(monte_carlo_vec(n))  
 both <- system.time(monte_carlo_vec_memo(n))  
   
 # results - pass 2   
 result <- cbind(plainFor, memoised, vectorised, both)  
 display <- format(result, digits = 3, nsmall = 3)  
 View(display)

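That said, on a second pass with the same argument, the memoised functions are vastly faster, returning the cached result in effectively zero (0) seconds without rerunning the simulation. Note that the cached value is the same estimate produced in the first pass, not a fresh random sample.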
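The second-pass behaviour can be illustrated with a hand-rolled cache in base R. This is only a minimal sketch of the idea, not how the memoise package is implemented. The names simple_memo, estimate, and calls are hypothetical, and the wrapper handles a single numeric argument only.

```r
# Hypothetical helper: wrap f so that results are cached by argument value.
simple_memo <- function(f) {
  cache <- new.env()
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      # Cache miss: run the function once and store the result.
      assign(key, f(x), envir = cache)
    }
    # Cache hit (or freshly stored value): a lookup only.
    get(key, envir = cache, inherits = FALSE)
  }
}

# Count how often the underlying simulation actually runs.
calls <- 0
estimate <- function(N) {
  calls <<- calls + 1
  mean(runif(N) ^ 2 > runif(N))
}
estimate_memo <- simple_memo(estimate)

first <- estimate_memo(1e5)   # computed: the simulation runs
second <- estimate_memo(1e5)  # looked up: the simulation does not run again

print(identical(first, second))
print(calls)
```

The second call skips the simulation entirely, which is why its timing rounds to zero. For a function that is meant to be random, that also means every repeat call with the same argument returns the identical number.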
