ARIMA,Time Series, and Charting in R

ARIMA is an acronym for Autoregressive Integrated Moving Average, and one explanation describes ARIMA models as...
...another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models were based on a description of trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
A detailed technical discussion can be found on Wikipedia.

For the first exploration I developed several ARIMA models using financial data, varying the parameters, noted as P, D and Q. A general overview of the parameters is from Wikipedia:
  • p is the order (number of time lags) of the autoregressive model
  • d is the degree of differencing (the number of times the data have had past values subtracted)
  • q is the order of the moving-average model
 # filtered to start on a date  
 Portfolio.filtered <- Portfolio.df[Portfolio.df$YearMonthIndex >= 201501,]  
   
 # aggregated by YearMonthIndex, a field created from the date of the record
 Portfolio.summed <- aggregate(Portfolio.filtered$Mkt.Val, by = list(Portfolio.filtered$YearMonthIndex), FUN = sum)  
   
 # create time series  
 Portfolio.timeSeries <- ts(data = Portfolio.summed[, 2], start = c(2015, 01), frequency = 12)  
   
 # set arima models   
 (a1 <- arima(Portfolio.timeSeries, order = c(1, 0, 0)))  
 (a2 <- arima(Portfolio.timeSeries, order = c(0, 0, 1)))  
 (a3 <- arima(Portfolio.timeSeries, order = c(1, 0, 1)))  
 (a4 <- arima(Portfolio.timeSeries, order = c(1, 1, 1)))  
 (a5 <- arima(Portfolio.timeSeries, order = c(2, 1, 0)))  
   
 # select and set model  
 series.predict.a1 <- predict(a1, 12)  
 series.predict.a2 <- predict(a2, 12)  
 series.predict.a3 <- predict(a3, 12)  
 series.predict.a4 <- predict(a4, 12)  
 series.predict.a5 <- predict(a5, 12)  
   
 # combine with source  
 series.predict.a1.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a1$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a2.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a2$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a3.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a3$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a4.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a4$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a5.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a5$pred)), start = c(2015, 01), frequency = 12) 


From the above, I chose to plot the model with the lowest standard error, splicing the existing curve with the prediction.

 # plot time series & predictions, using model with lowest standard error  
 series.predict.a4.lone <- ts(data = as.vector(series.predict.a4$pred), start = c(2017, 05), frequency = 12)  
 plot(Portfolio.timeSeries, xlim = c(2015.0,2018.12), ylim = c(0, 10000), col = "Black", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a4.lone, xlim = c(2015.0, 2018.12), ylim = c(0, 10000), col = "Red", ann = FALSE)  

Thinking it might be useful to compare the actual outcome to predicted ones, the data was filtered using the window() function, generated predictions for all the models, then overlaid the plots with the help of par(). Since the data was trending down and/or flat at the point I cut it, the results predict a lower growth trend than actually occurred.

   
 Portfolio.timeSeries.window <- window(Portfolio.timeSeries, start = c(2015, 1), end = c(2016, 12))  
   
 (a1.w <- arima(Portfolio.timeSeries.window, order = c(1, 0, 0)))  
 (a2.w <- arima(Portfolio.timeSeries.window, order = c(0, 0, 1)))  
 (a3.w <- arima(Portfolio.timeSeries.window, order = c(1, 0, 1)))  
 (a4.w <- arima(Portfolio.timeSeries.window, order = c(1, 1, 1)))  
 (a5.w <- arima(Portfolio.timeSeries.window, order = c(2, 1, 0)))  
   
 # select and set model  
 series.predict.a1.w <- predict(a1.w, 5)  
 series.predict.a2.w <- predict(a2.w, 5)  
 series.predict.a3.w <- predict(a3.w, 5)  
 series.predict.a4.w <- predict(a4.w, 5)  
 series.predict.a5.w <- predict(a5.w, 5)  
   
 # combine with source  
 series.predict.a1.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a1.w$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a2.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a2.w$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a3.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a3.w$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a4.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a5.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a5.w$pred)), start = c(2015, 01), frequency = 12)  
   
 # plotting all predictions against actual  
 plot(series.predict.a1.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a2.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a3.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a4.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a5.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(Portfolio.timeSeries, ylim = c(0, 8000), col = "Black")  

From the various ARIMA models, I took the one closest to the actual outcome, and varied the lag, and then graphed that to see its effect.

   
 (a4.w.111 <- arima(Portfolio.timeSeries.window, order = c(1, 1, 1)))  
 (a4.w.211 <- arima(Portfolio.timeSeries.window, order = c(2, 1, 1)))  
 (a4.w.311 <- arima(Portfolio.timeSeries.window, order = c(3, 1, 1)))  
 (a4.w.411 <- arima(Portfolio.timeSeries.window, order = c(4, 1, 1)))  
   
 series.predict.a4.w.111 <- predict(a4.w.111, 5)  
 series.predict.a4.w.211 <- predict(a4.w.211, 5)  
 series.predict.a4.w.311 <- predict(a4.w.311, 5)  
 series.predict.a4.w.411 <- predict(a4.w.411, 5)  
   
 series.predict.a4.w.111.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.111$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a4.w.211.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.211$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a4.w.311.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.311$pred)), start = c(2015, 01), frequency = 12)  
 series.predict.a4.w.411.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.411$pred)), start = c(2015, 01), frequency = 12)  
   
 plot(series.predict.a4.w.111.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a4.w.211.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a4.w.311.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(series.predict.a4.w.411.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)  
 par(new = TRUE)  
 plot(Portfolio.timeSeries, ylim = c(0, 8000), col = "Black")  

Using ARIMA with Forecast provides an attractive result, one that combines both the prediction with variance.

 # fit an ARIMA model  
 Portfolio.predictive.fit <- arima(Portfolio.timeSeries, order = c(1, 1, 1))  
  
 # predictive accuracy  
 library(forecast)  
 accuracy(Portfolio.predictive.fit)  
   
 # predict next 5 observations  
 #library(forecast)  
 forecast(Portfolio.predictive.fit, 6)  
 plot(forecast(Portfolio.predictive.fit, 6))  

While exploring ARIMA, and solving certain charting problems, I ran across other functions, one of which is Seasonal Decomposition, seen below. A full explanation of the methodology can be found here at OTexts, while the details of the function can be found here at R Documentation.

 
 Portfolio.seasonal <- stl(Portfolio.timeSeries, s.window = "period")  
 plot(Portfolio.seasonal)  

Comments

Popular posts from this blog

Charting Correlation Matrices in R

Cultural Dimensions and Coffee Consumption

Developers in New York City by Zip Code