ARIMA,Time Series, and Charting in R
ARIMA is an acronym for Autoregressive Integrated Moving Average, and one explanation describes ARIMA models as...
For the first exploration I developed several ARIMA models using financial data, varying the parameters, noted as P, D and Q. A general overview of the parameters is from Wikipedia:
From the above, I chose to plot the model with the lowest standard error, splicing the existing curve with the prediction.
Thinking it might be useful to compare the actual outcome to predicted ones, the data was filtered using the window() function, generated predictions for all the models, then overlaid the plots with the help of par(). Since the data was trending down and/or flat at the point I cut it, the results predict a lower growth trend than actually occurred.
From the various ARIMA models, I took the one closest to the actual outcome, and varied the lag, and then graphed that to see its effect.
Using ARIMA with Forecast provides an attractive result, one that combines both the prediction with variance.
While exploring ARIMA, and solving certain charting problems, I ran across other functions, one of which is Seasonal Decomposition, seen below. A full explanation of the methodology can be found here at OTexts, while the details of the function can be found here at R Documentation.
...another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models were based on a description of trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.A detailed technical discussion can be found on Wikipedia.
For the first exploration I developed several ARIMA models using financial data, varying the parameters, noted as P, D and Q. A general overview of the parameters is from Wikipedia:
- p is the order (number of time lags) of the autoregressive model
- d is the degree of differencing (the number of times the data have had past values subtracted)
- q is the order of the moving-average model
# filtered to start on a date
Portfolio.filtered <- Portfolio.df[Portfolio.df$YearMonthIndex >= 201501,]
# aggregated by YearMonthIndex, a field created from the date of the record
Portfolio.summed <- aggregate(Portfolio.filtered$Mkt.Val, by = list(Portfolio.filtered$YearMonthIndex), FUN = sum)
# create time series
Portfolio.timeSeries <- ts(data = Portfolio.summed[, 2], start = c(2015, 01), frequency = 12)
# set arima models
(a1 <- arima(Portfolio.timeSeries, order = c(1, 0, 0)))
(a2 <- arima(Portfolio.timeSeries, order = c(0, 0, 1)))
(a3 <- arima(Portfolio.timeSeries, order = c(1, 0, 1)))
(a4 <- arima(Portfolio.timeSeries, order = c(1, 1, 1)))
(a5 <- arima(Portfolio.timeSeries, order = c(2, 1, 0)))
# select and set model
series.predict.a1 <- predict(a1, 12)
series.predict.a2 <- predict(a2, 12)
series.predict.a3 <- predict(a3, 12)
series.predict.a4 <- predict(a4, 12)
series.predict.a5 <- predict(a5, 12)
# combine with source
series.predict.a1.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a1$pred)), start = c(2015, 01), frequency = 12)
series.predict.a2.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a2$pred)), start = c(2015, 01), frequency = 12)
series.predict.a3.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a3$pred)), start = c(2015, 01), frequency = 12)
series.predict.a4.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a4$pred)), start = c(2015, 01), frequency = 12)
series.predict.a5.combined <- ts(data = c(Portfolio.timeSeries, as.vector(series.predict.a5$pred)), start = c(2015, 01), frequency = 12)
From the above, I chose to plot the model with the lowest standard error, splicing the existing curve with the prediction.
# plot time series & predictions, using model with lowest standard error
series.predict.a4.lone <- ts(data = as.vector(series.predict.a4$pred), start = c(2017, 05), frequency = 12)
plot(Portfolio.timeSeries, xlim = c(2015.0,2018.12), ylim = c(0, 10000), col = "Black", ann = FALSE)
par(new = TRUE)
plot(series.predict.a4.lone, xlim = c(2015.0, 2018.12), ylim = c(0, 10000), col = "Red", ann = FALSE)
Thinking it might be useful to compare the actual outcome to predicted ones, the data was filtered using the window() function, generated predictions for all the models, then overlaid the plots with the help of par(). Since the data was trending down and/or flat at the point I cut it, the results predict a lower growth trend than actually occurred.
Portfolio.timeSeries.window <- window(Portfolio.timeSeries, start = c(2015, 1), end = c(2016, 12))
(a1.w <- arima(Portfolio.timeSeries.window, order = c(1, 0, 0)))
(a2.w <- arima(Portfolio.timeSeries.window, order = c(0, 0, 1)))
(a3.w <- arima(Portfolio.timeSeries.window, order = c(1, 0, 1)))
(a4.w <- arima(Portfolio.timeSeries.window, order = c(1, 1, 1)))
(a5.w <- arima(Portfolio.timeSeries.window, order = c(2, 1, 0)))
# select and set model
series.predict.a1.w <- predict(a1.w, 5)
series.predict.a2.w <- predict(a2.w, 5)
series.predict.a3.w <- predict(a3.w, 5)
series.predict.a4.w <- predict(a4.w, 5)
series.predict.a5.w <- predict(a5.w, 5)
# combine with source
series.predict.a1.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a1.w$pred)), start = c(2015, 01), frequency = 12)
series.predict.a2.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a2.w$pred)), start = c(2015, 01), frequency = 12)
series.predict.a3.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a3.w$pred)), start = c(2015, 01), frequency = 12)
series.predict.a4.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w$pred)), start = c(2015, 01), frequency = 12)
series.predict.a5.w.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a5.w$pred)), start = c(2015, 01), frequency = 12)
# plotting all predictions against actual
plot(series.predict.a1.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a2.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a3.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a4.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a5.w.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(Portfolio.timeSeries, ylim = c(0, 8000), col = "Black")
From the various ARIMA models, I took the one closest to the actual outcome, and varied the lag, and then graphed that to see its effect.
(a4.w.111 <- arima(Portfolio.timeSeries.window, order = c(1, 1, 1)))
(a4.w.211 <- arima(Portfolio.timeSeries.window, order = c(2, 1, 1)))
(a4.w.311 <- arima(Portfolio.timeSeries.window, order = c(3, 1, 1)))
(a4.w.411 <- arima(Portfolio.timeSeries.window, order = c(4, 1, 1)))
series.predict.a4.w.111 <- predict(a4.w.111, 5)
series.predict.a4.w.211 <- predict(a4.w.211, 5)
series.predict.a4.w.311 <- predict(a4.w.311, 5)
series.predict.a4.w.411 <- predict(a4.w.411, 5)
series.predict.a4.w.111.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.111$pred)), start = c(2015, 01), frequency = 12)
series.predict.a4.w.211.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.211$pred)), start = c(2015, 01), frequency = 12)
series.predict.a4.w.311.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.311$pred)), start = c(2015, 01), frequency = 12)
series.predict.a4.w.411.combined <- ts(data = c(Portfolio.timeSeries.window, as.vector(series.predict.a4.w.411$pred)), start = c(2015, 01), frequency = 12)
plot(series.predict.a4.w.111.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a4.w.211.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a4.w.311.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(series.predict.a4.w.411.combined, ylim = c(0, 8000), col = "Red", ann = FALSE)
par(new = TRUE)
plot(Portfolio.timeSeries, ylim = c(0, 8000), col = "Black")
Using ARIMA with Forecast provides an attractive result, one that combines both the prediction with variance.
# fit an ARIMA model
Portfolio.predictive.fit <- arima(Portfolio.timeSeries, order = c(1, 1, 1))
# predictive accuracy
library(forecast)
accuracy(Portfolio.predictive.fit)
# predict next 5 observations
#library(forecast)
forecast(Portfolio.predictive.fit, 6)
plot(forecast(Portfolio.predictive.fit, 6))
While exploring ARIMA, and solving certain charting problems, I ran across other functions, one of which is Seasonal Decomposition, seen below. A full explanation of the methodology can be found here at OTexts, while the details of the function can be found here at R Documentation.
Portfolio.seasonal <- stl(Portfolio.timeSeries, s.window = "period")
plot(Portfolio.seasonal)
Comments
Post a Comment