Thursday, January 8, 2015

ARIMA in R



ARIMA Models
·       Regression without lags fails to account for relationships through time and can overstate the relationship between the dependent and independent variables.
·       Stationary Time Series- A stationary time series is one whose properties do not depend on the time at which the series is observed. Time series with trends, or with seasonality, are not stationary. A white noise series is stationary. Transformations such as logarithms can help stabilize the variance of a time series. Differencing can help stabilize the mean of a time series by removing changes in the level of the series, thereby eliminating trend and seasonality.
o   Absence of serial correlation or predictability.
o   Conditional homoscedasticity (constant conditional variance).
o   For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly.
o   For a stationary series, the autocorrelation values are typically small.

·       Autoregressive moving average (ARMA) models combine both p autoregressive terms and q moving average terms, also called ARMA(p,q).
AR(1): yt = c + φ1yt−1 + εt    (random walk: yt = yt−1 + εt, i.e., φ1 = 1 with no constant)
MA(1): yt = c + εt + θ1εt−1
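As a minimal sketch, an ARMA(1,1) series can be simulated and fitted back in R; the coefficients 0.6 and 0.4 below are purely illustrative-
R-Code- set.seed(123)
y <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 200)   # simulate an ARMA(1,1) series
fit <- arima(y, order = c(1, 0, 1))                         # fit ARMA(1,1), i.e., ARIMA(1,0,1)
fit                                                         # estimated coefficients should be near 0.6 and 0.4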


·         De-trending- A variable can be de-trended by regressing the variable on a time trend and obtaining the residuals: fit yt = α + βt + et and take the residuals êt = yt − α̂ − β̂t as the de-trended series.
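A minimal sketch in R (assuming y is a numeric series)-
R-Code- trend <- seq_along(y)          # simple time index 1, 2, ..., n
fit <- lm(y ~ trend)                   # regress the variable on the time trend
y.detrended <- residuals(fit)          # the residuals form the de-trended series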


·       Seasonality- Seasonality is a particular type of autocorrelation pattern where patterns occur every “season,” like monthly, quarterly, etc. For example, quarterly data may have the same pattern in the same quarter from one year to the next.
Test for whether seasonal differencing is required-
nsdiffs() (from the forecast package) uses seasonal unit root tests to determine the appropriate number of seasonal differences required.
The following code can be used to make a seasonal series stationary. The resulting series, stored as xstar, has been differenced appropriately-
R-Code- library(forecast)
ns <- nsdiffs(filename)
if(ns > 0) {
  xstar <- diff(filename, lag = frequency(filename), differences = ns)
} else {
  xstar <- filename
}
nd <- ndiffs(xstar)
if(nd > 0) {
  xstar <- diff(xstar, differences = nd)
}



·         Differencing- The order of differencing is the number of times the series must be differenced to stationarize it. Normally, the correct amount of differencing is the lowest order of differencing that yields a time series which fluctuates around a well-defined mean value and whose autocorrelation function (ACF) plot decays fairly rapidly to zero, either from above or below. When a variable yt is not stationary, a common solution is to use the differenced variable y′t = yt − yt−1.
Test for whether differencing is required-
o   Unit root tests-
1)   Augmented Dickey-Fuller (ADF) test- For this test, the following regression model is estimated:
y′t = φyt−1 + β1y′t−1 + β2y′t−2 + ⋯ + βky′t−k + et
where y′t denotes the first-differenced series, y′t = yt − yt−1, and k is the number of lags to include in the regression.
If the original series, yt, needs differencing, then the estimated coefficient φ̂ should be approximately zero. If yt is already stationary, then φ̂ < 0.

R code- library(tseries)
adf.test(filename, alternative = "stationary")

Large p-values are indicative of non-stationarity, and small p-values suggest stationarity.

2)   Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test- small p-values (e.g., less than 0.05) suggest that differencing is required. Note that the interpretation is the reverse of the ADF test.
R code- kpss.test(filename)

Order of Differencing-
o   If the series has positive autocorrelations out to a high number of lags, then it probably needs a higher order of differencing.
o   If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all small and patternless, then the series does not need further differencing. A mildly negative lag-1 autocorrelation (between −0.5 and 0) is normal; a value more negative than −0.5 may indicate over-differencing.
o   The optimal order of differencing is often the order of differencing at which the standard deviation of the series is lowest (a quick R check is shown below).

R-Code- differenced <- diff(filename)
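As a quick check of the rule above that the optimal order of differencing often minimizes the standard deviation (assuming filename is the series), compare standard deviations at successive orders-
R-Code- sd(filename)                         # no differencing
sd(diff(filename))                           # first difference
sd(diff(filename, differences = 2))          # second difference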
·       Autocorrelation function (ACF)- The ACF at lag k is the ratio of the autocovariance of yt and yt−k to the variance of the dependent variable yt: ρk = Cov(yt, yt−k) / Var(yt).
o   For stationary time series, the ACF will drop to zero relatively quickly.

·       Partial autocorrelation function (PACF)- A partial autocorrelation is the amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags. The PACF at lag k is the correlation between yt and yt−k after removing the part explained by the intervening lags; equivalently, it is the coefficient on yt−k in a regression of yt on yt−1, …, yt−k.
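A minimal sketch for inspecting both functions on the differenced series (assuming filename is the series)-
R-Code- acf(diff(filename))     # a sharp cutoff here suggests the MA order q
pacf(diff(filename))            # a sharp cutoff here suggests the AR order p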


·       Order of Auto Regression and Moving Average-
o   If the partial autocorrelation function (PACF) of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is positive--i.e., if the series appears slightly "under differenced"--then consider adding one or more AR terms to the model. The lag beyond which the PACF cuts off is the indicated number of AR terms.
o   If the autocorrelation function (ACF) of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is negative--i.e., if the series appears slightly "over differenced"--then consider adding an MA term to the model. The lag beyond which the ACF cuts off is the indicated number of MA terms.
o   It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term--particularly if the parameter estimates in the original model require more than 10 iterations to converge. Beware of using multiple AR terms and multiple MA terms in the same model.
o   If there is a unit root in the AR part of the model--i.e., if the sum of the AR coefficients is almost exactly 1--you should reduce the number of AR terms by one and increase the order of differencing by one.
o   If there is a unit root in the MA part of the model--i.e., if the sum of the MA coefficients is almost exactly 1--you should reduce the number of MA terms by one and reduce the order of differencing by one.
o   If the long-term forecasts appear erratic or unstable, there may be a unit root in the AR or MA coefficients.



·       Goodness of fit- The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are two measures of goodness of fit. They measure the trade-off between model fit and the complexity of the model.
A lower AIC or BIC value indicates a better fit.

1.    AIC = −2·ln(L) + 2K
2.    BIC = −2·ln(L) + K·ln(N)
L- The value of the likelihood function evaluated at the parameter estimates.
N- The number of observations
K- The number of estimated parameters
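In R, both criteria can be read directly off a fitted model object; a minimal sketch, assuming a fitted model named ARIMA.fit as in the code further below-
R-Code- AIC(ARIMA.fit)    # Akaike Information Criterion
BIC(ARIMA.fit)            # Bayesian Information Criterion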

The Box-Jenkins Methodology for ARIMA Model Selection
·       Examine the time plot of the series.
o   Identify outliers, missing values, and structural breaks in the data.
o   Non-stationary variables may have a pronounced trend or have changing variance.
o   Transform the data if needed. Use logs, differencing, or de-trending.
§  Using logs works if the variability of data increases over time.
§  Differencing the data can remove trends. But over-differencing may introduce dependence when none exists.
·       Examine the autocorrelation function (ACF) and partial autocorrelation function (PACF)-
o   Compare the sample ACF and PACF to those of various theoretical ARMA models.
o   Use properties of ACF and PACF as a guide to estimate plausible models and select appropriate p, d, and q.
o   Differencing may be needed if there is a slow decay in the ACF.
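A minimal sketch of the first step (assuming filename is a ts object); the log and difference transformations here are illustrative, to be applied only if the plots call for them-
R-Code- par(mfrow = c(3, 1))                                # stack three plots
plot(filename, main = "Original series")
plot(log(filename), main = "Log of series")                 # stabilizes increasing variance
plot(diff(log(filename)), main = "Differenced log series")  # removes trend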

R-Code-
ARIMA.fit <- arima(filename$Columnname, order = c(1,1,1), seasonal = list(order = c(1,1,1), period = 3), include.mean = FALSE)
·        include.mean = FALSE tells the model not to estimate a mean (intercept) term.
-  By default R includes a mean term, which carries the average level of the series forward into the forecasts; set it to FALSE when there is no systematic level or trend to carry forward. (For models with differencing, d > 0, the mean term is ignored anyway.)

·        Prediction command – forecast12 <- predict(ARIMA.fit, n.ahead = 12)
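A minimal end-to-end sketch using the built-in AirPassengers data; the (1,1,1)(1,1,1)[12] orders are purely illustrative, not tuned-
R-Code- fit <- arima(log(AirPassengers), order = c(1,1,1),
             seasonal = list(order = c(1,1,1), period = 12), include.mean = FALSE)
fc <- predict(fit, n.ahead = 12)                      # forecast the next 12 months (on the log scale)
ts.plot(log(AirPassengers), fc$pred, lty = c(1, 2))   # plot the history and the forecast together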


