ARIMA Models
· Regression without lags fails to account for the relationships through time and overestimates the relationship between the dependent and independent variables.
· Stationary Time Series- A stationary time series is one whose properties do not depend on the time at which the series is observed. Time series with trends, or with seasonality, are not stationary. A white noise series is stationary. Transformations such as logarithms can help to stabilize the variance of a time series. Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trend and seasonality (a sketch of these transformations follows the points below). Properties of a stationary series:
o Absence of serial correlation or predictability.
o Conditional homoscedasticity (constant conditional variance).
o For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly.
o Values of the autocorrelations are often small.
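For illustration, a minimal R sketch of these transformations using the built-in AirPassengers series (which has both a trend and increasing variance); the series and the lag of 12 for monthly data are illustrative choices only:
R-Code (illustrative)-
x <- AirPassengers              # built-in monthly series with trend and growing variance
lx <- log(x)                    # log transform stabilizes the variance
dlx <- diff(lx)                 # first difference removes the trend in the level
sdlx <- diff(dlx, lag = 12)     # seasonal difference (lag 12 for monthly data) removes seasonality
acf(sdlx)                       # ACF of a stationary series should drop to zero fairly quickly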
· Autoregressive moving average (ARMA) models combine both p autoregressive terms and q moving average terms, also called ARMA(p,q).
AR(1): yt = c + ϕ1yt−1 + εt (random walk: yt = yt−1 + εt, i.e., an AR(1) with ϕ1 = 1 and c = 0)
MA(1): yt = c + εt + θ1εt−1
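As a hedged illustration, the following sketch simulates an AR(1) and an MA(1) process with arima.sim() from base R (the coefficients 0.7 and 0.5 and the sample size are arbitrary choices) and plots their ACF and PACF:
R-Code (illustrative)-
set.seed(1)
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)   # AR(1) with ϕ1 = 0.7
ma1 <- arima.sim(model = list(ma = 0.5), n = 200)   # MA(1) with θ1 = 0.5
par(mfrow = c(2, 2))
acf(ar1); pacf(ar1)   # AR(1): ACF decays gradually, PACF cuts off after lag 1
acf(ma1); pacf(ma1)   # MA(1): ACF cuts off after lag 1, PACF decays gradually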
· De-trending- A variable can be de-trended by regressing the variable on a time trend and obtaining the residuals, which form the de-trended series.
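A minimal R sketch of this idea; the series used here is only an example and any numeric series could be substituted:
R-Code (illustrative)-
x <- as.numeric(AirPassengers)     # example series; replace with your own data
trend <- seq_along(x)              # simple time index 1, 2, ..., n
fit <- lm(x ~ trend)               # regress the variable on the time trend
detrended <- residuals(fit)        # the residuals are the de-trended series
plot(detrended, type = "l")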
· Seasonality- Seasonality is a particular type of autocorrelation pattern where patterns occur every “season,” like monthly, quarterly, etc. For example, quarterly data may have the same pattern in the same quarter from one year to the next.
Test to know whether seasonal differencing is required- nsdiffs() (from the forecast package) uses seasonal unit root tests to determine the appropriate number of seasonal differences required.
The following code can be used to make a seasonal series stationary. The resulting series, stored as xstar, has been differenced appropriately-
R-Code- library(forecast)
ns <- nsdiffs(filename)
if(ns > 0) {
  xstar <- diff(filename, lag=frequency(filename), differences=ns)
} else {
  xstar <- filename
}
nd <- ndiffs(xstar)
if(nd > 0) {
  xstar <- diff(xstar, differences=nd)
}
· Differencing- The order of differencing is the number of differencing operations needed to stationarize the series. Normally, the correct amount of differencing is the lowest order of differencing that yields a time series which fluctuates around a well-defined mean value and whose autocorrelation function (ACF) plot decays fairly rapidly to zero, either from above or below. When a variable yt is not stationary, a common solution is to use the differenced variable y′t = yt − yt−1.
Test to know whether differencing is required-
o Unit root tests-
1) Augmented Dickey-Fuller (ADF) test- For this test, the following regression model is estimated:
y′t = ϕyt−1 + β1y′t−1 + β2y′t−2 + … + βky′t−k + et
where y′t denotes the first-differenced series, y′t = yt − yt−1, and k is the number of lags to include in the regression. If the original series, yt, needs differencing, then the coefficient ϕ^ should be approximately zero. If yt is already stationary, then ϕ^ < 0.
R code- adf.test(filename, alternative = "stationary")   (adf.test() is in the tseries package)
Large p-values are indicative of non-stationarity, and small p-values suggest stationarity.
2) Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test- small p-values (e.g., less than 0.05) suggest that differencing is required.
R code- kpss.test(filename)   (also in the tseries package)
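A minimal, self-contained sketch of both tests using the tseries package; the series here is an arbitrary example and would be replaced by the data of interest:
R-Code (illustrative)-
library(tseries)
x <- diff(log(AirPassengers))               # example series; substitute your own data
adf.test(x, alternative = "stationary")     # small p-value suggests the series is stationary
kpss.test(x)                                # small p-value suggests differencing is required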
Order of Differencing-
o If the series has positive autocorrelations out to a high number of lags, then it probably needs a higher order of differencing.
o If the lag-1 autocorrelation is zero or even negative, then the series does not need further differencing. Normally the lag-1 autocorrelation of an appropriately differenced series lies between −0.5 and 0.
o If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all small and patternless, then the series does not need a higher order of differencing.
o The optimal order of differencing is often the order of differencing at which the standard deviation of the series is lowest (see the sketch below).
R-Code- diffed_series <- diff(filename)
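A hedged sketch of the standard-deviation rule above; the series and the orders compared are illustrative:
R-Code (illustrative)-
x <- log(AirPassengers)              # example series; replace with your own data
sd(x)                                # order 0: no differencing
sd(diff(x))                          # order 1: first difference
sd(diff(x, differences = 2))         # order 2: second difference
# The order with the smallest standard deviation is often a reasonable choice.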
· Autocorrelation function (ACF)- The ACF at lag k is the ratio of the autocovariance of yt and yt−k to the variance of the dependent variable yt: ρk = Cov(yt, yt−k) / Var(yt).
o For a stationary time series, the ACF will drop to zero relatively quickly.
· Partial autocorrelation function (PACF)- A partial autocorrelation is the amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags. The PACF at lag k is the simple correlation between yt and yt−k minus the part explained by the intervening lags yt−1, …, yt−k+1.
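Both sample functions are available in base R; a minimal sketch (the series name is illustrative):
R-Code (illustrative)-
x <- diff(log(AirPassengers))   # example of an (approximately) stationary series
acf(x)                          # sample autocorrelation function
pacf(x)                         # sample partial autocorrelation function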
· Order of Auto Regression and Moving Average-
o If the partial autocorrelation function (PACF) of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is positive (i.e., the series appears slightly "underdifferenced"), then consider adding one or more AR terms to the model. The lag beyond which the PACF cuts off is the indicated number of AR terms.
o If the autocorrelation function (ACF) of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is negative (i.e., the series appears slightly "overdifferenced"), then consider adding an MA term to the model. The lag beyond which the ACF cuts off is the indicated number of MA terms.
o It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term, particularly if the parameter estimates in the original model require more than 10 iterations to converge. Beware of using multiple AR and MA terms in the same model.
o If there is a unit root in the AR part of the model (i.e., the sum of the AR coefficients is almost exactly 1), you should reduce the number of AR terms by one and increase the order of differencing by one.
o If there is a unit root in the MA part of the model (i.e., the sum of the MA coefficients is almost exactly 1), you should reduce the number of MA terms by one and reduce the order of differencing by one.
o If the long-term forecasts appear erratic or unstable, there may be a unit root in the AR or MA coefficients.
· Goodness of fit- The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are two measures of goodness of fit. They measure the trade-off between model fit and the complexity of the model. A lower AIC or BIC value indicates a better fit.
1. AIC = −2·ln(L) + 2K
2. BIC = −2·ln(L) + K·ln(N)
L- The value of the likelihood function evaluated at the parameter estimates.
N- The number of observations.
K- The number of estimated parameters.
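A minimal sketch comparing two candidate models by AIC and BIC; the orders chosen are arbitrary examples, and AIC() and BIC() are base R generics that work on objects returned by arima():
R-Code (illustrative)-
x <- log(AirPassengers)                  # example series; replace with your own data
fit1 <- arima(x, order = c(1, 1, 1))     # candidate ARIMA(1,1,1)
fit2 <- arima(x, order = c(0, 1, 1))     # candidate ARIMA(0,1,1)
AIC(fit1); AIC(fit2)                     # lower AIC indicates a better fit/complexity trade-off
BIC(fit1); BIC(fit2)                     # BIC penalizes extra parameters more heavily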
The Box-Jenkins Methodology for ARIMA Model Selection
· Examine the time plot of the series.
o Identify outliers, missing values, and structural breaks in the data.
o Non-stationary variables may have a pronounced trend or have changing variance.
o Transform the data if needed. Use logs, differencing, or de-trending.
§ Using logs works if the variability of the data increases over time.
§ Differencing the data can remove trends, but over-differencing may introduce dependence when none exists.
· Examine the autocorrelation function (ACF) and partial autocorrelation function (PACF)-
o Compare the sample ACF and PACF to those of various theoretical ARMA models.
o Use the properties of the ACF and PACF as a guide to estimate plausible models and select appropriate p, d, and q.
o Differencing may be needed if there is a slow decay in the ACF.
R-Code- ARIMA.fit <- arima(filename$Columnname, order=c(1,1,1), seasonal=list(order=c(1,1,1), period=3), include.mean=FALSE)
· include.mean=FALSE tells arima() not to estimate a mean (intercept) term. By default R includes a mean, which anchors forecasts to the historical level; set it to FALSE if there is no systematic trend upward or downward. (For models with differencing, d > 0, arima() ignores this argument and fits no mean in any case.)
· Prediction command- NewName <- predict(ARIMA.fit, n.ahead=12)
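As a usage note (a sketch assuming ARIMA.fit is the model fitted above), predict() returns point forecasts and their standard errors, from which approximate interval forecasts can be built:
R-Code (illustrative)-
fc <- predict(ARIMA.fit, n.ahead = 12)   # forecasts for the next 12 periods
fc$pred                                  # point forecasts
fc$se                                    # standard errors of the forecasts
upper <- fc$pred + 1.96 * fc$se          # approximate 95% interval, upper bound
lower <- fc$pred - 1.96 * fc$se          # approximate 95% interval, lower bound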