Homework 9. Forecasting: Airline Passengers
Introduction
This assignment is based on Airline Passengers, a time series dataset containing the number of passengers traveling by plane per month from 1949 to 1960. The goal is to forecast the number of passengers for the next 12 months.
Data Preprocessing
Load Data
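import pandas as pd
df = pd.read_csv('./airline-passengers.csv', index_col=[0], parse_dates=True)
# Attach the inferred frequency to the index so the models can forecast by date
df.index = pd.DatetimeIndex(df.index.values, freq=df.index.inferred_freq)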
Split data
train, test = df.iloc[:130, [0]], df.iloc[130:, [0]]
Visualize data
import matplotlib.pyplot as plt
df.plot()
plt.show()
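The data is not stationary: it shows an upward trend and seasonal swings that grow over time, so its mean and variance are not constant. We need to apply differencing to make it stationary.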
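To illustrate what differencing does, here is a minimal sketch on a synthetic series (an assumption for illustration, not the airline data): a linear trend plus a fixed 12-month pattern. A first difference reduces the linear trend to its constant slope, and a further lag-12 difference cancels the repeating seasonal pattern.

```python
import numpy as np

# Synthetic monthly series (hypothetical, not the airline data):
# a linear trend plus a fixed 12-month seasonal pattern.
t = np.arange(48)
trend = 100.0 + 2.0 * t
season = 10.0 * np.sin(2 * np.pi * t / 12)
y = trend + season

# First difference y[t] - y[t-1]: the linear trend collapses to its slope.
d1 = np.diff(y)

# Lag-12 difference of the first difference: the seasonal pattern cancels too.
d1_d12 = d1[12:] - d1[:-12]

print("trend after first difference is constant:", np.allclose(np.diff(trend), 2.0))
print("series after both differences is ~0:", np.allclose(d1_d12, 0.0))
```

On real data the result is not exactly zero, of course; what remains should fluctuate around a constant level without a visible trend or seasonal pattern.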
Time Series Decomposition
Seasonal Decomposition (Additive)
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(df['airline passengers'], model='additive')
result.plot()
plt.show()
We can see the residual is not stationary. This is apparently because the seasonality is multiplicative, not additive. Let's try again with multiplicative seasonality.
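The difference between the two kinds of seasonality can be sketched on synthetic data (the trend and seasonal factor below are assumptions chosen for illustration). When the series is trend times a seasonal factor, subtracting the trend leaves swings that grow with the level, while dividing by the trend leaves swings of constant size. Taking logarithms turns the product into a sum.

```python
import numpy as np

# Hypothetical multiplicative series: trend * seasonal factor.
t = np.arange(48)
trend = 100.0 + 5.0 * t
seasonal_factor = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)
y = trend * seasonal_factor

def yearly_range(x):
    """Peak-to-trough range within each consecutive 12-month block."""
    blocks = x.reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

# Additive view: what is left after subtracting the (known) trend.
additive_ranges = yearly_range(y - trend)
# Multiplicative view: what is left after dividing by the (known) trend.
multiplicative_ranges = yearly_range(y / trend)

print("additive ranges grow each year:", bool(np.all(np.diff(additive_ranges) > 0)))
print("multiplicative ranges are constant:", np.allclose(multiplicative_ranges, 0.4))
# log(y) = log(trend) + log(seasonal_factor), i.e. additive on the log scale.
print("log turns the product into a sum:",
      np.allclose(np.log(y), np.log(trend) + np.log(seasonal_factor)))
```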
Seasonal Decomposition (Multiplicative)
result = seasonal_decompose(df['airline passengers'], model='multiplicative')
result.plot()
pyplot.show()
The residual now looks stationary, which supports treating the seasonality as multiplicative. We keep this in mind when building our models.
Model Building
Simple Exponential Smoothing (SES)
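from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# np.float was removed in NumPy 1.24; the built-in float is equivalent
model = SimpleExpSmoothing(train['airline passengers'].astype(float))
model_fit = model.fit(smoothing_level=0.8)
# Forecast over the test period and score against the held-out values
z = model_fit.predict(start=test.index[0], end=test.index[-1])
print("rmse:" + str(sqrt(mean_squared_error(test, z))))
print("mae:" + str(mean_absolute_error(test, z)))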
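For intuition about what SES computes, the recursion can be sketched by hand. This is a simplified version that assumes the level is initialised to the first observation; statsmodels may initialise it differently, so the numbers need not match the library exactly. The function name `ses_forecast` is hypothetical.

```python
def ses_forecast(series, alpha, horizon):
    """Simple exponential smoothing, level initialised to the first observation.

    Each new level is a weighted average of the latest observation and the
    previous level. The forecast for every future step is the final level.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

# Toy values, not the airline data.
forecast = ses_forecast([10, 20, 30], alpha=0.8, horizon=3)
print(forecast)
```

Because every forecast step equals the last level, an SES forecast is a flat line. That is worth remembering when comparing it against a test period with a trend and seasonal swings.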
Exponential Smoothing (Holt)
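from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Note: no trend argument is passed, so the model has no trend component; Holt's linear method would use trend='add'
model = ExponentialSmoothing(train['airline passengers'].astype(float))
# the rest of the code is the same as SES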
AutoRegression (AR)
from statsmodels.tsa.ar_model import AutoReg  # AutoReg replaces the deprecated AR class
p = 2
model = AutoReg(train['airline passengers'].astype(float), lags=p)
# fit() takes no smoothing_level here; prediction and scoring are the same as SES
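As a rough sketch of what an AR(p) model does, the coefficients can be estimated by ordinary least squares on lagged values and then applied recursively to forecast. The helpers `fit_ar` and `forecast_ar` are hypothetical names, the series is simulated, and this is not how statsmodels is implemented internally.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares estimate of [c, phi_1, ..., phi_p] for an AR(p) model."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # The row for time t holds [1, y[t-1], ..., y[t-p]].
    columns = [np.ones(n - p)] + [y[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast_ar(y, coef, steps):
    """Multi-step forecast: each prediction is fed back in as a new lag."""
    history = list(np.asarray(y, dtype=float))
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        lags = history[-1 : -p - 1 : -1]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        out.append(nxt)
        history.append(nxt)
    return out

# Simulated AR(2) series with known coefficients (not the airline data).
rng = np.random.default_rng(0)
sim = np.zeros(2000)
for i in range(2, len(sim)):
    sim[i] = 1.0 + 0.6 * sim[i - 1] - 0.3 * sim[i - 2] + rng.normal()

coef = fit_ar(sim, p=2)
preds = forecast_ar(sim, coef, steps=12)
print("estimated [c, phi_1, phi_2]:", np.round(coef, 2))
```

The recursive forecast of a stationary AR model decays toward the series mean, so on a trending series such as this one it tends to fall behind quickly.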
AutoRegression Moving Average (ARMA)
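from statsmodels.tsa.arima.model import ARIMA  # replaces the removed arima_model.ARMA
# ARMA(p, q) is ARIMA(p, 0, q); order=(0, 0, 1) keeps the original p=0, q=1
model = ARIMA(train['airline passengers'].astype(float), order=(0, 0, 1))
# fit() takes no smoothing_level here; prediction and scoring are the same as SES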
AutoRegression Integrated Moving Average (ARIMA)
from statsmodels.tsa.arima.model import ARIMA  # replaces the removed arima_model.ARIMA
model = ARIMA(train['airline passengers'].astype(float), order=(2, 1, 1))
# fit() takes no smoothing_level here; prediction and scoring are the same as SES
Results
Model | RMSE | MAE |
---|---|---|
Simple Exponential Smoothing (SES) | 86.93 | 62.10 |
Exponential Smoothing (Holt) | 94.83 | 67.35 |
AutoRegression (AR) | 116.31 | 87.59 |
AutoRegression Moving Average (ARMA) | 214.85 | 197.20 |
AutoRegression Integrated Moving Average (ARIMA) | 73.09 | 51.12 |
The best model on the held-out test set is ARIMA, with an RMSE of 73.09 and an MAE of 51.12.
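For reference, the two error metrics in the table can be written out directly. The sketch below uses made-up numbers, not the homework results, and the helper names `rmse` and `mae` are hypothetical; the report itself uses the scikit-learn functions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy values for illustration only.
actual = [100, 110, 120, 130]
predicted = [102, 108, 120, 140]

print("rmse:", rmse(actual, predicted))
print("mae:", mae(actual, predicted))
```

RMSE squares the errors before averaging, so a single large miss (here the last point) pulls it above MAE. This is why RMSE is never smaller than MAE in the table.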
Discussion
In forecasting the Airline Passengers dataset, several models were evaluated. The raw data displayed a clear trend and seasonality. The seasonality initially appeared additive, but the decomposition showed it to be multiplicative: with a multiplicative decomposition the residuals look stationary, which helps guide the choice of forecasting model.
The models evaluated were Simple Exponential Smoothing (SES), Holt's Exponential Smoothing, AutoRegression (AR), AutoRegression Moving Average (ARMA), and AutoRegression Integrated Moving Average (ARIMA). Among them, ARIMA performed best, with the lowest RMSE and MAE. Its differencing step (d = 1) accounts for the trend, which the other models do not handle; none of the fitted models includes an explicit seasonal component.
Possible improvements could involve tuning ARIMA's hyperparameters further or considering more advanced models like Prophet or LSTM, which can also capture seasonality and trends. Additionally, given the changing nature of the airline industry, incorporating external factors or events could refine forecasts.