Homework 9. Forecasting: Airline Passengers

Introduction

This assignment is based on Airline Passengers, a time series dataset containing the number of passengers traveling by plane per month from 1949 to 1960. The goal is to forecast the number of passengers for the next 12 months.

Data Preprocessing

Load Data

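import pandas as pd
df = pd.read_csv('./airline-passengers.csv', index_col=[0], parse_dates=True)
# Set the inferred (monthly) frequency explicitly so forecasts can be indexed by date
df.index = pd.DatetimeIndex(df.index.values, freq=df.index.inferred_freq)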

Split data

train, test = df.iloc[:130, [0]], df.iloc[130:, [0]]
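As a quick sanity check on this split, the sketch below repeats the same positional slicing on a synthetic stand-in for the dataset. The frame `demo` and its values are made up; only the monthly index (144 points starting January 1949) mirrors the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: 144 monthly observations, 1949-01 to 1960-12.
# The values are arbitrary; only the index and shape mirror the dataset.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
demo = pd.DataFrame({"airline passengers": np.arange(144, dtype=float)}, index=index)

# Same positional split as above: earlier rows train, later rows test.
demo_train, demo_test = demo.iloc[:130, [0]], demo.iloc[130:, [0]]

print(len(demo_train), len(demo_test))  # 130 14
print(demo_train.index[-1].date(), demo_test.index[0].date())  # 1959-10-01 1959-11-01
```

Slicing by position keeps the time order intact, so no future observation leaks into training. With 144 rows, the hold-out covers 14 months (November 1959 to December 1960).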

Visualize data

import matplotlib.pyplot as plt

df.plot()
plt.show()

[Figure: airline_time_series]

The series has an upward trend and seasonal swings that grow over time, so it is not stationary. Differencing is needed to make it stationary.
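To illustrate what differencing does, here is a minimal sketch on a synthetic, noise-free series (the values are made up, not the airline data). A first difference removes a linear trend, and a lag-12 difference removes a fixed monthly pattern.

```python
import numpy as np

# Synthetic series with a linear trend and a 12-month seasonal pattern (made-up values).
t = np.arange(48)
season = np.tile([0, 2, 5, 3, 4, 8, 12, 11, 6, 1, -3, 0], 4)
y = 100.0 + 2.5 * t + season

# First difference: y[t] - y[t-1]. Removes the linear trend, keeps the seasonal pattern.
d1 = np.diff(y, n=1)

# Seasonal difference at lag 12: y[t] - y[t-12]. Removes the repeating pattern.
d12 = y[12:] - y[:-12]

# Applying both leaves a constant (here zero) series in this noise-free example.
d1_d12 = np.diff(d12, n=1)

print(d12[:3])     # [30. 30. 30.]
print(d1_d12[:3])  # [0. 0. 0.]
```

On real data the differenced series will not be exactly constant, and a multiplicative pattern would usually call for a log transform before differencing.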

Time Series Decomposition

Seasonal Decomposition (Additive)

from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(df['airline passengers'], model='additive')
result.plot()
plt.show()

[Figure: airline_decomposed_additive]

We can see that the residual is not stationary, apparently because the seasonality is multiplicative rather than additive. Let's try again with multiplicative seasonality.

Seasonal Decomposition (Multiplicative)

result = seasonal_decompose(df['airline passengers'], model='multiplicative')
result.plot()
plt.show()

[Figure: airline_decomposed_multiplicative]

The residual is now stationary. We can use this to build our model.
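One informal way to tell additive from multiplicative seasonality is to check whether the size of the seasonal swing grows with the level of the series. The sketch below does this on a synthetic multiplicative series (made-up values, and `yearly_range` is a helper written for this example): the yearly range grows in the raw series but stays constant after a log transform.

```python
import numpy as np

# Synthetic multiplicative series: the seasonal factor scales with the level.
months = np.arange(72)                      # six years of monthly data
trend = 100.0 * 1.01 ** months              # smooth exponential growth
factors = np.tile([0.90, 0.88, 1.00, 0.98, 1.00, 1.12,
                   1.25, 1.24, 1.08, 0.95, 0.84, 0.96], 6)
y = trend * factors

def yearly_range(series):
    """Max minus min within each consecutive block of 12 months."""
    blocks = series.reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

raw_ranges = yearly_range(y)            # grows every year
log_ranges = yearly_range(np.log(y))    # the same every year

print(np.round(raw_ranges, 1))
print(np.round(log_ranges, 3))
```

Taking logs turns the product of trend and seasonal factor into a sum, which is why the log series behaves additively.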

Model Building

Simple Exponential Smoothing (SES)

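from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# np.float was removed in NumPy 1.24; the built-in float is equivalent
model = SimpleExpSmoothing(train['airline passengers'].astype(float))
model_fit = model.fit(smoothing_level=0.8)  # fixed smoothing parameter alpha
# Forecast over the test period
z = model_fit.predict(start=test.index[0], end=test.index[-1])

print("rmse:" + str(sqrt(mean_squared_error(test, z))))
print("mae:" + str(mean_absolute_error(test, z)))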
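For intuition about what SES computes, here is a hand-rolled sketch in plain Python. It is not the statsmodels implementation: the level is initialised with the first observation (an assumption, statsmodels has its own initialisation options), and the helper names `ses_forecast`, `rmse`, and `mae` are introduced here for illustration.

```python
from math import sqrt

def ses_forecast(values, alpha, horizon):
    """Simple exponential smoothing; returns a flat forecast of length `horizon`."""
    level = values[0]  # assumption: start the level at the first observation
    for y in values[1:]:
        # New level is a weighted average of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

history = [10.0, 12.0, 11.0, 13.0]   # made-up training values
future = [14.0, 15.0]                # made-up test values
pred = ses_forecast(history, alpha=0.5, horizon=2)

print(pred)               # [12.0, 12.0]
print(mae(future, pred))  # 2.5
```

Because SES forecasts are flat, the error grows steadily on a series with a strong upward trend.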

Exponential Smoothing (Holt)

from statsmodels.tsa.holtwinters import ExponentialSmoothing
model = ExponentialSmoothing(train['airline passengers'].astype(float))

# the rest of the code is the same as SES
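Holt's linear-trend method extends SES with a second smoothed quantity for the slope. The code above constructs `ExponentialSmoothing` without a trend argument, so the sketch below is only an illustration of the method the heading names, written by hand rather than taken from statsmodels. The initialisation (first value for the level, first difference for the trend) is an assumption.

```python
def holt_forecast(values, alpha, beta, horizon):
    """Holt's linear trend method; returns `horizon` point forecasts."""
    level = values[0]                 # assumption: initial level = first observation
    trend = values[1] - values[0]     # assumption: initial trend = first difference
    for y in values[1:]:
        prev_level = level
        # Update the level using the one-step-ahead guess (prev_level + trend).
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Update the trend from the change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecasts continue along the last estimated line.
    return [level + h * trend for h in range(1, horizon + 1)]

line = [5.0, 7.0, 9.0, 11.0]          # made-up, perfectly linear series
print(holt_forecast(line, alpha=0.5, beta=0.5, horizon=3))  # [13.0, 15.0, 17.0]
```

Unlike SES, the forecast is not flat: it extrapolates the estimated slope, which matters for a trending series like this one.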

AutoRegression (AR)

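from statsmodels.tsa.ar_model import AutoReg

p = 2  # number of lags
# AutoReg replaces the AR class, which is deprecated in recent statsmodels releases
model = AutoReg(train['airline passengers'].astype(float), lags=p)

# fit, predict, and score as in SES, but call model.fit() without smoothing_level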
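An AR(p) model predicts each value as a constant plus a weighted sum of the previous p values. The sketch below fits those weights by ordinary least squares with NumPy and forecasts recursively. It is a simplified illustration, not the statsmodels estimator, and `fit_ar` and `forecast_ar` are names made up for this example.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of y[t] = c + a1*y[t-1] + ... + ap*y[t-p]."""
    y = np.asarray(series, dtype=float)
    # Each row holds [1, y[t-1], ..., y[t-p]] for one target y[t].
    rows = [np.concatenate(([1.0], y[t - p:t][::-1])) for t in range(p, len(y))]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return coeffs  # [c, a1, ..., ap]

def forecast_ar(series, coeffs, horizon):
    """Recursive multi-step forecast: each prediction feeds the next one."""
    p = len(coeffs) - 1
    history = list(series[-p:])
    out = []
    for _ in range(horizon):
        lags = history[::-1][:p]  # most recent value first
        nxt = coeffs[0] + float(np.dot(coeffs[1:], lags))
        out.append(nxt)
        history.append(nxt)
    return out

# Noise-free AR(2) series with known coefficients (made-up values for the demo).
true_c, true_a1, true_a2 = 1.0, 0.6, -0.2
demo = [1.0, 2.0]
for _ in range(18):
    demo.append(true_c + true_a1 * demo[-1] + true_a2 * demo[-2])

coeffs = fit_ar(demo, p=2)
next_two = forecast_ar(demo, coeffs, horizon=2)
print(np.round(coeffs, 3))
```

On this noise-free series the fit recovers the generating coefficients. On real data the estimates depend on the noise and on the chosen lag count.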

AutoRegression Moving Average (ARMA)

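from statsmodels.tsa.arima.model import ARIMA

# ARMA(p, q) is ARIMA(p, 0, q); the old arima_model.ARMA class is not available in recent statsmodels
model = ARIMA(train['airline passengers'].astype(float), order=(0, 0, 1))

# fit, predict, and score as in SES, but call model.fit() with no arguments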
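A pure MA(1) model, which is what the order above specifies (no autoregressive terms, one moving-average term), only carries information one step ahead. Beyond that, its point forecast is just the series mean. The sketch below shows this with made-up parameter values; `ma1_forecast` is a helper written for this example, not a library function. This may partly explain the weak score of this model on a trending series, though that reading is an interpretation rather than something checked here.

```python
def ma1_forecast(mu, theta, last_residual, horizon):
    """Point forecasts for y[t] = mu + e[t] + theta * e[t-1].

    Future shocks have expected value zero, so only the first step
    depends on the last observed residual.
    """
    return [mu + theta * last_residual if h == 1 else mu
            for h in range(1, horizon + 1)]

# Made-up values: mean 250, theta 0.5, last in-sample residual 40.
print(ma1_forecast(mu=250.0, theta=0.5, last_residual=40.0, horizon=4))
# [270.0, 250.0, 250.0, 250.0]
```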

AutoRegression Integrated Moving Average (ARIMA)

from statsmodels.tsa.arima.model import ARIMA  # arima_model.ARIMA is not available in recent statsmodels

model = ARIMA(train['airline passengers'].astype(float), order=(2, 1, 1))

# fit, predict, and score as in SES, but call model.fit() with no arguments
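The middle number in the order, d = 1, means the model works on first differences and then sums its forecasts back up to the original scale. The sketch below shows only that difference-then-integrate step. As a stand-in for the ARMA part it forecasts every future difference with the mean past difference (an assumption made for brevity, equivalent to a random walk with drift), and `forecast_with_d1` is a name invented for this example.

```python
import numpy as np

def forecast_with_d1(series, horizon):
    """Forecast levels by forecasting first differences and integrating them."""
    y = np.asarray(series, dtype=float)
    diffs = np.diff(y)                     # d = 1: work on y[t] - y[t-1]
    step = diffs.mean()                    # stand-in forecast for each future difference
    future_diffs = np.full(horizon, step)
    # Integrate: add the cumulative forecast differences to the last observed level.
    return y[-1] + np.cumsum(future_diffs)

levels = [100.0, 104.0, 110.0, 109.0, 116.0]   # made-up values
print(forecast_with_d1(levels, 3))  # [120. 124. 128.]
```

Because the forecasts are built on top of the last observed level, a differenced model can keep following a trend instead of reverting to the training mean.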

Results

Model	RMSE	MAE
Simple Exponential Smoothing (SES)	86.93	62.10
Exponential Smoothing (Holt)	94.83	67.35
AutoRegression (AR)	116.31	87.59
AutoRegression Moving Average (ARMA)	214.85	197.20
AutoRegression Integrated Moving Average (ARIMA)	73.09	51.12

The best model is ARIMA, with RMSE of 73.09 and MAE of 51.12.
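The ranking can be checked directly from the table. The short sketch below copies the reported scores into a dictionary (with shortened model names) and sorts by each metric.

```python
# (RMSE, MAE) per model, copied from the results table above.
results = {
    "SES": (86.93, 62.10),
    "Holt": (94.83, 67.35),
    "AR": (116.31, 87.59),
    "ARMA": (214.85, 197.20),
    "ARIMA": (73.09, 51.12),
}

ranked_by_rmse = sorted(results, key=lambda name: results[name][0])
ranked_by_mae = sorted(results, key=lambda name: results[name][1])

print(ranked_by_rmse)                   # ['ARIMA', 'SES', 'Holt', 'AR', 'ARMA']
print(ranked_by_rmse == ranked_by_mae)  # True
```

Both metrics give the same ordering, so the choice of ARIMA does not depend on which error measure is used.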

Discussion

Several models were evaluated for forecasting the Airline Passengers dataset. The raw data shows a clear trend and seasonality. The seasonality was first treated as additive, but the non-stationary residual of the additive decomposition indicated that it is multiplicative. With a multiplicative decomposition, the residual became stationary.

The models evaluated were Simple Exponential Smoothing (SES), Holt's Exponential Smoothing, AutoRegression (AR), AutoRegression Moving Average (ARMA), and AutoRegression Integrated Moving Average (ARIMA). ARIMA performed best, with the lowest RMSE and MAE, which suggests that its differencing step (d = 1) helped it follow the trend better than the other models. None of the models as configured here includes an explicit seasonal component.

Possible improvements include tuning ARIMA's order (p, d, q) further or trying models such as Prophet or LSTM, which can capture both trend and seasonality. Given how much the airline industry changes over time, incorporating external factors or events could also refine the forecasts.
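Tuning the ARIMA order could be as simple as a grid search over candidate (p, d, q) values. The sketch below shows only the search loop. The scoring function `toy_score` is a hypothetical placeholder so the example runs on its own; in practice it would fit a model for the given order and return an error such as RMSE on a validation window.

```python
from itertools import product

def grid_search(score, p_values, d_values, q_values):
    """Return the (p, d, q) order with the lowest score, plus that score."""
    best_order, best_score = None, float("inf")
    for order in product(p_values, d_values, q_values):
        s = score(order)
        if s < best_score:
            best_order, best_score = order, s
    return best_order, best_score

# Hypothetical scoring function for demonstration only: smallest at (2, 1, 1).
# A real one would fit a model with this order and return its validation error.
def toy_score(order):
    p, d, q = order
    return (p - 2) ** 2 + (d - 1) ** 2 + (q - 1) ** 2

print(grid_search(toy_score, range(4), range(3), range(3)))  # ((2, 1, 1), 0)
```

Scoring on a validation window carved out of the training data, rather than on the test set, keeps the final test error an honest estimate.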