Lecture 9. Forecasting
Date: 2023-03-28
1. Time Series Data
What makes time series data different?
Time series data is a sequence of data points measured at successive, typically evenly spaced, time intervals. Several properties set it apart:
- Order Matters: Unlike most other forms of data, time series observations must be kept in sequence. Rearranging data points can alter the meaning and lead to incorrect insights.
- Autocorrelation: Points close together in time are often more similar than points further apart. This dependence of a series on its own past values is called autocorrelation.
- Stationarity: Refers to the idea that statistical properties, like mean and variance, are constant over time (a quick code check follows this list).
- Strong vs. Weak: Strong stationarity means the entire joint distribution of the series is unchanged by shifts in time. Weak stationarity only requires that the mean, variance, and autocorrelation structure remain constant.
- Mean vs. Variance: In the context of stationarity, the mean is the average level, and variance measures how spread out the values are. For a time series to be (weakly) stationary, both mean and variance should be consistent over time.
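A common way to test for stationarity in practice is the Augmented Dickey-Fuller (ADF) test. A minimal sketch using statsmodels, on a made-up random walk (which is non-stationary by construction):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Hypothetical example: a random walk, which is non-stationary.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))

# ADF null hypothesis: the series has a unit root (is non-stationary).
stat, pvalue, *_ = adfuller(series)
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3f}")
# A large p-value (e.g., > 0.05) means we cannot reject non-stationarity.
```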
Challenges
Time series data presents challenges because many foundational statistical assumptions don't hold:
- iid (Independent and Identically Distributed): Unlike other types of data, successive values in time series often depend on prior values.
- Distribution of Training Equals Testing: The distribution of data in the past (training set) may not represent the distribution in the future (testing set).
- Fixed Over Time: In time series data, underlying patterns and structures can change over time, making predictions challenging.
Different Time Scales
- Real-Time: Refers to data that's recorded and processed immediately, often within milliseconds or seconds. This is typically not covered in Machine Learning courses and is more of a Signal Processing topic.
- Long-Term: Refers to forecasting or analyzing data over extended periods, which could be months, years, or even decades.
Methods of Time Series Analysis (Overview)
Time-Domain:
These methods study the series in its raw form:
- Autocorrelation: Measures the relationship between a series and a lagged version of itself.
- Moving Average: Averages out the fluctuations in data to reveal a trend or pattern.
- Exponential Smoothing: Gives different weights to different data points, often giving more importance to recent data.
- ARIMA (AutoRegressive Integrated Moving Average): A class of models that captures a suite of different standard temporal structures in time series data.
Frequency-Domain:
These methods analyze the periodic components:
- Fourier Transform: Decomposes a function or dataset into its constituent frequencies.
- Wavelets: Transform used to break up a time series into its time-frequency components.
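To make the frequency-domain idea concrete, here is a minimal sketch with NumPy's FFT that recovers the dominant period of a synthetic seasonal signal (the signal itself is made up for illustration):

```python
import numpy as np

# Hypothetical signal: a 12-step seasonal cycle plus noise.
rng = np.random.default_rng(1)
t = np.arange(240)
x = np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=t.size)

# Real-valued FFT of the mean-removed signal; d=1.0 means one observation per step.
freqs = np.fft.rfftfreq(x.size, d=1.0)
spectrum = np.abs(np.fft.rfft(x - x.mean()))

dominant = freqs[np.argmax(spectrum)]
print(f"Dominant frequency: {dominant:.4f} cycles/step -> period of about {1 / dominant:.1f} steps")
```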
2. Time Series Decomposition
Time Series Components
- Trend: A consistent upward or downward movement in the data over a period of time.
- Seasonality: Regular and predictable changes in a time series that recur over a fixed, known period, such as every calendar year, week, or day.
- Cyclic: Patterns that arise due to economic or other fluctuations and aren't of a fixed period. They're longer than seasonal patterns (typically lasting at least 2 years), less predictable, and unlike seasonality they don't follow a fixed, known repeating schedule.
- Irregular: Unpredictable fluctuations in the data that don’t follow any patterns, often due to random events or outliers.
Decomposition is a task that breaks a time series into its constituent components: trend, seasonality, cyclic, and irregular.
Additive vs. Multiplicative Decomposition
- Additive: The components are added together to get the time series (series = trend + seasonality + irregular). Appropriate when the seasonal fluctuations have roughly constant amplitude.
- Multiplicative: The components are multiplied together to get the time series (series = trend × seasonality × irregular). Appropriate when the seasonal fluctuations grow with the level of the series.
Steps to Decompose a Time Series
- Estimate the trend: Use a moving average or polynomial regression to estimate the trend.
- Remove the trend: Subtract the trend component from the time series.
- Estimate the seasonal component: Average the detrended series over each season (e.g., each calendar month); these averages form the seasonal component, which is then removed from the series.
- Determine the random (irregular) component:
- Additive: random = series - (trend + seasonality)
- Multiplicative: random = series / (trend * seasonality)
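These steps are packaged in standard libraries. A minimal sketch using statsmodels' seasonal_decompose on a made-up monthly series (the data and the period of 12 are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly series: linear trend + yearly seasonality + noise.
rng = np.random.default_rng(2)
idx = pd.date_range("2015-01", periods=96, freq="MS")
y = pd.Series(0.5 * np.arange(96)
              + 10 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(scale=2, size=96), index=idx)

# model="additive" corresponds to: series = trend + seasonal + residual.
result = seasonal_decompose(y, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head(12))  # one full seasonal cycle
print(result.resid.dropna().head())
```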
3. Time Series Forecasting
Simple Forecasting Methods
- Average Method: Forecasts the average of all past observations.
- Naive Method: Forecasts the value of the next period to be the same as the value of the current period.
- Seasonal Naive Method: Forecasts the value of the next period to be the same as the value of the same period last year.
- Drift Method: Forecasts the value of the next period to be the same as the value of the current period plus the average change from one period to another.
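These four baselines are simple enough to write out directly. A sketch in NumPy (the function names are my own):

```python
import numpy as np

def average_forecast(y):
    """Average method: the mean of all past observations."""
    return np.mean(y)

def naive_forecast(y):
    """Naive method: repeat the last observed value."""
    return y[-1]

def seasonal_naive_forecast(y, m):
    """Seasonal naive: repeat the value from the same season one cycle ago (m = season length)."""
    return y[-m]

def drift_forecast(y, h=1):
    """Drift method: last value plus h times the average historical change."""
    return y[-1] + h * (y[-1] - y[0]) / (len(y) - 1)

y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
print(average_forecast(y), naive_forecast(y), seasonal_naive_forecast(y, m=4), drift_forecast(y))
```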
ETS Models
ETS stands for Error, Trend, and Seasonality. These models are used to capture these three components in time series data. By decomposing a time series into these parts, it becomes easier to forecast and understand the underlying patterns.
Simple Exponential Smoothing (SES)
Overview:
- Simple Exponential Smoothing (SES) is appropriate for time series with no clear trend or seasonality.
- It uses weighted averages of past observations to forecast future points, with weights declining exponentially as observations get older.
Formula:
$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t$$
Where:
- $\hat{y}_{t+1}$ is the forecast for the next period.
- $y_t$ is the actual value at time $t$.
- $\hat{y}_t$ is the forecast for time $t$.
- $\alpha$ is the smoothing parameter (between 0 and 1).
Use Cases:
- Stock prices that don't have a trend or seasonality.
- Monitoring changes in metrics that fluctuate around a consistent mean.
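A minimal SES sketch with statsmodels (the data and the smoothing level of 0.2 are illustrative choices):

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Hypothetical series fluctuating around a stable mean.
rng = np.random.default_rng(3)
y = 50 + rng.normal(scale=3, size=100)

# Fix alpha = 0.2; drop optimized=False to let the fit choose alpha by MLE instead.
fit = SimpleExpSmoothing(y).fit(smoothing_level=0.2, optimized=False)
print(fit.forecast(3))  # SES forecasts are flat: the same value repeated
```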
Holt-Winters Exponential Smoothing (HWES)
Overview:
- HWES is an extension of the SES method, designed to capture both trend and seasonality in a time series.
- It uses three equations: one for the level (like SES), one for the trend, and one for seasonality.
Formulas (additive form):
1. Level: $\ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})$
2. Trend: $b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}$
3. Seasonal: $s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}$
Where:
- $\ell_t$ is the smoothed level at time $t$.
- $b_t$ represents the trend.
- $s_t$ is the seasonal factor.
- $m$ is the number of seasons (e.g., 12 for monthly data if there's yearly seasonality).
- $\alpha$, $\beta$, $\gamma$ are smoothing parameters.
Use Cases:
- Monthly sales data with yearly seasonality.
- Temperature data with daily readings and yearly seasonality.
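A minimal Holt-Winters sketch with statsmodels' ExponentialSmoothing (the synthetic data and additive settings are illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical monthly sales with an upward trend and yearly seasonality.
rng = np.random.default_rng(4)
idx = pd.date_range("2016-01", periods=72, freq="MS")
y = pd.Series(100 + 2 * np.arange(72)
              + 15 * np.sin(2 * np.pi * np.arange(72) / 12)
              + rng.normal(scale=5, size=72), index=idx)

# Additive trend and seasonality; seasonal_periods corresponds to m = 12.
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(fit.forecast(12))  # forecast one year ahead
```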
Holt-Winters Seasonal Method
Overview:
- This method is similar to HWES but emphasizes the seasonality component.
- It's particularly suitable for series with strong seasonal effects.
Differences from HWES:
- While HWES considers both trend and seasonality, the Holt-Winters Seasonal Method puts a stronger emphasis on the seasonal component.
- Adjustments might be made more frequently to the seasonal factors, especially if the seasonality changes over time.
Use Cases:
- Retail sales data with strong seasonal components like holiday sales.
- Energy consumption data, where usage might peak during winter and summer months due to heating and cooling needs.
AutoRegressive Moving Average (ARMA)
ARMA models combine the ideas of autoregression and moving averages to model time series data. They are well-suited for univariate time series data that displays a blend of autoregressive and moving average behaviors.
AutoRegression (AR)
Overview:
- An autoregressive (AR) model predicts future behavior based on past behavior. It assumes the current value of the time series is correlated with its past values.
Formula:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
Where:
- $y_t$ is the value at time $t$.
- $c$ is a constant.
- $\phi_1, \dots, \phi_p$ are the model parameters.
- $p$ is the order of the AR model.
- $\varepsilon_t$ is white noise.
Moving Average (MA)
Overview:
- A moving average (MA) model represents the relationship between an observed value and white noise or error terms from previous periods.
Formula:
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$
Where:
- $\mu$ is the mean of the series.
- $\theta_1, \dots, \theta_q$ are the model parameters.
- $q$ is the order of the MA model.
AutoRegressive Moving Average (ARMA)
Overview:
- ARMA combines both AR and MA components. It's suitable for time series without a trend or seasonal components.
Formula:
$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
Where:
- $p$ and $q$ are the orders of the AR and MA parts, respectively.
AutoRegressive Integrated Moving Average (ARIMA)
Overview:
- ARIMA extends ARMA by adding an "I" for integrated. It captures non-stationarity in the time series by differencing the data.
Components:
- AR: Autoregression.
- I: Differencing.
- MA: Moving Average.
Order:
- Described by three parameters, written ARIMA$(p, d, q)$:
- $p$: Order of the AR part.
- $d$: Degree of differencing.
- $q$: Order of the MA part.
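A minimal ARIMA fit with statsmodels (the synthetic data and the order (1, 1, 1) are illustrative choices, not recommendations):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical non-stationary series: a random walk with drift.
rng = np.random.default_rng(5)
y = np.cumsum(0.5 + rng.normal(size=200))

# order=(p, d, q); d=1 differences the series once before fitting the ARMA part.
fit = ARIMA(y, order=(1, 1, 1)).fit()
print(fit.forecast(steps=5))
```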
Seasonal AutoRegressive Integrated Moving Average (SARIMA)
Overview:
- SARIMA extends ARIMA to capture seasonality in the time series.
Additional Components:
- Seasonal AR, Seasonal MA, and seasonal differencing.
Order:
- Described by the parameters $(p, d, q)$ for the non-seasonal part and $(P, D, Q)_m$ for the seasonal part.
- $P$: Order of the seasonal AR part.
- $D$: Degree of seasonal differencing.
- $Q$: Order of the seasonal MA part.
- $m$: Number of time periods in each season (e.g., 12 for monthly data with yearly seasonality).
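A minimal SARIMA sketch using statsmodels' SARIMAX class (the data and both orders are illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical monthly series with trend and yearly seasonality.
rng = np.random.default_rng(6)
idx = pd.date_range("2014-01", periods=120, freq="MS")
y = pd.Series(50 + 0.8 * np.arange(120)
              + 12 * np.sin(2 * np.pi * np.arange(120) / 12)
              + rng.normal(scale=4, size=120), index=idx)

# order=(p, d, q) and seasonal_order=(P, D, Q, m), with m=12 for monthly data.
fit = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(fit.forecast(steps=12))
```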
4. Other Time Series Models
Differencing
Differencing is a technique used to transform a time series into a stationary one, which is an essential step before applying models like ARIMA. A time series that's stationary has properties that don't change over time.
First Order Differencing
Overview:
- It involves taking the difference between consecutive observations.
Formula:
$$y'_t = y_t - y_{t-1}$$
Where:
- $y'_t$ is the difference at time $t$.
- $y_t$ is the value at time $t$.
Second Order Differencing
Overview:
- If the first order differencing doesn't result in a stationary series, a second order differencing can be used.
Formula:
$$y''_t = y'_t - y'_{t-1} = y_t - 2 y_{t-1} + y_{t-2}$$
Where:
- $y''_t$ is the second difference at time $t$.
Seasonal Differencing
Overview:
- This is used when a time series has a repeating pattern or seasonality.
Formula:
$$y'_t = y_t - y_{t-m}$$
Where:
- $m$ is the number of seasons (e.g., 12 for monthly data if there's yearly seasonality).
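All three kinds of differencing are one-liners in pandas (the series values are illustrative):

```python
import pandas as pd

# A small illustrative monthly series (two years of values).
y = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
               115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140])

first = y.diff()          # first order: y_t - y_{t-1}
second = y.diff().diff()  # second order: difference of the first differences
seasonal = y.diff(12)     # seasonal: y_t - y_{t-12} for monthly data
print(first.head(), second.head(), seasonal.tail(), sep="\n")
```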
Box-Jenkins Methodology
The Box-Jenkins methodology, often associated with ARIMA modeling, is a systematic method of identifying, estimating, and checking time series models.
Step 1: Identification
Goal:
- Determine the order of differencing ($d$) and the orders of the AR ($p$) and MA ($q$) terms.
Methods:
- Plot the series and look for trends or seasonality.
- Use the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) plots to determine the orders $p$ and $q$.
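A minimal sketch of the ACF/PACF inspection using statsmodels' plotting helpers, on a made-up AR(2) series (where the PACF should cut off after lag 2):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Hypothetical AR(2) process: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + noise.
rng = np.random.default_rng(7)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=24, ax=axes[0])   # a sharp ACF cutoff suggests the MA order q
plot_pacf(y, lags=24, ax=axes[1])  # a sharp PACF cutoff suggests the AR order p
plt.show()
```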
Step 2: Estimation
Goal:
- Estimate the parameters of the model using methods like Maximum Likelihood Estimation (MLE).
Methods:
- Once you've identified the model, use software packages like Python's statsmodels or R's forecast to fit the model to the data and estimate its parameters.
Step 3: Diagnostic Checking
Goal:
- Check if the identified model fits the data well.
Methods:
- Use residual plots to see if there's any pattern left in the residuals (which would suggest the model isn't capturing some aspect of the time series).
- Use statistical tests like the Ljung-Box test to check the goodness of fit.
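A minimal diagnostic sketch, assuming a fitted statsmodels result like the ARIMA example above (data and order are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Hypothetical series and model, as in the ARIMA sketch above.
rng = np.random.default_rng(8)
y = np.cumsum(rng.normal(size=200))
fit = ARIMA(y, order=(1, 1, 1)).fit()

# Ljung-Box null hypothesis: the residuals are uncorrelated (white noise).
lb = acorr_ljungbox(fit.resid, lags=[10])
print(lb)  # a small p-value suggests leftover structure the model missed
```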
5. Evaluation of Time Series Models
Model evaluation provides a way to judge how well your time series model will predict future values. Here's how it's done:
1. Splitting the Data
Overview:
- Just as in traditional machine learning, time series data is typically split into a training set and a test (or validation) set.
Best Practices:
- Due to the sequential nature of time series data, a simple random split isn't appropriate. Instead, use techniques like a time-based split, rolling forecast origin, or walk-forward validation.
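A sketch of a time-based split using scikit-learn's TimeSeriesSplit, which implements an expanding-window walk-forward scheme (the series is a stand-in):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(100)  # stand-in for a real time series

# Each fold trains on an expanding prefix and tests on the block right after it,
# so the model never sees the future during training.
tscv = TimeSeriesSplit(n_splits=5, test_size=10)
for fold, (train_idx, test_idx) in enumerate(tscv.split(y)):
    print(f"fold {fold}: train [0..{train_idx[-1]}], test [{test_idx[0]}..{test_idx[-1]}]")
```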
2. Residual Analysis
Overview:
- Residuals are the difference between the observed values and the values predicted by the model. Analyzing residuals can provide insights into model performance and potential improvements.
Methods:
- Plotting residuals can help identify patterns or correlations that the model hasn't captured.
- Ideally, residuals should resemble white noise, meaning they're randomly distributed and have no discernible pattern.
3. Quantitative Measures
There are several metrics that can be used to quantitatively evaluate the performance of time series models:
- Mean Absolute Error (MAE): The average of the absolute differences between the predicted and actual values. It gives a linear penalty for errors.
- Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values. It gives a higher penalty for larger errors.
- Root Mean Squared Error (RMSE): The square root of MSE. It's in the same units as the data, making it more interpretable than MSE.
- Mean Absolute Percentage Error (MAPE): Expresses the forecast errors as a percentage. Useful when comparing forecasts across different scales.
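All four metrics in a few lines of NumPy (the function names and example values are my own):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def mse(y, yhat):
    return np.mean((y - yhat) ** 2)

def rmse(y, yhat):
    return np.sqrt(mse(y, yhat))

def mape(y, yhat):
    # Note: undefined when any actual value is zero.
    return np.mean(np.abs((y - yhat) / y)) * 100

y = np.array([100.0, 110.0, 120.0, 130.0])
yhat = np.array([102.0, 108.0, 123.0, 126.0])
print(mae(y, yhat), mse(y, yhat), rmse(y, yhat), mape(y, yhat))
```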
4. Model Comparisons
Overview:
- When multiple models are fit to the same data, it's essential to compare them to determine the best one.
Best Practices:
- Use the same evaluation metric across all models for consistent comparison.
- Be aware of overfitting. A model that's too complex might perform well on the training data but poorly on new, unseen data.
5. Out-of-Sample Testing
Overview:
- After selecting the best model based on the training data, test it on out-of-sample data (the test set) to see how it performs on new data it hasn't seen before.
Best Practices:
- Ensure that the test set is representative of the future periods you're trying to forecast.
6. Q&A
1. Q: What is time series decomposition?
A: Time series decomposition is a technique that breaks down a time series into its constituent components, typically trend, seasonality, and residuals (or error).
2. Q: Why is stationarity important in time series analysis?
A: Stationarity is important because most time series forecasting models assume that the data is stationary. A stationary time series has properties that don't change over time, making it easier to model. If a time series is non-stationary, it can be transformed to become stationary through methods like differencing.
3. Q: What is the difference between an AR and MA process?
A: An AR (AutoRegressive) process models the relationship between an observation and a number of lagged observations of the series itself. An MA (Moving Average) process, on the other hand, models an observation as a function of the current and past white-noise error terms rather than past values of the series.
4. Q: How can seasonality be detected in a time series?
A: Seasonality can be detected by visual inspection of a plotted time series, noticing recurring patterns at regular intervals. Additionally, autocorrelation plots and spectral analysis can provide statistical evidence of seasonality.
5. Q: What is the Box-Jenkins methodology?
A: The Box-Jenkins methodology refers to a systematic process for ARIMA modeling, which involves three main steps: model identification, parameter estimation, and model checking.
6. Q: Why might one use exponential smoothing in time series forecasting?
A: Exponential smoothing is used because it gives more weight to recent observations, making it adaptive to changes. It's particularly useful for time series data that has a trend and/or seasonal components.
7. Q: How can you evaluate the accuracy of a time series forecast?
A: The accuracy of a forecast can be evaluated using metrics like MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and MAPE (Mean Absolute Percentage Error). Additionally, residual analysis can be performed to ensure that the residuals are random and follow a white noise pattern.
8. Q: What is SARIMA, and how does it differ from ARIMA?
A: SARIMA stands for Seasonal AutoRegressive Integrated Moving Average. It's an extension of ARIMA that incorporates seasonality. While ARIMA models the non-seasonal components of a time series, SARIMA adds additional seasonal terms to model the seasonal patterns.
9. Q: Can time series models account for external factors or events?
A: Yes, certain models, like regression with ARIMA errors, allow for the inclusion of external variables. This way, the impact of external factors or events can be quantified and incorporated into the forecast.
10. Q: In the context of time series, what is "differencing"?
A: Differencing is a transformation technique used to stabilize the mean of a time series by removing changes in the level of a time series, thus eliminating (or reducing) trend and seasonality.
11. Q: What is a lag in the context of time series?
A: A lag is a time step back in the past. In time series, we often refer to "lagged observations", meaning observations from previous time steps. For instance, a lag of 1 is the previous observation, a lag of 2 is the observation from two time steps ago, and so on.
12. Q: Why might you apply a log transformation to a time series?
A: Log transformation can be applied to stabilize the variance of a time series, especially when there's a multiplicative relationship or exponential growth. It can also help linearize certain types of non-linear patterns.
13. Q: What is the significance of ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots in time series analysis?
A: ACF and PACF plots are tools for identifying the order of AR (AutoRegressive) and MA (Moving Average) terms in a time series model. Specifically, the ACF plot shows the correlation of a series with its lags, while the PACF plot shows the correlation after removing the effect of previous lags.
14. Q: How can outliers affect time series forecasting?
A: Outliers can distort the pattern that the model is trying to capture, leading to inaccurate forecasts. Some models are sensitive to outliers, which can affect parameter estimates and future predictions.
15. Q: What is the concept of overfitting in time series forecasting?
A: Overfitting occurs when a model captures not only the underlying pattern in the data but also the random noise. An overfitted model may have great performance on the training data but performs poorly on new, unseen data because it's too tailored to the training set.
16. Q: Can you explain what "cointegration" means in time series?
A: Cointegration refers to a statistical relationship between two or more non-stationary time series, where even if the individual series are non-stationary, a linear combination of them is stationary. This is a key concept in multivariate time series analysis, suggesting a long-term equilibrium relationship between the series.
17. Q: How is the "rolling forecast origin" technique useful?
A: The rolling forecast origin technique involves moving the starting point of the forecast forward in time for model validation. It's useful for assessing the out-of-sample predictive performance of a time series model, especially when data is limited.
18. Q: Why might you use a state space model for time series forecasting?
A: State space models are flexible and can handle a variety of time series structures. They can deal with multiple seasonal patterns, missing values, and even incorporate external regressors. State space models can also be easily updated with new data, making them suitable for real-time forecasting.
19. Q: What is meant by the "frequency" of a time series?
A: The frequency of a time series refers to the number of observations per unit of time. For instance, a monthly time series has a frequency of 12 (for 12 months in a year), while a daily time series might have a frequency of 365 (for 365 days in a year).
20. Q: How does a multiplicative time series model differ from an additive one?
A: In an additive model, the components of a time series (like trend and seasonality) are added together. In a multiplicative model, these components are multiplied. The choice between them depends on the nature of the time series. For instance, if the amplitude of seasonal fluctuations increases over time, a multiplicative model might be more appropriate.