ANLY482 AY2017-18T2 Group10 Analysis & Findings: Analysis

Overview

There are various time-series forecasting methods available. Our research paper compares two of the more commonly used time-series forecasting techniques, (1) exponential smoothing and (2) autoregressive integrated moving average (ARIMA), to determine the appropriate model to use.

Box-Jenkins Autoregressive Integrated Moving Average (ARIMA)

The Box-Jenkins ARIMA model combines Autoregressive (AR), Integrated (I) and Moving Average (MA) components. The AR and MA components assume that the time series is stationary; if it is not, the series must be differenced (the Integrated component) until it becomes stationary. Fitting a Box-Jenkins model effectively requires at least a moderately long series, and a common recommendation is at least 50 to 100 observations.

ARIMA models are denoted ARIMA(p,d,q), where:

  • p: the number of autoregressive (AR) terms
  • d: the number of times the series is differenced before it becomes stationary (I)
  • q: the number of moving average (MA) terms
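For reference, the ARIMA(p,d,q) model can be written compactly with the backshift operator B (where B y_t = y_{t-1}). The coefficients φ_i and θ_j, the constant c and the white-noise error ε_t are standard textbook notation, not values estimated in this project:

```latex
\Big(1 - \sum_{i=1}^{p} \phi_i B^i\Big)(1 - B)^d\, y_t
  = c + \Big(1 + \sum_{j=1}^{q} \theta_j B^j\Big)\varepsilon_t
\qquad\text{e.g. ARIMA}(0,1,1),\ c = 0:\quad y_t - y_{t-1} = \varepsilon_t + \theta_1 \varepsilon_{t-1}
```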

We used Box-Jenkins ARIMA to build our model, which consists of the following iterative steps:

  1. Identification. Using the customer data, we examined the autocorrelation (ACF) and partial autocorrelation (PACF) plots and ran the augmented Dickey-Fuller (ADF) stationarity test. We then used these analyses to estimate appropriate values for p, d and q.
  2. Estimation and testing. The model parameters are estimated by numerically approximating the solutions of nonlinear equations, using techniques such as nonlinear least squares and maximum likelihood estimation.
  3. Diagnostic Checking. The fitted model is checked for inadequacies by considering the autocorrelations of the residual series (the series of residuals, or error values).
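The identification and diagnostic-checking steps above rest on a few simple computations: differencing, sample autocorrelations, and a portmanteau test on residuals. The following is a minimal stdlib-only sketch; the function names and the sample data are hypothetical and not taken from the project. The Ljung-Box test shown here is one common way to check residual autocorrelation. Its Q statistic would normally be compared against a chi-squared distribution with h - p - q degrees of freedom.

```python
import math


def difference(series, d=1):
    """Apply first differencing d times (the 'I' step of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (lag 0, always 1, is omitted)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box(residuals, max_lag):
    """Ljung-Box Q statistic: large values suggest residuals are still autocorrelated."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


if __name__ == "__main__":
    # Hypothetical weekly customer counts, for illustration only.
    weekly = [120, 132, 128, 141, 150, 147, 158, 163, 170, 168, 175, 181]
    diffed = difference(weekly)
    band = 1.96 / math.sqrt(len(diffed))  # approximate 95% band for a white-noise ACF
    for lag, r in enumerate(acf(diffed, 4), start=1):
        print(f"lag {lag}: r = {r:+.3f}", "(outside band)" if abs(r) > band else "")
    # In practice Q is computed on a fitted model's residuals; here on the differences.
    print("Ljung-Box Q(4) =", round(ljung_box(diffed, 4), 3))
```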

In model identification, we tested whether the data was stationary and seasonal. The Augmented Dickey-Fuller (ADF) test was used to check for stationarity, while the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots were used to check for seasonality. From these tests, we determined that the dataset is non-stationary and non-seasonal, so we differenced the data before any further analysis. We then used the ACF and PACF plots of the differenced series, together with a grid search, to identify the best parameters for ARIMA. The three best parameter combinations for ARIMA forecasting were (0,1,1), (0,1,2) and (1,1,1), as they had the lowest RMSE, MAPE and MAE.
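To illustrate the idea behind the stationarity check, here is a simplified sketch. It implements the plain Dickey-Fuller regression Δy_t = a + γ·y_{t-1} + e_t and omits the lagged-difference terms that make the project's test "augmented". A t-statistic for γ below the critical value rejects the unit-root null. The -2.86 threshold is the approximate asymptotic 5% value for a regression with a constant and no trend. All names here are hypothetical, and a library implementation would be used in practice.

```python
import itertools
import math
import random

# Approximate asymptotic 5% critical value (constant, no trend); small samples
# need a slightly more negative threshold.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_t(series):
    """t-statistic of gamma in  dy_t = a + gamma * y_{t-1} + e_t  (no lagged differences)."""
    x = series[:-1]                                   # lagged level y_{t-1}
    z = [b - a for a, b in zip(series, series[1:])]   # first difference dy_t
    m = len(z)
    x_bar, z_bar = sum(x) / m, sum(z) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    gamma = sxz / sxx                                 # OLS slope
    a = z_bar - gamma * x_bar                         # OLS intercept
    sse = sum((zi - a - gamma * xi) ** 2 for xi, zi in zip(x, z))
    se_gamma = math.sqrt(sse / (m - 2) / sxx)         # standard error of the slope
    return gamma / se_gamma


def looks_stationary(series, critical=DF_CRITICAL_5PCT):
    """True if the unit-root null is rejected, i.e. the series is treated as stationary."""
    return dickey_fuller_t(series) < critical


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0, 1) for _ in range(200)]
    walk = list(itertools.accumulate(noise))          # random walk: has a unit root
    ar1 = [0.0]
    for e in noise[1:]:
        ar1.append(0.2 * ar1[-1] + e)                 # stationary AR(1)
    print("random walk:", round(dickey_fuller_t(walk), 2), looks_stationary(walk))
    print("AR(1):      ", round(dickey_fuller_t(ar1), 2), looks_stationary(ar1))
```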

For each of the three ARIMA models, we plotted the observed and predicted time series on a line graph. We used Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) to measure the accuracy of each model and to compare it with the Exponential Smoothing models. The following in-sample results were obtained:

[Table image Tennet-error-table1.png: in-sample RMSE, MAE and MAPE of the three ARIMA models]
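The three error measures used in these comparisons can be computed as follows. This is a minimal stdlib-only sketch with hypothetical function names. Note that MAPE is undefined when an observed value is zero.

```python
import math


def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def rmse(actual, predicted):
    """Root Mean Square Error: squares errors, so large misses are penalised more."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error, in the same units as the data (e.g. customers per week)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; undefined if any actual value is 0."""
    _check(actual, predicted)
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    observed, forecast = [100, 200], [110, 190]  # hypothetical values
    print(rmse(observed, forecast), mae(observed, forecast), mape(observed, forecast))
```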

Furthermore, using the three ARIMA models, we made out-of-sample weekly forecasts for the next 2 months and compared them with the observed weekly values over the same period. As with the in-sample forecasts, RMSE, MAE and MAPE were used to measure the accuracy of ARIMA and to compare it with Exponential Smoothing. The out-of-sample forecast errors are shown below:

[Table image Tennet-error-table2.png: out-of-sample RMSE, MAE and MAPE of the three ARIMA models]
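The in-sample versus out-of-sample procedure can be sketched for the simplest of the three candidates, ARIMA(0,1,1) without a constant. This is a stdlib-only illustration under several assumptions. θ is chosen by a grid search on the conditional sum of squares instead of a full maximum-likelihood optimiser. "2 months" is taken as the last 8 weeks. The series is hypothetical. In-sample RMSE uses one-step-ahead errors, while out-of-sample RMSE compares multi-step forecasts with the held-out weeks.

```python
import math

HOLDOUT_WEEKS = 8  # assumption: "the next 2 months" of weekly data taken as 8 weeks


def arima011_css(diffs, theta):
    """Conditional sum of squares for ARIMA(0,1,1) without constant, taking eps_0 = 0."""
    e_prev, css = 0.0, 0.0
    for d in diffs:
        e = d - theta * e_prev      # innovation = actual change - predicted change
        css += e * e
        e_prev = e
    return css, e_prev


def fit_arima011(series, grid_step=0.01):
    """Grid-search theta in [-0.99, 0.99]; return theta, last innovation, in-sample RMSE."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    best = None
    for i in range(int(round(1.98 / grid_step)) + 1):
        theta = -0.99 + i * grid_step
        css, last_e = arima011_css(diffs, theta)
        if best is None or css < best[0]:
            best = (css, theta, last_e)
    css, theta, last_e = best
    return theta, last_e, math.sqrt(css / len(diffs))


def forecast_arima011(series, horizon):
    """Changes beyond one step ahead are forecast as 0, so the forecast path is flat."""
    theta, last_e, _ = fit_arima011(series)
    return [series[-1] + theta * last_e] * horizon


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # Hypothetical weekly customer counts, for illustration only.
    weekly = [310, 298, 325, 340, 332, 351, 347, 366, 359, 372, 380, 368,
              391, 385, 402, 398, 410, 405, 421, 417]
    train, test = weekly[:-HOLDOUT_WEEKS], weekly[-HOLDOUT_WEEKS:]
    theta, _, in_rmse = fit_arima011(train)
    out_rmse = rmse(test, forecast_arima011(train, HOLDOUT_WEEKS))
    print(f"theta={theta:.2f} in-sample RMSE={in_rmse:.2f} out-of-sample RMSE={out_rmse:.2f}")
```

ARIMA(0,1,1) produces the same point forecasts as simple exponential smoothing with α = 1 + θ, which is one reason the two model families are natural to compare.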

Exponential Smoothing

Exponential smoothing aims to isolate trends or seasonality from irregular variation and has been found to be most effective when the components describing the time series change slowly over time. Each new estimate combines the estimate for the current period with a portion of the current period’s random error. Past data is weighted unequally, with the weight given to each observation declining exponentially as it ages.
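The update rule described above can be sketched for the simplest case, simple exponential smoothing with smoothing parameter α. This is a stdlib-only illustration: the function names are hypothetical, and the initial level defaults to the first observation as a simplifying assumption. Unrolling the recursion shows that the observation j periods back receives weight α(1 - α)^j, plus a residual weight on the initial level.

```python
def ses_levels(series, alpha, initial_level=None):
    """Simple exponential smoothing: the smoothed level after each observation."""
    level = series[0] if initial_level is None else initial_level
    levels = []
    for y in series:
        error = y - level               # random error of the current period
        level = level + alpha * error   # new estimate = current estimate + alpha * error
        levels.append(level)
    return levels


def ses_weights(alpha, n_lags):
    """Implied weight on the observation j periods back: alpha * (1 - alpha) ** j."""
    return [alpha * (1 - alpha) ** j for j in range(n_lags)]


if __name__ == "__main__":
    weekly = [120, 132, 128, 141, 150, 147, 158]  # hypothetical data
    print(ses_levels(weekly, alpha=0.3)[-1])       # last level = next-period forecast
    print([round(w, 4) for w in ses_weights(0.3, 6)])
```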

In the paper “A state space framework for automatic forecasting using exponential smoothing methods”, the authors adopt a well-established taxonomy as a framework to choose between various exponential smoothing methods. This framework identifies the presence or absence of a trend component and seasonality component within the data being analysed.

To decide on the best model, we implemented Rob J. Hyndman’s state space framework, which uses the general notation ETS(Error, Trend, Seasonal), where:

  • Error: the type of error term
  • Trend: the type of trend component
  • Seasonal: the type of seasonal component

The error component can be Additive (A) or Multiplicative (M). The trend and seasonal components can each be absent (N) or take additive or multiplicative forms, and the trend can additionally be damped. For example, ETS(A,N,N) denotes Simple Exponential Smoothing: additive errors, no trend and no seasonality.
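In state space form, the two no-trend, no-seasonality models considered in this analysis can be written as follows. Here ℓ_t is the level, α is the smoothing parameter and ε_t is a white-noise error, following the standard notation in Hyndman’s work rather than anything estimated here. Both models give the same point forecast ŷ_{T+h|T} = ℓ_T. They differ in how the error enters, which changes their likelihoods and prediction intervals.

```latex
\begin{aligned}
\text{ETS(A,N,N):}\quad & y_t = \ell_{t-1} + \varepsilon_t, & \ell_t &= \ell_{t-1} + \alpha\,\varepsilon_t \\
\text{ETS(M,N,N):}\quad & y_t = \ell_{t-1}(1 + \varepsilon_t), & \ell_t &= \ell_{t-1}(1 + \alpha\,\varepsilon_t)
\end{aligned}
```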

Hyndman’s framework fits each of the 24 possible exponential smoothing state space models to the data set and selects the best one using the information criteria AIC, AICc and BIC. For our dataset, the optimal model was ETS(M,N,N), which has multiplicative errors, no trend and no seasonality. We also included ETS(A,N,N), which corresponds to Simple Exponential Smoothing (SES), so that both served as candidate models for the final comparison with ARIMA.
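Selecting between ETS(A,N,N) and ETS(M,N,N) by information criteria can be sketched as follows. This is a simplified stdlib-only illustration, not the framework's actual implementation. The initial level is fixed at the first observation instead of being estimated. α is chosen on a 0.01 grid by maximum likelihood. k = 2 parameters (α and σ²) are counted. The multiplicative model's likelihood includes the Jacobian term Σ log|ℓ_{t-1}|, which keeps it comparable with the additive model on the same data. Function names and data are hypothetical.

```python
import math


def ets_nn_loglik(series, alpha, error="A"):
    """Gaussian log-likelihood of ETS(A,N,N) or ETS(M,N,N), sigma^2 profiled out.

    Assumption: the initial level l0 is fixed at the first observation.
    ETS(M,N,N) requires strictly positive data; a constant series gives sigma^2 = 0.
    """
    level = series[0]
    sse, log_jacobian = 0.0, 0.0
    for y in series:
        if error == "A":
            eps = y - level                  # y_t = l_{t-1} + eps_t
            level = level + alpha * eps
        else:
            eps = (y - level) / level        # y_t = l_{t-1} * (1 + eps_t)
            log_jacobian += math.log(abs(level))
            level = level * (1 + alpha * eps)
        sse += eps * eps
    n = len(series)
    sigma2 = sse / n                         # maximum-likelihood error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) - log_jacobian


def information_criteria(loglik, k, n):
    """AIC, AICc and BIC for a model with k estimated parameters and n observations."""
    aic = -2 * loglik + 2 * k
    return {
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": aic + k * (math.log(n) - 2),
    }


def fit_ets_nn(series, error):
    """Grid-search alpha in 0.01..0.99 by maximum likelihood; return alpha and criteria."""
    alpha = max((i / 100 for i in range(1, 100)),
                key=lambda a: ets_nn_loglik(series, a, error))
    loglik = ets_nn_loglik(series, alpha, error)
    return alpha, information_criteria(loglik, k=2, n=len(series))


if __name__ == "__main__":
    weekly = [310, 298, 325, 340, 332, 351, 347, 366, 359, 372, 380, 368]  # hypothetical
    results = {e: fit_ets_nn(weekly, e) for e in ("A", "M")}
    for e, (alpha, ic) in results.items():
        print(f"ETS({e},N,N): alpha={alpha:.2f} AICc={ic['AICc']:.2f}")
    best = min(results, key=lambda e: results[e][1]["AICc"])
    print(f"Lower AICc: ETS({best},N,N)")
```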


Comparison between ARIMA and Exponential Smoothing Forecasting Models

This paper investigated two commonly used forecasting methods, Exponential Smoothing and ARIMA, and applied them to predict customer count. A summary of the ETS and ARIMA models is shown below:

[Table image Tennet-error-table3.png: summary of in-sample and out-of-sample errors for the ETS and ARIMA models]

ARIMA(1,1,1) produced an in-sample RMSE of 92.26, which was not the best-performing in-sample result. When predicting out-of-sample, its RMSE rose to 107.120, higher than that of ARIMA(0,1,1) and ARIMA(0,1,2), which had lower in-sample errors. This suggests that even the best-performing models may still be slightly overfitted, as their accuracy drops on out-of-sample forecasts. The Exponential Smoothing models, on the other hand, had an out-of-sample RMSE much higher than their in-sample RMSE. This indicates overfitting and makes them unsuitable for forecasting customer count on a weekly basis.

In conclusion, we determined that ARIMA is the better of the two time series forecasting approaches for weekly forecasting of customer count.