Business Analytics Report
Submitted to:
Concerned faculty at
Great Learning
The University of Texas at Austin
Submitted By:
Mr. Charit Sharma
PGPDSBA Online July 2021
Subject: Time Series Forecasting – ROSE DATA SET
1 Time Series Forecasting Project Report by Charit Sharma
Problem Statement
For this assignment, data on the sales of different types of wine in the 20th century is to be
analyzed. Both datasets are from the same company but cover different wines. As an analyst at
ABC Estate Wines, you are tasked to analyze and forecast wine sales in the 20th century.
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model using
RMSE on the test data. Other additional models such as regression, naïve forecast models,
simple average models, moving average models should also be built on the training data and
check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built on using appropriate
statistical tests and also mention the hypothesis for the statistical test. If the data is found to
be non-stationary, take appropriate steps to make it stationary. Check the new data for
stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data
and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data
and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures that the
company should be taking for future sales.
1. Read the data as an appropriate Time Series data and plot the data.
Import the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6
Next, read the dataset with the pandas read command.
Then apply the head command to view the first 5 rows.
Rose
YearMonth
0 1980-01 112.0
1 1980-02 118.0
2 1980-03 129.0
3 1980-04 99.0
4 1980-05 116.0
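The read step can be sketched as follows. This is a minimal sketch, assuming the file has a `YearMonth` column and a `Rose` column as in the output above; the inline CSV text and the name `df_demo` are stand-ins, because the file name is not given in this report.

```python
import io
import pandas as pd

# Stand-in for the real CSV file (assumption: columns YearMonth and Rose).
csv_text = """YearMonth,Rose
1980-01,112.0
1980-02,118.0
1980-03,129.0
1980-04,99.0
1980-05,116.0
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Convert YearMonth to real dates and use it as the index so that
# pandas treats the data as a time series.
raw["YearMonth"] = pd.to_datetime(raw["YearMonth"], format="%Y-%m")
df_demo = raw.set_index("YearMonth")

print(df_demo.head())
```

With a datetime index in place, year-based filtering and resampling work directly on the frame.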
Next, apply the tail command to view the last 5 rows.
Then apply the describe command to check the basic statistical details, such as
percentiles, mean and standard deviation, of a data frame or a series of numeric values.
Next we plot the series.
The plot shows a downward trend: in 1981 the value was well above 250,
while in 1995 it was only marginally above 50, close to 60.
2. Perform appropriate Exploratory Data Analysis to understand the data and
also perform decomposition.
We will apply the Transcribe command to check the data in one table
Now we apply the isnull function to check the missing values or the null values
in the dataset
Now we will apply the shape command to check the size and dimension of the
dataframe
(187, 1)
Then we will apply the df.info to get further info of the dataframe
Next we construct a boxplot by year; the X label is the Rose Wine Sale and the Y
label is the number of years.
Afterwards we construct a boxplot by month to check the monthly trend
plt.xlabel('Monthly Rose Wine Sale');
plt.ylabel('Months');
Next we construct a monthly plot using statsmodels.
It is a monthly plot of Rose wine production.
month_plot(df,ylabel='Rose Wine Production',ax=ax)
plt.grid();
Now we will construct a pivot table for monthly sales across years of Rose
Wine
Then we will construct a plot to show the monthly sales across the years.
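A pivot table of this kind can be sketched as below. This is an illustrative sketch on synthetic values, not the Rose data; the names `sales` and `monthly_by_year` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption: illustrative values only).
idx = pd.date_range("1980-01-01", periods=24, freq="MS")
sales = pd.DataFrame({"Rose": np.arange(1, 25, dtype=float)}, index=idx)

# Rows are years, columns are calendar months (1 to 12).
monthly_by_year = pd.pivot_table(
    sales,
    values="Rose",
    index=sales.index.year,
    columns=sales.index.month,
)

print(monthly_by_year)
```

Calling `monthly_by_year.plot()` would then draw one line per month across the years.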
Now we will check the sum of the observations for each year
Next we compute the yearly mean, df_yearly_mean, and apply the head command to view
the first 5 rows
Then we will check the mean of the observations for each year
Now we will check the quarterly mean
Now we will plot the quarterly sum with df_quarterly_sum.plot();
We have now checked the monthly information
df_daily_sum = df.resample('D').sum()
df_daily_sum.plot()
We have now checked the daily and yearly trends
df_decade_sum = df.resample('10Y').sum()
We have now checked the decade-level information
The following plot shows the trend
df_decade_sum.plot();
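The aggregation steps above can be sketched on a small synthetic series. This is a sketch under assumptions: the values are made up, and the `'YS'` and `'QS'` (period-start) aliases are used here because the accepted spelling of the year-end and quarter-end aliases differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: three years of a constant 10 units per month.
idx = pd.date_range("1980-01-01", periods=36, freq="MS")
sales = pd.DataFrame({"Rose": np.full(36, 10.0)}, index=idx)

# Yearly sum and yearly mean, labelled by the start of each year.
yearly_sum = sales["Rose"].resample("YS").sum()
yearly_mean = sales["Rose"].resample("YS").mean()

# Quarterly mean, labelled by the start of each quarter.
quarterly_mean = sales["Rose"].resample("QS").mean()

print(yearly_sum)
print(quarterly_mean.head())
```

Note that resampling monthly data to a finer frequency such as daily does not add information; summing then yields zero on every day that has no observation.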
Now we will import ECDF from statsmodels
# statistics
from statsmodels.distributions.empirical_distribution import ECDF
cdf = ECDF(df['Rose'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
The x-axis of the plot shows the sales of Rose wine
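To show what the empirical CDF plot contains, here is a hand-rolled sketch with NumPy. It is not the statsmodels implementation used above; the function name `ecdf` is hypothetical and the input is the five values from the head of the data.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

x, y = ecdf([112.0, 118.0, 129.0, 99.0, 116.0])
print(x)
print(y)
```

Reading the curve at a given sales level gives the share of months with sales at or below that level.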
# group by date and get average RetailSales, and percent change
df.loc['1994']
df.interpolate(method='spline',order=3,inplace=True)
df.loc['1994']
Now we will check the missing values in the dataframe
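The effect of interpolation on missing values can be sketched as follows. This is a sketch with hypothetical values; it uses linear interpolation rather than the order-3 spline above so that it runs without SciPy.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly values with two consecutive gaps.
idx = pd.date_range("1994-05-01", periods=6, freq="MS")
demo = pd.DataFrame({"Rose": [44.0, 45.0, np.nan, np.nan, 46.0, 51.0]}, index=idx)

missing_before = int(demo["Rose"].isnull().sum())

# Linear interpolation fills each gap on a straight line between neighbours.
filled = demo.interpolate(method="linear")
missing_after = int(filled["Rose"].isnull().sum())

print(missing_before, missing_after)
print(filled)
```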
Now we will import seasonal_decompose from statsmodels
decomposition = seasonal_decompose(df['Rose'],model='additive')
The decomposition plot shows the trend, seasonal and residual components.
trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid
deaseasonalized_ts = trend + residual
decomposition = seasonal_decompose(df['Rose'],model='multiplicative')
trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid
deaseasonalized_ts = trend * residual  # multiplicative model: components combine by multiplication
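To make the additive decomposition concrete, here is a hand-rolled sketch of the classical procedure on a synthetic series. It is not the statsmodels implementation; the series, the seasonal `pattern` and all variable names are assumptions chosen so that the result is easy to check.

```python
import numpy as np
import pandas as pd

# Synthetic series: linear downward trend plus a 12-month pattern that sums to 0.
idx = pd.date_range("1980-01-01", periods=48, freq="MS")
t = np.arange(48)
pattern = np.array([-5, -3, 0, 2, 4, 6, 5, 3, 0, -2, -4, -6], dtype=float)
series = pd.Series(100.0 - 0.5 * t + np.tile(pattern, 4), index=idx)

# Trend: centred 2x12 moving average (NaN for the first and last 6 months).
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value per calendar month, centred on zero.
detrended = series - trend
seasonal_means = detrended.groupby(detrended.index.month).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.loc[series.index.month].to_numpy(), index=idx)

# Residual is what remains; the deseasonalised series is trend + residual.
residual = series - trend - seasonal
deseasonalised = trend + residual
```

For a multiplicative model the same steps use division instead of subtraction, and the deseasonalised series is the product of trend and residual.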
"Original Time Series", "Time Series without Seasonality Component"
3. Split the data into training and test. The test data should start in 1991.
train=df[df.index.year< 1991]
test=df[df.index.year>=1991]
Then we will apply the shape command
print(train.shape)
(132, 1)
print(test.shape)
(55, 1)
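The split can be checked on a stand-in frame. This is a sketch assuming 187 monthly rows starting in January 1980, as the shape and head output above suggest; the values themselves are placeholders.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same length and start date as the Rose data.
idx = pd.date_range("1980-01-01", periods=187, freq="MS")
demo = pd.DataFrame({"Rose": np.arange(187, dtype=float)}, index=idx)

# Everything before 1991 is training data; 1991 onwards is test data.
train_demo = demo[demo.index.year < 1991]
test_demo = demo[demo.index.year >= 1991]

print(train_demo.shape, test_demo.shape)
```

Eleven full years (1980 to 1990) give 132 training rows, which leaves 55 rows for the test set.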
We will then print the first few and the last few rows of the training data.
Now we will do the same for the test data and print its first few and last few rows.
Now we will plot the train and test data
train['Rose'].plot(figsize=(13,5), fontsize=14)
test['Rose'].plot(figsize=(13,5), fontsize=14)
4. Build various exponential smoothing models on the training data and evaluate the model
using RMSE on the test data. Other models such as regression, naïve forecast models and
simple average models should also be built on the training data and checked for
performance on the test data using RMSE.
train_time = [i+1 for i in range(len(train))]
test_time = [i+len(train)+1 for i in range(len(test))]  # continues after the 132 training rows, so it starts at 133
Model 1: Linear Regression
LinearRegression_train = train.copy()
LinearRegression_test = test.copy()
Now we will import LinearRegression from sklearn
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
'Regression On Time_Test Data'
from sklearn import metrics
## Test Data – RMSE
For RegressionOnTime forecast on the Test Data, RMSE is 51.433
resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['RegressionOnTime'])
resultsDf
                  Test RMSE
RegressionOnTime  51.433312
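Regression on time can be sketched without sklearn. This is a minimal sketch on made-up values: `np.polyfit` stands in for `LinearRegression`, the helper `rmse` is hypothetical, and the test time index is built so that it continues directly after the training index.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up series that falls by one unit per period.
train_y = np.array([10.0, 9.0, 8.0, 7.0, 6.0])
test_y = np.array([5.0, 4.0, 3.0])

# Time index: 1..n for training, n+1.. for test (no overlap, no gap).
train_time = np.arange(1, len(train_y) + 1)
test_time = np.arange(len(train_y) + 1, len(train_y) + len(test_y) + 1)

# Fit a straight line on the training period and extend it over the test period.
slope, intercept = np.polyfit(train_time, train_y, deg=1)
test_pred = intercept + slope * test_time

print(rmse(test_y, test_pred))
```

If the test index overlapped the training index, the fitted line would be evaluated at the wrong points in time and the test RMSE would be inflated.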
Model 2 : Naive Approach
NaiveModel_train = train.copy()
NaiveModel_test = test.copy()
NaiveModel_test['naive'] = np.asarray(train['Rose'])[len(np.asarray(train['Rose']))-1]
NaiveModel_test['naive'].head()
'Naive Forecast on Test Data'
## Test Data – RMSE
For Naive forecast on the Test Data, RMSE is 79.719
resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['NaiveModel'])
                  Test RMSE
RegressionOnTime  51.433312
NaiveModel        79.718773
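The naive approach repeats the last training observation for every test period. A minimal sketch with illustrative values (the test values are made up):

```python
import numpy as np

train_y = np.array([112.0, 118.0, 129.0, 99.0, 116.0])
test_y = np.array([110.0, 120.0, 116.0])   # made-up test values

# Every forecast equals the last value seen in training.
naive_pred = np.full(len(test_y), train_y[-1])

naive_rmse = float(np.sqrt(np.mean((test_y - naive_pred) ** 2)))
print(naive_pred, naive_rmse)
```

Because the forecast is a flat line, the naive model performs poorly whenever the series keeps trending after the split.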
Model 3: Simple Average
SimpleAverage_train = train.copy()
SimpleAverage_test = test.copy()
'Simple Average on Test Data'
## Test Data – RMSE
For Simple Average forecast on the Test Data, RMSE is 53.461
resultsDf_3 = pd.DataFrame({'Test RMSE':
[rmse_model3_test]},index=['SimpleAverageModel'])
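The simple average model forecasts the mean of the training data for every test period. A minimal sketch with illustrative values (the test values are made up):

```python
import numpy as np

train_y = np.array([112.0, 118.0, 129.0, 99.0, 116.0])
test_y = np.array([110.0, 120.0, 116.0])   # made-up test values

# Every forecast equals the mean of the training observations.
mean_pred = np.full(len(test_y), train_y.mean())

mean_rmse = float(np.sqrt(np.mean((test_y - mean_pred) ** 2)))
print(mean_pred, mean_rmse)
```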
Model 4: Moving Average
Average means
# Plotting on the whole data
#Creating train and test set
# Test Data - RMSE --> 2 point Trailing MA
For 2 point Moving Average Model forecast on the Training Data, RMSE is 11.529
For 4 point Moving Average Model forecast on the Training Data, RMSE is 14.451
For 6 point Moving Average Model forecast on the Training Data, RMSE is 14.566
For 9 point Moving Average Model forecast on the Training Data, RMSE is 14.728
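Trailing moving averages can be sketched with the pandas rolling window. This is a sketch on made-up values, assuming the averages are computed on the whole series first and the last part is then treated as the test period.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])

# Trailing (backward-looking) averages over the last 2 and last 4 points.
trailing_2 = s.rolling(2).mean()
trailing_4 = s.rolling(4).mean()

# Treat the last two points as the test period and score the 2-point average.
test_actual = s.iloc[-2:].to_numpy()
test_ma = trailing_2.iloc[-2:].to_numpy()
ma_rmse = float(np.sqrt(np.mean((test_actual - test_ma) ** 2)))

print(trailing_2.tolist())
print(ma_rmse)
```

Shorter windows track the data more closely, which is consistent with the 2-point average having the lowest RMSE in the figures above.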
Model 5: Simple Exponential Smoothing
Exponential smoothing is generally used to make short term forecasts, but longer-term forecasts
using this technique can be quite unreliable.
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
import warnings
warnings.filterwarnings("ignore")
SES_train = train.copy()
SES_test = test.copy()
## Plotting on both the Training and Test data
'Alpha =0.098 Predictions for Rose Wine'
## Test Data
rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Rose'],SES_test['predict'],squared=False)
For Alpha =0.098 Simple Exponential Smoothing Model forecast on the Test
Data, RMSE is 36.796
resultsDf_5 = pd.DataFrame({'Test RMSE':
[rmse_model5_test_1]},index=['Alpha=0.098,SimpleExponentialSmoothing'])
## First we will define an empty dataframe to store our values from the loop
## Plotting on both the Training and Test data
"Alpha= 0.3 Simple Exponential Smoothing Rose Wine
'Alpha=0.3,SimpleExponentialSmoothing','Alpha=0.4,SimpleExponentialSmoothing'
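The role of alpha in simple exponential smoothing can be seen in a hand-written version of the recursion. This is a sketch, not the statsmodels implementation: the function name is hypothetical and the level is initialised with the first observation, which is an assumption (statsmodels offers other initialisations).

```python
def simple_exp_smoothing(values, alpha):
    """Return one-step-ahead fitted values and the final smoothed level."""
    level = values[0]              # assumption: start from the first observation
    fitted = []
    for y in values:
        fitted.append(level)       # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level
    return fitted, level

fitted, final_level = simple_exp_smoothing([10.0, 20.0, 30.0], alpha=0.5)
print(fitted, final_level)
```

The forecast for every future period is the final level, so the test-set prediction is a flat line; a small alpha such as 0.098 makes that level change slowly in response to new observations.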
Model 6: Double Exponential Smoothing - Holt's Model
Holt's double-parameter exponential smoothing is an extension of Brown's
method: in the Holt model a growth (trend) factor is added to the smoothing equation.
DES_train = train.copy()
DES_test = test.copy()
## First we will define an empty dataframe to store our values from the loop
Alpha Values  Beta Values  Train RMSE  Test RMSE
for i in np.arange(0.3,1.1,0.1):
    for j in np.arange(0.1,1.1,0.1):
        model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing_slope=j,optimized=False,use_brute=True)
        DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
        DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=len(test))
        rmse_model6_train = metrics.mean_squared_error(DES_train['Rose'],DES_train['predict',i,j],squared=False)
        rmse_model6_test = metrics.mean_squared_error(DES_test['Rose'],DES_test['predict',i,j],squared=False)
        resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Values':j,'Train RMSE':rmse_model6_train,'Test RMSE':rmse_model6_test}, ignore_index=True)
resultsDf_7.sort_values(by=['Test RMSE']).head()
## Plotting on both the Training and Test data
'Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing predictions on Test Set'
resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7['Test RMSE'][0]]}
,index=['Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing'])
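The level and trend updates behind Holt's model can be sketched by hand. This is a sketch, not the statsmodels implementation: the function name is hypothetical, and the initial level and trend are taken from the first two observations, which is an assumption.

```python
def holt_forecast(values, alpha, beta, steps):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = values[0]
    trend = values[1] - values[0]          # assumption: simple initial trend
    for y in values[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast h steps ahead by extending the last level along the last trend.
    return [level + h * trend for h in range(1, steps + 1)]

forecasts = holt_forecast([2.0, 4.0, 6.0, 8.0], alpha=0.3, beta=0.3, steps=3)
print(forecasts)
```

On perfectly linear data the method continues the line regardless of alpha and beta; on real data the grid search above looks for the pair that gives the lowest test RMSE.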
Model 7: Triple Exponential Smoothing - Holt Winter's Model
TripleExponentialSmoothing predictions on Test Set
5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.
## Test for stationarity of the series - Dickey-Fuller test
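For the augmented Dickey-Fuller test, the null hypothesis is that the series has a unit root (it is non-stationary) and the alternative is that it is stationary; the null is rejected when the p-value is below alpha = 0.05. The decision rule and the usual remedy, first differencing, can be sketched as follows. The helper `reject_null` and the synthetic series are assumptions for illustration; the test statistic itself is not computed here.

```python
import numpy as np
import pandas as pd

def reject_null(p_value, alpha=0.05):
    """True when the unit-root null is rejected, i.e. the series looks stationary."""
    return p_value < alpha

# Synthetic series with a steady downward trend (non-stationary mean).
t = np.arange(24, dtype=float)
trending = pd.Series(200.0 - 2.0 * t)

# First differencing removes the linear trend; the first value becomes NaN.
differenced = trending.diff().dropna()

print(differenced.head())
print(reject_null(0.01), reject_null(0.30))
```

After differencing, the stationarity test is run again on the differenced series, and the order of differencing used becomes the `d` term of the ARIMA model.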
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
'Rose Wine Differenced Data Partial Autocorrelation'
## Sort the above AIC values in ascending order to get the parameters for the minimum AIC value
resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]}
,index=['ARIMA(0,1,2)'])
resultsDf = pd.concat([resultsDf, resultsDf_9])
resultsDf
Now we will create a loop for pdq values
import statsmodels.api as sm
for param in pdq:
    for param_seasonal in model_pdq:
        SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
                                                 order=param,
                                                 seasonal_order=param_seasonal,
                                                 enforce_stationarity=False,
                                                 enforce_invertibility=False)
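The `pdq` and `model_pdq` grids that the loop iterates over are not shown in this report. A possible construction is sketched below; the parameter ranges are assumptions, and the seasonal period of 6 is taken from the SARIMA(1,1,2)(2,0,2,6) label used later in the report.

```python
import itertools

# Assumed search ranges: p and q in {0, 1, 2}, one order of differencing.
p = q = range(0, 3)
d = range(1, 2)
D = range(0, 1)            # assumed: no seasonal differencing
seasonal_period = 6        # taken from the SARIMA label later in the report

# Non-seasonal (p, d, q) combinations.
pdq = list(itertools.product(p, d, q))

# Seasonal (P, D, Q, s) combinations.
model_pdq = [(P, D_, Q, seasonal_period) for P, D_, Q in itertools.product(p, D, q)]

print(len(pdq), len(model_pdq), len(pdq) * len(model_pdq))
```

Each pair of tuples is fitted once, its AIC is recorded, and the pair with the lowest AIC is kept.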
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF
on the training data and evaluate this model on the test
data using RMSE.
Manual ARIMA
predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))
15.732718754522914
MANUAL SARIMA
A df.plot() is constructed to check the trend
8. Build a table (create a data frame) with all the models built along with their corresponding
parameters and the respective RMSE values on the test data.
temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]},
index=['SARIMA(1,1,2)(2,0,2,6)'])
('Sorted by RMSE values on the Test Data for Rose Wine sale:','\n',)
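Collecting and sorting the results can be sketched as follows, using three of the test RMSE values reported earlier; the variable names `results`, `extra` and `sorted_results` are hypothetical.

```python
import pandas as pd

# Test RMSE values as reported above for the first three models.
results = pd.DataFrame(
    {"Test RMSE": [51.433312, 79.718773]},
    index=["RegressionOnTime", "NaiveModel"],
)
extra = pd.DataFrame({"Test RMSE": [53.461]}, index=["SimpleAverageModel"])

# Append the new row, then sort so the best (lowest RMSE) model comes first.
results = pd.concat([results, extra])
sorted_results = results.sort_values(by="Test RMSE")

print("Sorted by RMSE values on the Test Data for Rose Wine sale:")
print(sorted_results)
```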
9. Based on the model-building exercise, build the most optimum model(s) on the complete
data and predict 12 months into the future with appropriate confidence intervals/bands.
Building the most optimum model on the Full Data.
'Plot of Exponential Smoothing Actual Values for Rose Wine Productions'
fullmodel = ExponentialSmoothing(df,trend='additive',
seasonal='multiplicative').fit(smoothing_level=0.3, smoothing_slope=0.4,
smoothing_seasonal=0.3)
RMSE: 20.672560612957582
# Getting the predictions for the same number of time stamps that are present in the test data
prediction = fullmodel.forecast(steps=len(test))
# In the code below, we calculate the upper and lower confidence bands at the 95% confidence level.
# The percentile function under numpy lets us calculate these; adding and subtracting from the predictions
# gives us the necessary confidence bands for the predictions.
# plot the forecast along with the confidence band
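One common way to build such a band is sketched below. It is an assumption, not the calculation used in this report: instead of the percentile approach mentioned in the comments, it adds and subtracts 1.96 residual standard deviations, which corresponds to a 95% band if the residuals are roughly normal. The function name and all values are hypothetical.

```python
import numpy as np
import pandas as pd

def confidence_band(prediction, residuals, z=1.96):
    """Lower and upper bands at a fixed distance of z residual standard deviations."""
    spread = z * np.std(residuals, ddof=1)
    return pd.DataFrame({
        "lower_CI": prediction - spread,
        "prediction": prediction,
        "upper_CI": prediction + spread,
    })

# Hypothetical forecasts and in-sample residuals.
prediction = np.array([50.0, 48.0, 47.0])
residuals = np.array([1.0, -1.0, 1.0, -1.0])

band = confidence_band(prediction, residuals)
print(band)
```

This band has the same width at every horizon, so it understates the uncertainty of forecasts further into the future.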