Business Report TSF - Rose DataSet

The document is a business analytics report submitted by Mr. Charit Sharma to concerned faculty at Great Learning regarding a time series forecasting project on wine sales data. The report details the analysis conducted including reading and plotting the data, exploratory data analysis, splitting the data into training and test sets, building various forecasting models like exponential smoothing, regression, naive, simple average, and moving average on the training data and evaluating their performance on the test data using RMSE. Models are built to forecast 12 months into the future and findings are reported with recommendations for the company.

Uploaded by

Charit Sharma


Business Analytics Report

Submitted to:
Concerned faculty at Great Learning
The University of Texas at Austin

Submitted By:

Mr. Charit Sharma

PGPDSBA Online July 2021
Subject: - Time Series Forecasting – ROSE DATA SET

1 Time Series Forecasting Project Report by Charit Sharma

Contents
Problem Statement
For this assignment, sales data for different types of wine in the 20th century is to be
analyzed. The data sets come from the same company but cover different wines. As an analyst at
ABC Estate Wines, you are tasked with analyzing and forecasting wine sales in the 20th century.

1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model using
RMSE on the test data. Other additional models such as regression, naïve forecast models,
simple average models, moving average models should also be built on the training data and
check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built, using appropriate
statistical tests, and also mention the hypothesis for the statistical test. If the data is found to
be non-stationary, take appropriate steps to make it stationary. Check the new data for
stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data
and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data
and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures that the
company should be taking for future sales.
1. Read the data as an appropriate Time Series data and plot the data.

Import the necessary libraries

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6

Next, read the dataset using the read command in pandas

Then apply the head command to view the first 5 rows

   YearMonth   Rose
0    1980-01  112.0
1    1980-02  118.0
2    1980-03  129.0
3    1980-04   99.0
4    1980-05  116.0
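The read step described above can be sketched as follows. This is only an illustration: the report does not show the actual read call or file name, so the in-memory CSV text (built from the five rows printed above) and the name df_demo are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the data file: the five rows printed by head() above.
csv_text = """YearMonth,Rose
1980-01,112.0
1980-02,118.0
1980-03,129.0
1980-04,99.0
1980-05,116.0
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Convert the year-month strings to timestamps and use them as the index,
# so that pandas treats the frame as a time series.
raw["YearMonth"] = pd.to_datetime(raw["YearMonth"], format="%Y-%m")
df_demo = raw.set_index("YearMonth")

print(df_demo.head())
print(type(df_demo.index).__name__)
```

With a DatetimeIndex in place, expressions such as df_demo.index.year (used for the train/test split later in the report) become available.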

Afterwards, apply the tail command to view the last 5 rows

Then apply the describe command to check basic statistical details such as
percentiles, mean and standard deviation of a data frame or a series of numeric values.


Now we will plot the series.

The plot shows a downward trend: in 1981 the value was well above 250,
while in 1995 it is only marginally above 50, close to 60.

2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
We will apply the Transcribe command to check the data in one table

Now we apply the isnull function to check for missing (null) values
in the dataset
Next we apply the shape command to check the size and dimensions of the
dataframe

(187, 1)

Then we will apply df.info() to get further information about the dataframe

Now we will construct a boxplot in which the X label is the Rose Wine Sale and the Y
label is the years

Afterwards we will construct a boxplot to check the monthly trend

plt.xlabel('Monthly Rose Wine Sale');

plt.ylabel('Months');


Now we will construct a monthly plot using statsmodels
It is a monthly plot of rose wine production.
from statsmodels.graphics.tsaplots import month_plot
fig, ax = plt.subplots()
month_plot(df,ylabel='Rose Wine Production',ax=ax)
plt.grid();

Now we will construct a pivot table for monthly sales across years of Rose
Wine
Then we will construct a plot to show the monthly sales across the years.
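A minimal sketch of such a pivot table is given here. The report does not show its own pivot code, so the helper columns year and month and the made-up values are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series covering two years (values are made up).
idx = pd.date_range("1980-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"Rose": np.arange(24, dtype=float)}, index=idx)

# Helper columns derived from the DatetimeIndex.
demo["year"] = demo.index.year
demo["month"] = demo.index.month

# One row per year, one column per month, each cell holding that month's value.
monthly_by_year = demo.pivot_table(values="Rose", index="year", columns="month")
print(monthly_by_year)
```

Calling monthly_by_year.plot() on a table of this shape draws one line per month across the years.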

Now we will check the sum of the observations of each year


Now we will check the yearly mean, df_yearly_mean, and
apply the head command to view the first 5 rows

Then we will check the 'Mean of the Observations of each year'

Now we will check the Quarterly mean

Now we will construct the df_quarterly_sum.plot();
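The yearly and quarterly aggregations described above can be sketched as follows. This version groups on the index attributes instead of calling resample, which avoids depending on the spelling of frequency aliases (these have changed between pandas versions). The data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: the values 1..24 over two years.
idx = pd.date_range("1980-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"Rose": np.arange(1.0, 25.0)}, index=idx)

# Sum and mean of the observations of each year.
yearly_sum = demo.groupby(demo.index.year)["Rose"].sum()
yearly_mean = demo.groupby(demo.index.year)["Rose"].mean()

# Mean of each quarter, keyed by (year, quarter).
quarterly_mean = demo.groupby([demo.index.year, demo.index.quarter])["Rose"].mean()

print(yearly_sum)
print(yearly_mean)
print(quarterly_mean.head())
```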

df_daily_sum = df.resample('D').sum()

Now we checked the monthly info


df_daily_sum.plot()

Now we checked the Daily yearly trends

df_decade_sum = df.resample('10Y').sum()

Now we checked the decade info

Now we have constructed the plot to show the trend

df_decade_sum.plot();

Now we will import ECDF from statsmodels

# statistics
from statsmodels.distributions.empirical_distribution import ECDF

cdf = ECDF(df['Rose'])
plt.plot(cdf.x, cdf.y, label = "statmodels");

The X label of the plot shows the sales of rose wine


# group by date and get average RetailSales, and precent change

df.loc['1994']
# the keyword is method (singular); spline interpolation needs SciPy installed
df.interpolate(method='spline',order=3,inplace=True)

df.loc['1994']
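To show what the interpolation step does, here is a small sketch on made-up numbers. It uses linear interpolation rather than the order-3 spline above so that it runs without SciPy. The values and dates are hypothetical and are not taken from the Rose data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly values with two consecutive gaps.
idx = pd.date_range("1994-05-01", periods=5, freq="MS")
demo = pd.DataFrame({"Rose": [10.0, 20.0, np.nan, np.nan, 50.0]}, index=idx)

missing_before = int(demo["Rose"].isnull().sum())

# Linear interpolation fills each gap on the straight line between its neighbours.
filled = demo.interpolate(method="linear")
missing_after = int(filled["Rose"].isnull().sum())

print(missing_before, missing_after)
print(filled)
```

Checking isnull().sum() before and after, as the report does next, confirms that no gaps remain.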

Now we will check the missing values in the dataframe


Now we will import seasonal_decompose from statsmodels

decomposition = seasonal_decompose(df['Rose'],model='additive')

This shows the trend, seasonal and residual graphs.

trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid
deaseasonalized_ts = trend + residual

decomposition = seasonal_decompose(df['Rose'],model='multiplicative')


trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid

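# in a multiplicative decomposition the components multiply, so the
# deseasonalized series is trend * residual rather than trend + residual
deaseasonalized_ts = trend * residual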

"Original Time Series", "Time Series without Seasonality Component"

3. Split the data into training and test. The test data should start in 1991.

train=df[df.index.year< 1991]
test=df[df.index.year>=1991]

Then we will apply the shape command to both sets

print(train.shape)
(132, 1)

print(test.shape)
(55, 1)

We will then print first few rows of training data and last few rows of training
data.


Now we will do the same for the test data and print its first few and last few rows

Now we will plot the test and train data

train['Rose'].plot(figsize=(13,5), fontsize=14)
test['Rose'].plot(figsize=(13,5), fontsize=14)

4. Build various exponential smoothing models on the training data and evaluate the model
using RMSE on the test data. Other models such as regression, naïve forecast models and
simple average models should also be built on the training data, and their
performance checked on the test data using RMSE.

train_time = [i+1 for i in range(len(train))]

# the test time index continues where the training index ends (train has 132 rows)
test_time = [i+len(train)+1 for i in range(len(test))]

Model 1: Linear Regression

LinearRegression_train = train.copy()
LinearRegression_test = test.copy()

Now we will import LinearRegression from the sklearn linear_model module

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

'Regression On Time_Test Data'


from sklearn import metrics

## Test Data – RMSE

For RegressionOnTime forecast on the Test Data, RMSE is 51.433

resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['RegressionOnTime'])
resultsDf

                  Test RMSE
RegressionOnTime  51.433312
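The regression-on-time idea can be sketched with plain NumPy. This is an illustration under stated assumptions: np.polyfit stands in for sklearn's LinearRegression, and the numbers are made up rather than taken from the Rose data.

```python
import numpy as np

# Hypothetical training and test values lying on one straight line.
train_y = np.array([10.0, 12.0, 14.0, 16.0])
test_y = np.array([18.0, 20.0])

# Time is simply the running observation number; the test index
# continues where the training index stops.
train_time = np.arange(1, len(train_y) + 1)
test_time = np.arange(len(train_y) + 1, len(train_y) + len(test_y) + 1)

# Fit y = intercept + slope * time on the training data only.
slope, intercept = np.polyfit(train_time, train_y, deg=1)

# Predict the test period and score it with RMSE.
test_pred = intercept + slope * test_time
rmse = float(np.sqrt(np.mean((test_y - test_pred) ** 2)))

print(slope, intercept, rmse)
```

Because the made-up data are exactly linear, the RMSE here is essentially zero. On real sales data the fitted line only captures the overall trend.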

Model 2 : Naive Approach

NaiveModel_train = train.copy()
NaiveModel_test = test.copy()

# the naive forecast repeats the last observed training value for every test period
NaiveModel_test['naive'] = train['Rose'].iloc[-1]
NaiveModel_test['naive'].head()

'Naive Forecast on Test Data'

## Test Data – RMSE

For Naive forecast on the Test Data, RMSE is 79.719

resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['NaiveModel'])

                  Test RMSE
RegressionOnTime  51.433312
NaiveModel        79.718773

Model 3: Simple Average

SimpleAverage_train = train.copy()
SimpleAverage_test = test.copy()

'Simple Average on Test Data'


## Test Data – RMSE

For Simple Average forecast on the Test Data, RMSE is 53.461

resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test]},index=['SimpleAverageModel'])
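The naive and simple average baselines above can be sketched in a few lines. The helper name rmse and the numbers are assumptions made for this illustration.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical training and test values.
train_y = np.array([10.0, 20.0, 30.0])
test_y = np.array([30.0, 40.0])

# Naive model: every test period gets the last training value.
naive_pred = np.full(len(test_y), train_y[-1])

# Simple average model: every test period gets the training mean.
average_pred = np.full(len(test_y), train_y.mean())

naive_rmse = rmse(test_y, naive_pred)
average_rmse = rmse(test_y, average_pred)
print(naive_rmse, average_rmse)
```

Both baselines produce a flat line over the test period, so they cannot follow a trend or a seasonal pattern.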

Model 4: Moving Average

Average means
# Plotting on the whole data

#Creating train and test set

# Test Data - RMSE --> 2 point Trailing MA

For 2 point Moving Average Model forecast on the Training Data, RMSE is 11.529
For 4 point Moving Average Model forecast on the Training Data, RMSE is 14.451
For 6 point Moving Average Model forecast on the Training Data, RMSE is 14.566
For 9 point Moving Average Model forecast on the Training Data, RMSE is 14.728
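A trailing moving average of the kind reported above can be computed with the rolling method in pandas. The short series below is made up, and the variable names are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical series.
demo = pd.Series([1.0, 3.0, 5.0, 7.0, 9.0])

# Trailing (backward-looking) averages: each value is the mean of the
# current observation and the window-1 observations before it.
trailing_2 = demo.rolling(window=2).mean()
trailing_4 = demo.rolling(window=4).mean()

print(trailing_2.tolist())
print(trailing_4.tolist())
```

The first window-1 entries are NaN because a full window is not yet available. Shorter windows follow the data more closely, which is consistent with the 2 point average having the lowest RMSE in the list above.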


Model 5: Simple Exponential Smoothing

Exponential smoothing is generally used to make short term forecasts, but longer-term forecasts
using this technique can be quite unreliable.

from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
import warnings
warnings.filterwarnings("ignore")

SES_train = train.copy()
SES_test = test.copy()

## Plotting on both the Training and Test data

'Alpha =0.098 Predictions for Rose Wine'

## Test Data
# taking the square root of the MSE gives the RMSE on any scikit-learn version
rmse_model5_test_1 = np.sqrt(metrics.mean_squared_error(SES_test['Rose'],SES_test['predict']))

For Alpha =0.098 Simple Exponential Smoothing Model forecast on the Test
Data, RMSE is 36.796

resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1]},index=['Alpha=0.098,SimpleExponentialSmoothing'])
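To make the role of alpha concrete, here is a hand-rolled sketch of the simple exponential smoothing recursion. It is not the statsmodels implementation: the function name ses_forecast is hypothetical, and the level is initialised with the first observation, which is only one of several possible choices.

```python
import numpy as np


def ses_forecast(y, alpha, steps):
    """Flat forecast from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_(t-1), starting from level_0 = y_0.
    Every future step is forecast as the final level.
    """
    y = np.asarray(y, dtype=float)
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(steps, level)


# Hypothetical data: a small alpha reacts slowly, alpha = 1 copies the last value.
y_demo = [10.0, 20.0]
slow = ses_forecast(y_demo, alpha=0.5, steps=3)
fast = ses_forecast(y_demo, alpha=1.0, steps=3)
print(slow, fast)
```

A low alpha, such as the 0.098 reported above, puts most of the weight on the smoothed history rather than on the latest observation.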

## First we will define an empty dataframe to store our values from the loop


## Plotting on both the Training and Test data

"Alpha= 0.3 Simple Exponential Smoothing Rose Wine

'Alpha=0.3,SimpleExponentialSmoothing','Alpha=0.4,SimpleExponentialSmoothing'

Model 6: Double Exponential Smoothing - Holt's Model

Holt's Double Parameter Exponential Smoothing. This method is an extension of Brown's
method. In the Holt model a growth factor is added to the smoothing equation.
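The growth factor mentioned above can be illustrated with a hand-written version of Holt's recursion. This is a sketch, not the statsmodels code: the function name holt_forecast is hypothetical, and the initial level and trend are taken from the first two observations as a simple assumption.

```python
import numpy as np


def holt_forecast(y, alpha, beta, steps):
    """Forecast with Holt's linear trend method.

    level_t = alpha * y_t + (1 - alpha) * (level_(t-1) + trend_(t-1))
    trend_t = beta * (level_t - level_(t-1)) + (1 - beta) * trend_(t-1)
    The h-step forecast is level_T + h * trend_T.
    """
    y = np.asarray(y, dtype=float)
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend * np.arange(1, steps + 1)


# Hypothetical, perfectly linear data: the forecast continues the line.
forecast = holt_forecast([2.0, 4.0, 6.0, 8.0], alpha=0.3, beta=0.3, steps=3)
print(forecast)
```

Unlike the flat forecast of simple exponential smoothing, the forecast here keeps rising or falling with the last estimated trend.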

DES_train = train.copy()
DES_test = test.copy()
## First we will define an empty dataframe to store our values from the loop

Alpha Values  Beta Values  Train RMSE  Test RMSE

for i in np.arange(0.3,1.1,0.1):
    for j in np.arange(0.1,1.1,0.1):
        # smoothing_trend is the current statsmodels name for the former smoothing_slope argument
        model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing_trend=j,
                                            optimized=False,use_brute=True)
        DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
        DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=len(test))

        # square root of the MSE gives the RMSE
        rmse_model6_train = np.sqrt(metrics.mean_squared_error(
            DES_train['Rose'],DES_train['predict',i,j]))
        rmse_model6_test = np.sqrt(metrics.mean_squared_error(
            DES_test['Rose'],DES_test['predict',i,j]))
        # DataFrame.append was removed in pandas 2.0, so the new row is added with pd.concat
        new_row = pd.DataFrame([{'Alpha Values':i,'Beta Values':j,
                                 'Train RMSE':rmse_model6_train,'Test RMSE':rmse_model6_test}])
        resultsDf_7 = pd.concat([resultsDf_7, new_row], ignore_index=True)

resultsDf_7.sort_values(by=['Test RMSE']).head()


## Plotting on both the Training and Test data
'Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing predictions on Test Set

resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7['Test RMSE'][0]]},index=['Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing'])

Model 7: Triple Exponential Smoothing - Holt Winter's Model

TripleExponentialSmoothing predictions on Test Set


5. Check for the stationarity of the data on which the model is being built, using
appropriate statistical tests, and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.

## Test for stationarity of the series - Dickey-Fuller test

6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.


'Rose Wine Differenced Data Partial Autocorrelation')
## Sort the above AIC values in the ascending order to get the parameters for
the minimum AIC value

resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]},index=['ARIMA(0,1,2)'])
resultsDf = pd.concat([resultsDf, resultsDf_9])
resultsDf


Now we will create a loop for pdq values

import statsmodels.api as sm
for param in pdq:
    for param_seasonal in model_pdq:
        SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
                                                 order=param,
                                                 seasonal_order=param_seasonal,
                                                 enforce_stationarity=False,
                                                 enforce_invertibility=False)
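The loop above iterates over pdq and model_pdq, whose construction is not shown. One possible way to build them is sketched here with itertools. The ranges are assumptions chosen so that the orders reported in this report, ARIMA(0,1,2) and the seasonal part (2,0,2,6), fall inside the grid.

```python
import itertools

# Assumed search ranges (not taken from the report).
p = q = range(0, 3)      # AR and MA orders 0, 1, 2
d = range(1, 2)          # one round of differencing
seasonal_D = range(0, 1) # no seasonal differencing
seasonal_period = 6      # matches the period in the SARIMA label reported later

# All (p, d, q) combinations for the non-seasonal part.
pdq = list(itertools.product(p, d, q))

# All (P, D, Q, period) combinations for the seasonal part.
model_pdq = [(P, D, Q, seasonal_period)
             for P, D, Q in itertools.product(p, seasonal_D, q)]

print(len(pdq), len(model_pdq))
print(pdq[:3])
print(model_pdq[:3])
```

Each pair drawn from the two lists is fitted once, its AIC is recorded, and the pair with the lowest AIC is kept.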

7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF
on the training data and evaluate this model on the test
data using RMSE.
Manual ARIMA


predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))

15.732718754522914

MANUAL SARIMA
df.plot() is used to check the trend

8. Build a table (create a data frame) with all the models built along with their corresponding
parameters and the respective RMSE values on the test data.

temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]},index=['SARIMA(1,1,2)(2,0,2,6)'])
('Sorted by RMSE values on the Test Data for Rose Wine sale:','\n',)


9. Based on the model-building exercise, build the most optimum model(s) on the complete
data and predict 12 months into the future with appropriate confidence intervals/bands.

Building the most optimum model on the Full Data.

'Plot of Exponential Smoothing Actual Values for Rose Wine Productions'

# smoothing_trend is the current statsmodels name for the former smoothing_slope argument
fullmodel = ExponentialSmoothing(df,trend='additive',
seasonal='multiplicative').fit(smoothing_level=0.3, smoothing_trend=0.4,
smoothing_seasonal=0.3)

RMSE: 20.672560612957582

# Getting the predictions for the same number of time stamps that are present in the test data
prediction = fullmodel.forecast(steps=len(test))

# In the code below, we have calculated the upper and lower confidence bands at the 95% confidence level.
# The percentile function under numpy lets us calculate these, and adding them to and subtracting them
# from the predictions gives us the necessary confidence bands for the predictions.

# plot the forecast along with the confidence band
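The band calculation itself is not shown in this report. One common approximation, sketched below, assumes the residuals are roughly normal with constant spread and places the band at 1.96 residual standard deviations on either side of the forecast. All numbers and variable names here are made up for illustration.

```python
import numpy as np

# Hypothetical point forecasts and in-sample residuals (actual minus fitted).
prediction = np.array([50.0, 52.0, 48.0])
residuals = np.array([-2.0, 1.0, 0.5, -0.5, 1.0])

# Sample standard deviation of the residuals.
resid_std = np.std(residuals, ddof=1)

# Approximate 95% band: forecast plus or minus 1.96 standard deviations.
lower_band = prediction - 1.96 * resid_std
upper_band = prediction + 1.96 * resid_std

for low, mid, high in zip(lower_band, prediction, upper_band):
    print(f"{low:.2f}  {mid:.2f}  {high:.2f}")
```

A band built this way has the same width at every horizon, so it probably understates the uncertainty of forecasts many months ahead.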
