
Business Report

Project - Time Series Forecasting


Shoe sales
Soft drink sales

P L Lohitha

Date: 19/04/2023

Table of Contents

Problem 1

1. Executive Summary
2. Introduction
3. Data Details
Q1 Read the data as an appropriate Time Series data and plot the data
    1.1 Reading the Data
    1.2 Plotting the Data
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition
    2.1 EDA
        Null Value Check
        Duplicate Value Check
        Data Description
        Yearly Box Plots
        Monthly Box Plots
        Monthly Sales across Years
    2.2 Decomposition
3. Split the data into training and test. The test data should start in 1991.
4. Build various exponential smoothing models on the training data and evaluate the models using RMSE on the test data. Other models such as Regression, Naïve forecast and simple average models should also be built on the training data, and their performance checked on the test data using RMSE.
    4.1 Linear Regression
    4.2 Naïve Model
    4.3 Simple Average Model
    4.4 Moving Average Model
    4.5 Simple Exponential Smoothing (SES)
    4.6 Double Exponential Smoothing (DES)
    4.7 Triple Exponential Smoothing (TES)
    4.8 Summary of all Models
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests, and mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criterion (AIC) on the training data, and evaluate this model on the test data using RMSE.
    6.1 ARIMA Model
    6.2 SARIMA Model
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data, and evaluate this model on the test data using RMSE.
    7.1 ACF and PACF plots
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built, report your findings, and suggest measures the company should take for future sales.

Problem 2

1. Executive Summary
2. Introduction
3. Data Details
Q1 Read the data as an appropriate Time Series data and plot the data
    1.1 Reading the Data
    1.2 Plotting the Data
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition
    2.1 EDA
        Null Value Check
        Duplicate Value Check
        Data Description
        Yearly Box Plots
        Monthly Box Plots
        Monthly Production across Years
    2.2 Decomposition
3. Split the data into training and test. The test data should start in 1991.
4. Build various exponential smoothing models on the training data and evaluate the models using RMSE on the test data. Other models such as Regression, Naïve forecast and simple average models should also be built on the training data, and their performance checked on the test data using RMSE.
    4.1 Linear Regression
    4.2 Naïve Model
    4.3 Simple Average Model
    4.4 Moving Average Model
    4.5 Simple Exponential Smoothing (SES)
    4.6 Double Exponential Smoothing (DES)
    4.7 Triple Exponential Smoothing (TES)
    4.8 Summary of all Models
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests, and mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criterion (AIC) on the training data, and evaluate this model on the test data using RMSE.
    6.1 ARIMA Model
    6.2 SARIMA Model
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data, and evaluate this model on the test data using RMSE.
    7.1 ACF and PACF plots
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built, report your findings, and suggest measures the company should take for future sales.

1. Executive Summary

You are an analyst at the IJK shoe company, and you are expected to forecast the sales of pairs of shoes for the 12 months following the point where the data ends. The shoe sales data has been given to you from January 1980 to July 1995.

2. Introduction

The aim of this project is to perform forecasting analysis on the shoe sales dataset. I will analyse this dataset using Linear Regression, the Naïve model, Simple and Moving Average models, and Simple, Double and Triple Exponential Smoothing. The data set contains 187 entries, and I will try to build the most optimal model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.

3. Data Details

The data set contains two columns: the first column shows the month and year, and the second column records the corresponding sales quantity.

YearMonth Shoe_Sales
1980-01 85
1980-02 89
1980-03 109
1980-04 95
1980-05 91
1980-06 95
1980-07 96

Table 1: Shoe Sales Dataset Details

Q1 Read the data as an appropriate Time Series data and plot the data

1.1 Reading the Data

I have imported the data series. As we can observe, each record carries a YearMonth value, which is not really a data point but an index for the sales entry; so in reality the dataset has a single column that contains the quantity of shoes sold in that particular month. While reading the dataset, I passed the arguments in such a way that the first column (the date column) is parsed, and indicated that this is a single-column series through squeeze.

Figure 1: Reading the Shoe Sales Dataset

It can be observed that the dataset has data starting from January 1980 and going till July 1995, so there are 187 entries in total in each dataset.
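A minimal sketch of this step (the file name Shoe_Sales.csv and the column names are assumptions for illustration; later sketches reuse the series variable defined here):

    import pandas as pd

    # Parse the YearMonth column as dates and use it as the index,
    # yielding a single-column monthly time series.
    df = pd.read_csv("Shoe_Sales.csv", parse_dates=["YearMonth"], index_col="YearMonth")
    series = df["Shoe_Sales"]     # squeeze the one-column frame to a Series
    series.index.freq = "MS"      # monthly frequency (month start)

    print(series.head())          # first entries, from January 1980
    print(len(series))            # 187 entries through July 1995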

1.2 Plotting the Data

Since I also loaded the dataset without any arguments (and hence without parsing the dates), I had to provide a time stamp value myself. In addition, I removed the YearMonth variable and added a time stamp index to the dataset. I have plotted the time series below.

Figure 2: Shoe Sales Time Series Plot

As we can see from the above plot, shoe sales were in an upward trend until 1988 and a downward trend from 1988 onwards. There is a certain seasonality component visible in the graph. We will investigate the trend and seasonality further during decomposition, where we will be able to see a much more detailed view of these two factors.

2. Perform appropriate Exploratory Data Analysis to understand the data

and also perform decomposition

2.1 EDA

Null Value Check

Performing a Null value check on the time series, I got:

Figure 3: Null Value Check

Duplicate Value Check

There are no duplicate entries in the dataset, as each value corresponds to a different time index; these are simply sales figures for different months.

Data Description

Figure 4: Shoe Sales Time Series Data Description


As we can see from the above, the shoe sales time series data appears to be skewed. There is a high standard deviation for the series, since the min and max differ substantially; the mean and the median also differ, for the same reason of skewness. As mentioned before, there are 187 records in the dataset in total.

Yearly Box Plots

Following is the yearly box plot for the Shoes sales time-series:

Figure 5: Yearly Box Plots


As we can see from the above plot, shoe sales show an upward trend until 1987 and a downward sales trend after 1988. The highest sales for shoes can be seen in 1987 and the lowest in 1980. The highest variation in monthly sales for shoes appears to be in the year 1985, and the year 1984 appears to have the least variation in monthly sales.

There are outliers in the yearly sales data, but as this is a time series we can ignore the outlier data.

Monthly Box Plots

Following is the monthly box plot for the shoe sales time-series:

Figure 6: Monthly Box Plots

As we can see from the monthly box plots, there is clearly a seasonality component visible in the time series. The sales show a rising trend in the last quarter of the year. Shoe sales appear to pick up from July and remain more or less consistent until June, with some slack in September, before picking up again from October (i.e., the last quarter). The monthly sales data shows skewness almost without exception.

Monthly Sales across Years

The monthly sales across years can be seen in the following Pivot Table and the associated

graph:

Figure 7: Monthly Sales across Years

As can be seen from the table and chart above, December appears to be the month that drives the highest sales figures, with the second-highest sales in November. We can observe a seasonality component in the chart above.

2.2 Decomposition

I have provided the decomposed elements for the Time Series below:

Figure 8: Additive Decomposition

Figure 9: Multiplicative Decomposition


We can see the decomposition of the time series above. I have tried both additive and multiplicative decomposition for the time series so that I can determine whether the shoe sales dataset is a multiplicative or an additive series.

As we can see from the above, the time series is clearly multiplicative in nature and has a seasonal component.

The plots above clearly indicate that the sales are unstable and not uniform, and that they have an evident seasonal pattern.
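A minimal sketch of the decomposition step with statsmodels (continuing from the reading sketch above):

    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Compare additive and multiplicative decompositions of the series.
    seasonal_decompose(series, model="additive").plot()
    seasonal_decompose(series, model="multiplicative").plot()
    plt.show()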

3. Split the data into training and test. The test data should start in 1991.

I have split the time series dataset into train and test datasets below. It is given in the question that the test data should start in 1991.

Figure 10: Training and Test Datasets for Shoes Time Series
I have also confirmed that the train dataset indeed ends in 1990 and that the test dataset indeed starts in 1991, by using the head and tail functions on the train and test datasets. As we can observe, the train data frame has 132 observations and the test data frame has 55 observations.
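A minimal sketch of the split (continuing from the earlier sketches; partial string slicing on the date index is used):

    # Train: January 1980 - December 1990; Test: January 1991 onwards.
    train = series[:"1990-12"]
    test = series["1991-01":]

    print(len(train), len(test))   # 132 and 55 observations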

I have also plotted the train and test data frames below:

Figure 11: Plot for Training and Test data frames
We can see the training and test data in the above plot: the blue part of the plot depicts the train dataset (January '80 - December '90), and the orange part depicts the test dataset (January '91 - July '95).

4. Build various exponential smoothing models on the training data and

evaluate the model using RMSE on the test data. Other models such as

Regression, Naïve forecast models and simple average models should also

be built on the training data and check the performance on the test data

using RMSE

In this section I will run the various available models on the time series data set. Let's kick off the analysis with the Linear Regression model.

4.1 Linear Regression

The extracts of Training and Test time stamps for the Linear Regression can be seen below:

Figure 12: Training and Test data for Linear Regression


Following is the results from a Linear Regression model on the dataset:

Figure 13: Linear Regression Outcome

The regression plot above depicts the regression on the training set as the red line and on the test set as the green line. As we can see from the plot and the metrics, shoe sales show an upward trend on the training data set and a downward trend on the test data set.

For the regression-on-time forecast on the test data:

RMSE = 266.276 | MAPE = 110.88

The summarized performance of the model run on the dataset can be seen below:

Figure 14: Performance of the Linear Regression Model
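A minimal sketch of regression on a running time index (continuing from the earlier sketches; scikit-learn is assumed):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Regress sales on a simple time index (1, 2, 3, ...).
    X_train = np.arange(1, len(train) + 1).reshape(-1, 1)
    X_test = np.arange(len(train) + 1, len(series) + 1).reshape(-1, 1)

    lr = LinearRegression().fit(X_train, train.values)
    pred = lr.predict(X_test)

    rmse_lr = np.sqrt(np.mean((test.values - pred) ** 2))
    mape_lr = np.mean(np.abs((test.values - pred) / test.values)) * 100
    print(rmse_lr, mape_lr)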

4.2 Naïve Model

The extracts of Training and Test data for the Naïve Model can be seen below:

Figure 15: Training and Test data for Naive Model
Following is the result from running a Naïve Model:

Figure 16: Naive Model Outcome


For the Naïve model forecast on the test data:

RMSE = 2450.121 | MAPE = 101.47

Figure 17: Performance of the two Models


As can be seen from the Naïve model performance above, the Naïve model is not suitable for the shoe dataset, since every forecast is simply the last observation of the training data.

4.3 Simple Average Model

The extracts of Training and Test data for the Simple Average Model can be seen below:

Figure 18: Training and Test data for Simple Average Model
Following are the results from running a Simple Average Model:

Figure 19: Simple Average Model Outcome


For the Simple Average model forecast on the test data:

RMSE = 63.985 | MAPE = 21.86

The summarized performance of the models run on the dataset can be seen below:

Figure 20: Performance of the three Models


As can be seen from the performance above, the Simple Average model has the best performance among the three models run so far.
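A minimal sketch of both benchmark models, the Naïve model of section 4.2 and the Simple Average model (continuing from the earlier sketches; the rmse helper defined here is reused later):

    import numpy as np

    # Naïve forecast: every test point is the last training observation.
    naive_pred = np.repeat(train.iloc[-1], len(test))

    # Simple average forecast: every test point is the training mean.
    avg_pred = np.repeat(train.mean(), len(test))

    def rmse(actual, pred):
        return np.sqrt(np.mean((np.asarray(actual) - np.asarray(pred)) ** 2))

    print(rmse(test, naive_pred), rmse(test, avg_pred))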

4.4 Moving Average Model

The Moving Average data for the dataset can be seen below:

Figure 21: Moving Average Model Data


Following is the result from running a Moving Average model on the dataset:

Figure 22: Moving Average Model Outcome
For the 2-point Moving Average forecast on the test data: RMSE = 45.948 | MAPE = 14.32

For the 4-point Moving Average forecast on the test data: RMSE = 57.872 | MAPE = 19.48

For the 6-point Moving Average forecast on the test data: RMSE = 63.456 | MAPE = 22.38

For the 9-point Moving Average forecast on the test data: RMSE = 67.723 | MAPE = 23.33

The summarized performance of the models run on the dataset can be seen below:

Figure 23: Summarized Performance of the Models

I have applied 2-, 4-, 6- and 9-point trailing moving averages to the dataset.

As we can see from the plots above, all of the trailing-average plots show prediction values below the actual train and test data, with the 9-point trailing average showing the lowest predictions of all. The closest prediction to the actual data is given by the 2-point trailing moving average model, an observation corroborated by the RMSE scores for each of these moving average models.

As is visible from the summarized performance of all the models, the 2-point moving average has shown the best performance of all the models run on the dataset so far.
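A minimal sketch of the trailing averages (continuing from the earlier sketches; a w-point trailing average at time t is the mean of the last w observations up to and including t):

    for w in (2, 4, 6, 9):
        trailing = series.rolling(window=w).mean()
        # Score only the portion that overlaps the test period.
        print(w, rmse(test, trailing["1991-01":]))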

4.5 Simple Exponential Smoothing (SES)

The SES parameters for the dataset can be seen below:

Figure 24: SES Parameters

Following is the result from running a SES Model on the dataset:

Figure 25: Simple Exponential Smoothing Outcome

For Alpha = 0.605, the Simple Exponential Smoothing model forecast on the test data gives:

RMSE = 196.405 | MAPE = 79.92

The summarized performance of the models run on the dataset can be seen below:

Figure 26: Summarized Performance of the Models

Although the SES model should be used on data that has no trend or seasonality component, I applied it to the data set anyway to see how the model performs in this situation.

I used Alpha = 0.605 for the SES model and, as expected, it did not perform well compared with the previously run models.


4.6 Double Exponential Smoothing (DES)

The DES parameters for the dataset can be seen below:

Figure 27: DES Parameters


Following is the result from running a DES model on the dataset:

Figure 28: Double Exponential Smoothing Outcome

For Alpha = 0.1, Beta = 0.1, the Double Exponential Smoothing model forecast on the test data gives RMSE = 76.91.

The summarized performance of the models run on the dataset can be seen below:

Figure 29: Summarized Performance of the Models

The DES model should be used on data that has no seasonality but does have level and trend, so I used a grid search to initialise it, and we reached the conclusion that Alpha = 0.1 and Beta = 0.1 give the lowest RMSE and MAPE. The DES model is the better-performing of the two exponential smoothing models run so far.

4.7 Triple Exponential Smoothing (TES)

The TES parameters for the shoe sales data set can be seen below:

Figure 30: TES Parameters for the shoe sales data set


The TES train and test data can be seen below:

Figure 31: TES Model Train and Test data
Following is the result from running a TES Model on the dataset:

Figure 32: Triple Exponential Smoothing Outcome

For Alpha = 0.606, Beta = 0, Gamma = 0.262, the Triple Exponential Smoothing model forecast on the test data gives RMSE = 133.703.

The summarized performance of the models run on the dataset can be seen below:

Figure 33: Summarized Performance of the Models
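A minimal sketch of the three smoothing models with statsmodels (continuing from the earlier sketches; the additive-trend/multiplicative-seasonal form for TES is an assumption based on the decomposition finding, and the DES parameters are fixed to the grid-search result above):

    from statsmodels.tsa.holtwinters import (
        SimpleExpSmoothing, Holt, ExponentialSmoothing,
    )

    ses = SimpleExpSmoothing(train).fit()                            # level only
    des = Holt(train).fit(smoothing_level=0.1, smoothing_trend=0.1)  # level + trend
    tes = ExponentialSmoothing(
        train, trend="add", seasonal="mul", seasonal_periods=12,
    ).fit()                                                          # level + trend + seasonality

    for name, fit in [("SES", ses), ("DES", des), ("TES", tes)]:
        print(name, rmse(test, fit.forecast(len(test))))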

4.8 Summary of all Models

Now that we have run all the planned models, let's view the summary of their performance on the dataset:

Figure 34: Sorted Model Performance Summary

As we can observe, for this dataset the 2-point trailing moving average gives the best RMSE and MAPE among all the models.

5. Check for the stationarity of the data on which the model is being built on

using appropriate statistical tests and also mention the hypothesis for the

statistical test. If the data is found to be non-stationary, take appropriate

steps to make it stationary. Check the new data for stationarity and

comment. Note: Stationarity should be checked at alpha = 0.05

I have performed the stationarity test on the data frame, using the Augmented Dickey-Fuller (ADF) test on the shoe sales data set to check for stationarity.

The null hypothesis is that the series is non-stationary (has a unit root); the alternative is that it is stationary. Alpha = 0.05.

Figure 35: Stationarity

As we can see from the above, the p-value is greater than alpha, so we fail to reject the null hypothesis and must make the data stationary, i.e., transform it so that its properties do not depend on when the series is observed. The non-stationarity essentially comes from the seasonality/trend component in the dataset. After taking the difference between consecutive observations to make the data stationary, we can see that the p-value is below 0.05.
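A minimal sketch of the test (continuing from the earlier sketches):

    from statsmodels.tsa.stattools import adfuller

    def adf_pvalue(x):
        return adfuller(x.dropna())[1]   # second element of the result is the p-value

    print(adf_pvalue(series))            # > 0.05: fail to reject H0, non-stationary
    print(adf_pvalue(series.diff()))     # < 0.05: first difference is stationary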

6. Build an automated version of the ARIMA/SARIMA model in which the

parameters are selected using the lowest Akaike Information Criteria

(AIC) on the training data and evaluate this model on the test data using

RMSE.

6.1 ARIMA Model

Figure 36: Running Automated ARIMA Model
Following are the results of the ARIMA model on the shoe sales dataset:

Figure 37: Results of Automated ARIMA Model

As we can see from the above, the lowest AIC recorded for the data is for (p, d, q) values of (4, 1, 3), and that lowest AIC is 1479.147. The p-values of the coefficients MA1 and MA2 are 0 and 0.013, which means they are statistically significant. The RMSE and MAPE values are:

RMSE: 205.555 | MAPE: 83.41
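A minimal sketch of the automated search (continuing from the earlier sketches; the search ranges are assumptions):

    import itertools
    from statsmodels.tsa.arima.model import ARIMA

    best_aic, best_order, best_fit = float("inf"), None, None
    for p, d, q in itertools.product(range(5), [1], range(5)):
        try:
            fit = ARIMA(train, order=(p, d, q)).fit()
        except Exception:
            continue                      # skip combinations that fail to converge
        if fit.aic < best_aic:
            best_aic, best_order, best_fit = fit.aic, (p, d, q), fit

    print(best_order, best_aic)           # (4, 1, 3) with AIC 1479.147 in the text
    print(rmse(test, best_fit.forecast(steps=len(test))))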

6.2 SARIMA Model

Following is the outcome of the SARIMA model run on the data:

Figure 38: SARIMA Model

As can be noticed, the model with (p, d, q) of (2, 1, 1) has the lowest AIC. The p-values of ar.S.L12 and ma.S.L12 are under 0.05, which makes them quite significant. The RMSE and MAPE values are:

RMSE: 70.723 | MAPE: 24.48
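A minimal sketch of the SARIMA fit (continuing from the earlier sketches; the seasonal order is an assumption, chosen only to reproduce seasonal AR and MA terms at lag 12 as seen in the summary output):

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    sarima = SARIMAX(
        train,
        order=(2, 1, 1),
        seasonal_order=(1, 1, 1, 12),   # assumed seasonal order at lag 12
    ).fit(disp=False)

    print(sarima.aic)
    print(rmse(test, sarima.forecast(steps=len(test))))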

7. Build ARIMA/SARIMA models based on the cut-off points of ACF and

PACF on the training data and evaluate this model on the test data using

RMSE.

7.1 ACF and PACF plots

An autocorrelation (ACF) plot shows the correlation of the series with lagged copies of itself. A partial autocorrelation (PACF) plot shows the correlation between a series and a lag of itself that is not explained by correlations at all lower-order lags. We would like all of the spikes to fall within the blue region.

Figure 39: ACF and PACF result

The above shows the ACF and PACF for the stationary (differenced) series, respectively. The ACF and PACF plots indicate that an MA(1) model would be suitable for the time series, because the ACF cuts off after 1 lag while the PACF shows a gradually decreasing pattern.
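A minimal sketch of the plots on the differenced series (continuing from the earlier sketches):

    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    stationary = series.diff().dropna()    # first difference of the series
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    plot_acf(stationary, lags=30, ax=axes[0])
    plot_pacf(stationary, lags=30, ax=axes[1])
    plt.show()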

Following is the outcome of the SARIMA model run on the data:

Figure 40: SARIMA model

Following is the outcome of the ARIMA model run on the data:

Figure 41: ARIMA model

8. Build a table with all the models built along with their corresponding

parameters and the respective RMSE values on the test data.

I have sorted the models based on lowest RMSE and MAPE values on test data.

Figure 42: RMSE and MAPE values on test data for all the model runs

We can observe that the 2-point trailing moving average has the lowest RMSE and MAPE scores on the test data and hence is the best model.

9. Based on the model-building exercise, build the most optimum model(s)

on the complete data and predict 12 months into the future with

appropriate confidence intervals/bands.

We can plot the actual and the forecasted sales for the time series.

Figure 43: Forecasted sales

Figure 44: Lower and Upper Confidence interval bands

Figure 45: Lower and Upper Confidence interval forecasted plot
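A minimal sketch of the 12-month forecast with approximate 95% bands (continuing from the earlier sketches; a Holt-Winters fit on the complete series is used here for illustration, since a trailing moving average has no natural prediction interval, and the bands are built from the in-sample residual spread rather than a model-based interval):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    final = ExponentialSmoothing(
        series, trend="add", seasonal="mul", seasonal_periods=12,
    ).fit()

    forecast = final.forecast(12)           # 12 months beyond July 1995
    sd = np.std(final.resid)                # in-sample residual spread
    lower, upper = forecast - 1.96 * sd, forecast + 1.96 * sd

    plt.plot(series, label="actual")
    plt.plot(forecast, label="forecast")
    plt.fill_between(forecast.index, lower, upper, alpha=0.3)
    plt.legend()
    plt.show()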

10. Comment on the model thus built, report your findings, and suggest measures the company should take for future sales

• The company should come up with discount offers in the months of January to May, as sales are low in these months.
• The company can also adopt a fixed price for shoes, as we saw there were many outliers in the yearly data.
• Increase the sample size.
• Increase the number of independent variables.
• Try more combinations of variables to see whether the accuracy of the model can be improved.

Problem 2

1. Executive Summary

You are an analyst at the RST soft drink company, and you are expected to forecast soft drink production for the 12 months following the point where the data ends. The soft drink production data has been given to you from January 1980 to July 1995.

2. Introduction

The goal of this undertaking is to perform forecasting analysis on the soft drink production dataset. I will analyse this dataset using Linear Regression, the Naïve model, Simple and Moving Average models, and Simple, Double and Triple Exponential Smoothing. The data set contains 187 entries, and I will try to build the most optimal model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.

3. Data Details

The data set contains two columns: the first column shows the month and year, and the second column records the corresponding production quantity.

YearMonth SoftDrinkProduction
1980-01 1954
1980-02 2302
1980-03 3054
1980-04 2414
1980-05 2226
1980-06 2725

Table 1: Soft Drink Production Dataset Details

Q1 Read the data as an appropriate Time Series data and plot the data

1.1 Reading the Data


I have imported the data series. As we can observe, each record carries a YearMonth value, which is not really a data point but an index for the production entry; so in reality the dataset has a single column that contains the quantity of soft drink produced in that particular month. While reading the dataset, I passed the arguments in such a way that the first column (the date column) is parsed, and indicated that this is a single-column series through squeeze.

Figure 1: Reading the Soft Drink Production Dataset

It can be observed that the dataset has data starting from January 1980 and going till July 1995, so there are 187 entries in total.

1.2 Plotting the Data

Since I also loaded the dataset without any arguments (and hence without parsing the dates), I had to provide a time stamp value myself. In addition, I removed the YearMonth variable and added a time stamp index to the dataset. I have plotted the time series below.

Figure 2: Soft Drink Production Time Series Plot

As we can see from the above plot, soft drink production was in an upward trend. There is a certain seasonality component visible in the chart. We will investigate the trend and seasonality further during decomposition, where we will be able to see a much more detailed view of these two factors.

2. Perform appropriate Exploratory Data Analysis to understand the data

and also perform decomposition

2.1 EDA

Null Value Check

Performing a Null value check on the time series, I got:

Figure 3: Null Value Check

Duplicate Value Check

There are no duplicate entries in the dataset as each value corresponds to a different time

index, so basically these are all sales figures for different months.

Data Description

Figure 4: Soft Drink Production Time Series Data Description

As we can see from the above, the soft drink production time series data appears to be skewed. There is a high standard deviation for the series, since the min and max differ substantially; the mean and the median also differ, for the same reason of skewness. As mentioned before, there are 187 records in the dataset in total.

Yearly Box Plots

Following is the yearly box plot for the Soft drink Production time-series:

Figure 5: Yearly Box Plots
As we can see from the above plot, soft drink production shows no trend until 1988 and an upward trend after 1988. The highest production can be seen in 1987 and the lowest in 1980. The highest variation in monthly production appears to be in the year 1993, and the year 1995 appears to have the least variation in monthly production.

There are outliers in the yearly production data, but as this is a time series we can ignore the outlier data.

Monthly Box Plots

Following is the monthly box plot for the soft drink production time series:

Figure 6: Monthly Box Plots

As we can see from the monthly box plots, there is clearly a seasonality component visible in the time series. Production shows a rising trend in the last quarter of the year. Soft drink production appears to pick up from July and remain more or less consistent until June, with some slack in September, before picking up again from October (i.e., the last quarter). The monthly data shows skewness almost without exception.

Monthly Production across Years

The monthly production across years can be seen in the following pivot table and the associated graph:

Figure 7: Monthly Production across Years

As can be seen from the table and chart above, December appears to be the month that drives the highest production figures, with the second-highest production in November. We can observe a seasonality component in the chart above.

2.2 Decomposition

I have provided the decomposed elements for the Time Series below:

Figure 8: Additive Decomposition

Figure 9: Multiplicative Decomposition

We can see the decomposition of the time series above. I have tried both additive and multiplicative decomposition for the time series so that I can determine whether the soft drink dataset is a multiplicative or an additive series.

As we can see from the above, the time series is clearly multiplicative in nature and has a seasonal component.

The plots above clearly show that production is unstable and not uniform, and that it has an evident seasonal pattern.

3. Split the data into training and test. The test data should start in 1991.

I have split the time series dataset into train and test datasets below. It is given in the question that the test data should start in 1991.

Figure 10: Training and Test Datasets for Soft Drink Time Series

I have also confirmed that the train dataset indeed ends in 1990 and that the test dataset indeed starts in 1991, by using the head and tail functions on the train and test datasets. As we can observe, the train data frame has 132 observations and the test data frame has 55 observations.

I have also plotted the Train and test data frames for both time series datasets below:

Figure 11: Plot for Training and Test data frames

We can see the training and test data in the above plot: the blue part of the plot depicts the train dataset (January '80 - December '90), and the orange part depicts the test dataset (January '91 - July '95).

4. Build various exponential smoothing models on the training data and

evaluate the model using RMSE on the test data. Other models such as

Regression, Naïve forecast models and simple average models should

also be built on the training data and check the performance on the

test data using RMSE

In this section I will run the various available models on the time series data set. Let's kick off the analysis with the Linear Regression model.

4.1 Linear Regression

The extracts of Training and Test time stamps for the Linear Regression can be seen

below:

Figure 12: Training and Test data for Linear Regression


Following is the results from a Linear Regression model on the dataset:

Figure 13: Linear Regression Outcome

The regression plot above depicts the regression on the training set as the red line and on the test set as the green line. As we can see from the plot and the metrics, soft drink production shows an upward trend on the training data set and a downward trend on the test data set.

For the regression-on-time forecast on the test data:

RMSE = 775.807 | MAPE = 16.12

The summarized performance of the model run on the dataset can be seen below:

Figure 14: Performance of the Linear Regression Model

4.2 Naïve Model

The extracts of Training and Test data for the Naïve Model can be seen below:

Figure 15: Training and Test data for Naive Model


Following is the result from running a Naïve Model:

Figure 16: Naive Model Outcome


For the Naïve model forecast on the test data:

RMSE = 1519.259 | MAPE = 37.75

Figure 17: Performance of the two Models
As can be seen from the Naïve model performance above, the Naïve model is not suitable for the soft drink dataset, since every forecast is simply the last observation of the training data.

4.3 Simple Average Model

The extracts of Training and Test data for the Simple Average Model can be seen below:

Figure 18: Training and Test data for Simple Average Model
Following are the results from running a Simple Average Model:

Figure 19: Simple Average Model Outcome

For the Simple Average model forecast on the test data:

RMSE = 934.353 | MAPE = 20.12

The summarized performance of the models run on the dataset can be seen below:

Figure 20: Performance of the three Models


As can be seen from the performance above, the Regression model has the best performance among the three models run so far.

4.4 Moving Average Model

The Moving Average data for the dataset can be seen below:

Figure 21: Moving Average Model Data


Following is the result from running a Moving Average model on the dataset:

Figure 22: Moving Average Model Outcome
For the 2-point Moving Average forecast on the test data: RMSE = 556.725 | MAPE = 10.67

For the 4-point Moving Average forecast on the test data: RMSE = 687.181 | MAPE = 13.71

For the 6-point Moving Average forecast on the test data: RMSE = 710.513 | MAPE = 15.01

For the 9-point Moving Average forecast on the test data: RMSE = 735.889 | MAPE = 15.33

The summarized performance of the models run on the dataset can be seen below:

Figure 23: Summarized Performance of the Models

I have applied 2-, 4-, 6- and 9-point trailing moving averages to the dataset.

As we can see from the plots above, all of the trailing-average plots show prediction values below the actual train and test data, with the 9-point trailing average showing the lowest predictions of all. The closest prediction to the actual data is given by the 2-point trailing moving average model, an observation corroborated by the RMSE scores for each of these moving average models.

As is visible from the summarized performance of all the models, the 2-point moving average has shown the best performance of all the models run on the dataset so far.

4.5 Simple Exponential Smoothing (SES)

The SES parameters for the dataset can be seen below:

Figure 24: SES Parameters

Following is the result from running a SES Model on the dataset:

Figure 25: Simple Exponential Smoothing Outcome

For Alpha = 0.216, the Simple Exponential Smoothing model forecast on the test data gives:

RMSE = 847.635 | MAPE = 18.86

The summarized performance of the models run on the dataset can be seen below:

Figure 26: Summarized Performance of the Models

Although the SES model should be used on data that has no trend or seasonality component, I applied it to the data set anyway to see how the model performs in this situation.

I used Alpha = 0.216 for the SES model and, as expected, it did not perform well compared with the previously run models.

4.6 Double Exponential Smoothing (DES)

The DES parameters for the dataset can be seen below:

Figure 27: DES Parameters


Following is the result from running a DES model on the dataset:

Figure 28: Double Exponential Smoothing Outcome

For Alpha = 0.1, Beta = 0.1, the Double Exponential Smoothing model forecast on the test data gives RMSE = 982.938.

The summarized performance of the models run on the dataset can be seen below:

Figure 29: Summarized Performance of the Models

The DES model should be used on data that has no seasonality but does have level and trend, so I used a grid search to initialise it, and we reached the conclusion that Alpha = 0.1 and Beta = 0.1 give the lowest RMSE and MAPE. Even so, for this dataset the DES model does not improve on the previously run models.

4.7 Triple Exponential Smoothing (TES)

The TES Parameters for the Soft Drink dataset can be seen below:

Figure 30: TES Parameters for the Soft Drink data set


The TES train and test data can be seen below:

Figure 31: TES Model Train and Test data


Following is the result from running a TES Model on the dataset:

Figure 32: Triple Exponential Smoothing Outcome

For Alpha = 0.099, Beta = 0.019, Gamma = 0.355, the Triple Exponential Smoothing model forecast on the test data gives RMSE = 443.499.

The summarized performance of the models run on the dataset can be seen below:

Figure 33: Summarized Performance of the Models

4.8 Summary of all Models

Now that we have run all the planned models, let's view the summary of their performance on the dataset:

Figure 34: Sorted Model Performance Summary

As we can observe, for this dataset the Triple Exponential Smoothing model gives the best RMSE and MAPE among all the models.

5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests, and mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05

I have performed the stationarity test on the data frame, using the Augmented Dickey-Fuller (ADF) test on the soft drink data set to check for stationarity.

The null hypothesis is that the series is non-stationary (has a unit root); the alternative is that it is stationary. Alpha = 0.05.

Figure 35: Stationarity

As we can see from the above, the p-value is greater than alpha, so we fail to reject the null hypothesis and must make the data stationary, i.e., transform it so that its properties do not depend on when the series is observed. The non-stationarity essentially comes from the seasonality/trend component in the dataset. After taking the difference between consecutive observations to make the data stationary, we can see that the p-value is below 0.05.

6. Build an automated version of the ARIMA/SARIMA model in which

the parameters are selected using the lowest Akaike Information

Criteria (AIC) on the training data and evaluate this model on the test

data using RMSE.

6.1 ARIMA Model

Figure 36: Running Automated ARIMA Model
Following are the results of the ARIMA model on the soft drink dataset:

Figure 37: Results of Automated ARIMA Model

As we can see from the above, the lowest AIC recorded for the data is for (p, d, q) values of (3, 1, 3), and that lowest AIC is 2027.528. The p-values of the coefficients MA1 and MA2 are 0 and 0.013, which means they are statistically significant. The RMSE and MAPE values are:

RMSE: 784.989 | MAPE: 16.2

6.2 SARIMA Model

Following is the outcome of the SARIMA model run on the data:

Figure 38: SARIMA Model

As can be noticed, the model with (p, d, q) of (3, 1, 3) has the lowest AIC. The p-values of ar.S.L12 and ma.S.L12 are under 0.05, which makes them quite significant. The RMSE and MAPE values are:

RMSE: 429.452 | MAPE: 9.95
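A minimal sketch of this fit (continuing from sketches analogous to Problem 1, with series, train, test and the rmse helper built from the soft drink data; the seasonal order (3, 0, 0, 12) is taken from the summary table in section 8):

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    sarima = SARIMAX(
        train,
        order=(3, 1, 3),
        seasonal_order=(3, 0, 0, 12),
    ).fit(disp=False)

    pred = sarima.forecast(steps=len(test))
    print(rmse(test, pred))   # reported as 429.452 in the text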

7. Build ARIMA/SARIMA models based on the cut-off points of ACF

and PACF on the training data and evaluate this model on the test data

using RMSE.

7.1 ACF and PACF plots


An autocorrelation (ACF) plot shows the correlation of the series with lagged copies of itself. A partial autocorrelation (PACF) plot shows the correlation between a series and a lag of itself that is not explained by correlations at all lower-order lags. We would like all of the spikes to fall within the blue region.

Figure 39: ACF and PACF result

The above shows the ACF and PACF for the stationary (differenced) series, respectively. The ACF and PACF plots indicate that an MA(1) model would be fitting for the time series, because the ACF cuts off after 1 lag while the PACF shows a gradually decreasing pattern.

Following is the outcome of the SARIMA model run on the data:

Figure 40: SARIMA model

Following is the outcome of the ARIMA model run on the data:

Figure 41: ARIMA model

8. Build a table with all the models built along with their

corresponding parameters and the respective RMSE values on the

test data.

I have sorted the models based on lowest RMSE and MAPE values on test data.

Figure 42: RMSE and MAPE values on test data for all the model runs

We can observe that SARIMA (3, 1, 3)(3, 0, 0, 12) has the lowest RMSE and MAPE scores on the test data and hence is the best model.

9. Based on the model-building exercise, build the most optimum

model(s) on the complete data and predict 12 months into the future

with appropriate confidence intervals/bands.

We can plot the actual and the forecasted production for the time series.

Figure 43: Forecasted sales


Figure 44: Lower and Upper Confidence interval bands

Figure 45: Lower and Upper Confidence interval forecasted plot
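A minimal sketch of the final 12-month forecast with 95% bands, refitting the best model on the complete soft drink series (continuing from the earlier sketches):

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    final = SARIMAX(
        series, order=(3, 1, 3), seasonal_order=(3, 0, 0, 12),
    ).fit(disp=False)

    fc = final.get_forecast(steps=12)
    mean = fc.predicted_mean          # point forecasts
    bands = fc.conf_int(alpha=0.05)   # lower/upper 95% confidence columns
    print(mean, bands)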

10. Comment on the model thus built, report your findings, and suggest measures the company should take for future sales

• The company should come up with discount offers in the months of January to May, as sales are low in these months.
• The company can also adopt a fixed price for the soft drink, as we saw there were many outliers in the yearly data.
• Increase the sample size.
• Increase the number of independent variables.
• Try more combinations of variables to see whether the accuracy of the model can be improved.

