P L Lohitha 19-04-23 TSF Business Report
P L Lohitha
Date: 19/04/2023
Table of Contents
Problem 1
1. Executive Summary
2. Introduction
3. Data Details
Q1 Read the data as an appropriate Time Series data and plot the data
     1.1 Reading the Data
     1.2 Plotting the Data
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition
     2.1 EDA
        Null Value Check
        Duplicate Value Check
        Data Description
        Yearly Box Plots
        Monthly Box Plots
        Monthly Sales across Years
     2.2 Decomposition
3. Split the data into training and test. The test data should start in 1991
4. Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other models such as Regression, Naïve forecast models and simple average models should also be built on the training data and check the performance on the test data using RMSE
     4.1 Linear Regression
     4.2 Naïve Model
     4.3 Simple Average Model
     4.4 Moving Average Model
     4.5 Simple Exponential Smoothing (SES)
     4.6 Double Exponential Smoothing (DES)
     4.7 Triple Exponential Smoothing (TES)
     4.8 Summary of all Models
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE
     6.1 ARIMA Model
     6.2 SARIMA Model
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE
     7.1 ACF and PACF plots
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands
10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales
Problem 2
1. Executive Summary
2. Introduction
3. Data Details
Q1 Read the data as an appropriate Time Series data and plot the data
     1.1 Reading the Data
     1.2 Plotting the Data
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition
     2.1 EDA
        Null Value Check
        Duplicate Value Check
        Data Description
        Yearly Box Plots
        Monthly Box Plots
        Monthly Sales across Years
     2.2 Decomposition
3. Split the data into training and test. The test data should start in 1991
4. Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other models such as Regression, Naïve forecast models and simple average models should also be built on the training data and check the performance on the test data using RMSE
     4.1 Linear Regression
     4.2 Naïve Model
     4.3 Simple Average Model
     4.4 Moving Average Model
     4.5 Simple Exponential Smoothing (SES)
     4.6 Double Exponential Smoothing (DES)
     4.7 Triple Exponential Smoothing (TES)
     4.8 Summary of all Models
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE
     6.1 ARIMA Model
     6.2 SARIMA Model
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE
     7.1 ACF and PACF plots
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands
10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales
1. Executive Summary
You are an analyst at the IJK shoe company and you are expected to forecast the sales of
pairs of shoes for the 12 months following the end of the data. Shoe sales data have been
provided from January 1980 to July 1995.
2. Introduction
The aim of this project is to perform forecasting analysis on the shoe sales dataset. I will
analyze this dataset using Linear Regression, the Naïve Model, Simple Average and
Moving Average models, and Simple, Double and Triple Exponential Smoothing. The
dataset contains 187 entries, and I will try to build the most suitable model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
3. Data Details
Data set contains two columns, where the first column shows the month and year of the
                                     YearMonth       Shoe_Sales
                                     1980-01                  85
                                     1980-02                  89
                                     1980-03                 109
                                     1980-04                  95
                                     1980-05                  91
                                     1980-06                  95
                                     1980-07                  96
Q1 Read the data as an appropriate Time Series data and plot the data
I have imported the data series and, as we can see, each row has a YearMonth value
with it, which is not really a data point but an index for the sales entry. So in effect the dataset
has a single column that contains the number of shoes sold in that particular month. Here,
while reading the dataset I have passed the arguments so that it parses the first column,
which is the date column, and indicates to the system that this is a one-column series through
squeeze.
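As an illustration of the reading step, here is a minimal sketch that uses only the Python standard library. The report does not show its own code (the mention of parsing dates and "squeeze" suggests pandas was used), so the comma-separated layout and the helper name `read_monthly_series` are assumptions. The rows are the first seven observations listed under Data Details.

```python
import csv
import io
from datetime import datetime

# First rows of the data as listed under "Data Details".
# A real run would open the CSV file instead; a comma delimiter is assumed.
RAW = """YearMonth,Shoe_Sales
1980-01,85
1980-02,89
1980-03,109
1980-04,95
1980-05,91
1980-06,95
1980-07,96
"""

def read_monthly_series(handle):
    """Return (month_start, value) pairs with the first column parsed as a date."""
    series = []
    for row in csv.DictReader(handle):
        month = datetime.strptime(row["YearMonth"], "%Y-%m")
        series.append((month, int(row["Shoe_Sales"])))
    return series

shoe_sales = read_monthly_series(io.StringIO(RAW))
print(shoe_sales[0], shoe_sales[-1])
```

Parsing the YearMonth column up front gives every observation a proper time stamp, so the values can be treated as a single series indexed by month.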
It can be observed that the dataset has data starting from January 1980 and running through July 1995.
Since I have loaded the dataset without any arguments (and therefore loaded
the dataset without parsing the dates here), I need to supply a time stamp value
myself. In addition, I have removed the YearMonth variable and
As we can see from the above plot, shoe sales were on an upward trend till 1988 and on a
downward trend from 1988 onwards. There is a clear seasonality component visible in
the graph. We will explore the trend and seasonality further during decomposition, where
we will be able to see a much more detailed view of these two factors.
2.1 EDA
There are no duplicate entries in the dataset as each value corresponds to a different time index, so
basically these are all sales figures for different months.
Data Description
As we can see from the above, the shoe sales time series data appear to be
skewed. There is a high standard deviation for the time series since
the Min and Max differ greatly. There is also a difference between
the mean and the median, for the same reason of skewness. As mentioned before, there
Following is the yearly box plot for the Shoes sales time-series:
trend post 1988. The highest sales for shoes can be seen in 1987 and the lowest sales
in 1980. The highest variation in monthly shoe sales appears to be
in 1985, and the lowest variation in monthly sales appears to be
in 1984.
There are outliers in the yearly sales data, but as it is a time series, we can
Following is the monthly box plot for the shoe sales time-series:
                             Figure 6: Monthly Box Plots
As we can see from the monthly box plots, there is a seasonality component
visible in the time series dataset. Sales show a rising trend in the last
quarter of the year. Shoe sales appear to pick up from July and are more or less
consistent till June, show some sluggishness in September, and then start to
pick up again from October (i.e., the last quarter). Monthly sales data show
skewness without many outliers.
The monthly sales across years can be seen in the following Pivot Table and the associated
graph:
                        Figure 7: Monthly Sales across Years
As can be seen from the above table and graph, December appears to be the month
that drives the highest sales figures, with the second highest sales in
November. We can observe a seasonality component in the chart above.
   2.2 Decomposition
I have provided the decomposed elements for the Time Series below:
We can see the decomposition of the time series above. I have tried both additive
and multiplicative decomposition for the time series so I can determine whether the shoes dataset is a
As we can see from the above, we can say that the time series is clearly
The plots above clearly show that the sales are unstable and not uniform, and they
3. Split the data into training and test. The test data should start in 1991.
I have split the time series datasets into Train and Test datasets below. It is given the
                  Figure 10: Training and Test Datasets for Shoes Time Series
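The split described in this section can be sketched as follows, using only the standard library. The helper names `month_labels` and `split_by_year` are hypothetical, and the values are placeholders rather than the real sales figures. Only the dates matter here: 187 monthly observations from January 1980, with the test set starting in 1991.

```python
def month_labels(start_year, start_month, count):
    """Generate `count` consecutive 'YYYY-MM' labels starting at the given month."""
    labels = []
    year, month = start_year, start_month
    for _ in range(count):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels

def split_by_year(labels, values, first_test_year):
    """Observations before `first_test_year` go to train, the rest to test."""
    pairs = list(zip(labels, values))
    train = [(l, v) for l, v in pairs if int(l[:4]) < first_test_year]
    test = [(l, v) for l, v in pairs if int(l[:4]) >= first_test_year]
    return train, test

labels = month_labels(1980, 1, 187)   # January 1980 through July 1995
values = list(range(187))             # placeholder values, NOT the real sales
train, test = split_by_year(labels, values, 1991)
print(train[-1][0], test[0][0], len(train), len(test))
```

With these dates the training set covers 132 months (1980 to 1990) and the test set covers the remaining 55 months (January 1991 to July 1995).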
    I have also confirmed that the Train dataset indeed ends in 1990, and the
Test dataset indeed begins in 1991, by using the head and tail functions on the
Training and Test datasets. As we can see, the size of the Train data frame is
I have also plotted the Train and test data frames for both time series datasets below:
                      Figure 11: Plot for Training and Test data frames
   We can see the training and test data in the above plot: the blue portion of the
plot depicts the Train dataset (January '80 - December '90), and the orange portion of the
4. Build various exponential smoothing models on the training data and
evaluate the model using RMSE on the test data. Other models such as
Regression, Naïve forecast models and simple average models should also
be built on the training data and check the performance on the test data
using RMSE
In this section I will try to run the various available models on time series data set. Let’s
The extracts of Training and Test time stamps for the Linear Regression can be seen below:
                        Figure 13: Linear Regression Outcome
The regression plots above depict the regression on the training set as the red line and that on the
test set as the green line. As we can see from the above plot and metric, shoe sales show
data set.
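A regression of this kind is commonly fitted against a running time index. The sketch below assumes that setup (the report does not show its code) and fits an ordinary least squares line to t = 1..n with plain Python. `fit_line` and `predict_line` are hypothetical helper names, and the training values are only the first seven observations from the Data Details table.

```python
def fit_line(y):
    """Ordinary least squares of y on t = 1..n; returns (intercept, slope)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def predict_line(intercept, slope, start, steps):
    """Evaluate the fitted line at time indices start, start + 1, ..."""
    return [intercept + slope * t for t in range(start, start + steps)]

train = [85, 89, 109, 95, 91, 95, 96]   # first seven observations only
intercept, slope = fit_line(train)
# The test period continues the time index where the training period stopped.
forecast = predict_line(intercept, slope, len(train) + 1, 3)
print(round(intercept, 3), round(slope, 3), [round(f, 2) for f in forecast])
```

Because the model is a straight line in time, it can capture only a single overall trend, so a series whose trend changes direction is likely to be forecast poorly on the test period.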
The summarized performance of the model run on the dataset can be seen below:
The extracts of Training and Test data for the Naïve Model can be seen below:
                  Figure 15: Training and Test data for Naive Model
Following is the result from running a Naïve Model:
for the shoe dataset since the forecasts depend on the last observation.
The extracts of Training and Test data for the Simple Average Model can be seen below:
            Figure 18: Training and Test data for Simple Average Model
Following are the results from running a Simple Average Model:
The summarized performance of the models run on the dataset can be seen below:
model has the best performance among all the three models run till now.
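The two baseline models discussed above can be sketched in a few lines. This is a minimal illustration and not the report's own code. `naive_forecast` and `simple_average_forecast` are hypothetical names, and the training values are the first seven observations from the Data Details table.

```python
def naive_forecast(train, horizon):
    """Naive model: repeat the last training observation for every future step."""
    return [train[-1]] * horizon

def simple_average_forecast(train, horizon):
    """Simple average model: repeat the mean of the training data."""
    mean = sum(train) / len(train)
    return [mean] * horizon

train = [85, 89, 109, 95, 91, 95, 96]   # first seven observations only
print(naive_forecast(train, 3))
print(simple_average_forecast(train, 3))
```

Both forecasts are flat lines, so neither can follow trend or seasonality in the test period. They serve mainly as benchmarks for the other models.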
4.4 Moving Average Model
The Moving Average data for the dataset can be seen below:
                    Figure 22: Moving Average Model Outcome
For 2 point Moving Average Model forecast on the Testing Data, RMSE = 45.948 | MAPE = 14.32
For 4 point Moving Average Model forecast on the Testing Data, RMSE = 57.872 | MAPE = 19.48
For 6 point Moving Average Model forecast on the Testing Data, RMSE = 63.456 | MAPE = 22.38
For 9 point Moving Average Model forecast on the Testing Data, RMSE = 67.723 | MAPE = 23.33
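The sketch below shows one way to compute a trailing moving average together with the RMSE and MAPE metrics reported above. It assumes the trailing mean is taken over the full series and then compared with the actual values in the evaluation window. The report does not show how its figures were computed, so this is one plausible reading and not the author's method. The numbers are the first seven observations from the Data Details table and will not reproduce the scores above.

```python
import math

def trailing_mean(values, window):
    """Mean of each observation and the window - 1 before it; None until enough points exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

series = [85, 89, 109, 95, 91, 95, 96]   # first seven observations only
ma2 = trailing_mean(series, 2)
actual, predicted = series[-3:], ma2[-3:]  # treat the last three points as the evaluation window
print(round(rmse(actual, predicted), 3), round(mape(actual, predicted), 2))
```

A shorter window tracks the most recent observations more closely, which is consistent with the 2 point average having the lowest RMSE and MAPE among the four windows listed.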
The summarized performance of the models run on the shoe sales dataset can be seen below:
                  Figure 23: Summarized Performance of the Models
As we can see from the above plots, all of the trailing average plots show predicted
values below the actual train and test data, and the 9 point trailing
average plot shows the lowest predictions of all the plots. The closest
model. This observation is confirmed by the RMSE scores for each of these moving
average models.
As can be seen from the summarized performance of all the models,
the 2 point moving average has shown the best performance of all the models
run on the dataset.
    Following is the result from running a SES Model on the dataset:
For Alpha = 0.605 Simple Exponential Smoothing Model forecast on the Test data,
The summarized performance of the models run on the shoe sales dataset can be seen below:
As we all know, the SES model should be used on data that has no
trend or seasonality component. I still applied it to the data set to see what
I used Alpha = 0.605 for the SES model and, as expected, it did not perform well when
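To make the role of Alpha concrete, here is a minimal sketch of simple exponential smoothing in plain Python. Starting the level at the first observation is an assumption, since libraries differ in how they initialise it. `ses_forecast` is a hypothetical name, and the data are the first seven observations from the Data Details table.

```python
def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: the forecast is flat at the final smoothed level."""
    level = train[0]                      # initial level = first observation (an assumption)
    for y in train[1:]:
        # New level: weighted average of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

train = [85, 89, 109, 95, 91, 95, 96]    # first seven observations only
print([round(f, 2) for f in ses_forecast(train, 0.605, 3)])
```

Since every forecast step equals the same level, SES cannot reproduce trend or seasonality, which is why it is usually reserved for series without either.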
For Alpha = 0.1, Beta = 0.1 Double Exponential Smoothing Model forecast on the Test
The summarized performance of the models run on the shoe sales dataset can be seen below:
                  Figure 29: Summarized Performance of the Models
As we all know, the DES model should be used on data that has no
seasonality but does have level and trend. I used grid search to begin with, and
arrived at the conclusion that Alpha = 0.1 and Beta = 0.1 show the lowest RMSE and MAPE.
The DES model is the model with good performance so far.
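The grid search over Alpha and Beta mentioned above can be sketched as follows. This is an illustration under assumptions and not the report's code: the level and trend are initialised from the first two observations, the grid runs from 0.1 to 0.9 in steps of 0.1, and the names `holt_forecast` and `grid_search` are hypothetical. The data are a toy linear series chosen so the result is easy to check by hand.

```python
import math

def holt_forecast(train, alpha, beta, horizon):
    """Double exponential smoothing (Holt's linear trend, additive)."""
    level, trend = train[0], train[1] - train[0]   # simple initial values (an assumption)
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def grid_search(train, test, grid):
    """Return (rmse, alpha, beta) for the pair with the lowest test RMSE."""
    best = None
    for alpha in grid:
        for beta in grid:
            score = rmse(test, holt_forecast(train, alpha, beta, len(test)))
            if best is None or score < best[0]:
                best = (score, alpha, beta)
    return best

grid = [round(0.1 * i, 1) for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
train, test = [1, 2, 3, 4], [5, 6]                 # toy linear data, not the shoe sales
print(grid_search(train, test, grid))
```

Selecting the parameters by test RMSE mirrors the description above. A stricter setup would tune on a separate validation window so the test data stays untouched.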
The TES Parameters for the shoes sales data set can be seen below:
                      Figure 4: TES Model Train and Test data
Following is the result from running a TES Model on the dataset:
The summarized performance of the models run on the shoe sales dataset can be seen below:
                  Figure 33: Summarized Performance of the Models
Now that we have run all the models planned, let’s view the summary of the performance
of the models on the dataset:
As we can observe, for the dataset, the 2 point trailing moving average gives the best
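5. Check for the stationarity of the data on which the model is being built on
using appropriate statistical tests and also mention the hypothesis for the
statistical test. If the data is found to be non-stationary, take appropriate
steps to make it stationary. Check the new data for stationarity and
comment. Note: Stationarity should be checked at alpha = 0.05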
I have performed the stationarity test on the data frame. I have used an Augmented
Dickey-Fuller test on the shoes data set to check the stationarity.
                               Figure 35: Stationarity
As we can see from the above, we fail to reject the null hypothesis since the
p-value is greater than alpha; therefore we need to make the data stationary.
That is, the data properties do not depend on when the data
dataset. After taking the difference between consecutive
observations to make the data stationary, we can see that the p-value appeared to be
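For reference, the Augmented Dickey-Fuller test takes as its null hypothesis that the series has a unit root (is non-stationary) and as its alternative that the series is stationary. The null is rejected when the p-value is below alpha = 0.05. The differencing step itself is simple enough to sketch in plain Python. The test would normally come from a statistics library such as statsmodels, which this sketch does not use. `difference` is a hypothetical helper name and the input is a toy trending series, not the shoe sales.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag]; the result is `lag` points shorter."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]

trending = [10, 12, 14, 16, 18]   # toy series with a steady upward trend
print(difference(trending))       # first difference removes the linear trend
```

After differencing, the stationarity test is run again on the differenced series, which is the "new data" this section refers to.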
6. Build an automated version of the ARIMA/SARIMA model in which the
parameters are selected using the lowest Akaike Information Criteria
(AIC) on the training data and evaluate this model on the test data using
RMSE.
                    Figure 36: Running Automated ARIMA Model
Following are the results of the ARIMA model on the shoe sales dataset:
As we can see from the above, the lowest AIC recorded for the data is for
(p, d, q) values of (4, 1, 3), and that lowest AIC is 1479.147. The p-values of
coefficients MA1 and MA2 are 0 and 0.013, which implies that they are significant. The
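The automated selection can be pictured as a loop over candidate (p, d, q) orders that keeps the order with the lowest AIC, where AIC = 2k - 2 ln(L) for k estimated parameters and maximised likelihood L. The sketch below shows only that selection logic. The search ranges are assumptions, and `toy_fit` is a made-up stand-in for a real ARIMA fit so that the example runs without a modelling library. Its numbers carry no meaning.

```python
import itertools

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def best_order(orders, fit):
    """Return (aic, order) with the lowest AIC; `fit` maps an order to (log_likelihood, n_params)."""
    return min((aic(*fit(order)), order) for order in orders)

def toy_fit(order):
    # Stand-in for a real model fit: NOT an ARIMA likelihood, only here so the sketch runs.
    p, d, q = order
    return -100.0 + 3.0 * min(p + q, 2), p + q + 1

# Assumed search space: p and q from 0 to 4, d fixed at 1.
orders = list(itertools.product(range(5), [1], range(5)))
print(best_order(orders, toy_fit))
```

In a real run `toy_fit` would be replaced by fitting the model on the training data and reading off its log-likelihood and parameter count. Many libraries also report the AIC directly.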
 RMSE: 205.555        MAPE: 83.41
                                  Figure 68:SARIMA Model
As can be observed, the model with p, d, q as 2, 1, 1 respectively has the lowest AIC,
which is 14. The p-values of ar.S.L12 and ma.S.L12 are below 0.05, which makes them pretty
RMSE: 70.723
MAPE: 24.48
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and
PACF on the training data and evaluate this model on the test data using
RMSE.
An autocorrelation (ACF) plot represents the autocorrelation of the series with lags of
itself. A partial autocorrelation (PACF) plot represents the amount of correlation between a
series and a lag of itself that is not explained by correlations at all lower-order
lags. We would like all of the spikes to fall in the blue region.
                         Figure 79:ACF and PACF result
The above shows the ACF and PACF for a stationary time series, respectively. The ACF and
PACF plots indicate that an MA(1) model would be appropriate for the time series
because the ACF cuts off after 1 lag while the PACF shows a gradually
decreasing trend.
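The quantity plotted in an ACF chart can be computed directly. The sketch below implements the usual sample autocorrelation together with the approximate 95% band of ±1.96/√n, which is what the shaded region in such plots typically represents. Treating the blue region as exactly this band is an assumption. `acf` and `confidence_band` are hypothetical names, and the input is a toy series.

```python
import math

def acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    out = []
    for lag in range(max_lag + 1):
        num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
        out.append(num / denom)
    return out

def confidence_band(n):
    """Approximate 95% band for the autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)

series = [1, 2, 3, 4, 5]          # toy data, not the differenced shoe sales
print(acf(series, 2), round(confidence_band(len(series)), 3))
```

Reading a cut-off from the plot amounts to finding the last lag whose autocorrelation lies outside the band. That lag suggests q when read from the ACF and p when read from the PACF.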
                           Figure 40: SARIMA model
                                Figure 41: ARIMA model
8. Build a table with all the models built along with their corresponding
parameters and the respective RMSE values on the test data.
I have sorted the models based on lowest RMSE and MAPE values on test data.
           Figure 42: RMSE and MAPE values on test data for all the model runs
We can observe 2 point Trailing Moving average has the lowest RMSE and MAPE score
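9. Based on the model-building exercise, build the most optimum model(s)
on the complete data and predict 12 months into the future with
appropriate confidence intervals/bands.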
We can plot the real and the forecasted sales for the time series.
                   Figure 44: Lower and Upper Confidence interval bands
10. Comment on the model thus built and report your findings and
suggest the measures that the company should be taking for future sales.
  • The company should come up with discount offers in the months of January to May, as
  sales are low in these months.
  • Also, the company can adopt a fixed price for shoes, as we saw there were
  many outliers in the yearly prediction.
  • Increase the sample size.
  • Increase the number of independent variables.
  • Try more combinations of variables to see whether the accuracy of the model can be
  improved.
Problem 2
  1. Executive Summary
You are an analyst at the RST soft drink company and you are expected to forecast the
production of the soft drink for the 12 months following the end of the data. The
data for the production of soft drinks has been given to you from January 1980 to July 1995.
2. Introduction
The goal of this project is to perform forecasting analysis on the soft drink production
dataset. I will analyze this dataset using Linear Regression, the Naïve Model,
Simple Average and Moving Average models, and Simple, Double and Triple Exponential
Smoothing. The dataset contains 187 entries, and I will try to build the most suitable
model(s) on the complete data and predict 12 months into the future with appropriate
confidence intervals/bands.
3. Data Details
Data set contains two columns, where the first column shows the month and year of the
                                YearMonth   SoftDrinkProduction
                                1980-01                  1954
                                1980-02                  2302
                                1980-03                  3054
                                1980-04                  2414
                                1980-05                  2226
                                1980-06                  2725
Q1 Read the data as an appropriate Time Series data and plot the data
value with it, which is not really a data point but an index for the sales entry. So
in effect the dataset has a single column that contains the amount of soft drink
particular month. Here, while reading the dataset I have passed the arguments so
that it parses the first column, which is the date column, and indicates to the
It can be observed the dataset has data starting from January 1980 going till July 1995, so
Since I have loaded the dataset without any arguments (and consequently
loaded the dataset without parsing the dates here), I need to supply a time stamp
value myself. In addition, I have removed the YearMonth
                      Figure 2: Soft Drink Production Time Series Plot
As we can see from the above plot, soft drink production was moving in an upward direction.
explore the trend and seasonality further during decomposition, where we will
be able to see a much more detailed view of these two elements.
2.1 EDA
 Duplicate Value Check
There are no duplicate entries in the dataset as each value corresponds to a different time
index, so basically these are all sales figures for different months.
Data Description
As we can see from the above, the soft drink production time series data appear
to be skewed. There is a high standard deviation for the time series
since the Min and Max differ greatly. Additionally, there is a
difference between the mean and the median, for the same reason of skewness. As
Following is the yearly box plot for the Soft drink Production time-series:
                          Figure 5: Yearly Box Plots
As we can see from the above plot, soft drink production has no trend till 1988 and an upward
trend post 1988. The highest production for soft drinks can be seen in 1987 and
the lowest in 1980. The highest variation in monthly production for soft
drinks seems to be in the year 1993, and in the year 1995 there seems to be the
There are outliers in the yearly production data, but as it is a Time Series, we
Following is the monthly box plot for the soft drink production time-series:
                             Figure 6: Monthly Box Plots
As we can see from the monthly box plots, there is a seasonality component
visible in the time series dataset. It can be clearly seen that
production has a rising trend in the last quarter of the year. Production for soft drinks appears
to pick up from July and is more or less consistent till June, observes some dormancy
in September, and then starts to pick up again from October (i.e. the last
quarter). Monthly production data shows skewness without many outliers.
The monthly production across years can be seen in the following Pivot Table and the
associated graph:
                         Figure 7: Monthly Production across Years
As can be seen from the above table and graph, December appears to be
the month that drives the highest production figures, with the second highest
production being in November. We can observe a seasonality component in the graph above.
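A pivot table of this kind (months as rows, years as columns) can be sketched as below. This is only an illustration on synthetic data; the column names `production`, `year` and `month` and the generated values are assumptions, not the figures behind the table above.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data standing in for the real production series (assumption).
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
rng = np.random.default_rng(0)
months = idx.month.to_numpy()
production = 3000 + 500 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 100, len(idx))

frame = pd.DataFrame({
    "production": production,
    "year": idx.year.to_numpy(),
    "month": months,
})

# One row per calendar month, one column per year; cells hold that month's value.
monthly_across_years = frame.pivot_table(values="production", index="month", columns="year")

# Average over years for each calendar month, to see which months tend to lead.
print(monthly_across_years.mean(axis=1).round(1))
```

Because the data end in July 1995, the last column has missing values for August to December, which is worth keeping in mind when reading the row averages.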
    2.2 Decomposition
I have provided the decomposed elements for the Time Series below:
We can see the decomposition of the time series above. I have tried both additive
and multiplicative decomposition for the time series so I can decide whether the soft drink
As we can see from the above, we can say that the time series is clearly
The plots above clearly show that production is unstable and not uniform, and that it has a clear
seasonality pattern.
3. Split the data into training and test. The test data should start in 1991.
I have split the time series datasets into Train and Test datasets below. It is given the
Figure 10: Training and Test Datasets for Soft Drink Time Series
I have also confirmed that the Train dataset indeed ends in 1990, and the
Test dataset indeed begins in 1991, by using the Head and Tail functions on
the Training and Test datasets. As we can observe, the size of the Train data
frame is 132 observations and that of the Test data frame is 55 observations.
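The split and the head/tail check can be sketched as below. The data frame is a placeholder with the same monthly index as the report's series; the column name `SoftDrinkProduction` and the values are assumptions.

```python
import numpy as np
import pandas as pd

# Placeholder frame on the same monthly index as the report's data (assumption).
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
df = pd.DataFrame({"SoftDrinkProduction": np.arange(len(idx), dtype=float)}, index=idx)

# Everything before 1991 is training data; 1991 onwards is test data.
train = df[df.index.year < 1991]
test = df[df.index.year >= 1991]

# Confirm the boundaries and the sizes.
print(train.tail(1).index[0], test.head(1).index[0])
print(train.shape, test.shape)
```

The sizes follow from the date range: 11 full years give 132 training months, and January 1991 to July 1995 gives 55 test months.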
I have also plotted the Train and Test data frames below:
                     Figure 11: Plot for Training and Test data frames
We can observe the training and test data in the above plot: the blue part of
the plot depicts the Train dataset (January '80 - December '90), and the orange part
of the plot depicts the Test dataset (January '91 - July '95).
evaluate the model using RMSE on the test data. Other models such as
also be built on the training data and check the performance on the
In this section I will try to run the various available models on time series data set. Let’s
4.1 Linear Regression
The extracts of Training and Test time stamps for the Linear Regression can be seen
below:
The regression plots above depict the regression on the training set as the red line and that on
the test set as the green line. As we can see from the above plot and metric, soft drink production
data set.
The summarized performance of the model run on the dataset can be seen below:
             Figure 14: Performance of the Linear Regression Model
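A linear regression on a running time index, together with the RMSE and MAPE metrics used throughout this report, can be sketched as below. This is a minimal illustration on synthetic data; the helper names `rmse` and `mape`, the seed and the generated series are assumptions and not the code behind the figure above.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Synthetic series with a linear trend plus noise (assumption).
rng = np.random.default_rng(2)
n_train, n_test = 132, 55
time = np.arange(1, n_train + n_test + 1)
y = 2000 + 10 * time + rng.normal(0, 50, time.size)

t_train, t_test = time[:n_train], time[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Fit y = intercept + slope * time on the training window only.
slope, intercept = np.polyfit(t_train, y_train, deg=1)
pred_test = intercept + slope * t_test

print("Test RMSE:", round(rmse(y_test, pred_test), 3))
print("Test MAPE:", round(mape(y_test, pred_test), 2))
```

A straight line can capture the trend but not the seasonality, which is why this kind of model is usually treated as a baseline for seasonal data.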
The extracts of Training and Test data for the Naïve Model can be seen below:
                     Figure 17: Performance of the two Models
As can be seen from the Naïve model performance above, the Naïve model is not
suitable for the dataset since the forecasts depend only on the last observation.
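The reason is easy to see in a small sketch: the Naïve forecast repeats the last training value for every test period, so it is a flat line. The numbers below are made up for illustration.

```python
import numpy as np

# Tiny made-up example (assumption), not the report's data.
train = np.array([10.0, 12.0, 14.0])
test = np.array([15.0, 13.0, 17.0])

# Naive forecast: every future value equals the last observed training value.
naive_forecast = np.full(len(test), train[-1])

naive_rmse = float(np.sqrt(np.mean((test - naive_forecast) ** 2)))
print(naive_forecast, round(naive_rmse, 4))
```

Any trend or seasonality in the test window is ignored, so the error grows whenever the series moves away from that last value.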
The extracts of Training and Test data for the Simple Average Model can be seen below:
            Figure 18: Training and Test data for Simple Average Model
Following are the results from running a Simple Average Model:
For Simple Average Model,
The summarized performance of the models run on the dataset can be seen below:
model has the best performance among all the three models run till now.
The Moving Average data for the dataset can be seen below:
                     Figure 22: Moving Average Model Outcome
For 2 point Moving Average Model forecast on the Testing Data, RMSE = 556.725 |
MAPE = 10.67
For 4 point Moving Average Model forecast on the Testing Data, RMSE = 687.181 |
MAPE = 13.71
For 6 point Moving Average Model forecast on the Testing Data, RMSE = 710.513 |
MAPE = 15.01
For 9 point Moving Average Model forecast on the Testing Data, RMSE = 735.889 |
MAPE = 15.33
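A trailing moving average with different window sizes can be sketched with pandas `rolling`, as below. It is assumed here that the "forecast" is the trailing mean evaluated against the actual values over the test window; that is an assumption about how the figures above were produced, and the numbers are made up.

```python
import numpy as np
import pandas as pd

# Made-up series (assumption); the last two points play the role of the test window.
series = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
test_actual = series.iloc[-2:]

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# Trailing average: each value is the mean of the current and previous (w - 1) points.
trailing = {w: series.rolling(window=w).mean() for w in (2, 4)}

rmse_2 = rmse(test_actual, trailing[2].iloc[-2:])
rmse_4 = rmse(test_actual, trailing[4].iloc[-2:])
print(rmse_2, rmse_4)
```

On a rising series, a longer window lags further behind the actual values, which matches the pattern above where the error increases with the number of points.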
The summarized performance of the models run on the dataset can be seen below:
values below the actual train and test data sets, and the 9 point trailing
average plot shows the lowest forecast of all the plots. The closest
model. This observation is confirmed by the RMSE scores for each of these moving
average models.
As can be seen from the summarized performance of all the
models, the 2 point moving average has shown the best performance of all the
For Alpha = 0.216 Simple Exponential Smoothing Model forecast on the Test data,
The summarized performance of the models run on the dataset can be seen below:
                 Figure 26: Summarized Performance of the Models
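The Simple Exponential Smoothing forecast can be sketched by hand as below. This is a minimal illustration, assuming the level is initialised with the first observation (library implementations may initialise or optimise it differently); the function name `ses_forecast` and the sample numbers are hypothetical.

```python
import numpy as np

def ses_forecast(train, alpha, horizon):
    """Flat SES forecast: update the level over the training data, then repeat it."""
    train = np.asarray(train, dtype=float)
    level = train[0]  # initial level = first observation (assumption)
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return np.full(horizon, level)

# Made-up training values (assumption), smoothed with the Alpha used in this report.
forecast = ses_forecast([3000.0, 3200.0, 3100.0, 3400.0], alpha=0.216, horizon=3)
print(forecast.round(2))
```

The forecast is the same value for every future period, so SES has no way to follow a trend or a seasonal pattern in the test data.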
As we all know, the SES model should be used on data which has no
I used Alpha = 0.216 for the SES model and, as expected, it did not perform well when
               Figure 28: Double Exponential Smoothing Outcome
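A grid search over Alpha and Beta for Double Exponential Smoothing (Holt's linear trend method) can be sketched as below. This is a hand-rolled illustration on synthetic data; the function name `holt_forecast`, the initialisation of level and trend, and the 0.1 to 0.9 grid are assumptions, not the report's exact setup.

```python
import itertools
import numpy as np

def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear method: smooth a level and a trend, then extrapolate linearly."""
    train = np.asarray(train, dtype=float)
    level, trend = train[0], train[1] - train[0]  # simple initialisation (assumption)
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend * np.arange(1, horizon + 1)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# Synthetic trending series split into train and test (assumption).
rng = np.random.default_rng(6)
y = 2000 + 8 * np.arange(187) + rng.normal(0, 60, 187)
train, test = y[:132], y[132:]

grid = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
results = [
    (rmse(test, holt_forecast(train, a, b, len(test))), a, b)
    for a, b in itertools.product(grid, grid)
]
best_rmse, best_alpha, best_beta = min(results)
print(best_alpha, best_beta, round(best_rmse, 3))
```

Selecting parameters by test RMSE, as done here, is convenient for comparison but means the test set is no longer a fully independent check.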
For Alpha = 0.1, Beta = 0.1 Double Exponential Smoothing Model forecast on the Test
The summarized performance of the models run on the dataset can be seen below:
As we all know, the DES model should be used on data which has no
seasonality but has level and trend. I used a grid search to start and
0.1 and Beta = 0.1 show the lowest RMSE and MAPE. The DES model is the model with
The TES Parameters for the Soft Drink dataset can be seen below:
                Figure 5: Triple Exponential Smoothing Outcome
The summarized performance of the models run on the dataset can be seen below:
4.8 Summary of all Models
Now that we have run all the models planned, let’s view the summary of the performance
of the dataset:
As we can observe, for the dataset the Triple Exponential Smoothing gives the best
 5. Check for the stationarity of the data on which the model is being built
on using appropriate statistical tests and also mention the hypothesis for
alpha = 0.05
I have performed the Stationarity Test on the data frame. I have used an Augmented
Dickey-Fuller test on the soft drink data set to check the stationarity.
                                     Figure 35: Stationarity
As we can see from the above, we cannot reject the null hypothesis of non-stationarity since the
p-value is greater than alpha; consequently we must make the
data stationary. That is, the data properties do not depend on when the data series is
After taking the difference between consecutive observations to make the data
stationary, we can see that the p-value is below 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which
Criteria (AIC) on the training data and evaluate this model on the test
                  Figure 36: Running Automated ARIMA Model
Following are the results of the ARIMA model on the dataset:
As we can see from the above, the lowest AIC recorded for the
data is for p, d, q values of (3, 1, 3) respectively, and the lowest AIC is 2027.528.
The p-values of coefficients MA1 and MA2 are 0 and 0.013, which implies that these are
values are:
6.2 SARIMA Model
Figure 68: SARIMA Model
As can be noticed, the model with p, d, q as 3, 1, 3 respectively has the lowest AIC,
which is 14. The p-values of ar.S.L12 and ma.S.L12 are below 0.05, which makes them
RMSE: 429.452
MAPE: 9.95
and PACF on the training data and evaluate this model on the test data
using RMSE.
partial autocorrelation (PACF) plot represents the amount of correlation between a series and a
lag of itself that is not explained by correlations at all lower-order lags. We would
Figure 79: ACF and PACF result
The above shows the ACF and PACF for a stationary time series, respectively. The ACF and PACF
plots indicate that an MA(1) model would be appropriate for the time series because
the ACF cuts off after 1 lag while the PACF shows a slowly decreasing pattern.
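The cut-off behaviour of the ACF for an MA(1) process can be illustrated with a small simulation, as below. The sample autocorrelation is computed by hand with NumPy; the coefficient `theta = 0.6`, the sample size and the helper name `sample_acf` are assumptions chosen for illustration and are unrelated to the fitted model in this report.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

# Simulate an MA(1) process: x_t = e_t + theta * e_(t-1).
rng = np.random.default_rng(5)
theta = 0.6
eps = rng.normal(size=5001)
ma1 = eps[1:] + theta * eps[:-1]

acf_vals = sample_acf(ma1, nlags=5)
bound = 1.96 / np.sqrt(len(ma1))  # approximate 95% band for a zero autocorrelation

for lag, value in enumerate(acf_vals):
    flag = "significant" if lag > 0 and abs(value) > bound else ""
    print(lag, round(float(value), 3), flag)
```

For an MA(1) process the theoretical autocorrelation is theta / (1 + theta^2) at lag 1 and zero at all higher lags, which is the "cuts off after 1 lag" pattern described above.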
                          Figure 40: SARIMA model
                               Figure 41: ARIMA model
8. Build a table with all the models built along with their
test data.
I have sorted the models based on lowest RMSE and MAPE values on test data.
          Figure 42: RMSE and MAPE values on test data for all the model runs
We can observe SARIMA (3, 1, 3)(3, 0, 0, 12) average has the lowest RMSE and MAPE
model(s) on the complete data and predict 12 months into the future
We can plot the real and the forecasted sales for the time series.
10. Comment on the model thus built and report your findings and
suggest the measures that the company should be taking for future
sales.
  • The company should come up with discount offers in the months of January to May,
  as the sales are low in these months.
  • Also, the company can adopt a fixed price for soft drinks, as we saw there were
  many outliers in the yearly predictions.
  • Increase the sample size.
  • Increase the number of independent variables.
  • Try more combinations of variables to check whether the accuracy of the model can be
  improved.