
TSF Project - Coded

Rose

Isha Shukla
25 May 2024

INDEX
1. Define the problem and perform Exploratory Data Analysis
2. Data Pre-processing
3. Model Building - Original Data
4. Check for Stationarity
5. Model Building - Stationary Data
6. Compare the performance of the models
7. Actionable Insights & Recommendations

Plots
1. Line plot of dataset
2. Boxplot of dataset
3. Lineplot of sales
4. Boxplot of yearly data
5. Boxplot of monthly data
6. Weekly boxplot
7. Graph of monthly sales over the year
8. Correlation
9. ECDF plot
10. Decomposition additive
11. Decomposition multiplicative
12. Train and test dataset
13. Linear regression
14. Moving average
15. Simple exponential smoothing
16. Double exponential smoothing
17. Naive approach
18. Simple average
19. Triple exponential smoothing
20. Dickey-Fuller test
21. Dickey-Fuller test after differencing
22. Auto ARIMA plot
23. Auto SARIMA plots
24. Manual ARIMA
25. Manual SARIMA
26. PACF and ACF plot
27. PACF and ACF plot train dataset
28. Manual ARIMA plot
29. Manual SARIMA plot
30. Prediction plot

Problem Statement

As analysts at ABC Estate Wines, we are presented with historical data encompassing the sales of different types of wines throughout the 20th century. These datasets originate from the same company but represent sales figures for distinct wine varieties. Our objective is to delve into the data and analyze trends, patterns, and factors influencing wine sales over the course of the century. By leveraging data analytics and forecasting techniques, we aim to gain actionable insights that can inform strategic decision-making and optimize sales strategies for the future.

The main goal of this project is to study and predict wine sales trends from
the 20th century using historical data from ABC Estate Wines. We want to
give ABC Estate Wines useful insights to improve sales, take advantage of
new market opportunities, and stay competitive in the wine industry.

1. Define the problem and perform Exploratory Data Analysis
(I) Read the data

• There are two columns in the dataset Rose.csv

• The dataset has 187 rows.

• Columns - YearMonth (datatype: object) and Rose (datatype: float)

• Rose column has 2 null values.

Null values in the dataset

Information about the dataset

Table 1 - Rows of the dataset

Top rows of the dataset Last rows of the dataset

Table 2 - Statistical Summary of the dataset.

Comprehensive Summary of
the dataset.

II. Plot the data
Plot 1 - Plot the data

Plot of the dataset - Rose vs YearMonth

For enhanced analysis of the dataset, we have segmented it further by


extracting the month and year components from the 'YearMonth' column.
This division allows for more granular examination of the data based on
month-to-month and year-to-year trends. Now, there are 3 columns and 187
rows.

Table 3 - After extraction of Year and Month.

Top rows after extraction of Year and Month

III. Perform Exploratory Data Analysis (EDA)

• There are 2 null values.


• Replace missing values in the Rose column with the calculated mean.
Treating missing values is crucial for maintaining data integrity, ensuring
accurate analyses, and deriving reliable insights, thereby enabling informed
decision-making and valid conclusions.

After treatment of missing value

• We'll resample the data to a monthly frequency, computing the average value for each month.

After resampling of the dataset
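
The preparation steps above can be reproduced with pandas. The following is a minimal sketch, assuming Rose.csv sits in the working directory with the YearMonth and Rose columns described earlier; variable names are illustrative.

    import pandas as pd

    # Read the file and parse YearMonth (e.g. "1980-01") as a timestamp index
    df = pd.read_csv("Rose.csv", parse_dates=["YearMonth"], index_col="YearMonth")

    # Replace the two missing Rose values with the column mean
    df["Rose"] = df["Rose"].fillna(df["Rose"].mean())

    # Resample to month-end frequency, averaging within each month
    monthly = df["Rose"].resample("M").mean()

    # Extract Year and Month columns for the boxplots and the pivot table
    eda = monthly.to_frame()
    eda["Year"] = eda.index.year
    eda["Month"] = eda.index.month
    print(eda.head())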

Plot 2 - The trend of Rose at year level

• There was a peak in 1981. The plot shows that both trend and seasonality are present.
Plot 3 - Yearly Box-plot


Yearly Box-plot
Plot 4 - Monthly Boxplot

Monthly Boxplot

Plot 5 - Weekly Box-plot

Weekly Box-plot

Outliers Observation

• Yearly Boxplot - Outliers persist across nearly all years. There was a peak in 1981.

• Monthly Boxplot - The graph indicates that wine sales peak in


December and hit their lowest point in January. Sales remain steady
from January to June, but then begin to rise steadily from July
onwards. However, there are some outliers in June, July, August,
September, and December.

• Weekly Boxplot - There are some outliers on Thursday, Friday, Monday, Wednesday and Tuesday.

Table 4 - Pivot table displays monthly price across years

Pivot Table

Here are observations from the pivot table:

• There are missing values for August, October, September, December,
and November in 1995.

Plot 6 - Pivot table plot for monthly wine sale across year

Graph to display monthly price across years

Plot 7 - Correlation Plot

Rose Wine sales correlation with respect to Year and Month

Observations from the correlation table and plot:

• The strong negative correlation between Rose Wine sales and Year suggests a clear trend over time. This indicates that there is a consistent decrease in Rose wine sales values as the years progress, implying a long-term downward trend in Rose sales.

• The moderate positive correlation between Rose Wine sales and Month
indicates some seasonality in Rose sales. This suggests that there is a
tendency for Rose sales to increase as the month progresses, implying a
seasonal pattern within each year. However, this correlation is not as
strong as the trend observed over the years.

• Overall, these observations suggest that while there is both a long-term


trend of decreasing Rose Wine sales over the years and a seasonal pattern
of increasing sales within each year, the trend effect is more pronounced
than the seasonal effect.

Plot 8 - Empirical Cumulative Distribution Function (ECDF)

Graph for ECDF

From the ECDF plot of wine sales observations, we can observe the following:

• The x-axis represents the range of wine sales observations, while the y-axis
represents the cumulative probability.

• The plot shows how the cumulative probability increases as we move


along the sorted wine sales values.

• By examining the slope of the curve, we can infer the density of the observations at different values. Steeper slopes indicate a higher density of observations, while flatter slopes indicate a lower density.

• The ECDF plot provides a comprehensive overview of the distribution of
wine sales observations, allowing us to assess characteristics such as
central tendency, spread, and percentiles.

• Overall, the ECDF plot helps us understand the empirical distribution of


wine sales observations and can provide insights into the underlying
patterns and variability in the data.

ECDF for Y-axis (Cumulative Probability) and X-axis (Wine sales)

• Highest wine sale is 267.
• Lowest wine sale is 28.
• More than 75% of observations have sales of less than 150.

IV. Decomposition

a. Additive

Plot 9 - Additive Decomposition

Decomposition - Additive

• Trend and seasonality are present.

• Residual noise is also present.

• The trend is decreasing with respect to year.

• Rose wine sales increase as the month progresses, implying a seasonal pattern within each year.

• The peak year was 1981; afterwards, sales have been decreasing over time.

Additive Decomposition - Trend, Seasonality and Residual by year
Multiplicative Decomposition - Trend, Seasonality and Residual by year

b. Multiplicative - Plot 10

Decomposition - Multiplicative
• Trend and seasonality are present.

• Residual noise ranges from 0 to 1, whereas the additive residuals range from 0 to 50.

• The trend is decreasing with respect to year.

• Rose wine sales increase as the month progresses, implying a seasonal pattern within each year.

• The peak year was 1981; afterwards, sales have been decreasing over time.

• The multiplicative model is preferred over the additive model for decomposing Rose wine sales because of its narrower residual range.
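
Both decompositions can be obtained from statsmodels. A minimal sketch, assuming the monthly series prepared earlier:

    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Additive: observed = trend + seasonal + residual
    add_dec = seasonal_decompose(monthly, model="additive", period=12)
    add_dec.plot()

    # Multiplicative: observed = trend * seasonal * residual
    mul_dec = seasonal_decompose(monthly, model="multiplicative", period=12)
    mul_dec.plot()
    plt.show()

    # Compare residual spreads to choose between the two models
    print(add_dec.resid.dropna().describe())
    print(mul_dec.resid.dropna().describe())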

2. Data Pre-processing

I. Train-test split

• The data from 1980 to 1990 is used as the training set, while the data from
1991 to 1995 is used as the testing set. This separation allows us to use the
earlier data for training models and the later data for testing their
performance.
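
A minimal sketch of the split by date, assuming the prepared monthly series; the expected sizes (132 training months and 55 test months) follow from the 187 monthly observations:

    # Training data: January 1980 to December 1990; test data: January 1991 onwards
    train = monthly[:"1990-12-31"]
    test = monthly["1991-01-01":]
    print(train.shape, test.shape)   # roughly (132,) and (55,)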
Table 5 - Train and Test rows and columns

Shape of Train (132 rows, 3 columns) and Test (55 rows, 3 columns)

Train Dataset and Test Dataset
Plot 11 - Plot of train and test

Plot of train and test

3. Model Building - Original Data
(1) Linear Regression
Plot 12 - Linear Regression

Green Line indicates Linear Regression Prediction

RMSE calculated for Linear Regression: 17.08
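
A minimal sketch of regression on a time index, assuming the train/test split above; using each observation's position in time as the only regressor is an illustrative choice:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Encode time as consecutive integers for train and test
    X_train = np.arange(len(train)).reshape(-1, 1)
    X_test = np.arange(len(train), len(train) + len(test)).reshape(-1, 1)

    lr = LinearRegression().fit(X_train, train.values)
    lr_pred = lr.predict(X_test)
    print("Linear Regression RMSE:", np.sqrt(mean_squared_error(test.values, lr_pred)))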

(2) Moving Average (MA)
For the moving average model, we calculate rolling means (moving averages) for different intervals. The best interval is the one that gives the minimum error (maximum accuracy). For the trailing moving average, the rolling mean is computed over the entire series.

Top rows for the Trailing Moving Average for 2, 4, 6 and 9 point windows

Plot 13 -Moving Average (MA)

Moving Average plot - the 2-point Moving Average is best

RMSE calculated for Moving Average:

We created several moving average models with rolling windows ranging


from 2 to 9 points. The best model was the 2-point moving average, with a
RMSE value of 12.29.

Plot for the best Moving Average, that is, rolling window 2.
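
A minimal sketch of the trailing moving-average comparison, assuming the series and test split defined earlier; the window sizes shown are examples from the 2-to-9-point range discussed above:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    for window in [2, 4, 6, 9]:
        # Trailing (rolling) mean over the full series, scored on the test period
        rolling = monthly.rolling(window=window).mean()
        rmse = np.sqrt(mean_squared_error(test, rolling.loc[test.index]))
        print(f"{window}-point moving average RMSE: {rmse:.2f}")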

(3) Simple Exponential Smoothing Model - Plot 14
A Simple Exponential Smoothing (SES) model is a time series forecasting
technique that applies weighted averages of past observations to make
future predictions. In SES, more recent observations are given exponentially
more weight compared to older observations, allowing the model to adapt
quickly to changes in the data.

The SES model is particularly useful for data with no clear trend or seasonal pattern, as it effectively smooths out short-term fluctuations to reveal longer-term trends or patterns.
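
A minimal sketch of SES with statsmodels, assuming the earlier train/test split; the smoothing level alpha is estimated automatically unless supplied:

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from statsmodels.tsa.holtwinters import SimpleExpSmoothing

    ses = SimpleExpSmoothing(train).fit()        # alpha estimated from the data
    ses_pred = ses.forecast(steps=len(test))     # flat forecast over the test horizon
    print("SES RMSE:", np.sqrt(mean_squared_error(test, ses_pred)))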

Table for forecast of the model

RMSE

(4) Double Exponential Smoothing (Holt's Model)
Double Exponential Smoothing (DES), also known as Holt's Exponential
Smoothing, is an extension of Simple Exponential Smoothing that
incorporates both level and trend components to handle time series data
with trends.
The DES method helps in capturing both the level and the trend in the time
series, making it suitable for datasets where trends are present, thus
providing more accurate forecasts compared to Simple Exponential
Smoothing when trends exist in the data.

• Two parameters, α and β, are estimated in this model. Level and trend are accounted for in this model.
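
A minimal sketch of Holt's model, again letting statsmodels estimate α (level) and β (trend):

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from statsmodels.tsa.holtwinters import Holt

    des = Holt(train).fit()                      # estimates alpha and beta
    des_pred = des.forecast(steps=len(test))
    print("Holt (DES) RMSE:", np.sqrt(mean_squared_error(test, des_pred)))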


Plot 15 - Holt’s Model

Holt’s Model

RMSE

(5) Naive Approach
The Naive Approach is a simple and straightforward time series forecasting
method where the forecast for any future period is assumed to be equal to
the most recent actual observation.
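
A minimal sketch of the naive forecast, assuming the earlier split:

    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_squared_error

    # Every test-period forecast equals the last observed training value
    naive_pred = pd.Series(train.iloc[-1], index=test.index)
    print("Naive RMSE:", np.sqrt(mean_squared_error(test, naive_pred)))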

forecast for test prediction is assumed 132

Plot 16 Naive Approach

Plot for Naive Approach

RMSE for Naive Approach

(6) Simple Average
The Simple Average Time Series Forecasting (TSF) model is a basic yet
effective method for predicting future values in a time series. It operates on
the principle that the forecasted value for a given period is the simple
average (arithmetic mean) of all previous observations. This model is
particularly useful for data with a consistent level over time and without
significant trends or seasonal patterns.
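
A minimal sketch of the simple average forecast, assuming the earlier split:

    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_squared_error

    # Every test-period forecast equals the mean of all training observations
    avg_pred = pd.Series(train.mean(), index=test.index)
    print("Simple Average RMSE:", np.sqrt(mean_squared_error(test, avg_pred)))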

Simple Average Test mean forecast predicted values

Plot 17 Simple Average

Simple Average Plot


RMSE for Simple Average

(7) Triple Exponential Smoothing (Holt - Winter's Model)


Triple Exponential Smoothing, also known as the Holt-Winters method, is an
extension of Exponential Smoothing that accounts for trends and seasonality
in time series data. This method involves three components: level, trend, and
seasonality. There are two main variations of the Holt-Winters method:
additive and multiplicative. The additive model is used when seasonal
variations are roughly constant over time, while the multiplicative model is
used when seasonal variations change proportionally to the level of the time
series.
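
A minimal sketch of the Holt-Winters model with the additive-trend, multiplicative-seasonality configuration selected in this project:

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    tes = ExponentialSmoothing(train, trend="add", seasonal="mul",
                               seasonal_periods=12).fit()
    tes_pred = tes.forecast(steps=len(test))
    print("Holt-Winters RMSE:", np.sqrt(mean_squared_error(test, tes_pred)))
    print(tes.params)    # estimated alpha, beta, gamma and initial states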

Plot 18 - Triple Exponential Smoothing

RMSE = 11.76

The optimal Triple Exponential Smoothing (Holt-Winters) model, featuring an additive trend and multiplicative seasonality, has been identified. The best smoothing parameters alpha, beta and gamma have been determined, making this the most effective model so far.

Plot 19 - Model Building for all forecast

Plot of all the forecasting models

RMSE values for all the models, sorted in ascending order

4. Check for Stationarity


The Augmented Dickey-Fuller test is a unit root test which determines
whether there is a unit root and subsequently whether the series is non-
stationary.
The hypothesis in a simple form for the ADF test is:

• H0 : The Time Series has a unit root and is thus non-stationary.


• H1 : The Time Series does not have a unit root and is thus stationary.
We would want the series to be stationary for building ARIMA models and
thus we would want the p-value of this test to be less than the α value.

We see that at the 5% significance level the Time Series is non-stationary.
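
A minimal sketch of the ADF test with statsmodels, run on the original series and then on its first difference (assuming the monthly series prepared earlier):

    from statsmodels.tsa.stattools import adfuller

    def adf_report(series, name):
        stat, pvalue, lags, nobs, crit, _ = adfuller(series.dropna())
        print(f"{name}: statistic={stat:.4f}, p-value={pvalue:.4f}, lags={lags}")
        print("  critical values:", crit)

    adf_report(monthly, "Original series")          # expect p > 0.05 (non-stationary)
    adf_report(monthly.diff(), "First difference")  # expect p < 0.05 (stationary)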

Plot 20 - The Augmented Dickey-Fuller test

Dickey-Fuller Test

Here are the results of the Dickey-Fuller test presented in points:

• Test Statistic: -1.933803


• p-value: 0.316330
• Lags Used: 13
• Number of Observations Used: 173
• Critical Values - 1%: -3.468726, 5%: -2.878396 and 10%: -2.575756

Conclusion:

• The test statistic (-1.933803) is higher than the critical values at the 1%, 5%, and 10% significance levels.

• The p-value (0.316330) is greater than typical significance thresholds (e.g., 0.01, 0.05, 0.10).

• As a result, we fail to reject the null hypothesis that the time series has a
unit root (i.e., it is non-stationary).

• This suggests that the time series is likely non-stationary, meaning its
statistical properties such as mean and variance may change over time.

Plot 21 - The Augmented Dickey-Fuller test after difference

ADF test after differencing

The Dickey-Fuller test, performed after differencing the data, is used to test
the null hypothesis that a unit root is present in the time series sample. Here
are the key points regarding the null hypothesis and the interpretation of the
test results:

1. Null Hypothesis (H0): The time series has a unit root (i.e., it is non-
stationary).
2. Alternative Hypothesis (H1): The time series does not have a unit root (i.e.,
it is stationary).

Interpretation of the Results:

Test Statistic: -7.855944

• This value is the computed test statistic for the Dickey-Fuller test. It is
compared against the critical values to determine whether to reject the
null hypothesis.

• The p-value is extremely small, much less than typical significance levels
(e.g., 0.01, 0.05, 0.10). This indicates strong evidence against the null
hypothesis.

• Critical Values: 1%: -3.468726, 5%: -2.878396 and 10%: -2.575756


These values represent the thresholds for rejecting the null hypothesis at the
1%, 5%, and 10% significance levels.

Comparison:

• The test statistic (-7.855944) is more negative than all the critical values at
the 1%, 5%, and 10% levels.

• Since the test statistic is much lower (more negative) than the critical
values, and the p-value is extremely small, we reject the null hypothesis.

Conclusion:

• Given that the test statistic (-7.855944) is much lower than the critical
values and the p-value is significantly small, we reject the null hypothesis
that the time series has a unit root.

• This indicates that the time series is stationary, meaning its statistical
properties such as mean and variance remain constant over time.

5. Model Building - Stationary Data


Plot 22 - Plot the Autocorrelation and the Partial Autocorrelation function plots on
the whole data.

Plot the Autocorrelation and the Partial Autocorrelation function plots on the whole data.
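
A minimal sketch of these plots using statsmodels; drawing them on the undifferenced whole series matches the caption above, while the differenced series is the usual input for order selection (the author's exact choice is an assumption):

    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    plot_acf(monthly, lags=30, ax=axes[0])    # autocorrelation function
    plot_pacf(monthly, lags=30, ax=axes[1])   # partial autocorrelation function
    plt.show()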

(1) Auto ARIMA (Auto-Regressive Integrated Moving Average)

The Auto ARIMA (Auto-Regressive Integrated Moving Average) model is a statistical analysis technique used for time series forecasting that automatically selects the best-fitting ARIMA model by optimizing the parameters. ARIMA models are widely used for forecasting data that show evidence of non-stationarity and require differencing to achieve stationarity. The ARIMA model comprises three components: the auto-regressive (AR) part, which regresses the variable on its own lagged values; the integrated (I) part, which involves differencing the observations to make the time series stationary; and the moving average (MA) part, which models the error term as a linear combination of past error terms. The Auto ARIMA model automates the identification of optimal values for these parameters (p, d, q) by evaluating multiple ARIMA models with different combinations and selecting the best one based on information criteria such as the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This process includes differencing the series to achieve stationarity, exploring a range of p and q values, and evaluating each model to find the one with the lowest criterion value.

For the Rose wine sales analysis, the parameter d represents the differencing required to render the series stationary. The for loop iterates over p and q values ranging from 0 to 3, while a fixed value of 1 is assigned to d. This choice is made because we had previously determined through the Augmented Dickey-Fuller (ADF) test that a differencing order of 1 was necessary to achieve stationarity.
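
A minimal sketch of this grid search, assuming the training series from the earlier split; statsmodels' ARIMA is used here as one plausible implementation:

    import itertools
    from statsmodels.tsa.arima.model import ARIMA

    results = []
    for p, q in itertools.product(range(4), range(4)):   # p, q in 0..3, d fixed at 1
        try:
            fit = ARIMA(train, order=(p, 1, q)).fit()
            results.append(((p, 1, q), fit.aic))
        except Exception:
            continue

    results.sort(key=lambda item: item[1])   # lowest AIC first
    print(results[:5])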

Some parameter combinations for the Model:

Parameters for Auto ARIMA

To obtain the parameters corresponding to the minimum AIC value, we need


to sort the AIC values in ascending order and then select the parameters
associated with the lowest AIC value.

Arranged the AIC values in


ascending order to identify the
set of parameters that yield the
minimum AIC value.
p=2, d=1 and q=3 has the minimum AIC value of 1274.695.

We will generate the summary report for this.

Auto ARIMA Summary Report for p=2, d=1 and q=3

The summary report for the Auto ARIMA model offers a detailed overview of
the model's performance and diagnostics. It begins by identifying the
dependent variable, labeled as "Rose," and specifies that 132 observations
were utilized in the analysis. The chosen ARIMA model is denoted as
ARIMA(2, 1, 3), indicating auto-regressive and moving average orders of 2 and
3, respectively, with a differencing order of 1. The log likelihood, AIC (Akaike
Information Criterion) - 1274.695 , and BIC (Bayesian Information Criterion) -
1291.946 values provide measures of model fit, with lower AIC and BIC values indicating better fit. Additionally, the report includes parameter estimates for the model coefficients, standard errors, and statistical significance.
Diagnostic tests such as the Ljung-Box (Q) - 0.02 and Jarque-Bera (JB) - 24.44

tests assess the goodness of fit, while the Heteroskedasticity (H) - 0.4 test
evaluates the constancy of residual variance. Skewness and kurtosis
measures provide insights into the distributional properties of the residuals.
Overall, this comprehensive summary aids in the interpretation and
evaluation of the Auto ARIMA model, helping to understand its effectiveness
in capturing the underlying patterns in the time series data.

Forecast for Auto ARIMA model before we evaluate the RMSE

RMSE value for Auto ARIMA = 35.96

Diagnostic Plot for auto ARIMA

Plot 23 - Diagnostic plot for auto ARIMA for the best auto ARIMA model

(2) Auto SARIMA (Seasonal Auto-Regressive Integrated Moving


Average)

SARIMA, which stands for Seasonal Auto-Regressive Integrated Moving


Average, extends the ARIMA model to account for seasonal patterns in the
data.

Components of SARIMA
1. Auto-Regressive (AR) part: Represents the correlation between the
current observation and a lagged (past) observation within the same
series.
2. Integrated (I) part: Involves differencing the raw observations to make
the time series stationary. This accounts for trends present in the data.

3. Moving Average (MA) part: Represents the correlation between the
current observation and a residual error from a moving average model
applied to lagged observations.
Additionally, SARIMA includes seasonal components:
1. Seasonal Auto-Regressive (SAR) part: Represents the correlation
between the current observation and a lagged observation within the
same series, but over seasonal intervals.
2. Seasonal Integrated (SI) part: Involves seasonal differencing to remove
seasonal trends from the data.
3. Seasonal Moving Average (SMA) part: Represents the correlation
between the current observation and a residual error from a moving
average model applied to lagged observations over seasonal intervals.

Overall, Auto SARIMA helps you forecast future values in your data easily and accurately by automatically finding the best way to do it.

For the Rose wine sales analysis, the parameter d represents the differencing required to render the series stationary. The for loop iterates over p, d and q values ranging from 0 to 2. The parameter m represents the number of seasonal months; we keep the seasonal period at 12. This choice is made because we had previously determined through the Augmented Dickey-Fuller (ADF) test that a differencing order of 1 was necessary to achieve stationarity.
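
A minimal sketch of the seasonal grid search, assuming the training series; note that iterating p, d, q and the seasonal P, D, Q over 0..2 with m=12 fits several hundred models, so this is slow:

    import itertools
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    pdq = list(itertools.product(range(3), repeat=3))                 # p, d, q in 0..2
    seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(range(3), repeat=3)]

    scores = []
    for order in pdq:
        for s_order in seasonal_pdq:
            try:
                fit = SARIMAX(train, order=order, seasonal_order=s_order,
                              enforce_stationarity=False,
                              enforce_invertibility=False).fit(disp=False)
                scores.append((order, s_order, fit.aic))
            except Exception:
                continue

    scores.sort(key=lambda item: item[2])   # lowest AIC first
    print(scores[:5])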

To obtain the parameters corresponding to the minimum AIC value, we need


to sort the AIC values in ascending order and then select the parameters
associated with the lowest AIC value.

p=0, d=1 and q=2 has the minimum AIC value of 716.793
Seasonal p=2, d=2 ,q=2 and m=12.

Top rows for Auto SARIMA based on the minimum AIC
value

The summary report for Auto SARIMA

Summary Report for Auto SARIMA


p=0, d=1 and q=2 has the minimum AIC value of 716.793

Seasonal p=2, d=2 ,q=2 and m=12

The summary report provides a detailed overview of observations derived


from a SARIMAX model. The analyzed data comprises 132 observations of the
dependent variable labeled as "Rose." The SARIMAX model, characterized as
SARIMAX(0, 1, 2)x(2, 2, 2, 12), encompasses seasonal and exogenous factors
in its formulation. Evaluating the model's fit, the log likelihood is reported as -351.396, indicating how well the model aligns with the data, while the AIC and BIC stand at 716.793 and 733.467, respectively, serving as measures of model fit and complexity. The temporal span of the dataset extends from January 31, 1980, to December 31, 1990. Covariance estimation of the model is identified as "opg." Parameter estimates offer insights into the coefficients
of model terms, alongside their associated standard errors and statistical
significance. The variance of residuals, denoted as sigma2, is recorded at
274.1713. Diagnostic tests encompass the Ljung-Box (Q) test, Jarque-Bera (JB)
test, and a test for heteroskedasticity (H), assessing various assumptions
underlying the model. Additionally, skewness and kurtosis measures provide
further characterization of the distributional properties of residuals. Overall,
the summary furnishes a comprehensive assessment of the SARIMAX model's
performance, encompassing its alignment with data, parameter significance,
and adherence to underlying assumptions.

Auto SARIMA forecast values

RMSE

Plot 24 - Diagnostic plot for auto SARIMA for the best auto SARIMA model

Diagnostic Plot for Auto SARIMA

(3) Build ARIMA/SARIMA models based on the cut-off points of the ACF and PACF on the training data and evaluate these models on the test data using RMSE

• Manual ARIMA
In manual ARIMA, the user manually selects appropriate values for p, d and q based on prior knowledge, domain expertise, or iterative testing.

This approach requires a deep understanding of the data and the underlying
patterns to choose the most suitable parameters. Manual ARIMA is often
used when automated methods like Auto ARIMA or Auto SARIMA are not
available or when users prefer a more hands-on approach to model selection.
However, it can be time-consuming and may not always yield the best results
compared to automated approaches.

Plot 25 - ACF Plot on train data

ACF Plot on train data and Partial Auto-correlation Plot

Plot 26 - PACF Plot on train data

PACF Plot on train data

Value selected for manual ARIMA: p=1, q=1 and d=1
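
A minimal sketch of fitting the manually selected ARIMA(1, 1, 1) on the training data and scoring it on the test data:

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from statsmodels.tsa.arima.model import ARIMA

    manual_arima = ARIMA(train, order=(1, 1, 1)).fit()
    print(manual_arima.summary())

    pred = manual_arima.forecast(steps=len(test))
    print("Manual ARIMA(1,1,1) RMSE:", np.sqrt(mean_squared_error(test, pred)))
    manual_arima.plot_diagnostics(figsize=(12, 8))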

Manual ARIMA p=1, q=1 and d=1

The manual ARIMA model, denoted as ARIMA(1, 1, 1), was applied to analyze
the "Rose" dataset comprising 132 observations. The model suggests a irst-
order auto-regressive component p=1 and a irst-order moving average
component q=1, along with irst-order differencing d=1. The log likelihood of
the model is reported as -637.287, with corresponding AIC and BIC values of
1280.574 and 1289.200, respectively. The HQIC value stands at 1284.079.
Parameter estimates indicate a coefficient of 0.1814 for the auto-regressive
term and -0.9192 for the moving average term. The variance of residuals
(sigma2) is calculated as 972.5964. Diagnostic tests include the Ljung-Box
test for autocorrelation, with a p-value of 0.98, indicating no significant
autocorrelation, and the Jarque-Bera test for normality, yielding a p-value of
0.00. Additionally, the model's heteroskedasticity test returns a p-value of
0.00, suggesting heteroskedasticity is present. Overall, the manual ARIMA
model provides insights into the relationships between the variables and
their predictive capabilities within the dataset.

RMSE value for Manual ARIMA

Plot 27 - Manual ARIMA diagnostic

Manual ARIMA - p=1, q=1 and d=1
• Manual SARIMA
The manual SARIMA (Seasonal Auto-Regressive Integrated Moving Average)
model is a technique for time series forecasting where the user manually
selects the values of the SARIMA parameters to capture both the seasonal
and non-seasonal patterns present in the data.
Once the parameters are selected, the manual SARIMA model is fitted to the
data, and forecasts can be generated for future time points. Diagnostic tests
and evaluation metrics are then used to assess the model's performance and
determine if adjustments to the parameter values are necessary.
Manual SARIMA offers flexibility and control over the modeling process, but it
requires expertise in time series analysis and a deep understanding of the
data to select appropriate parameter values that result in accurate forecasts.

Value selected for manual SARIMA: p=1, q=1 and d=1; seasonal p=1, q=1, d=1 and m=12
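
A minimal sketch of fitting the manually selected SARIMA(1, 1, 1)(1, 1, 1, 12) on the training data and scoring it on the test data:

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    manual_sarima = SARIMAX(train, order=(1, 1, 1),
                            seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    print(manual_sarima.summary())

    pred = manual_sarima.forecast(steps=len(test))
    print("Manual SARIMA RMSE:", np.sqrt(mean_squared_error(test, pred)))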

The summary report for manual SARIMA

Manual SARIMA p=1, q=1 and d=1; seasonal p=1, q=1, d=1 and m=12
The manual SARIMA model, specified as SARIMAX(1, 1, 1)x(1, 1, 1, 12), was
applied to the dataset "y," consisting of 132 observations. This model
configuration includes an auto-regressive order of 1 (p=1), a differencing
order of 1 (d=1), and a moving average order of 1 (q=1) for the non-seasonal
components. For the seasonal components, there is an auto-regressive order
of 1 (P=1), a differencing order of 1 (D=1), a moving average order of 1 (Q=1),
and a seasonal period of 12 months.

The log likelihood of the model is reported as -458.646, with corresponding


AIC and BIC values of 927.292 and 940.562, respectively. The HQIC value
stands at 932.669. Parameter estimates indicate a coefficient of 0.2139 for
the auto-regressive term and -0.9289 for the moving average term. The
seasonal auto-regressive term has a coefficient of -0.4113, and the seasonal moving average term has a coefficient of 0.0061.

The variance of the residuals (sigma2) is calculated as 361.3677. Diagnostic


tests include the Ljung-Box test for autocorrelation, with a p-value of 0.96
indicating no significant autocorrelation, and the Jarque-Bera test for normality, yielding a p-value of 0.88. Additionally, the model's heteroskedasticity test returns a p-value of 0.12, suggesting no significant heteroskedasticity at conventional levels. Overall, the manual SARIMA model provides insights into
the relationships between the variables and their predictive capabilities
within the dataset.

RMSE for Manual SARIMA

6. Compare the performance of the models

• Compare the performance of the models


We can see that Triple Exponential Smoothing with Alpha = 0.1, Beta = 0.2 and Gamma = 0.1 has the lowest RMSE value, so it will be considered the best model.

Sorted by RMSE values on the Test Data

• Rebuild the best model using the entire data - Make a forecast for
the next 12 months
After comparing all the models we constructed, it's evident that the triple
exponential smoothing or Holt-Winters model yields the lowest RMSE.
Therefore, it emerges as the most optimal choice. We will rebuild the best
model using triple exponential smoothing for the next 12 months prediction.
Forecasts and confidence intervals are combined into a DataFrame.
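
A minimal sketch of refitting the chosen Holt-Winters configuration on the full series and forecasting 12 months ahead; the residual-based confidence band used here is an illustrative assumption, not necessarily the author's method:

    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Refit on the entire monthly series with the best configuration
    final_model = ExponentialSmoothing(monthly, trend="add", seasonal="mul",
                                       seasonal_periods=12).fit()

    forecast = final_model.forecast(steps=12)
    resid_std = final_model.resid.std()
    future = pd.DataFrame({
        "forecast": forecast,
        "lower": forecast - 1.96 * resid_std,   # rough 95% band from residual spread
        "upper": forecast + 1.96 * resid_std,
    })
    print(future)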

Next 12 months' dates that we will be using for prediction

Forecasted values for the next 12 months

Rose Wine Sales predicted values made with


Triple Exponential Smoothing

Future predicted plot

Plot 29 - Prediction for future 12 months

7. Actionable Insights & Recommendations


Actionable Insights
1. Trend and Seasonality Observations:

• Long-term Trend: There is a strong negative correlation between Rose


Wine sales and Year, indicating a consistent decrease in sales over time.
This suggests a long-term downward trend in Rose wine sales.

• Seasonal Pattern: A moderate positive correlation between Rose Wine


sales and Month suggests some seasonality, with sales tending to increase
as the month progresses. However, this seasonal pattern is less
pronounced than the long-term downward trend.
2. Monthly Sales Insights:

• December: Sales are consistently highest across all years.


• January: Sales tend to be the lowest.
• January to June: Sales remain relatively stable.

• July Onwards: Sales begin to increase.
• The highest wine sale was recorded in the year 1981.

3. Seasonal Influence:
Wine sales are significantly influenced by seasonal changes, with an increase during the festival season and a drop during peak winter (January).

Recommendations

• Focus on marketing campaigns from April to June, when sales are low, to
boost overall annual performance.

• Consider running campaigns throughout the year to encourage wine


consumption during typically low-sales periods.

• Running campaigns during peak periods might not significantly impact
sales, as they are already high.

• Due to low purchase tendencies during peak winter (January), campaigns


may not be effective.

• Explore reasons behind the decline in Rose wine popularity and adjust
production and marketing strategies as needed to regain market share.

• If necessary, modify the production and marketing strategies to address


the long-term downward trend in sales and improve market performance.
