TSF – ROSE
REPORT
DSBA
Contents
Problem:
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate them using RMSE on the test data. Other models such as regression, naïve forecast and simple average models should also be built on the training data, and their performance checked on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests, and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criterion (AIC) on the training data, and evaluate this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data, and evaluate these models on the test data using RMSE.
8. Build a table (create a data frame) with all the models built along with their corresponding parameters and the respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built, report your findings and suggest the measures that the company should be taking for future sales.
Problem
You as an analyst have been tasked with performing a thorough analysis of the data
and coming up with insights to improve the marketing campaign. For this particular
assignment, the data of different types of wine sales in the 20th century is to be
analysed. Both datasets are from the same company but for different wines. As
an analyst at ABC Estate Wines, you are tasked to analyse and forecast Wine
Sales in the 20th century.
Data set for the Problem: [Link] and [Link]
1. Read the data as an appropriate Time Series data and plot the data.
Before modelling, an analyst needs to understand the formal properties of the
dataset: what kind of data it contains and how it behaves when plotted as a
time series.
Solution:
a) Dimensions of the Dataset = 187 Rows x 2 Columns.
b) Nature of Variables (Datum) present in Dataset:
Nature of Datum - Table
S/No Column Datatype
1 YearMonth Integer
2 Rose Integer
c) Displaying First and Last 5 Rows including last date of each month:
Top Five Rows Last Five Rows
Functions & Methods Used:
a) shape is used to get the size of the Dataset, i.e. its dimensions.
b) info() is used to get the nature (datatype) of all variables (datum) present
in the Dataset.
c) head() is used to display the first 5 rows of data by default.
d) tail() is used to display the last 5 rows of data by default.
d) Plotting of Data-Line Plot:
Before plotting, the 'YearMonth' column is split into separate 'Year' and
'Month' columns, and the 'Rose' column is renamed 'Rose Sales' for better
clarity and understanding.
So, the new dataset consists of 187 Rows and 3 Columns.
Top Five Rows Last Five Rows
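The preprocessing described above can be sketched as follows. This is a minimal illustration, not the report's actual code: the sample values are made up, and the real dataset has 187 monthly rows loaded from a file.

```python
import pandas as pd

# Made-up sample rows standing in for the real Rose dataset (187 rows).
raw = pd.DataFrame({
    "YearMonth": ["1980-01", "1980-02", "1980-03"],
    "Rose": [112, 118, 129],
})

# Parse YearMonth into a month-end DatetimeIndex so pandas treats it as a time series.
raw["YearMonth"] = pd.to_datetime(raw["YearMonth"]) + pd.offsets.MonthEnd(0)
df = raw.set_index("YearMonth").rename(columns={"Rose": "Rose Sales"})

# Derive the extra Year and Month columns used in the later EDA plots.
df["Year"] = df.index.year
df["Month"] = df.index.month_name()
```

Setting a proper DatetimeIndex is what lets the later resampling, decomposition and train/test splitting work on dates directly.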
2. Perform appropriate Exploratory Data Analysis to understand the data and
also perform decomposition.
A preliminary analysis of the variables in the Dataset yields summary
statistics such as standard deviation, mean, median and mode, and in
particular includes a check for null values.
Solution:
a) Description for newly plotted Dataset:
b) Null Value Check:
Based on the results we can conclude that there are 2 null values present
in the given dataset, so the mean of the column is used to fill them.
c) Plotting of Complete Dataset Variables-Box Plot:
From the box plots we are able to identify the outliers present in the
variables. Since the outliers do not have much impact on our required data,
we have decided not to treat them.
Box Plot – Weekday Wise:
It shows that 'Tuesday' has higher sales compared to the other days,
'Wednesday' stands at the lowest, and outliers are absent on 'Friday' and
'Thursday'.
d) Plot for Monthly Rose Sales over the Years:
Month vs Year Graph
This plot shows that Rose Sales are highest in the year 1981 compared to
the other years, and month-wise they are highest in December.
e) Correlation Map:
Analysis can be done by correlating the variables with one another so that
we can gain deeper insights into the relationships among the variables in
the dataset.
Heat Map
From the above heatmap we can conclude that there is a high Correlation
among the variables ‘Month & Rose Sales’ and low Correlation between
‘Year & Rose Sales’ in the dataset.
f) Empirical Cumulative Distribution Function:
In statistics, an empirical distribution function (commonly also called an
empirical Cumulative Distribution Function, eCDF) is the distribution
function associated with the empirical measure of a sample. This cumulative
distribution function is a step function that jumps up by 1/n at each of
the n data points. Its value at any specified value of the measured variable
is the fraction of observations of the measured variable that are less than or
equal to the specified value.
The empirical distribution function is an estimate of the cumulative
distribution function that generated the points in the sample.
Observations:
This graph shows:
50% of sales have been less than 100;
the highest value is 250;
and almost 90% of sales have been less than 150.
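The eCDF definition above can be computed directly with NumPy. This is a sketch on made-up sales values (not the report's data), showing the 1/n step function and how to read off the fraction of observations at or below a threshold.

```python
import numpy as np

# Made-up sales values for illustration only.
sales = np.array([45, 80, 95, 110, 130, 150, 210, 250])

x = np.sort(sales)
y = np.arange(1, len(x) + 1) / len(x)  # step function jumping by 1/n at each point

# Fraction of observations <= 150 (read off the eCDF at x = 150).
frac_le_150 = y[np.searchsorted(x, 150, side="right") - 1]
```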
g) Decomposition Plot :
Additive Model:
In an additive decomposition, the series is expressed as the sum of its
trend, seasonal and residual components:
Y(t) = T(t) + S(t) + R(t)
This model is appropriate when the magnitude of the seasonal fluctuations
stays roughly constant regardless of the level of the series.
Observations:
a) It shows the Rose Sales & Trend peaks at 1981.
b) After 1981, Trend & Rose Sales are at normal rate.
c) Here Seasonal & Residue are present to the satisfactory level.
Multiplicative Model:
In a multiplicative decomposition, the seasonal pattern (and the variance)
grows as the level of the series increases. The trend, seasonal and
residual components are multiplied together:
Y(t) = S(t) x T(t) x R(t)
The model is non-linear; the trend can be a curved line and the seasonality
can have increasing or decreasing amplitude over time.
Observations:
a) It shows the Rose Sales & Trend peaks at 1981.
b) After 1981, Trend & Rose Sales are at normal rate.
c) Here Seasonal & Residue are present to a satisfactory level. Since the
residuals are in a lower range, this model is selected for further
analysis.
Functions & Methods & Plots Used:
a) isnull().sum() helps to count missing values in each column by
default, and in each row with axis=1.
b) describe().T is used to view some basic statistical details like
percentile, mean, std, etc. of a data frame or a series of numeric
values.
c) Box Plot - A box plot, also known as a whisker plot, displays a
summary of a set of data values through the minimum, first quartile,
median, third quartile and maximum. A box is drawn from the first
quartile to the third quartile, with a vertical line through the box at
the median. Here the x-axis denotes the data being plotted while the
y-axis shows the frequency distribution. (Reference:
[Link])
d) Heatmap - A heatmap is a graphical representation of data that uses
colours to visualize the values of a matrix. Brighter (typically
reddish) colours represent more common values or higher activity, and
darker colours represent less common values or lower activity. A
heatmap is also known as a shading matrix. (Reference:
[Link])
3. Split the data into training and test. The test data should start in 1991.
Solution:
a) Plot for the Test and Training data:
Based on the plot we can see that the segregated test data starts from
January of 1991 and continues till the end.
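The date-based split can be sketched as below, on a toy monthly index standing in for the Rose series: the test set starts in January 1991, everything earlier is training data.

```python
import pandas as pd

# Toy monthly frame standing in for the Rose dataset (illustration only).
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
df = pd.DataFrame({"Rose Sales": range(len(idx))}, index=idx)

# Everything before 1991 trains the model; 1991 onward is held out for testing.
train = df.loc[:"1990-12-31"]
test = df.loc["1991-01-01":]
```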
b) Description for the Test and Training data:
4. Build all the exponential smoothing models on the training data and evaluate
them using RMSE on the test data. Other models such as regression,
naïve forecast and simple average models should also be built on the
training data, and their performance checked on the test data using RMSE.
• Model 1: Linear Regression
• Model 2: Naive Approach
• Model 3: Simple Average
• Model 4: Moving Average(MA)
• Model 5: Simple Exponential Smoothing
• Model 6: Double Exponential Smoothing (Holt's Model)
• Model 7: Triple Exponential Smoothing (Holt - Winter's Model)
Solution:
a) Model 1: Linear Regression:
Observations:
The model's predictions are shown by the green line, while the test results are shown by the
orange values. It is evident that the real values are substantially different from the expected
ones.
The RMSE measure was used to assess the model. The RMSE for this model is shown
below.
b) Model 2: Naïve Approach:
Observations:
The model's predictions are shown by the green line, while the test results
are shown by the orange values. It is evident that the real values are
substantially different from the expected ones.
The RMSE measure was used to assess the model. The RMSE for this
model is shown below.
c) Model 3: Simple Average:
Observations:
a) While the orange numbers represent the actual test results, the
green line represents the model's predictions. It is obvious that the
anticipated numbers are wildly different from the actual values.
The RMSE metric was employed to evaluate the model. The
model's RMSE is displayed below.
d) Model 4: Moving Average (MA):
Observations:
The RMSE measure was used to evaluate the model. The RMSE for this
model is shown below.
We developed a number of moving average models with rolling windows
ranging from 2 to 9. A rolling average is superior to a simple
average since it predicts using only the previous n values, where n is the
defined rolling window. This takes recent developments into account
and is generally more accurate. The higher the rolling window, the smoother
the curve, because more values are taken into account.
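The trailing moving-average forecast described above can be sketched in a couple of lines: the mean of the last n training points (n = rolling window) serves as the flat forecast. The values here are made up.

```python
import pandas as pd

# Made-up training values (illustration only).
train = pd.Series([100, 120, 90, 110, 130, 105])

# The last rolling-window mean is the flat forecast for every future step.
forecasts = {w: train.rolling(w).mean().iloc[-1] for w in (2, 4)}
# forecasts[2] averages the last 2 points; forecasts[4] the last 4.
```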
e) Model 5: Simple Exponential Smoothing:
Observations:
The RMSE measure was used to evaluate the model. The RMSE for this model is shown
below.
f) Model 6: Double Exponential Smoothing (Holt's Model):
Observations:
RMSE was used to evaluate the model. This model's RMSE is shown below.
g) Model 7: Triple Exponential Smoothing (Holt-Winter's Model):
Observations:
The green colour line in the above plot represents the output for the best alpha, beta, and
gamma values. The best model had both a multiplicative trend and seasonality.
So far, this is the most effective model.
The RMSE measure was used to evaluate the model. The RMSE for this model is shown
above.
5. Check for the stationarity of the data on which the model is being built on
using appropriate statistical tests and also mention the hypothesis for the
statistical test. If the data is found to be non-stationary, take appropriate steps
to make it stationary. Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.
Solution:
Check for stationarity of the whole Time Series data.
The Augmented Dickey-Fuller test is a unit root test that determines whether
a unit root is present and hence whether the series is non-stationary.
The hypothesis in a simple form for the ADF test is:
H0 : The Time Series has a unit root and is thus non-stationary.
H1 : The Time Series does not have a unit root and is thus stationary.
We would want the series to be stationary for building ARIMA models and thus we
would want the p-value of this test to be less than the α value.
We see that at the 5% significance level the Time Series is non-stationary.
6. Build an automated version of the ARIMA/SARIMA model in which the
parameters are selected using the lowest Akaike Information Criteria (AIC) on
the training data and evaluate this model on the test data using RMSE.
Solution:
a) Auto ARIMA Model:
We used a for loop to find the best values of p,d,q, where p represents the
order of the AR (Auto-Regressive) part of the model and q represents the
order of the MA (Moving Average) part of the model. d is the amount of
differencing required to make the series stationary. The for loop was given
p,q values in the range (0,4), but d was given a fixed value of 1 because we
had already found d to be 1 while checking for stationarity using the ADF
test.
Some model parameter combinations are shown below. For each of these
models, the Akaike information criterion (AIC) value was calculated, and
the model with the lowest AIC value was chosen.
b) Auto SARIMA Model:
A for loop identical to the auto ARIMA one was used with the values
provided below, resulting in the models shown below.
p = q = range(0, 4)
d = D = range(0, 2)
pdq = list(itertools.product(p, d, q))
model_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, D, q))]
For each of these models, the Akaike information criterion (AIC) value was
calculated, and the model with the lowest AIC value was chosen. Only the
top 5 models are shown here.
The summary report for the best SARIMA model, with values (3,1,1)
(3,0,2,12), is shown below.
We also plotted the residual diagnostics to check whether any additional
information could be extracted or whether all relevant information had already
been captured. The diagnostic plots for the best auto SARIMA model are shown below.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF
on the training data and evaluate this model on the test data using RMSE.
Solution:
a) Manual ARIMA Model:
Below are the PACF and ACF plots for the training data.
Based on the cut-off points in these plots, the values selected for the
manual ARIMA model are p=2, d=1, q=2. The summary of this manual ARIMA
model follows.
b) Manual SARIMA Model:
Looking at the ACF and PACF plots for the training data, we can clearly see significant
spikes at lags 12, 24, 36, 48, etc., indicating a seasonality of 12. The parameters used
for the manual SARIMA model are as below: SARIMAX(2, 1, 2)x(2, 1, 2, 12)
Below is the summary of the manual SARIMA model
The triple exponential smoothing model with alpha 0.1, beta 0.7, and
gamma 0.2 is clearly the best because it has the lowest RMSE score.
8. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.
Solution:
We can clearly see that the triple exponential smoothing model with alpha 0.1, beta 0.7
and gamma 0.2 is the best, as it has the lowest RMSE score.
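Building the comparison table amounts to collecting each model's test RMSE into a data frame and sorting it. The model names below follow the report, but the RMSE numbers are placeholders, not the report's actual scores.

```python
import pandas as pd

# Placeholder RMSE values for illustration; the real table uses the scores
# computed on the Rose test data.
results = pd.DataFrame(
    {"Test RMSE": [51.4, 79.7, 53.5, 14.3]},
    index=[
        "Linear Regression",
        "Naive Approach",
        "Simple Average",
        "Triple Exponential Smoothing (alpha=0.1, beta=0.7, gamma=0.2)",
    ],
)
results = results.sort_values("Test RMSE")  # best (lowest RMSE) model first
```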
9. Based on the model-building exercise, build the most optimum model(s) on
the complete data and predict 12 months into the future with appropriate
confidence intervals/bands.
Solution:
Based on the above comparison of all the various models that we built, we can
conclude that the triple exponential smoothing (Holt-Winters) model gives the
lowest RMSE, hence it is the most optimum model. The sales predictions made
by this model are plotted below along with the confidence intervals.
Predictions one year into the future are shown in orange, while the confidence
interval is shown in grey.
10. Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
Solution:
The review of wine sales data shows a definite negative trend for the company's
Rose wine variety, which has been dropping in popularity for more than a decade.
Seasonal fluctuations have a significant impact on wine sales, with sales increasing
during festival season and decreasing during peak winter months, such as January.
Because sales are low during this time of year, the company should consider
advertising campaigns to increase wine consumption for the rest of the year.
Campaigns during the lean season (April to June) may offer the best benefits for the
firm because sales are low at this time, and increasing them would improve the
wine's overall success in the market throughout the year.
Running advertisements during peak seasons (such as during festivals) may have
little impact on sales because they are already high at this time of year.
Advertising during the peak winter months (January) is not advised because people
are less inclined to purchase wine owing to climatic factors, and running
advertisements during this period may not change people's opinions.
The corporation should also investigate the reasons for the drop in popularity of the
Rose wine varietal, and if necessary, overhaul its production and marketing strategy
to reclaim market share.
THANK YOU