Vol 11, Issue 2, FEB/ 2020
ISSN NO: 0377-9254
BIGMART SALES USING MACHINE LEARNING WITH DATA ANALYSIS
1AYESHA SYED, 2ASHA JYOTHI KALLURI, 3VENKATESWARA REDDY POCHA, 4VENKATA ARUN KUMAR DASARI, 5B. RAMASUBBAIAH
1,2,3,4B.Tech Student, 5Assistant Professor
DEPARTMENT OF CSE
SVR ENGINEERING COLLEGE, NANDYAL
Abstract:
The sales forecast is based on BigMart sales data for various outlets, used to adjust the business model to expected outcomes. The resulting data can then be used to predict potential sales volumes for retailers such as BigMart through various machine learning methods. The proposed system's estimate takes into account the price tag, the outlet, and the outlet location. Various machine learning algorithms, such as linear regression, decision tree, and the gradient-based XGBoost regressor, are applied to produce an efficient prediction of BigMart sales. Finally, hyperparameter tuning is used to choose the relevant hyperparameters that make the algorithm shine and produce the highest accuracy.
Index Terms—Machine Learning Algorithms, Prediction, Reliability, Sales Forecasting, Prediction Model, Regression.
************************
I. INTRODUCTION
Every item in shopping centers and BigMart outlets is tracked in order to anticipate future customer demand and to improve inventory management. BigMart is an immense network of shops operating virtually all over the world. Trends in BigMart data are highly relevant, and data scientists evaluate them per product and per store in order to identify potential sales centres. Using machine learning to forecast BigMart's transactions helps data scientists test various patterns by store and product and reach correct results. Many companies rely heavily on such a knowledge base and need market patterns to be forecast. Each shopping center or store endeavors to offer individual, short-term deals to draw in more customers depending on the day, so that the sales volume of every item can be estimated for the organization's stock administration, logistics and transportation services, and so forth. To address the problem of predicting item sales based on customers' future demands in various BigMarts across different areas, diverse machine learning algorithms such as Linear Regression, Random Forest, Decision Tree, Ridge Regression and XGBoost are utilized for estimating sales volume. Predicted sales depend on the type of store, the population around the store and the city in which the store is located, i.e. whether it lies in an urban or a rural zone. Population statistics around the store also affect sales, and the capacity of the store and many other factors should be considered. Because every business faces strong demand, sales forecasts play a significant part in a retail center. A stronger prediction is always helpful in developing and enhancing corporate market strategies, which also helps to increase awareness of the market.
II. LITERATURE SURVEY
Sales forecasts provide insight into how a firm should manage its workforce, cash flow and resources. They are an important precondition for enterprise planning and decision-making, allowing businesses to formulate their plans effectively [1]. Learning algorithms and model categories such as Linear Regression, Ridge Regression, Random Forest, Decision Tree and XGBoost are suitable for sales forecasting. The technique of regression is used to forecast, to model time series, and to find cause-effect relationships between variables. A linear regression model assumes that the inputs $X_1, \dots, X_p$ enter linearly into the regression function $E(Y)$. Because the continuous variables are not normally distributed, the regression model is constructed with transformed variables; plotting the residuals against the variables makes this clear. From the model description, only the variables Item MRP, Outlet Identifier, Outlet Establishment Year, Outlet Size, Outlet Location Type and Outlet Type are relevant at a significance level of 5 percent [6]. Complex models like neural networks are overkill for simple regression problems, and simpler models combined with proper data cleaning perform well on this regression task [2]. Linear regression is a very popular method for prediction and analysis, but one drawback is its lower accuracy [5]. Using Random Forest, prediction of the sales is made easier, with care taken in fixing the optimum number of trees [6]. Random Forest is a tree-based algorithm in which a certain number of decision trees are combined to form a powerful prediction model. It was found that a general linear model using principal component analysis and random forest techniques produces better results, as judged by the RMSE values [6]. The Decision Tree technique comes under the paradigm of artificial intelligence: it creates a tree with the most significant feature at the root node and features of lesser ranking at the subsequent nodes [2]. Internally, the XGBoost model implements stepwise ridge regression, which dynamically selects features and excludes multicollinear features; this implementation yielded the best outcomes on the dataset [2].
III. EXPLORATORY DATA ANALYSIS
In every such dataset it is beneficial to append the test data to the train data, merging the two sets for data visualization and feature engineering. For the exploratory method, univariate and bivariate analyses are conducted to obtain information about the data. A few observations made during the univariate analysis are as follows. The categories 'LF', 'low fat' and 'Low Fat' are the same, and 'reg' and 'Regular' are the same category; as a result they can each be merged into one, and low-fat items turn out to be almost twice as numerous as regular items. The main sales in the Item Type column come from fruit and snacks. The target variable is skewed to the right. Some items are non-consumable, yet all items are labelled as either low fat or regular. Through the bivariate study, a clear relationship between product weight and sales, and between item fat content and sales, has been found. A significant amount of sales is obtained from products with visibility below 0.2, and individuals select the low-fat category over other groups. In the relationship between the item identifiers and the outlet size, items are purchased more frequently as the outlet size increases. As for item exposure, more visible items show fewer sales.
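As a hedged illustration of this merge-and-explore step, the following minimal pandas sketch could be used (the file names and column names such as Item_Fat_Content are assumptions based on the public BigMart 2013 dataset, not details given in the paper):

```python
import pandas as pd

# Load the BigMart 2013 train and test splits (file names assumed).
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Tag each row's source, then merge both sets for joint visualization
# and feature engineering, as described above.
train["source"], test["source"] = "train", "test"
data = pd.concat([train, test], ignore_index=True, sort=False)

# Univariate check: 'LF', 'low fat' and 'Low Fat' ('reg' and 'Regular')
# are the same categories spelled differently.
print(data["Item_Fat_Content"].value_counts())

# Bivariate check: average sales across binned item visibility.
print(data.groupby(pd.cut(data["Item_Visibility"], 5))["Item_Outlet_Sales"].mean())
```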
IV. DATA PREPROCESSING
The dataset used is the BigMart 2013 sales data, with a total of 12 attributes. Item Outlet Sales is the target variable, and the remaining attributes are independent variables. The pre-processing of data is a method for preparing and adapting raw data to a learning model, and it is the first and most significant step in constructing a machine learning model. Real-world data generally contain noise and missing values and may arrive in an unusable format, especially for machine learning models. Data pre-processing therefore needs to be performed in order to purify the data and adapt it to the machine learning model, which also makes the model more accurate and efficient. The first step is to collect the required dataset and then check for missing values once the dataset is imported. Correcting missing values is necessary, or else the data would be difficult to access and maintain. The mean of each column containing missing values is calculated, and the missing entries are substituted with that mean. Once the dataset is pre-processed, it is separated back into train and test sets. This dataset can then be used to train a machine learning algorithm to predict Item Outlet Sales for a variety of items, which will help retailers create personalized offers for specific products and customers.
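A minimal sketch of this mean-imputation step, continuing the snippet above (which columns actually contain gaps, Item_Weight and Outlet_Size here, is an assumption about the dataset):

```python
# Substitute missing numeric values with the column mean, as described.
data["Item_Weight"] = data["Item_Weight"].fillna(data["Item_Weight"].mean())

# A categorical column with gaps (e.g. Outlet_Size) cannot take a mean;
# the mode is a common stand-in, an assumption beyond the paper's text.
data["Outlet_Size"] = data["Outlet_Size"].fillna(data["Outlet_Size"].mode()[0])
```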
V. FEATURE ENGINEERING
Feature engineering is a method of exploiting domain knowledge of the data to construct features that work well with machine learning algorithms. When feature engineering is done correctly, the predictive capability of machine learning algorithms is enhanced by building raw-data features that facilitate the learning process. Feature engineering also includes the correction of inappropriate values. In this dataset, item visibility had a minimum value of 0, which is not acceptable because every item should be visible to customers; it was therefore replaced by the mean of the column. A new column, Outlet Years, is created so that we can consider how long a store has been running instead of the year it was established. Item Type, another column in the dataset, has 16 categories and is combined under the broader Food, Drink and Non-Consumable categories. The Item Fat Content column had various representations, which were consolidated into the low fat and regular categories. Outliers present in Item Outlet Sales are often excluded for better performance.
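These corrections could be sketched as follows, continuing the snippets above (the Item_Identifier prefix convention FD/DR/NC for Food/Drinks/Non-Consumable and the 2013 reference year are assumptions drawn from the public dataset, not stated in the paper):

```python
# Replace the impossible zero visibility with the column mean.
data["Item_Visibility"] = data["Item_Visibility"].replace(
    0, data["Item_Visibility"].mean())

# Years of operation instead of the raw establishment year.
data["Outlet_Years"] = 2013 - data["Outlet_Establishment_Year"]

# Collapse the 16 item types into three broad categories via the
# identifier prefix (assumed: FD=Food, DR=Drinks, NC=Non-Consumable).
data["Item_Category"] = data["Item_Identifier"].str[:2].map(
    {"FD": "Food", "DR": "Drinks", "NC": "Non-Consumable"})

# Unify the fat-content spellings into two categories.
data["Item_Fat_Content"] = data["Item_Fat_Content"].replace(
    {"LF": "Low Fat", "low fat": "Low Fat", "reg": "Regular"})
```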
VI. EVALUATION METRICS
Evaluation of the model is a vital part of creating an efficient machine learning model, so it is important to build a model and assess it in terms of metrics, iterating until good accuracy is achieved according to the values the metrics report. Evaluation metrics describe a model's results [3], and the ability to distinguish between model outcomes is their most important feature. Here, the Root Mean Squared Error (RMSE) metric is used for the evaluation process. RMSE is given by the following formula:

$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\mathrm{Predicted}_i - \mathrm{Actual}_i)^2}$

where $N$ is the total number of observations. RMSE is the most commonly used evaluation metric for regression problems. The square root keeps the metric on the scale of the target while still reflecting large errors strongly, and the squared term delivers more stable outcomes by preventing positive and negative error values from cancelling out.
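As a direct, hedged translation of this formula into code:

```python
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root Mean Squared Error over N observations."""
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Illustrative values only, not results from the paper.
print(rmse(np.array([100.0, 250.0, 80.0]), np.array([110.0, 240.0, 95.0])))
```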
VII. MODEL BUILDING
After performing data preprocessing and feature transformation, the dataset is ready to fit a model. The training set is fed into the algorithm so that it learns how to predict values [3]; after model building, the testing data is given as input with a target variable to predict. The models are built using:
• Linear Regression
• Ridge Regression
• Decision Tree
• Random Forest
• XGBoost
For all models based on the above algorithms, 20-fold cross-validation is used. Essentially, cross-validation provides an indication of how well a model generalizes to unseen data (a sketch follows below). A description of the different algorithms used follows.
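A minimal scikit-learn sketch of the 20-fold cross-validation just described, continuing the snippets above (the feature subset is an assumption for illustration, not the paper's exact feature list):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Recover the training rows and an assumed feature subset from the
# merged, engineered frame built in the snippets above.
train = data[data["source"] == "train"]
features = ["Item_MRP", "Item_Weight", "Item_Visibility", "Outlet_Years"]
X = train[features]
y = train["Item_Outlet_Sales"]

# 20-fold cross-validation, scored by RMSE (reported negated by sklearn).
scores = cross_val_score(LinearRegression(), X, y, cv=20,
                         scoring="neg_root_mean_squared_error")
print("Mean CV RMSE:", -scores.mean())
```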
A. Linear Regression
The most common and simplest statistical approach for predictive modeling is linear regression. The linear regression equation is

$Y = \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n$

where $X_1, X_2, \dots, X_n$ are the independent variables, $Y$ is the target variable, and the coefficients are the thetas. The magnitude of a coefficient, compared with those of the other variables, determines the importance of the corresponding independent variable. The basic principle of this algorithm is to fit a straight line between the chosen training-set features and the continuous target variable, i.e. sales; the algorithm chooses the line that best fits the data. Linear regression predicts a dependent variable value (y) based on given independent variables (x), assuming a linear relationship between input x and output y [9]. Some requirements must be fulfilled by the data for a successful linear regression model, one of them being the lack of multicollinearity, i.e. the independent variables should not correlate with each other. An RMSE of 1127 is accomplished by this algorithm.
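A minimal scikit-learn sketch of this model, continuing the X, y and rmse defined in the snippets above (the reported RMSE of 1127 is the paper's result; this sketch will not reproduce it exactly):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out a validation split to score the fitted line.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=42)
lin = LinearRegression().fit(X_tr, y_tr)
print("Validation RMSE:", rmse(y_val.to_numpy(), lin.predict(X_val)))
```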
B. Ridge Regression
Ridge regression is a method used when multicollinearity (highly correlated independent variables) affects the outcome. While the ordinary least squares (OLS) estimates remain unbiased under multicollinearity, their variances are large and they deviate from the true value. By applying a degree of bias to the regression estimates, ridge regression reduces the standard errors [2]. The linear regression loss function is augmented in ridge regression so as not only to minimize the sum of squared residuals but also to penalize the size of the parameter estimates. This algorithm achieves an RMSE of 1129.
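The same validation harness applies to ridge regression; the penalty strength alpha below is an illustrative choice, not a value from the paper:

```python
from sklearn.linear_model import Ridge

# alpha scales the penalty on the squared size of the coefficients,
# which is what biases the estimates and shrinks their variance.
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("Validation RMSE:", rmse(y_val.to_numpy(), ridge.predict(X_val)))
```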
C. Decision Tree
A decision tree is a classifier expressed as a recursive partition of the instance space. It is a powerful method of multi-variable analysis and a powerful technique for data mining, with applications in various fields. The approach represents the variables involved in achieving a given objective, the courses of action for reaching it, and the means of execution. Let the objective be denoted by $O$, let $C_i$ denote the courses of action to be followed, and let $M_{ij}$ denote the means of action corresponding to those courses; the courses can be weighted by $q_i$ $(i = 1, \dots, n)$ satisfying $\sum_{i=1}^{n} q_i = 1$ and $q_i > 0$ [1]. With this algorithm, an RMSE of 1058 is achieved.
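A corresponding scikit-learn sketch (the depth limit is an assumed regularization choice, not the paper's setting):

```python
from sklearn.tree import DecisionTreeRegressor

# max_depth bounds how far the instance space is recursively partitioned.
tree = DecisionTreeRegressor(max_depth=8, random_state=42).fit(X_tr, y_tr)
print("Validation RMSE:", rmse(y_val.to_numpy(), tree.predict(X_val)))
```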
D. Random Forest
Random Forest is a tree-based bootstrapping algorithm that combines a certain number of weak learners (decision trees) to construct a powerful prediction model. For each individual learner, a random sample of rows and a few randomly selected variables are used to build a decision tree model. The final prediction is a function of all the predictions made by the individual learners; in the event of a regression problem, the final prediction is the mean of all the individual predictions. With this algorithm, an RMSE of 1069 is reached.
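A sketch of the ensemble just described (the paper stresses choosing the optimum number of trees; 100 here is only illustrative):

```python
from sklearn.ensemble import RandomForestRegressor

# Each of the n_estimators trees sees a bootstrap sample of rows and a
# random subset of features; the final prediction is the mean of all trees.
forest = RandomForestRegressor(n_estimators=100,
                               random_state=42).fit(X_tr, y_tr)
print("Validation RMSE:", rmse(y_val.to_numpy(), forest.predict(X_val)))
```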
E. XGBoost
XGBoost stands for eXtreme Gradient Boosting. The implementation of the algorithm was engineered for efficiency in computing time and memory resources [9]. Boosting is a sequential process based on the ensemble principle: it incorporates a collection of weak learners and improves the accuracy of predictions. Model outputs are weighted at any moment t based on the results of the preceding moment t-1; correctly predicted results are given a lower weight, and wrongly predicted ones are weighted higher. With this algorithm, the XGBoost model also implements stepwise ridge regression internally, automatically choosing features and removing multicollinearity. An RMSE of 1052 is achieved with this algorithm.
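A sketch with the xgboost library's scikit-learn wrapper (hyperparameter values are illustrative defaults, not the paper's tuned settings):

```python
from xgboost import XGBRegressor

# Gradient boosting fits trees sequentially, each one correcting the
# weighted errors left by the model at the preceding step.
xgb = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=5,
                   random_state=42).fit(X_tr, y_tr)
print("Validation RMSE:", rmse(y_val.to_numpy(), xgb.predict(X_val)))
```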
VIII. HYPERPARAMETER TUNING
Hyperparameter tuning selects an optimal set of hyperparameters for the learning algorithm. A hyperparameter here is a parameter whose value is set before learning starts; hyperparameters are not model parameters and cannot be derived directly from the training data. Model parameters, by contrast, are fitted during training, for example when gradient descent is used to minimize the loss function. Whilst the model parameters specify how input data are translated into the desired output, the hyperparameters describe how the model itself is structured. The best way to think of hyperparameters is as the settings of an algorithm that can be adjusted to maximize performance. Models can have many hyperparameters, so finding the right combination can be treated as a search problem. While there are now many hyperparameter optimization and tuning algorithms, the simple strategies are (1) grid search and (2) random search. However, both grid search and random search are computationally expensive, taking from an hour to a day. Because of its quicker computation, the Bayesian optimization approach is therefore used for hyperparameter tuning.
Bayesian Optimization: In contrast to random or grid search, Bayesian methods keep track of previous evaluation outcomes, which they use to construct a probabilistic model mapping hyperparameters to the likelihood of a score on the objective function, $P(\text{score} \mid \text{hyperparameters})$. The simple idea is to spend a little more time choosing the next hyperparameters in order to make fewer calls to the objective function. The goal of Bayesian reasoning is to become less wrong with more data, by constantly updating the surrogate probability model after each objective-function evaluation. Bayesian model-based approaches can thus find better hyperparameters in less time, since they reason about the most promising region of the hyperparameter space based on previous experiments, as sketched below.
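A hedged sketch of such a Bayesian search using the hyperopt library's Tree-structured Parzen Estimator; the paper does not name its optimization library, and the search space and evaluation budget below are assumptions:

```python
from hyperopt import Trials, fmin, hp, tpe
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Candidate ranges for the hyperparameters being tuned.
space = {
    "max_depth": hp.choice("max_depth", [3, 4, 5, 6, 8]),
    "learning_rate": hp.uniform("learning_rate", 0.01, 0.3),
    "n_estimators": hp.choice("n_estimators", [100, 300, 500]),
}

def objective(params):
    # Cross-validated RMSE is the objective-function score that the
    # surrogate model P(score | hyperparameters) learns to predict.
    model = XGBRegressor(random_state=42, **params)
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

# Each trial is chosen using the outcomes of all previous trials.
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print("Best configuration (choice parameters reported as indices):", best)
```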
IX. RESULTS
To forecast BigMart's revenue, machine learning algorithms from simple to advanced have been implemented: Linear Regression, Ridge Regression, Decision Tree, Random Forest and XGBoost. Increased efficiency was observed with the XGBoost algorithm, which produced the lowest RMSE rating. As a result, additional hyperparameter tuning was conducted on XGBoost with the Bayesian optimization technique, chosen for its quick and fairly simple computation; this culminated in the lowest RMSE value of all and a model better matched to the underlying data. A submission file detailing the predicted Item Outlet Sales for each item, based on this model, is produced.

Fig. 1. RMSE Table. (The RMSE values reported in the text are: Linear Regression 1127, Ridge Regression 1129, Decision Tree 1058, Random Forest 1069, XGBoost 1052.)
Fig. 2. Hyper-parameter tuning.
Fig. 3. XGBoost.
X. CONCLUSION
Experts have shown that a smart sales forecasting program is required to manage the vast volumes of data held by business organizations. Business assessments depend on the speed and precision of the methods used to analyze that data. The machine learning methods presented in this research paper should provide an effective method for data shaping and decision-making, and new approaches that better identify consumer needs and formulate marketing plans can be implemented on top of them. The outcome of the machine learning algorithms will help to select the most suitable demand-prediction algorithm, with the aid of which BigMart can prepare its marketing campaigns.

REFERENCES
[1] S. Cheriyan, S. Ibrahim, S. Mohanan and S. Treesa, "Intelligent Sales Prediction Using Machine Learning Techniques," 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE), Southend, United Kingdom, 2018, pp. 53-58.
[2] C. M. Wu, P. Patil and S. Gunaseelan, "Comparison of Different Machine Learning Algorithms for Multiple Regression on Black Friday Sales Data," 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 2018, pp. 16-20.
[3] A. Krishna, A. V, A. Aich and C. Hegde, "Sales-forecasting of Retail Stores using Machine Learning Techniques," 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India, 2018, pp. 160-166.
[4] G. Nunnari and V. Nunnari, "Forecasting Monthly Sales Retail Time Series: A Case Study," 2017 IEEE 19th Conference on Business Informatics (CBI), Thessaloniki, 2017, pp. 1-6.
[5] H. Kadam, R. Shevade, P. Ketkar and S. Rajguru, "A Forecast for Big Mart Sales Based on Random Forests and Multiple Linear Regression," International Journal of Engineering Development and Research, vol. 6, no. 4, pp. 1-2, 2018.
[6] T. Alexander and D. Christopher, "An Ensemble Based Predictive Modeling in Forecasting Sales of Big Mart," International Journal of Scientific Research, vol. 5, no. 5, pp. 1-4, 2016. [Accessed 10 October 2019].
[7] G. Behera and N. Nain, "A Comparative Study of Big Mart Sales Prediction," pp. 1-13, 2019. [Accessed 10 October 2019].
[8] S. Beheshti-Kashi, H. Karimi, K. Thoben and M. Lütjen, "A survey on retail sales forecasting and prediction in fashion markets," Systems Science & Control Engineering, vol. 3, no. 1, pp. 154-161, 2014. Available: 10.1080/21642583.2014.999389 [Accessed 27 January 2020].
[9] A. Chandel, A. Dubey, S. Dhawale and M. Ghuge, "Sales Prediction System using Machine Learning," International Journal of Scientific Research and Engineering Development, vol. 2, no. 2, pp. 1-4, 2019. [Accessed 27 January 2020].
[10] B. Pavlyshenko, "Machine-Learning Models for Sales Time Series Forecasting," Data, vol. 4, no. 1, p. 15, 2019. Available: 10.3390/data4010015 [Accessed 27 January 2020].
[11] T. T. Joy, S. Rana, S. Gupta and S. Venkatesh, "Hyperparameter tuning for big data using Bayesian optimisation," 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, 2016, pp. 2574-2579.
[12] M. Wistuba, N. Schilling and L. Schmidt-Thieme, "Hyperparameter Optimization Machines," 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, 2016, pp. 41-50.
[13] M. Wistuba, N. Schilling and L. Schmidt-Thieme, "Learning hyperparameter optimization initializations," 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, 2015, pp. 1-10.
[14] K. Punam, R. Pamula and P. K. Jain, "A Two-Level Statistical Model for Big Mart Sales Prediction," 2018 International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, Uttar Pradesh, India, 2018, pp. 617-620.
[15] S. Yadav and S. Shukla, "Analysis of k-Fold Cross-Validation over Hold-Out Validation on Colossal Datasets for Quality Classification," 2016 IEEE 6th International Conference on Advanced Computing (IACC), Bhimavaram, 2016, pp. 78-83.
[16] S. V. Patel and V. N. Jokhakar, "A random forest based machine learning approach for mild steel defect diagnosis," 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Chennai, 2016, pp. 1-8.
[17] V. Shrivastava and P. Arya, "A study of various clustering algorithms on retail sales data," International Journal of Computing, Communications and Networking, vol. 1, no. 2, pp. 1-7, 2012. [Accessed 13 October 2019].