Big Mart Sales Forecasting

This document summarizes a conference paper presented at the Fifth International Conference on Intelligent Computing and Control Systems about using machine learning algorithms to do predictive analysis for sales data from Big Mart supermarkets. Specifically, it discusses developing a predictive model using techniques like Xgboost, linear regression, polynomial regression, and ridge regression. The model was found to outperform existing models for forecasting business sales like for Big Mart.

Proceedings of the Fifth International Conference on Intelligent Computing and Control Systems (ICICCS 2021)

IEEE Xplore Part Number: CFP21K74-ART; ISBN: 978-0-7381-1327-2

Predictive Analysis for Big Mart Sales Using Machine Learning Algorithms

Ranjitha P1
Department of Computer Science,
Amrita School of Arts and Sciences, Mysuru
Amrita Vishwa Vidyapeetham, India
Email: [email protected]
2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) | 978-1-6654-1272-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICICCS51141.2021.9432109

Spandana M2
Department of Computer Science,
Amrita School of Arts and Sciences, Mysuru
Amrita Vishwa Vidyapeetham, India
Email:Spandanasatishm@gmail.

Abstract— Currently, supermarket run-centres such as Big Marts keep track of each individual item's sales data in order to anticipate potential consumer demand and update inventory management. Anomalies and general trends are often discovered by mining the data store of the data warehouse. For retailers like Big Mart, the resulting data can be used to forecast future sales volume using various machine learning techniques. A predictive model was developed using XGBoost, linear regression, polynomial regression and ridge regression techniques for forecasting the sales of a business such as Big Mart, and it was found that the model outperforms existing models.

Keywords—Linear Regression, Polynomial Regression, Ridge Regression, XGBoost Regression

I. INTRODUCTION

Competition among shopping centres and big marts grows more intense every day, driven by the rapid growth of global malls and online shopping. Each market seeks to offer personalized and limited-time deals to attract more customers in a given period, so that each item's volume of sales can be estimated for the organization's stock control, transportation and logistical services. Current machine learning algorithms are highly advanced and provide methods for predicting or forecasting sales for any kind of organization. This is extremely beneficial for overcoming the low-cost methods otherwise used for prediction. Better prediction is always helpful, both in developing and in improving marketing strategies for the marketplace.

II. RELATED WORK

A great deal of work has been done to date in the area of sales forecasting. A brief review of the relevant work in the field of Big Mart sales is given in this section. Many statistical approaches, such as regression, Auto-Regressive Integrated Moving Average (ARIMA) and Auto-Regressive Moving Average (ARMA), have been used to develop sales forecasting models. However, sales forecasting is a complex problem influenced by both external and internal factors, and there are two major drawbacks to the statistical approach, as set out by A. S. Weigend et al. A hybrid approach combining seasonal quantile regression with ARIMA for daily food sales forecasting was proposed by N. S. Arunraj and D. Ahrens, who also found that the performance of the individual models was relatively lower than that of the hybrid model.

E. Hadavandi et al. used the integration of “Genetic Fuzzy Systems (GFS)” and data clustering to forecast the sales of printed circuit boards. In their paper, K-means clustering produced K clusters of all data records. All clusters were then fed into independent systems with database tuning and rule-base extraction ability. Notable work in the field of sales forecasting was done by P. A. Castillo et al.: sales forecasting of newly published books was carried out in an editorial business management setting using computational methods. “Artificial neural networks” are also used in the area of sales forecasting. Fuzzy neural networks have been developed with the aim of improving predictive performance, and the “Radial Basis Function Neural Network (RBFN)” is expected to have great potential for forecasting sales.

Dataset: The dataset was collected from the internet, from the website kaggle.com. It comprises a test dataset and a train dataset: the test dataset has 5000 records and the train dataset has 8000 records. Fig. 1 shows a sample of the train data and Fig. 2 shows a sample of the test dataset.

978-0-7381-1327-2/21/$31.00 ©2021 IEEE 1416

Authorized licensed use limited to: East Carolina University. Downloaded on June 30,2021 at 07:39:02 UTC from IEEE Xplore. Restrictions apply.

TABLE 1: Attributes Information
Attribute                    Description
Item_Identifier              The unique product ID number.
Item_Weight                  The product's weight.
Item_Fat_Content             Whether the item is low in fat or not.
Item_Visibility              The percentage of the overall display area assigned to the particular item, out of all items in the shop.
Item_Type                    The group to which the commodity belongs.
Item_MRP                     The product's list price (maximum retail price).
Outlet_Identifier            A distinct outlet number.
Outlet_Establishment_Year    The year that the shop first opened its doors.
Outlet_Size                  The total area occupied by the supermarket.
Outlet_Location              The kind of town where the store is situated.
Outlet_Type                  Whether the shop is a supermarket or a grocery store.
Item_Outlet_Sales            The item's sales in the particular shop.

Fig. 1: Sample of the train dataset

Fig. 2: Sample of the test dataset


III. METHODOLOGY
Fig. 3 shows the architecture diagram of the proposed model, which focuses on applying the different algorithms to the dataset. For each algorithm we calculate the accuracy, MAE, MSE and RMSE, and finally conclude which algorithm yields the best result. The following algorithms are used.
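
As a minimal sketch of how the three error measures named above relate to one another, the following pure-Python helper computes MAE, MSE and RMSE from lists of actual and predicted values. The function name `regression_metrics` and the sample numbers are illustrative assumptions and are not taken from the paper's experiments.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and RMSE for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # root of the MSE
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}


# Hypothetical sales values, for illustration only.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because RMSE is the square root of MSE, the two always rank models in the same order; MAE weights large errors less heavily and may rank them differently.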

Fig. 3: Proposed architecture diagram (BigMart dataset; preprocessing; regression and classification models: linear regression, polynomial regression, ridge, XGBoost; predicted result; comparison by performance computation: accuracy, RMSE, MAE, MSE)

A. Linear Regression

1. Build a scatter plot and look for 1) a linear or non-linear pattern in the data and 2) deviations from the pattern (outliers). Consider a transformation if the pattern is not linear. If there are outliers, consider eliminating them only if there is a non-statistical justification.
2. Fit the least-squares line to the data and check the model assumptions using the residual plot (for the constant standard deviation assumption) and the normal probability plot (for the normality assumption). A transformation might be necessary if the assumptions do not appear to be met.
3. If required, transform the data and construct a least-squares regression line using the transformed data.
4. If a transformation has been carried out, return to step 1. If not, continue to step 5.
5. Once a "good-fit" model has been identified, write the equation of the least-squares regression line. Include the standard errors of the estimates and the R-squared value.

Linear regression formulas look like this:

$$Y = \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

R-Square: Defines how much of the total variance in Y (the dependent variable) is explained by the variance in X (the independent variable). This can be expressed mathematically as

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
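
The fitting step and the R-squared measure can be sketched for the single-feature case as follows. This is an illustration under stated assumptions: it fits one feature plus an intercept term (the formula above has no intercept), the helper names `fit_line` and `r_squared` are hypothetical, the data are made up, and R-squared is computed with the common definition 1 - SS_res / SS_tot.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Share of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up data that is roughly linear, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
residuals = [y - p for y, p in zip(ys, preds)]  # inspect these as in step 2
r2 = r_squared(ys, preds)
print(slope, intercept, r2)
```

Plotting `residuals` against `xs` corresponds to the residual-plot check: a visible curve or funnel shape would suggest that a transformation is needed before refitting.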


B. Polynomial Regression Algorithm

Polynomial regression is a regression algorithm that models the relationship between the dependent variable (y) and the independent variable (x) as an nth-degree polynomial. The equation for polynomial regression is given below:

$$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$$

• It is often referred to as a special case of multiple linear regression in ML, since we add polynomial terms to the multiple linear regression equation to turn it into polynomial regression, an adjustment made to improve accuracy.
• The dataset used for training in polynomial regression is non-linear in nature.
• It uses a linear regression model to fit complex, non-linear functions and datasets.

C. Ridge Regression

Ridge regression is a model tuning method used to analyse any data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity issues arise, the least-squares estimates are unbiased but their variances are high, resulting in the predicted values being far removed from the actual values.

The cost function for ridge regression:

$$\min_{\theta} \left( \lVert Y - X\theta \rVert^2 + \lambda \lVert \theta \rVert^2 \right)$$

D. XGBoost Regression

“Extreme Gradient Boosting” is similar to the gradient boosting framework but much more efficient. It has both a linear model solver and a tree algorithm, which allows "xgboost" to run at least several times faster than existing gradient boosting implementations. It supports various objective functions, including regression, classification and ranking. As "xgboost" is very high in predictive power but relatively slow to implement, it is well suited to many competitions. It also has additional functionality for cross-validation and for finding important variables.

IV. RESULT AND DISCUSSION

Linear Regression
TABLE 2: Linear regression results for the various parameters
Parameter    Value
MSE          7.4631
MAE          1.166
RMSE         2.731

Polynomial regression
TABLE 3: Polynomial regression results for the various parameters
Parameter    Value
MSE          6.120
MAE          2.968
RMSE         7.823

Ridge regression
TABLE 4: Ridge regression results for the various parameters
Parameter    Value
MSE          3.671
MAE          8.289
RMSE         1.916

XGBoost Regression
TABLE 5: XGBoost regression results for the various parameters
Parameter    Value
MSE          0.001
MAE          0.029
RMSE         0.032

Frequency of item_fat_content
TABLE 6: Frequency of item fat content for the XGBoost regression
Parameter    Value
Low Fat      5089
Regular      2889
LF           316
reg          117
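
Table 6 lists both "Low Fat" and "LF", and both "Regular" and "reg". The paper does not say how these labels were treated. Assuming they are variant spellings of the same two categories, a preprocessing step might merge them as sketched below; the mapping `CANONICAL` and the helper `merge_labels` are hypothetical names introduced only for this illustration.

```python
from collections import Counter

# Frequencies as reported in Table 6.
raw_counts = Counter({"Low Fat": 5089, "Regular": 2889, "LF": 316, "reg": 117})

# Assumption: "LF" and "reg" are alternative spellings of the two main labels.
CANONICAL = {"LF": "Low Fat", "reg": "Regular"}


def merge_labels(counts, mapping):
    """Add the counts of variant labels to their canonical label."""
    merged = Counter()
    for label, n in counts.items():
        merged[mapping.get(label, label)] += n
    return merged


merged_counts = merge_labels(raw_counts, CANONICAL)
print(merged_counts)
```

Merging before encoding the column would keep a model from treating one category as two unrelated values.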


TABLE 7: Comparison of MAE, MSE and RMSE across the models
Model                    MSE       MAE      RMSE
Linear Regression        7.4631    1.166    2.731
Polynomial Regression    2.0364    7.002    1.427
Ridge Regression         3.6712    8.289    1.916
Xgboost Regression       0.001     0.029    0.0321

V. CONCLUSION

In this work, we examine the effectiveness of various algorithms on sales data and identify the best-performing algorithm. We propose software that uses a regression approach to predict sales based on past sales data. With this method, the accuracy of linear regression prediction can be improved upon, and the accuracy of polynomial regression, ridge regression and XGBoost regression can be determined. We conclude that ridge and XGBoost regression give better predictions with respect to accuracy, MAE and RMSE than the linear and polynomial regression approaches. Forecasting sales and building a sales plan can help to avoid unforeseen cash-flow problems and to manage production, staffing and financing needs more effectively. In future work we can also consider the ARIMA model, which shows the time series graph.

REFERENCES
[1] Ching Wu Chu and Guoqiang Peter Zhang, “A comparative study of linear and nonlinear models for aggregate retails sales forecasting”, Int. Journal Production Economics, vol. 86, pp. 217-231, 2003.
[2] Wang, Haoxiang. "Sustainable development and management in consumer electronics using soft computation." Journal of Soft Computing Paradigm (JSCP) 1, no. 01 (2019): 56.-
[3] Suma, V., and Shavige Malleshwara Hills. "Data Mining based Prediction of Demand in Indian Market for Refurbished Electronics." Journal of Soft Computing Paradigm (JSCP) 2, no. 02 (2020): 101-110.
[4] Giuseppe Nunnari, Valeria Nunnari, “Forecasting Monthly Sales Retail Time Series: A Case Study”, Proc. of IEEE Conf. on Business Informatics (CBI), July 2017.
[5] https://halobi.com/blog/sales-forecasting-five-uses/. [Accessed: Oct. 3, 2018]
[6] Zone-Ching Lin, Wen-Jang Wu, “Multiple Linear Regression Analysis of the Overlay Accuracy Model Zone”, IEEE Trans. on Semiconductor Manufacturing, vol. 12, no. 2, pp. 229-237, May 1999.
[7] O. Ajao Isaac, A. Abdullahi Adedeji, I. Raji Ismail, “Polynomial Regression Model of Making Cost Prediction In Mixed Cost Analysis”, Int. Journal on Mathematical Theory and Modeling, vol. 2, no. 2, pp. 14-23, 2012.
[8] C. Saunders, A. Gammerman and V. Vovk, “Ridge Regression Learning Algorithm in Dual Variables”, Proc. of Int. Conf. on Machine Learning, pp. 515-521, July 1998.
[9] Huan Xu, Constantine Caramanis and Shie Mannor, “Robust Regression and Lasso”, IEEE Transactions on Information Theory, vol. 56, no. 7, July 2010, p. 3561.
[10] Xinqing Shu, Pan Wang, “An Improved Adaboost Algorithm based on Uncertain Functions”, Proc. of Int. Conf. on Industrial Informatics – Computing Technology, Intelligent Technology, Industrial Information Integration, Dec. 2015.
[11] A. S. Weigend and N. A. Gershenfeld, “Time series prediction: Forecasting the future and understanding the past”, Addison-Wesley, 1994.
[12] N. S. Arunraj, D. Ahrens, A hybrid seasonal autoregressive integrated moving average and quantile regression for daily food sales forecasting, Int. J. Production Economics 170 (2015) 321-335.
[13] D. Fantazzini, Z. Toktamysova, Forecasting German car sales using Google data and multivariate models, Int. J. Production Economics 170 (2015) 97-135.
[14] X. Yua, Z. Qi, Y. Zhao, Support Vector Regression for Newspaper/Magazine Sales Forecasting, Procedia Computer Science 17 (2013) 1055–1062.
[15] E. Hadavandi, H. Shavandi, A. Ghanbari, An improved sales forecasting approach by the integration of genetic fuzzy systems and data clustering: a Case study of the printed circuit board, Expert Systems with Applications 38 (2011) 9392–9399.
[16] P. A. Castillo, A. Mora, H. Faris, J.J. Merelo, P. GarciaSanchez, A.J. Fernandez-Ares, P. De las Cuevas, M.I. Garcia-Arenas, Applying computational intelligence methods for predicting the sales of newly published books in a real editorial business management environment, Knowledge-Based Systems 115 (2017) 133-151.
[17] R. Majhi, G. Panda and G. Sahoo, “Development and performance evaluation of FLANN based model for forecasting of stock markets”, Expert Systems with Applications, vol. 36, issue 3, part 2, pp. 6800-6808, April 2009.
[18] Pei Chann Chang and Yen-Wen Wang, “Fuzzy Delphi and back propagation model for sales forecasting in PCB industry”, Expert Systems with Applications, vol. 30, pp. 715-726, 2006.
[19] R. J. Kuo, Tung Lai HU and Zhen Yao Chen, “Application of radial basis function neural networks for sales forecasting”, Proc. of Int. Asian Conference on Informatics in Control, Automation, and Robotics, pp. 325-328, 2009.
[20] R. Majhi, G. Panda, G. Sahoo, and A. Panda, “On the development of Improved Adaptive Models for Efficient Prediction of Stock Indices using Clonal-PSO (CPSO) and PSO Techniques”, International Journal of Business Forecasting and Market Intelligence, vol. 1, no. 1, pp. 50-67, 2008.
[21] Suresh K and Praveen O, "Extracting of Patterns Using Mining Methods Over Damped Window," 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2020, pp. 235-241, DOI: 10.1109/ICIRCA48905.2020.9182893.
[22] Shobha Rani, N., Kavyashree, S., & Harshitha, R. (2020). Object Detection in Natural Scene Images Using Thresholding Techniques. Proceedings of the International Conference on Intelligent Computing and Control Systems, ICICCS 2020, Iciccs, 509–515.
[23] https://www.kaggle.com/brijbhushannanda1979/bigmartsales-data. [Accessed: Jun. 28, 2018].
