
Parichay: Maharaja Surajmal Institute Journal of Applied Research

Sales Prediction Model for Big Mart


Nikita Malik*, Karan Singh**

Abstract: Machine Learning is a category of algorithms that allows software applications to become more accurate in predicting outcomes without being explicitly programmed. The basic premise of machine learning is to build models and employ algorithms that can receive input data and use statistical analysis to predict an output, while updating outputs as new data becomes available. These models can be applied in different areas and trained to match the expectations of management so that accurate steps can be taken to achieve the organization's target. In this paper, the case of Big Mart, a one-stop-shopping-center, has been discussed to predict the sales of different types of items and for understanding the effects of different factors on the items' sales. Taking various aspects of a dataset collected for Big Mart, and the methodology followed for building a predictive model, results with high levels of accuracy are generated, and these observations can be employed to take decisions to improve sales.

Keywords: Machine Learning, Sales Prediction, Big Mart, Random Forest, Linear Regression
1. INTRODUCTION
In today's modern world, huge shopping centers such as big malls and marts are recording data related to the sales of items or products, with their various dependent or independent factors, as an important step towards predicting future demands and managing inventory. The dataset built with various dependent and independent variables is a composite form of item attributes, data gathered by means of customers, and also data related to inventory management in a data warehouse. The data is thereafter refined in order to get accurate predictions and to gather new as well as interesting results that shed new light on our knowledge with respect to the task's data. This can then further be used for forecasting future sales by means of employing machine learning algorithms such as random forests and the simple or multiple linear regression model.
1.1 Machine Learning

The data available is increasing day by day, and such a huge amount of unprocessed data needs to be analysed precisely, as it can give very informative and fine-grained results as per current standard requirements. It is not wrong to say that, with the evolution of Artificial Intelligence (AI) over the past two decades, Machine Learning (ML) has also evolved at a fast pace. ML is an important mainstay of the IT sector and, with that, a rather central, albeit usually hidden, part of our life [1]. As technology progresses, the analysis and understanding of data to give good results will also increase, as data is very useful in current contexts. In machine learning, one deals with both supervised and unsupervised types of tasks, and generally a classification type problem accounts as a resource for knowledge discovery. It generates resources and employs regression to make precise predictions about the future, the main emphasis being laid on making a system self-sufficient, able to do computations and analysis to generate accurate and precise results [2]. By using statistical and probabilistic tools, data can be converted into knowledge. Statistical inferencing uses sampling distributions as a conceptual key [11].

ML can appear in many guises. In this paper, firstly, various applications of ML and the types of data they deal with are discussed. Next, the problem statement addressed through this work is stated in a formalized way. This is followed by explaining the methodology ensued and the prediction results observed on implementation. Various machine learning algorithms include the following [3] (an illustrative code sketch appears after the list):

• Linear Regression: It can be termed as a parametric technique which is used to predict a continuous or dependent variable on the basis of a provided set of independent variables. This technique is said to be parametric as different assumptions are made on the basis of the data set.

• K-Nearest Neighbors (KNN): It is a learning algorithm which is based on instances and the knowledge gained through them [4]. Unlike mining in data stream scenarios, in cases where every sample can simultaneously belong to multiple classes, as in hierarchical multi-label classification problems, k-NN has been proposed to be applied to predict outputs in structured form [5].

• Decision tree: It is an intuitive model having low bias, and it can be adopted to build a classification tree with the root node being the first to be taken into account in a top-down manner. It is a classic model for machine learning [6].

• Naïve Bayes classifiers: These are based on Bayes' theorem and are a collection of classification algorithms where the classification of every pair is independent of each other. Bayesian learning can provide predictions with readable reasons by generating an if-then form of list of rules [8].

• Random Tree: It is an efficient algorithm for achieving scalability and is used in identification problems for building approximate systems. The decisions are taken considering the choices made on the basis of possible consequences, the variables which are included, and the input factors. Other algorithms can include SVM, xgboost, logistic regression and so on [7].

• K-means clustering: This algorithm is used in unsupervised learning for creating clusters of related data based on their closeness to the centroid value [9].

*Assistant Professor, MSI, Janakpuri, New Delhi; [email protected]


**Student, MSI, Janakpuri, New Delhi; [email protected]


1.2 Problem Statement

“To find out what role certain properties of an item play and how they affect their sales by understanding Big Mart sales.” In order to help Big Mart achieve this goal, a predictive model can be built to find out, for every store, the key factors that can increase sales and what changes could be made to the product or the store's characteristics.

2. METHODOLOGY

The steps followed in this work, right from the dataset preparation to obtaining the results, are represented in Fig.1.

Fig. 1. Steps followed for obtaining results

2.1 Dataset and its Preprocessing

BigMart's data scientists collected sales data of their 10 stores situated at different locations, with each store having 1559 different products, as per the 2013 data collection. Using all the observations, it is inferred what role certain properties of an item play and how they affect their sales. The dataset looks as shown in Fig.2 on using the head() function on the dataset variable.

Fig. 2. Screenshot of Dataset
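As an illustration only (the paper does not reproduce its loading code), the dataset can be read and previewed with Pandas roughly as follows; the file name Train.csv is a placeholder:

import pandas as pd

# Hypothetical file name; the paper does not state the actual path.
data = pd.read_csv("Train.csv")

# First few rows of the dataset, as in Fig. 2.
print(data.head())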


The data set consists of various data types, from integer to float to object, as shown in Fig.3.

Fig. 3. Various datatypes used in the Dataset

In the raw data, there can be various types of underlying patterns which also give an in-depth knowledge about the subject of interest and provide insights about the problem. But caution should be observed with respect to the data, as it may contain null values, redundant values, or various types of ambiguity, which also demands pre-processing of the data. The dataset should therefore be explored as much as possible.

Various factors important by statistical means, like the mean, standard deviation, median, count of values, maximum value, etc., are shown in Fig.4 for the numerical variables of our dataset.

Fig. 4. Numerical variables of the Dataset
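A minimal sketch of how summaries such as those in Fig. 3 and Fig. 4 might be produced with Pandas (assuming the DataFrame `data` from the loading sketch above):

# Column data types (integer, float, object), as in Fig. 3.
print(data.dtypes)

# Count, mean, standard deviation, percentiles, minimum and maximum
# for the numerical variables, as in Fig. 4.
print(data.describe())

# Number of null values per column, motivating the preprocessing below.
print(data.isnull().sum())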

Preprocessing of this dataset includes doing analysis on the independent variables, like checking for null values in each column and then replacing or filling them with supported appropriate data types, so that analysis and model fitting are not hindered on their way to accuracy. Shown above are some of the representations obtained by using Pandas tools, which tell about the variable count for numerical columns and the modal values for categorical columns. The maximum and minimum values in numerical columns, along with their percentile values for the median, play an important role in deciding which value is to be chosen at priority for further exploration tasks and analysis. The data types of the different columns are used further in label processing and the one-hot encoding scheme during model building (an illustrative sketch of this step follows).
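A hedged sketch of the kind of imputation and encoding described above; the exact rules used by the authors are not given, and the column names follow the Big Mart dataset referenced in the text:

# Fill numeric nulls with the column mean and categorical nulls with the modal value.
data["Item_Weight"] = data["Item_Weight"].fillna(data["Item_Weight"].mean())
data["Outlet_Size"] = data["Outlet_Size"].fillna(data["Outlet_Size"].mode()[0])

# One-hot encode categorical columns for model building (illustrative subset).
data_encoded = pd.get_dummies(data, columns=["Outlet_Size", "Outlet_Type"])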
2.2 Algorithms employed

Scikit-Learn can be used to track a machine-learning system on a wholesome basis [12]. The algorithms employed for predicting sales for this dataset are discussed as follows:

• Random Forest Algorithm

The random forest algorithm is a very accurate algorithm to be used for predicting sales. It is easy to use and understand for the purpose of predicting the results of machine learning tasks. In sales prediction, the random forest classifier is used because it has decision-tree-like hyperparameters, the tree model being the same as a decision tool. Fig.5 shows the relation between decision trees and random forest.


To solve regression tasks of prediction by virtue of random forest, the sklearn.ensemble library's random forest regressor class is used. A key role is played by the parameter termed n_estimators, which also comes under the random forest regressor. Random forest can be referred to as a meta-estimator used to fit numerous decision trees (based on classification) on different sub-samples of the dataset. min_samples_split is taken as the minimum number of samples required to split an internal node. A split's quality is measured using mse (mean squared error), which can also be termed the feature selection criterion and corresponds to a reduction in variance; mae (mean absolute error) is another criterion for feature selection. The maximum tree depth is given as an integer; if it is not set, nodes are expanded until all leaves are pure, or pruning for better model fitting is done for all leaves containing fewer than min_samples_split samples.
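A minimal sketch of the regressor described above, under the assumption that `data_encoded` is a fully numeric, encoded feature table containing the target column Item_Outlet_Sales (hyperparameter values are illustrative only):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative feature/target split; assumes the preprocessing sketches above.
X = data_encoded.drop(columns=["Item_Outlet_Sales"]).select_dtypes(include="number")
y = data_encoded["Item_Outlet_Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(
    n_estimators=100,           # number of trees; parameter named in the text
    min_samples_split=2,        # minimum samples needed to split an internal node
    max_depth=None,             # grow until leaves are pure or below min_samples_split
    criterion="squared_error",  # mse-based split quality, i.e. variance reduction
)
rf.fit(X_train, y_train)
sales_pred = rf.predict(X_test)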

Fig. 5. Relation between Decision Trees and Random Forest

• Linear Regression Algorithm

Regression can be termed as a parametric technique which is used to predict a continuous or dependent variable on the basis of a provided set of independent variables. This technique is said to be parametric as different assumptions are made on the basis of the data set.

Y = β0 + β1X + ε   (1)

The equation shown in eq.1 is used for simple linear regression. Its parameters can be described as:

Y - the variable to be predicted

X - the variable(s) used for making the prediction

β0 - the predicted value when X = 0, also referred to as the intercept term

β1 - the change in Y when X changes by 1 unit, also known as the slope term

ε - the difference between the predicted and actual values, which represents the residual

However efficiently the model is trained, tested and validated, there is always a difference between the actual and predicted values, which is the irreducible error; thus we cannot rely completely on the results predicted by the learning algorithm. Alternative methods given by Dietterich can be used for comparing learning algorithms [10].
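A sketch relating eq.1 to code; this is illustrative only, and a single predictor such as Item_MRP is assumed:

from sklearn.linear_model import LinearRegression

# Simple linear regression of sales on one illustrative predictor (eq. 1).
X_single = data[["Item_MRP"]]            # X
y = data["Item_Outlet_Sales"]            # Y
lin = LinearRegression().fit(X_single, y)

beta_0 = lin.intercept_                  # intercept term (value of Y when X = 0)
beta_1 = lin.coef_[0]                    # slope term (change in Y per unit change in X)
residuals = y - lin.predict(X_single)    # epsilon: actual minus predicted values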


2.3 Metrics for Data Modelling

• The coefficient of determination, R2 (R-squared), is a statistic that measures the goodness of a model's fit, i.e. how well the real data points are approximated by the predictions of the regression. Higher values of R2 suggest higher model accomplishment in terms of prediction accuracy, and a value of 1 for R2 indicates that the regression predictions perfectly fit the real data points. For further better results, the use of the adjusted R2 measure works well. Taking logarithmic values of the target column in the dataset proves to be significant in the prediction process, so it can be said that on taking adjustments of the columns used in prediction, better results can be deduced. One way of incorporating adjustment could also have included taking the square root of the column; it also provides better visualization of the dataset and target variable, as the square root of the target variable is inclined to be a normal distribution.

• Error measurement is an important metric in the estimation period. Root mean squared error (RMSE) and mean absolute error (MAE) are generally used for a continuous variable's accuracy measurement. The average model prediction error can be expressed in units of the variable of interest by using both MAE and RMSE. MAE is the average, over the test sample, of the absolute differences between prediction and actual observation, where all individual differences have equal weight. The square root of the average of squared differences between prediction and actual observation is termed RMSE. RMSE is an absolute measure of fit, whereas R2 is a relative measure of fit. RMSE helps in measuring the variable's average error and is also a quadratic scoring rule. Low RMSE values obtained for linear or multiple regression correspond to better model fitting (a sketch of computing these measures follows this list).
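A minimal sketch of computing these metrics with scikit-learn, assuming y_test and sales_pred from the random forest sketch given earlier:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_test, sales_pred)            # average absolute error
rmse = np.sqrt(mean_squared_error(y_test, sales_pred))   # root mean squared error
r2 = r2_score(y_test, sales_pred)                        # coefficient of determination
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")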
With respect to the results obtained in this work, it can be said that there is no big difference between our train and test samples, since the RMSE ratio is calculated to be equal to the ratio between the train and test samples. The results related to how accurately responses are predicted by our model can be inferred from RMSE, as it is a good measure along with measuring precision and other required capabilities. A considerable improvement could be made by further data exploration incorporating outlier detection and high leverage points. Another approach, which is conceptually easier, is to combine several sub-models which are low dimensional and easily verifiable by domain experts, i.e., ensemble learning can be exploited [9].

3. IMPLEMENTATION AND RESULTS

In this section, the programming language, libraries and implementation platform, along with the data modeling and the observations and results obtained from it, are discussed.

3.1 Implementation Platform and Language

Python is a general-purpose, interpreted, high-level language used extensively nowadays for solving domain problems instead of dealing with the complexities of a system. It is also termed the 'batteries included' language for programming. It has various libraries used for scientific purposes and inquiries, along with a number of third-party libraries for making problem solving efficient.

In this work, the Python libraries NumPy, for scientific computation, and Matplotlib, for 2D plotting, have been used. Along with these, the Pandas tool of Python has been employed for carrying out data analysis, and the random forest regressor is used to solve tasks through the random forest ensembling method. As a development platform, Jupyter Notebook has been used, which proves to work great due to its excellence in 'literate programming', where human-friendly text is punctuated with code blocks.

3.2 Data Modeling and Observations

Correlation is used to understand the relation between a target variable and its predictors. In this work, Item-Sales is the target variable and its correlation with other variables is observed.

Considering the case of Item-Weight, this feature is shown to have a low correlation with the target variable Item-Outlet-Sales in Fig.6.

Fig. 6. Correlation between target variable and Item-weight variable
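A sketch of how such a correlation and scatter plot might be produced; the column names are those of the Big Mart dataset as referenced in the text:

import matplotlib.pyplot as plt

# Correlation between item weight and the target variable.
print(data["Item_Weight"].corr(data["Item_Outlet_Sales"]))

# Scatter plot analogous to Fig. 6.
plt.scatter(data["Item_Weight"], data["Item_Outlet_Sales"], s=5)
plt.xlabel("Item_Weight")
plt.ylabel("Item_Outlet_Sales")
plt.show()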


As can be seen from Fig.7, there is no significant relation found between the year of store establishment and the sales for the items.
Values can also be combined into variables that classify them into periods and give meaningful results.

Fig. 7. Correlation between target variable and Outlet-establishment-year variable
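One hedged way of combining establishment-year values into a period-like variable, as suggested above (not necessarily the authors' transformation; the derived column name is illustrative):

# Derive an outlet-age variable from the establishment year,
# relative to the 2013 data collection mentioned earlier.
data["Outlet_Years"] = 2013 - data["Outlet_Establishment_Year"]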

The place where an item is positioned in a store, referred to through Item_visibility, definitely affects the sales. However, the plot chart and correlation table generated previously show that the trend goes in the opposite direction. One of the reasons might be that daily-use products do not need high visibility. However, there is an issue that some products have zero visibility, which is quite impossible. Fig.8 shows the correlation between the item visibility variable and the target variable.

Fig. 8. Correlation between target variable and Item-visibility variable


The frequency of each categorical or nominal variable plays a significant role in further analysis of the dataset, thus supporting the data exploration to be performed. As shown in Fig.9, the various variables in our dataset are listed with their data types and categories. Here, the ID column and the source column, denoting whether a record belongs to the test or train sample, are excluded and not used.

Fig. 9. Different item categories in the dataset
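A sketch of how such category frequencies might be listed with Pandas; the ID column name is illustrative, and the ID and source columns are left out as stated above:

# Frequency of each category for the nominal variables (cf. Fig. 9).
categorical_cols = [c for c in data.select_dtypes(include="object").columns
                    if c not in ("Item_Identifier", "source")]
for col in categorical_cols:
    print(data[col].value_counts())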

When a predictive model generated from any supervised learning regression method is applied to the dataset, the process is said to be data scoring; the model score obtained in this work is an instance of it. The probability of a product's sales rising or sinking can be discussed and understood on the basis of certain parameters. The vulnerabilities associated with a product or item, and further with its sales, are also necessary to consider and play a very important role in our problem-solving task. Further, a user authentication mechanism should be employed to avoid access by any unauthorized users, thus ensuring all results are protected and secured.


Fig. 10. Flowchart for division of dataset on various factors (having proper leaves after pruning)

In Fig.10, a flowchart is represented in which the dataset has been divided on the basis of various factors. In the last stage of the flowchart, the nodes labelled 'a' and 'b' represent string values for distinguishing the dataset items, and 'num' can be any arbitrary number. The dataset has been divided, and pruning has been performed, on the basis of different factors. Ensembling many such decision trees will generate a random forest model.
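A sketch of how one such tree inside the fitted forest might be inspected, assuming the rf model from the earlier random forest sketch:

from sklearn.tree import export_text

# Print the split rules of the first tree in the ensemble,
# similar in spirit to the flowchart of Fig. 10.
first_tree = rf.estimators_[0]
print(export_text(first_tree, feature_names=list(X_train.columns), max_depth=3))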

Fig. 11. Diagram showing correlation among different factors

From Fig.11, the correlation among the various dependent and independent variables is explored to be able to decide on the further steps that are to be taken. The variables used are obtained after data pre-processing, and the following are some of the important observations about some of the used variables:


• Item_visibility has nearly zero correlation with our dependent variable item_outlet_sales and with the grocery store outlet_type. This means that the sales are not affected by the visibility of an item, which is a contradiction to the general assumption of “more visibility, thus more sales”.

• Item_MRP (maximum retail price) is positively correlated with sales at an outlet, which indicates that the price quoted by an outlet plays an important role in sales.

• Outlets situated in a location of type tier 2 and of size medium also have high sales, which means that a one-stop-shopping-center situated in a town or city with a populated area can have high sales.

• Variation in the MRP quoted by various outlets depends on their individual sales.

Fig.12 summarizes the various observations obtained from the developed linear regression model. The method used is the least squares method and the model used is ordinary least squares (OLS).
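A hedged sketch of producing an OLS summary like the one in Fig.12, using the statsmodels library (which the paper does not name; the predictors here are the illustrative training features from earlier sketches):

import statsmodels.api as sm

# Ordinary least squares fit; the summary reports R-squared and the
# number of observations, as discussed for Fig. 12.
X_ols = sm.add_constant(X_train)        # adds the intercept term
ols_results = sm.OLS(y_train, X_ols).fit()
print(ols_results.summary())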


Fig. 12. Summary from linear regression model

It is observed that the R-squared value is 0.563 for our dependent variable, for the 8523 observations taken under consideration. This signifies how accurately the built regression model fits.

4. PREDICTION RESULTS AND CONCLUSION

The largest location did not produce the highest sales. The location that produced the highest sales was the OUT027 location, which was in turn a Supermarket Type3, having its size recorded as medium in our dataset. It can be said that this outlet's performance was much better than that of any other outlet location of any size provided in the considered dataset.

The median of the target variable Item_Outlet_Sales was calculated to be 3364.95 for the OUT027 location. The location with the second highest median score (OUT035) had a median value of 2109.25.

The adjusted R-squared and R-squared values are higher for the linear regression model than average. Therefore, the used model fits better and exhibits accuracy.

Also, the model accuracy and score of the regression model can reach nearly 61% if built with more hypothesis consideration and analysis, as shown by the code snippet in Fig.13.

Fig. 13. Code showing model score of random forest
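The snippet of Fig.13 is not reproduced here; a minimal sketch of how such a score might be computed for the random forest model from the earlier sketches is:

# R-squared score of the fitted random forest on held-out data; a value of
# about 0.61 would correspond to the "nearly 61%" reported above.
print(rf.score(X_test, y_test))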

It can be concluded that more locations should be switched or shifted to Supermarket Type3 to increase the sales of products at Big Mart. Any one-stop-shopping-center like Big Mart can benefit from this model by being able to predict its items' future sales at different locations.

5. CONCLUSION AND FUTURE SCOPE

In this paper, the basics of machine learning and the associated data processing and modeling algorithms have been described, followed by their application to the task of sales prediction for Big Mart shopping centers at different locations.


On implementation, the prediction results show the correlation among the different attributes considered and how a particular location of medium size recorded the highest sales, suggesting that other shopping locations should follow similar patterns for improved sales.

Multiple instance parameters and various factors can be used to make this sales prediction more innovative and successful. Accuracy, which plays a key role in prediction-based systems, can be significantly increased as the number of parameters used is increased. Also, a look into how the sub-models work can lead to an increase in the productivity of the system. The project can further be incorporated into a web-based application, or into any device supported with in-built intelligence by virtue of the Internet of Things (IoT), to be more feasible for use. Various stakeholders concerned with sales information can also provide more inputs to help in hypothesis generation, and more instances can be taken into consideration, such that more precise results that are closer to real-world situations are generated. When combined with effective data mining methods and properties, the traditional means could be seen to have a higher and more positive effect on the overall development of the corporation's tasks on the whole. One of the main highlights is more expressive regression outputs, which are more understandable and bounded with some degree of accuracy. Moreover, the flexibility of the proposed approach can be increased with variants at an appropriate stage of regression model building. There is a further need for experiments for proper measurements of both accuracy and resource efficiency, to assess and optimize correctly.

REFERENCES

[1] Smola, A., & Vishwanathan, S. V. N. (2008). Introduction to Machine Learning. Cambridge University, UK, 32, 34.
[2] Saltz, J. S., & Stanton, J. M. (2017). An Introduction to Data Science. Sage Publications.
[3] Shashua, A. (2009). Introduction to Machine Learning: Class Notes 67577. arXiv preprint arXiv:0904.3664.
[4] MacKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
[5] Daumé III, H. (2012). A Course in Machine Learning. ciml.info, 5, 69.
[6] Quinlan, J. R. (2014). C4.5: Programs for Machine Learning. Elsevier.
[7] Cerrada, M., & Aguilar, J. (2008). Reinforcement learning in system identification. In Reinforcement Learning. IntechOpen.
[8] Welling, M. (2011). A First Encounter with Machine Learning. Irvine, CA: University of California, 12.
[9] Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (Eds.) (1994). Machine Learning, Neural and Statistical Classification. 350.
[10] Mitchell, T. M. (1999). Machine learning and data mining. Communications of the ACM, 42(11), 30-36.
[11] Downey, A. B. (2011). Think Stats. O'Reilly Media, Inc.
[12] Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media.
