Behavioral Segmentation For Supermarket Customers Using Unsupervised Machine Learning Algorithms
Galo Paiva
Departamento de Ingeniería Industrial y Sistemas
Universidad de La Frontera
Temuco, Chile
[email protected]
Abstract
Clustering is a machine learning technique to analyze data and to discover groups that share some similarities or
closeness. It is useful for marketing segmentations because it allows classifying customers into groups based on
certain characteristics. In the literature, the most commonly studied segmentation types are geographic, demographic,
psychographic, behavioristic, volume, product-space, and benefit segmentation. This research is focused on
behavioristic segmentations for supermarket chains.
Behavior patterns are the core of the behavioristic segmentation. It considers customers’ attitude toward brands, the
knowledge of brands, purchasing habits and frequency. The segmentation related to loyalty is crucial to identify
loyal customers and to focus marketing strategies and tactics. The revenue depends greatly on that segment.
The research has been carried out in four stages: analysis, design, development, and discussion. During the analysis,
1,073 customer loyalty surveys are preprocessed and analyzed. Two algorithms are employed during the
investigation: Simple K-Means Algorithm (SKMA) and Expectation-Maximization Clustering (EMC).
While SKMA cluster sizes are 27%, 15%, 15%, 26%, and 17%, EMC cluster sizes are 2%, 29%, 24%, 32%, and
13%. In conclusion, both SKMA and EMC help segment supermarket customers based on their behavior.
However, behavioral segmentation requires a deeper analysis since the cluster boundaries are not evident.
Keywords
Behavioristic Market Segmentation, Customer Loyalty, Machine Learning, Clustering Analysis, Supermarket Chains
1. Introduction
Market segmentation is one of the most relevant parts of the marketing strategy. It was first introduced by W. Smith
in 1956 and has been analyzed in depth ever since. The objective of segmentation is to find a set of variables that
allow identifying homogeneous groups within a heterogeneous market, in order to help focus marketing strategies and
tactics. To accomplish this, a variety of models and techniques have been developed and used through the years, from
simple statistical models to algorithms based on Artificial Intelligence (AI) (Mckechnie, 2006). Nowadays, the
availability of new technologies makes it possible to access customer data and analyze them quickly. In the retail
industry, for example, point-of-sale (POS) systems already allow applying data mining techniques and AI models.
Customer segmentation can be carried out following different criteria. For instance, it can be demographic
(age, sex, income, occupation, social class, stage of life, Internet access and use), geographic
(country, region, city, rural, density), behavioral (frequency of purchase, loyalty, place of purchase, quantity
purchased), based on purchase occasion (routine, special, hours or days of purchase, fixed place or while traveling),
psychographic (lifestyle, personality, needs, values, attitudes, motivations), based on benefits (comfort, quality,
economy, ease, speed, etc.), or based on beliefs and attitudes (towards brands, products, purchase channels) (Tynan
and Drayton, 1987; Rayport, 2001; Wyner, 2002; Kotler and Keller, 2009; Villarreal, 2014).
As expected, different segmentation criteria give rise to different results or segments. The degree of difficulty of
completing the segmentation varies too. In particular, psychographic variables are the most complex, mainly because
they are related to the internal structure of individuals and are subjective in nature. Despite that, studies show that
psychographic segmentation can be an appropriate approach to guide marketing strategies (Barry and Weinstein, 2009;
Kaze and Skapars, 2011; Scheuffelen et al., 2019).
Selecting segmentation criteria is not simple, yet it is crucial. It depends on the purpose of the segmentation and on
data availability, among other factors. Data-based segmentation is a useful tool to understand customers; however,
most of the time the data are not easy to interpret (Dolnicar and Leisch, 2014; Venter et al., 2015). Once the
segmentation is completed, it is necessary to identify the segments of major interest and the way they will be
approached, which depends on the specific customers’ needs and the products the company offers (Wyner, 2002;
Wedel and Kamakura, 2002).
Knowing the behavior of consumers is crucial to create value and communicate it. Over the years, several models
that explain customers’ behavior have been created. They are based on a paradigm commonly referred to in the
literature as CAB (cognition, affect, behavior). Howard and Sheth (1969) proposed that brand recognition influences
attitude and purchase intention.
This investigation makes use of an extensive survey based on the behavioral CAB paradigm to analyze supermarket
customers by means of the following constructs: purchasing objectives (Baltas, 1997; Putrevu and Lord, 2001);
supermarket image (Semeijn et al., 2004; Collins et al., 2002; Grewal et al., 1998); brand loyalty (Garretson et al.,
2002; Harcar and Kucukemiroglu, 2006; Ailawadi, 2001); shopping experience (Ailawadi, 2001); convenience of
the commercial relationship (Flavián et al., 2001); brand satisfaction (Burton et al., 1998); private label perception
(Collins-Dodd and Lindley, 2003; Garretson, 2002; Burton et al., 1998; Dick et al., 1995).
1.1 Objective
To carry out a customer behavioristic segmentation for supermarket chains by means of applying unsupervised
machine learning techniques and clustering algorithms.
2. Literature Review
Several criteria can be applied to define segments. Some of the most studied segmentation types in the literature are:
• Behavioral : brand loyalty, buyer journey stage, price sensitivity, purchasing style, etc.
• Benefit : customer service, quality, etc.
• Demographic : age, education level, gender, income, family members, status, religion, etc.
• Geographic : country, city, district, etc.
• Psychographic : hobbies, interests, lifestyle, etc.
3. Methods
This investigation is carried out following a 4-stage model: analysis, design, construction, and discussion (Figure 5).
3.1 Analysis
In the first stage, a complete review is carried out of the data gathered during an extensive survey about the
purchasing habits and preferences of supermarket customers. The survey took place in Temuco, a city in the south of
Chile with a population of approximately 221,000 inhabitants, and it considered five supermarket chains, each of them
targeting a different demographic market segment. These chains are quite distinguishable. For instance, one of them
is focused on convenience stores, while another offers exclusive products at higher prices.
The original questionnaire was based on previous well-documented surveys created by renowned authors. It has 69
questions grouped in several domains such as product quality, product availability and variety, value/price
proposition, discount and loyalty campaigns, customer service, facility organization, etc. For the purposes of the
present work, only 25 questions grouped in 5 domains are considered, all of which were answered using a scale from
1 to 7 (Table 1).
Since the chains considered in the original survey target different demographic segments, this research focuses on
only one of them. The selected chain is in the middle of the price range, has stores in different districts of the
city, and has a well-established private label with a variety of products.
3.2 Design
The original unlabeled dataset, a matrix of 1,073 rows (instances) by 25 columns, is prepared for clustering with
the SKMA and EMC algorithms. During the experiments, different numbers of clusters are
tested and compared. Survey domains and their corresponding questions are presented below (Table 2 and Table 3).
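As an illustration of how the 25 columns can be split by domain before clustering, the following minimal sketch uses the domain-to-question mapping listed in the captions of Tables 7 to 11. The dictionary layout, the function name domain_matrix, and the two example respondents are assumptions made for illustration; they are not the authors' data structures or survey data.

```python
# Domain-to-question mapping as listed in the captions of Tables 7 to 11.
DOMAINS = {
    "D.1": ["Q.3", "Q.4", "Q.7", "Q.53"],
    "D.2": ["Q.5", "Q.6", "Q.16", "Q.48", "Q.50", "Q.52"],
    "D.3": ["Q.17", "Q.18", "Q.19", "Q.20", "Q.22"],
    "D.4": ["Q.21", "Q.23", "Q.43", "Q.47", "Q.49"],
    "D.5": ["Q.41", "Q.42", "Q.44", "Q.45", "Q.46"],
}

# Flat list of all question ids used in this work (one dataset column each).
ALL_QUESTIONS = [q for questions in DOMAINS.values() for q in questions]


def domain_matrix(responses, domain):
    """Return one row per respondent holding only the answers of one domain.

    `responses` is assumed to be a list of dicts keyed by question id
    (a hypothetical layout chosen for this sketch).
    """
    columns = DOMAINS[domain]
    return [[row[q] for q in columns] for row in responses]


# Two made-up respondents (illustrative values only, not survey data).
example_responses = [
    {q: 4 for q in ALL_QUESTIONS},
    {q: 6 for q in ALL_QUESTIONS},
]
d1 = domain_matrix(example_responses, "D.1")
print(len(ALL_QUESTIONS))  # 25
print(d1)                  # [[4, 4, 4, 4], [6, 6, 6, 6]]
```

The five domains together account for exactly the 25 questions considered, so each domain-level clustering works on a sub-matrix of the same 1,073 rows.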
3.3 Construction
As mentioned above, the objective of this investigation is to carry out the behavioral segmentation of the customers
of a specific supermarket chain by applying clustering algorithms. However, behavioral segmentation differs
significantly from other types. For instance, in demographic segmentation it is evident whether customers fall within
a certain age range. In this case, instead, a deeper analysis is required.
In particular, the survey data used in this work are organized in discrete values in the range 0 to 7. The summary of
the answers to the questions is presented below (Table 4).
3.4 Discussion
Even though more algorithms exist, the scope of this work is restricted to two of the most common ones: SKMA and
EMC. EMC fits a Gaussian mixture model, and SKMA can be viewed as a limiting case of that family in which every
point is assigned entirely to a single cluster.
The first algorithm studied, SKMA, requires the definition of K centroids and iterates until a certain degree of
convergence to a local minimum is achieved. The second, EMC, is meant to address some of the weaknesses of SKMA.
Because of the nature of behavioral clustering, the interest is set on the number of clusters and their distribution
rather than on the accuracy of the classification.
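The difference between the two algorithms can be sketched in one dimension. The code below is a simplified illustration, not the implementation used in the experiments: it assumes a single numeric score per customer, made-up answers on the 1 to 7 scale, hand-picked initial centers, and a variance floor (min_var) that is an arbitrary choice. K-means assigns each value to exactly one centroid, while expectation-maximization keeps a soft membership (responsibility) for every component.

```python
import math


def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm in one dimension with hard assignments."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            updated.append(sum(members) / len(members) if members else centroids[k])
        if updated == centroids:  # no centroid moved: a local minimum was reached
            break
        centroids = updated
    return centroids, labels


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(values, means, n_iter=50, min_var=1e-3):
    """EM for a 1-D Gaussian mixture with soft assignments."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: soft membership of every value in every component.
        resp = []
        for v in values:
            dens = [weights[j] * gaussian_pdf(v, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from the memberships.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsing component
    return weights, means, variances, resp


scores = [1, 2, 2, 3, 2, 1, 3, 5, 6, 6, 7, 6, 5, 7]  # made-up answers, 1-7 scale
centroids, labels = kmeans_1d(scores, centroids=[1.0, 7.0])
weights, means, variances, resp = em_1d(scores, means=[1.0, 7.0])
print(centroids, labels)
print([round(m, 2) for m in means], [round(r[0], 3) for r in resp])
```

Both procedures depend on the starting centers and only reach a local optimum, so in practice they are usually restarted from several initializations.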
4. Data Collection
Finding the optimal number of clusters for a given dataset requires the application of optimization algorithms.
However, this approach has drawbacks, especially when the resulting number of clusters is too high to be applied in
practice. The following tables present the distribution of the classified data (instances) when the clustering
algorithms are forced to generate 1, 2, 3, 4, and 5 clusters (Table 4 and Table 5).
Table 7. Clustering for domain D.1 (Q.3, Q.4, Q.7, and Q.53)
Cluster   SKMA %   SKMA instances   EMC %   EMC instances
C.1       31       328              51      549
C.2       11       121              1       8
C.3       40       430              6       60
C.4       9        99               2       18
C.5       9        95               41      438
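The percentage columns follow directly from the instance counts. As a quick check, the short sketch below recomputes them for domain D.1 from the counts in Table 7; the function name share_percentages is chosen here for illustration only.

```python
def share_percentages(counts):
    """Convert cluster instance counts to whole-number percentages."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Instance counts for domain D.1 as reported in Table 7.
skma_counts = [328, 121, 430, 99, 95]
emc_counts = [549, 8, 60, 18, 438]

print(sum(skma_counts), share_percentages(skma_counts))  # 1073 [31, 11, 40, 9, 9]
print(sum(emc_counts), share_percentages(emc_counts))    # 1073 [51, 1, 6, 2, 41]
```

Both columns add up to the 1,073 instances of the dataset. The rounded EMC percentages sum to 101 because each value is rounded independently.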
Table 8. Clustering for domain D.2 (Q.5, Q.6, Q.16, Q.48, Q.50, and Q.52)
Cluster   SKMA %   SKMA instances   EMC %   EMC instances
C.1       35       374              2       22
C.2       17       179              34      360
C.3       9        101              35      376
C.4       15       156              18      196
Table 9. Clustering for domain D.3 (Q.17, Q.18, Q.19, Q.20, and Q.22)
Cluster   SKMA %   SKMA instances   EMC %   EMC instances
C.1       26       282              2       26
C.2       19       202              32      345
C.3       18       188              9       101
C.4       8        82               36      382
C.5       30       319              20      219
Table 10. Clustering for domain D.4 (Q.21, Q.23, Q.43, Q.47, and Q.49)
Cluster   SKMA %   SKMA instances   EMC %   EMC instances
C.1       30       325              31      334
C.2       21       227              30      322
C.3       20       214              4       41
C.4       17       186              22      235
C.5       11       121              13      141
Table 11. Clustering for domain D.5 (Q.41, Q.42, Q.44, Q.45 and Q.46)
Cluster   SKMA %   SKMA instances   EMC %   EMC instances
C.1       34       365              3       28
C.2       25       266              15      159
C.3       14       150              33      355
C.4       13       141              12      129
C.5       14       151              37      402
6. Conclusion
Behavioral segmentation, unlike other types, is not so evident and usually requires a deeper analysis to
establish the difference between segments. This investigation takes advantage of an extensive supermarket customer
survey to outline a segmentation based on the well-known SKMA and EMC clustering algorithms.
Clustering algorithms are a special type of machine learning algorithm used to classify unlabeled data by
grouping data points that are similar or close to each other.
Both SKMA and EMC are iterative optimization methods. Depending on the circumstances and needs, it might
be possible to find the optimal number of clusters or to define a given number that is more practical. Experiments
revealed that the optimal number of clusters for the dataset is 16. However, only between 2 and 4 clusters
concentrate more than 10% of the data points each. A collection of small clusters or market segments might
excessively complicate the design of an effective marketing campaign. Instead, a fixed number of clusters, from K=2
to K=5, was analyzed. In the case of five clusters, while SKMA cluster sizes are 27%, 15%, 15%, 26%, and 17%,
EMC cluster sizes are 2%, 29%, 24%, 32%, and 13%.
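The 10% criterion can be expressed as a simple filter over the cluster shares. The sketch below applies it to the five-cluster sizes reported above; the function name and the default threshold parameter are illustrative assumptions.

```python
def substantial_clusters(shares, threshold=10.0):
    """Return the 1-based indices of clusters whose share (in %) exceeds the threshold."""
    return [i for i, share in enumerate(shares, start=1) if share > threshold]


# Five-cluster shares (in %) as reported in the text.
skma_shares = [27, 15, 15, 26, 17]
emc_shares = [2, 29, 24, 32, 13]

print(substantial_clusters(skma_shares))  # [1, 2, 3, 4, 5]
print(substantial_clusters(emc_shares))   # [2, 3, 4, 5]
```

With five clusters, every SKMA cluster exceeds the threshold, whereas EMC leaves one cluster with only 2% of the instances.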
An additional analysis was carried out to determine whether the nature of the survey’s domain has an influence on
the cluster definition. The difference in the sizes of the resulting clusters confirms that the segmentation depends, up
to a certain point, on the criteria being applied.
In conclusion, both clustering algorithms, SKMA and EMC, can help segment supermarket customers based on
their behavior. However, behavioral segmentation requires a deeper analysis since the cluster borders usually are not
evident.
References
Ailawadi, K., The retail power-performance conundrum, Journal of Retailing, vol. 77, no. 3, pp. 299-318, 2001.
Baltas, G., Determinants of store brand choice: a behavioral analysis, Journal of Product & Brand Management, vol.
6, no. 5, pp. 315-324, 1997.
Barry, J., and Weinstein, A., Business psychographics revisited: From segmentation theory to successful marketing
practice, Journal of Marketing Management, vol. 25, no. 3-4, pp. 315–340, 2009.
Burton, S., Lichtenstein, D., Netemeyer, R., and Garretson, J., A scale for measuring attitude toward private label
products and an examination of its psychological and behavioral correlates, Journal of the Academy of
Marketing Science, vol. 26, no. 4, pp. 293-306, 1998.
Collins-Dodd, C., and Lindley, T., Store brands and retail differentiation: the influence of store image and store
brand attitude on store own brand perceptions, Journal of Retailing and Consumer Services, vol. 10, no. 6,
pp. 345-352, 2003.
Dick, A., Jain, A., and Richardson, P., Correlates of store brand proneness: Some empirical observations, Journal of
Product & Brand Management, vol. 4, no. 4, pp. 15-22, 1995.
Dolnicar, S., and Leisch, F., Using graphical statistics to better understand market segmentation solutions,
International Journal of Market Research, vol. 56, no. 2, pp. 207-230, 2014.
Flavián, C., Martínez, E., and Polo, Y., Loyalty to grocery stores in the Spanish market of the 1990s, Journal of
Retailing and Consumer Services, vol. 8, no. 8, pp. 85-93, 2001.
Garretson, J. A., Fisher, D., and Burton, S., Antecedents of private label attitude and national brand promotion
attitude: similarities and differences, Journal of Retailing, vol. 78, no. 1, pp. 91-99, 2002.
Garriga, J., Palmer, J., Oltra, A., and Bartumeus, F., Expectation-Maximization Binary Clustering for Behavioural
Annotation, PLoS ONE, vol. 11, no. 3, 2016.
Grewal, D., Krishnan, R., Baker, J., and Borin, N., The effect of store name, brand name and price discounts on
consumers’ evaluations and purchase intentions, Journal of Retailing, vol. 74, no. 3, pp. 331-352, 1998.
Harcar, T., Kara, A., and Kucukemiroglu, O., Consumer’s perceived value and buying behavior of store brands: An
empirical investigation, The Business Review, Cambridge, vol. 5, no. 2, pp. 55-62, 2006.
Jung, Y., Kang, M., and Heo, M., Clustering performance comparison using K-means and expectation maximization
algorithms, Biotechnology & Biotechnological Equipment, vol. 28, pp. 44-48, 2014.
Kaze, V., and Skapars, R., Paradigm shift in consumer segmentation to gain competitive advantages in post-crisis
FMCG markets: Lifestyle or social values? Economics and Management, vol. 16, no. 1956, pp. 1266-1274,
2011.
Kotler, P. and Keller, K., Marketing Management, 13th edition, Pearson Prentice-Hall, 2009.
Mckechnie, S., Integrating intelligent systems into marketing to support market segmentation decisions, Intelligent
Systems in Accounting, Finance and Management, vol. 14, no. 3, pp.117-127, 2006.
Putrevu, S., and Lord, K., Search dimension, patterns segment profiles of grocery shoppers. Journal of Retailing and
Consumer Service, vol. 8, no. 3, pp. 127-137, 2001.
Scheuffelen, S., Kemper, J., and Brettel, M., How do human attitudes and values predict online marketing
responsiveness? Comparing consumer segmentation bases toward brand purchase and marketing response,
Journal of Advertising Research, vol. 59, no. 2, pp. 142-157, 2019.
Semeijn, J., van Riel, A., and Ambrosini, A., Consumer evaluations of store brands: effects of store image and
product attributes, Journal of Retailing and Consumer Services, vol. 11, no. 11, pp. 247-258, 2004.
Sigurdsson, V., and Kahamseh, S., An econometric examination of the behavioral perspective model in the
context of Norwegian retailing, The Psychological Record, vol. 63, no. 1, pp. 277-294, 2013.
Smith, W., Product differentiation and market segmentation as alternative marketing strategies, Journal of
Marketing, vol. 21, pp. 3-8, 1956.
Tynan, A., and Drayton, J., Market segmentation, Journal of Marketing Management, vol. 2, no. 3, pp. 301-335,
1987.
Venter, P., Wright, A., and Dibb, S., Performing market segmentation: a performative perspective. Journal of
Marketing Management, vol. 31, pp. 62-83, 2015.
Villarreal, R., Improving multicultural marketing: between-group and within-group segmentation approaches,
Journal of Brand Strategy, vol. 3, no. 3, pp. 278-289, 2014.
Wedel, M., and Kamakura, W., Introduction to the special issue on market segmentation. International Journal of
Research in Marketing, vol. 19, pp. 181-183, 2002.
Witten, I., Frank, E., Hall, M., and Pal, C., Data Mining: Practical Machine Learning Tools and Techniques, 4th
Edition, Morgan Kaufmann, Cambridge, 2017.
Yoseph, F., Malim, N., Heikkilä, M., Brezulianu, A., Geman, O., and Rostam, N., The impact of big data market
segmentation using data mining and clustering techniques. Journal of Intelligent and Fuzzy Systems, vol.
38, no. 5, pp. 6159-6173, 2020.
Biographies
Carlos Hernández is an industrial engineer, consultant, and university professor. He earned a Licentiate Degree in
Engineering from Universidad de La Frontera, Temuco, Chile, Master of Sciences in Computational Engineering
and Doctor of Engineering from Technische Universität Braunschweig, Brunswick, Germany. He is the author of
several scientific and engineering articles. He has taught lectures in Discrete Event Simulation, Supply Chain
Management, Engineering Economics, Corporate Finances, Financial Engineering, Business Analytics, Data Mining
and Machine Learning for engineering students. He has developed a professional career working for large
multinational companies (PricewaterhouseCoopers, BHP Billiton, and Merck Sharp & Dohme). He also worked as a
scientific researcher in the Institut für Produktionsmesstechnik at TU Braunschweig, Germany. His research
interests include manufacturing process simulation, transportation systems simulation, supply chain design and
simulation, and machine learning for finances. He is a member of IEOM.
Galo Paiva is an industrial engineer, consultant, and university professor. He earned a Licentiate Degree in
Engineering from Universidad de Santiago de Chile, Chile, and Doctor of Business Management from Universidad
Autónoma de Madrid, Spain. He has taught lectures in Strategic Management, Operations Management, Industrial
Engineering, and Project Planning & Management. His research interests include manufacturing process simulation,
industrial design, business management, and entrepreneurship.