Data Mining Definitions and Applications For The Management of
Procedia CIRP 81 (2019) 874–879
www.elsevier.com/locate/procedia
* Corresponding author. Tel.: +49 89 28915576; fax: +49 89 28915576. E-mail address: [email protected]
Abstract
Production complexity has increased considerably in recent years due to growing customer requirements for individual products. At the same
time, continuous digitization has led to the recording of extensive, granular production data. Research claims that applying data mining methods
to production data can help to manage production complexity effectively. However, most manufacturing companies do not yet use such
methods. To support manufacturing companies in utilizing data mining, this paper presents both a literature review on definitions of data
mining, artificial intelligence and machine learning and a categorization of existing approaches that apply data mining to manage
production complexity.
© 2019 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the 52nd CIRP Conference on Manufacturing Systems.
Keywords: data mining; machine learning; artificial intelligence; production complexity
This is a resupply of March 2023 as the template used in the publication of the original article contained errors. The content of the article has remained unaffected.
Günther Schuh et al. / Procedia CIRP 81 (2019) 874–879 875
The goals of this paper are therefore (1) to create a common understanding for the terms DM, ML, AI and
statistics from an application point of view (i.e. production), and (2) to support production managers in
identifying relevant use-cases for managing production complexity through DM.
The remainder of the paper is structured as follows. In section 2, the terms AI, ML and DM are historically
analyzed, defined in an application-oriented manner and subsequently separated from one another. Section 3
lays out a framework for managing production complexity by presenting and classifying existing applications
of DM methods in manufacturing companies. Section 4 summarizes the results and gives an outlook on future
research.
2. Definition of AI, ML and DM
2.1. Historical developments of the terms DM, ML and AI
The term AI refers to the eponymous field of science, which emerged under the influence of computer science,
mathematics, neuroscience and other scientific disciplines and was shaped by several phases of strong
research activity and economic interest (cf. Fig. 1). From this, three basic objectives of these individual
phases can be identified in retrospect [7, 8].
Objective 1: Develop a toolbox for imitating human thinking and actions
Gödel, Church and Turing, among others, laid the foundations of computer science and logic for computer
technology in the 1930s. Programmable computers became available and, subsequently, the idea of automating
human thinking and behavior arose [9]. In 1950, Turing described a theoretical concept, which later became
known as the Turing Test, defining the branches and tools that would later be subsumed under the term AI
[10, 11]. In the following years, after the goal of the so-called symbolic AI had been set, researchers
were concerned with the hypothesis that intelligence on a human level could be achieved by modelling a
sufficient amount of knowledge in the form of logical connections and automated reasoning by computers [12].
These expert systems exposed the limitations of the symbolic AI approach, which could not live up to
expectations [8, 9].
Objective 2: Develop tools for solving specific problems
The pitfalls of symbolic AI stated above led to a paradigm shift. Instead of modelling explicit knowledge,
in 1987 Rumelhart and McClelland shifted the focus to the assumption that a computer can learn rules by
observing connections in data, which moved especially the subject area of ML into focus [13, 14]. This
insight enabled a movement summarized under the term connectionism. By linking many simple computing units
in the form of neural networks, a flexible yet robust architecture is created, countering the symbolic AI
approach [12]. In addition to neural networks, other ML methods such as kernel methods (e.g. support vector
machines), hierarchical and ensemble learning methods (e.g. decision trees) also gained acceptance [15].
In 2006, more extensive neural networks were introduced and deemed particularly useful for central problems
of AI, especially with regard to vision and language. This field is better known as deep learning [4, 15].
Objective 3: Develop tools for identifying and explaining patterns in data
Since the late 1990s, an inherent need for tools interpreting the vast and exponentially growing amounts of
data stored in databases has emerged [9, 16]. Thus, the field of DM has developed from the environment of AI
and under the influence of statistics, employing ML methods and statistical data analysis with the aim of
addressing this need [9, 17]. In addition to gaining knowledge from data through DM, extensive end-to-end
concepts have gradually developed, starting with company and task analysis through data acquisition and DM
to the provision of software tools [16, 18].
2.2. Definitions
Based on the three objectives of the data science phases introduced above, the definitions of AI, ML and DM
are derived.
AI seeks to enable computational agents to act and think rationally and intelligently [8, 11, 19]. The
scientific goal of AI is to understand the principles of knowledge representation that enable intelligent
behavior. The engineering goal of AI is to create computational agents that can solve real world problems
at least as effectively and efficiently as humans [8, 11, 19]. The implementation of these premises takes
many different forms. Thus, AI can be seen as a toolbox whose subdomains deliver tools to create intelligent
computational agents [8].
ML is a subdomain of AI and seeks to enable computational agents to gain task-related knowledge and solve
task-specific
problems. ML methods aim to optimize a performance criterion. This criterion acts as an indicator of the
degree to which a given task is solved. ML methods enable computational agents to learn from (historical)
data. The degree of solution of a task is optimized by learning. ML offers tools to AI and is thus the basis
for further subdomains of AI. ML and statistics methods form the core of DM [8, 17, 20].
DM is another subdomain of AI and can be defined as a process that aims to generate knowledge from data and
presents findings comprehensively to the user. Generating knowledge in the context of DM can be translated
to the discovery of new and non-trivial patterns, relations and trends in data that are useful to the user.
DM as a process involves, in essence, the collection and selection of data, the pre-processing of data, data
analysis itself including the visualization of results, interpretation of findings, and the application of
knowledge. To pre-process and analyze data, ML and statistics methods are deployed in DM. Findings from DM
processes can be divided into descriptive ones, where knowledge is represented in the form of models that
depict patterns and relations in data, and predictive ones, where knowledge is represented in a prediction
of future conditions, trends and relations [16, 21, 22].
2.3. DM and its similarities and differences to AI, ML and statistics
While the term statistics is hardly ever used synonymously with the other three terms, it is important to
point out the scientific and methodical differences. For the sake of completeness, similarities and
differences between all four terms can be found in the matrix below (cf. Fig. 2).
As for the terms DM and AI, the difference lies in the relation. As DM is a subdomain of AI, it relates to
the goals of AI but specifies and implements them. DM does not supply new methods to AI but employs methods
that evolved in ML and statistics to extract knowledge from data [8].
The difference between ML and DM is that, while ML methods are used for both, they are used for different
purposes and thus with different requirements. In ML, the knowledge is stored implicitly and serves the
purpose of optimizing computational agents' performances. In DM, ML methods are employed so that knowledge
is gained from data and is then stored and visualized explicitly, making it accessible and interpretable to
the user [20].
Statistics supplies methods directly to DM. Statistics, as a subdomain of mathematics, is by definition a
formal science. DM does not require the same formality, even when employing statistics methods. This allows
DM to analyze data without hypotheses; DM is driven by results that are to be evaluated by experts, rather
than by precise reproducibility of the same. This less formal concept was introduced to the domain of
statistics as Exploratory Data Analysis (EDA) before DM evolved, and has influenced the approach of DM
towards statistics. The fewer formal requirements also enable DM to use statistical methods on data that
has not been specifically designed for analyses [23]. This feature becomes especially important in the
industrial context, where data integrity is still a big issue [5].
2.4. Insights and discussion
Throughout our research we found that the term DM had the least stringent definitions, whereas the
definitions for ML were
Fig. 2. Definitions, similarities and differences of the terms AI, ML, DM and statistics.
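The distinction drawn in Section 2.3 between implicitly stored knowledge (ML) and explicitly presented knowledge (DM) can be illustrated with a minimal sketch. It is not taken from any of the cited works. It assumes a hypothetical table of production orders (`setups`, `lot_size`, `late`) and a one-split decision rule, with the number of training misclassifications standing in for the performance criterion. All names and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """One-split decision rule: predicts True when the feature exceeds the threshold."""
    feature: str
    threshold: float
    errors: int  # training misclassifications, used here as the performance criterion

    def predict(self, row: dict) -> bool:
        # ML-style use: the learned knowledge stays inside the model and is only applied.
        return row[self.feature] > self.threshold

    def as_rule(self) -> str:
        # DM-style use: the same knowledge is written out so a user can read and judge it.
        return f"IF {self.feature} > {self.threshold:g} THEN late ELSE on time"


def fit_stump(rows: list[dict], target: str) -> Stump:
    """Pick the feature/threshold pair with the fewest training errors."""
    best = None
    features = [key for key in rows[0] if key != target]
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds lie halfway between neighbouring observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            errors = sum((row[feature] > threshold) != row[target] for row in rows)
            if best is None or errors < best.errors:
                best = Stump(feature, threshold, errors)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best


# Hypothetical historical orders (synthetic values, for illustration only).
orders = [
    {"setups": 1, "lot_size": 50, "late": False},
    {"setups": 2, "lot_size": 20, "late": False},
    {"setups": 2, "lot_size": 80, "late": False},
    {"setups": 4, "lot_size": 30, "late": True},
    {"setups": 5, "lot_size": 60, "late": True},
    {"setups": 3, "lot_size": 40, "late": True},
]

stump = fit_stump(orders, target="late")
print(stump.predict({"setups": 6, "lot_size": 10}))  # model applied to a new order
print(stump.as_rule())                               # knowledge made explicit
```

The same fitted object serves both purposes: `predict` only uses the learned threshold, whereas `as_rule` exposes it in a form that an expert could evaluate, which is the requirement the text attributes to DM.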
To find applications of DM suitable to support managing production complexity, we examined current
classifications of DM applications in production or manufacturing literature. None of the eight publications
found [3, 26-32] presented categories specific to production complexity. While explicit references to
production complexity were missing, some of the presented categories related to production planning and
control and decision support contained applications of interest to the topic of production complexity.
We then examined the applications in those categories and evaluated their contribution towards managing
external or internal complexity. Additionally, we conducted a keyword-based search for applications of DM
methods in managing production complexity outside the mentioned publications. Finally, we clustered the
suitable applications found in 42 publications into six categories separately for internal or external
production complexity. These categories have been developed at the interface of already existing DM
applications in production management and the assumption that managing production complexity requires
efficient support regarding product-based as well as process-based decisions in multi-variant value
streams [33].
Evaluating and preventing new variants: Applications in this category aim to assess the costs and benefits
of creating new variants to offer decision support.
Neis [34] presents an approach employing clustering methods to assess the costs of adopting new variants
based on reference products. Products are initially clustered into product families and reference products
are assigned based on the shortest distance to all other products within the cluster. Factoring in the
distance between a new variant and the reference product, a cost function is calculated [34].
Modularization and standardization: Applications in this category aim to reduce product variety by
identifying common parts and possible modules.
Agard and Kusiak [35] seek to identify subassemblies as modules using association analysis. Parts in
customer orders are analyzed for common appearance in orders. Item sets (combinations of parts) that exceed
the confidence levels are then examined regarding their feasibility as a module [35].
Instead of modularization, Romanowski and Nagi [36] propose an approach for standardization based on the
bill of material (BOM) using clustering to reduce product variety.
After products have been clustered into product families, the parts used for the products within a product
family are clustered based on textual DM. The emerging cluster dendrogram is examined by a product expert
for the possibility of standardized parts within a part cluster [36].
Process planning for new products/variants: In this category, applications aim to support planning processes
for products and variants based on the existing portfolio.
Hochdörffer et al. [37] suggest using clustering methods to determine products requiring similar
manufacturing technologies and similar capacity on these machines. The clusters can be used to determine
processes for new variants or to optimize production networks [37].
Similarly, Denkena et al. [38] suggest clustering products based on processes but extend the approach by
using k-nearest neighbor classification to classify new products/variants. Based on this classification,
processes from the nearest neighboring product/variant can be adopted or used as a planning base [38].
Wallis et al. [39] propose using clustering and classification methods as well, but deploy them differently.
Products are clustered first into part-based clusters and then, separately, into process-based clusters.
Using a naïve Bayes classifier, process-based clusters are mapped onto part-based clusters. This allows more
efficient assembly planning and exposes relations between variants and processes [39].
Choosing dispatching rules and planning sequencing: Applications in this category aim to support choosing
dispatching rules based on the existing conditions and seek to make sequencing more efficient.
Bohnen et al. [40] present an approach for production levelling using clustering methods. Products are
clustered based on their manufacturing requirements. Time blocks for production are then dedicated to the
product families, which are sequenced based on the needed set-up change. This allows set-up costs and time
to be minimized efficiently [40].
Seeking to gain information about dispatching rules and factors influencing lead times, Koonce and Tsai [41]
propose employing decision trees. Initially, using evolutionary algorithms, dispatching rules are compared
for a realistic scenario based on lead times. A decision tree is then used to learn factors of different
dispatching rules influencing lead time [41].
Following the same goal of gaining information about dispatching rules and influencing factors, Liu and
Dong [42] propose an approach similar to that of Koonce and Tsai but use artificial neural networks (ANN)
that analyze and determine lead times of different dispatching rules [42].
Predicting and optimizing lead and cycle times: Applications in this category seek to help understand and
assess factors that influence lead and cycle times.
Cheng et al. [43] propose an approach using decision trees to examine the influence of production staff on
lead times and predict lead times based on staff set-up. Additionally, manufacturing tasks can be assigned
based on individual performance. Correlations between staff set-up and lead times are analyzed via
monitored manufacturing steps [43].
A more general approach is presented by Gröger et al. [30], who suggest a combined approach of structured
query language (SQL) and DM to quickly identify influence factors. Influence factors on key performance
indicators (KPI), such as lead time, can be identified using data stored in structured databases and
accessed through SQL queries. The data can then be directly analyzed without data preparation using
classification methods that identify factors influencing the classifying KPI [30].
Backus et al. [44] propose using clustering methods and regression trees to predict cycle times for lots
based on historical data. Previous lots are clustered based on common bottlenecks in the production system.
Regression tree analysis is then used to determine factors influencing cycle time within common lots, thus
enabling prediction [44].
Value stream complexity: This category presents applications analyzing complexity from a process-oriented
value stream perspective. Process mining (PM) has evolved as a DM method for discovering, analyzing and
improving processes. PM extracts process models based on event data created during operations [45].
Rozinat et al. [46] applied PM to reduce the complexity of a wafer scanner testing process. Based on a
discovery analysis, feedback loops and idle times were identified [46]. Park et al. [47] use PM to analyze
a production process within the shipbuilding industry. When combining PM with data envelopment analysis
(DEA), a variety of block types could be analyzed and differences between planned and actual operations
were identified [47].
Within logistics we also identified PM applications across different industries (e.g. shipbuilding [48] and
automotive industry [49]). The PM methods (mainly discovery [48] and conformance checking [49]) were
combined with DM methods (e.g. clustering [48]) to improve processes. For example, Lee et al. [48] use PM
and clustering to discover process models, iterations and bottleneck activities [48]. In contrast, Knoll et
al. [49] address product and process complexity using multidimensional PM to identify waste.
PM supports the value stream perspective both in production and logistics. Therefore, PM should be seen as
a key DM technique for analyzing and reducing value stream complexity. Further research directions should
address the integration of established lean principles for PM.
4. Findings and conclusion
At the interface of rising production complexity due to shifting market demands and vast amounts of
production data, DM can be a valid tool to support managing complexity.
Most applications of DM in production management have so far been related to quality management. There are
very few applications of DM directly related to production complexity. However, applications of DM in other
fields of production management serve the purpose of managing production complexity very well. We have
presented some of these applications and plan to extend the categories in future work to present a holistic
framework of DM, as well as other ML and AI applications able to cover all relevant aspects of managing
production complexity.
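As a small illustration of one of the applications presented above, the association analysis summarized for Agard and Kusiak [35] can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' implementation: the orders are synthetic, only pairs of parts are considered, and the function name `pair_rules` as well as the thresholds `min_support` and `min_confidence` are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(orders, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for pairs of parts.

    support    = share of orders that contain both parts
    confidence = share of orders with the antecedent that also contain the consequent
    """
    n_orders = len(orders)
    part_count = Counter()
    pair_count = Counter()
    for order in orders:
        parts = sorted(set(order))  # ignore duplicates, fix the pair orientation
        part_count.update(parts)
        pair_count.update(combinations(parts, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n_orders
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / part_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Synthetic customer orders, each listing the parts it contains (illustrative only).
orders = [
    ["frame", "motor", "gearbox"],
    ["frame", "motor", "gearbox", "sensor"],
    ["frame", "motor", "gearbox"],
    ["frame", "sensor"],
    ["motor", "gearbox", "cover"],
]

rules = pair_rules(orders)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

With these synthetic orders, only the motor and gearbox pair passes both thresholds. In the spirit of the approach described above, such a pair would then be handed to a product expert to judge whether it is feasible as a module.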
In order to verify the outlined framework, we plan to evaluate the methods by employing real process data.
This is especially significant as many of the presented approaches have only been implemented using
synthetic data.
A holistic framework and its validation in practice are thus the logical next steps.
References
[1] ElMaraghy, H; Schuh, G; ElMaraghy, W; Piller, F; Schönsleben, P; Tseng, M; Bernard, A. Product variety management. CIRP Annals 2013;62:629-652.
[2] Tao, F; Qi, Q; Liu, A; Kusiak, A. Data-driven smart manufacturing. Journal of Manufacturing Systems 2018;48:157-169.
[3] Choudhary, A; Harding, J; Tiwari, M. Data Mining in manufacturing - A review based on the kind of knowledge. Journal of Intelligent Manufacturing 2009;20:501-521.
[4] Goodfellow, I; Bengio, Y; Courville, A. Deep learning. Cambridge, Massachusetts, London, England: MIT Press; 2016.
[5] Schuh, G; Reuter, C; Prote, J; Brambring, F; Ays, J. Increasing data integrity for improving decision making in production planning and control. CIRP Annals 2017;66:425-8.
[6] Knoll, D; Prüglmeier, M; Reinhart, G. Predicting Future Inbound Logistics Processes Using Machine Learning. Procedia CIRP 2016;52:145-150.
[7] Chollet, F. Deep learning with Python. Shelter Island, NY: Manning (Safari Tech Books Online); 2018.
[8] Russell, S; Norvig, P. Artificial intelligence. A modern approach. 3rd ed. Upper Saddle River, NJ: Prentice-Hall; 2010.
[9] Bibel, W; Ertel, W; Kruse, R. Grundkurs Künstliche Intelligenz. Eine praxisorientierte Einführung. 3rd ed. Wiesbaden: Springer; 2013.
[10] McCarthy, J; Minsky, M; Rochester, N; Shannon, CE. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI magazine 2006;27:12-4.
[11] Turing, AM. Computing Machinery and Intelligence. Mind 1950;59:433-460.
[12] Smolensky, P. Connectionist AI, symbolic AI, and the brain. Artificial Intelligence Review 1987;2:95-109.
[13] Shalev-Shwartz, S; Ben-David, S. Understanding machine learning. From theory to algorithms. Cambridge: Cambridge University Press; 2014.
[14] Rumelhart, D; McClelland, J. Parallel distributed processing. Explorations in the microstructure of cognition (Series: Computational models of cognition and perception). 3rd print. Cambridge, Mass.: MIT Press; 1987.
[15] LeCun, Y; Bengio, Y; Hinton, G. Deep learning. Nature 2015;521:436-444.
[16] Fayyad, U; Piatetsky-Shapiro, G; Smyth, P. From data mining to knowledge discovery in databases. AI magazine 1996;17:37-54.
[17] Alpaydin, E. Introduction to machine learning. Cambridge, Massachusetts, London, England: MIT Press; 2009.
[18] Wirth, R; Hipp, J. CRISP-DM: Towards a standard process model for data mining. Proceedings of the 4th International Conference on the Practical Application of Knowledge Discovery and Data Mining 2000:29-39.
[19] Winston, PH. Artificial Intelligence. Boston: Addison-Wesley; 1993.
[20] Mannila, H. Data mining: machine learning, statistics, and databases. Proceedings of 8th International Conference on Scientific and Statistical Data Base Management 1996;8:2-9.
[21] Witten, IH; Pal, CJ; Frank, E; Hall, MA. Data mining. Practical machine learning tools and techniques. Cambridge: Morgan Kaufmann; 2017.
[22] Chapman, P; Clinton, J; Kerber, R; Khabaza, T; Reinartz, T; Shearer, C; Wirth, R. CRISP-DM 1.0. Step-by-step data mining guide. SPSS Inc.; 2000.
[23] Hand, DJ. Data Mining: Statistics and More? The American Statistician 1998;52:112-8.
[24] Dietrich, E; Schulze, A. Statistische Verfahren zur Maschinen- und Prozessqualifikation. München: Carl Hanser Verlag; 2014.
[25] Schuh, G; Kampker, A, editors. Strategie und Management produzierender Unternehmen. Handbuch Produktion und Management 1. Berlin, Heidelberg: Springer-Verlag; 2011.
[26] Otte, R; Otte, V; Kaiser, V. Data Mining für die industrielle Praxis. München: Hanser; 2004.
[27] Lieber, D. Data Mining in der Qualitätslenkung am Beispiel Stabstahlproduktion. Herzogenrath: Shaker; 2018.
[28] Wallis, R. Data-Mining-basierte Erstellung von Montagearbeitsplänen in der Digitalen Fabrik. Herzogenrath: Shaker; 2016.
[29] Harding, JA; Shahbaz, M; Srinivas; Kusiak, A. Data Mining in Manufacturing: A Review. International Journal of Production Research 2006;128:969-976.
[30] Gröger, C; Niedermann, F; Mitschang, B. Data Mining-driven Manufacturing Process Optimization. Proceedings of the World Congress on Engineering 2012;1-7.
[31] Wang, K. Applying data mining to manufacturing: the nature and implications. Journal of Intelligent Manufacturing 2007;18:487-495.
[32] Chen, MC. Configuration of cellular manufacturing systems using association rule induction. International Journal of Production Research 2003;41:381-395.
[33] Hooshmand, Y; Köhler, P; Korff-Krum, A. Komplexitätsbeherrschung und Transparenzerhöhung in der Einzelfertigung. ProduktDaten Journal 2013;2:55-9.
[34] Neis, J. Analyse der Produktportfoliokomplexität unter Anwendung von Verfahren des Data Mining. Herzogenrath: Shaker; 2015.
[35] Agard, B; Kusiak, A. Data Mining for Subassembly Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004;126:628-631.
[36] Romanowski, CJ; Nagi, R. A Data Mining Approach to Forming Generic Bills of Materials in Support of Variant Design Activities. International Journal of Production Research 2004;4:316-347.
[37] Hochdörffer, J; Laule, C; Lanza, G. Product variety management using data-mining methods. Reducing planning complexity by applying clustering analysis on product portfolios. IEEE International Conference 2017:593-7.
[38] Denkena, B; Schmidt, J; Krüger, M. Data Mining Approach for Knowledge-based Process Planning. Procedia Technology 2014;15:406-415.
[39] Wallis, R; Erohin, O; Klinkenberg, R; Deuse, J; Stromberger, F. Data Mining-supported Generation of Assembly Process Plans. Procedia CIRP 2014;23:178-183.
[40] Bohnen, F; Maschek, T; Deuse, J. Leveling of low volume and high mix production based on a group technology approach. CIRP Journal of Manufacturing Science and Technology 2011;4:247-251.
[41] Koonce, DA; Tsai, SC. Using data mining to find patterns in genetic algorithm solutions to a job shop schedule. Computers & Industrial Engineering 2000;38:361-374.
[42] Liu, H; Dong, J. Dispatching rule selection using artificial neural networks for dynamic planning and scheduling. Journal of Intelligent Manufacturing 1996;7:243-250.
[43] Cheng, YJ; Chen, MH; Cheng, FC; Cheng, YC; Lin, YS; Yang, CJ. Developing a decision support system (DSS) for a dental manufacturing production line based on data mining. IEEE International Conference 2018:638-641.
[44] Backus, P; Janakiram, M; Mowzoon, S; Runger, GC; Bhargava, A. Factory Cycle-Time Prediction With a Data-Mining Approach. IEEE Transactions on Semiconductor Manufacturing 2006;19:252-258.
[45] Van der Aalst, WM. Process mining: data science in action. Berlin Heidelberg: Springer; 2016.
[46] Rozinat, A; de Jong, IS; Gunther, CW; van der Aalst, WM. Process mining applied to the test process of wafer scanners in ASML. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2009;39:474-479.
[47] Park, J; Lee, D; Zhu, J. An integrated approach for ship block manufacturing process performance evaluation: Case from a Korean shipbuilding company. International Journal of Production Economics 2014;156:214-222.
[48] Lee, SK; Kim, B; Huh, M; Cho, S; Park, S; Lee, D. Mining transportation logs for understanding the after-assembly block manufacturing process in the shipbuilding industry. Expert Systems with Applications 2013;40:83-95.
[49] Knoll, D; Reinhart, G; Prüglmeier, M. Enabling value stream mapping for internal logistics using multidimensional process mining. Expert Systems with Applications 2019;124:130-142.