Understanding Data Mining Techniques

Uploaded by

arvinlibang2

DATA MINING

COGNATE/ ELECTIVE II
(TechTarget, 2021)

What is Data Mining?


• Data mining is the process of sorting through large data sets to identify
patterns and relationships that can help solve business problems through
data analysis. Data mining techniques and tools enable enterprises
to predict future trends and make more-informed business decisions.
• Data mining is a key part of data analytics overall and one of the core
disciplines in data science, which uses advanced analytics techniques to
find useful information in data sets. At a more granular level, data mining
is a step in the Knowledge Discovery In Databases (KDD) process, a
data science methodology for gathering, processing and analyzing data.
Data mining and KDD are sometimes referred to interchangeably, but
they're more commonly seen as distinct things.
Data mining history and origins
• Data warehousing, BI and analytics technologies began to emerge in the late 1980s and early 1990s,
providing an increased ability to analyze the growing amounts of data that organizations were creating and
collecting. The term data mining was in use by 1995, when the First International Conference on
Knowledge Discovery and Data Mining was held in Montreal.
• The event was sponsored by the Association for the Advancement of Artificial Intelligence, or AAAI, which
also held the conference annually for the next three years. Since 1999, the conference -- popularly known as
KDD plus the year, such as KDD 2021 -- has been organized primarily by SIGKDD, the special interest group on
knowledge discovery and data mining within the Association for Computing Machinery.
• A technical journal, Data Mining and Knowledge Discovery, published its first issue in 1997. Initially a
quarterly, it's now published bimonthly and contains peer-reviewed articles on data mining and knowledge
discovery theories, techniques and practices. Another publication, the American Journal of Data Mining and
Knowledge Discovery, was launched in 2016.
Why is Data Mining important?
• Data mining is a crucial component of successful analytics initiatives in
organizations. The information it generates can be used in business
intelligence (BI) and advanced analytics applications that involve analysis
of historical data, as well as real-time analytics applications that examine
streaming data as it's created or collected.
• Effective data mining aids in various aspects of planning business
strategies and managing operations. That includes customer-facing
functions such as marketing, advertising, sales and customer support, plus
manufacturing, supply chain management, finance and HR. Data mining
supports fraud detection, risk management, cybersecurity planning and
many other critical business use cases. It also plays an important role in
healthcare, government, scientific research, mathematics, sports and more.
Data Mining process: How does it work?
• Data mining is typically done by data scientists and other skilled BI and analytics
professionals. But it can also be performed by data-savvy business analysts,
executives and workers who function as citizen data scientists in an organization.
• Its core elements include machine learning and statistical analysis, along with
data management tasks done to prepare data for analysis. The use of machine
learning algorithms and artificial intelligence (AI) tools has automated more of
the process and made it easier to mine massive data sets, such as customer
databases, transaction records and log files from web servers, mobile apps and
sensors.
The data mining process can be broken down into these four primary stages:

• Data gathering. Relevant data for an analytics application is identified and assembled.
The data may be located in different source systems, a data warehouse or a data lake, an
increasingly common repository in big data environments that contain a mix of structured
and unstructured data. External data sources may also be used. Wherever the data comes
from, a data scientist often moves it to a data lake for the remaining steps in the process.
• Data preparation. This stage includes a set of steps to get the data ready to be mined. It
starts with data exploration, profiling and pre-processing, followed by data cleansing
work to fix errors and other data quality issues. Data transformation is also done to make
data sets consistent, unless a data scientist is looking to analyze unfiltered raw data for a
particular application.
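The data preparation stage can be illustrated with a minimal sketch. It assumes a handful of hypothetical customer records (the field names and values are invented for illustration) and shows one possible order of transformation and cleansing steps using only the Python standard library; real projects typically rely on dedicated data preparation tools.

```python
from statistics import median

# Hypothetical raw records; None and inconsistent formats stand in for
# typical data quality issues.
raw_records = [
    {"customer": " Alice ", "country": "us", "amount": "120.50"},
    {"customer": "Bob", "country": "US", "amount": None},
    {"customer": "Bob", "country": "US", "amount": None},   # exact duplicate
    {"customer": "Carol", "country": "Us ", "amount": "80"},
]


def prepare(records):
    """Cleanse and transform records so they are consistent for mining."""
    # Transformation: trim whitespace, normalize case, parse numbers.
    cleaned = []
    for rec in records:
        amount = rec["amount"]
        cleaned.append({
            "customer": rec["customer"].strip(),
            "country": rec["country"].strip().upper(),
            "amount": float(amount) if amount is not None else None,
        })

    # Cleansing: drop exact duplicates while keeping the original order.
    seen, unique = set(), []
    for rec in cleaned:
        key = tuple(rec.items())
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Cleansing: fill missing amounts with the median of the known ones
    # (one simple choice among many; assumes at least one known amount).
    known = [rec["amount"] for rec in unique if rec["amount"] is not None]
    fill = median(known)
    for rec in unique:
        if rec["amount"] is None:
            rec["amount"] = fill
    return unique


prepared = prepare(raw_records)
for rec in prepared:
    print(rec)
```

Filling gaps with the median is only one option; depending on the application, dropping incomplete records or leaving the raw values untouched may be more appropriate.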
Four Primary Stages

• Mining the data. Once the data is prepared, a data scientist chooses the appropriate
data mining technique and then implements one or more algorithms to do the
mining. In machine learning applications, the algorithms typically must be trained on
sample data sets to look for the information being sought before they're run against
the full set of data.
• Data analysis and interpretation. The data mining results are used to create
analytical models that can help drive decision-making and other business actions.
The data scientist or another member of a data science team also must communicate
the findings to business executives and users, often through data visualization and
the use of data storytelling techniques.
These steps are part of the data mining process.
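As a rough illustration of the mining stage, the sketch below fits a k-nearest neighbor classifier on a small labeled sample and then applies it to records it has not seen. The customer features, segment labels and values are hypothetical, and for k-nearest neighbor "training" simply means storing the labeled sample. Features are left unscaled to keep the example short.

```python
import math
from collections import Counter

# Hypothetical labeled sample: (monthly_visits, avg_basket_value) -> segment.
training_sample = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((1, 10.0), "occasional"),
    ((12, 80.0), "loyal"),
    ((15, 95.0), "loyal"),
    ((11, 70.0), "loyal"),
]

# Records from the wider data set that still need a label.
unlabeled = [(2, 18.0), (14, 90.0), (10, 75.0)]


def knn_predict(sample, point, k=3):
    """Label `point` by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(sample, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


predictions = [knn_predict(training_sample, p) for p in unlabeled]
print(predictions)
```

In practice part of the labeled sample would usually be held back to check how well the model performs before it is run against the full data set.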
Types of data mining techniques
• Various techniques can be used to mine data for different data science applications. Pattern
recognition is a common data mining use case that's enabled by multiple techniques, as is
anomaly detection, which aims to identify outlier values in data sets. Popular data mining
techniques include the following types:
• Association rule mining. In data mining, association rules are if-then statements that identify
relationships between data elements. Support and confidence criteria are used to assess the
relationships -- support measures how frequently the related elements appear in a data set, while
confidence reflects how often the "then" part holds in the cases where the "if" part is present.
• Classification. This approach assigns the elements in data sets to different categories defined as
part of the data mining process. Decision trees, Naive Bayes classifiers, k-nearest neighbor
and logistic regression are some examples of classification methods.
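The support and confidence measures used in association rule mining can be worked through on a toy example. The sketch below assumes five hypothetical shopping baskets and evaluates the rule "if bread, then butter"; the function names are illustrative and not taken from any particular library.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets with the antecedent, the share that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: if a basket contains bread, then it contains butter.
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter, so the rule is both fairly frequent and fairly reliable in this small data set.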
Types of data mining techniques

• Clustering. In this case, data elements that share particular characteristics
are grouped together into clusters as part of data mining applications.
Examples include k-means clustering, hierarchical clustering and
Gaussian mixture models.
• Regression. This is another way to find relationships in data sets, by
calculating predicted data values based on a set of variables. Linear
regression and multivariate regression are examples. Decision trees and
some other classification methods can be used to do regressions, too.
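To make the clustering idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on a few hypothetical two-dimensional points. The starting centroids are fixed by hand so the run is reproducible, which is an assumption of this example; library implementations usually choose them randomly or with a seeding heuristic.

```python
import math

# Hypothetical 2-D points, e.g. (visits per month, spend in hundreds).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def mean_point(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(group) for coord in zip(*group))


def k_means(data, centroids, max_iterations=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then update."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for point in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # Converged: centroids no longer move.
        centroids = updated
    return centroids, clusters


centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The number of clusters is set by how many starting centroids are passed in; choosing that number well is a separate problem this sketch does not address.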
Types of data mining techniques

• Sequence and path analysis. Data can also be mined to look for patterns
in which a particular set of events or values leads to later ones.
• Neural networks. A neural network is a set of algorithms loosely modeled on
the way neurons in the human brain process signals. Neural networks are particularly useful in
complex pattern recognition applications involving deep learning, a more
advanced offshoot of machine learning.
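A very simple form of sequence and path analysis is to count which event tends to follow which. The sketch below assumes a few hypothetical website sessions and tallies consecutive page-to-page transitions; dedicated sequence mining algorithms go well beyond this first-order view.

```python
from collections import Counter

# Hypothetical clickstream sessions, each an ordered list of page visits.
sessions = [
    ["home", "search", "product", "cart", "checkout"],
    ["home", "product", "cart"],
    ["home", "search", "product"],
    ["search", "product", "cart", "checkout"],
]

# Count every consecutive pair (current page -> next page).
transitions = Counter()
for session in sessions:
    for current_page, next_page in zip(session, session[1:]):
        transitions[(current_page, next_page)] += 1


def next_step_probability(current_page, next_page):
    """Among recorded transitions out of `current_page`, the share going to `next_page`."""
    total = sum(count for (page, _), count in transitions.items()
                if page == current_page)
    return transitions[(current_page, next_page)] / total if total else 0.0


print(transitions.most_common(3))
print(next_step_probability("home", "search"))
```

Sessions that end on a page contribute no outgoing transition for it, so the probabilities describe only the visits that continued.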
Data mining software and tools
• Data mining tools are available from a large number of vendors, typically as part of software platforms that also
include other types of data science and advanced analytics tools. Key features provided by data mining software
include data preparation capabilities, built-in algorithms, predictive modeling support, a GUI-based
development environment, and tools for deploying models and scoring how they perform.
• Vendors that offer tools for data mining include Alteryx, AWS, Databricks, Dataiku, DataRobot, Google,
H2O.ai, IBM, Knime, Microsoft, Oracle, RapidMiner, SAP, SAS Institute and Tibco Software, among others.
• A variety of free open source technologies can also be used to mine data, including DataMelt, Elki, Orange,
Rattle, scikit-learn and Weka. Some software vendors provide open source options, too. For example, Knime
combines an open source analytics platform with commercial software for managing data science applications,
while companies such as Dataiku and H2O.ai offer free versions of their tools.
Benefits of data mining

• In general, the business benefits of data mining come from the increased
ability to uncover hidden patterns, trends, correlations and anomalies in
data sets. That information can be used to improve business decision-
making and strategic planning through a combination of conventional
data analysis and predictive analytics.
Specific data mining benefits include the following:
• More effective marketing and sales. Data mining helps marketers better understand customer
behavior and preferences, which enables them to create targeted marketing and advertising
campaigns. Similarly, sales teams can use data mining results to improve lead conversion rates
and sell additional products and services to existing customers.
• Better customer service. Thanks to data mining, companies can identify potential customer
service issues more promptly and give contact center agents up-to-date information to use in
calls and online chats with customers.
• Improved supply chain management. Organizations can spot market trends and forecast
product demand more accurately, enabling them to better manage inventories of goods and
supplies. Supply chain managers can also use information from data mining to optimize
warehousing, distribution and other logistics operations.
Data Mining Benefits

• Increased production uptime. Mining operational data from sensors on manufacturing
machines and other industrial equipment supports predictive maintenance applications to
identify potential problems before they occur, helping to avoid unscheduled downtime.
• Stronger risk management. Risk managers and business executives can better assess
financial, legal, cybersecurity and other risks to a company and develop plans for managing
them.
• Lower costs. Data mining helps drive cost savings through operational efficiencies in
business processes and reduced redundancy and waste in corporate spending.
• Ultimately, data mining initiatives can lead to higher revenue and profits, as well as
competitive advantages that set companies apart from their business rivals.
Industry examples of Data Mining
• Here's how organizations in some industries use data mining as part of analytics applications:
• Retail. Online retailers mine customer data and internet clickstream records to help them
target marketing campaigns, ads and promotional offers to individual shoppers. Data mining
and predictive modeling also power the recommendation engines that suggest possible
purchases to website visitors, as well as inventory and supply chain management activities.
• Financial services. Banks and credit card companies use data mining tools to build financial
risk models, detect fraudulent transactions and vet loan and credit applications. Data mining
also plays a key role in marketing and in identifying potential upselling opportunities with
existing customers.
Industry examples

• Insurance. Insurers rely on data mining to aid in pricing insurance
policies and deciding whether to approve policy applications, including
risk modeling and management for prospective customers.
• Manufacturing. Data mining applications for manufacturers include
efforts to improve uptime and operational efficiency in production plants,
supply chain performance and product safety.
Industry examples

• Entertainment. Streaming services do data mining to analyze what users
are watching or listening to and to make personalized recommendations
based on people's viewing and listening habits.
• Healthcare. Data mining helps doctors diagnose medical conditions, treat
patients and analyze X-rays and other medical imaging results. Medical
research also depends heavily on data mining, machine learning and other
forms of analytics.
Data Mining vs. Data Analytics and Data Warehousing

• Data mining is sometimes viewed as being synonymous with data analytics. But it's
predominantly seen as a specific aspect of data analytics that automates the analysis of
large data sets to discover information that otherwise couldn't be detected. That information
can then be used in the data science process and in other BI and analytics applications.
• Data warehousing supports data mining efforts by providing repositories for the data sets.
Traditionally, historical data has been stored in enterprise data warehouses or smaller data
marts built for individual business units or to hold specific subsets of data. Now, though,
data mining applications are often served by data lakes that store both historical and
streaming data and are based on big data platforms like Hadoop and Spark, NoSQL
databases or cloud object storage services.
(TDK Technologies, 2021)
Business Applications of Data Mining and Machine Learning

• Many businesses have a substantial amount of data, often with volume growing at a rapid rate. This makes
cost-effective manual data analysis virtually impossible. Therefore, businesses turn to data mining
techniques to identify potentially useful information in their data, to aid business decision making
processes and enhance business intelligence in general.
• Machine learning leverages data mining and computational intelligence algorithms to improve decision
making models. Example applications of data mining and machine learning to business uses include:
• Search Engines: Adapting search engine results to search behaviors and the preferences of search users.
Determining the relevance of topics on a webpage to topics of a given keyword for which that webpage
may be listed in the search engine result pages.
• Customer Relationship Management (CRM): Determining the probability a given customer will
respond favorably to a certain interaction, typically sales and marketing activities, but also customer and
technical support approaches.
Business Applications

• Human Resources: Determining the probability that a given recruit will be a successful fit in an
organization. Predicting what incentives and company policies in general are most likely to achieve
the desired HR results.
• Retail: Determining the probability that a given customer would prefer a certain product, and inferring
user preferences more generally, for example in the product placement and recommender systems used by
many online retailers.
• Fraud Analysis: Determining the probability that a given credit card transaction may be fraudulent.
• Pharmaceuticals: Using bioinformatics to analyze life science data to aid in future drug discovery
and development processes. Analyzing demographic and health data to predict profitability of a
future drug if it were brought to market.
