Data Mining

Data mining is the process of analyzing large amounts of data to discover useful patterns and relationships. It involves using software to extract patterns from raw data. Businesses use data mining to gain insights into customer behavior and improve marketing strategies. Common data mining techniques include classification, clustering, regression, association rule learning, outlier detection, and discovering sequential patterns.

What is Data Mining?


• In simple words, data mining is the process of extracting usable
information from a larger set of raw data. It involves analysing
patterns in large batches of data using one or more software tools.
• Data mining is a process used by companies to turn raw data into
useful information. By using software to look for patterns in large
batches of data, businesses can learn more about their customers to
develop more effective marketing strategies, increase sales and
decrease costs. Data mining depends on effective data collection,
warehousing, and computer processing.
• Data mining is the process of analyzing massive volumes of data to
discover business intelligence that helps companies solve problems,
mitigate risks, and seize new opportunities.

• The most commonly accepted definition of “data mining” is the
discovery of “models” for data. A “model,” however, can be one of
several things.
• Statistical Modelling
• Machine Learning
• Computational Approaches to Modelling
Statistical Modelling
• Statisticians were the first to use the term “data mining.”
• Statisticians view data mining as the construction of a statistical model,
that is, an underlying distribution from which the visible data is drawn.
Example:
• Suppose our data is a set of numbers. A statistician might decide that
the data comes from a Gaussian distribution and use a formula to
compute the most likely parameters of this Gaussian. The mean and
standard deviation of this Gaussian distribution completely characterize
the distribution and would become the model of the data.
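The Gaussian example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `fit_gaussian` and the sample numbers are made up for the example, and the "formula" is assumed to be the maximum-likelihood estimate, which divides the sum of squared deviations by n.

```python
import math

def fit_gaussian(data):
    """Return maximum-likelihood estimates (mean, standard deviation) of a Gaussian."""
    n = len(data)
    mean = sum(data) / n
    # The maximum-likelihood variance divides by n (not n - 1).
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(variance)

# Hypothetical data set: just a list of numbers.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = fit_gaussian(sample)
print(mean, std)  # prints: 5.0 2.0
```

The two returned numbers are the whole model: under the Gaussian assumption, nothing else about the data needs to be stored.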
Machine Learning
• A few scientists regard data mining as synonymous with machine
learning.
• There is no question that some data mining appropriately uses
algorithms from machine learning.
• Machine-learning practitioners use the data as a training set to
train an algorithm of one of the many types available, such as Bayes
nets, support-vector machines, decision trees, hidden Markov models,
and many others.
• The typical case where machine learning is a good approach is when
we have little idea of what we are looking for in the data.
• On the other hand, machine learning has not proved successful in
situations where we can describe the goals of the mining more
directly.
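To make the training-set idea concrete, here is a minimal sketch of training the simplest possible decision tree, a one-level "decision stump", on labelled one-dimensional data. The function names `train_stump` and `predict` and the training data are illustrative assumptions, not part of the original slides.

```python
def train_stump(points, labels):
    """Learn a threshold t for the rule: predict 1 if x > t, else 0.

    Returns None when all points share one value (no threshold to try).
    """
    xs = sorted(set(points))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_t, best_errors = None, len(points) + 1
    for t in candidates:
        # Count training examples that the rule would misclassify.
        errors = sum((x > t) != bool(y) for x, y in zip(points, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def predict(threshold, x):
    """Apply the learned rule to a new value."""
    return 1 if x > threshold else 0

# Hypothetical training set: feature values and their 0/1 labels.
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]
t = train_stump(train_x, train_y)
print(t, predict(t, 2.5), predict(t, 9.0))  # prints: 4.5 0 1
```

The algorithm is never told where the boundary lies; it finds the threshold from the examples alone, which is the sense in which the data acts as a training set.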
Computational Approaches to Modelling
• More recently, computer scientists have looked at data mining as an
algorithmic problem. In this case, the model of the data is simply the
answer to a complex query about it.
• There are many different approaches to modelling data. Most other
approaches to modelling can be described as either:
• Summarizing the data succinctly and approximately, or
• Extracting the most prominent features of the data and ignoring the rest.
Summarization
• One of the most interesting forms of summarization is the PageRank
idea, which made Google successful.
• In this form of Web mining, the entire complex structure of the Web
is summarized by a single number for each page.
• Another important form of summary is clustering.
• Data is viewed as points in a multidimensional space. Points that are
“close” in this space are assigned to the same cluster.
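Both kinds of summary described above can be sketched briefly. The code below is an illustration under stated assumptions, not the production form of either method: `pagerank` uses plain power iteration with a damping factor of 0.85 (a conventional choice) and assumes every linked page appears as a key in `links`; `kmeans` uses Lloyd's algorithm, which is only one of many clustering methods, and takes the first k points as initial centroids. The tiny graph and the point set are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Summarize a link graph by one number per page.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no out-links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Assign points to k clusters so that nearby points share a cluster."""
    centroids = [points[i] for i in range(k)]  # arbitrary initial choice
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical three-page Web
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # pages from highest to lowest rank

pts = [(1, 1), (1.5, 2), (8, 8), (9, 9.5), (1, 0.5), (8.5, 9)]
print(kmeans(pts, 2))
```

In both cases a large structure is replaced by something much smaller: one score per page, or one cluster label per point.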
Feature Extraction
• A complex relationship between objects is represented by finding the
strongest statistical dependencies among these objects and using only
those in representing all statistical connections.
• Some of the important kinds of feature extraction from large-scale
data are:
• Frequent Itemsets. This model makes sense for data that consists of
“baskets” of small sets of items, as in the market-basket problem.
• Similar Items. Often the data looks like a collection of sets, and the
objective is to find pairs of sets that have a relatively large fraction
of their elements in common. Recommending items on the basis of such
similar sets is known as “collaborative filtering”.
• Collaborative Filtering is the most common technique used when it
comes to building intelligent recommender systems that can learn to
give better recommendations as more information about users is
collected.
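The similar-items idea can be sketched with Jaccard similarity, the fraction of elements two sets share out of all elements in either set. This is one common way to measure "a relatively large fraction of elements in common", chosen here as an assumption; the function names, the 0.5 threshold, and the user data are all hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(named_sets, threshold):
    """Return (name1, name2, similarity) for every pair at or above threshold."""
    result = []
    for x, y in combinations(sorted(named_sets), 2):
        sim = jaccard(named_sets[x], named_sets[y])
        if sim >= threshold:
            result.append((x, y, sim))
    return result

def recommend(named_sets, user, threshold):
    """Suggest items owned by similar users that this user does not have yet."""
    suggestions = set()
    for other, items in named_sets.items():
        if other != user and jaccard(named_sets[user], items) >= threshold:
            suggestions |= items - named_sets[user]
    return suggestions

# Hypothetical purchase histories: each user is a set of items.
purchases = {
    "alice": {"book", "lamp", "pen", "mug"},
    "bob": {"book", "lamp", "pen"},
    "carol": {"tent", "stove"},
}
print(similar_pairs(purchases, 0.5))   # prints: [('alice', 'bob', 0.75)]
print(recommend(purchases, "bob", 0.5))  # prints: {'mug'}
```

Comparing every pair, as done here, is only practical for small collections; for large-scale data the pairwise comparison itself becomes the bottleneck.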
Data Mining Techniques
• Classification:
Classification assigns data items to predefined classes. It is used to
retrieve important and relevant information about data and metadata.
• Clustering:
Clustering analysis is a data mining technique that identifies data items
that are similar to each other. This process helps to understand the
differences and similarities between the data.
• Regression:
Regression analysis is the data mining method of identifying and
analyzing the relationship between variables. It is used to estimate the
likely value of a specific variable, given the values of other variables.
• Association Rules:
This data mining technique helps to find associations between two
or more items. It discovers hidden patterns in the data set.
• Outlier detection:
This type of data mining technique looks for data items in the dataset
that do not match an expected pattern or expected behavior. It can be
used in a variety of domains, such as intrusion detection, fraud
detection, and fault detection. Outlier detection is also called
outlier analysis or outlier mining.
• Sequential Patterns:
This data mining technique helps to discover or identify similar patterns
or trends in transaction data over a certain period.
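As one illustration of the techniques listed above, the association-rule idea can be sketched on market-basket data: count how often pairs of items occur together (their support), then measure how often one item appears given the other (the confidence of a rule). The function names, the support threshold of 3, and the baskets are made up for the example, and only pairs are counted to keep the sketch short.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing antecedent that also contain consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
print(frequent_pairs(baskets, 3))            # prints: {('bread', 'milk'): 3}
print(confidence(baskets, "bread", "milk"))  # prints: 0.75
```

Here the rule "bread implies milk" holds in three of the four baskets that contain bread, which is the kind of hidden pattern the technique is meant to surface.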
