Data mining is the process of analyzing large amounts of data to discover useful patterns and relationships. It involves using software to extract patterns from raw data. Businesses use data mining to gain insights into customer behavior and improve marketing strategies. Common data mining techniques include classification, clustering, regression, association rule learning, outlier detection, and discovering sequential patterns.
Data Mining
What is Data Mining?
• In simple terms, data mining is the process of extracting usable information from a larger set of raw data. It involves analyzing patterns in large batches of data using one or more software tools.
• Companies use data mining to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers in order to develop more effective marketing strategies, increase sales, and decrease costs. Data mining depends on effective data collection, warehousing, and computer processing.
• Data mining is the process of analyzing massive volumes of data to discover business intelligence that helps companies solve problems, mitigate risks, and seize new opportunities.
• The most commonly accepted definition of “data mining” is the discovery of “models” for data. A “model,” however, can be one of several things:
• Statistical Modelling
• Machine Learning
• Computational Approaches to Modelling

Statistical Modelling
• Statisticians were the first to use the term “data mining.”
• Statisticians view data mining as the construction of a statistical model, that is, an underlying distribution from which the visible data is drawn.
• Example: Suppose our data is a set of numbers. A statistician might decide that the data comes from a Gaussian distribution and use a formula to compute the most likely parameters of this Gaussian. The mean and standard deviation completely characterize the distribution and would become the model of the data.

Machine Learning
• A few scientists regard data mining as synonymous with machine learning.
• There is no question that some data mining appropriately uses algorithms from machine learning. Machine-learning practitioners use the data as a training set to train an algorithm of one of the many types they work with, such as Bayes nets, support-vector machines, decision trees, hidden Markov models, and many others.
• The typical case where machine learning is a good approach is when we have little idea of what we are looking for in the data.
• On the other hand, machine learning has not proved successful in situations where we can describe the goals of the mining more directly.

Computational Approaches to Modelling
• More recently, computer scientists have looked at data mining as an algorithmic problem. In this case, the model of the data is simply the answer to a complex query about it.
• There are many different approaches to modelling data.
Most other approaches to modelling can be described as either:
• Summarizing the data succinctly and approximately, or
• Extracting the most prominent features of the data and ignoring the rest.

Summarization
• One of the most interesting forms of summarization is the PageRank idea, which made Google successful. In this form of Web mining, the entire complex structure of the Web is summarized by a single number for each page.
• Another important form of summary is clustering. Data is viewed as points in a multidimensional space, and points that are “close” in this space are assigned to the same cluster.

Feature Extraction
• A complex relationship between objects is represented by finding the strongest statistical dependencies among these objects and using only those in representing all statistical connections.
• Some of the important kinds of feature extraction from large-scale data are:
• Frequent Itemsets: This model makes sense for data that consists of “baskets” of small sets of items, as in the market-basket problem.
• Similar Items: The data looks like a collection of sets, and the objective is to find pairs of sets that have a relatively large fraction of their elements in common. Finding such pairs underlies “collaborative filtering.”
• Collaborative filtering is one of the most common techniques for building recommender systems that learn to give better recommendations as more information about users is collected.

Data Mining Techniques
• Classification: This analysis is used to retrieve important and relevant information about data and metadata. It assigns data items to different classes.
• Clustering: Clustering analysis is a data mining technique for identifying data items that are similar to each other. This process helps to understand the differences and similarities between the data.
• Regression: Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to estimate the likely value of a specific variable, given the values of other variables.
• Association Rules: This data mining technique helps to find associations between two or more items. It discovers hidden patterns in the data set.
• Outlier detection: This data mining technique looks for data items in the dataset that do not match an expected pattern or expected behavior. It can be used in a variety of domains, such as intrusion detection, fraud detection, and fault detection. Outlier detection is also called outlier analysis or outlier mining.
• Sequential Patterns: This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.
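
Worked example: Statistical Modelling
The Gaussian example from the Statistical Modelling slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name fit_gaussian and the sample numbers are made up here, and the "formula" is taken to be the standard maximum-likelihood estimate, which divides the variance by n rather than n - 1.

```python
import math


def fit_gaussian(values):
    """Return the maximum-likelihood (mean, standard deviation) of a Gaussian.

    Hypothetical helper for illustration; not taken from the slides.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    # The maximum-likelihood variance divides by n, not n - 1.
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance)


# Made-up sample data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std = fit_gaussian(data)
print(mean, std)  # prints 5.0 2.0
```

The two returned numbers are the whole "model" in the statistician's sense: everything else about the assumed distribution follows from them.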
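
Worked example: Frequent Itemsets
The market-basket idea from the Feature Extraction slide can be illustrated by counting how often each pair of items appears together. This is a brute-force sketch for small data only, not a scalable algorithm; the function name frequent_pairs, the min_support parameter, and the baskets are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count item pairs and keep those seen in at least min_support baskets.

    Hypothetical helper for illustration; checks every pair in every basket.
    """
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Made-up baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]
print(frequent_pairs(baskets, min_support=2))
```

With these baskets, only the pairs (bread, milk) and (butter, milk) occur twice, so they are the only ones kept at a support threshold of 2.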
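
Worked example: Similar Items
The Similar Items slide speaks of sets that share "a relatively large fraction of their elements." One common way to measure that fraction is Jaccard similarity (size of the intersection divided by size of the union). The slides do not name a measure, so Jaccard is an assumption here, as are the function names, the threshold, and the sample data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_pairs(named_sets, threshold):
    """Return (name1, name2, similarity) for pairs at or above the threshold."""
    result = []
    for n1, n2 in combinations(sorted(named_sets), 2):
        sim = jaccard(named_sets[n1], named_sets[n2])
        if sim >= threshold:
            result.append((n1, n2, sim))
    return result


# Made-up purchase histories.
purchases = {
    "alice": {"bread", "milk", "eggs", "butter"},
    "bob": {"bread", "milk", "eggs", "jam"},
    "carol": {"tea", "coffee", "milk"},
}
print(similar_pairs(purchases, threshold=0.5))  # prints [('alice', 'bob', 0.6)]
```

In a recommender setting, a pair such as alice and bob could then be used to suggest to one user the items the other has bought. Comparing every pair this way is only practical for small collections.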
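
Worked example: Outlier detection
One simple way to flag items that "do not match an expected pattern" is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. The slides do not prescribe a method, so the z-score rule, the function name find_outliers, the threshold of 2.0, and the data below are all assumptions for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold.

    Hypothetical helper for illustration; uses the population standard deviation.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [x for x in values if abs(x - mean) / std > threshold]


# Made-up readings with one obviously unusual value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(find_outliers(readings))  # prints [95]
```

The z-score rule is sensitive to the outliers themselves, since they inflate both the mean and the standard deviation, so it is best read as a first-pass check rather than a robust detector.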