Classification-Based Approaches in Data Mining
Last Updated: 04 Aug, 2021
Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known). The derived model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, and neural networks. In other words, classification is a form of data analysis that extracts models describing important data classes; such models are called classifiers. For example, we can build a classification model for a bank to categorize loan applications.
A general approach to classification:
Classification is a two-step process:
Learning Step: This is the step in which the classification model is constructed. In this phase, the training data are analyzed by a classification algorithm.
Classification Step: This is the step in which the model is used to predict class labels for given data. In this phase, test data are used to estimate the accuracy of the classification rules. A minimal sketch of both steps follows.
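To make the two steps concrete, here is a minimal sketch using scikit-learn (assumed available); the Iris dataset and the choice of a decision tree model are purely illustrative.

```python
# A minimal sketch of the two-step classification process with scikit-learn
# (assumed available); the Iris dataset and decision tree model are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Split into training data (learning step) and test data (classification step)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Learning step: a classification algorithm analyzes the training data to build the model
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Classification step: the model predicts class labels for the test data,
# and the test data are used to estimate accuracy
y_pred = model.predict(X_test)
print("Estimated accuracy:", accuracy_score(y_test, y_pred))
```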
Basic algorithms of classification:
Decision Tree Induction:
- Decision tree induction is the learning of decision trees from class-labeled training tuples.
- Given a tuple X whose associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
- A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
- These trees can then be converted into classification rules.
- Decision trees are easy to interpret and do not require any domain knowledge; a brief sketch follows below.
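The sketch below illustrates decision tree induction on a tiny, made-up set of class-labeled tuples using scikit-learn (assumed available): a new tuple is classified by tracing it through the tree, and export_text prints the tree so it can be read off as IF-THEN rules. The attribute names and values are hypothetical.

```python
# A sketch of decision tree induction on made-up, class-labeled training tuples
# using scikit-learn (assumed available); attribute names and values are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# Training tuples: [age, income] with known class labels (0 = no, 1 = yes)
X_train = [[25, 30], [42, 80], [35, 50], [23, 20], [52, 95], [46, 60]]
y_train = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)

# A new tuple X with unknown class label is traced from the root to a leaf,
# and the leaf holds the class prediction
print(tree.predict([[30, 75]]))

# The learned tree can be read off as IF-THEN classification rules
print(export_text(tree, feature_names=["age", "income"]))
```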
Naïve Bayesian Classification:
- They are Statistical Classifiers.
- They can predict class membership probabilities such as the probability that a given tuple belongs to a particular class.
- Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes.
- The mathematical formula for this classification is Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X)
where H is a hypothesis (such as that the tuple X belongs to a specified class) and P(H|X) is the posterior probability of H conditioned on X, the observed data tuple.
P(X|H) is the posterior probability of X conditioned on H, P(H) is the prior probability of H, and P(X) is the prior probability of X.
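As a rough illustration, the following sketch uses scikit-learn's GaussianNB (assumed available) on made-up tuples to predict class membership probabilities P(H|X) for a new tuple; the attribute values and class labels are hypothetical.

```python
# A rough sketch of naive Bayesian classification with scikit-learn's GaussianNB
# (assumed available); the tuples and class labels are made up for illustration.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Class-labeled training tuples: [age, income] -> class label
X_train = np.array([[25, 30], [42, 80], [35, 50], [23, 20], [52, 95]])
y_train = np.array([0, 1, 1, 0, 1])

nb = GaussianNB()
nb.fit(X_train, y_train)

# Predict class membership probabilities P(H|X) for a new tuple X,
# assuming attribute values are independent of one another given the class
X_new = np.array([[30, 60]])
print(nb.predict_proba(X_new))  # [[P(class 0 | X), P(class 1 | X)]]
print(nb.predict(X_new))        # most probable class label
```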
Rule-Based Classification:
- Rules are a good way of representing information or knowledge.
- A rule-based classifier uses a set of IF-THEN rules for classification, where each rule is expressed in the form
IF condition THEN conclusion, for example: IF age = youth AND student = yes THEN buys_computer = yes
- The IF part is called the rule antecedent or precondition, and the THEN part is called the rule consequent.
- This means the THEN part applies only if the condition in the IF part is satisfied, as sketched below.
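Below is a small, self-contained sketch of a rule-based classifier: an ordered list of IF-THEN rules is checked against a tuple's attribute values, and the first rule whose precondition is satisfied supplies the class. The rules, attribute names, and values are hypothetical, not drawn from any particular rule-induction algorithm.

```python
# A small illustrative rule-based classifier: IF-THEN rules are checked in order
# and the first rule whose precondition is satisfied assigns the class.
# The rules, attribute names, and values are hypothetical.
def rule_based_classify(tuple_x):
    # IF age = youth AND student = yes THEN buys_computer = yes
    if tuple_x["age"] == "youth" and tuple_x["student"] == "yes":
        return "yes"
    # IF income = high AND credit_rating = excellent THEN buys_computer = yes
    if tuple_x["income"] == "high" and tuple_x["credit_rating"] == "excellent":
        return "yes"
    # Default rule: fires when no other rule's precondition is met
    return "no"

X = {"age": "youth", "student": "yes", "income": "low", "credit_rating": "fair"}
print("buys_computer =", rule_based_classify(X))  # buys_computer = yes
```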
Now let's see how to classify outliers. A database may contain data objects that do not conform to the general behavior or model of the data. These data objects are outliers. The analysis of outlier data is known as outlier mining. An outlier may be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures where objects that have only a small fraction of "close" neighbors in space are considered outliers. Rather than using statistical or distance measures, deviation-based techniques identify exceptions/outliers by examining differences in the principal characteristics of objects in a group.
Outlier detection (also referred to as anomaly detection) is the process of finding data objects whose behaviors are very different from expectations. Such objects are called outliers or anomalies. In addition to fraud detection, outlier detection is important in many applications such as medical care, public safety and security, industry damage detection, image processing, sensor/video network surveillance, and intrusion detection.
In general, outliers are often classified into three categories, namely global outliers, contextual (or conditional) outliers, and collective outliers. Let’s examine each of these categories.

Global Outliers: In a given data set, a data object is a global outlier if it deviates significantly from the rest of the data set. Global outliers are sometimes called point anomalies and are the simplest type of outliers. Most outlier detection methods are aimed at finding global outliers.
Contextual Outliers: In a given data set, a data object is a contextual outlier if it deviates significantly with respect to a specific context of the object. Contextual outliers are also referred to as conditional outliers because they are conditional on the selected context. Therefore, in contextual outlier detection, the context has to be specified as part of the problem definition. Unlike global outlier detection, in contextual outlier detection, whether a data object is an outlier depends not only on the behavioral attributes but also on the contextual attributes. Contextual outliers are a generalization of local outliers, a notion introduced in density-based outlier analysis approaches. An object in a data set is a local outlier if its density significantly deviates from the local area in which it occurs.
Collective Outliers: In a given data set, a subset of data objects forms a collective outlier if the objects as a whole deviate significantly from the entire data set. Importantly, the individual data objects may not be outliers by themselves. Unlike global or contextual outlier detection, in collective outlier detection we have to consider not only the behavior of individual objects but also that of groups of objects. Therefore, to detect collective outliers, we need background knowledge of the relationships among data objects, such as distance or similarity measurements between objects.
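As a simple illustration of the statistical-test idea for global outliers, the sketch below flags values whose z-score (number of standard deviations from the mean) exceeds an illustrative threshold of 3; the data values are made up.

```python
# A simple statistical test for global outliers: flag values whose z-score
# (distance from the mean in standard deviations) exceeds an illustrative
# threshold of 3; the data values below are made up.
import numpy as np

data = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7,
                 10.4, 9.6, 10.1, 9.9, 10.0, 25.0])

z_scores = (data - data.mean()) / data.std()

# Objects that deviate significantly from the rest of the data set are global outliers
global_outliers = data[np.abs(z_scores) > 3]
print(global_outliers)  # expected to flag 25.0
```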