Class Comparison Methods in Data Mining - Javatpoint

Class comparison methods in data mining analyze and compare target and contrasting classes that are comparable. The key steps are to collect relevant data and partition it into classes, perform dimension relevance analysis, synchronously generalize all classes to the same abstraction levels, and present the comparison in tables, charts or rules. An example is given of comparing graduate and undergraduate students using attributes like name, gender and GPA, and analyzing the count% to show differences between the classes. Discriminant rules can also be used to quantitatively show distinguishing features of each class.
2/14/23, 11:57 AM Class Comparison Methods in Data Mining - Javatpoint

Class Comparison Methods in Data Mining


In many applications, users may not be interested in having a single class or concept described or
characterized but rather would prefer to mine a description comparing or distinguishing one class
(or concept) from other comparable classes (or concepts). Class discrimination or comparison
(hereafter referred to as class comparison) mines descriptions that distinguish a target class from its
contrasting classes. The target and contrasting classes must be comparable in the sense that they
share similar dimensions and attributes. For example, the three classes person, address, and item
are not comparable.

In contrast, the sales in the last three years are comparable classes, and so are computer science
students versus physics students. The earlier discussion of class characterization handles multilevel
data summarization and characterization within a single class; the techniques developed there
can be extended to handle class comparison across several comparable classes.

For example, the attribute generalization process described for class characterization can be
modified so that the generalization is performed synchronously among all the classes compared.
This allows the attributes in all classes to be generalized to the same levels of abstraction. Suppose
that we are given the AllElectronics data for sales in 2003 and sales in 2004 and would like to
compare these two classes. Consider the dimension location with abstractions at the city, province
or state, and country levels. Each class of data should be generalized to the same location level.
That is, all classes are generalized synchronously to either the city level, the province or state level,
or the country level. This is usually more useful than comparing the sales in Vancouver in 2003 with
the sales in the United States in 2004 (i.e., where each set of sales data is generalized to a different
level). Users, however, should have the option to override such an automated, synchronous
comparison with their own choices when preferred.
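
The synchronous generalization in the sales example can be sketched in Python as follows. This is only an illustration under stated assumptions: the hierarchy table, the sales figures, and the names `LOCATION_HIERARCHY`, `generalize`, and `compare_classes` are hypothetical and do not come from the source data.

```python
from collections import Counter

# Hypothetical concept hierarchy for the location dimension:
# city -> (province_or_state, country)
LOCATION_HIERARCHY = {
    "Vancouver": ("British Columbia", "Canada"),
    "Toronto": ("Ontario", "Canada"),
    "Seattle": ("Washington", "USA"),
    "Chicago": ("Illinois", "USA"),
}

LEVELS = ("city", "province_or_state", "country")


def generalize(city, level):
    """Map a city to the requested abstraction level."""
    if level == "city":
        return city
    province, country = LOCATION_HIERARCHY[city]
    return province if level == "province_or_state" else country


def compare_classes(classes, level):
    """Generalize every class to the same level and total the sales amounts."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    result = {}
    for class_name, records in classes.items():
        totals = Counter()
        for city, amount in records:
            totals[generalize(city, level)] += amount
        result[class_name] = dict(totals)
    return result


# Made-up (city, amount) records for the two classes being compared.
sales = {
    "sales_2003": [("Vancouver", 100), ("Toronto", 50), ("Seattle", 70)],
    "sales_2004": [("Vancouver", 120), ("Chicago", 90), ("Seattle", 60)],
}

# Both classes are rolled up to the country level in one call.
by_country = compare_classes(sales, "country")
print(by_country)
```

Because a single `level` argument is applied to every class, the two years cannot end up at different abstraction levels. A user override would amount to calling `compare_classes` with a different level.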

Class Comparison Methods and Implementation


The general procedure for class comparison is as follows:

https://www.javatpoint.com/class-comparison-methods-in-data-mining 2/8

1. Data collection: The set of relevant data in the database or data warehouse is collected by
query processing and partitioned into a target class and one or more contrasting classes.

2. Dimension relevance analysis: If there are many dimensions and analytical comparisons are
desired, then dimension relevance analysis should be performed. Only the highly relevant
dimensions are included in the further analysis.

3. Synchronous generalization: Generalization is performed on the target class to the level
controlled by a user- or expert-specified dimension threshold, which results in a prime target
class relation or cuboid. The concepts in the contrasting class or classes are then
generalized to the same level as those in the prime target class relation or cuboid, forming
the prime contrasting class relation or cuboid.

4. Presentation of the derived comparison: The resulting class comparison description can be
visualized in the form of tables, charts, and rules. This presentation usually includes a
"contrasting" measure (such as count%) that reflects the comparison between the target and
contrasting classes. As desired, the user can adjust the comparison description by applying
drill-down, roll-up, and other OLAP operations to the target and contrasting classes.
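
Step 2 above does not name a relevance measure. One common choice, assumed here, is information gain with respect to the class label. The sketch below ranks dimensions by it and keeps those that meet a threshold. The sample tuples, the threshold value, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label="class"):
    """Reduction in class entropy obtained by splitting rows on one attribute."""
    base = entropy([row[label] for row in rows])
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[label])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


def relevant_dimensions(rows, attributes, threshold):
    """Keep only the attributes whose gain reaches the relevance threshold."""
    gains = {a: information_gain(rows, a) for a in attributes}
    return [a for a in attributes if gains[a] >= threshold], gains


# Made-up tuples: program separates the classes perfectly, gender not at all.
rows = [
    {"class": "graduate", "program": "MSc", "gender": "F"},
    {"class": "graduate", "program": "MSc", "gender": "M"},
    {"class": "undergraduate", "program": "BSc", "gender": "F"},
    {"class": "undergraduate", "program": "BSc", "gender": "M"},
]

kept, gains = relevant_dimensions(rows, ["program", "gender"], threshold=0.5)
print(kept, gains)
```

In this toy data only `program` would survive into the later steps. Other relevance measures could be substituted without changing the rest of the procedure.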

For example, suppose the task is to compare graduate and undergraduate students using a
discriminant rule. The DMQL query for this task would be as follows.

use University_Database
mine comparison as "graduate_students vs_undergraduate_students"
in relevance to name, gender, program, birth_place, birth_date, residence, phone_no, GPA
for "graduate_students"
where status in "graduate"
versus "undergraduate_students"
where status in "undergraduate"
analyze count%
from student

From this query, the inputs to the comparison are:

attributes = name, gender, program, birth_place, birth_date, residence, phone_no, and GPA.

Gen(a_i) = concept hierarchies on attributes a_i.

U_i = attribute analytical thresholds for attributes a_i.

T_i = attribute generalization thresholds for attributes a_i.

R = attribute relevance threshold.
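
To make the roles of the concept hierarchies and generalization thresholds concrete, here is a minimal Python sketch of the student comparison. It assumes a simplified reading of the procedure: an attribute is generalized one level when the target class has more distinct values than its threshold, and the same decision is then applied to the contrasting class. The student tuples, the one-level hierarchies in `GEN`, and the threshold values in `T` are all made up for illustration.

```python
from collections import Counter

# Made-up student tuples for illustration only.
STUDENTS = [
    {"status": "graduate", "birth_place": "Vancouver", "gpa": 3.8},
    {"status": "graduate", "birth_place": "Toronto", "gpa": 3.6},
    {"status": "graduate", "birth_place": "Seattle", "gpa": 3.2},
    {"status": "undergraduate", "birth_place": "Vancouver", "gpa": 3.1},
    {"status": "undergraduate", "birth_place": "Chicago", "gpa": 3.7},
    {"status": "undergraduate", "birth_place": "Toronto", "gpa": 3.6},
    {"status": "undergraduate", "birth_place": "Seattle", "gpa": 3.9},
]

ATTRS = ("birth_place", "gpa")
CANADIAN_CITIES = {"Vancouver", "Toronto"}

# Hypothetical one-level concept hierarchies, standing in for Gen(a_i).
GEN = {
    "birth_place": lambda city: "Canada" if city in CANADIAN_CITIES else "foreign",
    "gpa": lambda gpa: "excellent" if gpa >= 3.5 else "good",
}

# Hypothetical generalization thresholds, standing in for T_i:
# the maximum number of distinct values allowed per attribute.
T = {"birth_place": 2, "gpa": 2}


def choose_levels(target_rows):
    """Decide, from the target class only, which attributes must be generalized."""
    return {a: len({row[a] for row in target_rows}) > T[a] for a in ATTRS}


def count_percent(rows, climb):
    """Generalize rows with the given decisions, merge duplicates, return count%."""
    counts = Counter(
        tuple(GEN[a](row[a]) if climb[a] else row[a] for a in ATTRS) for row in rows
    )
    return {tup: 100.0 * n / len(rows) for tup, n in counts.items()}


target = [s for s in STUDENTS if s["status"] == "graduate"]
contrast = [s for s in STUDENTS if s["status"] == "undergraduate"]

climb = choose_levels(target)                 # levels fixed by the target class
grad_pct = count_percent(target, climb)       # prime target relation
undergrad_pct = count_percent(contrast, climb)  # same levels for the contrast

print(grad_pct)
print(undergrad_pct)
```

Attributes such as name and phone_no are left out of the sketch, since they have many distinct values and no obvious hierarchy to climb.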

Presentation of Class Comparison Descriptions

As with class characterizations, class comparisons can be presented to the user in various forms,
including generalized relations, crosstabs, bar charts, pie charts, curves, and rules. Except for logic
rules, these forms are used in the same way for characterization as for comparison. This section
discusses the visualization of class comparisons in the form of discriminant rules.

As with characterization descriptions, the discriminative features of the target and contrasting
classes of a comparison can be described quantitatively by a quantitative discriminant rule, which
associates a statistical interestingness measure, d-weight, with each generalized tuple in the description.
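
The text does not define d-weight. Assuming the usual definition, the d-weight of a generalized tuple for the target class is the tuple's count in the target class divided by its total count across the target and all contrasting classes. The sketch below computes it in Python; the counts and the function name `d_weights` are hypothetical.

```python
def d_weights(class_counts, target):
    """Return the d-weight of each generalized tuple found in the target class.

    class_counts maps class name -> {generalized tuple: count}.
    """
    weights = {}
    for tup, n_target in class_counts[target].items():
        # Total occurrences of this tuple over the target and contrasting classes.
        total = sum(counts.get(tup, 0) for counts in class_counts.values())
        weights[tup] = n_target / total
    return weights


# Made-up counts of generalized tuples (birth country, GPA range) per class.
counts = {
    "graduate": {("Canada", "excellent"): 90, ("foreign", "good"): 30},
    "undergraduate": {("Canada", "excellent"): 210, ("Canada", "good"): 300},
}

weights = d_weights(counts, "graduate")
print(weights)
```

A d-weight close to 1 means the tuple occurs almost only in the target class and so discriminates well. A low d-weight means the tuple mostly describes the contrasting classes.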
