
Data Mining and Predictive Modelling: Lecture 2: Functionalities, KDD Process, Data Attributes and Properties

The document discusses data mining and predictive modeling, outlining key functionalities such as data cleaning, integration, selection, and mining processes. It describes various data attributes, including nominal, binary, ordinal, numeric, discrete, and continuous attributes, and their significance in data analysis. Additionally, it covers data mining techniques like classification, prediction, clustering, and outlier analysis, emphasizing the importance of pattern evaluation and knowledge representation.



Knowledge Discovery from Data (KDD)

• Data cleaning (to remove noise and inconsistent data)
• Data integration (where multiple data sources may be combined)
• Data selection (where data relevant to the analysis task are retrieved from the database)
• Data mining (an essential process where intelligent methods are applied to extract data patterns)
• Pattern evaluation (to identify the truly interesting patterns representing knowledge)
• Knowledge representation (where visualization and knowledge representation techniques are used to present mined knowledge)

[Figure: The KDD process. Database → cleaning and integration → data warehouse → selection and transformation → data mining → patterns → evaluation and presentation → knowledge.]
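
The KDD steps above can be sketched as a chain of small functions. This is a toy illustration only. The record fields (`id`, `age`, `product`), the cleaning rule, and the `min_count` threshold are hypothetical. The "mining" step is plain frequency counting, which stands in for a real mining algorithm.

```python
from collections import Counter

def clean(records):
    # Data cleaning: drop records with missing values or an impossible age (hypothetical rule).
    return [r for r in records if None not in r.values() and 0 <= r["age"] <= 120]

def integrate(*sources):
    # Data integration: combine several sources, keeping the first record per customer id.
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["id"], r)
    return list(merged.values())

def select(records, fields):
    # Data selection: keep only the attributes relevant to the analysis task.
    return [{f: r[f] for f in fields} for r in records]

def mine(records):
    # Data mining (stand-in): count how often each product occurs.
    return Counter(r["product"] for r in records)

def evaluate(patterns, min_count):
    # Pattern evaluation: keep only patterns that meet a frequency threshold.
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    # Knowledge presentation: a plain-text report, one line per pattern.
    return [f"{p}: {c}" for p, c in sorted(patterns.items())]

# Hypothetical data from two sources (e.g. two store databases).
store_a = [{"id": 1, "age": 34, "product": "milk"}, {"id": 2, "age": None, "product": "bread"}]
store_b = [{"id": 3, "age": 51, "product": "milk"}, {"id": 1, "age": 34, "product": "milk"},
           {"id": 4, "age": 200, "product": "eggs"}]

data = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(select(data, ["product"])), min_count=2))
print(report)  # ['milk: 2']
```

Each function maps to one step in the list. In practice, each step would be far richer, for example imputing missing values instead of dropping records.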
Data Mining Functionalities

Data Characterization
• Summarization of the general characteristics of an object class of data.
• The data corresponding to the user-specified class is generally collected by a database query.
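
In the sketch below, the user-specified class is collected with a database query and then summarized. It uses an in-memory SQLite database. The `students` table, its columns, and the chosen summary statistics are hypothetical.

```python
import sqlite3
from statistics import mean

# Hypothetical database of students.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, major TEXT, gpa REAL, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", [
    ("Ana", "CS", 3.6, 21), ("Ben", "CS", 3.2, 23), ("Caro", "CS", 3.9, 22),
    ("Dev", "Physics", 3.1, 24), ("Eli", "Physics", 2.8, 22),
])

def characterize(conn, major):
    # Collect the user-specified class (all students of one major) with a query...
    rows = conn.execute("SELECT gpa, age FROM students WHERE major = ?", (major,)).fetchall()
    gpas = [gpa for gpa, _ in rows]
    ages = [age for _, age in rows]
    # ...then summarize its general characteristics.
    return {"count": len(rows), "mean_gpa": round(mean(gpas), 2), "age_range": (min(ages), max(ages))}

print(characterize(conn, "CS"))  # {'count': 3, 'mean_gpa': 3.57, 'age_range': (21, 23)}
```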

Data Discrimination
• Comparison of the general characteristics of target class data objects with the general characteristics of objects from one or a set of contrasting classes.
• The target and contrasting classes can be specified by the user.
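
The sketch below shows one simple form of discrimination: compute the same summary for the target class and for the contrasting classes, then compare them. The records, the attributes (`gpa`, weekly study hours), and the use of mean differences are illustrative assumptions.

```python
from statistics import mean

# Hypothetical records: (major, gpa, weekly_study_hours)
students = [
    ("CS", 3.6, 20), ("CS", 3.2, 15), ("CS", 3.9, 25),
    ("Physics", 3.1, 18), ("Physics", 2.8, 12), ("Math", 3.4, 22),
]

def profile(rows):
    # General characteristics of a group of objects: here, two attribute means.
    return {"mean_gpa": mean(r[1] for r in rows), "mean_hours": mean(r[2] for r in rows)}

def discriminate(records, target, contrasting):
    target_rows = [r for r in records if r[0] == target]
    contrast_rows = [r for r in records if r[0] in contrasting]
    t, c = profile(target_rows), profile(contrast_rows)
    # Positive values mean the target class scores higher than the contrasting classes.
    return {k: round(t[k] - c[k], 2) for k in t}

print(discriminate(students, "CS", {"Physics", "Math"}))  # {'mean_gpa': 0.47, 'mean_hours': 2.67}
```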

Association Analysis
• It analyses sets of items that frequently occur together in transactional datasets. Two measures, support and confidence, are used to determine association rules.
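
For a rule A ⇒ B, support is the fraction of all transactions that contain every item in A ∪ B. Confidence is the fraction of transactions containing A that also contain B. A rule is usually kept only if both values meet user-set minimum thresholds. The basket data and the threshold values below are hypothetical.

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in itemset.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # P(consequent | antecedent) = support(A ∪ B) / support(A)
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6  # hypothetical thresholds

sup = support(transactions, {"bread", "milk"})          # 0.5
conf = confidence(transactions, {"bread"}, {"milk"})    # about 0.667
rule_ok = sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE
print(sup, conf, rule_ok)
```

Here "bread ⇒ milk" holds in 2 of 4 baskets (support 0.5). Of the 3 baskets with bread, 2 also contain milk (confidence 2/3).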
Classification
• It is the process of finding a model that describes and distinguishes data classes or concepts.
• The derived model is based on the analysis of a set of training data (i.e., data objects whose class labels are known).
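
A nearest-centroid classifier is a minimal way to derive a model from labelled training data and then use it to label new objects. It is chosen here only for brevity and is not a method named in the lecture. The features, labels, and data are hypothetical.

```python
from math import dist

def train_nearest_centroid(training_data):
    """Build a model from (features, class_label) pairs whose labels are known."""
    sums, counts = {}, {}
    for features, label in training_data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    # The model is one centroid (mean feature vector) per class.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(model, features):
    # Assign the class whose centroid is closest in Euclidean distance.
    return min(model, key=lambda label: dist(model[label], features))

# Hypothetical training data: (income in thousands, debt ratio) -> risk class.
training = [
    ([80, 0.1], "low_risk"), ([95, 0.2], "low_risk"),
    ([30, 0.7], "high_risk"), ([25, 0.9], "high_risk"),
]
model = train_nearest_centroid(training)
print(classify(model, [70, 0.3]))  # low_risk
print(classify(model, [35, 0.6]))  # high_risk
```

The features are not scaled, so income dominates the distance. Real classifiers usually normalize attributes first.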

Prediction
• It predicts unavailable data values or future trends.
• It can be a prediction of missing numerical values or of increase/decrease trends in time-related data.
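
One simple way to predict a numeric value or trend is to fit a least-squares line and extrapolate it. This sketch assumes a linear trend and uses hypothetical monthly sales figures. The sign of the slope gives the increase/decrease direction.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]  # hypothetical data

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept           # predicted value for month 6
direction = "increase" if slope > 0 else "decrease" if slope < 0 else "flat"
print(slope, intercept, forecast, direction)  # 2.0 8.0 20.0 increase
```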

Clustering
• It is similar to classification, but the classes are not predefined.
• The classes (clusters) are derived from the data attributes themselves, which is why clustering is known as unsupervised learning.
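
The minimal k-means sketch below shows groups emerging from the attribute values alone, with no class labels supplied. k-means is one common clustering method, chosen here for illustration. The points, `k`, the naive first-k initialisation, and the fixed iteration count are assumptions.

```python
from math import dist

def kmeans(points, k, iterations=10):
    centroids = [list(p) for p in points[:k]]  # naive initialisation: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]  # hypothetical, unlabelled
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```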
Outlier Analysis
• Outliers are data objects that cannot be grouped into a given class or cluster.
• Their behavior differs from the general behavior of other data objects.
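
A z-score test is one simple statistical way to flag values that deviate from the general behavior of the data. This sketch assumes roughly normal data and a threshold of 2 standard deviations (3 is also common). The readings are hypothetical.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    # Flag values lying more than `threshold` standard deviations from the mean.
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 12, 11, 13, 12, 11, 50]
outliers = zscore_outliers(readings)
print(outliers)  # [50]
```

On very small samples a single extreme value inflates the standard deviation. Robust alternatives such as the interquartile-range rule are often preferred in that case.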
Evolution Analysis
• It describes and models trends for objects whose behavior changes over time.
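
As a minimal illustration, the sketch below describes how a hypothetical monthly series changes over time. It computes period-to-period direction and a moving average that smooths out short-term fluctuations. The window size of 3 is an arbitrary choice.

```python
def moving_average(series, window):
    # Mean of each run of `window` consecutive values.
    return [sum(series[i - window + 1:i + 1]) / window for i in range(window - 1, len(series))]

def trend(series):
    # Direction of change between consecutive periods.
    return ["up" if b > a else "down" if b < a else "flat" for a, b in zip(series, series[1:])]

monthly_visits = [120, 130, 125, 140, 160, 155]  # hypothetical data

print(trend(monthly_visits))              # ['up', 'down', 'up', 'up', 'down']
print(moving_average(monthly_visits, 3))  # smoothed series rises steadily from 125.0
```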
Data and Attribute Types

Data Objects and Attributes
• A data object represents an entity.
• Data objects are typically described by attributes.
• Data objects can also be referred to as samples, instances, data points, or objects.
• If the data objects are stored in a database, they are data tuples.
• An attribute is a data field representing a characteristic or feature of a data object.
• The distribution of data involving one attribute (or variable) is called univariate.
• A bivariate distribution involves two attributes, and so on.
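
In the sketch below, each dictionary is one data object and its keys are the attributes. It contrasts a univariate summary of one attribute with a bivariate measure relating two attributes. The customer data and the choice of population covariance are illustrative assumptions.

```python
from statistics import mean

# Each dict is one data object (a hypothetical customer); its keys are attributes.
customers = [
    {"customer_id": 1, "age": 25, "annual_spend": 1200.0},
    {"customer_id": 2, "age": 32, "annual_spend": 1800.0},
    {"customer_id": 3, "age": 41, "annual_spend": 2600.0},
    {"customer_id": 4, "age": 29, "annual_spend": 1500.0},
]

def column(objects, attribute):
    # All values of one attribute across the data objects.
    return [obj[attribute] for obj in objects]

# Univariate: summarize one attribute on its own.
ages = column(customers, "age")
age_mean, age_range = mean(ages), (min(ages), max(ages))

# Bivariate: relate two attributes, here with the population covariance.
spend = column(customers, "annual_spend")
m_age, m_spend = mean(ages), mean(spend)
covariance = sum((a - m_age) * (s - m_spend) for a, s in zip(ages, spend)) / len(customers)

print(age_mean, age_range, covariance)  # 31.75 (25, 41) 3068.75
```

A positive covariance indicates that, in this toy data, older customers tend to spend more.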
Nominal Attributes
• Nominal means “relating to names”.
• The values of nominal attributes are symbols or names of things.
• Each value represents some kind of category, code, or state, so nominal attributes are also referred to as categorical. These values do not have any meaningful order.
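
Nominal values have no meaningful order, so they can only be compared for equality and counted. The mode (most frequent value) is therefore the usual measure of central tendency. The `hair_color` values below are hypothetical.

```python
from collections import Counter
from statistics import mode

hair_color = ["black", "brown", "black", "blond", "red", "black", "brown"]

counts = Counter(hair_color)   # frequency of each category
most_common = mode(hair_color) # the only meaningful "average" for nominal data
print(counts)
print(most_common)  # black

# Sorting or averaging these labels (or integer codes assigned to them) would
# impose an order or magnitude that the attribute does not have.
```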

Binary Attributes
• A binary attribute is a nominal attribute with only two categories, 0 and 1. Here 0 typically means that the attribute is absent, and 1 means that it is present.
• Binary attributes are referred to as Boolean if the two states correspond to true and false.
Ordinal Attributes
• An ordinal attribute has possible values with a meaningful order or ranking among them, but the magnitude between successive values is not known.
• Example: student grades, customer satisfaction
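
Because ordinal values can be ranked, comparisons and the median are meaningful. Arithmetic on the gaps between ranks is not. The five-level satisfaction scale and the responses below are hypothetical.

```python
from statistics import median_low

# Hypothetical ordered scale: position in the list gives the rank.
SATISFACTION = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {level: i for i, level in enumerate(SATISFACTION)}

responses = ["satisfied", "neutral", "very satisfied", "satisfied", "dissatisfied"]

# Ordering is meaningful: "satisfied" ranks above "neutral".
print(RANK["satisfied"] > RANK["neutral"])  # True

# The median is a valid central tendency; median_low always returns an actual rank.
ranks = sorted(RANK[r] for r in responses)
median_level = SATISFACTION[median_low(ranks)]
print(median_level)  # satisfied

# Rank differences (e.g. 3 - 2 = 1) do NOT say how much more satisfied someone is.
```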

Numeric Attributes (Quantitative)
• Interval-scaled: measured on a scale of equal-size units with no true zero point. For example, temperature in degrees Celsius is interval-scaled.
• Ratio-scaled: a numeric attribute with an inherent zero point.
• If a measurement is ratio-scaled, we can speak of a value as being a multiple (or ratio) of another value.
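
The interval/ratio distinction shows up as soon as you divide two values. The sketch uses Celsius, which is interval-scaled, and Kelvin, which has a true zero and is ratio-scaled. The two temperatures are arbitrary.

```python
def celsius_to_kelvin(celsius):
    return celsius + 273.15

low_c, high_c = 10.0, 20.0

# Interval scale (Celsius): differences are meaningful...
difference = high_c - low_c  # 10.0 degrees apart
# ...but 20/10 = 2.0 does not mean "twice as hot", because 0 °C is not zero temperature.
naive_ratio = high_c / low_c
# Ratio scale (Kelvin has an inherent zero): this ratio is physically meaningful.
kelvin_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))  # 10.0 2.0 1.035
```

By the same reasoning, a mass of 10 kg really is twice a mass of 5 kg, because mass is ratio-scaled.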
Discrete Attribute
• Has a finite or countably infinite set of values, which may or may not be represented as integers.
• Example (finite): student grades, customer satisfaction
• Countably infinite: customer ID, PIN codes

Continuous Attribute
• An attribute that is not discrete is continuous.
• Continuous attributes are typically represented as floating-point variables.
• The terms numeric attribute and continuous attribute are often used interchangeably in the literature.
