Lecture Notes 1.3 & 1.4
DATA PREPROCESSING:
Definition - What does Data Preprocessing mean?
Data preprocessing is a data mining technique that involves transforming raw
data into an understandable format. Real-world data is often incomplete
(missing attribute values or whole attributes of interest), inconsistent,
and noisy, so it is likely to contain many errors. Data preprocessing is an
established way of resolving such issues: it prepares raw data for further
processing. Data preprocessing is used in database-driven applications such
as customer relationship management, in rule-based applications, and in
machine-learning models such as neural networks.
Data goes through a series of steps during preprocessing:
• Data Cleaning: Data is cleansed through processes such as filling in
missing values, smoothing the noisy data, or resolving the inconsistencies in
the data.
• Data Integration: Data with different representations are put together and
conflicts within the data are resolved.
• Data Transformation: Data is normalized, aggregated and generalized.
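• Data Reduction: This step aims to produce a reduced representation of the
data (for example, in a data warehouse) that is much smaller in volume yet
yields the same, or nearly the same, analytical results.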
• Data Discretization: Involves reducing the number of values of a
continuous attribute by dividing the range of the attribute into intervals.
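
Three of the steps above (cleaning, transformation, and discretization) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method. The sample values and the function names (fill_missing, min_max_normalize, equal_width_bins) are hypothetical. Mean imputation, min-max scaling, and equal-width binning are each just one common choice among several. Integration and reduction are left out because they depend on the data sources involved.

```python
from statistics import mean

# Hypothetical raw attribute values; None marks a missing entry.
raw_ages = [23, None, 35, 41, None, 58, 62]

def fill_missing(values):
    """Data cleaning: replace missing values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Data discretization: map each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

cleaned = fill_missing(raw_ages)             # the two None entries become 43.8
normalized = min_max_normalize(cleaned)      # smallest value -> 0.0, largest -> 1.0
binned = equal_width_bins(cleaned, n_bins=3)

print(binned)  # [0, 1, 0, 1, 1, 2, 2]
```

After discretization, the seven continuous values are represented by only three interval labels (0, 1, 2), which is the reduction in the number of distinct values that the last bullet describes.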