Lecture Notes 1.3 & 1.4

Data preprocessing is a critical technique in data mining that transforms raw data into a usable format by addressing issues such as incompleteness, inconsistency, and errors. The process includes steps like data cleaning, integration, transformation, reduction, and discretization to ensure data quality across multiple dimensions such as accuracy and completeness. This preparation is essential for effective data analysis in various applications, including customer relationship management and neural networks.

Uploaded by Sajal Jain

Lecture Notes

Course Name: Data Mining and Warehousing


Course Code: 22CSH-380

DATA PREPROCESSING:
Definition - What does Data Preprocessing mean?
Data preprocessing is a data mining technique that involves transforming raw
data into an understandable format. Real-world data is often incomplete,
inconsistent, and/or lacking in certain behaviors or trends, and is likely to
contain many errors. Data preprocessing is a proven method of resolving such
issues and prepares raw data for further processing. It is used in
database-driven applications such as customer relationship management and
in machine-learning applications such as neural networks.
Data goes through a series of steps during preprocessing:
• Data Cleaning: Data is cleansed through processes such as filling in
missing values, smoothing noisy data, or resolving inconsistencies in the
data.
• Data Integration: Data with different representations is put together and
conflicts within the data are resolved.
• Data Transformation: Data is normalized, aggregated and generalized.
• Data Reduction: This step aims to produce a reduced representation of the
data in a data warehouse.
• Data Discretization: Reduces the number of values of a continuous
attribute by dividing the range of the attribute into intervals.
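
Three of the steps above (cleaning, transformation and discretization) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the sample ages, the function names, the choice of mean imputation for missing values, min-max scaling for normalization, and equal-width binning for discretization are all hypothetical choices made for this example, not methods prescribed by the notes.

```python
from statistics import mean

# Hypothetical raw values for one continuous attribute (age).
# None marks a missing value.
ages = [23, None, 35, 47, None, 59, 31]


def fill_missing(values):
    """Data cleaning: replace missing entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Data discretization: map each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


cleaned = fill_missing(ages)
normalized = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, 3)

print(cleaned)
print(normalized)
print(binned)
```

Each function takes a plain list and returns a new one, so the steps can be chained in the order the notes list them. Other imputation or binning strategies would slot in the same way.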

Why Is Data Dirty?


• Incomplete data comes from
  • data values that were not available (n/a) when the data was collected
  • different considerations between the time when the data was collected
    and when it is analyzed
  • human/hardware/software problems
• Noisy data comes from the process of data
  • collection
  • entry
  • transmission
• Inconsistent data comes from
  • different data sources
  • functional dependency violation
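
A functional dependency violation can be detected mechanically. The sketch below is a hedged illustration, not a method from the notes: it assumes a dependency of the form zip -> city (each zip code should map to exactly one city), and the function name `fd_violations`, the field names and the sample records are all hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return each lhs value that maps to more than one rhs value.

    A non-empty result means the functional dependency lhs -> rhs
    does not hold in the given rows.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical records merged from two different data sources.
records = [
    {"zip": "Z100", "city": "Northtown"},
    {"zip": "Z100", "city": "Northtown"},
    {"zip": "Z200", "city": "Westville"},
    {"zip": "Z200", "city": "Eastville"},  # conflicts with the row above
]

violations = fd_violations(records, "zip", "city")
print(violations)
```

Here the two sources disagree about the city for zip code Z200, which is exactly the kind of inconsistency that data cleaning or integration has to resolve.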

Multi-Dimensional Measure of Data Quality


• A well-accepted multidimensional view:
  • Accuracy
  • Completeness
  • Consistency
  • Timeliness
  • Believability
  • Value added
  • Interpretability
  • Accessibility
• Broad categories: intrinsic, contextual, representational, and accessibility.
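
Of the dimensions listed above, completeness is among the easiest to quantify. The following sketch assumes one simple definition, namely the fraction of records in which an attribute has a non-missing value. That definition, the function name `completeness`, and the sample customer records are assumptions made for illustration and do not come from the notes.

```python
def completeness(rows, attributes):
    """Return, per attribute, the fraction of rows with a non-missing value.

    A value counts as missing if the key is absent, or if it is None
    or an empty string.
    """
    total = len(rows)
    if total == 0:
        return {a: 0.0 for a in attributes}
    return {
        a: sum(1 for r in rows if r.get(a) not in (None, "")) / total
        for a in attributes
    }


# Hypothetical customer records with some missing fields.
customers = [
    {"name": "A", "email": "a@example.com", "phone": "111"},
    {"name": "B", "email": "", "phone": None},
    {"name": "C", "email": "c@example.com", "phone": "333"},
    {"name": "D", "email": "d@example.com"},
]

scores = completeness(customers, ["name", "email", "phone"])
print(scores)
```

A score of 1.0 means every record has the attribute filled in. Lower scores point to attributes that may need missing-value handling during data cleaning.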

Why Data Pre-processing?

Data preprocessing prepares raw data for further processing. The traditional data
preprocessing method is reactive: it starts with data that is assumed to be ready for
analysis, and it provides no feedback on, or influence over, the way the data is collected.
Inconsistency between data sets is the main difficulty in data preprocessing.

Suggested Reading Material
• TEXT BOOKS
Introduction to Data Mining, Tan, Steinbach and Vipin Kumar, Pearson Education, 2016
• REFERENCE BOOKS
Data Mining: Concepts and Techniques, Han, Kamber and Pei, Elsevier
• Journals:
• http://www.ijsmsjournal.org/ijsms-v4i4p137.html
• https://www.springer.com/journal/41060

Apex Institute of Technology, Chandigarh University, India
