Chapter 2 - Data Preprocessing
WHY DO WE NEED TO PREPROCESS THE DATA?
MUCH OF THE RAW DATA CONTAINED IN DATABASES IS UNPREPROCESSED, INCOMPLETE, AND NOISY
THE DATABASES MAY CONTAIN:
• FIELDS THAT ARE REDUNDANT
• MISSING VALUES
• OUTLIERS
• DATA IN A FORM NOT SUITABLE FOR DATA MINING MODELS
• VALUES NOT CONSISTENT WITH POLICY OR COMMON SENSE
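THE PROBLEMS LISTED ABOVE CAN BE CHECKED FOR WITH A FEW LINES OF CODE. THE FOLLOWING IS A MINIMAL PYTHON SKETCH, NOT A DEFINITIVE METHOD. THE TOY RECORDS, THE FIELD NAMES, THE FUNCTION NAMES, THE 1.5 × IQR OUTLIER RULE, AND THE 0 TO 120 AGE RANGE ARE ALL ASSUMPTIONS MADE FOR ILLUSTRATION:

```python
from statistics import quantiles

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34,   "income": 52000,    "zip": "10001", "zip_copy": "10001"},
    {"age": None, "income": 48000,    "zip": "90210", "zip_copy": "90210"},
    {"age": 29,   "income": 50000,    "zip": "60614", "zip_copy": "60614"},
    {"age": 41,   "income": 51000,    "zip": "02139", "zip_copy": "02139"},
    {"age": -5,   "income": 49000,    "zip": "73301", "zip_copy": "73301"},
    {"age": 38,   "income": 10000000, "zip": "30301", "zip_copy": "30301"},
    {"age": 45,   "income": 53000,    "zip": "98101", "zip_copy": "98101"},
]


def missing_counts(rows):
    """Count missing (None) entries per field."""
    return {f: sum(1 for r in rows if r[f] is None) for f in rows[0]}


def redundant_pairs(rows):
    """Return pairs of fields whose values are identical in every row."""
    fields = list(rows[0])
    return [
        (a, b)
        for i, a in enumerate(fields)
        for b in fields[i + 1:]
        if all(r[a] == r[b] for r in rows)
    ]


def iqr_outliers(rows, field, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    values = [r[field] for r in rows if r[field] is not None]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def out_of_range(rows, field, low, high):
    """Return non-missing values that violate a common-sense range."""
    return [
        r[field] for r in rows
        if r[field] is not None and not (low <= r[field] <= high)
    ]


print("missing per field:", missing_counts(records))
print("redundant field pairs:", redundant_pairs(records))
print("income outliers:", iqr_outliers(records, "income"))
print("implausible ages:", out_of_range(records, "age", 0, 120))
```

EACH FUNCTION TARGETS ONE ITEM FROM THE LIST: MISSING VALUES, REDUNDANT FIELDS, OUTLIERS, AND VALUES THAT CONTRADICT COMMON SENSE. A CHECK ONLY FLAGS A PROBLEM; DECIDING HOW TO HANDLE IT IS A SEPARATE STEP.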
TWO PRINCIPAL METHODS
DATA CLEANING
DATA TRANSFORMATION
DATA CLEANING
HANDLING MISSING DATA
INSIGHTFUL MINER OFFERS A CHOICE OF REPLACEMENT VALUES FOR MISSING DATA: