Data Cleaning Notes

Uploaded by mykanadinechua
© All Rights Reserved

📌 Data Cleaning Notes

🔹 What is Data Cleaning?


● The process of detecting and correcting (or removing) errors, inconsistencies, and inaccuracies in datasets.

● Ensures that data is accurate, complete, consistent, and reliable for analysis or decision-making.

🔹 Common Issues in Raw Data


1. Missing values – empty or null fields.

2. Duplicates – repeated records.

3. Inconsistent formatting – e.g., "PH", "Philippines", "PHIL" for the same country.

4. Outliers – values far outside the typical range; they may be data-entry errors or genuine extremes.

5. Incorrect data types – e.g., numbers stored as text.

6. Noise or irrelevant data – information that does not contribute to the analysis.
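The first four issues above can be spotted programmatically before any fixing starts. Below is a minimal pandas sketch on a small made-up table; `profile_issues`, `looks_numeric_but_text`, and its 75% threshold are hypothetical helpers for illustration, not pandas functions. Outliers and noise usually need domain judgment, so they are left out here.

```python
import pandas as pd

# Made-up sample data containing several of the issues listed above.
raw = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cara", None],                    # a missing value
    "country": ["PH", "Philippines", "Philippines", "PHIL", "PH"],  # inconsistent spellings
    "age": ["25", "31", "31", "abc", "40"],                         # numbers stored as text
})


def looks_numeric_but_text(s: pd.Series, min_share: float = 0.75) -> bool:
    """True if s is not a numeric dtype but most non-null values parse as numbers.

    min_share is an arbitrary, hypothetical threshold.
    """
    if pd.api.types.is_numeric_dtype(s):
        return False
    non_null = s.dropna()
    if non_null.empty:
        return False
    parsed = pd.to_numeric(non_null, errors="coerce")  # unparseable text -> NaN
    return bool(parsed.notna().mean() >= min_share)


def profile_issues(df: pd.DataFrame) -> dict:
    """Count a few common issues; a hypothetical helper, not a pandas API."""
    return {
        # Missing values: number of nulls per column.
        "missing": df.isna().sum().to_dict(),
        # Duplicates: rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # Inconsistent formatting: distinct raw spellings in the country column.
        "country_spellings": sorted(df["country"].dropna().unique()),
        # Incorrect data types: text columns whose values are mostly numbers.
        "numbers_as_text": [c for c in df.columns if looks_numeric_but_text(df[c])],
    }


issues = profile_issues(raw)
print(issues)
```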

🔹 Steps in Data Cleaning


1. Remove duplicates – drop or merge repeated entries.

2. Handle missing values:

○ Delete rows or columns when too many values are missing.

○ Fill in gaps with the mean, median, mode, or a placeholder value.

3. Correct inconsistencies – standardize formats (e.g., dates, units, spelling).

4. Fix data types – convert text to numbers and ensure correct date/time formats.

5. Handle outliers – investigate each one, then decide whether to remove or keep it.

6. Validate data – check for logical accuracy (e.g., age cannot be negative).

7. Normalize/standardize values – ensure uniform scales and units (e.g., all amounts in USD).
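Steps 1 and 2 can be sketched in pandas as follows. The 50% missing-value cutoff, the "median for numbers, placeholder for text" fill rule, and the helper name `drop_duplicates_and_fill` are illustrative assumptions; the right choices depend on the dataset.

```python
import pandas as pd


def drop_duplicates_and_fill(df: pd.DataFrame, max_missing_share: float = 0.5,
                             placeholder: str = "Unknown") -> pd.DataFrame:
    """Steps 1-2 from the notes; the thresholds and fill choices are assumptions."""
    # Step 1: drop rows that exactly repeat an earlier row, keeping the first copy.
    out = df.drop_duplicates().reset_index(drop=True)
    # Step 2a: drop columns where too large a share of values is missing.
    out = out.loc[:, out.isna().mean() <= max_missing_share].copy()
    # Step 2b: fill the remaining gaps (median for numeric columns, placeholder otherwise).
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(placeholder)
    return out


# Made-up sample data.
raw = pd.DataFrame({
    "customer": ["Ana", "Ben", "Ben", "Cara", "Dan"],
    "age": [25, 31, 31, None, 40],
    "city": ["Manila", "Cebu", "Cebu", None, "Davao"],
    "notes": [None, None, None, "vip", None],  # a mostly empty column
})
clean = drop_duplicates_and_fill(raw)
print(clean)
```

Here the repeated "Ben" row is dropped, the mostly empty `notes` column is removed, Cara's missing age becomes the median of the remaining ages, and her missing city becomes "Unknown".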
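Steps 3 and 4 as a pandas sketch under stated assumptions: the column names, the hypothetical `COUNTRY_MAP` lookup, and the ISO `%Y-%m-%d` date format are made up for the example. Coercing bad values to `NaN`/`NaT` instead of raising an error keeps them visible for review.

```python
import pandas as pd

# Hypothetical lookup from lower-cased variant spellings to one canonical label.
COUNTRY_MAP = {"ph": "Philippines", "phil": "Philippines", "philippines": "Philippines"}


def standardize_and_fix_types(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Step 3: trim and lower-case before the lookup so "PH" and " ph " hit the same key;
    # values with no entry in the map are kept unchanged.
    key = out["country"].str.strip().str.lower()
    out["country"] = key.map(COUNTRY_MAP).fillna(out["country"])
    # Step 4: convert text to numbers; unparseable entries become NaN for later review.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Step 4: parse ISO-formatted date strings; impossible dates become NaT.
    out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d", errors="coerce")
    return out


# Made-up sample data.
raw = pd.DataFrame({
    "country": ["PH", " Philippines", "PHIL", "Japan"],
    "amount": ["1200", "850.5", "n/a", "300"],
    "signup": ["2024-01-05", "2024-02-30", "2024-03-10", "2024-04-01"],
})
clean = standardize_and_fix_types(raw)
print(clean.dtypes)
```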
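For step 5, one common way to flag candidate outliers is Tukey's fences: values more than 1.5 × IQR below the first quartile or above the third quartile. The notes do not prescribe a method, so treat this as one option among several. The sketch uses made-up figures and flags values instead of deleting them, so each one can be investigated first.

```python
import pandas as pd


def flag_iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Made-up salary figures; the last one could be a typo (an extra zero) or a real value.
salaries = pd.Series([42_000, 45_000, 47_000, 50_000, 52_000, 990_000])
mask = flag_iqr_outliers(salaries)
# Flag rather than drop, so a person can decide whether to remove or keep each value.
print(salaries[mask])
```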
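Steps 6 and 7 can be combined: validate first, then convert everything that passed into one unit. This sketch assumes made-up columns, a plausible age range of 0 to 120, and a hypothetical fixed `RATES_TO_USD` table; a real conversion would use dated exchange rates.

```python
import pandas as pd

# Made-up, fixed conversion rates for illustration only; real exchange rates change daily.
RATES_TO_USD = {"USD": 1.0, "PHP": 0.0175, "EUR": 1.10}


def validate_and_normalize(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    # Step 6: logical checks (age within a plausible range, currency we know how to convert).
    valid = df["age"].between(0, 120) & df["currency"].isin(list(RATES_TO_USD))
    good, rejected = df[valid].copy(), df[~valid]
    # Step 7: express every amount in one unit (USD) so rows are comparable.
    good["amount_usd"] = good["amount"] * good["currency"].map(RATES_TO_USD)
    return good, rejected


# Made-up sample data.
orders = pd.DataFrame({
    "age": [34, -2, 51, 28],
    "currency": ["PHP", "USD", "EUR", "GBP"],
    "amount": [1000.0, 20.0, 50.0, 10.0],
})
good, rejected = validate_and_normalize(orders)
print(good)
print(rejected)
```

Returning the rejected rows, instead of silently discarding them, keeps a record of what failed validation.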

🔹 Tools & Methods Used


● Spreadsheets (Excel, Google Sheets) – basic cleaning.

● Programming:

○ Python: pandas, NumPy.

○ R: dplyr, tidyr.

● OpenRefine – a standalone open-source tool for exploring and cleaning messy data.

● Databases: SQL queries for filtering and updating.
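The database approach can be tried with Python's built-in `sqlite3` module, used here only so the example runs without a database server. The table and data are made up. The `UPDATE` and `SELECT` statements are standard SQL, while the duplicate-removal query relies on SQLite's implicit `rowid` column and would need adapting for other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "PH", 25),
        ("Ben", "Philippines", 31),
        ("Ben", "Philippines", 31),  # duplicate record
        ("Cara", "PHIL", -4),        # fails the "age cannot be negative" rule
        ("Dan", "Japan", 40),
    ],
)

# Updating: map variant spellings to one canonical value.
conn.execute(
    "UPDATE customers SET country = 'Philippines' "
    "WHERE UPPER(TRIM(country)) IN ('PH', 'PHIL')"
)

# Removing duplicates: keep the row with the lowest rowid in each identical group.
conn.execute(
    "DELETE FROM customers WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM customers GROUP BY name, country, age)"
)
conn.commit()

# Filtering: list rows that fail a logical check so they can be reviewed.
bad = conn.execute("SELECT name, age FROM customers WHERE age < 0").fetchall()
print(bad)
```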

🔹 Benefits of Data Cleaning


● Improves the accuracy of analysis.

● Saves time and cost in decision-making.

● Leads to better predictions and insights.

● Ensures data quality and trustworthiness.
