Data Collection and Preparation

Uploaded by

cadizharleyn
Copyright © All Rights Reserved
Data Collection and Preparation
PEARLY ANN A. ESCALAMBRE, MIT
OBJECTIVES
• Importance of Data Collection and Preparation
• Basic Data Quality Assessment
• Ethical Considerations in Data Collection
• Importing Data from Various Sources
• Data Cleaning and Preprocessing in Excel
• Handling Missing Data and Outliers
Importance of Data Collection and Preparation
Data collection is the systematic gathering of information for analysis.
Why It Matters:
• Informs decision-making processes.
• Enhances the accuracy of insights derived from data.
• Supports strategic planning and operational efficiency.
Basic Data Quality Assessment
What is Data Quality Assessment (DQA)?
Evaluates data accuracy, completeness, reliability, and validity.
Is it quality data?
Key Components:
• Accuracy: How well data reflects real-world scenarios.
• Completeness: Whether all necessary data is present.
• Consistency: Uniformity across datasets.
• Validity: Adherence to defined rules and formats.
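As a rough illustration of how two of these components might be measured, the sketch below computes completeness and validity for one field. The sample records, the `EMAIL_RULE` pattern, and the function names are all hypothetical and are not part of the original slides. Accuracy is left out because it requires comparison against a trusted reference.

```python
import re

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "ana@example.com", "age": 29},
    {"id": 2, "email": "not-an-email", "age": 41},
    {"id": 3, "email": None, "age": 35},
    {"id": 4, "email": "ben@example.com", "age": None},
]

# Assumed validity rule: a very loose "something@something.something" check.
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def completeness(rows, field):
    """Share of rows in which `field` is present (not None or empty)."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def validity(rows, field, pattern):
    """Share of non-missing values in `field` that match `pattern`."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(str(v))) / len(values)

email_completeness = completeness(records, "email")
email_validity = validity(records, "email", EMAIL_RULE)
print(f"email completeness: {email_completeness:.2f}")
print(f"email validity: {email_validity:.2f}")
```

Here three of the four records have an email, and two of those three match the assumed rule.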
Ethical Considerations in Data Collection
• Informed Consent: Participants should understand how their data will be used.
• Privacy Protection: Safeguarding personal information is crucial.
• Data Ownership: Clarifying who owns the data collected and how it can be used.
Do not share personal information on the internet, to avoid identity theft.
Importing Data from Various Sources
Common Sources:
• CSV files
• Excel spreadsheets
• Databases (SQL, NoSQL)
Steps to Import:
• Identify the source format.
• Use appropriate tools or software (e.g., Excel, Python) to import data.
Microsoft Word (.doc)
Microsoft PowerPoint (.ppt / .pptx)
Microsoft Excel (.xls)
Import Data: CSV File Example
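Since the slides name Python as one possible import tool, the following is a minimal sketch of reading a CSV source and a SQL source using only the standard library. The CSV text, the `scores` table, and the column names are made up for illustration. An in-memory SQLite database stands in for a real database server.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real file would be opened with
# open("some_file.csv", newline="") instead of io.StringIO.
csv_text = "name,score\nAna,90\nBen,85\n"

# csv.DictReader uses the first row as the column names.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# An in-memory SQLite database stands in for a real SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("Ana", 90), ("Ben", 85)])
db_rows = conn.execute("SELECT name, score FROM scores ORDER BY name").fetchall()
conn.close()

print(csv_rows)
print(db_rows)
```

Note that values read from CSV arrive as text (the score is `"90"`, not `90`), so numeric columns usually need converting before analysis, whereas the database rows keep their declared types.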
Data Cleaning and Preprocessing in Excel
What is Data Cleaning?
The process of correcting or removing inaccurate records from a dataset.
Steps in Excel:
• Remove duplicates using the "Remove Duplicates" feature.
• Use filters to identify and correct errors.
• Standardize formats for consistency.
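The same cleaning steps can be sketched outside Excel. The Python example below, which uses hypothetical rows and helper names, standardizes text formats first and then removes duplicates, because rows that differ only in spacing or capitalization are not detected as duplicates until they are standardized.

```python
# Hypothetical rows with inconsistent spacing and capitalization.
rows = [
    {"name": " ana cruz ", "city": "CEBU"},
    {"name": "Ana Cruz", "city": "cebu"},
    {"name": "Ben Reyes", "city": "Davao"},
]

def standardize(row):
    """Collapse extra spaces and apply title case to every text value."""
    return {key: " ".join(value.split()).title() for key, value in row.items()}

def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

cleaned = remove_duplicates([standardize(r) for r in rows])
print(cleaned)
```

Title case is only one possible standard; the right choice depends on the column (for example, codes are often kept in upper case).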
Remove Duplicates
Duplicate values are highlighted in red; then delete them.
In some other Excel versions, this highlighting option is found under Conditional Formatting.
Handling Missing Data and Outliers
Missing Data Strategies:
• Imputation (filling in missing values).
• Deleting rows/columns with excessive missing values.
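Both strategies can be illustrated with a short Python sketch. The table, the 50% threshold for "excessive" missing values, and the choice of mean imputation are assumptions made for this example; the slides do not prescribe a specific threshold or imputation method.

```python
from statistics import mean

# Hypothetical table; None marks a missing value.
rows = [
    {"age": 20, "income": 300, "score": 7},
    {"age": None, "income": 500, "score": 9},
    {"age": 40, "income": None, "score": 8},
    {"age": None, "income": None, "score": None},
]

def drop_sparse_rows(rows, max_missing_share=0.5):
    """Delete rows whose share of missing values exceeds the threshold."""
    kept = []
    for row in rows:
        missing = sum(1 for v in row.values() if v is None)
        if missing / len(row) <= max_missing_share:
            kept.append(row)
    return kept

def impute_mean(rows, field):
    """Fill missing values in `field` with the mean of the observed values.

    Assumes at least one observed value exists in `field`.
    """
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

kept = drop_sparse_rows(rows)        # the all-missing last row is deleted
imputed = impute_mean(kept, "age")   # the missing age becomes the mean of 20 and 40
print(imputed)
```

Deleting sparse rows before imputing keeps nearly empty records from being filled in with invented values.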
Outlier Treatment:
• Identify outliers using statistical methods (e.g., Z-scores).
• Decide whether to remove or adjust outliers based on context.
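A Z-score expresses how many standard deviations a value lies from the mean, $z = (x - \bar{x}) / s$. The sketch below flags values whose absolute Z-score exceeds a cutoff. The data and the cutoff of 2.0 are assumptions for illustration; cutoffs of 2 or 3 are both common, and in very small samples a single extreme value cannot reach a Z-score of 3.

```python
from statistics import mean, stdev

# Hypothetical measurements; 95 lies far from the rest.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

def z_scores(data):
    """Return (value - mean) / sample standard deviation for each value."""
    mu = mean(data)
    sigma = stdev(data)
    return [(x - mu) / sigma for x in data]

def find_outliers(data, threshold=2.0):
    """Return the values whose absolute Z-score exceeds `threshold`."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

outliers = find_outliers(values)
print(outliers)
```

Flagging is only the first step: whether a flagged value is removed, adjusted, or kept still depends on context, such as whether it is a data-entry error or a genuine extreme observation.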
How to Find Outliers in Excel
END