Data is a gathered body of facts.
Importance of data
• Decision making
• Problem Solving
• Personalization
• Innovation
• Accountability and Transparency
A data source is the location from which the data being used comes.
Internal data is data that is stored within the organization, e.g. internal documents, in-house call
centers, website logs.
External data is data that is not collected by the organization, e.g. social media, official websites,
publicly available data.
Primary data is data that has been collected from the original source, e.g. surveys, questionnaires,
interviews, observation.
Advantages                        Disadvantages
Data interpretation is better     High cost
Targeted issues are addressed     Time consuming
Recency of data                   Inaccurate feedback
Efficient spending                More sources required
Secondary data is data that has already been collected by, and is available from, other sources, e.g. books,
newspapers, websites, journals, weblogs.
Advantages                        Disadvantages
Inexpensive                       Some are expensive
Easily accessible                 Takes time to analyze
Immediately available             Incomplete information
Primary data is gathered through surveys, experiments, or direct observation, while secondary data is drawn
from diverse sources such as documents, electronically stored information, census records, and data mining.
Qualitative (categorical) data: objects being studied are grouped into categories based on some
qualitative trait, e.g. gender, ethnicity, political preference.
Quantitative (measurement) data: objects being studied are measured or counted on some
quantitative trait. The resulting data are a set of numbers.
Data types
• Continuous data: data that can take any value within a range. Height, weight, temperature
and length are all examples of continuous data. Some continuous data will change over time.
• Discrete data: data that can only take certain separate values, e.g. the number of students in a
class.
• Ordinal data: a kind of qualitative data that groups variables into ordered
categories.
• Nominal data: a type of data that is used to label variables without providing any quantitative
value.
• Open data: data that is freely available for anyone to use.
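As a rough illustration of how these data types behave differently, the sketch below uses invented survey answers (all names and values are hypothetical). Ordinal values can be sorted by their category order, nominal values can only be counted, and an average is meaningful for the numeric types.

```python
from collections import Counter

# Hypothetical survey answers, invented for illustration.
shirt_sizes = ["small", "large", "medium", "small"]   # ordinal: categories have an order
eye_colors = ["brown", "blue", "brown", "green"]      # nominal: labels only, no order
siblings = [0, 2, 1, 3]                               # discrete: whole-number counts
heights_cm = [162.5, 180.1, 171.0, 158.3]             # continuous: any value in a range

# Ordinal data can be ranked using the order of its categories.
SIZE_ORDER = ["small", "medium", "large"]
sizes_sorted = sorted(shirt_sizes, key=SIZE_ORDER.index)

# Nominal data has no order, so counting each label is the natural summary.
color_counts = Counter(eye_colors)

# Continuous (and discrete) data are numeric, so an average makes sense.
mean_height = sum(heights_cm) / len(heights_cm)
```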
Discrete growth: change happens at specific intervals.
Continuous growth: change happens at every instant.
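One common way to make the difference concrete is with the standard compounding formulas, P(1 + r)^n for growth applied once per period and P·e^(rt) for growth applied at every instant. The notes do not give these formulas, so the sketch below is an assumption about what is meant, and the function names and sample numbers are hypothetical.

```python
import math

def discrete_growth(principal, rate, periods):
    """Value after growth is applied once per period: P * (1 + r) ** n."""
    return principal * (1 + rate) ** periods

def continuous_growth(principal, rate, time):
    """Value after growth is applied at every instant: P * e ** (r * t)."""
    return principal * math.exp(rate * time)

if __name__ == "__main__":
    # Same starting value, rate, and duration for both models.
    print(round(discrete_growth(100, 0.05, 10), 2))    # about 162.89
    print(round(continuous_growth(100, 0.05, 10), 2))  # about 164.87
```

With the same rate and duration, continuous growth ends slightly higher because the increase itself starts growing immediately rather than waiting for the end of each period.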
Importance of data
1. Improve people’s lives
2. Make informed decisions
3. Stop molehills from turning into mountains
4. Get the result you want
5. Find solutions to problems
Data collection is the process of collecting information on specific variables.
Qualitative method of data collection: the collection of non-numerical data, such as words, opinions,
and behavior, that is interpreted rather than counted. E.g. interviews, focus groups, observations, oral histories.
Quantitative method of data collection: the collection of numerical data that can be counted and analyzed
using statistical methods. E.g. questionnaires, surveys, documents, and records.
Data collection tools
1. Interviews
2. Questionnaires and surveys
3. Observations
4. Focus groups
Uses of data collection
1. Improving your understanding of your audience
2. Identifying areas for improvement or expansions
3. Predicting future patterns
4. Better personalizing your content and messaging
Characteristics that define quality data
1. Accuracy and precision
2. Legitimacy and validity
3. Reliability and consistency
4. Timeliness and relevance
5. Completeness and comprehensiveness
6. Availability and accessibility
7. Granularity and uniqueness
Data cleaning is the process of ensuring your data is correct, consistent, and usable. It means fixing any data
that is inaccurate, incomplete, incorrectly formatted, or duplicated.
Importance of data cleaning
1. Data cleaning is a vital step to ensure that the answers you generate are accurate
2. Data cleaning is important to ensure that we achieve high data integrity
Steps in cleaning data
1. Removal of unwanted observations
2. Fixing structural errors
3. Managing unwanted outliers
4. Handling missing data
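The four steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the records, the field names "city" and "age", the plausible age range, and the choice to fill gaps with the median are all hypothetical, and a real project may handle each step differently.

```python
from statistics import median

def clean_records(records, min_age=0, max_age=120):
    """Apply the four cleaning steps to a list of {"city", "age"} dicts."""
    # Step 1: remove unwanted observations (here: exact duplicate rows).
    seen = set()
    unique = []
    for row in records:
        key = (row["city"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified

    # Step 2: fix structural errors (stray spaces, inconsistent capitalization).
    for row in unique:
        row["city"] = row["city"].strip().title()

    # Step 3: manage outliers (implausible ages are treated as missing).
    for row in unique:
        age = row["age"]
        if age is not None and not (min_age <= age <= max_age):
            row["age"] = None

    # Step 4: handle missing data (fill gaps with the median of the known ages).
    known = [row["age"] for row in unique if row["age"] is not None]
    fill = median(known) if known else None
    for row in unique:
        if row["age"] is None:
            row["age"] = fill
    return unique

# Hypothetical raw data, invented for illustration.
raw = [
    {"city": " london", "age": 34},
    {"city": " london", "age": 34},   # exact duplicate
    {"city": "PARIS", "age": 290},    # implausible age (outlier)
    {"city": "Rome ", "age": None},   # missing age
    {"city": "Oslo", "age": 40},
]
cleaned = clean_records(raw)
```

Treating outliers as missing values and then filling them is one design choice among several; dropping the affected rows or keeping the outliers after checking them are equally valid, depending on the analysis.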
Data cleaning techniques
1. Get rid of extra spaces
2. Select and treat all blank cells
3. Convert numbers stored as text into numbers
4. Remove duplicates
5. Highlight errors
6. Change text to lower/upper/proper case
7. Spell check
8. Delete all formatting
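These techniques are usually described for spreadsheets, but several of them have simple code equivalents. The sketch below covers removing extra spaces, converting numbers stored as text, changing case, and removing duplicates. The helper names and sample values are hypothetical, and treating a comma as a thousands separator is an assumption that does not hold in every locale.

```python
def normalize_spaces(text):
    """Drop leading/trailing spaces and collapse internal runs to one space."""
    return " ".join(text.split())

def to_number(text):
    """Convert a number stored as text to int or float; return None if not numeric."""
    # Assumption: commas are thousands separators, as in "1,250".
    cleaned = text.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        try:
            return float(cleaned)
        except ValueError:
            return None

def remove_duplicates(values):
    """Remove repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))

# Hypothetical messy entries, invented for illustration.
names = ["  alice   SMITH ", "Bob Jones", "bob  jones"]

# Get rid of extra spaces, then change text to proper case.
tidy = [normalize_spaces(n).title() for n in names]

# Duplicates only become visible once spacing and case are consistent.
unique_names = remove_duplicates(tidy)
```

Note the order: spacing and case are fixed before duplicates are removed, because "Bob Jones" and "bob  jones" would otherwise be counted as different entries.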