Math: Data Analysis
2nd Semester
Chapter 1: Nature of Probability and Statistics
Introduction to Statistics
Statistics: The science of collecting, organizing, summarizing,
analyzing, and drawing conclusions from data.
Uses of Statistics:
o Analyzing survey results.
o Decision-making in scientific research.
o Supporting work in fields such as sports, public health,
education, and business.
Importance of Studying Statistics:
1. Understand statistical studies in various fields.
2. Conduct research, design experiments, and analyze data.
3. Make informed decisions as consumers and citizens.
1. Descriptive and Inferential Statistics
Descriptive Statistics
Summarizes and describes the main features of a dataset.
Focuses on presenting data without making inferences.
Example:
o A company records test scores of 100 students.
o Descriptive statistics include:
Mean (Average): 75
Median: 78
Standard Deviation: 10
25% of students scored below 70
o These describe the dataset but do not generalize beyond it.
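As a minimal sketch of how such summaries are computed, the snippet below uses Python's standard-library statistics module. The scores are made-up values for illustration only, not the 100 scores from the example, and the variable names are assumptions.

```python
import statistics

# Hypothetical test scores (illustrative only; not the data set from the example)
scores = [60, 65, 70, 75, 78, 80, 82, 85, 90, 95]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
stdev_score = statistics.stdev(scores)    # sample standard deviation

# Proportion of scores strictly below 70
share_below_70 = sum(s < 70 for s in scores) / len(scores)

print(f"Mean: {mean_score}")
print(f"Median: {median_score}")
print(f"Standard deviation: {stdev_score:.2f}")
print(f"Share below 70: {share_below_70:.0%}")
```

Each number describes only the ten scores in the list; nothing is claimed about any larger group.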
Inferential Statistics
Makes inferences from samples to populations.
Includes hypothesis testing, determining relationships, and making
predictions.
Example:
o Using test scores from 100 students to estimate the average
math score of all students in a district.
o Constructing a confidence interval (e.g., being 95% confident
that the district’s average score is between 73 and 77).
o Hypothesis testing to check if the average score significantly
differs from 80.
o Generalizing results from a sample to a larger group.
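The interval quoted above can be reproduced approximately from the figures in the descriptive example (mean 75, standard deviation 10, 100 students). The sketch below assumes a normal-based (z) interval with critical value 1.96; the notes do not say which method was used, so treat this as one plausible calculation rather than the definitive one.

```python
import math

n = 100           # sample size from the example
sample_mean = 75  # sample mean from the descriptive example
sample_sd = 10    # sample standard deviation from the descriptive example
z_crit = 1.96     # assumption: two-sided 95% critical value of the standard normal

# Standard error of the mean and margin of error
standard_error = sample_sd / math.sqrt(n)
margin = z_crit * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

# z statistic for the hypothesis that the district average equals 80
z_stat = (sample_mean - 80) / standard_error

print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z statistic against 80: {z_stat:.2f}")
```

Under these assumptions the interval is about 73.04 to 76.96, which rounds to the 73 to 77 stated above, and the sample mean lies five standard errors below 80.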
2. Variables and Types of Data
Definition of Key Terms
Variable: A characteristic or attribute that can assume different
values.
Data: The values (measurements or observations) that variables
assume.
Random Variable: A variable whose values are determined by
chance.
Data Set: A collection of data values.
Datum: A single value in a dataset.
Types of Variables
1. Quantitative Variables
o Numerical and can be ordered or ranked.
o Types:
Discrete Variables: Countable values (e.g., number of
students, books on a shelf).
Continuous Variables: Measurable values, including
fractions and decimals (e.g., height, weight,
temperature).
2. Qualitative Variables
o Non-numerical; categorized based on characteristics.
o Examples: Gender, religion, nationality, eye color.
Levels of Measurement
1. Nominal Level:
o Data categorized without ranking.
o Examples: Gender, blood type, nationality, car brands.
2. Ordinal Level:
o Data can be ranked, but differences between ranks are not
equal.
o Examples: Educational levels, customer satisfaction ratings,
job performance ratings.
3. Interval Level:
o Data is ordered with equal intervals, but no true zero.
o Examples: Temperature (°C, °F), IQ scores, calendar years.
4. Ratio Level:
o Data has equal intervals and a true zero.
o Examples: Height, weight, income, distance traveled.
3. Data Collection and Sampling Techniques
Methods of Data Collection
1. Telephone Survey:
o Less costly than personal interviews; respondents may answer
more candidly because there is no face-to-face contact.
o Drawback: some people may not answer or may have unlisted
numbers.
2. Mailed Questionnaire:
o Covers a larger geographic area, maintains anonymity.
o Low response rates, potential for misunderstood questions.
3. Personal Interview:
o In-depth responses, but costly and may introduce interviewer
bias.
Sampling Techniques
1. Random Sampling:
o Every member of the population has an equal chance of
selection.
o Uses chance methods or random number generators.
2. Systematic Sampling:
o Selecting every kth member from the population.
o Example: Choosing every 40th subject from a population of
2000, starting at a randomly chosen subject among the first 40,
gives a sample of 50.
3. Stratified Sampling:
o Population divided into subgroups (strata), then random
samples taken from each.
o Example: Selecting students from different grade levels.
4. Cluster Sampling:
o Population divided into groups (clusters), then entire clusters
are selected randomly.
o Example: Selecting all residents from randomly chosen
apartment buildings.
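The four techniques above can be sketched with Python's standard-library random module. The population, grade-level strata, and apartment-building clusters below are hypothetical stand-ins invented for illustration, and the fixed seed is there only so the run is reproducible.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is reproducible
population = list(range(1, 2001))  # hypothetical population of 2000 numbered subjects

# 1. Random sampling: every subject has the same chance of selection
random_sample = rng.sample(population, 50)

# 2. Systematic sampling: random start among the first k, then every k-th subject
k = 40
start = rng.randrange(k)
systematic_sample = population[start::k]

# 3. Stratified sampling: a random sample drawn from each stratum (hypothetical grade levels)
strata = {grade: [f"G{grade}-S{i}" for i in range(1, 101)] for grade in (9, 10, 11, 12)}
stratified_sample = {grade: rng.sample(members, 5) for grade, members in strata.items()}

# 4. Cluster sampling: choose whole clusters at random and keep everyone in them
clusters = {b: [f"B{b}-R{i}" for i in range(1, 21)] for b in range(1, 11)}  # hypothetical buildings
chosen_buildings = rng.sample(sorted(clusters), 3)
cluster_sample = [resident for b in chosen_buildings for resident in clusters[b]]

print(len(random_sample), len(systematic_sample), len(cluster_sample))
```

Note the difference between the last two: stratified sampling takes a few members from every subgroup, while cluster sampling takes every member from a few subgroups.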
Other Sampling Methods
Convenience Sampling: Uses readily available subjects (e.g., mall
surveys).
Volunteer Sampling: Respondents choose to participate (e.g., call-
in surveys).
Sampling and Non-Sampling Errors
1. Sampling Error:
o Occurs when a sample does not perfectly represent the
population.
o Example: A survey estimates 55% support for a candidate, but
actual votes show 52%.
2. Non-Sampling Error:
o Errors in data collection, recording, or survey design.
o Example: Biased survey questions, misreported income, or
excluding certain groups.
4. Experimental Design
Types of Studies
1. Observational Study:
o Researcher observes subjects without manipulation.
o Types:
Cross-Sectional Study: Data collected at one time.
Retrospective Study: Uses past records.
Longitudinal Study: Data collected over time.
o Example: Studying smoking habits and lung cancer over 10
years.
2. Experimental Study:
o Researcher manipulates variables to observe effects.
o Example: Studying effects of exercise on stress levels with
control and experimental groups.
Advantages & Disadvantages
Observational Study:
o ✅ Natural setting, ethical for sensitive topics.
o ❌ Cannot establish cause and effect; can be expensive and
time-consuming.
Experimental Study:
o ✅ Researcher controls variables and can determine cause and
effect.
o ❌ May not apply to real-world settings, subjects may change
behavior due to observation (Hawthorne effect).
5. Uses and Misuses of Statistics
Common Misuses of Statistics
1. Suspect Samples:
o Small, convenience, or volunteer samples may not represent
the population.
2. Ambiguous Averages:
o Different measures of average (mean, median, mode) can be
used to mislead.
3. Changing the Subject:
o Using different values to represent the same data to influence
perception.
o Example: "3% budget increase" vs. "$6 million increase."
4. Detached Statistics:
o Claims without comparisons.
o Example: "This drug is 50% more effective" (compared to
what?).
5. Implied Connections:
o Suggests relationships without proof.
o Example: "Eating fish may lower cholesterol."
6. Misleading Graphs:
o Improperly drawn graphs can exaggerate trends.
7. Faulty Survey Questions:
o Poorly worded questions influence responses.
o Example: "Do you support raising taxes to build a stadium?"
vs. "Should a new stadium be built?"
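To illustrate the "Ambiguous Averages" item in the list above, the sketch below computes three different "averages" for one small data set. The salary figures are hypothetical, chosen so that a single large value pulls the mean well above the median and mode.

```python
import statistics

# Hypothetical annual salaries at a small firm (illustrative only)
salaries = [30_000, 30_000, 32_000, 35_000, 38_000, 40_000, 250_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the one large salary
median_salary = statistics.median(salaries)  # middle value of the sorted list
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"Mean:   {mean_salary}")
print(f"Median: {median_salary}")
print(f"Mode:   {mode_salary}")
```

All three values (65000, 35000, and 30000) could be reported as "the average salary", which is why a claim about an average should state which measure was used.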
Summary
Statistics helps collect, analyze, and interpret data.
Two branches: Descriptive (summarizing data) and Inferential
(drawing conclusions).
Data types: Quantitative (discrete, continuous) and Qualitative.
Measurement levels: Nominal, Ordinal, Interval, Ratio.
Sampling techniques: Random, Systematic, Stratified, Cluster.
Research methods: Observational vs. Experimental studies.
Beware of statistical misuse through biased sampling, misleading
graphs, and ambiguous claims.
Chapter 2: Frequency Distribution and Graphs
1. Frequency Distribution
Data collected in original form is called raw data.
Nominal- or ordinal-level data that can be placed in categories are
organized in categorical frequency distributions.
A frequency distribution is a table that organizes data into classes
or groups, showing the number of observations in each category.
Purpose:
o Organizes large data sets into a manageable format.
o Makes patterns and trends easier to identify.
o Helps in data analysis and interpretation.
Components of a Frequency Distribution Table
1. Class Intervals – The specific ranges into which the data is
grouped.
2. Frequency (f) – The number of values falling within a class interval.
3. Class Boundaries – The actual limits of a class interval that
separate it from adjacent intervals.
4. Class Width – The difference between the lower boundaries of
consecutive classes.
5. Midpoint – The central value of a class interval, calculated as:
\[ \text{Midpoint} = \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2} \]
6. Relative Frequency – The proportion of total observations that fall
into a class, given by:
\[ \text{Relative Frequency} = \frac{\text{Class Frequency}}{\text{Total Frequency}} \]
7. Cumulative Frequency – The running total of frequencies,
indicating how many values are below a particular class boundary.
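The components listed above can be computed directly from the class limits and frequencies. The sketch below uses a hypothetical grouped data set; the class boundaries are taken as limit ± 0.5, which assumes the data were recorded to the nearest whole number.

```python
# Hypothetical grouped data: (lower class limit, upper class limit, frequency)
classes = [(10, 19, 3), (20, 29, 7), (30, 39, 10), (40, 49, 5)]

total = sum(f for _, _, f in classes)

# Class width: difference between lower limits of consecutive classes
class_width = classes[1][0] - classes[0][0]

cumulative = 0
rows = []
for lower, upper, f in classes:
    boundaries = (lower - 0.5, upper + 0.5)  # assumes whole-number data
    midpoint = (lower + upper) / 2
    relative = f / total
    cumulative += f                          # running total of frequencies
    rows.append((lower, upper, boundaries, midpoint, f, relative, cumulative))

print("Class   Midpoint  f   Rel. f  Cum. f")
for lower, upper, _, midpoint, f, relative, cum in rows:
    print(f"{lower}-{upper}   {midpoint:<8}  {f:<3} {relative:<7.2f} {cum}")
```

Two quick checks follow from the definitions: the relative frequencies sum to 1, and the last cumulative frequency equals the total number of observations.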
2. Types of Frequency Distributions
1. Ungrouped Frequency Distribution:
o A simple count of occurrences for each distinct data value.
o Used for small data sets or discrete data.
2. Grouped Frequency Distribution:
o Data is grouped into class intervals for better readability.
o Used for larger data sets or continuous data.
3. Relative Frequency Distribution:
o Expresses class frequencies as proportions or percentages.
o Useful for comparing different data sets of varying sizes.
4. Cumulative Frequency Distribution:
o Displays cumulative totals, showing how frequencies build up
over intervals.
3. Constructing a Grouped Frequency Distribution
The following data represent the record high temperatures for each of the
50 states. Construct a grouped frequency distribution for the data using 7
classes.
112 100 127 120 134 118 105 110 109 112
110 118 117
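The construction procedure can be sketched in Python. Only the 13 values printed above are used, since the rest of the 50-state data set is not reproduced here, so the counts show the procedure rather than the full answer. The width rule (range divided by the number of classes, rounded up) is a common textbook convention and is an assumption about the intended method.

```python
import math

# Only the 13 values shown above; the full 50-state data set is not reproduced here
temps = [112, 100, 127, 120, 134, 118, 105, 110, 109, 112, 110, 118, 117]
num_classes = 7

data_range = max(temps) - min(temps)         # 134 - 100 = 34
width = math.ceil(data_range / num_classes)  # 34 / 7 is about 4.86, rounded up to 5

# Class limits start at the smallest value and advance by the class width
table = []
for i in range(num_classes):
    lower = min(temps) + i * width
    upper = lower + width - 1
    freq = sum(lower <= t <= upper for t in temps)
    table.append((lower, upper, freq))

for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With these 13 values the classes run from 100-104 to 130-134. Every value falls into exactly one class, so the frequencies sum to 13.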
4. Graphical Representations of Frequency Distributions
Graphs help in visualizing frequency distributions effectively.
Types of Graphs:
A. Histogram
A graph that displays a frequency distribution using contiguous
vertical bars.
Characteristics:
o No gaps between bars (unlike bar charts).
o X-axis represents class intervals.
o Y-axis represents frequencies.
o Heights of bars indicate the frequency of observations.
B. Frequency Polygon
A line graph connecting midpoints of class intervals.
Steps to construct:
1. Mark the class midpoints on the X-axis.
2. Plot each class frequency (Y-axis) above its midpoint.
3. Connect the points with straight lines.
C. Ogive (Cumulative Frequency Graph)
A line graph that represents cumulative frequency.
Types:
1. Less than Ogive – Shows cumulative frequency below each
class boundary.
2. Greater than Ogive – Shows cumulative frequency above
each class boundary.
Usage:
o Helps determine median and percentiles.
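As a rough sketch of how a "less than" ogive yields the median, the code below builds the ogive points from hypothetical grouped data and interpolates linearly between them. The data, the helper name value_at, and the use of linear interpolation (reading the value off straight line segments) are assumptions for illustration.

```python
# Hypothetical grouped data: (upper class boundary, frequency)
first_lower_boundary = 9.5
classes = [(19.5, 3), (29.5, 7), (39.5, 10), (49.5, 5)]

# "Less than" ogive points: (class boundary, cumulative frequency)
ogive = [(first_lower_boundary, 0)]
running = 0
for upper_boundary, f in classes:
    running += f
    ogive.append((upper_boundary, running))

def value_at(cum_target, points):
    """Interpolate along the ogive to find the value at a given cumulative frequency."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= cum_target <= c1 and c1 > c0:
            return x0 + (cum_target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of the ogive")

n = ogive[-1][1]
median_estimate = value_at(n / 2, ogive)  # value below which half the data fall

print(f"Estimated median: {median_estimate}")
```

The same helper gives other percentiles; for example, value_at(0.25 * n, ogive) estimates the 25th percentile.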
D. Bar Graph
Represents categorical data using rectangular bars.
Bars can be vertical or horizontal.
Spacing: Bars are separated (unlike histograms).
E. Pie Chart
A circular graph divided into proportional segments.
Represents relative frequency of categories.
F. Dot Plot
Uses dots to show individual data points.
Suitable for small data sets.
G. Stem-and-Leaf Plot
Represents data while maintaining original values.
Splits numbers into stems (leading digits) and leaves (trailing
digits).
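A stem-and-leaf plot for two-digit values can be produced in a few lines. The sketch below uses hypothetical data and treats the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

# Hypothetical two-digit data values (illustrative only)
data = [23, 25, 31, 32, 32, 36, 40, 44, 45, 51]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, ones digit is the leaf
    stems[stem].append(leaf)

# One line per stem, leaves in ascending order
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Because every leaf is kept, the original values can be read back from the plot (for example, stem 3 with leaves 1 2 2 6 stands for 31, 32, 32, 36).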
5. Choosing the Right Graph
DATA TYPE                   RECOMMENDED GRAPH
QUANTITATIVE (CONTINUOUS)   Histogram, Frequency Polygon, Ogive
CATEGORICAL                 Bar Graph, Pie Chart
SMALL DATA SETS             Dot Plot, Stem-and-Leaf Plot
6. Summary
Frequency distributions help in organizing data systematically.
Graphs make it easier to visualize and interpret data.
Choice of graph depends on the type of data being analyzed.