ESSENTIALS OF
STATISTICS
INTRODUCTION TO STATISTICS
Topics
Statistical Thinking
Types of Data
Critical Thinking
Collecting Sample Data
Definitions
• Polls, studies, surveys and other data collecting tools collect data from a small part of a larger group so that we can learn something about the larger group.
• Goal of statistics: Learn about a large group by examining data from some of its members.
• Data:
collections of observations, such as measurements, genders, or survey responses.
• Statistics is the science of:
  • planning studies and experiments;
  • obtaining data; and
  • organizing, summarizing, presenting, analyzing, and interpreting those data and then drawing conclusions based on them.
• A population:
the complete collection of all measurements or data that are being considered. (Typically, a population is the complete collection of data that we would like to make inferences about.)
• A census:
the collection of data from every member of the population.
• A sample:
a subcollection of members selected from a population.
Statistical Thinking
Key concepts when conducting statistical analysis:
Context of the data
• What do the values represent?
• Where did the data come from?
• Why were they collected?
• An understanding of the context will directly affect the statistical procedure used.
Source of the data
• Is the source objective?
• Is the source biased?
• Is there some incentive to distort or spin results to support some self-serving position?
• Is there something to gain or lose by distorting results?
• Be vigilant and skeptical of studies from sources that may be biased.
Sampling method
• Does the method chosen greatly influence the validity of the conclusion?
• Voluntary response (or self-selected) samples often have bias (those with special interest are more likely to participate). These samples’ results are not necessarily valid.
• Other methods are more likely to produce good results.
Conclusions
• Make statements that are clear to those without an understanding of statistics and its terminology.
• Avoid making statements not justified by the statistical analysis.
Practical implications
• State practical implications of the results.
• There may exist some statistical significance yet there may be NO practical significance.
• Common sense might suggest that the finding does not make enough of a difference to justify its use or to be practical.
Statistical Significance
• Consider the likelihood of getting the results by
chance.
• If results could easily occur by chance, then they
are not statistically significant.
• If the likelihood of getting the results by chance is very small,
then the results are statistically significant.
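To make "the likelihood of getting the results by chance" concrete, here is a minimal sketch assuming a hypothetical experiment of 10 flips of a coin that is assumed to be fair, so the chance model is a binomial distribution with p = 0.5. The function name prob_at_least and the 0.05 cutoff are illustrative assumptions; 0.05 is a common convention, not something stated in these slides.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of k or more successes in n independent trials, each with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

ALPHA = 0.05  # assumed cutoff for "very small"; a common convention, not from the slides

# Hypothetical results from 10 flips of a coin assumed to be fair.
for heads in (5, 9):
    chance = prob_at_least(heads, 10)
    verdict = "statistically significant" if chance < ALPHA else "could easily occur by chance"
    print(f"{heads}+ heads in 10 flips: P = {chance:.4f} -> {verdict}")
# 5+ heads in 10 flips: P = 0.6230 -> could easily occur by chance
# 9+ heads in 10 flips: P = 0.0107 -> statistically significant
```

Getting 5 or more heads happens about 62% of the time with a fair coin, so it says nothing unusual; 9 or more heads happens only about 1% of the time, so chance is an unlikely explanation.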
Population Parameter and Sample
Statistic
• A parameter is a numerical
measurement describing some
characteristic of a population.
• A statistic is a numerical
measurement describing some
characteristic of a sample.
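The distinction can be sketched with a tiny hypothetical population (the ages below are made up for illustration): the population mean is a parameter because it uses every member, while the mean of a randomly chosen subcollection is a statistic.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club.
population = [23, 35, 41, 29, 52, 38, 27, 46, 31, 44]

# Parameter: describes the whole population (computed from every member).
mu = statistics.mean(population)

# Statistic: describes a sample (computed from a subcollection only).
random.seed(1)  # fixed seed so the run is reproducible
sample = random.sample(population, k=4)  # simple random sample without replacement
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu}")
print(f"statistic (sample mean of {sample}) = {x_bar}")
```

Rerunning with a different seed changes the statistic but never the parameter, which is why a statistic is used to estimate a parameter rather than treated as equal to it.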
Data
• Quantitative (or numerical) data consist of numbers representing counts or measurements.
E.g. weight, age, IQ
• Categorical (or qualitative or attribute) data consist of names or labels (not numbers that represent counts or measurements).
E.g. The genders (male/female), shirt numbers on professional athletes’ uniforms
• Quantitative data: discrete and continuous types.
Discrete data: result when the number of possible values is either a finite number or a ‘countable’ number (i.e. the number of possible values is 0, 1, 2, 3, . . .)
E.g. The number of eggs that a hen lays
Continuous (numerical) data: result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps
E.g. The amount of milk that a cow produces; e.g. 2.343 gallons per day
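One practical consequence of the data type is which summaries make sense. The sketch below uses hypothetical values (only 2.343 comes from the slide) and stores shirt numbers as strings on purpose, so that averaging them would fail instead of silently producing a meaningless number.

```python
from collections import Counter
import statistics

# Hypothetical records for three variables.
eggs_per_week = [5, 6, 4, 6, 7]                    # quantitative, discrete (counts: 0, 1, 2, ...)
milk_gallons = [2.343, 2.910, 3.105, 2.778, 2.5]   # quantitative, continuous (measurements)
shirt_numbers = ["31", "28", "31", "56", "25"]     # categorical: labels, even though they look like numbers

# Averages make sense for quantitative data...
print(statistics.mean(eggs_per_week), statistics.mean(milk_gallons))
# ...but for categorical data, counting how often each label occurs is more meaningful.
print(Counter(shirt_numbers).most_common(1))
```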
Level of Measurement
Data and Time (Structure of Data)
• Cross-sectional data:
• data collected at the same or
approximately the same point in time.
• E.g. The data of five variables for the 60
World Trade Organization nations at the
same point in time.
• Time series data:
• data collected over several time periods or
at different points in time.
• E.g. The average price per gallon of
conventional regular gasoline between
2009 and 2014.
• Panel data or longitudinal data:
• data that contains observations about
different cross sections across time.
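The three structures are related: panel data can be sliced into a cross section (fix the time) or a time series (fix the unit). The sketch below assumes a hypothetical panel keyed by (nation, year); the nation labels and values are placeholders, not real data.

```python
# Hypothetical panel: one value per (nation, year); the numbers are placeholders, not real data.
panel = {
    ("A", 2012): 1.0, ("A", 2013): 1.2, ("A", 2014): 1.1,
    ("B", 2012): 2.0, ("B", 2013): 2.3, ("B", 2014): 2.1,
    ("C", 2012): 0.5, ("C", 2013): 0.6, ("C", 2014): 0.7,
}

def cross_section(data, year):
    """All nations observed at the same point in time."""
    return {nation: v for (nation, y), v in data.items() if y == year}

def time_series(data, nation):
    """One nation observed over several time periods, as (year, value) pairs in time order."""
    return sorted((y, v) for (n, y), v in data.items() if n == nation)

print(cross_section(panel, 2013))  # {'A': 1.2, 'B': 2.3, 'C': 0.6}
print(time_series(panel, "B"))     # [(2012, 2.0), (2013, 2.3), (2014, 2.1)]
```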
Statistics Deceptions
• Success in this course typically
requires more common sense than
mathematical expertise.
• Think carefully about the context,
source, method, conclusions and
practical implications.
• Misuse of statistics typically arises
in two ways:
• evil intent on the part of
dishonest persons;
• unintentional errors on the part
of people who don’t know any
better.
• We should learn to distinguish
between statistical conclusions that
are likely to be valid and those that
are seriously flawed.
Misuse of Statistics
• Misuse of Graphs
Statistical data are often presented in visual form, that is, in graphs. Data represented graphically must be interpreted carefully; beware of misleading graphs.
• Bad Samples: Voluntary response sample
A sample in which the respondents themselves decide whether to be included. In this case, valid conclusions can be made only about the specific group of people who agree to participate and not about the population.
• Correlation and Causality
Concluding that one variable causes the other variable when in fact the variables are only linked. Two variables, such as smoking and pulse rate, may be linked; this relationship is called correlation. We cannot conclude that one causes the other: correlation does not imply causality.
• Small Samples
Conclusions should not be based on samples that are far too small. E.g. Basing a school suspension rate on a sample of only three students.
• Percentages
Misleading or unclear percentages are sometimes used. For example, if you take 100% of a quantity, you take it all. If you have improved 100%, then are you perfect?! 110% of an effort does not make sense.
• Loaded Questions
If survey questions are not worded carefully, the results of a study can be misleading. Survey questions can be “loaded” or intentionally worded to elicit a desired response. E.g. “Too little money is being spent on ‘welfare’” versus “too little money is being spent on ‘assistance to the poor.’” Results: 19% versus 63%.
Misuse of Statistics
• Order of Questions
Questions are unintentionally loaded by such factors as the order of the items being considered.
E.g. “Would you say traffic contributes more or less to air pollution than industry?”
Results: traffic - 45%; industry - 27%
When the order is reversed, results: industry - 57%; traffic - 24%
• Nonresponse
Occurs when someone either refuses to respond to a survey question or is unavailable. People who refuse to talk to pollsters have a view of the world around them that is markedly different from that of those who will let poll-takers into their homes.
• Missing Data
Can dramatically affect results. Subjects may drop out for reasons unrelated to the study. People with low incomes are less likely to report their incomes. The US Census suffers from missing people (who tend to be homeless or low income).
• Self-Interest Study
Some parties with interests to promote will sponsor studies. Be wary of a survey in which the sponsor can enjoy monetary gain from the results. When assessing the validity of a study, always consider whether the sponsor might influence the results.
• Precise Numbers
Because a figure is precise, many people incorrectly assume that it is also accurate. A precise number can be an estimate, and it should be referred to that way. E.g. The total number of citizens.
• Deliberate Distortions
Some studies or surveys are distorted on purpose. The distortion can occur within the context of the data, the source of the data, the sampling method, or the conclusions.
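The effect of missing income reports can be sketched with a simulation. Everything here is hypothetical: incomes are drawn uniformly between 10 and 100 (thousands), and low earners are assumed to report 30% of the time versus 90% for everyone else.

```python
import random
import statistics

random.seed(42)
# Hypothetical incomes (in thousands) for 10,000 people.
incomes = [random.uniform(10, 100) for _ in range(10_000)]

def reports_income(income):
    """Assumed response behavior: low earners report less often."""
    p_report = 0.3 if income < 40 else 0.9
    return random.random() < p_report

reported = [inc for inc in incomes if reports_income(inc)]

print(f"true mean     = {statistics.mean(incomes):.1f}")
print(f"reported mean = {statistics.mean(reported):.1f}")  # biased upward
```

Collecting more responses would not fix this: the gap comes from who is missing, not from how many.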
Collecting Samples
• If sample data are not collected in an appropriate way, the data may
be so completely useless that no amount of statistical torturing can
salvage them.
• Method used to collect sample data influences the quality of the
statistical analysis.
• Of particular importance is the simple random sample.
• Random Sample
Members from the population are selected in such a way that each
individual member in the population has an equal chance of being
selected
• Probability Sample
Selecting members from a population in such a way that each
member of the population has a known (but not necessarily the
same) chance of being selected
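The difference between the two definitions can be sketched by giving each member its own selection probability. This is an illustrative assumption: the population, the group labels, and the use of independent (Bernoulli) inclusion are hypothetical, and with this scheme the sample size itself varies from run to run.

```python
import random

random.seed(7)
# Hypothetical population: 80 members of group "A" and 20 of group "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]

# Random sample: every member has the same chance (here 0.1) of being selected.
equal_chance = {"A": 0.1, "B": 0.1}
# Probability sample: chances are known but differ (group B is over-sampled).
known_chance = {"A": 0.1, "B": 0.5}

def bernoulli_sample(members, chance):
    """Include each member independently with the known probability for its group."""
    return [m for m in members if random.random() < chance[m[0]]]

random_sample = bernoulli_sample(population, equal_chance)
probability_sample = bernoulli_sample(population, known_chance)
print(len(random_sample), len(probability_sample))
```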
Sampling Methods
Simple Random Sample
A sample of n subjects is selected so that every sample of the same size n has the same chance of being selected.
Systematic Sampling
Select some starting point and then select every kth element in the population.
Convenience Sampling
Use results that are easy to get.
Stratified Sampling
Subdivide the population into at least two different subgroups so that members within the same subgroup share the same characteristics, then draw a sample from each subgroup (or stratum).
Cluster Sampling
Divide the population area into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters.
Multistage Sampling
Collect data by using some combination of the basic sampling methods. In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.
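The random methods above can be sketched in a few lines each. The student IDs 1 to 36 arranged in 6 rows of 6 are a hypothetical population, and the function names are illustrative. Convenience sampling is left out because it involves no randomness (e.g. simply taking students[:6]).

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random starting point among the first k elements, then every kth element.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # Draw a simple random sample from each subgroup (stratum).
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters and keep every member of each chosen cluster.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def multistage_sample(clusters, n_clusters, n_within, rng):
    # Stage 1: randomly choose clusters; stage 2: simple random sample within each.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for name in chosen for m in rng.sample(clusters[name], n_within)]

rng = random.Random(2024)
students = list(range(1, 37))                                    # hypothetical IDs 1..36
rows = {r: students[(r - 1) * 6 : r * 6] for r in range(1, 7)}   # 6 rows of 6
print(simple_random_sample(students, 6, rng))
print(systematic_sample(students, 6, rng))
print(stratified_sample({"row1": rows[1], "row2": rows[2]}, 2, rng))
print(cluster_sample(rows, 1, rng))
print(multistage_sample(rows, 3, 2, rng))
```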
Cases
• Comment on each of the following calculations or conclusions.
a. The first Super Bowl attended by the author was Super Bowl XLVIII. On the first play of the
game, the Seattle defense scored on a safety. The defensive players wore jerseys numbered
31, 28, 41, 56, 25, 54, 69, 50, 91, 72, 29, and the average (mean) of those numbers is 49.6.
b. As this exercise is being written, it is 80°F at the author’s home and it is 40°F in Auckland,
New Zealand, so it is twice as warm at the author’s home as it is in Auckland, New Zealand.
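A quick numerical check of both items, as a sketch: part (a)'s arithmetic is correct but describes labels, and for part (b) the temperatures are converted to kelvin, a scale with a true zero, using the standard conversion K = (F − 32) × 5/9 + 273.15.

```python
import statistics

# Part (a): the arithmetic itself checks out...
jerseys = [31, 28, 41, 56, 25, 54, 69, 50, 91, 72, 29]
print(round(statistics.mean(jerseys), 1))  # 49.6
# ...but jersey numbers are labels (categorical data), so their mean describes nothing.

# Part (b): compare temperatures on a scale with a true zero (kelvin).
def fahrenheit_to_kelvin(f):
    return (f - 32) * 5 / 9 + 273.15

home, auckland = fahrenheit_to_kelvin(80), fahrenheit_to_kelvin(40)
print(round(home / auckland, 2))  # 1.08, not 2
```

Because 0°F is an arbitrary point rather than an absence of heat, ratios of Fahrenheit readings do not mean "twice as warm".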
• Determine whether each of the following is a simple random sample and a random sample.
a. A statistics class with 36 students is arranged so that there are 6 rows with 6 students in
each row, and the rows are numbered from 1 through 6. A die is rolled and a sample consists
of all students in the row corresponding to the outcome of the die.
b. For the same class described in part (a), the 36 student names are written on 36 individual
index cards. The cards are shuffled and six names are drawn from the top.
c. For the same class described in part (a), the six youngest students are selected.
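Counting the possible samples is one way to separate the two definitions. The sketch below compares methods (a) and (b) using exact fractions: each student's chance of selection, and how many different 6-student samples each method can produce.

```python
from fractions import Fraction
from math import comb

students, sample_size, rows = 36, 6, 6

# Method (a): roll a die, take a whole row -> only 6 possible samples.
samples_a = rows
chance_each_student_a = Fraction(1, rows)  # a student is chosen exactly when their row comes up

# Method (b): draw 6 of 36 shuffled cards -> every 6-student group is possible.
samples_b = comb(students, sample_size)
chance_each_student_b = Fraction(sample_size, students)

print(samples_a, chance_each_student_a)  # 6 1/6
print(samples_b, chance_each_student_b)  # 1947792 1/6
```

Both methods give every student the same 1/6 chance, but only method (b) makes every sample of size 6 equally likely.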
Observation vs Experiments
In an experiment:
• we apply some treatment and then proceed to observe
its effects on the individuals. (The individuals in
experiments are called experimental units, and they are
often called subjects when they are people.)
In an observational study:
• we observe and measure specific characteristics, but we
don’t attempt to modify the individuals being studied.
Types of Studies
• Cross-sectional study:
data are observed, measured, and collected at one
point in time, not over a period of time.
• Retrospective (or case-control) study:
data are collected from a past time period by going
back in time (through examination of records,
interviews, and so on).
• Prospective (or longitudinal or cohort) study:
data are collected in the future from groups that
share common factors (such groups are called
cohorts).
Experiments
• In an experiment, confounding occurs
when we can see some effect, but we
can’t identify the specific factor that
caused it.
• A bad experimental design creates a
problem:
we don’t know whether the effects are due to a
confounding factor or to the treatment.
• Designs:
• Completely Randomized Experimental
Design
• Randomized Block Design
• Matched Pairs Design
• Rigorously Controlled Design
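Two of the listed designs can be sketched as random assignment procedures. In a completely randomized design, subjects are assigned to groups purely at random; in a randomized block design, similar subjects are first grouped into blocks and then randomly assigned within each block. The subject IDs and the choice of gender as the blocking variable are hypothetical.

```python
import random

def completely_randomized(subjects, rng):
    """Randomly split all subjects into a treatment group and a control (placebo) group."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def randomized_block(blocks, rng):
    """Within each block of similar subjects, randomly assign half to each group."""
    groups = {"treatment": [], "control": []}
    for members in blocks.values():
        assigned = completely_randomized(members, rng)
        groups["treatment"] += assigned["treatment"]
        groups["control"] += assigned["control"]
    return groups

rng = random.Random(5)
subjects = [f"S{i}" for i in range(1, 9)]               # hypothetical subject IDs
blocks = {"women": subjects[:4], "men": subjects[4:]}   # assumed blocking variable
print(completely_randomized(subjects, rng))
print(randomized_block(blocks, rng))
```

Blocking guarantees both groups contain the same number of women and men, so that factor cannot be confounded with the treatment.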
Designs of Experiments
Error
• No matter how well you plan and execute the sample collection
process, there is likely to be some error in the results.
• Sampling error:
the difference between a sample result and the true population
result; such an error results from chance sample fluctuations.
• Non-sampling error:
sample data incorrectly collected, recorded, or analyzed (such as
by selecting a biased sample, using a defective instrument, or
copying the data incorrectly).
• A nonrandom sampling error:
the result of using a sampling method that is not random, such
as using a convenience sample or a voluntary response sample.
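Sampling error can be sketched by repeatedly drawing simple random samples from a hypothetical population and measuring how far each sample mean lands from the population mean. This only models chance fluctuation; a nonsampling error such as a biased sample would not shrink this way as the sample grows.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical population of 5,000 measurements.
population = [rng.gauss(50, 10) for _ in range(5_000)]
mu = statistics.mean(population)  # true population result

def mean_sampling_error(n, reps=200):
    """Average |sample mean - population mean| over repeated simple random samples of size n."""
    errors = [abs(statistics.mean(rng.sample(population, n)) - mu) for _ in range(reps)]
    return statistics.mean(errors)

for n in (4, 40, 400):
    print(n, round(mean_sampling_error(n), 2))  # shrinks as n grows but does not vanish
```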
Important Characteristics of Data
• Center:
A representative value that indicates
where the middle of the data set is located
(mean, median, mode).
• Variation:
A measure of the amount that the data
values vary (range, variance, standard
deviation).
• Distribution:
The nature or shape of the spread of the
data over the range of values (such as
bell-shaped, uniform, or skewed).
• Outliers:
Sample values that lie very far away from
the vast majority of the other sample
values.
• Time:
Any change in the characteristics of the
data over time.
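Center, variation, and outliers can be computed with Python's standard statistics module (quantiles needs Python 3.8+). The data values are hypothetical, and the 1.5 × IQR outlier rule is one common rule of thumb assumed here, not a rule given in these slides.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]  # hypothetical sample values

# Center
print(statistics.mean(data), statistics.median(data), statistics.mode(data))

# Variation
data_range = max(data) - min(data)
var, sd = statistics.variance(data), statistics.stdev(data)  # sample variance and standard deviation

# Outliers: flag values more than 1.5 IQR beyond the quartiles (assumed rule of thumb).
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(data_range, round(sd, 2), outliers)
```

The single value 30 pulls the mean (7) well above the median (5), which is why the median is often preferred as the center when outliers are present.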
Resources
• Additional materials
https://www.triolastats.com/es13
https://www.triolastats.com/es13-excel
https://www.triolastats.com/es13-supplements
https://www.triolastats.com/es13-excel-videos
• Split and merge pdf
https://pdfsam.org/
• Convert PDF to Word
https://online2pdf.com/
• Translate documents
https://www.onlinedoctranslator.com/id/translationform