Basic Ideas of Statistics
Prepared by: Mohammed Nadir Ali
Supervised by: Mr. Serhan
Overview
• Measurement
• Robustness
• Populations and Samples
• Quantitative vs. Categorical Variables
Measurement
• Measurement is the fundamental activity that we do in
statistics. A statistic is a measurement of a particular attribute
that we want to know something about. Any time that we
want to collect information about something and use statistics
to draw conclusions, we MUST have some way to measure it.
Statistics is a way of quantifying and drawing conclusions
about the world that we live in. We transform the world into
measurable quantities and then we use the practice of
statistics to draw conclusions. In this section, we will examine
the best ways to conduct measurement and how to make our
measurements, and therefore our conclusions, as good as
possible.
Robustness
• Fundamentally, there is no such thing as a fact in science. There are only conclusions
and hypotheses that have yet to be disproven. What we are always attempting to do
in science is to disprove what we think in order to give us new insights.
• However, we can’t operate from the position that we know nothing, so we must have
some conclusions that are thought to be very well tested and that we treat as if they
were facts.
• We say that a conclusion that has been well tested and has withstood repeated
attempts at disproof is robust.
• We also say that a finding is robust if it can be applied to a lot of different areas and
under a lot of different circumstances. For example, the most robust finding in
educational statistics is that socioeconomic status (SES), often measured by the income
of a family or individual, factors significantly into educational achievement. This holds
true regardless of whether you are looking at Math, English, Science, or any other subject.
• Ideally, what we want to create from the practice of science are robust conclusions:
conclusions that can be treated as facts.
An example of how robust findings work.
• For over 100 years, the conclusion that nothing can travel faster than light has
been thought to be extremely robust. So robust that a large portion of
quantum mechanics and all of relativity have been founded on this single
principle. However, in 2011 scientists on the OPERA experiment, which measured
neutrinos sent from CERN, the particle physics laboratory near Geneva,
Switzerland, reported neutrinos that appeared to travel faster than the speed
of light. Had this been true, and many more tests would have been needed to
establish the robustness of the claim, much of physics might have had to be
seriously reconsidered. But the scientists themselves were skeptical of their
findings. Einstein’s conclusion that the speed of light is an absolute limit
was so robust that any claim disproving it would be met with skepticism. So,
when they presented their findings, they welcomed the opportunity for
scrutiny. It turned out that experimental error, despite very carefully
conducted checks, was the cause of the unusual findings, and physics
continues to hold that nothing moves faster than light.
Samples and Populations
• Population- The group of people or objects about
which we want to know something
– A parameter- a numerical value that describes a
characteristic of the population
• Sample- A subset of the population, generally of
a given size, that we gather information about in
order to draw conclusions about the population
– Statistic- a numerical value that describes a
characteristic of a sample
Samples and Populations
• A statistic is a description of a sample
• A parameter is a description of a population
• We will rarely have the parameter for a
population. We can always gather statistics
from a sample.
Samples vs. Populations
• In statistical reasoning, we are primarily drawing
conclusions using statistics
• We want these conclusions to tell us something
about the population
• We use statistics in order to help us draw conclusions
about parameters and the characteristics of populations
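The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is 10,000 made-up exam scores generated by the code itself, and the names `population`, `sample`, `population_mean`, and `sample_mean` are hypothetical. In real work the whole population is usually not available, which is why we rely on the sample.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up exam scores (not real data).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameter: a numerical value that describes the whole population.
# We can compute it here only because we invented the population.
population_mean = statistics.mean(population)

# Sample: a subset of the population of a given size (here, 100).
sample = rng.sample(population, 100)

# Statistic: a numerical value that describes the sample.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will generally be close but not identical. The statistic is what we can actually gather; the parameter is what we want to draw conclusions about.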
Categorical data
• Categorical data consists of names or labels that are
not numbers representing counts or measurements.
– Eg. Colors, Gender, Race, Level of Parental Education
• Discrete data- When the number of possible values
in the sample space can be counted.
– Eg. 14 women in the class, Number of children to a
particular couple, number of students who passed a
course.
• Nominal level of measurement- The data cannot be
ordered from highest to lowest.
Quantitative Data
• Quantitative data consists of numbers representing
counts or measurements
– Eg. Weight at the beginning of September, Weight at the end
of August, Body Mass Index, velocity, dollars
• Discrete Data- There can be an infinite number of
data points, but each value is separate and countable
• Continuous Data- The data can take any value within a
range, so there are infinitely many possible values
– Eg. 1, ½, 3.05, 3,005,141, 6.744
Quantitative Data
• Ordinal level of measurement- We can arrange the
data in some order, but differences between
individual values are meaningless
– Rankings of colleges, grades
• Interval level of measurement- We can arrange the
data in some order, and differences between values
have meaning. However, there is no natural zero.
– A value of zero does not mean that nothing is present.
– Eg. Years in which something occurred, or body
temperature measured in degrees Celsius or Fahrenheit
Quantitative Data
• The ratio level of measurement- Like the interval
level, except that there is such a thing as having
nothing present. There is a natural zero
starting point, so ratios between values have meaning.
– Eg. Distances, prices, weights, Body Mass Index
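One way to see why the natural zero matters is to change units and check whether "twice as much" survives. The sketch below is an illustration only; the helper names `celsius_to_fahrenheit` and `km_to_miles` and the chosen values are assumptions made for the example.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def km_to_miles(km):
    """Convert kilometers to miles (approximate conversion factor)."""
    return km * 0.621371


# Interval level: temperature in Celsius has no natural zero.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio level: distance has a natural zero (no distance at all).
d_short, d_long = 10.0, 20.0
ratio_km = d_long / d_short
ratio_miles = km_to_miles(d_long) / km_to_miles(d_short)

print("temperature ratio in C:", ratio_celsius)
print("temperature ratio in F:", ratio_fahrenheit)
print("distance ratio in km:  ", ratio_km)
print("distance ratio in mi:  ", ratio_miles)
```

The temperature ratio changes when the unit changes, so saying 20 degrees is "twice as hot" as 10 degrees is not meaningful. The distance ratio stays the same in either unit, which is what the ratio level of measurement promises.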
Quantitative vs. Categorical Data
• Categorical Data is usually described using the nominal
level of measurement because we are often not able to order
it in meaningful ways. The exception is when the names for the
data are themselves an order, like first place, second place,
third place, etc. In that case the data is ordinal.
• Quantitative Data is generally not categorized using the
nominal level of measurement, but it can be ordered and so
can use either the ordinal, the interval, or the ratio level of
measurement.
– To decide which level a set of data actually uses, ask whether differences
between values have meaning and whether there is a natural zero.
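Ordering categorical labels can be sketched as follows. The labels and their order are hypothetical, chosen only to illustrate the idea; the key point is that the order has to be supplied by us, because the names themselves carry no numeric value.

```python
# Hypothetical ordered categories, from lowest to highest (an assumption
# made for this example, not data from a real survey).
education_order = ["high school", "bachelor's", "master's"]

# Map each label to its position in the order.
rank = {label: position for position, label in enumerate(education_order)}

# Made-up responses from four people.
responses = ["master's", "high school", "bachelor's", "high school"]

# Sorting the raw labels only gives alphabetical order, which is meaningless here.
alphabetical = sorted(responses)

# Sorting by rank respects the order of the categories.
by_level = sorted(responses, key=rank.__getitem__)

print("alphabetical:", alphabetical)
print("by level:    ", by_level)
```

Even after ordering, the gap between two neighboring labels has no defined size, so this data supports ranking but not subtraction or averaging.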
This is important!
• We are going to separate everything in this
class into these two categories: categorical
and quantitative variables.
• Proportions are the statistics that we derive
from categorical variables.
• Means, standard deviations, variances, and
medians are the kinds of statistics that we
derive from quantitative variables
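A minimal sketch of both kinds of statistics, using Python's standard library. The data below is made up for illustration, and the variable names are hypothetical. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1), which fits the idea of describing a sample rather than a population.

```python
from collections import Counter
import statistics

# Hypothetical sample of six students (made-up values).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]  # categorical
weights_kg = [61.0, 72.5, 58.0, 80.0, 66.5, 70.0]                  # quantitative

# Categorical variable: count each category, then divide by the sample size.
counts = Counter(eye_colors)
proportions = {color: n / len(eye_colors) for color, n in counts.items()}

# Quantitative variable: mean, median, sample variance, sample standard deviation.
mean_weight = statistics.mean(weights_kg)
median_weight = statistics.median(weights_kg)
variance_weight = statistics.variance(weights_kg)
sd_weight = statistics.stdev(weights_kg)

print("proportions:", proportions)
print("mean:", mean_weight)
print("median:", median_weight)
print("variance:", round(variance_weight, 2))
print("standard deviation:", round(sd_weight, 2))
```

Asking for the mean of `eye_colors` would make no sense, while a proportion of `weights_kg` only makes sense after the weights have been grouped into categories.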
Review
• Measurement
• Robustness
• Populations and Samples
• Quantitative vs. Categorical Variables