Stats_Lecture-4
Statistics Lecture -4
Dr Sumeyye BAKIM
2024
Outline
• Inferential Statistics
✓ The Normal Curve
✓ Sample and Population
✓ Probability
The Normal Curve
Graphs showing the distributions of some of the variables engineers work with are unimodal, roughly
symmetric, bell-shaped curves. These bell-shaped, smooth histograms represent a precise and
significant mathematical distribution called the normal distribution, or more simply, the normal
curve. The normal curve is a mathematical (or theoretical) distribution. Researchers often compare
the actual distributions of the variables they study (i.e., the distributions they find in their studies)
with the normal curve. They do not expect the distributions of their variables to match the normal
curve perfectly (as the normal curve is a theoretical distribution); however, they check whether the
distributions of their variables are approximately normal.
Figure: A normal curve.
For example, let’s consider the number of different letters a specific person can correctly remember
across various tests (with different random letters each time). In some tests, the number of
remembered letters is high, in others it’s low, and in most cases, it may fall somewhere in between.
In other words, the number of different letters a person can remember across various tests likely
follows a normal curve.
Let’s assume that the person has a basic ability to remember seven letters in such a memory task.
However, the actual number remembered in any given test will be influenced by various factors—
such as the noise in the room, their current mood, a combination of random letters resembling a
familiar name, etc.
These various effects cause the person to remember more than seven letters in some tests and
fewer than seven in others. However, the specific combination of these effects that emerges in any
given test is essentially random; therefore, in most tests, the positive and negative effects should
cancel each other out. It is very unlikely that all the random negative effects will combine in one
test while none of the random positive effects occur. Thus, in general, the person remembers an
average number of letters where all opposing effects cancel each other out. Very high or very low
scores are much less common.
This creates a distribution where the scores are concentrated around the midpoint, with fewer
scores at the extreme points: a normal distribution.
Central Limit Theorem
The Central Limit Theorem states that, regardless of the shape of the population
distribution (provided its variance is finite), the sampling distribution of the sample mean
approaches a normal distribution as the sample size increases.
Figure: With each increase in sample size beyond n = 30, the sampling distribution becomes
more peaked.
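The theorem can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the skewed exponential population, the sample sizes, and the function name sample_means are chosen for this example and are not part of the lecture.

```python
import random
import statistics

def sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a skewed (exponential) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

# Assumed sample sizes; the population mean of expovariate(1.0) is 1.
for n in (2, 30, 100):
    means = sample_means(n)
    # The centre stays near 1 while the spread of the sample means shrinks.
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

As n grows, the spread of the sample means shrinks, which is the "more peaked" shape described in the caption.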
The Normal Curve and the Percentage of
Scores Between the Mean
The shape of the normal curve is standard. Therefore, a known percentage of scores falls above or below a certain score. For example, exactly 50% of the scores in a normal curve are below the mean, as half of the scores in any symmetric distribution are below the mean. More interestingly, as shown in the figure, approximately 34% of the scores are always between the mean and 1 standard deviation from the mean.
Reminder: The Z-score is a unit of measurement given in terms of standard deviation (it indicates how many standard deviations a score is from the mean).
Ex 1: IQ Test
In commonly used intelligence tests, the average IQ is set at 100, with a standard deviation of 15,
and IQ scores are represented using a normal curve.
Since the percentages of scores around the mean of a normal curve are known, 34% of IQ scores fall
between the mean (100) and one standard deviation above the mean (115).
Similarly, because the normal curve is symmetric, we conclude that another 34% of IQ scores fall
between 100 and 85. Thus, 68% (34% + 34%) of scores are within the range of 85 to 115.
Additionally, 14% of scores fall between 115 (one standard deviation above the mean) and 130 (two
standard deviations above the mean).
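The rounded figures of 34% and 14% can be checked against the exact normal curve. This is a minimal sketch using Python's statistics.NormalDist; the helper name pct_between is an assumption made for this example.

```python
from statistics import NormalDist

# IQ scores as described above: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def pct_between(dist, low, high):
    """Percentage of scores between two raw scores under a normal curve."""
    return 100 * (dist.cdf(high) - dist.cdf(low))

print(round(pct_between(iq, 100, 115), 2))  # roughly 34%
print(round(pct_between(iq, 85, 115), 2))   # roughly 68%
print(round(pct_between(iq, 115, 130), 2))  # roughly 14%
```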
Ex 2: Equipment Selection
Imagine you need to select equipment for a project. Assume that you want to choose equipment
with a typical level of precision, avoiding the extremes (not the highest or lowest precision). The
precision capabilities of the equipment follow a normal distribution, and we are looking for the
middle 2/3 group that represents average performance:
2/3 ≈ 66.7% ≈ 68%
In this case, the equipment should be selected from the range between 1 standard deviation above and
1 standard deviation below the mean (34% + 34% = 68% — the desired percentage).
Remember that 1 standard deviation above the mean is represented by Z = +1, and 1 standard
deviation below the mean is represented by Z = -1.
(We discussed converting raw scores to Z-scores and transforming Z-scores back to raw scores in the
previous class.)
The Normal Curve Table and Z-Scores
The normal curve table shows the percentages of scores associated with the normal curve. It usually includes
the percentages of scores between the mean and various numbers of standard deviations above the mean, and
the percentages of scores more positive than various numbers of standard deviations above the mean.
The percentages 50%, 34%, and 14% are important practical guidelines when working with a group of scores that follow a normal distribution. However, in many research and applied situations, scientists need more precise information.
Since the normal curve is an exact mathematical curve, you can determine the exact percentage of scores between any two points on the normal curve.
If you know the mean and standard deviation for the technical
skills assessment scores of engineering students, you can
calculate Alex’s actual raw score on the test by converting his
Z-score of 0.52 to a raw score using the formula:
X = (Z × SD) + Mean
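The conversion can be sketched in a few lines. The mean of 70 and standard deviation of 10 below are hypothetical values chosen only for illustration (the slide does not give them), and the function names are likewise assumptions.

```python
def z_to_raw(z, mean, sd):
    """Convert a Z-score to a raw score: X = (Z * SD) + Mean."""
    return z * sd + mean

def raw_to_z(x, mean, sd):
    """Convert a raw score to a Z-score: Z = (X - Mean) / SD."""
    return (x - mean) / sd

# Hypothetical assessment statistics, NOT taken from the lecture.
MEAN, SD = 70, 10

alex_raw = z_to_raw(0.52, MEAN, SD)
print(round(alex_raw, 2))                      # raw score for Z = 0.52
print(round(raw_to_z(alex_raw, MEAN, SD), 2))  # converts back to 0.52
```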
Steps Required to Find the Percentage of Scores Above or Below a Specific Raw Score or Z-Score Using
the Normal Curve Table:
1. If you are starting from a raw score, first convert it to a Z-score: Z = (X − M) / SD.
2. Draw a normal curve graph. Determine where the Z-score falls on this curve (if the Z-score is positive, it is above the mean; if the Z-score is negative, it is below the mean) and shade the area for which you want to find the percentage.
3. Estimate the percentage of the shaded area approximately, using the 50%, 34%, and 14% guidelines.
4. Using the normal curve table, find the exact percentage corresponding to the Z-score.
Example 4
Let's assume that in an IQ test, the average IQ is 100, and the standard deviation is 15.
A person's IQ score is 125. What percentage of people have an IQ score higher than 125?
1. Z-score: Z = (125 − 100) / 15 = +1.67
2. Draw the normal curve and shade the area above Z = +1.67.
3. If the shaded area started exactly at Z=+1, the area above it would be expressed as 16%. If it started exactly
at Z=+2, we would say 2%. In this case, the value should be somewhere between 2% and 16%.
4. According to the table, the tail percentage corresponding to a Z-score of +1.67 is 4.75%. This means that 4.75%
of the people who took the test have an IQ higher than 125.
A person's IQ score is 95. What percentage of people have an IQ score higher than 95?
1. Z-score: Z = (95 − 100) / 15 = −0.33
2. Draw the normal curve and shade the area above Z = −0.33.
3. If the shaded area started exactly at Z = 0, the area above it would be expressed as 50%. If it started exactly at Z = −1, we would say 84%. In this case, the value should be somewhere between 50% and 84%.
4. According to the table, the percentage between the mean and a Z-score of −0.33 is 12.93%. Thus, the percentage from a Z-score of −0.33 up to the mean is 12.93%, and above the mean there is another 50%, making a total of 62.93%.
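Both parts of Example 4 can be reproduced without a printed table. The sketch below rounds the Z-score to two decimals to mimic a typical normal curve table; the function name pct_above and the rounding option are assumptions made for this illustration.

```python
from statistics import NormalDist

def pct_above(raw, mean, sd, round_z=True):
    """Return (Z-score, percentage of scores above `raw`) for a normal curve."""
    z = (raw - mean) / sd
    if round_z:
        z = round(z, 2)  # mimic a table that lists Z to two decimal places
    return z, 100 * (1 - NormalDist().cdf(z))

z_125, above_125 = pct_above(125, mean=100, sd=15)
z_95, above_95 = pct_above(95, mean=100, sd=15)
print(z_125, round(above_125, 2))  # Z = 1.67, about 4.75% above
print(z_95, round(above_95, 2))    # Z = -0.33, about 62.93% above
```

Without the rounding step (round_z=False) the percentages shift slightly, because the exact Z-scores are 1.666... and -0.333...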
Steps Required to Find a Z-Score or Raw Score from a Percentage Using the Normal Curve Table:
1. Draw a normal curve graph. Shade the area using approximate percentages of 50%, 34%, or 14%, based on your percentage.
2. Make a rough estimate of the Z-score based on where the shaded area ends.
3. Use the normal curve table to find the exact Z-score that corresponds to the percentage.
4. Check that the exact Z-score is within the range of your rough estimate from step 2.
5. If you are looking for a raw score, convert from the Z-score using the formula:
X = (Z × SD) + M
Example 6
Let's assume that in an IQ test, the average IQ is 100, and the standard deviation is 15.
What IQ score does a person need to score within the top 5%?
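1. Since a score of 130 corresponds to +2 standard deviations, which is in the top 2%, and a score of 115 corresponds to +1 standard deviation, which is in the top 16%, the top 5% starts between these two scores.
2. Rough estimate: the Z-score should be between +1 and +2.
3. According to the table, the Z-score that leaves 5% in the tail is +1.64.
4. The estimate was that the Z-score should be between +1 and +2: +1.64 is within this range.
5. Raw score: X = Z × SD + M = (1.64)(15) + 100 = 124.6. (To be in the top 5%, a person needs to score at least 124.6 on the IQ test.)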
What IQ score does a person need to score within the top 55%?
1. Since a score of 100 corresponds to 0 standard deviations, which is the top 50%, the top 55% starts slightly below the mean.
2. The percentage between the mean and the Z-score should be 5%. According to the table, the closest value is 5.17%. Thus, the corresponding Z-score is 0.13. Since it is on the left side of the mean, it is -0.13.
3. The estimate was that the Z-score should be between -1 and 0: -0.13 is within this range.
4. Raw score: X = Z × SD + M = (−0.13)(15) + 100 = 98.05. (To be in the top 55%, a person needs to score at least 98.05 on the IQ test.)
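Going from a percentage back to a score is the inverse lookup. A minimal sketch, assuming Python's statistics.NormalDist.inv_cdf stands in for the printed table; the function name cutoff_for_top is hypothetical.

```python
from statistics import NormalDist

def cutoff_for_top(pct, mean=100, sd=15):
    """Return (Z-score, raw score) above which the top `pct` percent of scores fall."""
    z = NormalDist().inv_cdf(1 - pct / 100)  # exact Z instead of a table lookup
    return z, z * sd + mean

for pct in (5, 55):
    z, raw = cutoff_for_top(pct)
    print(pct, round(z, 3), round(raw, 2))
```

The results differ slightly from the table-based answers because the table only lists Z-scores to two decimal places.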
Sample and Population
Sample: The scores of the specific data being studied; generally accepted as representative of the scores in a larger population.
Population: The entire group of data that a researcher aims to make conclusions about in a study; the larger group about which conclusions are drawn based on the smaller groups (samples) examined.
(a) Populations and samples: The entire pot of beans is the population, and the spoonful is the
sample.
(b) The entire large circle represents the population, and the inner circle is the sample.
(c) The histogram represents the population, with the shaded scores indicating the sample.
In engineering studies, samples of data are often analyzed to make inferences about a larger group (population).
For example, a sample might consist of measurements from 50 steel beams produced in a factory to determine the overall quality.
In this case, the population would be all the steel beams produced by the factory, which is the group we aim to understand.
Figure labels: "All Steel Beams Produced by the Factory" (population) and "Steel Beams" (sample).
The entire purpose of research is generally to make generalizations or
predictions about events that you cannot directly access.
Methods of Sampling
Typically, the ideal method for selecting a sample to study is called random selection (which generally means each person in the population has an equal chance of being chosen). The researcher starts with a complete list of the population and selects a random portion to study.
An example of random selection would be putting each name on a ping-pong ball, placing all the balls in a large container, shaking it, and having a blindfolded person select as many as needed. (In practice, most researchers use a list of random numbers generated by a computer.)
Haphazard Selection:
This is a type of sampling where no systematic method is followed in the selection of participants. It does not guarantee representation of the entire population.
For example, imagine conducting a survey about your statistics professor by only asking friends sitting closest to you. This survey would be influenced by where students sit (which might indirectly relate to how much they like the professor or the class). Therefore, asking students who sit near you will lead to opinions more like yours, rather than a truly random sample.
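Computer-based random selection can be sketched as follows. The population list of 200 student identifiers, the sample size of 20, and the seed are all hypothetical and serve only to make the example reproducible.

```python
import random

# Hypothetical complete list of the population (200 students).
population = [f"student_{i:03d}" for i in range(1, 201)]

# A seeded generator makes the draw reproducible; in a real study the seed
# would normally not be fixed by hand.
rng = random.Random(42)

# Each student has an equal chance of being chosen, and nobody is chosen twice.
sample = rng.sample(population, k=20)
print(sample[:5])
```

This is the software equivalent of drawing ping-pong balls from a well-shaken container.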
Statistical Terminology for Sample and Population
The mean, variance, and standard deviation of a population are called population
parameters. A population parameter is generally unknown and can only be estimated
based on what you know from a sample taken from that population.
We don’t taste all the beans; we taste just a spoonful and say, “The beans are cooked,”
making an inference about the entire pot.
The mean, variance, and standard deviation calculated from scores obtained from a sample are
called sample statistics.
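The difference between parameters and statistics can be illustrated with simulated data. In the sketch below the population of 10,000 beam strength values (mean 500, standard deviation 25) is invented for the example; in real research the population values would usually be unknown.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: strength measurements of every beam produced.
population = [rng.gauss(500, 25) for _ in range(10_000)]

# Population parameters (normally unknown in practice).
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Sample statistics computed from 50 randomly selected beams.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

print(round(pop_mean, 1), round(pop_sd, 1))
print(round(sample_mean, 1), round(sample_sd, 1))
```

The sample statistics land near the population parameters without matching them exactly, which is why they are treated as estimates.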
PROBABILITY
Probability is very important in science, and particularly in inferential statistics: the methods scientists use
to move from the results of research studies to conclusions about theories or practical applications.
Consider the probability of getting heads when a coin is flipped. There is one successful outcome (heads) out of
two possible outcomes (heads or tails). This probability is 1/2 or 0.5.
For a die roll, the probability of rolling a 2 (or any specific face of the die) is 1/6 or approximately 0.17. This is
because there is only one successful outcome out of six possible outcomes.
The probability of rolling a number 3 or lower on a die is 3/6 or 0.5. Out of six possible outcomes, there are three
successful outcomes: 1, 2, or 3.
Now, let’s consider a slightly more complex example. Imagine a database containing 200 different algorithms, of
which 40 are optimized for real-time processing. If you were to randomly select an algorithm from this database,
the probability of selecting one that is optimized for real-time processing would be 40/200 or 0.20. This is
because there are 40 successful outcomes (selecting a real-time optimized algorithm) out of 200 possible
outcomes.
Calculating Probability
1. Determine the number of possible successful outcomes.
2. Determine the total number of possible outcomes.
3. Divide the number of possible successful outcomes by the total number of outcomes.
To convert a ratio to a percentage, multiply by 100. The percentage of 0.13 is: 13%.
To convert a percentage to a ratio, divide by 100. The ratio for 4% is: 0.04.
A ratio cannot be less than 0 or greater than 1. In percentage terms, it should be between 0% and 100%.
The probability of impossible events is 0. Events with a probability of 0 are called impossible events.
The probability of certain events is 1. Events with a probability of 1 are called certain events.
When an event has a low probability, such as 1% or 5%, it is called a low-probability event (but it is not
impossible).
Example:
1. When a die is rolled, the number of successful outcomes for getting a 3 or lower (1, 2, or 3) is: 3.
2. The total number of possible outcomes (1, 2, 3, 4, 5, or 6) is: 6.
3. The ratio of possible successful outcomes to the total number of outcomes: 3/6 = 0.5.
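The three steps above can be sketched as a small function. The name probability and the use of exact fractions are choices made for this illustration only.

```python
from fractions import Fraction

def probability(successful, total):
    """Steps 1-3: successful outcomes divided by all possible outcomes."""
    if total <= 0 or not 0 <= successful <= total:
        raise ValueError("need 0 <= successful <= total and total > 0")
    return Fraction(successful, total)

die = [1, 2, 3, 4, 5, 6]

# Rolling a 3 or lower: count the successful faces, then divide by all faces.
p_three_or_lower = probability(sum(1 for face in die if face <= 3), len(die))

# Selecting a real-time optimized algorithm: 40 out of 200.
p_realtime = probability(40, 200)

print(p_three_or_lower, float(p_three_or_lower))
print(p_realtime, f"{float(p_realtime):.0%}")  # ratio and percentage form
```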
Probability is represented by the letter p. For an event with a 50% chance, p = 0.5.
If you think of probability as a ratio of scores, you’ll see that probability aligns directly with
frequency distributions. According to the distribution shown below, 3 out of 50 components
have a lifespan of 9 or 10 hours. If you randomly select a component from these 50, the
probability of selecting one with a lifespan of 9 or 10 hours is calculated by dividing the number
of successful outcomes (3 components) by the total number of outcomes (50 components).
Thus, p = 3/50 = 0.06.
The normal distribution can be thought of as a probability distribution. With a normal curve,
the percentage of scores between any two Z-scores is known.
The probability of selecting a value between any two Z-scores is the same as the percentage of
scores between those two Z-scores.
As you know, in a normal curve, approximately 34% of scores lie between the mean and one
standard deviation above the mean. This means that the probability of a score falling between
the mean and a Z-score of +1 is p = 0.34.
Consider the IQ test examples again: in a normal curve, 95% of scores fall between Z-scores of
-1.96 and +1.96. This represents a high probability (p = 0.95). At the same time, in such a
distribution, the probability of selecting a score above +1.96 or below -1.96 is 0.05 (or 5%).
This is a very low probability and corresponds to the tails of the distribution graph.
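The link between "percentage of scores" and "probability of selecting a score" can be checked by simulation. The sketch below draws random Z-scores and counts how often they land in a range; the number of draws, the seed, and the function name share_within are assumptions made for this example.

```python
import random

def share_within(z_low, z_high, draws=100_000, seed=0):
    """Fraction of randomly drawn Z-scores that land between z_low and z_high."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws) if z_low <= rng.gauss(0, 1) <= z_high)
    return hits / draws

print(share_within(-1.96, 1.96))      # close to 0.95
print(share_within(0, 1))             # close to 0.34
print(1 - share_within(-1.96, 1.96))  # the two tails together, close to 0.05
```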
Suppose a component in a sample has a durability score of 4. However, you don’t know if this
component came from Supplier A or Supplier B. Let’s say that the durability scores of
components from Supplier A typically follow a normal distribution with a mean of 10 and a
standard deviation of 3. How likely is it that your sample component came from Supplier A?
Based on your knowledge of the normal curve, you know that in a normal distribution with a
mean of 10 and a standard deviation of 3, there are very few scores as low as 4. This suggests
that the component may not have come from Supplier A's population.
But what if the sample component had a durability score of 9? In this case, it would be more
likely that the component came from Supplier A’s population since a score of 9 is within the
expected range for that population.
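The reasoning in this example can be made concrete by computing how unusual each score would be for Supplier A. This is a rough sketch, not a formal hypothesis test; the function name how_unusual is hypothetical.

```python
from statistics import NormalDist

# Supplier A's durability scores as described above: mean 10, SD 3.
supplier_a = NormalDist(mu=10, sigma=3)

def how_unusual(score, dist):
    """Return (Z-score, probability of a score this low or lower)."""
    z = (score - dist.mean) / dist.stdev
    return z, dist.cdf(score)

for score in (4, 9):
    z, p_low = how_unusual(score, supplier_a)
    print(score, round(z, 2), round(p_low, 4))
```

A score of 4 sits 2 standard deviations below the mean, so only about 2% of Supplier A's components score that low, while a score of 9 is well within the ordinary range.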