Statistics Interview Revision
Additionally, you might analyze employee data to compute median salaries, understand
employee satisfaction scores, or detect attrition patterns. All of this helps leadership
make data-informed decisions. This is a great example of how statistics goes beyond just
numbers—it's about driving strategic actions."
Types of statistics
1.Descriptive statistics
2.Inferential statistics
Before diving deep into the types of statistics, let's build a basic intuition by knowing their
difference.
Descriptive statistics works on the whole dataset and gives you the exact answer.
Inferential statistics works by taking a small sample from a large dataset,
performing the required operations on the sample, and then drawing a conclusion or
inference from the sample for the whole dataset.
Measures in descriptive statistics:
1.Measure of central tendency
2.Measure of dispersion
3.Measure of symmetricity
1.Mean>> The mean refers to the central or average value. But means are of
three types: arithmetic mean, geometric mean, and harmonic mean.
Before going into each type, let's know what the mean actually represents in the data.
The arithmetic mean is the value that minimizes the sum of squared distances from itself.
Mathematically, x̄ = argmin over c of Σᵢ (xᵢ − c)².
Meaning: If you pick any other number instead of the mean, the sum of squared distances from
the data points will be larger.
Geometric mean and harmonic mean: (link)
Median
"Median is a statistical measure of central tendency that identifies the middle value in a sorted
dataset. It's especially useful when the data contains outliers or is skewed, because unlike the
mean, the median is not affected by extreme values. For an odd number of values, it picks the
exact middle; for even numbers, it averages the two middle values."
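For instance, a small sketch showing how a single outlier pulls the mean but barely moves the median (made-up salary figures):
import numpy as np

salaries = [30, 32, 35, 38, 40]           # in thousands
salaries_with_outlier = salaries + [500]  # one extreme value

print(np.mean(salaries), np.median(salaries))                             # 35.0 35.0
print(np.mean(salaries_with_outlier), np.median(salaries_with_outlier))   # 112.5 36.5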
🤔 Why is mean still widely used — even though median is better with outliers?
● For streaming data or distributed systems (like Spark, SQL, etc.), mean is faster and
memory-efficient.
● The mean function is smooth and differentiable, which is necessary for gradient
descent algorithms to optimize loss functions.
📌 That’s why mean squared error (MSE) and cross-entropy loss (which are based on
mean) are so widely used in regression and classification.
● The mean considers every value in the dataset, so it's sensitive to the full
distribution of the data.
● It captures more variation, which is useful for statistical modeling and inference.
● The median only tells where the center is, and doesn't use the rest of the data.
This makes mean + standard deviation a better pair when the goal is to analyze variability.
Mode
Mode is the value or category that appears most frequently in a dataset. It’s particularly
useful for categorical data, where other measures like mean or median aren’t
applicable. In numerical datasets, it helps identify repeated values. Mode is robust to
outliers and can be unimodal, bimodal, or even multimodal depending on the
distribution.”
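A small sketch of computing the mode in Python (the values are made up):
import statistics
import pandas as pd

colors = ["red", "blue", "red", "green", "red", "blue"]
print(statistics.mode(colors))   # 'red' — the most frequent category

s = pd.Series([1, 2, 2, 3, 3])
print(s.mode().tolist())         # [2, 3] — bimodal: pandas returns all modes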
We know that we take a sample from a larger dataset and draw a conclusion from that sample
for the entire dataset. But this sampling is not done arbitrarily; there are different sampling
techniques.
1.Simple Random Sampling (SRS)>>
Suppose we're conducting a nationwide survey and using simple random sampling. Since Uttar
Pradesh has the largest population, people from UP will naturally be more likely to appear in
the sample simply because there are more people from there in the total population — not
because of a flaw in SRS.
But if your goal is to ensure equal representation across states, then SRS is not ideal. A better
method would be stratified sampling, where each state is treated as a stratum and sampled
proportionally or equally
2.Stratified Sampling>> Consider an imbalanced email dataset. If you apply simple random
sampling, a sample of 100 might contain almost all spam emails and very few or zero non-spam.
That hurts model learning and evaluation. Stratified sampling fixes this by sampling from each
class (stratum) separately, as in the sketch below.
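A minimal sketch of stratified splitting with scikit-learn's train_test_split; the 90/10 class split below is made up for illustration:
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90% spam (1), 10% non-spam (0) — made-up data
y = np.array([1] * 90 + [0] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 90/10 class ratio in both the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(y_test.mean())   # ≈ 0.9, the same proportion as the full data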
3.Cluster Sampling>>"In stratified sampling, we divide the population into meaningful subgroups
(strata) and sample from each to ensure representation. In cluster sampling, we divide the
population into mini-population groups (clusters), randomly pick a few clusters, and sample all
or some items from only those. Stratified improves accuracy and representation, while cluster
sampling improves efficiency and feasibility."
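A rough NumPy sketch of the cluster-sampling idea (the cluster labels are hypothetical):
import numpy as np

rng = np.random.default_rng(0)
# Population of 1000 people spread across 50 clusters (e.g. villages) — hypothetical
cluster_ids = rng.integers(0, 50, size=1000)

# Cluster sampling: randomly pick 5 clusters and take everyone in them
chosen_clusters = rng.choice(50, size=5, replace=False)
cluster_sample = np.where(np.isin(cluster_ids, chosen_clusters))[0]  # indices of sampled people
print(len(chosen_clusters), len(cluster_sample))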
Types of Data
1.Quantitative data (Numerical data)
.Discrete data >> data that takes whole-number values, e.g. number of students (often a classification target)
.Continuous data >> data that can take any value, including decimals; usually modeled with regression, e.g. height,
weight
2.Qualitative data
“Categorical data is data that represents categories or labels rather than numbers. Since
machine learning models can’t learn directly from categories, we encode them into numbers. If
the categories are unordered (nominal), such as gender or city, we can use label or one-hot
encoding. If the categories are ordered (ordinal), such as education level or rankings, we use
ordinal encoding to preserve their order.”
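A minimal encoding sketch with pandas and scikit-learn; the column names and category order are made up for illustration:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi"],                    # nominal
    "education": ["High School", "Masters", "Bachelors"],    # ordinal
})

# Nominal -> one-hot encoding
one_hot = pd.get_dummies(df["city"], prefix="city")

# Ordinal -> ordinal encoding with an explicit order
enc = OrdinalEncoder(categories=[["High School", "Bachelors", "Masters"]])
df["education_encoded"] = enc.fit_transform(df[["education"]])
print(one_hot)
print(df)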
Measure of Dispersion
“You can think of the mean or median as your hand holding a rope, and each data point
as a stone tied to the end. The rope's length reflects how far the data points are from the
center — just like standard deviation or variance measures how far the data points
spread from the mean. A shorter rope means tightly clustered data; a longer rope means
widely spread data.”
“The range measures the total spread of data but is highly sensitive to outliers, since it only
considers the min and max values. Percentiles, on the other hand, divide data into 100 equal
parts and give a clearer view of distribution. For instance, the 75th percentile means 75% of the
data lies below that value. The IQR, which is the difference between the 75th and 25th
percentiles, is often used instead of range because it’s more robust to outliers.”
“Quartiles are cut-off points that divide a dataset into four equal parts. These cuts
— Q1, Q2, and Q3 — give us insight into the distribution of the data. Using these
quartiles, we can calculate the Interquartile Range (IQR), which is useful for
detecting and handling outliers.”
"Interquartile Range (IQR) measures the spread of the middle 50% of data by subtracting the
25th percentile (Q1) from the 75th percentile (Q3). It’s a robust measure of dispersion,
unaffected by outliers, making it especially useful for exploratory data analysis and outlier
detection. In contrast to standard deviation, which uses all data points, IQR focuses only on the
core distribution, giving a more stable view in skewed or messy data."
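A short sketch of the common 1.5 × IQR outlier rule on made-up numbers:
import numpy as np

data = np.array([12, 14, 15, 15, 16, 17, 18, 19, 45])   # 45 is an obvious outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(q1, q3, iqr, outliers)    # 45 is flagged as an outlier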
“Mean Deviation, or Mean Absolute Deviation (MAD), is the average of the absolute differences
between each data point and the mean. It gives a simple, intuitive measure of how much data
values deviate from the center. Unlike variance or standard deviation, it doesn't square the
differences, so it's less sensitive to outliers — but also not differentiable, which limits its use in
optimization-based ML models.”
Variance
“Variance measures the average of squared deviations from the mean, and it
comes in two types: population variance (divided by N) and sample variance
(divided by n − 1). It’s sensitive to outliers because squaring exaggerates
large differences, but this also makes it differentiable — which is why
variance-based losses like MSE are used in machine learning. A downside of
variance is that it’s expressed in squared units, making it harder to interpret directly
compared to standard deviation.”
“When we calculate variance from a sample (not full data), dividing by n usually gives a value
that is too small. To fix this, we divide by n − 1, which slightly increases the result and
makes it more accurate. This is called Bessel’s Correction. Using n − 1 gives us an
unbiased estimate — meaning, it gives us the right value on average if we repeat the sampling
many times.”
In practice we don’t know the true population mean μ, so instead we use the sample mean (x̄), which is
calculated from the same sample data.
Let’s say you have a sample of 3 numbers and you know their mean is 10.
If the first two numbers are 9 and 11 → the third must be 10 (so that the mean stays 10)
You can’t change the third freely anymore → you lost 1 degree of freedom.
This is why we divide by n − 1 — because only n − 1 squared differences are truly free to
estimate the spread.
❌ Why not n − 2 or n − 3?
● Dividing by n − 2 would make the result too large — an overestimate.
● Dividing by n − 1 gives a result that is just right — it makes the variance
unbiased, meaning on average, it matches the true population variance.
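A quick simulation sketch of this bias, assuming a normal population whose true variance is 4:
import numpy as np

rng = np.random.default_rng(0)

biased, unbiased = [], []
for _ in range(10000):
    sample = rng.normal(loc=0, scale=2, size=5)   # small samples from N(0, 2²), true variance 4
    biased.append(np.var(sample, ddof=0))         # divide by n
    unbiased.append(np.var(sample, ddof=1))       # divide by n-1 (Bessel's correction)

print(np.mean(biased))    # noticeably below 4 on average
print(np.mean(unbiased))  # close to 4 on average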
"We square the deviations in variance to make all values positive and to penalize larger
deviations more heavily. This helps capture outliers and spread more effectively than just taking
absolute values. Also, squaring allows us to use calculus, which is essential in machine learning
optimization. Unlike absolute values, squared deviations are differentiable and mathematically
convenient. That’s why variance is widely used in statistics and machine learning, even though
mean absolute deviation is more robust."
Standard deviation is a measure of how much the values in a dataset vary from the
mean. It is calculated as the square root of variance, which makes it easier to
interpret because it has the same unit as the original data. For example, if your data
is in kilograms, the standard deviation will also be in kilograms, unlike variance
which would be in kg². However, standard deviation is still sensitive to outliers,
since it's based on squared deviations — so large differences from the mean can
still heavily influence it."
Skewness of data
Skewness describes the asymmetry of a data distribution. In symmetric distributions, the mean,
median, and mode are equal. In right-skewed data, outliers on the high end pull the mean
higher than the median and mode. In left-skewed data, outliers on the low end pull the mean
lower. Skewness is often caused by outliers or natural boundaries in the data. It's important to
detect because it can affect model performance and how we summarize the data.”
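A small sketch for measuring skewness with SciPy (the data is simulated):
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
right_skewed = rng.exponential(scale=2.0, size=1000)   # long tail on the right
symmetric = rng.normal(size=1000)

print(skew(right_skewed))   # clearly positive
print(skew(symmetric))      # close to 0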
Common methods to handle skewness:
● Binning (discretization): use when the feature has few unique values; converts skewed continuous data into categorical-like data.
● Box-Cox transform: scipy.stats.boxcox()
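For instance, a rough sketch of a log and a Box-Cox transform reducing right skew (Box-Cox requires strictly positive values; the data is simulated):
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)   # right-skewed, strictly positive

x_log = np.log1p(x)                         # log transform
x_bc, lam = boxcox(x)                       # Box-Cox picks the best lambda automatically

print(skew(x), skew(x_log), skew(x_bc))     # skewness shrinks toward 0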
Covariance measures how two variables vary together. If they increase together, it’s positive; if
one increases while the other decreases, it’s negative. The formula is the average of the product
of their deviations from the mean. Covariance tells us about the direction of the relationship but
not its strength, and it's scale-dependent — meaning we can’t compare covariances directly
across different variable pairs because they’re in different units and have no fixed range. That’s
why we use correlation, which normalizes covariance by dividing it by the product of the
standard deviations, giving us a unitless value between −1 and 1.”
“In supervised learning, correlation helps us understand which features are strongly
associated with the dependent variable. Highly correlated features with the target
may be more predictive. However, if two features are highly correlated with each
other, they may cause multicollinearity, which can be harmful in linear models. In
such cases, we might drop one of them or use dimensionality reduction like PCA.”
"Pearson correlation measures linear relationships between variables, but it fails when the
relationship is non-linear. To handle non-linear but monotonic relationships, we use Spearman
correlation. It converts the data into ranks and then calculates Pearson correlation on those
ranks. This makes it suitable for data that increases or decreases consistently, even if not in a
straight line."
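A quick sketch comparing the two on a monotonic but non-linear relationship:
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(1, 10, 100)
y = np.exp(x)                 # monotonic but highly non-linear

print(pearsonr(x, y)[0])      # well below 1
print(spearmanr(x, y)[0])     # exactly 1.0 — the ranks increase perfectly together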
“You got a Pearson correlation of near 0. Does that mean the variables are
unrelated?”
✅ Answer:
“Not necessarily. Pearson only detects linear relationships. I would check Spearman
correlation next to see if there’s a non-linear but monotonic relationship.”
# Covariance and correlation on a DataFrame df (e.g. the seaborn "tips" dataset loaded below)
df.cov(numeric_only=True)        # pairwise covariances of numeric columns
df.corr(numeric_only=True)       # pairwise Pearson correlations

# Heatmap of the correlation matrix
corr = df.corr(numeric_only=True)
sns.heatmap(corr)

# Spread of a numeric sequence `data`
np.std(data)                     # standard deviation (population, ddof=0 by default)
import statistics
statistics.variance(data)        # sample variance >> denominator is n-1
statistics.pvariance(data)       # population variance >> denominator is n
"Covariance and correlation both measure the relationship between two variables, and
both are used in feature selection. The issue with covariance is that its value can range
from negative to positive infinity, which means it's hard to compare across different
variable pairs. Also, covariance has units — for example, if one variable is in meters
and another in kilograms, the covariance will be in meter-kilograms — so we can't
compare it across variables unless they have the same unit and scale.
"I’d use covariance when I’m more interested in the raw, unnormalized measure of how two variables
move together — especially when both variables are in the same unit or when I’m using algorithms
like PCA that directly rely on the covariance matrix. But for comparing relationships across features
or explaining the strength of a relationship in a standardized way, I’d use correlation."
Q3. Why is correlation preferred over covariance in many cases?
Correlation helps quantify the strength and direction of a linear relationship between two variables. It
ranges from −1 to +1. A positive correlation means that as one variable increases, the other tends to
increase as well, and a negative value means they move in opposite directions. A value close to 0
means there’s no linear relationship. For example, a correlation of 0.5 indicates a moderate positive
linear association, but it doesn't mean that 50% of the values are changing together — correlation is
not a percentage, but a measure of trend.”
Q5.Write Python code to calculate the covariance between two variables using NumPy.
import numpy as np

# Sample data
x = np.array([10, 20, 30, 40, 50])
y = np.array([5, 15, 25, 35, 45])

cov_matrix = np.cov(x, y)    # 2x2 covariance matrix (denominator n-1 by default)
print(cov_matrix[0, 1])      # covariance between x and y -> 250.0
import numpy as np
import pandas as pd
import seaborn as sns
data=sns.load_dataset("tips")
df=pd.DataFrame(data)
df
df.head()
print(df.corr("pearson",numeric_only=True))
Cumulative Distribution Function (CDF)
Mathematically, F(x) = P(X ≤ x). For example, for a discrete X taking the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4:
So:
● F(0) = 0.25
● F(1) = 0.75
● F(2) = 1.0
For a continuous variable such as the standard normal, F(2) ≈ 0.977.
In this case, the CDF is a smooth, S-shaped curve, and we compute it using integration of the
PDF:
F(x) = ∫ from −∞ to x of f(t) dt
Where:
● f(t) is the probability density function
Once you have the CDF, you know the full behavior of X — it fully defines the distribution.
✅ Interview-Ready Definition:
“A CDF, or Cumulative Distribution Function, gives the probability that a random
variable X will take a value less than or equal to x. For discrete variables, it adds up the
probabilities from the PMF. For continuous variables, it is the integral of the PDF from
minus infinity to x. The CDF always starts at 0 and approaches 1 as x increases, and it
completely characterizes the distribution of the random variable.”
import numpy as np
from scipy.stats import norm
x = np.linspace(-3, 3, 100)
cdf = norm.cdf(x, loc=0, scale=1)  # Standard normal distribution
✅ Interview-Ready Summary:
"The PDF is the derivative of the CDF, and it gives the probability density — not the
actual probability at a point. We use integration of the PDF to find probability over a
range. It is possible for the PDF to have values greater than 1, as long as the total area
under the curve is 1. The CDF accumulates these probabilities and always lies between
0 and 1."
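For instance, a uniform distribution on [0, 0.5] has density 2 everywhere in that interval, yet the total area is still 1; a quick SciPy check:
from scipy.stats import uniform

# Uniform on [0, 0.5]: pdf height = 1 / (0.5 - 0) = 2
u = uniform(loc=0, scale=0.5)
print(u.pdf(0.25))               # 2.0 -> the density can exceed 1
print(u.cdf(0.5) - u.cdf(0.0))   # 1.0 -> the total probability is still 1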
✅ Interview-Ready Definition:
“A uniform distribution is a distribution where all outcomes are equally likely. In the
discrete case, like a fair die, each outcome has the same probability. In the continuous
case, like picking a random number between 0 and 1, the probability density is constant
across the interval. The uniform distribution is often used in simulations and random
sampling because of its simplicity and neutrality.”
🔍 What is Bernoulli Distribution?
The Bernoulli distribution models binary outcomes:
With:
● P(X = 1) = p
● P(X = 0) = 1 − p
It is the simplest discrete probability distribution — used when we have only two outcomes:
success/failure, yes/no, positive/negative, etc.
🤖 How It Connects to Machine Learning (Binary
Classification)
In binary classification tasks (e.g. is the email spam or not?):
● The label y is either 0 or 1. That means:
y ~ Bernoulli(p)
where p is the model’s predicted probability that y = 1.
The sigmoid σ(z) = 1 / (1 + e^(−z)) squashes any real number z ∈ ℝ (usually the output of a linear
transformation z = wᵀx + b) into a value between 0 and 1 — which we interpret
as:
📌 The predicted probability that the output is 1, i.e. P(y = 1 | x)
So yes — the sigmoid gives us the Bernoulli parameter p for each input!
🧠 Big Idea:
In binary classification with sigmoid, we are modeling the target as a Bernoulli
variable, and using the network to learn the probability p of that Bernoulli.
🔗 Summary of Connections
"In binary classification, we model the target y as a Bernoulli-distributed variable, and the sigmoid output is the Bernoulli parameter p for the given input. The model's output is interpreted as the probability of the positive class."
"We model the output as a Bernoulli variable. The sigmoid function gives the Bernoulli probability,
and we predict the class by taking the argmax (which is just thresholding at 0.5 in binary case).”
Mean = p
Variance = p(1 − p)
from scipy.stats import bernoulli
import matplotlib.pyplot as plt
samples = bernoulli.rvs(p=0.6, size=1000)   # 1000 draws of 0/1 with P(1) = 0.6
plt.hist(samples); plt.show()
🔢 So in simple terms:
📌 Bernoulli = 1 trial with success/failure (0 or 1)
📌 Binomial = n Bernoulli trials → count of successes
If you flip a coin (probability of heads = 0.5) 10 times, and want to know the probability of getting
exactly 6 heads, that's a binomial distribution problem.
🧠 Probability Mass Function (PMF):
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
Where:
● C(n, k) = combinations = the number of ways to choose k successes out of n trials
📌 Example:
Flip a coin with p = 0.6 five times. What is the probability of getting exactly 3 heads?
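A quick check of this example with Python:
from math import comb
from scipy.stats import binom

# P(X = 3) for n = 5, p = 0.6
print(comb(5, 3) * 0.6**3 * 0.4**2)   # 10 * 0.216 * 0.16 = 0.3456
print(binom.pmf(k=3, n=5, p=0.6))     # same value via scipy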
🔗 Connection to Bernoulli
✅ Binomial = repeating a Bernoulli trial n times and counting how many times you got 1
(success)
np.random.binomial(n=1, p=0.7, size=1000)    # n=1 -> equivalent to 1000 Bernoulli(0.7) draws
np.random.binomial(n=10, p=0.7, size=1000)   # n=10 -> successes in 10 trials, repeated 1000 times
👉 That simulates 1000 experiments, each consisting of 10 trials, and returns how many
successes you got in each experiment.
🔬 Key Properties:
● Mean = n · p
● Variance = n · p · (1 − p)
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10    # number of trials
p = 0.6   # probability of success

# Simulate 1000 binomial experiments
samples = np.random.binomial(n=n, p=p, size=1000)

# Visualize
plt.hist(samples, bins=np.arange(n+2)-0.5, edgecolor='black', rwidth=0.8)
plt.xticks(range(n+1))
plt.title(f"Binomial Distribution (n={n}, p={p})")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()
You said:
Explain the 1000 experiments with 10 trials each i cannot get it
ChatGPT said:
Awesome — let's break it down step by step and make it crystal clear! You’re asking:
💬 “What does it mean when we say 1000 experiments with 10 trials each in
np.random.binomial(n=10, p=0.6, size=1000)?”
Let’s demystify it 👇
🧠 Think of this:
One binomial experiment (n=10) might look like:
[1, 0, 1, 1, 0, 1, 1, 1, 0, 1] → total successes = 7
“Hey NumPy, run this coin-tossing experiment 1000 times, each with 10 tosses, and
return the number of heads each time.”
samples = np.random.binomial(n=10, p=0.6, size=1000)
[6, 5, 7, 4, 6, 8, 3, 5, 5, 6, ..., 7] # 1000 values total
📊 Visualization Meaning
When we do:
plt.hist(samples, ...)
Here samples is an array of 1000 numbers (how many successes occurred in each 10-trial experiment), so the histogram shows how often each success count appeared.
Would you like to print the first few experiments and visually break them down into individual
trials? I can simulate that too step by step!
You said:
what does this p signifies explain in depth
ChatGPT said:
Amazing question! You're asking what exactly does the p in a binomial (or Bernoulli)
distribution mean, and you want to understand it deeply.
So: p = P(success in a single trial).
So in:
np.random.binomial(n=10, p=0.6)
You're saying:
"In 10 trials, where the chance of success in each trial is p = 0.6, how many
successes do I get?"
● These trials are independent — meaning the outcome of one trial doesn’t affect the next.
🎯 Real-World Meaning of p
Let’s take real-life examples to see what p means:
● Coin toss with p = 0.5: each trial (coin toss) has a 50% chance of success (if you call heads a success).
● Medical test with p = 0.8: if you test 100 people who have the disease, you’d expect ~80 positives — on average.
📈 Why is p So Important?
In a Binomial distribution, the expected number of successes is n · p and the variance is n · p · (1 − p), so p controls both the centre and the spread of the outcomes.
🔁 Intuition Recap:
● p is like the chance of winning in a game you play many times.
● Over many trials, p controls how often you succeed, and how spread out the outcomes
are.
For example, if the expected count n · p works out to 8, it doesn’t mean you’ll get exactly 8 every time — but on average, over many experiments, you’ll get
close to that.
Would you like to simulate changing values of p in Python and plot how the distribution shape
changes? That would make this even more visual!
You said:
then how are we getting this p value how are we finding
ChatGPT said:
🔥 Beautiful question — and very important for both interviews and real-world ML.
You're asking:
"Where does this ppp come from? How do we find or estimate it?"
Let’s break it into two major cases:
Case 1: p is known or assumed; you define it yourself. Example:
# You define the probability manually
p = 0.7  # 70% chance of heads
This is common in:
● Simulations
● Probability experiments
● Synthetic datasets
Case 2: p is estimated from observed data, as the fraction of successes:
p_estimate = np.mean(observations)   # observations is an array of 0/1 outcomes
In machine learning models like logistic regression:
● p is not fixed
● It depends on the input features x
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)                 # X = feature matrix, y = 0/1 labels
model.predict_proba(X_new)      # e.g. [[0.22, 0.78]] → P(fail) = 0.22, P(pass) = 0.78
Here, the second value (0.78) is the model’s estimated p = P(y = 1 | x) for that input.
That’s more advanced, but yes — even there, p is estimated based on what you know + what you
observe.
Would you like me to walk you through an actual data-based p estimation example (like with
sklearn, pandas, or raw data)? Or go deeper into how logistic regression adjusts weights to
estimate p?
You said:
so what is p: p basically signifies the chance of getting your result, and we use this p value in the
binomial and Bernoulli distributions. But there is a way to get the p value; it is a dependent value, like a predicted
value in machine learning which depends on various features, and here it also depends on the input variables
ChatGPT said:
✅ 100% Correct!
Let’s phrase it with a bit more structure:
💡 What is p?
● In statistics:
p is the probability of success in a single trial — like flipping heads or predicting the
correct class.
● In Bernoulli Distribution:
p = P(X = 1), the probability of getting a success in one trial
● In Binomial Distribution:
p is still the probability of success, but over n independent trials, and we count how
many successes happen.
🧠 In Machine Learning:
📌 p becomes a function of input features.
Example: predicting whether a student passes, based on features like:
● Hours studied
● Sleep hours
● Previous grades
Then the model outputs p = σ(wᵀx + b), the predicted probability of passing,
where x is the feature vector, w and b are learned parameters, and σ is the sigmoid function.
Want to visualize how p changes as inputs change using logistic regression or neural net
predictions? I can help code that too.
You said:
why do we need bernoulli the when we can estimate p using machine learning
ChatGPT said:
"If machine learning can already estimate p, why do we need the Bernoulli
distribution at all?"
Let’s break it down — because this is exactly the kind of conceptual thinking interviewers love.
🎯 Short Answer:
We still need Bernoulli because it's the foundation of how machine learning models
define, learn, and evaluate probabilities in binary classification.
● We say:
y ~ Bernoulli(p)
“The label y comes from a Bernoulli distribution, and our model tries to learn that
distribution.”
Then we use:
Binary Cross-Entropy Loss = −[ y · log(p) + (1 − y) · log(1 − p) ]
Without this distributional assumption, there would be no principled loss function behind:
● Logistic regression
● Binary classification
If you remove Bernoulli, you’re losing the entire math base that supports:
● Model training
● Interpretation
Analogy: why does a self-driving car still need physics if the AI does the driving? Answer:
● Because physics defines how motion works — the AI car just uses it behind the scenes.
Similarly, the Bernoulli distribution defines how binary outcomes behave; the ML model just estimates its parameter p.
Poisson Distribution
The Poisson distribution models the number of events occurring in a fixed interval (of time, space, etc.) when events happen independently at a constant average rate λ:
P(X = k) = (λ^k · e^(−λ)) / k!
Where λ (lambda) is the average number of events per interval.
📊 Real-World Examples: λ could be, for instance, the average number of calls a support line receives per hour, or the average number of website visits per minute.
This plot will show the likelihood of 0, 1, 2, ..., 14 events happening in one interval, based on λ = 5 (see the sketch below).
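A minimal sketch of that plot with SciPy (λ = 5 and the 0–14 range come from the text above):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 5
k = np.arange(0, 15)           # 0, 1, ..., 14 events in one interval
pmf = poisson.pmf(k, mu=lam)   # P(X = k) for each k

plt.bar(k, pmf)
plt.xlabel("Number of events in one interval")
plt.ylabel("Probability")
plt.title(f"Poisson PMF (lambda = {lam})")
plt.show()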
📐 Definition:
A random variable X is said to follow a uniform distribution over the interval [a, b],
written as X ~ U(a, b), with PDF f(x) = 1 / (b − a) for a ≤ x ≤ b.
This means:
● X can take any real value between a and b, all equally likely
● The total area under the curve is 1 (as required for any PDF)
📦 Probability of an Interval:
Since it’s a continuous distribution, the probability of getting exactly one value is 0. But we
can compute the probability over a range: P(c ≤ X ≤ d) = (d − c) / (b − a), for a ≤ c ≤ d ≤ b.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

a, b = 2, 8                              # interval
x = np.linspace(1, 9, 1000)
y = uniform.pdf(x, loc=a, scale=b - a)   # constant density 1/(b-a) = 1/6 inside [2, 8]

plt.plot(x, y, label='U(2, 8) PDF')
plt.fill_between(x, y, alpha=0.3)
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid()
plt.legend()
plt.show()
Examples:
● Picking a point on a line: if you drop a pin randomly on a 1-meter ruler, the position is uniformly distributed.
● Time of arrival (if completely unpredictable): the person could arrive anytime in [1pm, 2pm] with equal chance.
It is the famous bell-shaped curve — used everywhere from exam scores to natural
measurements to ML algorithms.
📊 The 68–95–99.7 Rule (Empirical Rule)
For any normal distribution:
● about 68% of values lie within 1 standard deviation of the mean
● about 95% lie within 2 standard deviations
● about 99.7% lie within 3 standard deviations
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu = 100      # mean
sigma = 15    # standard deviation (assumed here for illustration)
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)
y = norm.pdf(x, loc=mu, scale=sigma)

plt.fill_between(x, y, alpha=0.3, label='N(100, 15²) PDF')
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.grid(True)
plt.legend()
plt.show()
The standard normal distribution is the special case with:
● μ = 0
● σ = 1
We call it the Z-distribution and denote the variable as Z ~ N(0, 1).
Used in:
● Z-scores
● Statistical tests
● Standardization
🧪 Python Example
import numpy as np
from scipy.stats import norm

mu, sigma = 100, 15      # original distribution (values assumed for illustration)
x = 130
z = (x - mu) / sigma     # z-score: how many standard deviations x is from the mean
print(z)                 # 2.0
print(norm.cdf(z))       # P(Z <= 2) ≈ 0.977
The standard normal distribution is a normalized form of the normal distribution with a mean of 0
and standard deviation of 1. It allows us to compare data across different normal distributions by
removing the influence of different means and scales.
Using the Z-score, we convert any value into a common scale that tells us how far (in standard
deviations) it is from the mean. This is essential in comparison, normalization, and many statistical
tests and machine learning tasks.
🧠 What Is the Central Limit Theorem (CLT)?
The CLT states:
If you take many samples (of size n) from any population (with finite mean and
variance), and calculate the mean of each sample, then the distribution of those
sample means will approach a normal distribution as n → ∞,
regardless of the shape of the original population.
📐 Mathematical Definition
Let X₁, X₂, ..., Xₙ be independent draws from a population with mean μ and variance σ². Then for large n the sample mean X̄ is approximately N(μ, σ²/n); equivalently, (X̄ − μ) / (σ/√n) approaches the standard normal distribution.
🔍 Intuition
📦 Suppose:
You have any population — even if it's:
● Skewed
● Uniform
📊 You take:
● Many random samples from it (say, 30 students at a time)
Then:
● Compute the mean of each sample
● Plot those means → you'll get a bell-shaped (normal) curve, centered around μ
🧪 Example:
Let’s say the population is heavily skewed (like income data, where most people earn low, few earn
very high):
👉 That plot will look normal, even though income is not normally distributed!
✅ Feature Sampling: ML models trained on subsets still represent the overall population
✅ Grading & Scores: exam averages follow the CLT even if raw marks don't
import numpy as np
import matplotlib.pyplot as plt

# A skewed, income-like population; the exponential shape is assumed for illustration
population = np.random.exponential(scale=50000, size=100000)

sample_means = []
for _ in range(1000):
    sample = np.random.choice(population, size=30)   # one random sample of size 30
    sample_means.append(np.mean(sample))

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(population, bins=50)
plt.title("Skewed population")
plt.subplot(1, 2, 2)
plt.hist(sample_means, bins=30)
plt.title("Distribution of sample means (≈ normal)")
plt.show()
✅ Key Takeaways (In Your Words)
The Central Limit Theorem says that if we take many random samples of a fixed size
from any population, the distribution of their means will tend to look normal, as long
as the sample size is reasonably large.
This lets us use normal-based tools (confidence intervals, z-tests) even when the
data is not normally distributed.
📌 What it means:
z = (x − μ) / σ
This is the z-score of a single data point.
It tells you how many standard deviations the value x is from the population mean μ.
✅ Use when:
● You have a single observation
● You already know the population mean μ and standard deviation σ
Final Summary in Your Own Words (Rewritten):
When the population is not normal, we use the Central Limit Theorem to approximate a
normal distribution by taking multiple random samples.
Then, if we want to measure how far a particular sample mean (e.g., sample 3) is from
the population mean, we use the z-score for the sample mean: z = (x̄ − μ) / (σ / √n).
Q1
Example of Z Score
Let us understand the concept with the help of a solved example:
Example: The test scores of students in a class test have a mean of 70 and a
standard deviation of 12. What is the probable percentage of students who scored
more than 85?
z = (85 − 70) / 12 = 1.25
From the z-score table, the fraction of the data below this score is 0.8944.
This means 89.44% of the students scored below 85, and hence the
percentage of students who scored above 85 = (100 − 89.44)% = 10.56%.
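The same lookup can be done with SciPy instead of a z-table:
from scipy.stats import norm

z = (85 - 70) / 12              # 1.25
p_below = norm.cdf(z)           # ≈ 0.8944 of students score below 85
print(z, p_below, 1 - p_below)  # about 10.56% score above 85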
You start by making an assumption (claim) about the population — and then you test whether
that claim seems reasonable or not using sample data.
📌 Key Components
🔹 1. Null Hypothesis H₀:
● The default or status quo assumption.
● Example:
○ H₀: μ = 70 (The mean exam score is 70)
🔹 2. Alternative Hypothesis Hₐ:
● The claim you want to find evidence for.
● Example:
○ Hₐ: μ ≠ 70 (The mean exam score is not 70)
🔹 3. Significance Level α:
○ This is the threshold for how much risk you allow of rejecting a true null (commonly α = 0.05)
🔹 4. Decision using the p-value:
○ If p ≤ α: Reject H₀ → evidence supports Hₐ
○ If p > α: Fail to reject H₀ → not enough evidence to support Hₐ
🧪 Example
A company claims its average delivery time is 30 mins. You suspect it's more.
Step-by-Step:
You collect a sample, calculate the mean and standard deviation, run a one-sided t-test, and get a
p-value = 0.03. Since 0.03 is below the usual α = 0.05, you reject H₀ and conclude the average delivery time is likely more than 30 minutes.
Imagine you're in court. The null hypothesis (H₀) is "the person is innocent."
● A very small p-value is like finding very strong evidence against the person — maybe
a video of the crime.
● A large p-value means the evidence could have happened even if the person were
innocent — so you can't convict
● Small p-value → Evidence against H₀ → Reject it
🎯 Real-World Example
🔹 Scenario:
You work at a call center. You believe the average call time is more than 11 minutes.
You sample 50 calls and find the average = 11.5 minutes. You also know the population
standard deviation is 1.5 minutes.
Now you want to test if this 11.5-minute average is really significant, or could have just
happened by chance if the true average is still 11.
🔹 Hypotheses:
● H₀: Average call time μ = 11
● Hₐ: Average call time μ > 11 (one-tailed test)
📣 Result:
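A minimal sketch of the z-test arithmetic for these numbers (a one-tailed test at α = 0.05 is assumed):
import numpy as np
from scipy.stats import norm

x_bar, mu0, sigma, n = 11.5, 11, 1.5, 50
z = (x_bar - mu0) / (sigma / np.sqrt(n))   # ≈ 2.36
p_value = 1 - norm.cdf(z)                  # one-tailed p ≈ 0.009, which is < 0.05
print(z, p_value)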
● → You reject the null and conclude:
"Yes, average call time is probably more than 11 minutes."
Criteria of every test
1. Z-test
○ Sample size ≥ 30
○ The population standard deviation must be known
○ Two types of test → one-tailed test, two-tailed test
○ How to identify: if the alternative hypothesis uses a ≠ sign, it is a two-tailed test;
otherwise it is one-tailed
○ If p-value < α → reject H₀, else fail to reject
○ If |z-statistic| > z-critical → reject H₀, else fail to reject
○ The z-statistic is inversely related to the p-value: the larger |z|, the smaller p
2. T-Test
○ Used when the population standard deviation is unknown (typically with a small sample, n < 30); the remaining criteria mirror the z-test