Foundations of Probability in Python - Part 4

The document discusses sample means and how they approach the population mean as sample size increases based on the law of large numbers. It then shows examples of calculating sample means from random variable data and plotting sample means. The document is about the concept of sample means in statistics and probability.

Uploaded by

Mohamed Gaber
Copyright © All Rights Reserved

From sample mean to population mean
FOUNDATIONS OF PROBABILITY IN PYTHON

Alexander A. Ramírez M.
CEO @ Synergy Vision
Sample mean review

LAW OF LARGE NUMBERS

The sample mean approaches the expected value as the sample size increases.

Sample mean review (Cont.)

Sample mean = $\bar{X}_2 = \frac{x_1 + x_2}{2}$

Sample mean = $\bar{X}_3 = \frac{x_1 + x_2 + x_3}{3}$

Sample mean = $\bar{X}_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Sample mean = $\bar{X}_n = \frac{x_1 + x_2 + \cdots + x_n}{n} \to E(X)$

Generating the sample
# Import binom and describe
from scipy.stats import binom
from scipy.stats import describe

# Sample of 250 fair coin flips
samples = binom.rvs(n=1, p=0.5, size=250, random_state=42)

# Print first 100 values from the sample
print(samples[0:100])

[0 1 1 1 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0
0 1 0 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 0 0 1
1 1 1 0 0 0 1 1 0 0 0 0 1 1 1 0 0 1 1 1 1 0 1 0 0 0]

Calculating the sample mean
# Calculate the sample mean
print(describe(samples[0:10]).mean)

0.6

Plotting the sample mean

from scipy.stats import binom
from scipy.stats import describe
import matplotlib.pyplot as plt

# Define our variables
coin_flips, p, sample_size, averages = 1, 0.5, 1000, []

# Generate the sample
samples = binom.rvs(n=coin_flips, p=p, size=sample_size, random_state=42)

Plotting the sample mean (Cont.)

# Calculate the sample mean
for i in range(2, sample_size + 1):
    averages.append(describe(samples[0:i]).mean)

# Print the first values of averages
print(averages[0:10])

[0.5, 0.6666666666666666, 0.75, 0.6, 0.5, 0.42857142857142855, 0.5,
0.5555555555555556, 0.6, 0.5454545454545454]
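
The loop above recomputes every prefix mean from scratch. As a rough alternative sketch (not part of the original slides), the same running means can be obtained in one pass with a cumulative sum. The names `rng`, `running_means` and `loop_means` are illustrative assumptions, and the flips are drawn with NumPy's generator rather than `binom.rvs`, so the individual values will differ from the output shown above.

```python
import numpy as np

# Assumption: simulate fair coin flips with NumPy's generator instead of binom.rvs
rng = np.random.default_rng(42)
sample_size = 1000
samples = rng.integers(0, 2, size=sample_size)  # each value is 0 or 1

# Running mean of the first i flips for i = 1..sample_size:
# cumulative sum divided by the number of flips seen so far
running_means = np.cumsum(samples) / np.arange(1, sample_size + 1)

# The same quantity computed prefix by prefix, for lengths 2..sample_size
loop_means = [samples[:i].mean() for i in range(2, sample_size + 1)]

# Check that both approaches agree (running_means[0] is the mean of one flip)
print(np.allclose(running_means[1:], loop_means))

# Distance between the final running mean and the population mean 0.5
print(abs(running_means[-1] - 0.5))
```

The cumulative-sum version does a single pass over the data, which matters little for 1000 flips but becomes noticeable for much larger samples.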

Plotting the sample mean (Cont.)

# Add population mean line and sample mean plot
plt.axhline(binom.mean(n=coin_flips, p=p), color='red')
plt.plot(averages, '-')

# Add legend
plt.legend(("Population mean", "Sample mean"), loc='upper right')
plt.show()

Sample mean plot

Let's practice!
Adding random variables

Alexander A. Ramírez M.
CEO @ Synergy Vision
The central limit theorem (CLT)
The sum of random variables tends to a normal distribution as the number of variables grows to infinity.

Conditions:

The variables must have the same distribution, with finite variance.

The variables must be independent.
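
A minimal simulation sketch of this statement, under assumptions that are not in the slides: the variables are independent Uniform(0, 1) draws, 30 of them are added per sum, and 10,000 sums are simulated. The names `n_variables`, `n_repeats` and `within_one_sd` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_variables = 30      # number of variables added together (illustrative choice)
n_repeats = 10_000    # number of independent sums simulated

# Each row holds n_variables independent Uniform(0, 1) draws; sum across the row
sums = rng.uniform(0.0, 1.0, size=(n_repeats, n_variables)).sum(axis=1)

# Theory for Uniform(0, 1): mean 1/2 and variance 1/12 per variable
expected_mean = n_variables * 0.5
expected_var = n_variables / 12.0

# Standardize the sums; under the CLT these are approximately standard normal
z = (sums - expected_mean) / np.sqrt(expected_var)

# Fraction of standardized sums within one standard deviation of zero
# (for a normal distribution this fraction is about 0.68)
within_one_sd = np.mean(np.abs(z) < 1.0)

print(sums.mean(), expected_mean)
print(sums.var(), expected_var)
print(within_one_sd)
```

The uniform distribution is far from bell-shaped, yet the sums already behave much like a normal variable with mean `n_variables / 2` and variance `n_variables / 12`.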

Poisson population plot
# Add the imports
from scipy.stats import poisson, describe
from matplotlib import pyplot as plt
import numpy as np

# Generate the population
population = poisson.rvs(mu=2, size=1000, random_state=20)

# Draw the histogram with labels
plt.hist(population, bins=range(9), width=0.8)
plt.show()

Sample means plot
# Generate 350 sample means, selecting
# from population values
np.random.seed(42)

# Define list of sample means
sample_means = []
for _ in range(350):
    # Select 10 from population
    sample = np.random.choice(population, 10)
    # Calculate sample mean of sample
    sample_means.append(describe(sample).mean)

Sample means plot (Cont.)

# Draw histogram with labels
plt.xlabel("Sample mean values")
plt.ylabel("Frequency")
plt.title("Sample means histogram")
plt.hist(sample_means)
plt.show()
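
Beyond the shape of the histogram, the center and spread of the sample means can be checked numerically. The sketch below is an illustration under assumptions rather than the slides' own code: it draws the Poisson population with NumPy's generator instead of `poisson.rvs`, so the numbers will not match the plots above, and the names `sample_size` and `n_samples` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumption: a Poisson population with mean 2, drawn with NumPy rather than scipy
population = rng.poisson(lam=2, size=1000)

sample_size = 10
n_samples = 350

# Draw 350 samples of size 10 (with replacement) and record each sample mean
sample_means = np.array([
    rng.choice(population, sample_size).mean() for _ in range(n_samples)
])

# Center: the average of the sample means should be close to the population mean
print(sample_means.mean(), population.mean())

# Spread: the standard deviation of the sample means should be close to
# the population standard deviation divided by sqrt(sample_size)
print(sample_means.std(), population.std() / np.sqrt(sample_size))
```

Both comparisons should come out close but not identical, since only 350 sample means are drawn.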

Let's add random variables
Linear regression

Alexander A. Ramírez M.
CEO @ Synergy Vision
Linear functions

Linear function parameters

y = slope ∗ x + intercept

Linear function with random perturbations

y = slope ∗ x + intercept + random_number
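
As a sketch of how data of this form could be simulated, the block below builds hypothetical `hours_of_study` and `scores` arrays. The slope, intercept, noise level and number of students are assumptions chosen for illustration; they are not the data used in the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters, chosen only for illustration
slope, intercept = 1.5, 52.0

# Hypothetical hours of study for 100 students, stored as a column (n_samples, 1)
hours_of_study = rng.uniform(0, 20, size=(100, 1))

# Random perturbation: normal noise with standard deviation 3
random_number = rng.normal(loc=0.0, scale=3.0, size=100)

# Linear function plus the perturbation
scores = slope * hours_of_study[:, 0] + intercept + random_number

print(hours_of_study.shape, scores.shape)
```

The predictor is kept two-dimensional because scikit-learn estimators expect features with shape `(n_samples, n_features)`.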

Start from the data and find a model that fits

What model will fit the data?

What would be the criteria to determine which is the best model?
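
One common criterion, and the one ordinary linear regression uses, is least squares: choose the line that minimizes the sum of squared vertical distances to the data. The sketch below illustrates this on hypothetical simulated data. The helper `sum_squared_errors` and all parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: a line with slope 1.5 and intercept 52 plus normal noise
x = rng.uniform(0, 20, size=200)
y = 1.5 * x + 52.0 + rng.normal(0.0, 3.0, size=200)


def sum_squared_errors(slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)


# Closed-form least-squares estimates for a single predictor
best_slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
best_intercept = y.mean() - best_slope * x.mean()

# Compare the least-squares line with an arbitrary candidate line
print(sum_squared_errors(best_slope, best_intercept))
print(sum_squared_errors(1.0, 55.0))
```

Any other candidate line gives a sum of squared errors at least as large as the least-squares line.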

What model will fit the data? (Cont.)

Probability and statistics in action

Calculating linear model parameters
# Import LinearRegression
from sklearn.linear_model import LinearRegression

# sklearn linear model
model = LinearRegression()
model.fit(hours_of_study, scores)

# Get parameters
slope = model.coef_[0]
intercept = model.intercept_

# Print parameters
print(slope, intercept)

1.496703900384545 52.44845266434719

Predicting scores based on hours of study
# Score prediction
score = model.predict(np.array([[15]]))
print(score)

[74.89901117]
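
The prediction is the linear function evaluated at x = 15. A small check, using the slope and intercept printed on the "Calculating linear model parameters" slide, reproduces the value above to the digits shown. The variable names `hours` and `score` are illustrative.

```python
# Parameters printed on the "Calculating linear model parameters" slide
slope = 1.496703900384545
intercept = 52.44845266434719

# Prediction for 15 hours of study: y = slope * x + intercept
hours = 15
score = slope * hours + intercept

print(score)
```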

Plotting the linear model

import matplotlib.pyplot as plt

plt.scatter(hours_of_study, scores)
plt.plot(hours_of_study_values, model.predict(hours_of_study_values))
plt.show()

Let's practice with linear models
Logistic regression

Alexander A. Ramírez M.
CEO @ Synergy Vision
Original data

New data

Where would you draw the line?

Solution based on probability

The logistic function

logistic(t) = 1 / (1 + e^(−t)), where t = slope ∗ x + intercept
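
A minimal sketch of the logistic function in plain Python. The helper name `logistic` and the variables `p_9`, `p_10` and `boundary` are illustrative; the slope and intercept are the values reported on the "Logistic regression" slide of this deck.

```python
import math


def logistic(t):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


# Parameters reported on the "Logistic regression" slide of this deck
slope = 1.3406531235010786      # beta1
intercept = -15.05906237996095  # beta0

# Probability of a positive outcome after 9 and 10 hours of study
p_9 = logistic(slope * 9 + intercept)
p_10 = logistic(slope * 10 + intercept)

# Hours at which the probability equals 0.5 (where slope * x + intercept = 0)
boundary = -intercept / slope

print(p_9, p_10, boundary)
```

With these parameters `p_9` agrees with the `predict_proba` output for 9 hours shown in the deck, and `p_10` is below 0.5, which is consistent with the `False` prediction for 10 hours.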

Changing the slope

Changing the intercept

From data to probability

Outcomes

Misclassifications

Logistic regression
# Import LogisticRegression
from sklearn.linear_model import LogisticRegression

# sklearn logistic model
# C is the inverse of the regularization strength, so a very large C
# makes the regularization negligible
model = LogisticRegression(C=1e9)
model.fit(hours_of_study, outcomes)

# Get parameters
beta1 = model.coef_[0][0]
beta0 = model.intercept_[0]

# Print parameters
print(beta1, beta0)

1.3406531235010786 -15.05906237996095

Predicting outcomes based on hours of study
# Predict the outcome for 10 hours of study
hours_of_study_test = [[10]]

outcome = model.predict(hours_of_study_test)
print(outcome)

[False]

Probability calculation

# Put value in an array
value = np.asarray(9).reshape(-1, 1)

# Calculate the probability for 9 hours of study
print(model.predict_proba(value)[:, 1])

[0.04773474]

Let's practice!
Wrapping up

Alexander A. Ramírez M.
CEO @ Synergy Vision
Fundamental concepts

Important probability distributions

The most important results

Linear and logistic regression

Keep learning at DataCamp!