Fundamental Concepts of Machine Learning and Statistics
September 14, 2025
Contents

1 The Core Idea Behind Machine Learning
1.1 Difference Between Hard-coding and Machine Learning
2 Classification and Regression
2.1 Classification
2.2 Regression
3 ROC and AUC
3.1 ROC (Receiver Operating Characteristic)
3.2 AUC (Area Under the Curve)
4 SSR and R²
4.1 SSR (Sum of Squared Residuals)
4.2 R Squared — Coefficient of Determination
5 Strengths and Weaknesses of Histograms for Probability Estimation
5.1 Strengths of Histograms
5.2 Weaknesses of Histograms
6 Calculating Probability Using Histograms
6.1 How to Calculate the Probability of a Range of Values
6.2 Assumptions for Histogram-based Probability Estimation
7 Consequences of Modeling Variables with Inappropriate Distributions
7.1 Modeling a Discrete Variable with a Continuous Distribution
7.2 Modeling a Continuous Variable with a Discrete Distribution
8 Potential Pitfalls of R² When Evaluating Regression
9 Distinguishing Valid and Invalid Probability Distributions
9.1 Valid Probability Distribution
9.2 Invalid Probability Distribution
10 Importance of Total Area Under Probability Distribution Curve
11 Modeling Variables in Both Discrete and Continuous Distributions
11.1 Scenarios
12 How SSR Reflects Model Performance and Why Squaring Residuals is Useful
12.1 How SSR Reflects Model Performance
12.2 Why Squaring the Residual is Useful
1 The Core Idea Behind Machine Learning
Added by Sanjoy:
The core idea behind machine learning is to enable computer systems
to learn from data and improve their performance on tasks without being
explicitly programmed for every scenario. Instead of providing a rigid set of
instructions, machine learning models are trained on large datasets, allowing
them to identify patterns, relationships, and trends within the data. This
learning process then enables them to make predictions, classify information,
or make decisions on new, unseen data.
Added by Souvik:
Traditional programming and machine learning both involve writing code.
They both require data. But they operate on fundamentally different
principles. One is about giving machines exact instructions. The other is
about teaching them to recognize patterns on their own.
As AI and machine learning become more prominent in the tech landscape,
there's a growing anxiety that these technologies might replace traditional
programming, and the programmers behind it. But the reality is far more
collaborative than competitive. Rather than replacing classical programming,
AI and ML are here to augment it, offering new ways to solve problems that
are too complex or dynamic for hard-coded logic alone.
1.1 Difference Between Hard-coding and Machine Learning
• Traditional Coding (Hard-coding): Involves writing explicit instructions
that the computer follows step by step. The programmer defines all the
rules, conditions, and logic to solve a specific problem.
– Same input always yields the same output.
– You can trace every outcome back to the code.
– Easy to debug, test, and verify.
For example, a hard-coded spam filter might have rules like "if email
contains 'free money', mark as spam."
• Machine Learning: Involves providing the system with data and
letting it learn the patterns.
– Outputs are based on likelihood, not certainty.
– The system learns from examples, not from rules.
– Same input might produce different outputs over time (especially
if the model is updated or retrained).
For the spam filter example, a machine learning approach would involve
feeding the system thousands of examples of spam and non-spam emails, and
letting it learn the characteristics that distinguish them.
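To make the contrast concrete, here is a minimal Python sketch of both
approaches. The rule list, the tiny training sets, and the test emails are
all hypothetical, chosen only to illustrate the two styles; a real learned
filter would use probabilities (e.g., naive Bayes) rather than raw counts.

from collections import Counter

# Hard-coded filter: the programmer writes every rule explicitly.
RULES = ["free money", "act now", "winner"]  # hypothetical rule list

def is_spam_hardcoded(email: str) -> bool:
    # Deterministic: the same input always triggers the same rules.
    text = email.lower()
    return any(rule in text for rule in RULES)

# Learned filter: derive word/spam associations from labeled examples.
spam_examples = ["free money now", "claim your free prize"]  # hypothetical
ham_examples = ["meeting at noon", "project status update"]  # hypothetical

spam_counts = Counter(w for e in spam_examples for w in e.split())
ham_counts = Counter(w for e in ham_examples for w in e.split())

def is_spam_learned(email: str) -> bool:
    # Score each word by how often it appeared in spam minus ham;
    # the behavior comes from the data, not from hand-written rules.
    words = email.lower().split()
    return sum(spam_counts[w] - ham_counts[w] for w in words) > 0

print(is_spam_hardcoded("You won FREE MONEY"))  # True: matches a rule
print(is_spam_learned("claim your prize"))      # True: learned association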
2 Classification and Regression
2.1 Classification
Classification is a supervised learning task where the goal is to predict a
discrete class label.
Example: Predicting whether an email is spam or not spam is a binary
classification problem. Another example would be classifying images of
animals into categories like "cat," "dog," or "bird."
2.2 Regression
Regression is a supervised learning task where the goal is to predict a
continuous value.
Example: Predicting the price of a house based on features like its size,
number of bedrooms, and location is a regression problem. Another example
would be predicting the temperature tomorrow based on historical weather
data.
Figure 1: Classification vs regression
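As a minimal, runnable illustration of the two task types, the sketch below
uses made-up data and deliberately simple methods (a nearest-neighbour
classifier and a closed-form least-squares line); a real project would reach
for a proper library instead.

# Classification predicts a discrete label; regression predicts a number.

# --- Classification: 1-nearest-neighbour on (size_kb, num_links) ---
emails = [((1.0, 0.0), "ham"), ((5.0, 9.0), "spam"), ((1.2, 1.0), "ham")]

def classify(x):
    # Return the label of the closest training point (discrete output).
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(emails, key=lambda e: dist(e[0], x))[1]

# --- Regression: least-squares line for house price vs size ---
sizes = [50.0, 80.0, 120.0]      # square metres (hypothetical)
prices = [150.0, 230.0, 350.0]   # thousands (hypothetical)

n = len(sizes)
mean_x, mean_y = sum(sizes) / n, sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) \
        / sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

def predict_price(size):
    # Continuous output: any real number on the fitted line.
    return intercept + slope * size

print(classify((4.5, 8.0)))    # a class label: "spam"
print(predict_price(100.0))    # a number: roughly 291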
3 ROC and AUC
The ROC curve, together with the area under it (AUC), is a graph used to
check how well a binary classification model works. It helps us understand
how well the model separates the positive cases (such as people with a
disease) from the negative cases (people without the disease) at different
threshold levels. It shows how good the model is at telling the difference
between the two classes by plotting:
• True Positive Rate (TPR): how often the model correctly predicts the
positive cases; also known as Sensitivity or Recall.
• False Positive Rate (FPR): how often the model incorrectly predicts a
negative case as positive.
• Specificity: the proportion of actual negatives that the model correctly
identifies; it equals 1 − FPR.
3.1 ROC (Receiver Operating Characteristic)
ROC is a graphical plot that illustrates the diagnostic ability of a binary
classifier system as its discrimination threshold is varied. The ROC curve is
created by plotting the true positive rate (TPR) against the false positive
rate (FPR) at various threshold settings.
• True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$
• False Positive Rate: $\mathrm{FPR} = \frac{FP}{FP + TN}$
where TP is true positive, FN is false negative, FP is false positive, and
TN is true negative.
3.2 AUC (Area Under the Curve)
AUC represents the area under the ROC curve. It provides a single measure
of a classifier’s performance across all possible thresholds. An AUC of 1
indicates a perfect classifier, while an AUC of 0.5 suggests no discriminative
ability (equivalent to random guessing). The AUC can be interpreted as the
probability that a randomly chosen positive example is ranked higher than
a randomly chosen negative example.
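The definitions above translate directly into code. The sketch below traces
ROC points over all thresholds and computes AUC via its ranking
interpretation; the labels and scores are made up for the example.

# One (FPR, TPR) point per threshold: predict positive when score >= t.
labels = [1, 1, 1, 0, 1, 0, 0, 0]                    # 1 = positive
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.3, 0.1]   # model scores

P = sum(labels)
N = len(labels) - P

for t in sorted(set(scores), reverse=True):
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= t)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= t)
    print(f"threshold {t:.2f}: TPR = {tp / P:.2f}, FPR = {fp / N:.2f}")

# AUC as the probability that a random positive outranks a random
# negative (ties count half).
pos = [s for s, l in zip(scores, labels) if l == 1]
neg = [s for s, l in zip(scores, labels) if l == 0]
auc = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg) \
      / (len(pos) * len(neg))
print(f"AUC = {auc:.3f}")    # 0.938 for this toy data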
4 SSR and R²
4.1 SSR (Sum of Squared Residuals)
SSR (Sum of Squared Residuals) is the sum of the squared differences between
the observed values and the predicted values. It measures the amount of
variation in the dependent variable that is not explained by the regression
model.
$$\mathrm{SSR} = \sum_{i=1}^{n} (\mathrm{Observed}_i - \mathrm{Predicted}_i)^2$$
4.2 R Squared — Coefficient of Determination
R-squared is a statistical measure used in regression analysis. In
regression, we generally deal with dependent and independent variables: a
change in an independent variable is likely to cause a change in the
dependent variable.
The R-squared coefficient represents the proportion of variation in the
dependent variable (y) that is accounted for by the regression line,
relative to the variation around the mean of y. Essentially, it measures how
much better the regression line predicts each point's value than simply
using the average value of y.
R² is calculated as:

$$R^2 = 1 - \frac{\mathrm{SSR}(\mathrm{fitted\ line})}{\mathrm{SSR}(\mathrm{mean})}$$

where SSR(fitted line) is the sum of squared residuals around the regression
line (as defined in Section 4.1), and SSR(mean) is the sum of squared
differences between the observed values and their mean:

$$\mathrm{SSR}(\mathrm{mean}) = \sum_{i=1}^{n} (\mathrm{Observed}_i - \bar{y})^2$$
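A short numeric sketch of these two quantities; the observed and predicted
values are made up:

# Compute SSR around the fit, SSR around the mean, and R^2.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.1, 7.3, 8.9]   # e.g. from some fitted line

mean_y = sum(observed) / len(observed)
ssr_fit = sum((o - p) ** 2 for o, p in zip(observed, predicted))
ssr_mean = sum((o - mean_y) ** 2 for o in observed)

r_squared = 1 - ssr_fit / ssr_mean
print(f"SSR(fit)  = {ssr_fit:.2f}")    # 0.15
print(f"SSR(mean) = {ssr_mean:.2f}")   # 20.00
print(f"R^2 = {r_squared:.4f}")        # 0.9925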
5 Strengths and Weaknesses of Histograms for Probability Estimation
5.1 Strengths of Histograms
• Simple and intuitive: Histograms are easy to understand and interpret.
• Non-parametric: They don’t assume any specific distribution for the
data.
• Flexible: Can represent various shapes of distributions.
• Visual representation: Provides a clear visual picture of the data
distribution.
5.2 Weaknesses of Histograms
• Bin dependency: The shape of the histogram depends on the choice
of bin width and starting point.
• Discontinuity: The resulting estimate is discontinuous, which may
not reflect the true continuous nature of the underlying distribution.
• Sensitivity to sample size: With small sample sizes, histograms can
be noisy and unreliable.
• Inefficiency: They are not statistically efficient compared to other
density estimation methods like kernel density estimation.
6 Calculating Probability Using Histograms
6.1 How to Calculate the Probability of a Range of Values
To calculate the probability of a range of values using a histogram:
1. Divide the data into bins (intervals) of equal width.
2. Count the number of observations in each bin.
3. Divide the count in each bin by the total number of observations to get
the probability for each bin.
4. To find the probability of a range of values, sum the probabilities of all
bins that fall within that range (see the sketch after these steps).
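A minimal sketch of the four steps; the sample, bin width, and query range
are made up:

from collections import Counter

# Step 1-2: count observations per bin of equal width.
data = [1.2, 1.9, 2.3, 2.7, 3.1, 3.4, 3.8, 4.5, 4.9, 5.2]
bin_width = 1.0
lo = 1.0   # left edge of the first bin
counts = Counter(int((x - lo) // bin_width) for x in data)

# Step 3: convert counts to per-bin probabilities.
probs = {b: c / len(data) for b, c in counts.items()}

# Step 4: probability of a range = sum over the bins it covers.
def prob_range(a, b):
    first = int((a - lo) // bin_width)
    last = int((b - lo) // bin_width)   # a and b assumed to be bin edges
    return sum(probs.get(k, 0.0) for k in range(first, last))

print(prob_range(2.0, 4.0))   # 0.5: five of ten points lie in [2, 4)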
6.2 Assumptions for Histogram-based Probability Estimation
Assumptions that must hold true:
1. The data is independent and identically distributed (i.i.d.).
2. The bin width is appropriate for the data (not so wide that important
details are lost, not so narrow that the estimates become noisy).
3. The sample size is large enough to provide a reliable estimate of the
underlying distribution.
4. The underlying distribution is relatively smooth within each bin.
7 Consequences of Modeling Variables with Inappropriate Distributions
7.1 Modeling a Discrete Variable with a Continuous Distribution
• May result in probabilities for values that the variable cannot actually
take.
• Can lead to biased estimates and incorrect inferences.
• May produce unrealistic predictions (e.g., predicting 2.5 children when
the actual variable must be an integer).
• Can result in inefficient estimates, as the model is trying to estimate
parameters for a distribution that doesn’t match the nature of the data.
7.2 Modeling a Continuous Variable with a Discrete Distribution
• Loss of information due to discretization.
• Inability to capture the true variability of the continuous variable.
• May lead to biased estimates, especially if the discretization is coarse.
• Can result in less powerful statistical tests, as information is lost in the
discretization process.
8 Potential Pitfalls of R² When Evaluating Regression
• R² always increases (or stays the same) when more predictors are added
to the model, even if they are irrelevant. This can lead to overfitting
(see the sketch after this list).
• R² does not indicate whether the regression model is adequate. A high
R² does not necessarily mean the model is good.
• R² does not indicate whether the predictors are causally related to the
response.
• R² does not indicate the correctness of the regression model’s functional
form.
• R² can be artificially inflated by outliers or high-leverage points.
• R² does not provide information about the prediction error of the
model.
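The first pitfall can be demonstrated directly: adding pure-noise predictors
to a least-squares fit never lowers R². A small numpy sketch, with all data
randomly generated for illustration:

import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # y truly depends on x only

def r_squared(X, y):
    X = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

X = x.reshape(-1, 1)
print(f"1 predictor : R^2 = {r_squared(X, y):.4f}")
for k in range(5):
    X = np.column_stack([X, rng.normal(size=n)])   # pure noise column
    print(f"{k + 2} predictors: R^2 = {r_squared(X, y):.4f}")

Each noise column gives the fit extra freedom, so R² creeps upward even
though the model has not genuinely improved; adjusted R² or held-out
evaluation guards against this.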
9 Distinguishing Valid and Invalid Probability Distributions
9.1 Valid Probability Distribution
• All probabilities are non-negative: P(x) ≥ 0 for all x.
• The sum (for discrete distributions) or integral (for continuous
distributions) of all probabilities equals 1.
• For discrete distributions, each probability is between 0 and 1:
0 ≤ P(x) ≤ 1 for all x.
• For continuous distributions, the probability density function is
non-negative.
9.2 Invalid Probability Distribution
• Contains negative probabilities.
• The sum or integral of probabilities is not equal to 1.
• For discrete distributions, probabilities greater than 1.
• For continuous distributions, negative values of the probability density
function.
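These criteria are mechanical enough to check in code. A minimal sketch for
the discrete case; the example distributions are made up:

# Valid iff every probability is in [0, 1] and they sum to 1.
def is_valid_discrete(probs, tol=1e-9):
    return (all(0.0 <= p <= 1.0 for p in probs)
            and abs(sum(probs) - 1.0) <= tol)

print(is_valid_discrete([0.2, 0.3, 0.5]))    # True: valid
print(is_valid_discrete([0.6, 0.6, -0.2]))   # False: negative probability
print(is_valid_discrete([0.4, 0.4]))         # False: sums to 0.8, not 1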
10 Importance of Total Area Under Probability Distribution Curve
It’s important for the total area under a probability distribution curve to be
1 because:
1. It represents the total probability of all possible outcomes, which must
be 1 (or 100%).
2. It ensures that the distribution is properly normalized and can be used
to calculate valid probabilities.
3. It allows for meaningful interpretation of the area under the curve as
probabilities.
If the total area is not 1, it could indicate:
1. The distribution is not properly normalized.
2. There might be an error in the calculation or specification of the
distribution.
3. The function might not be a valid probability distribution.
4. For empirical distributions, it might indicate that the sample is not
representative of the population or that there are missing data points.
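As an illustration of normalization: a non-negative function whose total
area is not 1 is not a valid density, but dividing by that area fixes it.
A numeric sketch, with the function and interval made up:

# Normalize f(x) = x(2 - x) into a valid density on [0, 2].
f = lambda x: x * (2 - x)   # non-negative on [0, 2], but area != 1

dx = 0.001
xs = [i * dx for i in range(int(2 / dx) + 1)]
area = sum(f(x) * dx for x in xs)        # numeric integral over [0, 2]
print(f"raw area = {area:.4f}")          # ~1.3333: not a valid density

pdf = lambda x: f(x) / area              # rescale so the area is 1
check = sum(pdf(x) * dx for x in xs)
print(f"normalized area = {check:.4f}")  # ~1.0000: now valid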
11 Modeling Variables in Both Discrete and Continuous Distributions
Yes, a variable can sometimes be modeled in both discrete and continuous
distributions, depending on the context and the level of precision required.
11.1 Scenarios
• Time: Time can be modeled as discrete (e.g., number of hours) or
continuous (e.g., exact time in seconds).
• Age: Age can be modeled as discrete (number of years) or continuous
(exact age including fractions of years).
• Money: Monetary values can be modeled as discrete (number of cents)
or continuous (allowing for any real number).
• Measurements: Physical measurements like height or weight can be
discretized (e.g., to the nearest centimeter) or treated as continuous.
• Counts: For large counts, a continuous approximation (like using a
normal distribution to approximate a binomial distribution) can be
more convenient.
The choice between discrete and continuous modeling often depends on
the nature of the data, the precision required, and the mathematical
convenience for analysis.
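The counts scenario can be checked numerically: for a large number of
trials, the normal density closely tracks the binomial probabilities. A
short sketch, with parameters chosen just for illustration:

import math

# Normal approximation to Binomial(n = 100, p = 0.5) near its mean.
n, p = 100, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

for k in [45, 50, 55]:
    # Discrete pmf vs continuous density at the same point.
    print(f"k = {k}: binomial {binom_pmf(k):.4f} vs normal {normal_pdf(k):.4f}")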
12 How SSR Reflects Model Performance and Why Squaring Residuals is Useful
The sum of squared residuals (SSR) is a measure of the discrepancy between
the data and an estimation model. It is calculated by summing the squares
of the differences between the observed values and the predicted values:
$$\mathrm{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the observed value and $\hat{y}_i$ is the predicted value.
12.1 How SSR Reflects Model Performance
SSR reflects model performance by measuring the total amount of variation
in the dependent variable that is not explained by the model. A lower
SSR indicates a better fit, as it means the model’s predictions are closer to
the actual values. SSR is often used as a criterion for model selection and
parameter estimation (e.g., in least squares regression).
12.2 Why Squaring the Residual is Useful
Squaring the residuals is useful for several reasons:
1. It ensures all residuals are positive, preventing positive and negative
residuals from canceling each other out.
2. It gives more weight to larger errors, making the model more sensitive
to outliers and large discrepancies.
3. The squared function is differentiable, which makes it easier to find the
minimum using calculus-based optimization methods.
4. Minimizing the sum of squared residuals is equivalent to maximizing
the likelihood under the assumption of normally distributed errors.
5. The resulting estimator has desirable statistical properties, such as
being the Best Linear Unbiased Estimator (BLUE) under the Gauss-Markov
assumptions.
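A small sketch tying these points together: the least-squares line minimizes
SSR, the raw residuals of that line cancel to (near) zero, which is exactly
why they are squared, and any perturbed slope yields a larger SSR. The data
is made up:

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

def ssr(a, b):
    # Sum of squared residuals for the line y = a + b*x.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

raw = sum(y - (intercept + slope * x) for x, y in zip(xs, ys))
print(f"sum of raw residuals = {raw:.6f}")   # ~0: they cancel out
print(f"SSR at the fit    = {ssr(intercept, slope):.4f}")
print(f"SSR, slope + 0.2  = {ssr(intercept, slope + 0.2):.4f}")  # larger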