SRCASW, University of Delhi
"Fooled" by Statistics & ML
Dr. Tanujit Chakraborty
Ph.D. from Indian Statistical Institute, Kolkata, India.
Assistant Professor of Statistics at Sorbonne University
[email protected] | https://www.ctanujit.org
NORMALITY IS A MYTH!
CORRELATION DOES NOT IMPLY CAUSATION!
ALL MODELS ARE WRONG, BUT SOME ARE USEFUL!
NORMALITY IS A MYTH!
Normality & Beyond Normality
Normality is a paved road. It is easy to walk but no flowers
grow on it. — Vincent Van Gogh.
By Dr. Saul McLeod (2019)
A Few Famous Quotations
Normality is a myth; there never was, and there never will be a normal
distribution — Roy C. Geary (1947; Biometrika, vol. 34, 248).
Everybody believes in the exponential law of errors (the normal
distribution), the experimenters, because they think it can be proved by
mathematicians; and the mathematicians, because they believe that it has
been established by observations — E.T. Whittaker and G. Robinson
(1967).
... the statistician knows ... that in nature there never was a normal
distribution, there never was a straight line, yet with normal and linear
assumptions, known to be false, he can often derive results which match,
to a useful approximation, those found in the real world — George E. P. Box
(1976, Journal of the American Statistical Association, vol. 71, 791-799).
Normal Distribution
A random variable X is said to be normally distributed with mean µ and
variance σ², if the probability density function of X is the following
(for −∞ < µ < ∞ and σ > 0):

$$f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$
Probability Density Function of Normals
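As a quick sanity check, the density above can be coded directly; a minimal Python sketch (the function name is mine), verifying numerically that the density integrates to 1:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, straight from the formula above."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density should integrate to 1; check with a Riemann sum on [-10, 10]
step = 0.001
total = sum(normal_pdf(-10.0 + k * step) * step for k in range(20_000))
print(round(total, 4))  # ≈ 1.0
```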
Galton Board
• Sir Francis Galton, Charles
Darwin’s half-cousin, invented
the ’Galton Board’ in 1874 to
demonstrate that the normal
distribution is a natural
phenomenon.
• It specifically shows that the
binomial distribution
approximates a normal
distribution when the number of
trials is large enough.
Picture of Galton Board
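The board is easy to simulate: each ball takes a fixed number of independent left/right bounces, so its final bin count is Binomial(n_rows, 1/2), and the histogram of bins comes out bell-shaped. A Python sketch (all names are mine):

```python
import random
from collections import Counter

def galton_board(n_balls=50_000, n_rows=12, seed=0):
    """Simulate a Galton board: each ball bounces right with probability 1/2
    at each of n_rows pegs; its final bin is the number of right bounces."""
    rng = random.Random(seed)
    return Counter(sum(rng.random() < 0.5 for _ in range(n_rows))
                   for _ in range(n_balls))

bins = galton_board()
for k in range(13):
    print(f"{k:2d} {'#' * (bins[k] // 500)}")  # bell-shaped bar chart
```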
How Did It All Start?
Gambling Question: A 17th-century gambler, the Chevalier de Méré, asked
Pascal for an explanation of his unexpected losses in gambling.
The famous correspondence between Pascal and Fermat was instigated in
1654, and they were mainly interested in calculating the following
binomial sum:

$$\sum_{k=i}^{j} \binom{n}{k}\, p^k (1-p)^{n-k}$$

The problem is not difficult when n is small.
A Brief History
Within a few years a problem arose in a sociological study where the
following computation was necessary, with n = 11,429, i = 5745, j = 6128:

$$\sum_{k=i}^{j} \binom{n}{k}\, p^k (1-p)^{n-k}$$

Original Problem: test the hypothesis that male and female births are
equally likely against the actual births in London over the 82 years from
1629 to 1710. The relative number of male births varies from a low of
7765/15,448 = 0.5027 in 1703 to a high of 4748/8855 = 0.5362 in 1661.
Here 11,429 is the average number of births in London over the 82 years,
and 5745 and 6128 are the two limits.
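With modern software, the sum that defeated 18th-century calculators is immediate. A Python sketch evaluating the exact binomial probability in log space to avoid overflow (function name is mine); the exact sum comes out around 0.29, close to the historically reported ≈ 0.292:

```python
import math

def log_binom(n, k):
    """log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

n, i, j = 11_429, 5_745, 6_128
log_half = math.log(0.5)
# P(i <= X <= j) for X ~ Binomial(n, 1/2): sum each term exp(log C(n,k) + n log(1/2))
prob = sum(math.exp(log_binom(n, k) + n * log_half) for k in range(i, j + 1))
print(round(prob, 3))
```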
Solution
Using the recurrence relation

$$\binom{n}{x+1} = \binom{n}{x}\,\frac{n-x}{x+1}$$

and some involved rational approximation, it was obtained that

$$P(5745 \le X \le 6128 \mid p = 1/2) = \sum_{i=5745}^{6128} \binom{11429}{i} \left(\tfrac{1}{2}\right)^{11429} \approx 0.292.$$
The Breakthrough
De Moivre began the search for this approximation in 1721,
and in 1733 he proved that

$$\binom{n}{\frac{n}{2}+x}\frac{1}{2^n} \approx \frac{2}{\sqrt{2\pi n}}\, e^{-2x^2/n}$$

and

$$\sum_{|x-n/2|\le a} \binom{n}{x}\frac{1}{2^n} \approx \frac{4}{\sqrt{2\pi}} \int_0^{a/\sqrt{n}} e^{-2y^2}\, dy.$$
Normal Approximation
Eventually, using the second approximation, one gets

$$\sum_{k=i}^{j} \binom{n}{k}\, p^k (1-p)^{n-k} \approx \Phi\!\left(\frac{j-np}{\sqrt{np(1-p)}}\right) - \Phi\!\left(\frac{i-np}{\sqrt{np(1-p)}}\right),$$

where

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\, dx$$

is the cumulative distribution function (CDF) of the
standard normal distribution.
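The approximation is easy to check numerically: Φ is available through the error function as Φ(z) = (1 + erf(z/√2))/2. A Python sketch (function names are mine) applied to the 1733 birth-ratio problem:

```python
import math

def Phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_normal_approx(n, p, i, j):
    """Normal approximation to P(i <= X <= j) for X ~ Binomial(n, p)."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return Phi((j - mu) / sd) - Phi((i - mu) / sd)

# n = 11429, p = 1/2, limits 5745 and 6128 — the answer lands near 0.29
print(round(binom_normal_approx(11_429, 0.5, 5_745, 6_128), 3))
```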
Error Modeling
Gauss (1809) made the following assumptions and deduced the
normal distribution as an error distribution:
1 Small errors are more likely than large errors.
2 For any real number ε, errors of magnitudes ε and −ε are
equally likely.
3 Given several measurements of the same quantity, the most
likely value of the quantity being measured is their average.
To read more about the evolution of normal distribution: Saul Stahl (2006), “The
evolution of normal distribution”, Mathematics Magazine, vol. 79, no. 2, 96 - 113.
Central Limit Theorem
Lindeberg-Lévy CLT:
Suppose {X₁, X₂, . . .} is a sequence of independent, identically
distributed random variables with mean µ and variance σ² < ∞.
Then, as n → ∞,

$$\frac{\sqrt{n}}{\sigma}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \xrightarrow{d} N(0,1).$$
CLT in Practice
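The statement is easy to see empirically: standardize the mean of n non-normal draws and the resulting values look approximately N(0, 1). A Python sketch using Exp(1) draws, which have mean 1 and variance 1 (function names are mine):

```python
import math
import random
import statistics

def standardized_mean(n, rng):
    """sqrt(n)/sigma * (sample mean - mu) for n Exp(1) draws (mu = sigma = 1)."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    return math.sqrt(n) * (statistics.fmean(xs) - 1.0)

rng = random.Random(42)
zs = [standardized_mean(200, rng) for _ in range(5_000)]

# Despite the heavily skewed Exp(1) parent, these are close to N(0, 1):
print(round(statistics.fmean(zs), 2), round(statistics.stdev(zs), 2))
```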
Some Drawbacks
What will happen if the data indicate that the parent
distribution
1 is not symmetric?
2 is heavy-tailed?
3 is not unimodal?
What will happen if the error distribution is not normal in
regression modeling?
Possible Remedy
In Distribution Theory:
1 Skew Normal Distribution (A Azzalini, Scandinavian
Journal of Statistics 1985)
2 Power Normal Distribution (RD Gupta, Test 2008)
3 Geometric Skew-Normal Distribution (D Kundu, Sankhya
2014), etc.
In Regression Theory:
1 Box-Cox Transformation (Box, Cox, JRSS Series-B 1964)
2 Generalized linear model (Nelder, Wedderburn, JRSS
Series-A 1972)
3 Semiparametric and Nonparametric Approaches (see
ESLR/ISLR Book), etc.
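As an illustration of the first regression-side remedy, the Box-Cox transform y(λ) = (y^λ − 1)/λ (with log y at λ = 0) can pull a right-skewed sample back toward symmetry. A stdlib-only Python sketch on log-normal data, where λ = 0 is the correct choice (in practice λ is estimated from the data, e.g. with scipy.stats.boxcox; all function names here are mine):

```python
import math
import random
import statistics

def boxcox(y, lam):
    """Box-Cox power transform: (y^lam - 1)/lam, or log(y) when lam == 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def skewness(xs):
    """Sample skewness: mean of standardized cubes."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(0)
data = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(10_000)]  # log-normal: right-skewed

print(round(skewness(data), 2))                           # strongly positive
print(round(skewness([boxcox(y, 0) for y in data]), 2))   # near 0 after transforming
```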
CORRELATION DOES NOT IMPLY CAUSATION!
Correlation
Correlation may indicate any type of association. Correlation implies
association, but not causation. Conversely, causation implies
association, but not correlation.¹

¹ Altman & Krzywinski, "Association, correlation and causation", Nature Methods (2015).
Causality: What is it?
Causality is a central notion in science, decision-making, and daily life.
Causal inference ≈ causal language/model + statistical inference.
Question: How do you define cause and effect?
Causality in Philosophy
“. . . Thus we remember to have seen that species of
object we call flame, and to have felt that species of
sensation we call heat. We likewise call to mind
their constant conjunction in all past instances.
Without any farther ceremony, we call the one
cause and the other effect, and infer the existence
of the one from that of the other.”
- David Hume, A Treatise of Human Nature
(1738).
But: Does the stork really
bring babies?
Causality in Statistics
A Paradigm Shift: Basic Contributions
• The modeling of the underlying structures provides a language to
encode causal relationships – the basis of a causality theory.
• Causality theory helps to decide when, and how, causation can be
inferred from domain knowledge and data.
ML techniques are impacting our life
A day in our life with Machine Learning techniques...
Now we are stepping into risk-sensitive areas
Shifting from Performance Driven to Risk Sensitive...
Problems of today’s ML - Explainability
Most machine learning models are black-box models...
Problems of today’s ML - Stability
Most ML methods are developed under the i.i.d. hypothesis...
A plausible reason: Correlation
Correlation is the very basis of machine learning...
Correlation is not explainable
Correlation is "unstable"
Correlation vs. Causation
It’s not the fault of correlation, but the way we use it...
A Practical Definition of Causality
Benefits of bringing causality into learning
More Explainable and More Stable...
Causality everywhere
Correlation does not imply causation
“Correlation = Causation” is a cognitive bias
Then, what does imply causation?
Source: https://www.bradyneal.com/causal-inference-course
Languages for Causality
Using graphs:
• 1921 Wright (genetics);
• 1988 Pearl (computer science “AI”);
• 1993 Spirtes, Glymour, Scheines (philosophy);
• 2000 Pearl (computer science).

Using structural equations:
• 1921 Wright (genetics);
• 1943 Haavelmo (econometrics);
• 1975 Duncan (social sciences).

Using potential outcomes / counterfactuals:
• 1923 Neyman (statistics);
• 1973 Lewis (philosophy);
• 1974 Rubin (statistics);
• 1986 Robins (epidemiology).
Reference: The Book of Why: The New Science of Cause and Effect by Judea Pearl and
Dana Mackenzie (2019).
ALL MODELS ARE WRONG, BUT SOME ARE USEFUL!
The Science of Forecasting
Forecasting is estimating how the sequence of observations will continue into
the future. Whether it is the rise/fall in exchange rates, the outcome of
elections, or winners at the Oscars, there is sure to be something you want to
know.
Random futures
Mathematical/Statistical models are simplifications of reality – and life is
sometimes too complex to model accurately.
Which is easiest to forecast? (Easy to Tough)
1 Time of sunrise this day next year.
2 Maximum temperature tomorrow.
3 Daily electricity demand in 3 days' time.
4 Google stock price tomorrow.
5 Exchange rate of USD/INR next week.
How do we measure “easiest”?
What makes something easy/difficult to forecast?
Forecastability factors
Something is easier to forecast if:
• We have a good understanding of the factors that contribute to it, and
can measure them (for stock prices and exchange rates the causes are
mostly unknown).
• There is lots of data available.
• The future is somewhat similar to the past.
• The forecasts cannot affect the thing we are trying to forecast (say,
Warren Buffett, CEO of Berkshire Hathaway, makes a comment and the
stock price changes!).
• When should we give up? When there is insufficient data? When the
models give implausible forecasts?
Various Forecasting Models
A recently published survey paper: Nowcasting of COVID-19 confirmed cases:
Foundations, trends, and challenges (Chakraborty et al., Modelling, Control and Drug
Development for COVID-19 Outbreak Prevention, 2021)
Forecasts can go very wrong
“Prediction is very difficult, especially if it’s about the future!”
- Niels Bohr, Danish Physicist & Nobel laureate in Physics.
Textbook and References