Lecture 07

The document discusses improvements to the Benjamini-Hochberg procedure for controlling the false discovery rate. It describes viewing the procedure through the lens of empirical processes and introduces a martingale theory proof that it controls the FDR. The document also proposes using the distribution of p-values to improve the FDR estimation compared to the original BH procedure.

Uploaded by

Chirayata Kushari


STATS 300C: Theory of Statistics Spring 2023

Lecture 7 — April 24
Lecturer: Prof. Emmanuel Candès Scribe: Aditya Ghosh

Warning: These notes may contain factual and/or typographic errors. They are
based on Emmanuel Candès’s course from 2018 to 2023, and scribe notes written by
Julie Zhang and Amber Hu.

Outline
Agenda: False Discovery Rate.

1. Empirical Process viewpoint of BH procedure.

2. Martingale Theory and FDR control.

3. Improving on the BH procedure.

Much of the material in this lecture is taken from ?.

7.1 The Empirical Process Viewpoint of BH

In previous lectures, we discussed the Benjamini-Hochberg (BH) procedure (?) by plotting the
sorted p-values on the y-axis against their indices on the x-axis and checking whether they fall
below a critical line. An alternative way to view BH is to flip the axes and put the sorted
p-values on the x-axis. This is illustrated in the following figure.

[Figure 7.1: Sorted p-values and BH threshold line. (a) p-values on the y-axis, indices on the x-axis, with the critical line $\alpha i/n$ and the cutoff index $i_0/n$. (b) p-values on the x-axis, indices on the y-axis, with the empirical CDF $\hat F_n(t)$, the line $t/\alpha$, and the cutoff $t_0$. The figure writes $\alpha$ for the level called $q$ in the text.]

This alternative view allows us to describe the BH procedure in terms of an empirical process.
The coordinates on the y-axis of Fig. 7.1b are the values of the empirical CDF of the p-values,
defined as
\[ \hat F_n(t) := \frac{\#\{i : p_i \le t\}}{n}. \]
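Denote by $p_{(i)}$ the $i$th smallest p-value, so that $p_{(1)} \le \cdots \le p_{(n)}$. Let $H_{(i)}$ be the hypothesis
corresponding to $p_{(i)}$. The BH procedure rejects $H_{(1)}, \ldots, H_{(i_0)}$, where
\[ i_0 := \max\left\{ i : p_{(i)} \le \frac{qi}{n} \right\}. \]
The critical p-value is $p^* = p_{(i_0)}$ and can be written as
\begin{align*}
p^* &= \max\left\{ p_{(i)} : p_{(i)} \le q\,\frac{i}{n} \right\} \\
    &= \max\left\{ p_{(i)} : p_{(i)} \le q\,\hat F_n(p_{(i)}) \right\} \\
    &= \max\left\{ t \in \{p_1, \ldots, p_n\} : t \le q\,\hat F_n(t) \right\}.
\end{align*}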

If the above set is empty, we set $p^* = q/n$ by convention. The above calculation combined
with Fig. 7.1 gives us the following viewpoint of the BH procedure.

The BH procedure is equivalent to rejecting all hypotheses with $p_i \le \tau_{\mathrm{BH}}$, where
\[ \tau_{\mathrm{BH}} := \max\left\{ t : \frac{t}{\hat F_n(t) \vee 1/n} \le q \right\}. \]
Note that we can equivalently write
\[ \tau_{\mathrm{BH}} = \max\left\{ t : \hat F_n(t) \vee \frac{1}{n} \ge \frac{t}{q} \right\}, \]
which directly leads to the interpretation of BH provided in Fig. 7.1. Notice that $\tau_{\mathrm{BH}} \ge q/n$.
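The two descriptions of BH can be compared numerically. The sketch below is only an illustration: the function names and the example p-values are assumptions, not part of the lecture. It uses the observation that on each interval where the empirical CDF equals $k/n$ the largest admissible $t$ is $qk/n$, so that $\tau_{\mathrm{BH}} = q \max(i_0, 1)/n$.

```python
def ecdf(pvals, t):
    """Empirical CDF of the p-values at t: #{i : p_i <= t} / n."""
    return sum(p <= t for p in pvals) / len(pvals)


def bh_step_up_index(pvals, q):
    """i0 = max{i : p_(i) <= q*i/n}, or 0 if no index qualifies."""
    n = len(pvals)
    i0 = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= q * i / n:
            i0 = i
    return i0


def bh_threshold(pvals, q):
    """tau_BH = max{t : t <= q * max(F_n(t), 1/n)}.

    Where the ECDF equals k/n, the constraint reads t <= q*k/n, so the
    maximum is q*i0/n, or q/n when i0 = 0.
    """
    n = len(pvals)
    return q * max(bh_step_up_index(pvals, q), 1) / n


# Hypothetical p-values chosen for illustration only.
pvals = [0.001, 0.002, 0.01, 0.02, 0.03, 0.05, 0.065, 0.09, 0.6, 0.9]
q = 0.1
i0 = bh_step_up_index(pvals, q)
tau = bh_threshold(pvals, q)
rejected_by_rank = sorted(pvals)[:i0]
rejected_by_threshold = sorted(p for p in pvals if p <= tau)
print(i0, rejected_by_rank == rejected_by_threshold)
```

In this example both rules reject the same seven hypotheses, even though the threshold $\tau_{\mathrm{BH}}$ is not itself one of the p-values.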

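This formulation has a simple interpretation. Let $t \in (0, 1)$ be fixed and consider rejecting all
$H_i$ with p-values $p_i \le t$. We can construct the rejection/acceptance table for the hypotheses,
whose entries depend on $t$.

\[
\begin{array}{l|cc|c}
 & H_0 \text{ accepted} & H_0 \text{ rejected} & \text{Total} \\ \hline
H_0 \text{ true} & U(t) & V(t) & n_0 \\
H_0 \text{ false} & T(t) & S(t) & n - n_0 = n_1 \\ \hline
\text{Total} & n - R(t) & R(t) & n
\end{array}
\]

We define
\[ \mathrm{FDP}(t) := \frac{V(t)}{R(t) \vee 1}, \qquad \mathrm{FDR}(t) := \mathbb{E}[\mathrm{FDP}(t)]. \]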
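As a small illustration of the table, the following sketch computes its entries and the FDP at a fixed threshold. It assumes we know which hypotheses are truly null, which is only possible in a simulation; the function names and data are hypothetical.

```python
def rejection_table(pvals, is_null, t):
    """Counts U, V, T, S, R when every hypothesis with p_i <= t is rejected."""
    V = sum(1 for p, h0 in zip(pvals, is_null) if h0 and p <= t)
    S = sum(1 for p, h0 in zip(pvals, is_null) if not h0 and p <= t)
    n0 = sum(is_null)          # number of true nulls
    n1 = len(pvals) - n0       # number of non-nulls
    return {"U": n0 - V, "V": V, "T": n1 - S, "S": S, "R": V + S}


def fdp(pvals, is_null, t):
    """False discovery proportion V(t) / max(R(t), 1)."""
    table = rejection_table(pvals, is_null, t)
    return table["V"] / max(table["R"], 1)


# Hypothetical data: three non-nulls and three nulls.
pvals = [0.01, 0.02, 0.03, 0.2, 0.6, 0.04]
is_null = [False, False, True, True, True, False]
print(rejection_table(pvals, is_null, 0.05), fdp(pvals, is_null, 0.05))
```

At $t = 0.05$ there are four rejections, one of them a true null, so the FDP is $1/4$.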


The idea is to choose the threshold $t$ as large as possible while controlling the FDR at level
$q$. If we had an estimate $\widehat{\mathrm{FDR}}(t)$ of $\mathrm{FDR}(t)$, we could define a threshold $\tau$ as
\[ \tau := \sup\{ t \le 1 : \widehat{\mathrm{FDR}}(t) \le q \}, \]
and reject $H_i$ if and only if $p_i \le \tau$, where $\tau$ is a data-dependent
threshold. This method defines the most liberal thresholding cutoff that controls $\widehat{\mathrm{FDR}}(t)$,
and we hope that it will also control the true FDR at level $q$. The first question is how
to estimate $\mathrm{FDR}(t)$.
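Assuming $p_i \overset{\text{iid}}{\sim} \mathrm{Unif}(0, 1)$ under $H_0$, we have $\mathbb{E}[V(t)] = n_0 t$, but $n_0$ is not known. A
conservative estimate of $n_0 t$ is $nt$, which leads to our first estimate
\[ \widehat{\mathrm{FDR}}(t) := \frac{nt}{R(t) \vee 1} = \frac{t}{\hat F_n(t) \vee 1/n}. \tag{7.1} \]
This leads us to exactly the BH procedure since
\[ \tau = \sup\{ t \le 1 : \widehat{\mathrm{FDR}}(t) \le q \} = \sup\left\{ t \le 1 : \frac{nt}{R(t) \vee 1} \le q \right\} = \tau_{\mathrm{BH}}. \]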
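The supremum can be computed exactly on a small example. The sketch below is an illustration under assumptions of its own: it uses exact rational arithmetic, takes $0 < q \le 1$, and searches only the candidates $t = qk/n$, which suffices because $R(t)$ is a step function and the constraint on each step is $t \le qk/n$. The names and data are hypothetical.

```python
from fractions import Fraction


def fdr_hat(pvals, t):
    """Estimate (7.1): n*t / max(R(t), 1), with R(t) = #{i : p_i <= t}."""
    n = len(pvals)
    r = sum(p <= t for p in pvals)
    return n * t / max(r, 1)


def sup_threshold(pvals, q):
    """sup{t <= 1 : fdr_hat(t) <= q}, assuming 0 < q <= 1.

    Only t = q*k/n for k = 1..n can be the supremum; t = q/n is always
    admissible, so the candidate set is never empty.
    """
    n = len(pvals)
    candidates = [q * k / n for k in range(1, n + 1)]
    return max(t for t in candidates if fdr_hat(pvals, t) <= q)


# Hypothetical p-values, stored as exact fractions.
pvals = [Fraction(x, 1000) for x in (1, 2, 10, 20, 30, 50, 65, 90, 600, 900)]
q = Fraction(1, 10)
tau = sup_threshold(pvals, q)
print(tau, fdr_hat(pvals, tau))  # prints: 7/100 1/10
```

The threshold equals $q \cdot 7/n$, matching the BH step-up index $i_0 = 7$ for these p-values, and the estimate at $\tau$ is exactly $q$.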

The following theorem shows that $\widehat{\mathrm{FDR}}(t)$ is a conservatively biased estimate. This is
good, because it suggests that a procedure that controls $\widehat{\mathrm{FDR}}(t)$ also controls $\mathrm{FDR}(t)$ at level $q$. The proof
of this result is relegated to Section A.1; it mainly uses Jensen’s inequality.

Theorem 1. Under independence of p-values, this FDR estimate is biased upwards:
\[ \mathbb{E}[\widehat{\mathrm{FDR}}(t)] \ge \mathrm{FDR}(t). \]
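Theorem 1 can be checked exactly in a simple model. The sketch below assumes $V(t) \sim \mathrm{Bin}(n_0, t)$ and, as a purely hypothetical alternative, $S(t) \sim \mathrm{Bin}(n_1, a)$ independent of $V(t)$, where `a` stands for the probability that a non-null p-value falls below $t$. It then sums over all outcomes.

```python
from math import comb


def binom_pmf(k, m, p):
    """P(X = k) for X ~ Bin(m, p)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def expected_estimate_and_fdr(n0, n1, t, a):
    """Return (E[n*t / max(R,1)], E[V / max(R,1)]) with R = V + S.

    V ~ Bin(n0, t) counts null p-values below t; S ~ Bin(n1, a) is an
    assumed model for the non-nulls, independent of V.
    """
    n = n0 + n1
    exp_fdr_hat = 0.0
    fdr = 0.0
    for v in range(n0 + 1):
        for s in range(n1 + 1):
            prob = binom_pmf(v, n0, t) * binom_pmf(s, n1, a)
            r = max(v + s, 1)
            exp_fdr_hat += prob * n * t / r
            fdr += prob * v / r
    return exp_fdr_hat, fdr


est, fdr = expected_estimate_and_fdr(n0=8, n1=2, t=0.05, a=0.7)
print(est >= fdr)
```

Varying `t` and `a` gives a quick feel for how large the upward bias is in different regimes.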

7.2 Martingale Theory and FDR Control

Using the Empirical Process viewpoint discussed in Section 7.1 and Martingale theory, we
can give an alternate proof of the fact that the BH procedure controls the FDR. Note that
the FDR for BH is the same as $\mathbb{E}[\mathrm{FDP}(\tau_{\mathrm{BH}})]$.

Theorem 2 (?). The procedure rejecting all hypotheses with $p_i \le \tau_{\mathrm{BH}}$ controls the
FDR:
\[ \mathbb{E}[\mathrm{FDP}(\tau_{\mathrm{BH}})] = q n_0 / n. \]

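Proof. Note that $\mathcal{F}_t = \sigma(V(s), R(s) : t \le s \le 1)$ is a backwards filtration: for $t_1 < t_2$,
$\mathcal{F}_{t_2} \subset \mathcal{F}_{t_1}$. We now show that $\{V(t)/t\}_{t \in (0, 1]}$ is a backwards/reverse martingale. For $s \le t$,
we have
\[ \mathbb{E}\left[ \frac{V(s)}{s} \,\middle|\, \mathcal{F}_t \right] = \frac{1}{s} \cdot \frac{s}{t}\, V(t) = \frac{V(t)}{t}, \]
where the first equality follows from the fact that, given $\mathcal{F}_t$, $V(t) = \#\{p_i : p_i \le t, H_i \text{ null}\}$
and these $p_i$ are independent and distributed as $U[0, t]$. Each of them falls below $s$ with
probability $s/t$, so on average $(s/t)V(t)$ of them contribute to $V(s)$.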
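The conditional expectation in this step can be verified exactly. The sketch below assumes, as in the proof, that given $V(t) = v$ the count $V(s)$ is $\mathrm{Bin}(v, s/t)$; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def cond_mean_ratio(v, s, t):
    """E[V(s)/s | V(t) = v] when V(s) ~ Bin(v, s/t) given V(t) = v."""
    p = s / t
    mean_vs = sum(k * comb(v, k) * p**k * (1 - p) ** (v - k) for k in range(v + 1))
    return mean_vs / s


s, t, v = Fraction(1, 10), Fraction(2, 5), 6
print(cond_mean_ratio(v, s, t) == v / t)  # prints: True
```

Here both sides equal 15, which is $V(t)/t$ with $v = 6$ and $t = 2/5$.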


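Next, write $\tau = \tau_{\mathrm{BH}}$ for brevity. Note that knowing $V(s)$ and $R(s) = n\hat F_n(s)$ for $s \ge t$
determines whether $\tau \le t$. Hence $\{\tau \le t\} \in \mathcal{F}_t$, and thus $\tau$ is a stopping time w.r.t. $\{\mathcal{F}_t\}$.

We are now ready to apply Doob’s Optional Stopping Theorem (after all, that’s why we
brought in martingales!). By definition, $R(\tau) \vee 1 = n\tau/q$. Therefore,
\[ \mathbb{E}[\mathrm{FDP}(\tau)] = \mathbb{E}\left[ \frac{V(\tau)}{R(\tau) \vee 1} \right] = \frac{q}{n}\, \mathbb{E}\left[ \frac{V(\tau)}{\tau} \right] = \frac{q}{n}\, \mathbb{E}\left[ \frac{V(1)}{1} \right], \]
by Doob’s OST. Since $V(1) = n_0$, this completes the proof.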
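The identity $\mathbb{E}[\mathrm{FDP}(\tau_{\mathrm{BH}})] = q n_0/n$ can be checked by simulation. The sketch below is a rough Monte Carlo check, not a proof. The non-null distribution (a uniform variable raised to the fourth power) and all parameter values are arbitrary assumptions, since the theorem does not depend on them.

```python
import random


def bh_fdp(pvals, is_null, q):
    """FDP of the BH step-up procedure at level q."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    i0 = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= q * rank / n:
            i0 = rank
    false_rejections = sum(is_null[idx] for idx in order[:i0])
    return false_rejections / max(i0, 1)


def simulate_bh_fdr(n0, n1, q, reps, seed=0):
    """Average FDP over independent replications."""
    rng = random.Random(seed)
    is_null = [True] * n0 + [False] * n1
    total = 0.0
    for _ in range(reps):
        nulls = [rng.random() for _ in range(n0)]
        # Assumed alternative: U**4 is skewed towards 0 (illustration only).
        nonnulls = [rng.random() ** 4 for _ in range(n1)]
        total += bh_fdp(nulls + nonnulls, is_null, q)
    return total / reps


estimate = simulate_bh_fdr(n0=15, n1=5, q=0.2, reps=20000)
print(estimate, 0.2 * 15 / 20)
```

With these settings the simulated average should land close to $q n_0/n = 0.15$, up to Monte Carlo error.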

7.3 Improving on the BH procedure

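In our estimate $\widehat{\mathrm{FDR}}(t)$ in (7.1), we used the simple conservative bound $\pi_0 = n_0/n \le 1$.
Here, we consider using the distribution of p-values to improve this estimate. Fix $\lambda \in [0, 1)$
and define
\[ \hat\pi_0^\lambda := \frac{n - R(\lambda)}{(1 - \lambda) n}. \]
We will usually take $\lambda = 1/2$, while $\lambda = 0$ recovers the BH procedure. The motivation for
this estimate is the following:
\[ \hat\pi_0^\lambda = \frac{n_0 - V(\lambda) + n_1 - S(\lambda)}{(1 - \lambda) n}. \]
We expect the non-null p-values to be small: $n_1 - S(\lambda) \ll n$. We also expect $V(\lambda) \approx \lambda n_0$.
Thus,
\[ \hat\pi_0^\lambda \approx \frac{n_0 - V(\lambda)}{(1 - \lambda) n} \approx \frac{n_0 - \lambda n_0}{(1 - \lambda) n} = \frac{n_0}{n}. \]
Also note that, since $\mathbb{E}[R(\lambda)] = \mathbb{E}[S(\lambda)] + n_0 \lambda \le n_1 + n_0 \lambda$,
\[ \mathbb{E}[\hat\pi_0^\lambda] = \frac{n - \mathbb{E}[R(\lambda)]}{(1 - \lambda) n} \ge \frac{n - n_1 - n_0 \lambda}{(1 - \lambda) n} = \frac{n_0}{n} = \pi_0. \]
Thus, in general, $\hat\pi_0^\lambda$ is a conservatively biased estimate of $\pi_0$. Our new estimate for the false
discovery rate is
\[ \widehat{\mathrm{FDR}}^\lambda(t) := \hat\pi_0^\lambda \cdot \frac{nt}{R(t) \vee 1}, \]
and the natural test would be to reject all hypotheses with p-values $p_i \le \tau$, where
\[ \tau := \sup\{ t \le 1 : \widehat{\mathrm{FDR}}^\lambda(t) \le q \}. \]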

In cases where $\hat\pi_0^\lambda$ is smaller than 1, say 0.8, this may give us more powerful results than BH,
because the estimate accounts for the significant proportion of non-nulls.
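A minimal sketch of this adaptive rule follows. It relies on the observation that thresholding at $\sup\{t \le 1 : \widehat{\mathrm{FDR}}^\lambda(t) \le q\}$ rejects the same hypotheses as the step-up rule $p_{(i)} \le q i / (\hat\pi_0^\lambda n)$, which follows from the same step-function argument as for BH. The function names and p-values are hypothetical.

```python
def pi0_hat(pvals, lam=0.5):
    """Estimate of the null proportion: (n - R(lam)) / ((1 - lam) * n)."""
    n = len(pvals)
    r_lam = sum(p <= lam for p in pvals)
    return (n - r_lam) / ((1 - lam) * n)


def adaptive_rejections(pvals, q, lam=0.5):
    """Number of rejections: largest i with pi0_hat * n * p_(i) <= q * i."""
    n = len(pvals)
    pi0 = pi0_hat(pvals, lam)
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if pi0 * n * p <= q * i:
            k = i
    return k


# Hypothetical p-values: eight small ones and two large ones.
pvals = [0.001, 0.002, 0.01, 0.02, 0.03, 0.05, 0.065, 0.09, 0.6, 0.9]
print(pi0_hat(pvals), adaptive_rejections(pvals, 0.1),
      adaptive_rejections(pvals, 0.1, lam=0.0))
```

On this data the estimate is about 0.4, and the adaptive rule makes eight rejections, compared with seven for plain BH (the `lam=0.0` call).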

There are several drawbacks to this approach. One drawback is that we may have $\hat\pi_0^\lambda > 1$,
in which case we are even more conservative than BH. More importantly, the
threshold $\tau$ may not even control the FDR at level $q$. To resolve this issue of FDR control,
we introduce a modified version called Storey’s procedure in the next section.


7.4 Storey’s Procedure

Storey’s procedure involves a simple modification of the estimate of $\pi_0$ defined in the
previous section. Define
\[ \hat\pi_0 := \frac{1 + n - R(1/2)}{n/2}. \]
The only difference between $\hat\pi_0$ and $\hat\pi_0^{1/2}$ is the added 1 in the numerator. Our test now
rejects all hypotheses with p-values $p_i \le \tau$, where
\[ \tau := \sup\left\{ t \le \frac{1}{2} : \widehat{\mathrm{FDR}}(t) = \frac{1 + n - R(1/2)}{n/2} \cdot \frac{nt}{R(t) \vee 1} \le q \right\}. \]
Notice that we only take the supremum over $t \le 1/2$, which is necessary because the estimate
of $\pi_0$ uses the information in the p-values $> 1/2$.
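The procedure just defined can be sketched as follows. This is an illustration under the assumption, obtained by the same step-function argument as for BH, that thresholding at $\tau$ is equivalent to a step-up rule restricted to p-values at most $1/2$. The function names and p-values are hypothetical.

```python
def storey_pi0(pvals):
    """Storey's estimate (1 + n - R(1/2)) / (n / 2)."""
    n = len(pvals)
    r_half = sum(p <= 0.5 for p in pvals)
    return (1 + n - r_half) / (n / 2)


def storey_rejections(pvals, q):
    """P-values rejected by Storey's procedure at level q.

    Rejects the k smallest p-values, where k is the largest i with
    p_(i) <= 1/2 and pi0_hat * n * p_(i) <= q * i.
    """
    n = len(pvals)
    pi0 = storey_pi0(pvals)
    ordered = sorted(pvals)
    k = 0
    for i, p in enumerate(ordered, start=1):
        if p <= 0.5 and pi0 * n * p <= q * i:
            k = i
    return ordered[:k]


# Hypothetical p-values: eight small ones and two large ones.
pvals = [0.001, 0.002, 0.01, 0.02, 0.03, 0.05, 0.065, 0.09, 0.6, 0.9]
rejected = storey_rejections(pvals, 0.1)
print(storey_pi0(pvals), len(rejected))
```

Here $R(1/2) = 8$, so $\hat\pi_0 = 3/5$, and the eight p-values below $1/2$ are all rejected at $q = 0.1$.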
Theorem 3. Storey’s procedure controls FDR at level q.

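Proof. We use martingale theory similar to what we did in the proof of Theorem 2. It follows
from the definition that $\widehat{\mathrm{FDR}}(\tau) = q$. Now, the FDR for Storey’s procedure is given by
\begin{align*}
\mathbb{E}[\mathrm{FDP}(\tau)] &= \mathbb{E}\left[ \frac{V(\tau)}{R(\tau) \vee 1} \right] \\
&= \mathbb{E}\left[ \frac{V(\tau)}{n\tau} \cdot \frac{n\tau}{R(\tau) \vee 1} \cdot \frac{1 + n - R(1/2)}{n/2} \cdot \frac{n/2}{1 + n - R(1/2)} \right] \\
&= \mathbb{E}\left[ \widehat{\mathrm{FDR}}(\tau) \cdot \frac{V(\tau)}{n\tau} \cdot \frac{n/2}{1 + n - R(1/2)} \right] \\
&= q \cdot \mathbb{E}\left[ \frac{V(\tau)}{\tau} \cdot \frac{1/2}{1 + n - R(1/2)} \right]. \tag{7.2}
\end{align*}
Applying Doob’s Optional Stopping Theorem to the martingale $\{V(t)/t : t \in (0, 1/2]\}$ and
stopping time $\tau$, we have
\begin{align*}
\mathbb{E}[\mathrm{FDP}(\tau)] &= q \cdot \mathbb{E}\left[ \frac{V(1/2)}{1/2} \cdot \frac{1/2}{1 + n - R(1/2)} \right] \\
&= q \cdot \mathbb{E}\left[ \frac{V(1/2)}{1 + n - S(1/2) - V(1/2)} \right] \\
&\le q \cdot \mathbb{E}\left[ \frac{V(1/2)}{1 + n_0 - V(1/2)} \right] \qquad (\text{since } n_1 - S(1/2) \ge 0).
\end{align*}
Now $V(1/2) \sim \mathrm{Bin}(n_0, 1/2)$, so the last expectation above is given by
\[ \sum_{i=1}^{n_0} \frac{i}{1 + n_0 - i} \binom{n_0}{i} 2^{-n_0} = 2^{-n_0} \sum_{i=1}^{n_0} \binom{n_0}{i - 1} = 2^{-n_0} (2^{n_0} - 1) = 1 - 2^{-n_0} < 1. \]
This implies that $\mathbb{E}[\mathrm{FDP}(\tau)] \le q$, as desired.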

In fact, (7.2) shows why we need the extra +1 in the numerator of π̂0 : otherwise we
would have 0 in the denominator of (7.2) with positive probability!
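The binomial sum at the end of the proof can be confirmed with exact arithmetic. The sketch below is only a sanity check of the identity $\mathbb{E}[V/(1 + n_0 - V)] = 1 - 2^{-n_0}$ for $V \sim \mathrm{Bin}(n_0, 1/2)$; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def storey_expectation(n0):
    """E[V / (1 + n0 - V)] for V ~ Bin(n0, 1/2), computed exactly."""
    total = sum(Fraction(i, 1 + n0 - i) * comb(n0, i) for i in range(n0 + 1))
    return total / 2**n0


for n0 in (1, 2, 5, 10):
    print(n0, storey_expectation(n0) == 1 - Fraction(1, 2**n0))
```

Each term uses the identity $\frac{i}{1 + n_0 - i}\binom{n_0}{i} = \binom{n_0}{i-1}$, which is what collapses the sum in the proof. Without the $+1$ the term $i = n_0$ would divide by zero.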


A Appendix
A.1 Proof of Theorem 1
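We divide this proof into two cases: $S(t) \ge 1$ and $S(t) = 0$. First suppose $S(t) \ge 1$. Then,
\begin{align*}
\mathbb{E}[\widehat{\mathrm{FDR}}(t) - \mathrm{FDR}(t)] &= \mathbb{E}\left[ \frac{nt - V(t)}{S(t) + V(t)} \right] \\
&= \mathbb{E}\left[ \frac{nt + S(t)}{S(t) + V(t)} \right] - 1 \\
&= \mathbb{E}\left[ \mathbb{E}\left[ \frac{nt + S(t)}{S(t) + V(t)} \,\middle|\, S(t) \right] \right] - 1. \tag{7.3}
\end{align*}
Note that $\frac{nt + S(t)}{S(t) + V(t)}$ is convex in $V(t)$, so by Jensen’s inequality,
\[ \mathbb{E}\left[ \mathbb{E}\left[ \frac{nt + S(t)}{S(t) + V(t)} \,\middle|\, S(t) \right] \right] \ge \mathbb{E}\left[ \frac{nt + S(t)}{S(t) + \mathbb{E}[V(t) \mid S(t)]} \right]. \tag{7.4} \]
Now $V(t)$ and $S(t)$ are independent by assumption, hence
\[ \mathbb{E}[V(t) \mid S(t)] = \mathbb{E}[V(t)] = n_0 t. \tag{7.5} \]
Substituting (7.4) and (7.5) into (7.3), we have
\[ \mathbb{E}[\widehat{\mathrm{FDR}}(t) - \mathrm{FDR}(t)] \ge \mathbb{E}\left[ \frac{nt + S(t)}{n_0 t + S(t)} \right] - 1 \ge 0. \]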
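For the second case, suppose $S(t) = 0$. Then,
\[ \mathbb{E}[\widehat{\mathrm{FDR}}(t) - \mathrm{FDR}(t)] = \mathbb{E}\left[ \frac{nt - V(t)}{V(t) \vee 1} \right]. \]
We know that $V(t) \sim \mathrm{Bin}(n_0, t)$. Since $V(t) \vee 1 = V(t) + \mathbf{1}\{V(t) = 0\}$, this implies that
$\mathbb{E}[V(t) \vee 1] = \mathbb{P}(V(t) = 0) + n_0 t$. Applying Jensen’s inequality with this identity yields
\begin{align*}
\mathbb{E}\left[ \frac{nt}{V(t) \vee 1} \right] \ge \mathbb{E}\left[ \frac{n_0 t}{V(t) \vee 1} \right] &\ge \frac{n_0 t}{\mathbb{E}[V(t) \vee 1]} \\
&= 1 + \frac{n_0 t - \mathbb{E}[V(t) \vee 1]}{\mathbb{E}[V(t) \vee 1]} \\
&= 1 - \frac{\mathbb{P}(V(t) = 0)}{\mathbb{E}[V(t) \vee 1]} \\
&\ge 1 - \mathbb{P}(V(t) = 0) = \mathbb{P}(V(t) \ge 1). \tag{7.6}
\end{align*}
In addition, note that
\[ \mathbb{E}\left[ \frac{V(t)}{V(t) \vee 1} \right] = \mathbb{P}(V(t) \ge 1). \tag{7.7} \]
Substituting (7.7) into (7.6) yields $\mathbb{E}\left[ \frac{nt - V(t)}{V(t) \vee 1} \right] \ge 0$, which completes the proof.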
