Lecture 07

The document discusses improvements to the Benjamini-Hochberg procedure for controlling the false discovery rate. It describes viewing the procedure through the lens of empirical processes and introduces a martingale theory proof that it controls the FDR. The document also proposes using the distribution of p-values to improve the FDR estimation compared to the original BH procedure.
STATS 300C: Theory of Statistics Spring 2023

Lecture 7 — April 24
Lecturer: Prof. Emmanuel Candès Scribe: Aditya Ghosh

Warning: These notes may contain factual and/or typographic errors. They are
based on Emmanuel Candès’s course from 2018 to 2023, and scribe notes written by
Julie Zhang and Amber Hu.

Outline
Agenda: False Discovery Rate.

1. Empirical Process viewpoint of BH procedure.

2. Martingale Theory and FDR control.

3. Improving on the BH procedure.

Much of the material in this lecture is taken from ?.

7.1 The Empirical Process Viewpoint of BH


In previous lectures, we discussed the Benjamini-Hochberg (BH) procedure (?) by plotting the
sorted p-values on the y-axis against their ranks and checking whether they fall below a
critical line (Fig. 7.1a). An alternative way to view BH is to flip the axes and put the sorted
p-values on the x-axis (Fig. 7.1b).

[Figure 7.1: Sorted p-values and BH threshold line. (a) p-values on the y-axis, indices i/n on
the x-axis, with critical line αi/n; the point i_0/n is marked on the x-axis. (b) p-values t on
the x-axis, \hat{F}(t) on the y-axis, with critical line t/α; the point t_0 is marked on the x-axis.]

This alternative view allows us to describe the BH procedure in terms of an empirical process.
The coordinates on the y-axis of Fig. 7.1b are the values of the empirical CDF of the p-values,
defined as
\[
\hat{F}_n(t) := \frac{\#\{i : p_i \le t\}}{n}.
\]
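Denote by $p_{(i)}$ the $i$th smallest p-value, so that $p_{(1)} \le \cdots \le p_{(n)}$. Let $H_{(i)}$ be the
hypothesis corresponding to $p_{(i)}$. The BH procedure rejects $H_{(1)}, \dots, H_{(i_0)}$, where
\[
i_0 := \max\left\{ i : p_{(i)} \le \frac{qi}{n} \right\}.
\]
The critical p-value is $p^* = p_{(i_0)}$ and can be written as
\[
\begin{aligned}
p^* &= \max\left\{ p_{(i)} : p_{(i)} \le q\,\frac{i}{n} \right\} \\
    &= \max\left\{ p_{(i)} : p_{(i)} \le q\,\hat{F}_n(p_{(i)}) \right\} \\
    &= \max\left\{ t \in \{p_1, \dots, p_n\} : t \le q\,\hat{F}_n(t) \right\},
\end{aligned}
\]
where the second equality uses $\hat{F}_n(p_{(i)}) = i/n$, which holds when there are no ties among the p-values.

If the above set is empty, we set $p^* = q/n$ by convention. The above calculation combined
with Fig. 7.1 gives us the following viewpoint of the BH procedure.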

The BH procedure is equivalent to rejecting all hypotheses with $p_i \le \tau_{BH}$, where
\[
\tau_{BH} := \max\left\{ t : \frac{t}{\hat{F}_n(t) \vee 1/n} \le q \right\}.
\]

Note that we can equivalently write
\[
\tau_{BH} = \max\left\{ t : \hat{F}_n(t) \vee \frac{1}{n} \ge \frac{t}{q} \right\},
\]
which directly leads to the interpretation of BH provided in Fig. 7.1. Notice that $\tau_{BH} \ge q/n$.
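As an illustration of the equivalence above, the following Python sketch computes $i_0$ from the sorted p-values and then the threshold $\tau_{BH}$. The function name `bh_stepup`, the example p-values, and the closed form $\tau_{BH} = q\,(i_0 \vee 1)/n$ are assumptions made for this sketch; the closed form follows from the definition of $\tau_{BH}$ but is not stated in the notes.

```python
import numpy as np

def bh_stepup(pvals, q):
    """Return (i0, tau_bh) for the BH procedure at level q.

    i0 is the largest i with p_(i) <= q * i / n (0 if there is none).
    tau_bh = q * max(i0, 1) / n is the closed form assumed in this sketch.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    k = np.arange(1, n + 1)
    passing = np.nonzero(p <= q * k / n)[0]  # 0-based positions where p_(i) <= q i / n
    i0 = int(passing[-1]) + 1 if passing.size else 0
    tau_bh = q * max(i0, 1) / n
    return i0, tau_bh

# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.9]
i0, tau = bh_stepup(pvals, q=0.05)

# Rejecting every p_i <= tau_bh gives the same number of rejections as the step-up rule.
n_rejected = int(np.sum(np.asarray(pvals) <= tau))
```

In this example the step-up rule and the threshold rule both reject the two smallest p-values.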

This formulation has a simple interpretation. Let t ∈ (0, 1) be fixed and consider rejecting all
Hi with p-values pi ≤ t. We can construct the rejection/acceptance table for the hypotheses
whose values depend on t.

              H0 accepted    H0 rejected    Total
H0 true       U(t)           V(t)           n0
H0 false      T(t)           S(t)           n − n0 = n1
Total         n − R(t)       R(t)           n

We define
\[
\mathrm{FDP}(t) := \frac{V(t)}{R(t) \vee 1}, \qquad \mathrm{FDR}(t) := E[\mathrm{FDP}(t)].
\]
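The counts in the table and the resulting FDP(t) can be computed directly when the null status of each hypothesis is known, as in a simulation. The sketch below is illustrative only; the function name `rejection_counts` and the example data are hypothetical.

```python
import numpy as np

def rejection_counts(pvals, is_null, t):
    """Return V(t), S(t), R(t) and FDP(t) when rejecting every p_i <= t."""
    p = np.asarray(pvals, dtype=float)
    null = np.asarray(is_null, dtype=bool)
    rejected = p <= t
    V = int(np.sum(rejected & null))    # false discoveries
    S = int(np.sum(rejected & ~null))   # true discoveries
    R = V + S                           # total rejections
    fdp = V / max(R, 1)                 # FDP(t) = V(t) / (R(t) v 1)
    return {"V": V, "S": S, "R": R, "FDP": fdp}

# Hypothetical data: the first two hypotheses are non-null, the rest are null.
counts = rejection_counts(
    pvals=[0.001, 0.02, 0.03, 0.2, 0.6, 0.9],
    is_null=[False, False, True, True, True, True],
    t=0.05,
)
```

Here three hypotheses are rejected at t = 0.05, one of which is a true null, so FDP(0.05) = 1/3.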


The idea is to choose the threshold $t$ as large as possible while controlling the FDR at level
$q$. If we had an estimate $\widehat{\mathrm{FDR}}(t)$ of the FDR, we could define a threshold $\tau$ as
\[
\tau := \sup\{ t \le 1 : \widehat{\mathrm{FDR}}(t) \le q \},
\]
and define the rejection rule to reject $H_i$ if and only if $p_i \le \tau$, where $\tau$ is a data-dependent
threshold. This method defines the most liberal thresholding cutoff that controls $\widehat{\mathrm{FDR}}(t)$,
and we hope that it will also control the true $\mathrm{FDR}(t)$ at level $q$. The first question is how
to estimate $\mathrm{FDR}(t)$.
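Assuming $p_i \overset{\text{iid}}{\sim} \mathrm{Unif}(0, 1)$ under $H_0$, we have $E[V(t)] = n_0 t$, but $n_0$ is not known. A
conservative estimate of $n_0 t$ is $nt$, which leads to our first estimate
\[
\widehat{\mathrm{FDR}}(t) := \frac{nt}{R(t) \vee 1} = \frac{t}{\hat{F}_n(t) \vee 1/n}. \tag{7.1}
\]
This leads us to exactly the BH procedure since
\[
\tau = \sup\{ t \le 1 : \widehat{\mathrm{FDR}}(t) \le q \} = \sup\left\{ t \le 1 : \frac{nt}{R(t) \vee 1} \le q \right\} = \tau_{BH}.
\]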
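A small numerical sketch of the estimate (7.1) may help. The function name `fdr_hat` and the example p-values are hypothetical; the value 0.0125 used below is $q \cdot 2/n$ for these data with q = 0.05, that is, the BH threshold when two hypotheses are rejected.

```python
import math
import numpy as np

def fdr_hat(pvals, t):
    """Estimate (7.1): n * t / (R(t) v 1), with R(t) = #{i : p_i <= t}."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    R = int(np.sum(p <= t))
    return n * t / max(R, 1)

# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.9]
q = 0.05

at_threshold = fdr_hat(pvals, 0.0125)    # estimate at the BH threshold for these data
just_above = fdr_hat(pvals, 0.013)       # estimate slightly above that threshold

# On a fine grid above 0.0125 the estimate stays above q, so 0.0125 is the supremum.
grid = np.linspace(0.0126, 1.0, 2000)
all_above_q = all(fdr_hat(pvals, t) > q for t in grid)
```

At the threshold the estimate equals q (up to floating-point rounding), and it exceeds q at every larger grid point, which matches $\tau = \tau_{BH}$ for this example.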

The following theorem shows that $\widehat{\mathrm{FDR}}(t)$ is a conservatively biased estimate. This is
good, because a procedure that controls $\widehat{\mathrm{FDR}}(t)$ also controls $\mathrm{FDR}(t)$ at level $q$. The proof
of this result is relegated to Section A.1; it mainly uses Jensen’s inequality.

Theorem 1. Under independence of p-values, this FDR estimate is biased upwards:
\[
E[\widehat{\mathrm{FDR}}(t)] \ge \mathrm{FDR}(t).
\]
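Theorem 1 can be checked exactly in a small setting. The sketch below assumes, for illustration only, that V(t) ~ Bin(n0, t) and S(t) ~ Bin(n1, beta) are independent, where `beta` is a hypothetical probability that a non-null p-value falls below t; neither the function names nor this non-null model come from the notes.

```python
from math import comb

def binom_pmf(k, m, prob):
    """P(X = k) for X ~ Bin(m, prob)."""
    return comb(m, k) * prob**k * (1 - prob) ** (m - k)

def exact_fdr_and_estimate(n0, n1, t, beta):
    """Return (FDR(t), E[FDR_hat(t)]) by summing over all values of (V, S).

    Assumes V ~ Bin(n0, t) and S ~ Bin(n1, beta), independent (illustrative model).
    """
    n = n0 + n1
    fdr = 0.0
    est = 0.0
    for v in range(n0 + 1):
        for s in range(n1 + 1):
            w = binom_pmf(v, n0, t) * binom_pmf(s, n1, beta)
            r = max(v + s, 1)         # R(t) v 1
            fdr += w * v / r          # contribution to E[V / (R v 1)]
            est += w * n * t / r      # contribution to E[n t / (R v 1)]
    return fdr, est

fdr, est = exact_fdr_and_estimate(n0=5, n1=3, t=0.1, beta=0.6)
```

For any choice of the inputs, `est` should be at least `fdr` up to rounding error, in line with the theorem.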

7.2 Martingale Theory and FDR Control


Using the empirical process viewpoint discussed in Section 7.1 and martingale theory, we
can give an alternate proof of the fact that the BH procedure controls the FDR. Note that
the FDR for BH is the same as $E[\mathrm{FDP}(\tau_{BH})]$.

Theorem 2 (?). The procedure rejecting all hypotheses with $p_i \le \tau_{BH}$ controls the
FDR:
\[
E[\mathrm{FDP}(\tau_{BH})] = q n_0 / n.
\]

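Proof. Note that $\mathcal{F}_t = \sigma(V(s), R(s) : t \le s \le 1)$ is a backwards filtration: for $t_1 < t_2$,
$\mathcal{F}_{t_2} \subset \mathcal{F}_{t_1}$. We now show that $\{V(t)/t\}_{t \in (0,1]}$ is a backwards (reverse) martingale. For $s \le t$,
we have
\[
E\left[ \frac{V(s)}{s} \,\Big|\, \mathcal{F}_t \right] = \frac{1}{s} \cdot \frac{s}{t}\, V(t) = \frac{V(t)}{t},
\]
where the first equality follows from the fact that, conditional on $\mathcal{F}_t$, $V(t) = \#\{p_i : p_i \le t,\ H_i \text{ null}\}$
is known and these $p_i$ are independent and distributed as $U[0, t]$. Each of them falls below $s$ with
probability $s/t$, so $E[V(s) \mid \mathcal{F}_t] = (s/t)\,V(t)$.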


Next, write $\tau = \tau_{BH}$ for brevity. Note that knowing $V(s)$ and $R(s) = n\hat{F}_n(s)$ for $s \ge t$
determines whether $\tau \le t$. Hence $\{\tau \le t\} \in \mathcal{F}_t$, and thus $\tau$ is a stopping time w.r.t. $\{\mathcal{F}_t\}$.

We are now ready to apply Doob’s Optional Stopping Theorem (after all, that’s why we
brought in martingales!). By definition, $R(\tau) \vee 1 = n\tau/q$. Therefore,
\[
\mathrm{FDR}(\tau) = E\left[ \frac{V(\tau)}{R(\tau) \vee 1} \right] = \frac{q}{n}\, E\left[ \frac{V(\tau)}{\tau} \right] = \frac{q}{n}\, E\left[ \frac{V(1)}{1} \right],
\]
by Doob’s OST. Since $V(1) = n_0$, this completes the proof.
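A Monte Carlo sketch can illustrate the identity FDR = q n0 / n. Everything about the setup below is an assumption made for illustration: the sizes n = 20 and n0 = 15, the level q = 0.2, the Beta(0.2, 1) distribution for the non-null p-values, and the function names.

```python
import numpy as np

def bh_fdp(pvals, is_null, q):
    """FDP of the BH procedure at level q for one vector of p-values (NumPy arrays)."""
    order = np.argsort(pvals)
    p_sorted = pvals[order]
    n = p_sorted.size
    passing = np.nonzero(p_sorted <= q * np.arange(1, n + 1) / n)[0]
    if passing.size == 0:
        return 0.0                       # no rejections, so FDP = 0
    i0 = int(passing[-1]) + 1            # number of rejections
    V = int(np.sum(is_null[order[:i0]])) # nulls among the i0 smallest p-values
    return V / i0

def simulate_bh_fdr(n=20, n0=15, q=0.2, reps=20000, seed=0):
    """Average FDP over independent replications (illustrative setup)."""
    rng = np.random.default_rng(seed)
    is_null = np.arange(n) < n0
    total = 0.0
    for _ in range(reps):
        p = np.empty(n)
        p[:n0] = rng.uniform(size=n0)             # null p-values: iid Unif(0, 1)
        p[n0:] = rng.beta(0.2, 1.0, size=n - n0)  # non-nulls: hypothetical choice
        total += bh_fdp(p, is_null, q)
    return total / reps
```

Calling `simulate_bh_fdr()` should return a value close to q n0 / n = 0.15, up to Monte Carlo error.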

7.3 Improving on the BH procedure


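In our estimate $\widehat{\mathrm{FDR}}(t)$ in (7.1), we used the simple conservative bound $\pi_0 = n_0/n \le 1$.
Here, we consider using the distribution of p-values to improve this estimate. Fix $\lambda \in [0, 1)$
and define
\[
\hat{\pi}_0^{\lambda} := \frac{n - R(\lambda)}{(1 - \lambda)n}.
\]
We usually will take $\lambda = 1/2$, while $\lambda = 0$ recovers the BH procedure. The motivation for
this estimate is the following:
\[
\hat{\pi}_0^{\lambda} = \frac{n_0 - V(\lambda) + n_1 - S(\lambda)}{(1 - \lambda)n}.
\]
We expect the non-null p-values to be small: $n_1 - S(\lambda) \ll n$. We also expect $V(\lambda) \approx \lambda n_0$.
Thus,
\[
\hat{\pi}_0^{\lambda} \approx \frac{n_0 - V(\lambda)}{(1 - \lambda)n} \approx \frac{n_0 - \lambda n_0}{(1 - \lambda)n} = \frac{n_0}{n}.
\]
Also note that
\[
E[\hat{\pi}_0^{\lambda}] = \frac{n - E[R(\lambda)]}{(1 - \lambda)n} \ge \frac{n - n_1 - n_0 \lambda}{(1 - \lambda)n} = \frac{n_0}{n} = \pi_0.
\]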
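The estimate of $\pi_0$ is simple to compute from the p-values alone. The sketch below is illustrative; the function name `pi0_hat` and the example p-values are hypothetical.

```python
import numpy as np

def pi0_hat(pvals, lam=0.5):
    """Estimate of pi_0: (n - R(lam)) / ((1 - lam) * n), for lam in [0, 1)."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    p = np.asarray(pvals, dtype=float)
    n = p.size
    R_lam = int(np.sum(p <= lam))   # R(lam) = #{i : p_i <= lam}
    return (n - R_lam) / ((1.0 - lam) * n)

# Hypothetical p-values: four small ones followed by six spread over (0, 1).
pvals = [0.001, 0.004, 0.01, 0.02, 0.1, 0.3, 0.55, 0.7, 0.8, 0.95]
estimate_half = pi0_hat(pvals, lam=0.5)   # four p-values exceed 1/2, so 4 / 5
estimate_zero = pi0_hat(pvals, lam=0.0)   # no p-value is <= 0, so the estimate is 1
```

With lam = 0 the estimate is 1 whenever no p-value is exactly zero, which is the sense in which this choice recovers BH.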
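Thus, in general, $\hat{\pi}_0^{\lambda}$ is a conservatively biased estimate of $\pi_0$. Our new estimate for the false
discovery rate is
\[
\widehat{\mathrm{FDR}}^{\lambda}(t) := \hat{\pi}_0^{\lambda} \cdot \frac{nt}{R(t) \vee 1},
\]
and the natural test would be to reject all hypotheses with p-values $p_i \le \tau$, where
\[
\tau := \sup\{ t \le 1 : \widehat{\mathrm{FDR}}^{\lambda}(t) \le q \}.
\]

In cases where $\hat{\pi}_0^{\lambda}$ is smaller than 1, say 0.8, this may give us more power than BH:
the estimate $\widehat{\mathrm{FDR}}^{\lambda}(t)$ is smaller than (7.1) by the factor $\hat{\pi}_0^{\lambda}$, so the threshold $\tau$ is
larger. This happens when a significant proportion of the hypotheses are non-null.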

There are several drawbacks to this approach. One drawback is that we may have $\hat{\pi}_0^{\lambda} > 1$,
in which case we are even more conservative than BH. More importantly, the
threshold $\tau$ may not even control the FDR at level $q$. To resolve this issue of FDR control,
we introduce a modified version called Storey’s procedure in the next section.


7.4 Storey’s Procedure


Storey’s procedure involves a simple modification to the estimate of $\pi_0$ defined in the
previous section. Define
\[
\hat{\pi}_0 := \frac{1 + n - R(1/2)}{n/2}.
\]
The only difference between $\hat{\pi}_0$ and $\hat{\pi}_0^{1/2}$ is the added 1 in the numerator. Our test now
rejects all hypotheses with p-values $p_i \le \tau$, where
\[
\tau := \sup\left\{ t \le \frac{1}{2} : \widehat{\mathrm{FDR}}(t) = \frac{1 + n - R(1/2)}{n/2} \cdot \frac{nt}{R(t) \vee 1} \le q \right\}.
\]
Notice that we only take the supremum over $t \le 1/2$, which is necessary because the estimate
of $\pi_0$ uses the information in the p-values greater than 1/2.
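The following Python sketch shows one way to compute the rejection set of Storey's procedure. It relies on an assumption not spelled out in the notes: rejecting all $p_i \le \tau$ gives the same set as a step-up rule that finds the largest k with $p_{(k)} \le 1/2$ and $\hat{\pi}_0\, n\, p_{(k)} \le qk$, and rejects the k smallest p-values. The function name `storey_reject` and the data are hypothetical.

```python
import numpy as np

def storey_reject(pvals, q):
    """Return (boolean rejection mask, pi0_hat) for Storey's procedure with lambda = 1/2.

    Implemented as a step-up rule restricted to p-values <= 1/2
    (an equivalent form assumed for this sketch).
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    pi0 = (1 + int(np.sum(p > 0.5))) / (n / 2)   # (1 + n - R(1/2)) / (n / 2)
    order = np.argsort(p)
    p_sorted = p[order]
    k = np.arange(1, n + 1)
    ok = (p_sorted <= 0.5) & (pi0 * n * p_sorted <= q * k)
    passing = np.nonzero(ok)[0]
    k0 = int(passing[-1]) + 1 if passing.size else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k0]] = True                    # reject the k0 smallest p-values
    return reject, float(pi0)

# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.9]
reject, pi0 = storey_reject(pvals, q=0.05)
```

For these data only one p-value exceeds 1/2, so the estimate of $\pi_0$ is 0.5 and seven hypotheses are rejected at q = 0.05, whereas BH at the same level rejects two.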
Theorem 3. Storey’s procedure controls FDR at level q.

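Proof. We use martingale theory similar to what we did in the proof of Theorem 2. It follows
from the definition that $\widehat{\mathrm{FDR}}(\tau) = q$. Now, the FDR for Storey’s procedure is given by
\[
\begin{aligned}
E[\mathrm{FDP}(\tau)] &= E\left[ \frac{V(\tau)}{R(\tau) \vee 1} \right] \\
&= E\left[ \frac{V(\tau)}{n\tau} \cdot \frac{n\tau}{R(\tau) \vee 1} \cdot \frac{1 + n - R(1/2)}{n/2} \cdot \frac{n/2}{1 + n - R(1/2)} \right] \\
&= E\left[ \widehat{\mathrm{FDR}}(\tau) \cdot \frac{V(\tau)}{n\tau} \cdot \frac{n/2}{1 + n - R(1/2)} \right] \\
&= q \cdot E\left[ \frac{V(\tau)}{\tau} \cdot \frac{1/2}{1 + n - R(1/2)} \right].
\end{aligned} \tag{7.2}
\]
Applying Doob’s Optional Stopping Theorem to the martingale $\{V(t)/t : t \in (0, 1/2]\}$ and
stopping time $\tau$ (the factor involving $R(1/2)$ is $\mathcal{F}_{1/2}$-measurable), we have
\[
\begin{aligned}
E[\mathrm{FDP}(\tau)] &= q \cdot E\left[ \frac{V(1/2)}{1/2} \cdot \frac{1/2}{1 + n - R(1/2)} \right] \\
&= q \cdot E\left[ \frac{V(1/2)}{1 + n - S(1/2) - V(1/2)} \right] \\
&\le q \cdot E\left[ \frac{V(1/2)}{1 + n_0 - V(1/2)} \right] \qquad (\text{since } n_1 - S(1/2) \ge 0).
\end{aligned}
\]
Now $V(1/2) \sim \mathrm{Bin}(n_0, 1/2)$, so the last expectation above is given by
\[
\sum_{i=1}^{n_0} \frac{i}{1 + n_0 - i} \binom{n_0}{i} 2^{-n_0} = 2^{-n_0} \sum_{i=1}^{n_0} \binom{n_0}{i-1} = 2^{-n_0}(2^{n_0} - 1) = 1 - 2^{-n_0} < 1.
\]

This implies that $E[\mathrm{FDP}(\tau)] \le q$, as desired.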
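The binomial identity used in the last step can be verified exactly with rational arithmetic. The sketch below is only a sanity check of that identity; the function name `expected_ratio` is hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_ratio(n0):
    """Exact value of E[V / (1 + n0 - V)] for V ~ Bin(n0, 1/2)."""
    total = Fraction(0)
    for i in range(n0 + 1):
        # Term i: i / (1 + n0 - i) times the binomial coefficient C(n0, i).
        total += Fraction(i, 1 + n0 - i) * comb(n0, i)
    return total / 2**n0

# Compare with the closed form 1 - 2^(-n0) for a few values of n0.
checks = {n0: expected_ratio(n0) == 1 - Fraction(1, 2**n0) for n0 in range(1, 12)}
```

Every entry of `checks` is `True`, in agreement with the closed form $1 - 2^{-n_0}$.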


In fact, (7.2) shows why we need the extra +1 in the numerator of π̂0 : otherwise we
would have 0 in the denominator of (7.2) with positive probability!


A Appendix
A.1 Proof of Theorem 1
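We divide this proof into two cases: $S(t) \ge 1$ and $S(t) = 0$. First suppose $S(t) \ge 1$. Then,
\[
\begin{aligned}
E[\widehat{\mathrm{FDR}}(t) - \mathrm{FDR}(t)] &= E\left[ \frac{nt - V(t)}{S(t) + V(t)} \right] \\
&= E\left[ \frac{nt + S(t)}{S(t) + V(t)} \right] - 1 \\
&= E\left[ E\left[ \frac{nt + S(t)}{S(t) + V(t)} \,\Big|\, S(t) \right] \right] - 1.
\end{aligned} \tag{7.3}
\]
Note that $\frac{nt + S(t)}{S(t) + V(t)}$ is convex in $V(t)$, so by Jensen’s inequality,
\[
E\left[ E\left[ \frac{nt + S(t)}{S(t) + V(t)} \,\Big|\, S(t) \right] \right] \ge E\left[ \frac{nt + S(t)}{S(t) + E[V(t) \mid S(t)]} \right]. \tag{7.4}
\]
Now $V(t)$ and $S(t)$ are independent by assumption, hence
\[
E[V(t) \mid S(t)] = E[V(t)] = n_0 t. \tag{7.5}
\]
Substituting (7.4) and (7.5) into (7.3), we have
\[
E[\widehat{\mathrm{FDR}}(t) - \mathrm{FDR}(t)] \ge E\left[ \frac{nt + S(t)}{n_0 t + S(t)} \right] - 1 \ge 0.
\]
For the second case, suppose $S(t) = 0$. Then,
\[
E[\widehat{\mathrm{FDR}}(t) - \mathrm{FDR}(t)] = E\left[ \frac{nt - V(t)}{V(t) \vee 1} \right].
\]
We know that $V(t) \sim \mathrm{Bin}(n_0, t)$. This implies that $E[V(t) \vee 1] = P(V(t) = 0) + n_0 t$. Applying
Jensen’s inequality with this identity yields
\[
\begin{aligned}
E\left[ \frac{nt}{V(t) \vee 1} \right] \ge E\left[ \frac{n_0 t}{V(t) \vee 1} \right] &\ge \frac{n_0 t}{E[V(t) \vee 1]} \\
&= 1 + \frac{n_0 t - E[V(t) \vee 1]}{E[V(t) \vee 1]} \\
&= 1 - \frac{P(V(t) = 0)}{E[V(t) \vee 1]} \\
&\ge 1 - P(V(t) = 0) = P(V(t) \ge 1).
\end{aligned} \tag{7.6}
\]
In addition, note that
\[
E\left[ \frac{V(t)}{V(t) \vee 1} \right] = P(V(t) \ge 1). \tag{7.7}
\]
Combining (7.6) and (7.7) yields $E\left[ \frac{nt - V(t)}{V(t) \vee 1} \right] \ge 0$, which completes the proof.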
