
Shih 2009

The article discusses a novel method for two-stage sample size reassessment in clinical trials using perturbed unblinding, which allows for variance estimation while keeping treatment effects masked. It highlights the ongoing debate between treatment-blinded and unblinded approaches for sample size reassessment and proposes a solution that maintains study integrity while providing unbiased variance estimates. The method aims to address regulatory concerns and improve the accuracy of sample size calculations in biopharmaceutical research.


Statistics in Biopharmaceutical Research

ISSN: (Print) 1946-6315 (Online) Journal homepage: http://www.tandfonline.com/loi/usbr20

Two-Stage Sample Size Reassessment Using Perturbed Unblinding

Weichung Joe Shih

To cite this article: Weichung Joe Shih (2009) Two-Stage Sample Size Reassessment Using Perturbed Unblinding, Statistics in Biopharmaceutical Research, 1:1, 74-80, DOI: 10.1198/sbr.2009.0007

To link to this article: http://dx.doi.org/10.1198/sbr.2009.0007

Published online: 01 Jan 2012.

Two-Stage Sample Size Reassessment Using
Perturbed Unblinding

WEICHUNG JOE SHIH


Statistics in Biopharmaceutical Research 2009.1:74-80.

Sample size reassessment (SSR) is an increasingly popular strategy for designing and conducting clinical trials. In particular, SSR based on updating the variance estimate is a prudent practice accepted by the regulatory authorities to assure adequate power for a study. Since its development in the early 1990s, however, debate has continued over whether a treatment-blinded or unblinded approach should be used for SSR based on the variance estimate. A blind procedure is preferred from the regulatory standpoint, because it better preserves the study integrity; however, it does not provide the best unbiased estimate of the variance. On the other hand, the usual unblinded analysis reveals the treatment effect, which leads to controversy regarding the interpretation of the targeted effect size as well as concerns of inflating the Type I error and possibly biasing the trial. In this article, we devise a novel solution to this problem, one that uses perturbed unblinding to estimate the variance but still keeps the treatment effect masked. We then give a bias-corrected final test, which preserves the Type I error rate. We also discuss several points of consideration, with a focus on the issues of SSR, that were raised at the recent workshop on adaptive designs held jointly by the U.S. Food and Drug Administration (FDA) and the Pharmaceutical Research and Manufacturers of America (PhRMA). In particular, we propose a switch of paradigm from SSR to a two-stage design for clinical trials to alleviate the concern of possible “change” of behavior due to “change” of sample size.

Key Words: Adaptive designs; Double-blind clinical trials; ICH; Sample size re-estimation.

© American Statistical Association
Statistics in Biopharmaceutical Research
February 2009, Vol. 1, No. 1
DOI: 10.1198/sbr.2009.0007

1. Introduction

It is a rare occurrence for a common or leading theme to appear in three peer-reviewed statistical journals around the same time. From August to October 2006, many authors contributed to the discussions in Statistics in Medicine (vol. 25, Oct. 15, 2006), Biometrical Journal (vol. 48, August 2006), and Biometrics (vol. 62, September 2006), regarding adaptive (or flexible) designs, showing how dear this subject is to clinical biostatisticians. Among many kinds of adaptive or flexible designs in clinical trials, sample size reassessment (SSR) is of the greatest interest to practitioners (Burman and Sonesson 2006), as well as the most frequently requested matter for regulatory comments at the U.S. Food and Drug Administration (FDA) from the pharmaceutical industry (Wang 2006). Several factors have contributed to SSR’s popularity. First, sample size (or, in a general sense, information related to study power) calculation is the most basic requirement for planning all studies, as it is critical to the success of a study and pertains to the budget consideration. Second, researchers are always confronted with the issue of uncertainty regarding the effect size and/or variability assumptions needed for sample size calculation. Regulatory guidelines (CPMP Working Party on Efficacy of Medical Products 1995; ICH-E9 Expert Working Group 1999; EMEA/CHMP 2006) recognize this issue very well. Third, compared to other mid-course changes in treatment arms, endpoints, superiority/noninferiority, phase II/III combination, and so on, methods for SSR were perhaps developed earliest in the literature (Shih 2001). Last but not least, several commercial or free computer software packages are now available for performing a number of SSR procedures (Wassmer and Vandemeulebroeke 2006).

Despite the various methods for sample size reassessment that have been proposed and used in practice, a basic debate remains: Should the interim efficacy data be examined with treatment group blinded or unblinded? Regulatory authorities certainly favor blinded SSR (CPMP Working Party on Efficacy of Medical Products 1995; ICH-E9 Expert Working Group 1999; EMEA/CHMP 2006), since it does not unveil the treatment effect size, and hence provides more protection for the trial’s integrity. But blinding the treatment group obviously casts some doubt on the information obtained. Estimation of the variance is best done with unblinded data. To resolve this dilemma, we propose a simple method that uses the ungrouped interim data for variance estimation for continuous efficacy variables but still masks the treatment effect size (and direction). The method is straightforward and does not require special computer software or involvement of another independent statistician external to the sponsor. In Section 2, we first briefly review some of the methods currently used in practice and comment on their shortcomings. In Section 3, we describe our new method, and give the final test that provides the correct Type I error. Finally, in Section 4, we discuss several points for consideration on sample size issues including the operational aspect, and in light of the recent debates on this topic, offer a possible future paradigm change to the concept of SSR.

2. Current Sample Size Reassessment Methods

Consider testing the hypothesis $H_0: \delta = 0$ for two independent normal populations with the primary endpoint $Y_i \sim N(\mu_i, \sigma^2)$, $i = 1, 2$, where $\delta = \mu_1 - \mu_2$. An initial sample size $n_0$ (per group) is calculated for a two-sided t test to detect $\delta = \delta^*$ with power $1 - \beta$ at the significance level $\alpha$. For various reasons we are uncertain about the assumed value of $\sigma^2$ for this initial sample size calculation. A reasonable remedy would be to examine the interim data $y_{ij}$ from $j = 1, \ldots, n_1 < n_0$ patients per group ($i = 1, 2$) and to recalculate the sample size based on the updated estimate of $\sigma^2$. In particular, if the original assumption of $\sigma^2$ is too small, hence $n_0$ is not large enough, we would then need to increase the sample size so that the study will have the desired power of $1 - \beta$ to detect $\delta = \delta^*$. Whether to decrease the sample size or not is worth consideration, but it does not matter to the method that we will propose in the next section.

The above problem is a general framework for continuous endpoints with normal approximation. With the conventional notation, let

$$S_W^2 = \sum_{i=1}^{2} \sum_{j=1}^{n_1} (y_{ij} - \bar{y}_{i\cdot})^2 \Big/ 2(n_1 - 1),$$

and

$$S_T^2 = \sum_{i=1}^{2} \sum_{j=1}^{n_1} (y_{ij} - \bar{y}_{\cdot\cdot})^2 \Big/ (2n_1 - 1)$$

be the sample within-group and total variances, respectively, at the interim stage. When only blind data are used, the index $i$ for treatment group is masked, hence $y_{ij}$ would be condensed as $y_k$, where $k$ runs through 1 to $2n_1$, and the within-group sample mean $\bar{y}_{i\cdot}$ and variance $S_W^2$ would not be known. Gould and Shih (1992) proposed two methods that use only blind data to estimate $\sigma^2$ midstream. One is a simple formula based on the following relationship

$$S_W^2 = \frac{2n_1 - 1}{2(n_1 - 1)} S_T^2 - \frac{n_1}{4(n_1 - 1)} \hat{\delta}^2,$$

where $\hat{\delta} = \bar{y}_{1\cdot} - \bar{y}_{2\cdot}$. Obviously this formula depends on the difference between the two group means, which is unknown for blind data. Instead, one may substitute it with $\delta^*$, the delta value targeted for power, as a compromise:

$$S_W'^2 = \frac{2n_1 - 1}{2n_1 - 2} S_T^2 - \frac{n_1}{4(n_1 - 1)} \delta^{*2}.$$

Other authors have replicated this method (Zucker et al. 1999; Kieser and Friede 2003) in a very similar way. But obviously, when the hypothesized $\delta^*$ deviates from $\hat{\delta}$, $S_W'^2$ will not be a good estimate. Another blind method is based on the idea of solving a mixture of two normal distributions with common variance. It can be done by moment or maximum likelihood estimation (MLE) methods. For the MLE, the iterative EM algorithm requires an intelligent starting value of the parameters within the framework of a typical Phase III study. Although the algorithm is easy to program and SAS or S-Plus codes exist in the public domain, the initial value for iterations may be mishandled by naïve users when applying it to different phases of clinical trials (Gould and Shih 2005). One may also simply use the total variance, but this is over-conservative when the total variance is very much larger than the within-group variance. Xing and Ganju (2005) proposed another method that uses the enrollment order of subjects and the randomization block size to estimate the variance. It applies to normal and non-normal variables. However, besides the burden of maintaining the auxiliary information needed, the precision and variability of such estimates can be unsatisfactory (Xing and Ganju 2005).
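
As a numerical illustration of the relationship between the total and within-group variances reviewed in this section, the following minimal Python sketch may help. It is not taken from the article: the simulated data, the seed, the values of `n1`, `sigma`, `delta_true`, and `delta_star`, and all function names are hypothetical choices made only for illustration. It checks that the identity recovers the unblinded pooled variance exactly when the observed mean difference is used, and shows the blinded compromise that substitutes the targeted delta.

```python
import random
import statistics


def pooled_within_variance(g1, g2):
    """Pooled within-group variance S_W^2 for two groups of equal size n1."""
    n1 = len(g1)
    m1, m2 = statistics.fmean(g1), statistics.fmean(g2)
    ss = sum((y - m1) ** 2 for y in g1) + sum((y - m2) ** 2 for y in g2)
    return ss / (2 * (n1 - 1))


def total_variance(g1, g2):
    """Total variance S_T^2 of the combined sample (divisor 2*n1 - 1)."""
    return statistics.variance(g1 + g2)


def within_from_total(s_t2, n1, delta):
    """Within-group variance implied by S_T^2 and a mean difference delta."""
    return ((2 * n1 - 1) / (2 * (n1 - 1))) * s_t2 - (n1 / (4 * (n1 - 1))) * delta ** 2


# Hypothetical interim data: n1 patients per group (all values are assumptions).
rng = random.Random(2009)
n1, sigma, delta_true = 40, 1.0, 0.5
g1 = [rng.gauss(delta_true, sigma) for _ in range(n1)]
g2 = [rng.gauss(0.0, sigma) for _ in range(n1)]

s_w2 = pooled_within_variance(g1, g2)   # needs unblinded group labels
s_t2 = total_variance(g1, g2)           # available from blinded data
delta_hat = statistics.fmean(g1) - statistics.fmean(g2)

# With the observed difference, the identity reproduces S_W^2 exactly.
exact = within_from_total(s_t2, n1, delta_hat)

# Blinded compromise: plug in the targeted delta instead of the observed one.
delta_star = 0.5
blinded = within_from_total(s_t2, n1, delta_star)

print(f"S_W^2 (unblinded)        = {s_w2:.4f}")
print(f"identity with delta_hat  = {exact:.4f}")
print(f"blinded, delta* plugged  = {blinded:.4f}")
```

The gap between the last two printed values depends only on how far `delta_star` lies from the observed difference, which is the weakness of the blinded formula noted above.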

Unblinding, or at least partial unblinding (i.e., representing treatment groups by dummy alphabets), is certainly an option. The unblinded pooled within-group sample variance, $S_W^2$, is the minimum variance unbiased estimator of $\sigma^2$ based on the first $2n_1$ patients. Some authors in this camp have advocated the use of an Independent Data Monitoring Committee (IDMC) to examine the unblinded or partially unblinded data. Others, however, feel that another separate independent committee should be appointed to carry out the SSR (Quinlan and Krams 2006; Chuang-Stein et al. 2006), since SSR is for a different purpose, and no other data than the primary endpoint should be involved. Whoever performs the unblinded SSR, the unblinded group of experts is put in the awkward position of using the observed sigma while somehow disregarding the observed delta, even though how to use the observed delta is itself highly controversial (Hung 2006). Therefore, the best approach would be to unblind the data for estimating the variance but still to keep the treatment effect size masked. In the next section, we propose such a method.

3. Proposed Method

3.1 Unblinded Estimation of $\sigma^2$ Without Unmasking the Treatment Effect

It is well recognized that the successful conduct of any clinical trial requires cross-departmental coordination. In the pharmaceutical industry, the computer programming group usually handles the randomization schedule in a discreet way, after the project statistician provides the blocking factor, the number of sites, and other required specifications. The programmer chooses a seed number in the randomization generator. Standard operating procedures (SOPs) have long been established, for instance, to direct the linkage between the randomization and packaging department to properly ship the study drugs to investigators. They have also automated the process for the analysis programs to run when unblinding the data occurs. Firewalls have been established to prevent others from gaining access to the randomization schedule and the associated information that generates it, such as the seed number, the block size, and so on. This programming function may be handled in-house or externally by a contracted clinical research organization (CRO). The following novel method of “perturbed unblinding” allows one to extend this experience to the sample size reassessment.

During the interim analysis for SSR, the designated efficacy endpoint data, $Y$, will be unblinded, but with the following modification. Before the unblinding occurs, the programmer is instructed by the project statistician, according to an established SOP, to add a number of the programmer’s choice (say, $a$) to everyone’s $Y$ in one of the treatment groups, and another number (say, $b \neq a$) to everyone’s $Y$ in the other group. These constants $a$ and $b$ are protected by at least the same degree of confidentiality as the randomization schedule’s seed number and blocking factor are; no one in the project team, not even the statistician at this time, can know them. Since these constants are sealed, no “back-calculation” is possible to reveal the interim treatment effect. At the final data analysis, the programmer is to remove/subtract these constants from the individual patients. This modified unblinding with shift perturbation is termed “perturbed unblinding.” (I suppose one can even view that $a = b = 0$, that is, no perturbation at all, is a very special choice of the perturbation factors, when they are also kept secret.)

Of course, the calculation of the sample within-group variance is not affected by such translations of $Y$. But the unblinded treatment effect size will be shifted by a factor that is unknown to the project team. The (perturbed) unblinded sample variance, denoted by $S_1^2$, is the best (minimum variance) unbiased estimator of $\sigma^2$ based on the first $2n_1$ patients. That is,

$$S_1^2 = S_W^2 = \sum_{i=1}^{2} \sum_{j=1}^{n_1} (y_{ij} - \bar{y}_{i\cdot})^2 \Big/ 2(n_1 - 1)$$

numerically; however, it is calculated with the perturbed $y_{ij}$, especially with the shifted sample mean in each treatment group. With $S_1^2$ we will then recalculate the new sample size $n \geq n_1$ (per group) so that the final test will have power of $1 - \beta$ to detect $\delta = \delta^*$ at the significance level $\alpha$:

$$n = \max\left\{ n_1,\ \frac{2\,(t_{2(n_1-1),\alpha/2} + t_{2(n_1-1),\beta})^2\, S_1^2}{\delta^{*2}} + 1 \right\},$$

where $t_{j,k}$ is the upper $k$th percentile of the t-distribution with $j$ degrees of freedom. This new sample size formula allows for a decrease of sample size from the originally planned $n_0$. If we do not want to decrease from $n_0$, then

$$n = \max\left\{ n_0,\ \frac{2\,(t_{2(n_1-1),\alpha/2} + t_{2(n_1-1),\beta})^2\, S_1^2}{\delta^{*2}} + 1 \right\}.$$

A general expression is

$$n = \max\left\{ n_1 + n_{\min},\ \frac{2\,(t_{2(n_1-1),\alpha/2} + t_{2(n_1-1),\beta})^2\, S_1^2}{\delta^{*2}} + 1 \right\},$$

which includes $n_{\min} = 0$ for the former case, $n_{\min} = n_0 - n_1$ for the latter case, and other situations such as
$n_{\min}$ = a fraction of $n_1$ (due to some patients being enrolled but not yet reaching the endpoint for SSR). Furthermore, in practice, we may also need to cap the sample size to a resource permissible upper limit, say $n_{\max}$. Then we will have

$$n = \min\left\{ \max\left\{ n_1 + n_{\min},\ \frac{2\,(t_{2(n_1-1),\alpha/2} + t_{2(n_1-1),\beta})^2\, S_1^2}{\delta^{*2}} + 1 \right\},\ n_{\max} \right\}.$$

Of course, this modified unblinding procedure also works with non-normal data after suitable transformation. In the next subsection we further discuss the choice of these perturbation constants together with more consideration on the implementation aspects.

3.2 Choice of the Shifting Constants for Perturbation and Other Implementation Issues

The main feature of the proposed method is to use shifting constants to mask the treatment difference and, at the same time, to obtain the same within-group variance estimate as that from the unblind data. As with the randomization process, where someone has to know the random seed number, block sizes, source code, and so on, someone (a programmer) has to know the shifting constants in this case. The critical matter is to have a proper firewall between this person and those who monitor the study and can see the interim results. Conceivably, different organizations will develop their own standard operating procedures to carry out this method, but the same principle should apply. The justification and purpose of any interim analysis is the foremost consideration. For SSR, the primary endpoint, which determines the power/sample size of the study, is the only data needed for its intended purpose; secondary efficacy and safety data should be avoided during the SSR process. Even the patient ID is not necessary for SSR. This is in contrast to the monitoring of safety data, where the primary efficacy endpoint should not be reviewed routinely. The SSR task is usually conducted once and should be separated from safety data monitoring. (These recommendations are in fact the norm in the author’s experience of participating in safety data monitoring and SSR for many studies.)

For configuring the perturbation, there is a balance between the complexity of the system and its degree of masking protection. As indicated previously, the simplest way is to have only two numbers, $a$ and $b$, as shifting constants for the two treatment groups. More complex ways can also be developed. For example, we can randomly split the patients in each treatment group into two subgroups of equal size. Choose two sets of perturbation factors, $(a_1, b_1)$ and $(a_2, b_2)$, one for each of the subgroups. We will then obtain two unbiased estimates of $\sigma^2$ and, since they are based on the same number of patients, we simply average them. This extension adds complexity and masking protection, but loses some minor efficiency of the estimate. The relative efficiency to the previous estimate using no subgroup can easily be shown to be $(n_1 - 2)/(n_1 - 1)$.

Furthermore, the SOP can also guide some possible ways of making up these shifting constants. For example, the project statistician can do a preliminary calculation from the blind data to obtain the overall range, quintiles, and/or other summary statistics of the primary endpoint $Y$. Then the programmer can pick two from this list, use them either directly or with some modification (such as dividing the range by 2, 3, or 4, for example) as shifting constants $a$ and $b$. An advantage of using summary statistics of the primary endpoint is that the shifting constants are in the same scale as $Y$. In fact, one can also use summary statistics of the same variable $Y$ from a different study. However, again, this is not mandatory for the method; the best approach is to let $a$ and $b$ be completely unspecified. Since the users of the perturbed data (project statistician and study monitor) are unaware of what these shifting constants are or how they are chosen, they simply do not know how much of the masking is effectual or ineffectual; that, by itself, is effectual.

As an aside, a referee asked whether the SSR can be totally handled by a computer program without a programmer. Although an entirely automated process is possible, in reality the process would be better handled by a programmer, who will write and operate the computer program, including inputting the perturbation factors.

3.3 Adjustment for the Final Test to Control the Type-I Error Rate

After the above SSR, when the study is unblinded and the interim shifting constants are removed, the usual pooled within-group sample variance, $S^2$, is calculated, and the treatment effect is estimated by the sample mean difference, $\bar{X}_N$ ($= \bar{Y}_{1n} - \bar{Y}_{2n}$), based on a total of $N = 2n$ patients ($n$ in each group, found from Section 3.1) at the final stage. It has been shown that $S^2$ underestimates $\sigma^2$ (Proschan and Wittes 2000). The (negative) bias of $S^2$ is $(n_1 - 1)\,\mathrm{cov}(S_1^2, 1/(n - 1))$, which is a function of $\sigma^2$. Hence a naïve t test based on $S^2$ would inflate the Type I error. On the other hand, $S_1^2$ is unbiased, but it has a larger variance compared to $S^2$. If there were a best (minimum variance) unbiased estimator based on the $2n$ patients, it would be obtained by the Rao-Blackwell technique, taking conditional expectation of $S_1^2$ given the (complete) sufficient statistic. However, Proschan and Wittes (2000) also showed that the minimal sufficient statistic
$(\bar{X}_N, S_1^2, S^2)$ is not complete, hence the existence of a best unbiased estimate of $\sigma^2$ is not guaranteed. Furthermore, since the (negative) bias depends on the unknown $\sigma^2$, it cannot be accordingly corrected. Miller (2005) found the lower bound of the bias

$$c = -\frac{f}{\nu(f - 2)},$$

where

$$\nu = \frac{2\,(t_{f,1-\alpha/2} + t_{f,1-\beta})^2}{\delta^{*2}},$$

and $f = 2(n_1 - 1)$, and proposed the following estimator:

$$S_{mc}^2 = \begin{cases} S^2 - c & \text{if } n > n_1 + n_{\min}, \\ S^2 & \text{if } n = n_1 + n_{\min}. \end{cases}$$

(Proof that Miller’s bias correction $c$ also applies to the min-max sample size formula in Section 3.1 is available from the author.) With this maximum bias-corrected estimator of $\sigma^2$, the final test statistic becomes

$$t_{mc} = \frac{\bar{X}_N}{\sqrt{2 S_{mc}^2 / n}}.$$

This test statistic is to be compared to $t_{2(n-1),1-\alpha/2}$. Miller (2005) showed that this test can bring the actual Type I error rate very close to the nominal level.

4. Discussion

For any adaptive design to be acceptable to regulatory authorities, not only must the methodology be valid, but also the process that implements the method needs to be set up so that no potential bias will be introduced by the interim data analysis. In this section, we discuss several “points for consideration” that were raised at the recent FDA/PhRMA Annual Workshop on adaptive designs (Wang 2006). We focus only on the issues pertinent to SSR. From this discussion, we also offer a possible future paradigm change to the concept of SSR.

A. Confirmatory Aspect of a Late Phase II or III Study

Regulatory concern: A foremost concern of the regulatory authority is that the late phase trials are supposed to confirm hypotheses generated in earlier trials. The need to reassess sample size (and other design specifications) “typically changes the emphasis from a confirmatory trial to an exploratory trial.”

Discussion: For sample size calculation based on a continuous endpoint, this concern is probably more on the hypothesized treatment effect size than on the variability (sigma), since the hypotheses are usually expressed in terms of the treatment effect size (delta). On the other hand, the sense of a phase III trial being confirmatory still has room for debate. For example, phase III trials are seldom a repetition of previous phase II trials for “confirmation.” A different patient population (e.g., mild to moderate disease condition for phase II and moderate to severe disease condition for phase III), a different version of instrument or technology (e.g., improved densitometer to measure the bone mineral density or sharpened questionnaires to count symptom scores for phase III after phase II trials), and so on, make the phase III study hardly an absolute confirmation of what was observed in phase II studies. Hence the notion of the confirmatory aspect of the phase III trial should not be in the sense of repetition (but in the sense of testing the same clinical hypothesis). Treatment effect size and variability are subject to vary from phase II to phase III studies. Using the results from previous studies remains problematic. In fact, we argue that an absolute sense of study confirmation would imply too strong a pre-judgmental view of the investigational treatment to allow a phase III trial to be conducted on ethical grounds. The regulatory concern that a mid-course alteration “typically changes the emphasis from a confirmatory trial to an exploratory trial” pertains to other ideas of modification on, for example, hypotheses, primary endpoints, or treatment arms, rather than to the sample size.

B. Changing the Sample Size May Induce Behavior Changes in Study Personnel

Regulatory concern: With a sample size increase some study investigators may get the message that the new treatment is not working as well as predicted, which may cause some patients to withdraw from the study. If it were a decrease of sample size, then some investigators may guess that the new treatment works better than predicted. The sponsor may also be encouraged somehow. This fluctuation of the originally planned sample size more or less leaks the treatment effect and may then induce certain behavior changes, which would ultimately bias the study.

Discussion: We recommend that the justification of SSR should be given in the protocol, and that the investigators are informed that the SSR will be conducted only to ensure sufficient power in the presence of uncertainty regarding the inter-patient variability; no one will have information regarding the treatment effect size. No other specific information about the SSR shall be provided to the investigators. We also recommend that when the new sample size is calculated based on the updated variance estimate, a new patient entry cut-off day can be estimated accordingly with the updated patient enrollment and dropout rates. This new patient entry cut-off day, but not the sample size, can be provided to the investigators later when the target number of patients is near. The investigators will not be informed whether the new cut-off date is due to a slower-than-planned enrollment rate or the SSR.

Our ultimate solution to this concern is: do not think of change; think of a two-stage design. In reviewing the original sample size reassessment or internal pilot methods one finds that the starting literature was Stein’s 1945 two-stage sampling method (Shih 2001; Proschan 2005). The application and extension of Stein’s method to the current standard practice of providing sample size estimation in a study protocol leads to the term sample size reassessment. However, in Stein’s idea, there was no “original sample size,” only the sample size of the first stage ($n_1$) and that of the next stage ($n_2$) derived from the results of the $n_1$ samples. Hence, if we adopt a similar two-stage design concept, there will be no “originally planned total sample size” to change from. A careful review of Sections 2 and 3, where the new method is proposed, shows that we never use the original sample size $n_0$ after it was introduced. This concept of “two-stage design, not a sample size recalculation” was delineated in a commentary paper by Shih (2006). If for budgetary reasons we need to give a sample size at the beginning of a study, then we give the expected sample size or a range of sample sizes. Actually, it is true that in any sequential design one cannot give a fixed sample size, but an expected sample size, since sample size depends on interim results and is a random variable. In fact the practice of not giving a fixed but an expected sample size is quite common for Phase-I dose finding algorithm-based designs such as “3 + 3” or the general “A + B” designs (Lin and Shih 2001) as well as for the model-based designs such as CRM (Continual Reassessment Method) in cancer studies. In single-arm Phase IIA cancer studies, Simon’s procedure (Simon 1989) or its variation (Lin and Shih 2004) are also standard two-stage designs that do not provide a fixed sample size in the protocol, only rules to indicate how the second stage sample size depends on the possible outcome of the first-stage samples. This paradigm switch from SSR to two-stage designs is the future way to resolve this change of sample size issue.

C. Second-Stage Sample Size Depends on What Delta

Regulatory concern: It is unclear how to postulate a new delta. If the original delta (treatment difference) is the so-called minimum clinically important effect (MCIE), then the study should be planned to rule out the MCIE, not simply to rule out a zero effect. If the sample size is planned to detect a MCIE of magnitude $\delta$ with two-sided alpha = 0.05 and 90% power, then a p = 0.05 associated with an observed effect size at completion of the study means that the estimated effect size will be approximately only 60% of $\delta$ (Hung 2006), causing a controversy because a statistically significant result would no longer be clinically acceptable. There also could be concerns over bias resulting from knowledge of the interim observed treatment effect.

Discussion: We believe that the original $\delta^*$ at which the power of the study is targeted should be used consistently throughout the study. A delta has to be given for a study regardless of whether the design is sequential or a fixed size. The debate as to whether the delta is in a sense of MCIE, an investigator’s expected/wishful effect, or marketable difference can go on without reference to flexible designs. Therefore, it is better to use the same approach one uses for the fixed sample size design. For the second stage, this delta should not change. The sample size for the second stage should use the original delta and the only updating information will be the variance estimate. This also agrees with Stein’s original method. SSR based on updating the variance estimate has no such controversy. If, however, there is much uncertainty about treatment effect, then the above proposed SSR can be followed by a group sequential plan with the resized total information, as recommended by Gould and Shih (1998).

D. Intentional or Unintentional Dissemination of the Interim Results

Regulatory concern (EMEA/CHMP 2006): “It is always difficult to convincingly demonstrate that no unblinded interim results have been released. Interim analyses, therefore, always introduce the possibility of damaging the integrity of a trial. A balance has to be achieved between the needs for assessing accumulating information and the risk of damaging the integrity of the trial. Routinely breaking the blind should be avoided.”

Discussion: This is more an issue for interim analyses with early stopping in mind (for efficacy, futility, or safety) than it is for SSR. The EMEA draft reflection paper (EMEA/CHMP 2006), where the excerpt comes from, actually supports what is recommended in this article and elsewhere, which is that SSR would be best done with treatment effect masked (by total blinding or the proposed perturbed unblinding), performed separately from the safety data monitoring process, and conducted only once during the trial. The discussion under item B also mentioned that the investigators are notified only of the updated patient enrollment cut-off date, not the new sample size. Therefore, with the proposed SSR, there is minimal analysis and little chance of dissemination of information on the treatment effect at the interim stage.

Acknowledgments

The work is partially supported by grant no. CA-72720-10. The author would like to thank Dirk Moore for his helpful review of the initial draft of this article and the Associate Editor and referees for their helpful comments. A part of this article was presented at the National Health Research Institute and Center for Drug Evaluation Joint Symposium on “Current Advanced Statistical Issues in Clinical Trials-Adaptive Designs” at Taipei, November 25, 2006. The author also benefited from discussions with Jim Hung, Sue-Jane Wang, and Gordon Lan.

[Received March 2007. Revised August 2007.]

References

Kieser, M., and Friede (2003), “Simple Procedures for Blinded Sample Size Adjustment that do not Affect the Type I Error Rate,” Statistics in Medicine, 22, 3571–3581.

Lin, Y., and Shih, W.J. (2001), “Statistical Properties of the Traditional Algorithm-Based Designs for Phase-I Cancer Clinical Trials,” Biostatistics, 2, 203–215.

Lin, Y., and Shih, W.J. (2004), “Adaptive Two-Stage Designs for Single-Arm Phase IIA Cancer Clinical Trials,” Biometrics, 60, 482–490.

Miller, F. (2005), “Variance Estimation in Clinical Studies with Interim Sample Size Reestimation,” Biometrics, 61, 355–361.

Proschan, M. (2005), “Two-Stage Sample Size Re-estimation Based on a Nuisance Parameter: A Review,” Journal of Biopharmaceutical Statistics, 15, 559–574.

Proschan, M.A., and Wittes, J. (2000), “An Improved Double Sampling Procedure Based on the Variance,” Biometrics, 56, 1183–1187.

Quinlan, J.A., and Krams, M. (2006), “Implementing Adaptive Designs: Logistical and Operational Considerations,” Drug Informa-
Burman, C.-F., and Sonesson, C. (2006), “Are Flexible Designs tion Journal, 40, 437–444.
Sound?” Biometrics, 62, 664–683. Shih, W. J. (2001), “Commentary: Sample Size Re-estimation—
Journey for a Decade,” Statistics in Medicine, 20, 515–518.
Statistics in Biopharmaceutical Research 2009.1:74-80.

CPMP Working Party on Efficacy of Medical Products (1995), “Bio-


statistics Methodology in Clinical Trials in Applications for Mar- (2006), “Commentary: Group Sequential, Sample Size Re-
keting Authorization for Medical Products,” Statistics in Medicine, estimation and Two-Stage Adaptive Designs in Clinical Trials: A
14, 1659–1682. Comparison,” Statistics in Medicine, 25, 933–941.
Chuang-Stein, C., Anderson, K., Gallo, P., and Collins, S. (2006), Simon, R.M. (1989), “Optimal Two-Stage Designs for Phase II Clinical
“Sample Size Re-estimation: A Review and Recommendations,” Trials,” Controlled Clinical Trials, 10, 1–10.
Drug Information Journal, 40, 475–484. Wang, S.-J. (2006), “Regulatory Experience of Adaptive Designs in
EMEA (European Medicines Agency) CHMP (Committee for Medic- Well-Controlled Clinical Trials,” FDA/PhRMA Annual Workshop
inal Products for Human Use) (2006), “Reflection Paper on on Adaptive Designs: Opportunities, Challenges and Scope in
Methodological Issues in Confirmatory Clinical Trials with Flexi- Drug Development, North Bethesda, MD.
ble Design and Analysis Plan,” London, March 23, 2006, doc. ref. Wassmer, G., and Vandemeulebroeke, M. (2006), “A Brief Review on
CHMP/EWP/2459/02. Available online at http:// www.emea.eu.int. Software Developments for Group Sequential and Adaptive De-
Gould, A. L., and Shih, W. J. (1992), “Sample Size Re-estimation With- signs,” Biometrical Journal, 48, 732–737.
out Unblinding for Normally Distributed Outcomes with Unknown Xing, B., and Ganju, J. (2005), “A Method to Estimate the Variance
Variance,” Communications in Statistics (A)—Theory and Meth- of an Endpoint from an On-going Blinded Trial,” Statistics in
ods, 21, 2833–2853. Medicine, 24, 1807–1814.
(1998), “Modifying the Design of Ongoing Trials Without Un- Zucker, D., Wittes, J.T., Schabenberger, O., and Brittan, E. (1999), “In-
blinding,” Statistics in Medicine , 17, 89–100. ternal Pilot Studies II: Comparison of Various Procedures,” Statis-
(2005), Letter to the Editor, Statistics in Medicine, 24, 147– tics in Medicine, 18, 3493–3509.
154.
Hung, H.-M. J. (2006), “Adaptive Clinical Trial Designs: Ready for
Prime Time?” Discussion presented at 2004 Harvard-MIT Divi-
About the Authors
sion of Health Science and Technology Workshop, Statistics in
Medicine, 25, 3313–3314. Weichung Joe Shih, Department of Biostatistics, University of
ICH-E9 Expert Working Group (1999), “Statistical Principles for Clin- Medicine and Dentistry of New Jersey and Division of Biometrics,
ical Trials (ICH Harmonized Tripartite Guideline E-9),” Statistics The Cancer Institute of New Jersey 683 Hoes Lane West, P.O.
in Medicine, 18, 1905–1942. Box 9, Piscataway, NJ 08854 (E-mail: [email protected]).
