\corrauth

Fan Li, Department of Biostatistics, Yale School of Public Health, 135 College Street, New Haven, CT 06510, USA.

Can discrete-time analyses be trusted for stepped wedge trials with continuous recruitment?

Hao Wang1,2    Guangyu Tong1,2,3    Heather Allore1,4    Monica Taljaard5,6    and Fan Li1,2,3,∗ 1Department of Biostatistics, Yale School of Public Health, CT, USA
2Center for Methods in Implementation and Prevention Science, Yale School of Public Health, CT, USA
3Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
4Section of Geriatrics Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, US
5Methodological and Implementation Research Program, The Ottawa Hospital Research Institute, Ottawa, ON, Canada
6School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
[email protected]
Abstract

In stepped wedge cluster randomized trials (SW-CRTs), interventions are sequentially rolled out to clusters over multiple periods. It is common practice to analyze SW-CRTs using discrete-time linear mixed models, in which measurements are considered to be taken at discrete time points. However, a recent systematic review found that 95.1% of cross-sectional SW-CRTs recruit individuals continuously over time. Despite the high prevalence of designs with continuous recruitment, there has been limited guidance on how to draw model-robust inference when analyzing such SW-CRTs. In this article, we investigate through simulations the implications of using discrete-time linear mixed models in the case of continuous recruitment designs with a continuous outcome. First, in the data-generating process, we characterize continuous recruitment with a continuous-time exponential decay correlation structure in the presence or absence of a continuous period effect, addressing scenarios both with and without a random or exposure-time-dependent intervention effect. Then, we analyze the simulated data under three popular discrete-time working correlation structures: simple exchangeable, nested exchangeable, and discrete-time exponential decay, with a robust sandwich variance estimator. Our results demonstrate that discrete-time analysis often yields minimum bias, and the robust variance estimator with the Mancl and DeRouen correction consistently achieves nominal coverage and type I error rate. One important exception occurs when recruitment patterns vary systematically between control and intervention periods, where discrete-time analysis leads to slightly biased estimates. Finally, we illustrate these findings by reanalyzing a concluded SW-CRT.

keywords:
Cluster Randomized Trials, Continuous-Time Decay, Linear Mixed Models, Model Misspecification, Robust Sandwich Variance, Recruitment Pattern

1 Introduction

In stepped wedge cluster randomized trials (SW-CRTs), interventions are sequentially rolled out to clusters (such as hospitals or clinics) over multiple periods.Hussey2007 Once a cluster has crossed over to the intervention condition, it remains in the intervention until the end of the trial. Three major types of SW-CRT designs exist depending on whether different or the same individuals are included in each cluster-period: cross-sectional, closed-cohort, or open-cohort designs.Copas2015 Although these designs are often conceptualized and presented in distinct time periods, in practice, recruiting individuals continuously over time is almost always the case. Nevin et al.Nevins2024 reviewed 160 SW-CRTs published from January 2016 to March 2022, and found that 76.3% were cross-sectional SW-CRTs. Among these trials, 95.1% implemented continuous-time individual recruitment. Despite this practice, most existing methodological developments for cross-sectional designs are based on regression models with discrete time periods that rarely reflect the nature of the individual recruitment process.Hooper2019; Hooper2021 Figure 1 depicts two SW-CRTs with either discrete sampling or continuous recruitment of the participants. Under discrete sampling designs, individuals are recruited at fixed time points (typically once per period), whereas continuous recruitment designs allow individuals to enter the trial at arbitrary times within each period. This distinction was first mentioned when Copas et al.Copas2015 identified “continuous recruitment short exposure designs” as designs in which individuals are recruited continuously, exposed only briefly, and assessed just once under either control or intervention condition. Hooper and CopasHooper2019 emphasized the need to distinguish continuous recruitment designs from discrete sampling in the design and analysis of SW-CRTs; they have pointed out that understanding continuous recruitment characteristics is essential for identifying contamination risks and informing appropriate design and analysis decisions.

Refer to caption
Figure 1: A comparison of two standard SW-CRTs, each consisting of 3 clusters over 4 periods of equal length, with discrete sampling in Figure 1(a) and continuous recruitment in Figure 1(b). The x-axis indicates the exact recruitment time tijk(j1,j]t_{ijk}\in(j-1,j] for the kk-th individual from the ii-th cluster in the jj-th period. Each SW-CRT has 3 sequences with exactly one cluster assigned to each sequence, where the cluster in sequence q{1,2,3}q\in\{1,2,3\} begins receiving the intervention in period j=q+1j=q+1. The purple cells indicate the treatment condition, and the white cells indicate the control condition. For illustration purposes, each cluster-period cell includes exactly 4 individuals. Figure 1(a) depicts discrete sampling, where exactly 4 individuals are sampled at distinct intervals in every cluster-period. In contrast, Figure 1(b) depicts continuous recruitment, where individuals enter the trial at arbitrary times throughout each period.

A growing body of literature has focused on developing appropriate design methods for SW-CRTs with continuous recruitment. For instance, Grantham et al.Grantham2019 found that relying on the conventional discrete-time model in the presence of continuous recruitment often leads to an underestimation of the required sample size. Hooper et al.Hooper2020 developed a computational algorithm to identify efficient, incomplete designs of SW-CRTs with continuous recruitment and continuous outcomes. More recently, Hooper et al.Hooper2024 proposed optimal designs for three-sequence SW-CRTs with continuous recruitment. Despite these advances with respect to study planning, there has been limited attention to analyzing SW-CRTs with continuous recruitment, nor evidence that discrete-time analysis of such designs delivers credible results. In this article, we tackle this practical question and study whether conventional discrete-time linear mixed model can be trusted when used to analyze SW-CRTs with continuous-time recruitment. The essential ingredients of the discrete-time linear mixed model are a period effect to adjust for the secular trend, the intervention effects of interest, and terms to account for sources of heterogeneity, including within-cluster correlations.Li2020 In the presence of continuous-time recruitment, such a linear mixed model can be considered as a misspecified model. In the context of SW-CRTs, several prior studies demonstrated that misspecification of the correlation structure under linear mixed models can lead to substantial bias when using model-based variance estimators.Kasza2019b; Bowden2021; Voldal2022 More recently, Ouyang et al.Ouyang2024 illustrated through a simulation study that the robust variance estimator (RVE), also referred to as the sandwich variance estimator, can provide nominal coverage for the intervention effect under linear mixed models even when the correlation structure is misspecified (under appropriate small-sample correction). Wang et al.Wang2024 showed that the linear mixed model is robust against misspecification of covariate effects, the correlation structure, and the error structure, as long as the intervention effect structure (whether treatment effect varies by calendar and/or exposure time) is correctly specified. However, these studies have all be carried out in the context of discrete-time sampling and have not at all considered the setting where the true data-generating process includes a continuous-time recruitment element. Hence, despite such prior knowledge, the performance of discrete-time linear mixed models in SW-CRTs with continuous recruitment has not been empirically evaluated.

Our primary objective is to investigate the implications of using discrete-time linear mixed models to analyze SW-CRTs with continuous recruitment. Three popular forms of discrete-time correlation structures have been widely used in practice: the exchangeable (EXCH),Hussey2007 the nested exchangeable (NE),Girling2016; Hooper2016 and discrete-time exponential decay (DTD)Kasza2019a; Kasza2019b correlation structures. Additionally, researchers can include random intervention effects to capture heterogeneity in treatment effects across clusters through a random cluster-by-intervention interaction term.Hemming2018b With respect to the intervention effect, different models may be used, depending on whether the intervention effect is assumed to be instantaneous or varying with exposure time, that is, the time elapsed since the intervention was first introduced to a cluster;Kenny2022; Maleyeff2022; Wang2024 Beyond these elements, continuous recruitment is fundamentally different from discrete sampling: individuals enter trials at arbitrary times throughout periods, creating temporal orderings within cluster-periods. We therefore characterize continuous recruitment in the data-generating process by two additional aspects. First, we use continuous period effects to replace discrete counterparts. Second, we adopt the continuous-time decay (CTD) correlation structure, which assumes that correlations between individual-level outcomes within clusters decay exponentially as the time between recruitments increases.Grantham2019; Hooper2021; Hooper2024 This decay pattern reflects the real-world scenario where individuals recruited closer in time experience more similar contextual factors. Importantly, because CTD depends on exact recruitment times rather than period indices, it accommodates different recruitment patterns. These can be quantified as cluster-period-specific enrollment density, i.e., the distribution of recruitment times in each period for each cluster. Section 2.3.2 provides several empirically motivated recruitment patterns. Although CTD typically represents the true data-generating process for continuous recruitment, implementing it as a working model can be computationally challenging or even infeasible in practice; we return to a discussion on this topic in Section 5. This practical constraint motivates our central research question: can standard discrete-time linear mixed models with robust variance estimation provide valid inference for SW-CRTs with continuous recruitment? We investigate this question through extensive simulations and further illustrate our findings by reanalyzing a real SW-CRT.

The remainder of this article is organized as follows. Section 2 introduces the setting, estimands, different linear mixed model formulations, and robust variance estimators. Section 3 presents a simulation study to investigate the performance of discrete-time linear mixed models in SW-CRTs with continuous recruitment under different true data-generating models. Section 4 provides a reanalysis of a completed SW-CRT using discrete-time linear mixed models when individuals are recruited continuously, and Section 5 concludes.

2 Considerations in Linear Mixed Models for Cross-Sectional SW-CRTs

2.1 Set Up and Notation

We consider a standard and complete SW-CRT with II clusters, JJ equally spaced periods, and KijK_{ij} individuals per cluster-period. To deliver key ideas, we focus on a cross-sectional design, where different individuals are observed in each cluster over time. All clusters are in the control condition when j=1j=1 and the treatment condition when j=Jj=J. Let YijkY_{ijk} be the observed outcome for the kk-th individual (k=1,,Kij)(k=1,\ldots,K_{ij}) from the ii-th cluster (i=1,,I)(i=1,\ldots,I) in the jj-th period (j=1,,J)(j=1,\ldots,J), and ZijZ_{ij} be the binary treatment indicator which is equal to 1 when the ii-th cluster in the jj-th period is under intervention and 0 otherwise. There are Q=J1Q=J-1 intervention sequences and the number of clusters in the qq-th sequence (q=1,,Q)(q=1,\ldots,Q) is IqI_{q} with q=1QIq=I\sum_{q=1}^{Q}I_{q}=I. If the ii-th cluster is in the qq-th sequence, where the intervention starts at the (q+1)(q+1)-th period, then its treatment indicator vector is 𝐙i=(Zi1,,ZiJ)\mathbf{Z}_{i}=(Z_{i1},\ldots,Z_{iJ}) with Zij=0Z_{ij}=0 for 1jq1\leq j\leq q and Zij=1Z_{ij}=1 for q<jJq<j\leq J. For SW-CRTs with continuous recruitment, we let tijk(j1,j]t_{ijk}\in(j-1,j] be the exact recruitment time for the kk-th individual from the ii-th cluster in the jj-th period. Note that under a cross-sectional design, the period jj is uniquely determined by the recruitment time itself, so the subscript jj is auxiliary; nonetheless, we keep this index to facilitate discussion of recruitment patterns across different cluster-periods in Section 2.3.2. In practice, routinely collected information (such as calendar date or days since trial initiation) can be mapped to tijkt_{ijk}. The fractional part of tijkt_{ijk} represents the proportion of the period jj that has elapsed until enrollment. For example, suppose the kk-th individual from the ii-th cluster is enrolled on the day 30 of a trial in which the first period (i.e, j=1j=1) spans the initial 60 days, then ti1k=0.5t_{i1k}=0.5. Similar standardization procedures can be used when such recruitment-time information is available.

2.2 Estimands

We first describe a linear mixed model with a generic model representation in Li et al.Li2020 and then introduce specific model variants under discrete-time versus continuous-time considerations. The essential ingredients for a linear mixed model in SW-CRTs are

Yijk=secular trend+intervention effect+heterogeneity+residual error,\displaystyle Y_{ijk}=\text{{secular trend}}+\text{{intervention effect}}+\text{{heterogeneity}}+\text{{residual error}}, (1)

where YijkY_{ijk} is assumed to be continuous. The residual error term is also independent and identically distributed as ϵijk𝒩(0,σϵ2)\epsilon_{ijk}\sim\mathcal{N}(0,\sigma_{\epsilon}^{2}). For period effect and heterogeneity terms, we discuss their different specifications under either discrete sampling or continuous recruitment in Section 2.3. The intervention effect term is the change of the mean outcome in each period from each sequence and includes the parameter of interest. For example, one may be interested in a constant intervention effect δ\delta. Alternatively, when the intervention effect varies by the exposure time, one can target the intervention effect curve δ(s)\delta(s), with s=jq{1,,J1}s=j-q\in\{1,\ldots,J-1\} indicating the duration of exposure. Under this scenario, the estimand of interest is the exposure time-averaged intervention effect:

Δ=1J1s=1J1δ(s),\displaystyle\Delta=\frac{1}{J-1}\sum_{s=1}^{J-1}\delta(s),

where δ(s)\delta(s) is the point intervention effect at the ss-th exposure time. When the intervention effect curve does not change over time, the intervention effect term becomes a constant with δ(s)=δ\delta(s)=\delta for s{1,,J1}s\in\{1,\ldots,J-1\}, which also implies Δ=δ\Delta=\delta. Here, Δ\Delta averages over (0,J1](0,J-1] following recent conventions,Kenny2022; Maleyeff2022; wang2025anticipation although alternative intervals may be of interest depending on the research question.

2.3 Envisioning the Data-Generating Process Under Continuous Recruitment

We characterize possible linear mixed model formulations based on (1) with special considerations in continuous recruitment processes. This will be considered as the data-generating process when evaluating the performance of the conventional discrete-time models.

2.3.1 Period Effect

Because the intervention is confounded with time, modeling the background secular trend is necessary to remove bias in estimating the effect that is attributable solely to the intervention.Hussey2007; Hemming2017 The conventional formulation of the period effect in Hussey and HughesHussey2007 model under discrete sampling is typically given by

μ+β2𝟙(j=2)++βJ𝟙(j=J),\displaystyle\mu+\beta_{2}\mathbbm{1}(j=2)+\ldots+\beta_{J}\mathbbm{1}(j=J),

where μ\mu is the grand mean and 𝜷=(β2,,βJ)\bm{\beta}=(\beta_{2},\ldots,\beta_{J}) are the secular trend parameters for all periods (β1=0\beta_{1}=0 for identifiability). This saturated specification utilizes a nonparametric representation of the period effect term and is sufficient for trials with a limited number of discrete periods.Li2020 In contrast, the period effect term under continuous recruitment can be defined asHooper2024

μ+T(tijk),\displaystyle\mu+T(t_{ijk}),

where T()T(\cdot) is a function that describes the underlying continuous effect of time on expected outcome in terms of recruitment time tijkt_{ijk}. Here, we implicitly assume the same functional form of the period effect across all clusters.

2.3.2 Heterogeneity

One key implication of the cluster-level randomization is that individual-level outcomes in the same cluster tend to be positively correlated in SW-CRTs.Taljaard2020 This within-cluster interference is often characterized by the intracluster correlation coefficient (ICC),Murray2007 and must be taken into account during analysis to obtain valid statistical inference.Turner2017 In the linear mixed model framework, the ICC is parameterized through random-effect specifications that assume a correlation structure on the marginal outcome vector for each cluster. Three widely-used discrete structures are EXCH, NE, and DTD; see Section 3.4 in Li et al.Li2020 for a review of this topic. Under the EXCH, NE, and DTD structures, the heterogeneity term is αi\alpha_{i}, αi+γij\alpha_{i}+\gamma_{ij}, and γij\gamma_{ij}, respectively. The EXCH structure specifies αi𝒩(0,τα2)\alpha_{i}\sim\mathcal{N}(0,\tau_{\alpha}^{2}), which implies a common ICC both within and across periods.Hughes2015 The NE structure introduces an additional term γij𝒩(0,τγ2)\gamma_{ij}\sim\mathcal{N}(0,\tau_{\gamma}^{2}), which is the random cluster-by-period interaction independent of the random cluster effect αi\alpha_{i}, and distinguishes a within-period ICC (correlation between measurements from two individuals in the same cluster-period) and a constant between-period ICC (correlation between measurements from two individuals in the same cluster but different periods).Girling2016; Hooper2016 The DTD structure assumes (γi1,,γiJ)𝒩(0,τγ2𝐌(r0,r))(\gamma_{i1},\ldots,\gamma_{iJ})^{\prime}\sim\mathcal{N}(0,\tau_{\gamma}^{2}\mathbf{M}(r_{0},r)) and 𝐌(r0,r)\mathbf{M}(r_{0},r) follows a first-order auto-regressive structure

𝐌(r0,r)=(1r0rr0r2r0rJ1r0r1r0rr0rJ2r0rJ1r0rJ2r0rJ31)\displaystyle\mathbf{M}(r_{0},r)=\left(\begin{array}[]{ccccc}1&r_{0}r&r_{0}r^{2}&\cdots&r_{0}r^{J-1}\\ r_{0}r&1&r_{0}r&\cdots&r_{0}r^{J-2}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ r_{0}r^{J-1}&r_{0}r^{J-2}&r_{0}r^{J-3}&\cdots&1\end{array}\right)

which allows the between-period ICC decays exponentially.Kasza2019a; Kasza2019b As a side note, the EXCH and NE structures are returned by 𝐌(1,1)\mathbf{M}(1,1) and 𝐌(r0,1)\mathbf{M}(r_{0},1), respectively. Following Kasza et al.,Kasza2019a we consider 𝐌(1,r)\mathbf{M}(1,r) for the DTD structure in this article.

The correlation structure under continuous-time recruitment patterns can be significantly more complicated. We primarily simulate various continuous recruitment patterns through the use of a CTD correlation structure. The CTD assumes {γi,minjJ,kKij(tijk),,γi,maxjJ,kKij(tijk)}𝒩(0,τγ2𝐌~)\{\gamma_{i,\min_{j\in J,k\in K_{ij}}(t_{ijk})},\ldots,\gamma_{i,\max_{j\in J,k\in K_{ij}}(t_{ijk})}\}^{\prime}\sim\mathcal{N}(0,\tau_{\gamma}^{2}\widetilde{\mathbf{M}}), which is a KiK_{i\cdot}-dimensional vector of random effects across all measurement times (ordered chronologically from earliest to latest) within the ii-th cluster. Here, Ki=j=1JKijK_{i\cdot}=\sum_{j=1}^{J}K_{ij} is the number of individuals in the ii-th cluster across all periods, and 𝐌~\widetilde{\mathbf{M}} is a Ki×KiK_{i\cdot}\times K_{i\cdot} matrix with ones on the diagonal and r|tijktijk|r^{|t_{ijk}-t_{ij^{\prime}k^{\prime}}|} on the off-diagonal for jjj\neq j^{\prime} and kKijk^{\prime}\in K_{ij^{\prime}} following Grantham et al.Grantham2019

We quantify the recruitment pattern in terms of cluster-period-specific enrollment density, that is, the distribution of individual enrollment timing in each period for each cluster. Motivated by empirical evidence of continuous recruitment patterns in Section 4, we first consider three basic continuous recruitment patterns:

  • Uniform pattern: Individual enrollment times are uniformly distributed during each period, with tijkUniform(j1,j]t_{ijk}\sim\text{Uniform}(j-1,j].

  • Normal pattern: Individual enrollment times follow a truncated normal distribution, where tijk𝒩(0,1)t_{ijk}\sim\mathcal{N}(0,1) is rescaled to the range (j1,j](j-1,j] using max-min normalization, creating a bell-shaped enrollment curve centered within each period.

  • Exponential pattern: Individual enrollment times follow a truncated exponential distribution, where tijkExp(1.5)t_{ijk}\sim\text{Exp}(1.5) is rescaled to the range (j1,j](j-1,j] using max-min normalization, resulting in early-heavy enrollment during each period.

To capture the heterogeneity in recruitment dynamics observed in real-world trials (see Figure 3), we additionally introduce two mixed recruitment patterns that combine these simple distributions:

  • Cluster mixed pattern: Each cluster ii is randomly assigned to one of the three distributions (uniform, normal, or exponential) with probability 1/31/3, and all individuals within that cluster follow the assigned distribution across all periods. This pattern maintains enrollment consistency within clusters while allowing heterogeneity across clusters.

  • Cluster-period mixed pattern: Each cluster-period (i,j)(i,j) independently draws one of the three distributions (uniform, normal, or exponential) with probability 1/31/3, generating tijkt_{ijk} accordingly. This pattern allows maximum heterogeneity in enrollment dynamics both across clusters and within clusters over time.

Figure 2 provides a visual comparison of these two mixed patterns, illustrating the different levels of heterogeneity in enrollment dynamics. We consider an SW-CRT with I=4I=4 clusters, J=5J=5 periods, and Kij=10,000K_{ij}=10,000 individuals per cluster-period, where the large sample size produces stabilized enrollment density surfaces for illustration. In Figure 2(a), the cluster mixed pattern assigns each cluster to one of three enrollment distributions (uniform, normal, or exponential) that remains constant across all periods, resulting in consistent surface shapes within clusters but variation across clusters. In Figure 2(b), the cluster-period mixed pattern allows each cluster-period combination to independently draw from the three distributions, producing heterogeneous surfaces both within and across clusters. The key difference is that the cluster mixed pattern maintains temporal consistency within each cluster, whereas the cluster-period mixed pattern allows enrollment patterns to change between periods for each cluster.

2.3.3 Intervention Effect

To account for potential variation across clusters in the magnitude of intervention effect, we also consider an random intervention effect,Hemming2018b which can be expressed as vi×Zijv_{i}\times Z_{ij}. Here, vi𝒩(0,τv2)v_{i}\sim\mathcal{N}(0,\tau_{v}^{2}) is the random intervention effect quantifying the departure from the cluster-specific intervention effect relative to the average δ\delta (or δ(s)\delta(s)). In Table 1, we summarize ICC-related parameters under EXCH, NE, DTD, and CTD structures with or without an random intervention. As a side note, the definitions of the within- and between-period ICCs are different under control and intervention conditions in the presence of an random intervention effect. Therefore, we let ρ0\rho_{0}, ρ1\rho_{1} be the within-period ICC under control and intervention, respectively. We further introduce the cluster autocorrelation coefficient (CAC), which is the ratio of between- and within-ICCs and is generally smaller than one because the variance component for the random cluster-by-period interaction is typically greater than zero.Hooper2016 For the DTD structure, the CAC measures the decay rate per period.Kasza2019a

Although we consider continuous-time recruitment patterns in our data-generating process, we maintain the discrete intervention effect structure. This modeling choice is motivated by two reasons. First, incorporating individual-level exposure time into the intervention effect introduces substantial complexity, particularly when the effect depends on the duration since each individual’s enrollment; in this latter case, there has been little work to date on defining the treatment effect estimands that may be relevant and interpretable. Second, for cluster-level interventions that are typical in SW-CRTs, the provider or cluster exposure time is more relevant than individual exposure time, especially when the intervention targets organizational practices or policies in continuous recruitment short exposure designs.Hooper2016 Therefore, we continue to specify the intervention effect based on cluster-level exposure time (periods since intervention adoption) rather than individual-level exposure duration, depiste continuous-time individual recruitment. This helps us define the target estimands that are essential for our simulation study.

(a) Cluster Mixed Pattern
Refer to caption
(b) Cluster-Period Mixed Pattern
Refer to caption
Figure 2: An illustration of the cluster and cluster-period mixed patterns in panels (a) and (b), respectively. We consider an SW-CRT with I=4I=4, J=5J=5, and Kij=10,000K_{ij}=10,000, where the large sample size of 10,000 per cluster-period is specifically chosen to generate stabilized curves for illustration. Each colored surface represents the enrollment density for a specific cluster over time, with height indicating the number of individuals enrolled at each continuous time point.

2.4 Robust Variance Estimators

For discrete-time linear mixed models, we consider maximum likelihood estimators of the period effect (μ,𝜷)(\mu,\bm{\beta}), the intervention effect (δ\delta or Δ\Delta depending on the working intervention structure), and the heterogeneity (τα2\tau_{\alpha}^{2}, (τα2,τγ2)(\tau_{\alpha}^{2},\tau_{\gamma}^{2}), τγ2\tau_{\gamma}^{2} or rr depending on the working correlation structure) based on the observed data {(Yijk,Zij):i=1,,I,j=1,,J,k=1,,Kij}\{(Y_{ijk},Z_{ij}):i=1,\ldots,I,j=1,\ldots,J,k=1,\ldots,K_{ij}\}. In general, the linear mixed model can be written as

𝐘i=𝐃i𝜻+𝐄i𝐮i+ϵi,\displaystyle\mathbf{Y}_{i}=\mathbf{D}_{i}\bm{\zeta}+\mathbf{E}_{i}\mathbf{u}_{i}+\bm{\epsilon}_{i},

where 𝐘i=(Yi11,,YiJKiJ)\mathbf{Y}_{i}=(Y_{i11},\ldots,Y_{iJK_{iJ}})^{\prime} is the Ki×1K_{i\cdot}\times 1 vector of outcomes for all individuals across all periods in the ii-th cluster, 𝐃i\mathbf{D}_{i} (or 𝐄i\mathbf{E}_{i}) is the design matrix of fixed (or random) effects for each cluster, 𝜻=(μ,𝜷,δ)\bm{\zeta}=(\mu,\bm{\beta},\delta)^{\prime} (or 𝐮i\mathbf{u}_{i}) is the fixed (or random) effect parameters, and ϵi=(ϵi11,,ϵiJKiJ)\bm{\epsilon}_{i}=(\epsilon_{i11},\ldots,\epsilon_{iJK_{iJ}})^{\prime} is the residual error. Here, 𝐮i𝒩(0,𝐑i)\mathbf{u}_{i}\sim\mathcal{N}(0,\mathbf{R}_{i}) and , where 𝐑i\mathbf{R}_{i} is the working correlation matrix in Section 2.3.2. Furthermore, we define the working covariance structure for the outcome vector 𝐘i\mathbf{Y}_{i} as

𝐖i=𝕍(𝐘i)=𝐄i𝐑i𝐄i+σϵ2𝐈Ki,\displaystyle\mathbf{W}_{i}=\mathbb{V}(\mathbf{Y}_{i})=\mathbf{E}_{i}\mathbf{R}_{i}\mathbf{E}_{i}^{\prime}+\sigma_{\epsilon}^{2}\mathbf{I}_{K_{i\cdot}},

where 𝐈Ki\mathbf{I}_{K_{i\cdot}} is the Ki×KiK_{i\cdot}\times K_{i\cdot} identity matrix. Then, the fixed effect parameters are estimated using the weighted least squares estimator:

𝜻^=(i=1I𝐃i𝐖i1𝐃i)1i=1I𝐃i𝐖i1𝐃i.\displaystyle\hat{\bm{\zeta}}=\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)^{-1}\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}.

We can further define its model-based variance estimator of 𝜻^\hat{\bm{\zeta}} as

𝕍Naive(𝜻^)=(i=1I𝐃i𝐖i1𝐃i)1.\displaystyle\mathbb{V}_{\text{Naive}}(\hat{\bm{\zeta}})=\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)^{-1}.

An alternative variance estimator for 𝜻^\hat{\bm{\zeta}} is the robust variance estimator by Liang and Zeger:Liang1986

𝕍RVE(𝜻^)={(i=1I𝐃i𝐖i1𝐃i)1(i=1I𝐃i𝐖i1𝐫i𝐫i𝐖i1𝐃i)(i=1I𝐃i𝐖i1𝐃i)1},\displaystyle\resizebox{433.62pt}{}{$\mathbb{V}_{\text{RVE}}(\hat{\bm{\zeta}})=\left\{\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)^{-1}\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{r}_{i}\mathbf{r}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)^{-1}\right\}$},

where 𝐫i=𝐘i𝐃i𝜻^\mathbf{r}_{i}=\mathbf{Y}_{i}-\mathbf{D}_{i}\hat{\bm{\zeta}} is the residual vector for the ii-th cluster. However, this standard RVE may lead to inflated type I error when the number of clusters is small (e.g., I<50I<50).Maas2004 To address this potential issue, Ouyang et al.Ouyang2024 recommended Mancl and DeRouen (MD)Mancl2001 with I2I-2 degrees of freedom correction (DoF) for SW-CRTs with fewer than 32 clusters. Specifically, we can write

𝕍RVEMD(𝜻^)={(i=1I𝐃i𝐖i1𝐃i)1(i=1I𝐃i𝐖i1𝐀i𝐫i𝐫i𝐀i𝐖i1𝐃i)(i=1I𝐃i𝐖i1𝐃i)1},\displaystyle\resizebox{433.62pt}{}{$\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\bm{\zeta}})=\left\{\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)^{-1}\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{A}_{i}\mathbf{r}_{i}\mathbf{r}_{i}^{\prime}\mathbf{A}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)\left(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i}\right)^{-1}\right\}$},

where 𝐀i=𝐈Ki𝐇i\mathbf{A}_{i}=\mathbf{I}_{K_{i\cdot}}-\mathbf{H}_{i} is the adjustment matrix and 𝐇i=𝐃i(i=1I𝐃i𝐖i1𝐃i)1𝐃i𝐖i1\mathbf{H}_{i}=\mathbf{D}_{i}(\sum_{i=1}^{I}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1}\mathbf{D}_{i})^{-1}\mathbf{D}_{i}^{\prime}\mathbf{W}_{i}^{-1} is the cluster-leverage matrix. Extensive simulations have demonstrated that coverage percentage of the 95% confidence interval (CI) based on 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) achieves the nominal 95% with as few as eight clusters, at the expense of a slight overestimation of standard errors; see Section 3.6 in Ouyang et al.Ouyang2024 for a comparison of different small-sample correction methods for RVEs when analyzing SW-CRTs, under discrete sampling.

Table 1: Summary of intracluster correlation coefficients (ICCs) and cluster autocorrelation coefficients (CACs) under under simple exchangeable (EXCH), nested exchangeable (NE), discrete-time exponential decay (DTD), and continuous-time decay (CTD) in the presence of a random intervention (RI) effect. A check mark ([Uncaptioned image]) indicates the presence of a structure, and a cross mark ([Uncaptioned image]) indicates its absence.
Correlation Structure Parameters Condition
RI EXCH NE DTD CTD Control Intervention
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] ICC τα2τα2+τε2\frac{\tau_{\alpha}^{2}}{\tau_{\alpha}^{2}+\tau_{\varepsilon}^{2}} τα2+τυ2τα2+τυ2+τε2\frac{\tau_{\alpha}^{2}+\tau_{\upsilon}^{2}}{\tau_{\alpha}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] Within‑period ICC τα2+τγ2τα2+τγ2+τε2\frac{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}}{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}+\tau_{\varepsilon}^{2}} τα2+τγ2+τυ2τα2+τγ2+τυ2+τε2\frac{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}}{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] Between‑period ICC τα2τα2+τγ2+τε2\frac{\tau_{\alpha}^{2}}{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}+\tau_{\varepsilon}^{2}} τα2+τυ2τα2+τγ2+τυ2+τε2\frac{\tau_{\alpha}^{2}+\tau_{\upsilon}^{2}}{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] CAC τα2τα2+τγ2\frac{\tau_{\alpha}^{2}}{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}} τα2+τυ2τα2+τγ2+τυ2\frac{\tau_{\alpha}^{2}+\tau_{\upsilon}^{2}}{\tau_{\alpha}^{2}+\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] Within‑period ICC τγ2τγ2+τε2\frac{\tau_{\gamma}^{2}}{\tau_{\gamma}^{2}+\tau_{\varepsilon}^{2}} τγ2+τυ2τγ2+τυ2+τε2\frac{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}}{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] Between‑period ICC (jjj\neq j^{\prime}) τγ2r|jj|τγ2+τε2\frac{\tau_{\gamma}^{2}r^{|\,j-j^{\prime}\,|}}{\tau_{\gamma}^{2}+\tau_{\varepsilon}^{2}} τγ2r|jj|+τυ2τγ2+τυ2+τε2\frac{\tau_{\gamma}^{2}r^{|\,j-j^{\prime}\,|}+\tau_{\upsilon}^{2}}{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] CAC (jjj\neq j^{\prime}) r|jj|r^{|\,j-j^{\prime}\,|} τγ2r|jj|+τυ2τγ2+τυ2\frac{\tau_{\gamma}^{2}r^{|\,j-j^{\prime}\,|}+\tau_{\upsilon}^{2}}{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] Within‑period ICC τγ2τγ2+τε2\frac{\tau_{\gamma}^{2}}{\tau_{\gamma}^{2}+\tau_{\varepsilon}^{2}} τγ2+τυ2τγ2+τυ2+τε2\frac{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}}{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] Between‑period ICC (jj,kKijj\neq j^{\prime},\,k^{\prime}\in K_{ij^{\prime}}) τγ2r|tijktijk|τγ2+τε2\frac{\tau_{\gamma}^{2}r^{|\,t_{ijk}-t_{ij^{\prime}k^{\prime}}\,|}}{\tau_{\gamma}^{2}+\tau_{\varepsilon}^{2}} τγ2r|tijktijk|+τυ2τγ2+τυ2+τε2\frac{\tau_{\gamma}^{2}r^{|\,t_{ijk}-t_{ij^{\prime}k^{\prime}}\,|}+\tau_{\upsilon}^{2}}{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}+\tau_{\varepsilon}^{2}}
[Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] CAC (jj,kKijj\neq j^{\prime},\,k^{\prime}\in K_{ij^{\prime}}) r|tijktijk|r^{|\,t_{ijk}-t_{ij^{\prime}k^{\prime}}\,|} τγ2r|tijktijk|+τυ2τγ2+τυ2\frac{\tau_{\gamma}^{2}r^{|\,t_{ijk}-t_{ij^{\prime}k^{\prime}}\,|}+\tau_{\upsilon}^{2}}{\tau_{\gamma}^{2}+\tau_{\upsilon}^{2}}

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; RI: random intervention; ICC: intracluster correlation coefficient; CAC: cluster autocorrelation coefficient.

3 Simulation Study

In this section, we conduct a simulation study to evaluate the performance of discrete-time linear mixed model estimators when data are generated under continuous-time recruitment. Section 2.3 provides a framework for conceptualizing the transition from discrete-time to continuous-time recruitment patterns, but adding such structures directly into analytical models is computationally challenging due to the high-dimensional correlation structure, even when using Markov Chain Monte Carlo maximum likelihood approximation.Samuel2024 We therefore generate data using continuous-time recruitment, but analyze these data using discrete-time working models. This reflects the common practice in analyzing cross-sectional SW-CRTs,Nevins2024 and we thus aim to evaluate the credibility of discrete-time linear mixed models under model misspecification that arises from continuoue-time individual recruitment. The simulation scenarios are summarized in Table 2 for ease of reference.

3.1 Trial Design

We consider a standard, complete, cross-sectional SW-CRT with II clusters, JJ equally spaced periods, Q=J1Q=J-1 intervention sequences, and KK individuals in the jj-th period from the ii-th cluster. Here, we first assume an equal number of individuals per cluster-period and return to a discussion on the topic of unequal cluster-period sizes in Section 3.5.7. A standard SW-CRT assumes a balanced design where II is divisible by QQ and sequence qq has Iq=I/QI_{q}=I/Q clusters for q{1,,Q}q\in\{1,\ldots,Q\}. The intervention adoption period for clusters from sequence qq is in period j=q+1j=q+1.

Table 2: Specifications of true and working models in the simulation study. For the period effect, “[Uncaptioned image]” indicates continuous specification and “[Uncaptioned image]” indicates discrete specification. For the intervention effect, under the “Time Varying” column, “[Uncaptioned image]” indicates an exposure-time-dependent effect and “[Uncaptioned image]” indicates a constant effect; under the “RI” column, “[Uncaptioned image]” indicates the presence of a random intervention (RI) effect and “[Uncaptioned image]” indicates its absence. For the correlation structure, “[Uncaptioned image]” indicates that the presence of a specific structure and “[Uncaptioned image]” indicates its absence. All true models assume a continuous-time decay (CTD) correlation structure, and working models assume one of three discrete-time structures: simple exchangeable (EXCH), nested exchangeable (NE), or discrete-time exponential decay (DTD).
Scenario True Model Working Model Simulation Study
Period Effect Intervention Effect Correlation Structure Period Effect Intervention Effect Correlation Structure
Continuous Time Varying RI CTD Continuous Time Varying RI EXCH NE DTD
1 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] I
2 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
3 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
4 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] I, II, III, V, VII
5 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
6 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
7 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] -
8 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
9 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
10 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] IV
11 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
12 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
13 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] -
14 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
15 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
16 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] VI, VII
17 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
18 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
19 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] -
20 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
21 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
22 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] -
23 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]
24 [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image] [Uncaptioned image]

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; RI: random intervention.

3.2 Data-Generating Process Under Continuous Recruitment

Within the framework of linear mixed models, continuous recruitment can be characterized in two complementary ways. First, when individuals enter the study at continuous time intervals rather than at a set of single discrete time points, it is reasonable to replace the discrete period effect with its continuous counterpart. Second, the correlation structure can be utilized to reflect various recruitment patterns. Unlike DTD, which relies only on discrete period indices, the CTD structure depends on the exact timing of each individual recruitment. In particular, the CTD structure assumes that the correlation between individual-level outcomes within the same cluster continuously decays as the gap between their recruitment times increases. Therefore, the true model under the CTD structure can accommodate (potentially different) recruitment time distributions within each cluster-period combination. Motivated by the actual recruitment patterns observed in the Australian disinvestment and reinvestment trials in Figure 3, we consider three recruitment patterns from ideal to more realistic settings: 1) uniform recruitment, where individuals are recruited evenly over time within each cluster-period; 2) cluster mixed recruitment, where recruitment patterns vary across clusters but remain consistent within each cluster across periods; and 3) cluster-period mixed recruitment, where recruitment patterns vary both between clusters and within clusters across different periods. All three recruitment patterns are formally defined in Section 2.3.2.

In Table 2, we consider 8 specifications of the true data-generating process. Under scenarios 7-9, the true model is most similar to the Hussey and Hughes model:

Yijk=μ+βj+δZij+γi,tijk+ϵijk,\displaystyle Y_{ijk}=\mu+\beta_{j}+\delta Z_{ij}+\gamma_{i,t_{ijk}}+\epsilon_{ijk},

which assumes a discrete period effect, a constant intervention effect, and a CTD (instead of an EXCH) correlation structure. Scenarios 10-12 introduce greater complexity by replacing the discrete period effect βj\beta_{j} with a continuous period effect T(tijk)T(t_{ijk}). Scenarios 1-6 parallel scenarios 7-12, but introduce additional variability through a random constant intervention effect, replacing δZij\delta Z_{ij} with (δ+vi)Zij(\delta+v_{i})Z_{ij}. Lastly, scenarios 13-24 further extend the previous 12 scenarios by allowing the intervention effect to vary with the exposure time, replacing δ\delta with δ(s)\delta(s).

3.3 Simulation Parameters

Table 3: Range of parameter values used in the simulation study under the continuous-time decay (CTD) correlation structures, with or without a random intervention (RI) effect.
Parameter Values
Number of clusters (II) (16,32)(16,32)
Number of periods (JJ) 55
Number of sequences (QQ) 44
Number of individuals per cluster in each period (KK) 5050
Continuous recruitment pattern (tijkt_{ijk}) Uniform, Cluster-Mixed, Cluster-Period Mixed
Period effect βj=0.5j2/J\beta_{j}=0.5j^{2}/J or T(tijk)={0.5t2+sin(6πt)}/JT(t_{ijk})=\{0.5t^{2}+\sin(6\pi t)\}/J
Intervention effect size (δ,Δ)(\delta,\Delta) 0
Within‑period ICC under control (ρ0\rho_{0}) for CTD (0.01,0.05)(0.01,0.05)
Within‑period ICC under intervention (ρ1\rho_{1}) for CTD 0.10.1
CAC under control for NE, DTD, CTD (0.5,0.8)(0.5,0.8)
Standard deviation of residual term (τε2\tau_{\varepsilon}^{2}) 11

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; RI: random intervention; ICC: intracluster correlation coefficients; CAC: cluster autocorrelation coefficient.

For practical relevance, the simulation parameters are motivated by published literature and summarized in Table 3. Importantly, Nevins et al.Nevins2023; Nevins2024 reported that the median number of clusters randomized was 11 (Q1-Q3: 8-18), the median number of sequences was 5 (Q1-Q3: 4-7), and the median sample size was 2,724 (Q1-Q3: 643-14,734), where Q1 and Q3 denote the first and third quartiles, respectively. Furthermore, Korevaar et al.Korevaar2021 investigated 44 continuous outcomes from the CLustered Outcome Dataset (CLOUD) bank, reporting ICCs typically ranging between 0.01 and 0.1, with a median CAC of 0.73 (Q1-Q3: 0.19-0.91).

Based on these findings, we consider a standard SW-CRT design with I(16,32)I\in(16,32) clusters, J(5,9)J\in(5,9) periods, Q(4,8)Q\in(4,8) sequences, and Kij=50K_{ij}=50 for all i{1,,I}i\in\{1,\ldots,I\} and j{1,,J}j\in\{1,\ldots,J\} (i.e., equal numbers of individuals per cluster-period). For ICC-related parameters, the CTD structure assumes ρ0(0.01,0.05)\rho_{0}\in(0.01,0.05), ρ1=0.1\rho_{1}=0.1, and τϵ2=1\tau_{\epsilon}^{2}=1 with the CAC equal to 0.5 or 0.8 under the control condition. The corresponding CAC under the intervention condition and τv2\tau_{v}^{2} can then be calculated based on formulas provided in Table 1. Additionally, for evaluating the validity of the Wald test in terms of type I error rate, we assume an intervention effect size of zero (i.e., δ=Δ=0\delta=\Delta=0). It is important to note that when we set Δ=0\Delta=0, the average of all point intervention effects equals zero. However, it does not necessarily imply that each point intervention effect δ(s)\delta(s) is equal to zero. For each simulation study, we simulated 2,000 datasets, ensuring the Monte Carlo standard error within ±0.5%\pm 0.5\% for a coverage probability of 95%.Morris2019

3.4 Data Analysis and Performance Measures

Table 4: Summary of performance measures in the presence of a constant intervention effect. Here, we let δ¯=n1p=1nδ^p\bar{\delta}=n^{-1}\sum_{p=1}^{n}\hat{\delta}_{p} be the empirical mean of δ^p\hat{\delta}_{p} across nn simulated datasets. When the intervention depends on exposure times, we replace (δ,δ^p,δ¯)(\delta,\hat{\delta}_{p},\bar{\delta}) with (Δ,Δ^p,Δ¯)(\Delta,\hat{\Delta}_{p},\bar{\Delta}).
Measure Names Estimands Estimators
Bias 𝔼(δ^)δ\mathbb{E}(\widehat{\delta})-\delta 1np=1n(δ^pδ)\frac{1}{n}\sum_{p=1}^{n}(\hat{\delta}_{p}-\delta)
Coverage Pr(δ^low<δ^<δ^high)\Pr(\hat{\delta}_{\text{low}}<\hat{\delta}<\hat{\delta}_{\text{high}}) 1np=1n𝟙(δ^low<δ^p<δ^high)\frac{1}{n}\sum_{p=1}^{n}\mathbbm{1}(\hat{\delta}_{\text{low}}<\hat{\delta}_{p}<\hat{\delta}_{\text{high}})
Average Standard Error 𝔼{𝕍(δ^)}\mathbb{E}\left\{\sqrt{\mathbb{V}(\hat{\delta})}\right\} 1np=1n𝕍(δ^p)\frac{1}{n}\sum_{p=1}^{n}\sqrt{\mathbb{V}(\hat{\delta}_{p})}
Empirical Standard Deviation 𝕍(δ^)\sqrt{\mathbb{V}(\hat{\delta})} 1np=1n(δ^pδ¯)2\sqrt{\frac{1}{n}\sum_{p=1}^{n}(\hat{\delta}_{p}-\bar{\delta})^{2}}

: 𝕍()\mathbb{V}(\cdot) denotes either 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot) for model-based variance or 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot) for model-robust variance.

All simulated datasets are analyzed at the individual level using discrete-time linear mixed models with R (version 4.4.3). Specifically, our working models always assume a discrete period effect and correctly specify whether the intervention effect is constant or depends on exposure times. Although scenarios 1-6 and 13-18 introduce an random intervention effect in the true model, our working models always assume a fixed intervention effect. For estimating 𝜻^\hat{\bm{\zeta}} and 𝕍Naive(𝜻^)\mathbb{V}_{\text{Naive}}(\hat{\bm{\zeta}}), when the working model assumes an EXCH or NE, the lmerTest (version 3.1-3) package is used; when the working model assumes a DTD, the glmmTMB (version 1.1.11) package is used. For estimating 𝕍RVE(𝜻^)\mathbb{V}_{\text{RVE}}(\hat{\bm{\zeta}}), we use the clubSandwich (version 0.6.0) package with “CR0” for the RVE without any correction, and “CR3” for the RVE with the MD correction across all correlation structures (including EXCH, NE, and DTD). The only exception is the RVE under DTD with the MD correction because this option is currently not available in the clubSandwich package. Instead, we use the R package developed in Wang et al.Wang2024 All models are fitted using their default nonlinear optimizers. For each setting in Table 3, we simulated n=2,000n=2,000 datasets for each scenario in Table 2, which ensures the Monte Carlo standard error below 0.5% with a 95% coverage.Morris2019 We also provide a list of performance measures in Table 4. Of note, we evaluate 95% coverage using a two-sided t-test with I2I-2 DoF for the small-sample bias correction. This convenient choice of DoF was first proposed by Ford and WestgateFord2020 for GEE estimators with the MD correction and has consistently maintained valid inference (but sometimes conservative due to over-correction) in many empirical studies for analyzing SW-CRTs.Li2019; Li2021; DavisPlourde2021; Ouyang2024

3.5 Simulation Results

3.5.1 Simulation Study I (Scenarios 1-6): Continuous Period Effect.

In Simulation Study I, we first investigate whether a continuous period effect in the true data-generating process further affects the performance of discrete-time linear mixed models, relative to a discrete period effect. Among the 24 scenarios in Table 2, we first restrict attention to scenarios with a constant intervention effect (scenarios 1-12) as the base case. Within this subset, we focus on scenarios 1-6 that include an random intervention because these are more realistic for capturing treatment effect heterogeneity across clusters; comparisons between data-generating processes with and without a random intervention are provided in Section 3.5.4. We consider a standard SW-CRT with I=32I=32, J=5J=5, and Kij=50K_{ij}=50. Data are generated under a CTD correlation structure with a cluster-period mixed recruitment pattern, fixing ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, and CAC =0.5=0.5. We set δ=0\delta=0 to assess type I error rates. For data generation, we compare two period effect specifications: scenarios 1-3 use a discrete period effect with βj=0.5j2/J\beta_{j}=0.5j^{2}/J, and scenarios 4-6 use a continuous period effect with T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J. Both specifications include nonlinear trends through quadratic terms, but the continuous period effect additionally includes sinusoidal variation to simulate seasonal fluctuations commonly observed in healthcare settings. For analysis, we apply discrete-time linear mixed models with EXCH, NE, and DTD working correlation structures to both sets of generated data.

Table 5 indicates that the specification of the period effect in the data-generating process has minimal impact on the performance of discrete-time linear mixed models. First, the estimator δ^\hat{\delta} is consistent for the true intervention effect δ\delta, with negligible bias across all scenarios. This aligns with the theoretical results from Wang et al.,Wang2024 demonstrating that the intervention effect estimator is consistent as long as the intervention structure is correctly specified, even when the period effect, heterogeneity, or residual error terms are misspecified in SW-CRTs (even though the theory there developed is based on discrete sampling rather than continuous-time recruitment). Given the negligible bias, subsequent comparisons focus on variance estimation. Second, the model-based variance estimator under the EXCH structure gives the lowest 95% CI coverage (e.g., Naive(δ^)74%\mathbb{C}_{\text{Naive}}(\hat{\delta})\approx 74\%). This is expected because EXCH is an overly restrictive correlation structure relative to the CTD data-generating process. More flexible working correlation structures such as NE or DTD improve coverage to approximately 90%, though remaining below the nominal level. Third, the RVE without correction increases coverage to approximately 94% across all correlation structures, but maintains inflated type I error rates. Fourth, the RVE with MD correction achieves nominal coverage for all three working correlation structures. These findings establish that the RVE with MD correction is necessary for valid inference when analyzing continuous-time recruitment data with discrete-time models. Given that continuous and discrete period effects yield comparable results for discrete-time linear mixed models and their variance estimators, subsequent simulations focus on the more realistic scenarios in the presence of a continuous period effect, which better reflects continuous recruitment patterns commonly observed in real-world SW-CRTs.

3.5.2 Simulation Study II (Scenarios 4-6): Recruitment Patterns.

In Simulation Study II, we examine the impact of different recruitment patterns on the performance of discrete-time linear mixed models. The setting remains identical to Simulation Study I, except that data are generated under three recruitment patterns: uniform, cluster mixed, and cluster-period mixed. Results in Table 5 indicate that the recruitment pattern has minimal influence on the performance of linear mixed model estimators or their variance estimators. The empirical standard deviation of δ^\hat{\delta} increases slightly under the cluster-period mixed pattern relative to the uniform and cluster mixed patterns, reflecting the additional variation introduced by this recruitment specification. This difference does not affect the overall conclusions regarding bias or coverage. Given the minimal impact of recruitment patterns on estimator performance, subsequent simulations (Studies III-VI) use the cluster-period mixed pattern for data generation, as this specification most closely reflects the complex recruitment dynamics observed in real-world SW-CRTs.

Table 5: Results over 2,0002,000 simulated datasets generated for scenarios 1-6 under the setting in Simulation Study I and II. We consider a standard SW-CRT with I=32I=32, J=5J=5, Kij=50K_{ij}=50, δ=0\delta=0, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, and CAC =0.5=0.5. For a discrete period effect, we set βj=0.5j2/J\beta_{j}=0.5j^{2}/J; for a continuous period effect, we set T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
Period Effect Pattern Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
Discrete CP EXCH 0.0005 0.0727 0.0398 73.60 0.0681 93.20 0.0727 95.00
NE 0.0010 0.0731 0.0593 89.55 0.0681 93.15 0.0727 95.20
DTD 0.0011 0.0727 0.0568 88.55 0.0676 93.95 0.0722 95.05
Continuous CP EXCH 0.0007 0.0741 0.0405 73.90 0.0699 93.70 0.0745 95.45
NE 0.0015 0.0742 0.0614 89.75 0.0696 93.60 0.0743 95.25
DTD 0.0015 0.0741 0.0591 89.15 0.0694 93.80 0.0740 95.30
C EXCH 0.0011 0.0726 0.0405 73.80 0.0689 94.65 0.0735 95.85
NE 0.0006 0.0731 0.0609 91.15 0.0688 94.15 0.0734 95.55
DTD 0.0009 0.0721 0.0582 90.45 0.0683 94.10 0.0729 95.70
U EXCH 0.0005 0.0719 0.0405 74.35 0.0683 94.10 0.0728 95.50
NE 0.0003 0.0723 0.0595 90.35 0.0681 93.40 0.0726 95.15
DTD -0.0001 0.0721 0.0572 88.85 0.0677 93.90 0.0722 95.00

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; U: uniform pattern; C: cluster mixed pattern; CP: cluster-period mixed pattern; RVE: robust variance estimator; MD: Mancl and DeRouen.

3.5.3 Simulation Study III (Scenarios 4-6): ICC-Related Parameters.

In Simulation Study III, we investigate how varying the CAC and within-period ICC affects the performance of discrete-time linear mixed models. The setting remains identical to Simulation Study I, except that data are generated with CAC {0.5,0.8}\in\{0.5,0.8\} and ρ0{0.01,0.05}\rho_{0}\in\{0.01,0.05\} to examine a broader range of correlation structures. Table 6 reveals that the CAC has minimal impact on the performance of discrete-time linear mixed model estimators or their variance estimators across the range examined. In contrast, the within-period ICC substantially influences variance estimation. Increasing ρ0\rho_{0} from 0.01 to 0.05 improves the coverage of model-based variance estimators from approximately 89% to 93% for a fixed CAC. The RVE maintains consistent performance across all parameter combinations, with coverage of approximately 94% without correction and nominal coverage of 95% with MD correction.

Table 6: Results over 2,0002,000 simulated datasets generated for scenarios 4-6 under the setting in Simulation Study III. We consider a standard SW-CRT with I=32I=32, J=5J=5, Kij=50K_{ij}=50, δ=0\delta=0, ρ0{0.01,0.05}\rho_{0}\in\{0.01,0.05\}, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC {0.5,0.8}\in\{0.5,0.8\}, a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J, and a cluster-period mixed pattern. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
CAC ρ0\rho_{0} Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
0.5 0.01 EXCH 0.0007 0.0741 0.0405 73.90 0.0699 93.70 0.0745 95.45
NE 0.0015 0.0742 0.0614 89.75 0.0696 93.60 0.0743 95.25
DTD 0.0015 0.0741 0.0591 89.15 0.0694 93.80 0.0740 95.30
0.05 EXCH 0.0002 0.0763 0.0410 72.40 0.0718 94.10 0.0766 95.30
NE 0.0007 0.0758 0.0672 92.90 0.0712 93.45 0.0760 95.25
DTD 0.0006 0.0744 0.0649 92.70 0.0692 93.55 0.0739 94.95
0.8 0.01 EXCH 0.0003 0.0740 0.0405 73.90 0.0684 93.00 0.0730 94.65
NE 0.0008 0.0741 0.0604 89.05 0.0683 93.00 0.0728 94.65
DTD 0.0006 0.0741 0.0578 87.95 0.0681 92.95 0.0727 94.65
0.05 EXCH 0.0007 0.0698 0.0410 76.55 0.0656 93.65 0.0700 95.40
NE 0.0010 0.0701 0.0627 93.15 0.0657 93.80 0.0701 95.20
DTD 0.0009 0.0686 0.0589 92.00 0.0641 93.80 0.0684 95.40

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; ICC: intracluster correlation coefficients; CAC: cluster autocorrelation coefficient; ρ0\rho_{0}: within‑period ICC under control; ρ1\rho_{1}: within‑period ICC under treatment; RVE: robust variance estimator; MD: Mancl and DeRouen.

3.5.4 Simulation Study IV (Scenarios 10-12): Random Intervention Effect.

In Simulation Study IV, we examine whether the presence of a random intervention effect influences the performance of discrete-time linear mixed models. The setting remains identical to Simulation Study III, except that data are generated with a fixed constant intervention effect rather than a random intervention effect. Web Table 1 demonstrates that the absence of a random intervention effect leads to different performance across working correlation structures. For model-based variance estimators, the NE and DTD structures yield conservative inference with coverage exceeding 95%, indicating deflated type I error rates. In contrast, the type I error rate is inflated for the EXCH structure with the 95% CI coverage ranging from 81.75% to 93.55%. The RVE demonstrates greater robustness, achieving coverage of approximately 94% without correction and nominal coverage of 95% with MD correction. These findings suggest that the presence or absence of a random intervention effect primarily affects model-based variance estimation, particularly under restrictive working correlation structures (e.g., EXCH), while robust variance estimation maintains validity across both specifications.

3.5.5 Simulation Study V (Scenarios 4-6): Trial Design.

In Simulation Study V, we examine whether our findings generalize to different numbers of clusters and periods. We first consider a smaller trial with I=16I=16 clusters while fixing all other parameters from Simulation Study III. Results in Web Table 2 confirm that all conclusions remain consistent, though the empirical standard deviation increases due to the reduced sample size. We next consider a larger trial with J=9J=9 periods while maintaining I=32I=32 clusters. In Web Table 3, model-based variance estimators achieve lower coverage with more periods compared to the J=5J=5 setting. It is particularly clear that, for the model-based variance estimator, using the DTD working structure leads to the highest 95% CI coverage, followed by NE, with EXCH being the most restrictive one that has the lowest coverage. This pattern emphasizes the growing importance of correctly specifying the correlation structure as the number of periods increases. Consistent with results in Table 6, increasing ρ0\rho_{0} from 0.01 to 0.05 leads to higher 95% CI coverage for both NE and DTD structures. Across Table 6, Web Tables 2 and 3, we observe that the combination of CAC =0.8=0.8 and ρ0=0.05\rho_{0}=0.05 consistently results in the smallest empirical standard deviation of δ^\hat{\delta} compared to other combinations of the CAC and ρ0\rho_{0}. Lastly, the RVE maintains robust performance regardless of cluster or period number, achieving coverage of approximately 94% without correction and nominal coverage with MD correction.

3.5.6 Simulation Study VI (Scenarios 16-18): Exposure-Time-Dependent Intervention Effect.

In Simulation Study VI, we investigate the performance of discrete-time linear mixed models when the intervention effect varies as a function of exposure times. Specifically, we use the same setting as Simulation Study I but set δ(1)1.3768\delta(1)\approx-1.3768, δ(2)0.3831\delta(2)\approx 0.3831, δ(3)0.9785\delta(3)\approx 0.9785, δ(4)0.0152\delta(4)\approx 0.0152, and Δ=0\Delta=0. In Web Table 4, consistent with previous findings, the model-based variance estimator under the EXCH working correlation structure consistently gives the lowest 95% CI coverage across all point and time-averaged intervention effect estimates, ranging from 69.70%69.70\% to 76.75%76.75\%. In contrast, the NE and DTD working correlation structures show improved model-based variance estimator performance compared to EXCH, with coverages ranging from 84.95%84.95\% to 92.40%92.40\%, though still below the nominal level. Furthermore, both RVEs and MD-corrected RVEs maintain the 95% CI coverage close to the nominal level, with RVE()94%\mathbb{C}_{\text{RVE}}(\cdot)\approx 94\% and RVEMD()95%\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot)\approx 95\% across all estimates for all three discrete correlation structures.

3.5.7 Simulation Study VII (Scenarios 4-6 and 16-18): Intervention-Dependent Recruitment.

In Simulation Study VII, we investigate the impact of intervention-dependent recruitment on the performance of discrete-time linear mixed models. This scenario reflects settings where the implementation of an intervention systematically alters recruitment practices, potentially leading to increased cluster-period sizes and fundamentally different recruitment patterns after receiving the intervention. This phenomenon can take place in trials when the intervention is perceived as beneficial by patients and providers. Specifically, we consider a standard SW-CRT with I=96I=96 and fix J=5J=5, ρ0=0.01\rho_{0}=0.01, ρ1=0.01\rho_{1}=0.01, CAC =0.5=0.5 as a preliminary exploration. We systematically explore nine scenarios with varying recruitment sizes and patterns. For recruitment sizes, we consider 1) balanced recruitment with Kij=50K_{ij}=50 under both control and treatment, 2) moderately unbalanced recruitment with Kij=25K_{ij}=25 under control and Kij=75K_{ij}=75 under treatment, and 3) severely unbalanced recruitment with Kij=10K_{ij}=10 under control and Kij=90K_{ij}=90 under treatment. For each recruitment size, we consider uniform patterns under both conditions, uniform under control but cluster-period mixed under treatment, and cluster-period mixed under both conditions.

Web Table 5 leads to two major findings. First, when recruitment patterns are consistent across periods (i.e., uniform patterns or cluster-period mixed under both conditions), the intervention effect estimator is consistent, and RVEs achieve the nominal coverage. However, when the recruitment pattern is uniform in control periods and then becomes cluster-period mixed in treatment periods, the intervention effect estimator carries bias with δ^0.03\hat{\delta}\approx-0.03. Of note, when recruitment sizes are balanced, the bias arising from shifting recruitment patterns (uniform in control periods to cluster-period mixed in intervention periods) is approximately 20-fold larger than the bias under uniform recruitment across all periods for the EXCH model in Web Table 5. Importantly, this bias is independent of the intervention effect size. This can be seen in Web Table 5, where comparing the last three rows (δ=2\delta=2) with the preceding three rows (δ=0\delta=0) reveals identical bias estimates despite the different effect sizes. Moreover, regardless of the degree of imbalance in recruitment sizes, even the RVEs (both with and without the MD correction) fail to achieve nominal coverage. Of note, this under-coverage is mainly caused by the biased point estimate of the intervention effect instead of the underestimated variance. This is because 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) is close to sd(δ^)\mathrm{sd}(\hat{\delta}) across all scenarios.

Additionally, we explore the scenario where the recruitment size depends on exposure time in a standard SW-CRT with I{16,32,96}I\in\{16,32,96\} and fix J=5J=5, ρ0=0.01\rho_{0}=0.01, ρ1=0.01\rho_{1}=0.01, CAC =0.5=0.5. For recruitment sizes, we consider 1) Kij=25K_{ij}=25 under control and Kij=50,75,100,125K_{ij}=50,75,100,125 when exposure time is 1, 2, 3, 4, respectively (denoted as S1), and 2) Kij=10K_{ij}=10 under control and Kij=100,110,130,160K_{ij}=100,110,130,160 when exposure time is 1, 2, 3, 4, respectively (denoted as S2). For each recruitment size, we consider uniform patterns under both conditions, and uniform under control but cluster-period mixed under treatment. In Web Table 6. as expected, as long as the recruitment pattern is consistent before and after receiving the intervention, the intervention effect estimator is consistent, and RVEs achieve the nominal coverage. However, when the recruitment pattern is uniform in control periods and then becomes cluster-period mixed in treatment periods, the intervention effect estimator carries bias with δ^0.03\hat{\delta}\approx-0.03 regardless of the number of clusters or recruitment sizes. Similar findings hold when the intervention effect depends on the exposure time; more simulation results can be found in Web Tables 7-9 under the same setting.

(a) Trial 1: A Disinvestment-Specific SW-CRT
Refer to caption
(b) Trial 2: A Reinvestment-Specific SW-CRT
Refer to caption
Figure 3: Empirical recruitment patterns of the Australia (disinvestment) trial 1 and (reinvestment) trial 2 in (a) and (b), respectively. Gray lines represent control periods, and black lines represent intervention periods. Only six of the twelve clusters contain information on exact recruitment time. The x-axis shows tijkt_{ijk} as the fraction of time elapsed within each period.

4 An Illustrative Example: Reanalysis of the Australian Disinvestment Trial

We provide a secondary analysis of the Australian Disinvestment and Reinvestment SW-CRTs, which were conducted from February 2014 to April 2015 across 12 acute medical or surgical wards in two hospitals.Haines2017 This study aimed to evaluate whether weekend allied health services (including physical therapy, occupational therapy, and social work) deliver their intended benefits and whether resources allocated to these services could be better utilized elsewhere in the healthcare system. The primary outcome is length of stay (in days). This continuous outcome is crucial for evaluating patient flow that may be influenced by removing existing or introducing a newly developed weekend allied health services. Two SW-CRTs were conducted: Trial 1 (disinvestment) incrementally removed the existing weekend allied health service from participating wards, and Trial 2 (reinvestment) incrementally introduced a newly developed weekend allied health service to the same wards. This design enabled assessment of both the impact of removing the current service and the effectiveness of implementing the new service. The study included 14,834 patients in Trial 1 and 12,674 patients in Trial 2. Following the original trial’s decision, we excluded patients exposed to both the no-service condition and either service condition to prevent contamination.Haines2017 In both trials, the 12 wards were treated as clusters, each assigned to one of eight distinct sequences over nine monthly periods. Figure 3 provides the empirical recruitment patterns, which vary substantially across clusters and periods, exemplifying the cluster–period mixed pattern in Section 2.3.2. For simplicity, we only present analysis results for Trial 1.

To assess whether there exists an intervention-dependent recruitment, we conduct a preliminary test for associations between treatment status and cluster-period size. We first calculate the number of individuals enrolled in each cluster-period. Using linear regression, we fit two models: the first regress cluster-period size on period indicators (accounting for secular trends) and treatment status, while the second regress cluster-period size on period indicators and exposure time indicators. Neither model reveal significant associations with recruitment size. The coefficient for treatment status yield a p-value greater than 0.30, while the coefficients for exposure time indicators all produce p-values greater than 0.18. These findings suggest no strong evidence of intervention-dependent recruitment in Trial 1, indicating that the potential biases identified in Section 3.5.7 are unlikely to affect the validity of our discrete-time analyses.

To test for exposure-time-induced heterogeneity in intervention effect in Trial 1, we follow the likelihood ratio test described in Maleyeff et al.Maleyeff2022 Specifically, we test the null hypothesis H0:δ(1)=δ(2)==δ(J1)H_{0}:\delta(1)=\delta(2)=\cdots=\delta(J-1), which gives a p-value of 0.00120.0012, 0.07340.0734, 0.18470.1847 under the EXCH, NE, and DTD working correlation structure, respectively. The variation in p-values across correlation structures reflects their different assumptions about within-cluster dependencies; EXCH, being the most restrictive structure, provides strong evidence of exposure-time-heterogeneity (p-value =0.0012=0.0012), and the more flexible DTD structure yields weaker evidence (p-value =0.1847=0.1847). This finding aligns with the report by Haines et al.Haines2017 that staff likely required an accommodation period to adapt to service withdrawal. Therefore, we present results from both constant and exposure-time-dependent intervention effect models in our subsequent analysis.

Table 7: Estimated constant intervention effect, standard error, and 95% confidence interval using the linear mixed model under the simple exchangeable (EXCH), nest exchangeable (NE), and discrete-time exponential decay (DTD) working correlation structures for the Australian Disinvestment Trial.
Time Varying Working Model Estimates Variance Estimator Estimated Standard Error 95% CI
[Uncaptioned image] EXCH 1.561.56 Model-based 0.240.24 (1.02,2.10)(1.02,2.10)
RVE 0.390.39 (0.69,2.43)(0.69,2.43)
RVE + MD 0.510.51 (0.43,2.69)(0.43,2.69)
NE 1.741.74 Model-based 0.400.40 (0.84,2.64)(0.84,2.64)
RVE 0.390.39 (0.87,2.61)(0.87,2.61)
RVE + MD 0.480.48 (0.68,2.81)(0.68,2.81)
DTD 1.861.86 Model-based 0.410.41 (0.95,2.77)(0.95,2.77)
RVE 0.360.36 (1.05,2.67)(1.05,2.67)
RVE + MD 0.430.43 (0.89,2.83)(0.89,2.83)
[Uncaptioned image] EXCH 2.432.43 Model-based 0.370.37 (1.60,3.26)(1.60,3.26)
RVE 0.400.40 (1.53,3.33)(1.53,3.33)
RVE + MD 0.510.51 (1.30,3.56)(1.30,3.56)
NE 2.582.58 Model-based 0.580.58 (1.30,3.86)(1.30,3.86)
RVE 0.430.43 (1.61,3.55)(1.61,3.55)
RVE + MD 0.490.49 (1.48,3.68)(1.48,3.68)
DTD 2.792.79 Model-based 0.640.64 (1.37,4.20)(1.37,4.20)
RVE 0.570.57 (1.51,4.06)(1.51,4.06)
RVE + MD 0.710.71 (1.19,4.38)(1.19,4.38)

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; RVE: robust variance estimator; MD: Mancl and DeRouen; CI: confidence interval.

Estimation results are shown in Table 7. For the constant intervention effect model, the estimated intervention effects are 1.561.56, 1.741.74, and 1.861.86 days under the EXCH, NE, and DTD working correlation structure, respectively. All 95% CIs exclude zero and these positive estimates indicate that removing existing weekend allied health services increased length of stay, consistent with the main finding in Haines et al.Haines2017 The choice of variance estimator substantially affects the estimated standard errors. Model-based standard errors are consistently smaller than those from the RVE with the MD correction; the RVE with MD correction provides the most conservative estimates. Moreover, as the correlation structure becomes more general from EXCH to DTD, the model-based standard errors approach those from the RVE. This suggests some degree of random-effects structure misspecification, particularly indicating that model-based estimates may underestimate the standard errors, especially under EXCH. Interestingly, for the NE and DTD structures, the RVE without correction produces smaller standard errors than the model-based estimator. This can occur in the absence of random intervention effects, which aligns with findings from Simulation Study IV. Additionally, the relatively small number of clusters (I=12I=12) also requires the MD correction to properly account for finite-sample bias, consistent with our findings in Simulation Study V. These results collectively suggest that the RVE with MD correction can be a robust choice for maintaining valid inference in practice.

When accounting for exposure-time-dependent intervention effects, all estimated time-average intervention effects increase substantially, ranging from 2.43 (EXCH) to 2.79 (DTD) days, with all 95% CIs excluding zero. This finding aligns with insights in Kenny et al.,Kenny2022 which demonstrate that when the true model has an exposure-time-dependent intervention effect but the working model assumes a constant intervention effect, the magnitude of the intervention effect may be underestimated. Other findings regarding estimated standard errors using different variance estimators remain consistent with those obtained under the assumption of a constant intervention effect. The only exception is that under the NE working correlation structure, both RVEs with or without the MD correction lead to estimated standard errors that are smaller than the model-based standard error.

5 Discussion

Our current study has two sets of primary findings. First, our results provide empirical evidence addressing the methodological challenges identified by Hooper and Copas,Hooper2019 who noted that SW-CRTs with continuous-time recruitment requires special consideration at both design and analysis stages, but the stepped wedge literature had not differentiated closely between continuous recruitment and discrete sampling. Although the CONSORT 2010 extension for SW-CRTs required transparent reporting of recruitment mechanisms,Hemming2018a the analytical implications of continuous recruitment have remained unclear. Our simulations demonstrate that discrete-time linear mixed models yield consistent intervention effect estimates under continuous recruitment, given that the intervention effect structure is correctly specified. This robustness of discrete-time linear mixed models holds across all 24 scenarios in Table 2, including different true models that consider continuous or discrete period effects, fixed or random intervention effects, constant or exposure-time-dependent intervention effects, and the presence of a CTD correlation structure. Second, although the point estimate is consistent, achieving valid statistical inference depends critically on the choice of variance estimator. Under continuous recruitment, model-based variance estimators systematically underestimate empirical variances in most scenarios, with this underestimation being most severe under EXCH, which is the most restrictive of the three correlation structures considered in this article. One exception occurs in scenarios without an random intervention effect, where model-based variance estimators under NE and DTD structures may overestimate empirical variances. Importantly, only the RVE with the MD correction consistently achieves nominal coverage across all scenarios in Table 2. This finding is especially important given that 95.1% of cross-sectional SW-CRTs implement continuous recruitment, but common practice mainly relies on discrete-time linear mixed models.

In practice, researchers conducting SW-CRTs with continuous recruitment can use discrete-time linear mixed models without substantial concerns about bias, eliminating the need for complex continuous-time modeling approaches that may be computationally intensive or require specialized software. However, the choice of variance estimator requires careful consideration. Our results consistently demonstrate that the RVE with the MD correction can ensure the validity of inference. This recommendation is also supported by our reanalysis of the Australian Disinvestment trial, where model-based standard errors under EXCH were substantially smaller than those from the RVE with the MD correction, suggesting potential misspecification of the random effect model. Furthermore, the choice of working correlation structure affects the 95% CI coverage and type I error rate primarily for model-based inference but has minimal impact when using the RVE with or without the MD correction. Among the three discrete-time correlation structures commonly used in practice, DTD generally performs best, followed by NE, with EXCH being the most restrictive and yielding the lowest coverage for model-based estimators. Moreover, continuous period effects in the true model slightly increase the empirical standard deviation of intervention effect estimates compared to discrete period effects, but this increase is modest and does not compromise the validity of discrete-time analyses.

An important exception to these generally reassuring findings occurs when recruitment patterns change systematically between control and intervention periods. In Simulation Study VII, we demonstrated that when clusters recruit uniformly during control periods but switch to a cluster-period mixed pattern during intervention periods, the intervention effect estimator becomes biased. This bias exists even with large sample sizes and cannot be corrected through variance estimation alone. We recognize that certain interventions may inherently alter recruitment dynamics. For instance, in trials of social interventions such as cash transfers or housing assistance, participant enthusiasm and recruitment patterns may naturally evolve as the intervention becomes known in the community. Similarly, interventions perceived as highly beneficial may generate increased referrals or patient interest over time, leading to unavoidable recruitment pattern changes because of the intervention’s inherent characteristics.

Given these findings, we offer several considerations for both trial design and analysis. At the design stage, investigators may not need to impose rigid constraints on recruitment timing to ensure valid analysis, as our results demonstrate that any consistent recruitment pattern maintains inference validity. However, investigators should establish protocols for documenting recruitment patterns throughout the trial. When individual-level recruitment times can be recorded without compromising privacy, these are recommended to be collected systematically. In settings where exact enrollment dates may identify participants (particularly in smaller clusters or rare disease populations) cluster-period-specific enrollment densities provide an acceptable alternative that preserves analytical utility while protecting confidentiality. At the analysis stage, investigators can first examine whether recruitment patterns are different between control and intervention periods using the documented data; we provide a simple regression-based approach that tests for associations between treatment status and cluster-period enrollment sizes in Section 4. When patterns remain consistent across conditions, standard discrete-time analyses with model-robust variance estimation are recommended. However, when substantial changes are detected, the intervention effect estimates may be biased, requiring careful interpretation that acknowledges this limitation. In such cases, alternative analytical approaches that explicitly model time-varying recruitment may be required, though development of these methods remains an area for future research.

Although we attempt to investigate the impact of continuous recruitment on the performance of discrete-time linear mixed models under a wide array of settings in Table 2, the current study is not free of limitations. First, while the RVE with MD correction achieves nominal 95% CI coverage and type I error rate under continuous recruitment, the estimates of random effects variance components from misspecified models is not valid. When discrete-time models are applied to SW-CRTs with continuous recruitment, the variance component estimates under the EXCH, NE and DTD may not accurately reflect the true correlation structures (e.g., CTD). Although it is technically feasible to fit CTD models using Markov Chain Monte Carlo maximum likelihood approximation,Samuel2024 the high dimensionality of the CTD correlation structure raises substantial computational burden. We therefore focus on discrete-time working models in this article and encourage future software development to improve the accessibility of continuous-time methods. Because ICC estimates calculated from pilot studies or concluded trials are routinely used for sample size calculations in future SW-CRTs, the development of software implementations for variance component estimation under CTD structures, along with guidance for translating these continuous-time parameters into design specifications for discrete-time analyses, would be helpful in practice when recruitment-time information is available. This variance component misspecification issue represents one aspect of a broader challenge in modeling continuous recruitment data. Second, we focus exclusively on continuous outcomes due to fundamental challenges that arise with non-Gaussian outcomes under continuous recruitment. For binary outcomes, the non-collapsibility property of odds ratios means that marginal and conditional effect estimates differ systematically, even without confounding. This divergence becomes particularly problematic under continuous recruitment, where varying enrollment times within cluster-periods introduce additional layers of complexity that cannot be adequately addressed through standard adjustment methods. Moreover, the inherent mean-variance relationship in binary outcomes implies that the correlation structure also depends on outcome prevalence. When recruitment occurs continuously during each period, prevalence may vary continuously within cluster-periods, causing the correlation structure to change over time in ways that discrete-time models cannot accommodate. These complexities suggest that developing methods to address continuous recruitment for binary, count, and time-to-event outcomes represents an important area for future methodological advancement.

Appendix

Web Table 1: Results over 2,0002,000 simulated datasets generated for scenarios 10-12 under the setting in Simulation Study IV. We consider a standard SW-CRT with I=32I=32, J=5J=5, Kij=50K_{ij}=50, δ=0\delta=0, ρ0{0.01,0.05}\rho_{0}\in\{0.01,0.05\}, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC {0.5,0.8}\in\{0.5,0.8\}, a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J, a cluster-period mixed pattern, and in the absence of a random intervention (RI) effect. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
CAC ρ0\rho_{0} Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
0.5 0.01 EXCH 0.0002 0.0421 0.0362 91.70 0.0398 93.70 0.0426 95.10
NE 0.0003 0.0417 0.0409 95.55 0.0394 93.85 0.0422 94.70
DTD 0.0004 0.0416 0.0417 95.95 0.0393 93.85 0.0421 94.95
0.05 EXCH 0.0010 0.0611 0.0400 81.75 0.0578 93.10 0.0617 94.55
NE 0.0007 0.0601 0.0580 94.80 0.0568 93.20 0.0608 95.20
DTD 0.0006 0.0580 0.0580 95.80 0.0549 93.55 0.0586 95.35
0.8 0.01 EXCH 0.0002 0.0412 0.0373 93.55 0.0393 93.85 0.0420 95.45
NE 0.0002 0.0414 0.0411 95.95 0.0391 93.90 0.0419 95.45
DTD 0.0001 0.0414 0.0415 95.70 0.0391 93.85 0.0418 95.40
0.05 EXCH 0.0001 0.0535 0.0404 88.05 0.0506 94.15 0.0541 95.60
NE 0.0002 0.0537 0.0536 96.00 0.0507 93.30 0.0542 95.60
DTD 0.0000 0.0521 0.0519 96.30 0.0492 94.45 0.0525 96.30

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; ICC: intracluster correlation coefficients; CAC: cluster autocorrelation coefficient; ρ0\rho_{0}: within‑period ICC under control; ρ1\rho_{1}: within‑period ICC under treatment; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 2: Results over 2,0002,000 simulated datasets generated for scenarios 4-5 under the setting in Simulation Study V. We consider a standard SW-CRT with I=16I=16, J=5J=5, Kij=50K_{ij}=50, δ=0\delta=0, ρ0{0.01,0.05}\rho_{0}\in\{0.01,0.05\}, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC {0.5,0.8}\in\{0.5,0.8\}, a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J, and a cluster-period mixed pattern. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
CAC ρ0\rho_{0} Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
0.5 0.01 EXCH 0.0024 0.1028 0.0570 75.95 0.0935 93.40 0.1067 95.85
NE 0.0028 0.1038 0.0859 92.05 0.0926 93.05 0.1060 95.80
DTD 0.0021 0.1035 0.0833 90.90 0.0923 93.00 0.1056 95.90
0.05 EXCH -0.0006 0.1078 0.0577 75.00 0.0962 93.80 0.1101 96.45
NE -0.0006 0.1076 0.0940 93.95 0.0948 93.05 0.1088 96.35
DTD 0.0001 0.1047 0.0914 93.65 0.0923 93.00 0.1056 95.70
0.8 0.01 EXCH 0.0008 0.1038 0.0571 76.75 0.0916 93.50 0.1045 96.05
NE 0.0004 0.1041 0.0846 91.65 0.0910 92.85 0.1040 95.55
DTD 0.0001 0.1035 0.0815 90.55 0.0908 92.55 0.1039 95.40
0.05 EXCH 0.0008 0.0979 0.0579 80.00 0.0878 92.80 0.1003 95.65
NE 0.0003 0.0983 0.0878 94.55 0.0876 93.00 0.1003 95.65
DTD 0.0004 0.0958 0.0830 93.85 0.0856 92.60 0.0979 95.90

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; ICC: intracluster correlation coefficients; CAC: cluster autocorrelation coefficient; ρ0\rho_{0}: within‑period ICC under control; ρ1\rho_{1}: within‑period ICC under treatment; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 3: Results over 2,0002,000 simulated datasets generated for scenarios 4-5 under the setting in Simulation Study V. We consider a standard SW-CRT with I=32I=32, J=9J=9, Kij=50K_{ij}=50, δ=0\delta=0, ρ0{0.01,0.05}\rho_{0}\in\{0.01,0.05\}, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC {0.5,0.8}\in\{0.5,0.8\}, a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J, and a cluster-period mixed pattern. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
CAC ρ0\rho_{0} Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
0.5 0.01 EXCH -0.0034 0.0699 0.0292 60.80 0.0655 93.55 0.0697 95.05
NE -0.0032 0.0693 0.0443 79.95 0.0643 93.45 0.0685 95.30
DTD -0.0027 0.0692 0.0455 82.40 0.0642 93.50 0.0685 94.95
0.05 EXCH 0.0008 0.0727 0.0295 60.55 0.0674 93.20 0.0718 95.15
NE 0.0011 0.0706 0.0503 85.20 0.0650 92.50 0.0693 94.65
DTD 0.0014 0.0676 0.0529 89.00 0.0622 93.10 0.0664 94.65
0.8 0.01 EXCH 0.0007 0.0705 0.0292 59.85 0.0648 93.00 0.0690 95.05
NE 0.0006 0.0702 0.0436 79.55 0.0639 92.95 0.0681 94.70
DTD 0.0012 0.0696 0.0443 80.60 0.0636 92.85 0.0678 94.10
0.05 EXCH -0.0004 0.0668 0.0295 62.90 0.0632 93.85 0.0674 95.65
NE -0.0008 0.0660 0.0476 85.70 0.0623 94.10 0.0665 95.50
DTD -0.0011 0.0617 0.0480 88.75 0.0582 93.40 0.0621 94.80

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; ICC: intracluster correlation coefficients; CAC: cluster autocorrelation coefficient; ρ0\rho_{0}: within‑period ICC under control; ρ1\rho_{1}: within‑period ICC under treatment; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 4: Results over 2,0002,000 simulated datasets generated for scenarios 16-18 under the setting in Simulation Study VI. We consider a standard SW-CRT with I=32I=32, J=5J=5, Kij=50K_{ij}=50, δ(1)1.3768\delta(1)\approx-1.3768, δ(2)0.3831\delta(2)\approx 0.3831, δ(3)0.9785\delta(3)\approx 0.9785, δ(4)0.0152\delta(4)\approx 0.0152, Δ=0\Delta=0, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC =0.5=0.5, a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J, and a cluster-period mixed pattern. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
Estimand Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
δ(1)\delta(1) EXCH 0.0006 0.0758 0.0431 76.75 0.0729 93.75 0.0779 95.10
NE 0.0004 0.0747 0.0641 90.85 0.0720 93.75 0.0770 95.40
DTD 0.0003 0.0749 0.0601 89.40 0.0723 93.65 0.0774 95.15
δ(2)\delta(2) EXCH 0.0006 0.1017 0.0564 73.55 0.0994 94.90 0.1068 96.30
NE 0.0004 0.0998 0.0816 90.65 0.0973 94.90 0.1047 96.45
DTD -0.0002 0.0991 0.0862 92.40 0.0965 94.95 0.1040 96.30
δ(3)\delta(3) EXCH 0.0025 0.1453 0.0739 70.40 0.1377 93.55 0.1486 95.45
NE 0.0020 0.1442 0.1049 86.10 0.1364 93.50 0.1475 95.55
DTD 0.0021 0.1427 0.1143 89.80 0.1354 93.70 0.1466 95.80
δ(4)\delta(4) EXCH 0.0022 0.1965 0.1003 70.20 0.1842 93.15 0.2019 94.85
NE 0.0015 0.1987 0.1420 85.55 0.1850 93.40 0.2038 95.50
DTD 0.0015 0.2021 0.1529 87.00 0.1896 93.75 0.2085 95.45
Δ\Delta EXCH 0.0015 0.1187 0.0599 69.70 0.1133 93.55 0.1225 95.20
NE 0.0011 0.1178 0.0831 84.95 0.1119 94.15 0.1215 95.40
DTD 0.0009 0.1185 0.0906 87.85 0.1131 93.90 0.1228 95.90

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 5: Results over 2,0002,000 simulated datasets generated for scenarios 4-5 under the setting in Simulation Study VII. We consider a standard SW-CRT with I=96I=96, J=5J=5, δ=0\delta=0, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC =0.5=0.5, and a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J. For recruitment sizes, we consider 1) balanced recruitment with Kij=50K_{ij}=50 under both control and treatment, 2) moderately unbalanced recruitment with Kij=25K_{ij}=25 under control and Kij=75K_{ij}=75 under treatment, and 3) severely unbalanced recruitment with Kij=10K_{ij}=10 under control and Kij=90K_{ij}=90 under treatment. For each recruitment size, we consider uniform patterns under both conditions, uniform under control but cluster-period mixed under treatment, and cluster-period mixed under both conditions. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
KijK_{ij} Pattern Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
50 + 50 U + U EXCH 0.0015 0.0417 0.0234 74.05 0.0409 94.45 0.0418 94.80
NE 0.0015 0.0418 0.0346 90.10 0.0410 94.50 0.0419 95.00
DTD 0.0017 0.0415 0.0331 89.20 0.0407 94.55 0.0415 95.25
U + CP EXCH -0.0297 0.0426 0.0234 62.30 0.0412 88.55 0.0421 89.45
NE -0.0294 0.0429 0.0354 82.20 0.0412 88.55 0.0421 89.45
DTD -0.0293 0.0424 0.0339 80.05 0.0410 88.70 0.0419 89.70
CP + CP EXCH -0.0001 0.0432 0.0234 72.40 0.0417 94.00 0.0425 94.55
NE -0.0003 0.0432 0.0356 88.90 0.0416 93.95 0.0425 94.30
DTD -0.0003 0.0432 0.0342 88.20 0.0415 94.00 0.0424 94.65
25 + 75 U + U EXCH 0.0008 0.0437 0.0258 75.80 0.0429 94.55 0.0437 95.10
NE 0.0004 0.0421 0.0369 92.05 0.0414 94.65 0.0423 95.05
DTD 0.0007 0.0424 0.0354 90.50 0.0418 94.40 0.0427 94.95
U + CP EXCH -0.0307 0.0445 0.0258 64.10 0.0439 89.25 0.0448 90.15
NE -0.0322 0.0432 0.0382 84.00 0.0419 87.75 0.0428 88.30
DTD -0.0324 0.0437 0.0367 82.30 0.0423 87.85 0.0432 88.45
CP + CP EXCH -0.0059 0.0452 0.0258 74.45 0.0440 94.15 0.0450 94.65
NE -0.0057 0.0436 0.0383 91.20 0.0422 93.60 0.0431 94.15
DTD -0.0059 0.0437 0.0368 90.65 0.0426 94.10 0.0435 94.45
10 + 90 U + U EXCH 0.0014 0.0510 0.0332 80.20 0.0496 94.55 0.0506 94.95
NE 0.0014 0.0482 0.0404 90.65 0.0468 94.45 0.0477 94.95
DTD 0.0013 0.0482 0.0411 91.15 0.0469 94.75 0.0479 95.25
U + CP EXCH -0.0274 0.0520 0.0331 73.70 0.0511 91.80 0.0522 92.45
NE -0.0291 0.0482 0.0430 87.65 0.0473 90.35 0.0482 91.05
DTD -0.0297 0.0487 0.0432 87.45 0.0477 90.80 0.0487 91.30
CP + CP EXCH 0.0144 0.0515 0.0331 78.55 0.0513 94.05 0.0523 94.35
NE -0.0144 0.0481 0.0431 90.85 0.0473 93.40 0.0483 93.90
DTD -0.0145 0.0483 0.0432 91.50 0.0478 93.50 0.0488 94.25

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; U: uniform pattern; C: cluster mixed pattern; CP: cluster-period mixed pattern; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 6: Results over 2,0002,000 simulated datasets generated for scenarios 4-5 under the setting in Simulation Study VII. We consider a standard SW-CRT with I{16,32,96}I\in\{16,32,96\}, J=5J=5, δ{0,2}\delta\in\{0,2\}, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J, and CAC =0.5=0.5. For recruitment sizes, we consider 1) Kij=25K_{ij}=25 under control and Kij=50,75,100,125K_{ij}=50,75,100,125 when exposure time is 1, 2, 3, 4, respectively (denoted as S1), and 2) Kij=10K_{ij}=10 under control and Kij=100,110,130,160K_{ij}=100,110,130,160 when exposure time is 1, 2, 3, 4, respectively (denoted as S2). For each recruitment size, we consider uniform patterns under both conditions, and uniform under control but cluster-period mixed under treatment. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
δ\delta II Size Pattern Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
0 16 S1 U + U EXCH -0.0040 0.1098 0.0663 80.15 0.0984 92.90 0.1112 95.80
NE -0.0040 0.1081 0.0900 91.90 0.0932 92.05 0.1063 95.10
DTD -0.0036 0.1090 0.0880 90.85 0.0946 91.90 0.1078 95.20
U + CP EXCH -0.0254 0.1120 0.0662 78.05 0.1014 92.70 0.1149 95.30
NE -0.0281 0.1067 0.0934 92.80 0.0948 91.65 0.1083 94.75
DTD -0.0277 0.1074 0.0916 92.00 0.0961 92.15 0.1096 94.65
S2 U + U EXCH 0.0005 0.1229 0.0791 83.70 0.1130 92.75 0.1278 95.70
NE 0.0012 0.1164 0.0962 91.25 0.1044 92.35 0.1184 95.35
DTD 0.0003 0.1155 0.0990 92.25 0.1045 92.50 0.1187 95.35
U + CP EXCH -0.0229 0.1272 0.0788 80.85 0.1159 92.65 0.1313 95.00
NE -0.0264 0.1173 0.1032 92.85 0.1042 92.35 0.1185 94.95
DTD -0.0266 0.1176 0.1047 93.10 0.1054 92.75 0.1200 95.05
32 S1 U + U EXCH -0.0056 0.0767 0.0470 78.10 0.0737 94.85 0.0782 96.00
NE -0.0048 0.0750 0.0643 91.95 0.0708 94.45 0.0754 95.65
DTD -0.0049 0.0754 0.0624 91.25 0.0716 94.40 0.0762 95.45
U + CP EXCH -0.0265 0.0802 0.0469 73.65 0.0755 92.40 0.0802 93.80
NE -0.0280 0.0757 0.0668 90.85 0.0714 92.10 0.0762 93.75
DTD -0.0278 0.0757 0.0649 89.75 0.0723 92.55 0.0770 94.15
S2 U + U EXCH -0.0021 0.0880 0.0560 80.65 0.0839 94.35 0.0890 95.50
NE -0.0026 0.0831 0.0678 90.25 0.0785 94.35 0.0834 95.45
DTD -0.0028 0.0827 0.0698 91.35 0.0783 94.40 0.0833 95.90
U + CP EXCH -0.0264 0.0929 0.0558 76.25 0.0874 93.10 0.0928 94.55
NE -0.0279 0.0847 0.0735 90.05 0.0791 92.70 0.0841 93.90
DTD -0.0284 0.0850 0.0742 90.00 0.0798 92.80 0.0849 94.10
96 S1 U + U EXCH -0.0002 0.0440 0.0272 78.85 0.0439 94.80 0.0447 95.20
NE 0.0002 0.0425 0.0373 92.40 0.0423 95.35 0.0432 95.75
DTD 0.0001 0.0430 0.0361 90.45 0.0428 95.25 0.0437 95.45
U + CP EXCH -0.0254 0.0474 0.0271 67.90 0.0451 89.70 0.0460 90.65
NE 0.0275 0.0449 0.0388 85.05 0.0429 89.05 0.0438 90.20
DTD -0.0274 0.0451 0.0375 83.20 0.0434 89.60 0.0443 90.65
S2 U + U EXCH 0.0004 0.0492 0.0323 81.35 0.0499 95.45 0.0508 96.05
NE 0.0003 0.0465 0.0390 89.95 0.0469 95.45 0.0479 96.15
DTD 0.0002 0.0465 0.0402 91.05 0.0468 95.00 0.0478 95.35
U + CP EXCH -0.0258 0.0530 0.0322 71.95 0.0515 91.70 0.0526 92.15
NE -0.0283 0.0484 0.0423 85.90 0.0471 90.45 0.0480 90.95
DTD -0.0292 0.0488 0.0427 86.30 0.0475 89.95 0.0484 90.60
2 U + CP EXCH -0.0258 0.0530 0.0322 71.95 0.0515 91.70 0.0526 92.15
NE -0.0283 0.0484 0.0423 85.90 0.0471 90.45 0.0480 90.95
DTD -0.0292 0.0488 0.0427 86.30 0.0475 89.95 0.0484 90.60

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; U: uniform pattern; C: cluster mixed pattern; CP: cluster-period mixed pattern; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 7: Results over 2,0002,000 simulated datasets generated for scenarios 16-18 under the setting in Simulation Study VI. We consider a standard SW-CRT with I=16I=16, J=5J=5, δ(1)1.3768\delta(1)\approx-1.3768, δ(2)0.3831\delta(2)\approx 0.3831, δ(3)0.9785\delta(3)\approx 0.9785, δ(4)0.0152\delta(4)\approx 0.0152, Δ=0\Delta=0, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC =0.5=0.5, and a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J. For recruitment sizes, Kij=10K_{ij}=10 under control and Kij=100,110,130,160K_{ij}=100,110,130,160 when exposure time is 1, 2, 3, 4, respectively. For each recruitment size, we consider uniform patterns under both conditions, and uniform under control but cluster-period mixed under treatment. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
Pattern Estimand Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
δ(1)\delta(1) EXCH 0.0014 0.1296 0.1003 89.70 0.1184 92.80 0.1341 95.40
NE 0.0017 0.1258 0.1093 92.35 0.1129 92.70 0.1293 95.40
DTD 0.0015 0.1257 0.1084 92.20 0.1128 92.40 0.1294 95.65
δ(2)\delta(2) EXCH 0.0040 0.1799 0.1338 89.20 0.1610 92.60 0.1853 95.40
NE 0.0045 0.1712 0.1417 92.00 0.1515 92.30 0.1752 95.45
DTD 0.0044 0.1691 0.1484 93.85 0.1492 91.95 0.1728 95.35
δ(3)\delta(3) EXCH 0.0041 0.2382 0.1723 88.75 0.2131 92.40 0.2483 95.80
U + U NE 0.0047 0.2289 0.1804 90.55 0.2015 91.60 0.2353 95.10
DTD 0.0050 0.2248 0.1915 92.40 0.1983 91.80 0.2325 95.25
δ(4)\delta(4) EXCH 0.0042 0.3058 0.2147 86.50 0.2704 92.15 0.3192 95.45
NE 0.0044 0.2941 0.2283 90.00 0.2536 90.95 0.3018 94.45
DTD 0.0048 0.2932 0.2429 91.80 0.2537 90.40 0.3025 94.55
Δ\Delta EXCH 0.0034 0.2051 0.1506 88.70 0.1832 92.05 0.2125 95.45
NE 0.0038 0.1958 0.1543 90.80 0.1711 91.10 0.1999 95.00
DTD 0.0039 0.1940 0.1625 92.50 0.1698 91.45 0.1989 95.00
δ(1)\delta(1) EXCH -0.0289 0.1304 0.1005 89.40 0.1196 93.00 0.1355 95.85
NE -0.0293 0.1251 0.1136 94.25 0.1127 92.80 0.1294 95.55
DTD -0.0295 0.1256 0.1119 93.95 0.1131 92.80 0.1302 95.80
δ(2)\delta(2) EXCH -0.0352 0.1820 0.1344 87.60 0.1643 92.55 0.1892 95.20
NE -0.0362 0.1709 0.1462 92.40 0.1519 92.00 0.1761 94.95
DTD -0.0366 0.1688 0.1538 93.55 0.1504 91.60 0.1747 94.80
δ(3)\delta(3) EXCH -0.0384 0.2437 0.1734 86.65 0.2190 92.20 0.2556 95.50
U + CP NE -0.0390 0.2300 0.1858 91.30 0.2041 91.90 0.2390 95.40
DTD -0.0385 0.2275 0.1981 93.45 0.2019 92.10 0.2375 95.30
δ(4)\delta(4) EXCH -0.0433 0.3203 0.2162 84.10 0.2807 92.20 0.3327 95.55
NE -0.0443 0.3034 0.2365 89.75 0.2598 90.30 0.3116 95.20
DTD -0.0414 0.3051 0.2516 91.80 0.2627 90.60 0.3156 95.20
Δ\Delta EXCH -0.0365 0.2089 0.1516 87.00 0.1870 92.75 0.2172 95.45
NE -0.0372 0.1961 0.1575 90.45 0.1716 90.95 0.2013 95.40
DTD -0.0365 0.1953 0.1666 92.30 0.1716 91.30 0.2018 95.40

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; U: uniform pattern; CP: cluster-period mixed pattern; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 8: Results over 2,0002,000 simulated datasets generated for scenarios 16-18 under the setting in Simulation Study VI. We consider a standard SW-CRT with I=32I=32, J=5J=5, δ(1)1.3768\delta(1)\approx-1.3768, δ(2)0.3831\delta(2)\approx 0.3831, δ(3)0.9785\delta(3)\approx 0.9785, δ(4)0.0152\delta(4)\approx 0.0152, Δ=0\Delta=0, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC =0.5=0.5, and a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J. For recruitment sizes, Kij=10K_{ij}=10 under control and Kij=100,110,130,160K_{ij}=100,110,130,160 when exposure time is 1, 2, 3, 4, respectively. For each recruitment size, we consider uniform patterns under both conditions, and uniform under control but cluster-period mixed under treatment. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
Pattern Estimand Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
δ(1)\delta(1) EXCH -0.0029 0.0916 0.0715 88.85 0.0880 93.45 0.0934 95.2
NE -0.0028 0.0885 0.0773 92.10 0.0849 93.70 0.0906 95.30
DTD -0.0027 0.0885 0.0767 91.85 0.0848 93.85 0.0906 95.30
δ(2)\delta(2) EXCH -0.0029 0.1265 0.0958 87.60 0.1208 93.90 0.1292 95.05
NE -0.0024 0.1204 0.1009 91.10 0.1149 93.90 0.1232 95.10
DTD -0.0025 0.1186 0.1055 92.85 0.1131 93.60 0.1214 94.85
δ(3)\delta(3) EXCH -0.0050 0.1690 0.1236 86.70 0.1607 93.80 0.1729 95.50
U + U NE -0.0050 0.1621 0.1292 89.30 0.1535 94.10 0.1653 95.25
DTD -0.0041 0.1596 0.1366 91.55 0.1509 93.85 0.1629 95.45
δ(4)\delta(4) EXCH -0.0039 0.2155 0.1541 85.10 0.2051 93.75 0.2220 95.75
NE -0.0033 0.2065 0.1635 88.60 0.1952 93.75 0.2120 95.55
DTD -0.0023 0.2068 0.1735 91.10 0.1949 93.65 0.2120 95.15
Δ\Delta EXCH -0.0037 0.1447 0.1080 86.75 0.1380 93.95 0.1482 95.30
NE -0.0034 0.1379 0.1108 89.70 0.1309 93.60 0.1410 95.25
DTD -0.0029 0.1367 0.1160 91.70 0.1296 93.90 0.1398 95.35
δ(1)\delta(1) EXCH -0.0290 0.0924 0.0716 86.45 0.0889 92.75 0.0944 94.20
NE -0.0289 0.0884 0.0808 92.55 0.0847 92.60 0.0905 94.75
DTD -0.0290 0.0889 0.0794 92.15 0.0850 92.60 0.0910 94.65
δ(2)\delta(2) EXCH -0.0310 0.1258 0.0961 86.45 0.1227 93.85 0.1313 95.55
NE -0.0315 0.1176 0.1044 91.75 0.1144 93.30 0.1229 95.55
DTD -0.0316 0.1161 0.1097 93.65 0.1132 92.90 0.1216 95.45
δ(3)\delta(3) EXCH -0.0351 0.1692 0.1241 85.45 0.1642 93.85 0.1769 95.45
U + CP NE -0.0349 0.1592 0.1331 90.00 0.1542 93.85 0.1663 95.60
DTD -0.0338 0.1573 0.1416 92.55 0.1522 93.95 0.1645 95.30
δ(4)\delta(4) EXCH -0.0336 0.2186 0.1549 84.95 0.2126 94.30 0.2307 96.40
NE -0.0321 0.2059 0.1697 90.35 0.1993 94.25 0.2174 95.85
DTD -0.0282 0.2069 0.1799 92.00 0.2007 94.20 0.2191 95.65
Δ\Delta EXCH -0.0322 0.1439 0.1085 86.50 0.1402 93.95 0.1507 95.35
NE -0.0318 0.1343 0.1130 90.50 0.1303 93.95 0.1407 95.55
DTD -0.0307 0.1337 0.1191 92.35 0.1299 93.75 0.1404 95.35

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; U: uniform pattern; CP: cluster-period mixed pattern; RVE: robust variance estimator; MD: Mancl and DeRouen.

Web Table 9: Results over 2,0002,000 simulated datasets generated for scenarios 16-18 under the setting in Simulation Study VI. We consider a standard SW-CRT with I=96I=96, J=5J=5, δ(1)1.3768\delta(1)\approx-1.3768, δ(2)0.3831\delta(2)\approx 0.3831, δ(3)0.9785\delta(3)\approx 0.9785, δ(4)0.0152\delta(4)\approx 0.0152, Δ=0\Delta=0, ρ0=0.01\rho_{0}=0.01, ρ1=0.1\rho_{1}=0.1, σϵ2=1\sigma_{\epsilon}^{2}=1, CAC =0.5=0.5, and a continuous period effect T(tijk)=0.5tijk2+sin(6πtijk)/JT(t_{ijk})={0.5t_{ijk}^{2}+\sin(6\pi t_{ijk})}/J. For recruitment sizes, Kij=10K_{ij}=10 under control and Kij=100,110,130,160K_{ij}=100,110,130,160 when exposure time is 1, 2, 3, 4, respectively. For each recruitment size, we consider uniform patterns under both conditions, and uniform under control but cluster-period mixed under treatment. Here, sd()\mathrm{sd}(\cdot) is the empirical (Monte Carlo) standard deviation. 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot) are the average model-based standard error, average model-robust standard error, and average model-robust standard error with the MD correction, respectively. Naive()\mathbb{C}_{\text{Naive}}(\cdot), RVE()\mathbb{C}_{\text{RVE}}(\cdot), RVEMD()\mathbb{C}_{\text{RVE}}^{\text{MD}}(\cdot) are the empirical coverage percentage of 95% confidence intervals based on 𝕍Naive()\mathbb{V}_{\text{Naive}}(\cdot), 𝕍RVE()\mathbb{V}_{\text{RVE}}(\cdot), and 𝕍RVEMD()\mathbb{V}_{\text{RVE}}^{\text{MD}}(\cdot), respectively.
Pattern Estimand Model Bias sd(δ^)(\hat{\delta}) 𝕍Naive(δ^)\mathbb{V}_{\text{Naive}}(\hat{\delta}) Naive(δ^)\mathbb{C}_{\text{Naive}}(\hat{\delta}) 𝕍RVE(δ^)\mathbb{V}_{\text{RVE}}(\hat{\delta}) RVE(δ^)\mathbb{C}_{\text{RVE}}(\hat{\delta}) 𝕍RVEMD(δ^)\mathbb{V}_{\text{RVE}}^{\text{MD}}(\hat{\delta}) RVEMD(δ^)\mathbb{C}_{\text{RVE}}^{\text{MD}}(\hat{\delta})
δ(1)\delta(1) EXCH -0.0003 0.0530 0.0414 87.55 0.0523 94.95 0.0534 95.25
NE -0.0003 0.0514 0.0445 91.60 0.0508 94.65 0.0519 95.25
DTD -0.0002 0.0513 0.0442 91.40 0.0507 94.65 0.0518 94.95
δ(2)\delta(2) EXCH -0.0008 0.0739 0.0555 87.00 0.0719 94.20 0.0735 94.85
NE -0.0007 0.0708 0.0583 90.20 0.0688 94.65 0.0704 95.00
DTD -0.0007 0.0696 0.0609 92.30 0.0676 94.60 0.0692 95.15
δ(3)\delta(3) EXCH -0.0014 0.0986 0.0717 85.45 0.0961 93.60 0.0985 94.20
U + U NE -0.0015 0.0951 0.0748 88.15 0.0923 94.05 0.0945 94.75
DTD -0.0012 0.0936 0.0790 90.70 0.0906 94.75 0.0929 95.10
δ(4)\delta(4) EXCH -0.0017 0.1271 0.0895 83.95 0.1234 94.00 0.1267 94.50
NE -0.0018 0.1229 0.0946 87.10 0.1184 94.25 0.1216 94.60
DTD -0.0018 0.1233 0.1003 89.40 0.1179 93.70 0.1212 94.70
Δ\Delta EXCH -0.0011 0.0849 0.0627 86.40 0.0826 94.25 0.0845 94.80
NE -0.0011 0.0815 0.0642 88.60 0.0789 94.35 0.0808 95.00
DTD -0.0010 0.0808 0.0671 90.20 0.0780 94.60 0.0799 95.10
δ(1)\delta(1) EXCH -0.0308 0.0534 0.0414 81.65 0.0527 91.15 0.0538 91.50
NE -0.0307 0.0517 0.0466 87.70 0.0505 90.65 0.0516 91.20
DTD -0.0308 0.0520 0.0458 86.95 0.0507 90.65 0.0519 91.25
δ(2)\delta(2) EXCH -0.0344 0.0735 0.0557 82.60 0.0730 91.75 0.0746 92.65
NE -0.0346 0.0695 0.0604 87.30 0.0684 91.50 0.0700 92.00
DTD -0.0348 0.0691 0.0634 89.85 0.0676 91.75 0.0692 92.40
δ(3)\delta(3) EXCH -0.0386 0.1002 0.0720 81.65 0.0981 92.55 0.1005 93.20
U + CP NE -0.0384 0.0954 0.0771 85.75 0.0925 92.40 0.0948 93.30
DTD -0.0372 0.0941 0.0819 89.35 0.0912 92.40 0.0935 92.90
δ(4)\delta(4) EXCH -0.0418 0.1289 0.0899 80.90 0.1272 93.50 0.1306 94.00
NE -0.0405 0.1232 0.0983 86.25 0.1202 92.80 0.1236 93.30
DTD -0.0366 0.1238 0.1041 88.50 0.1207 92.50 0.1242 93.15
Δ\Delta EXCH -0.0364 0.0847 0.0629 82.15 0.0836 92.15 0.0856 92.65
NE -0.0361 0.0803 0.0655 85.60 0.0783 91.95 0.0803 92.70
DTD -0.0349 0.0800 0.0689 88.40 0.0779 92.05 0.0799 92.85

EXCH: simple exchangeable; NE: nested exchangeable; DTD: discrete-time exponential decay; U: uniform pattern; CP: cluster-period mixed pattern; RVE: robust variance estimator; MD: Mancl and DeRouen.