Sample size calculator for cluster randomized trials
References

Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research.
Murray D. Design and Analysis of Group-Randomized Trials.
Hayes R, Moulton L. Cluster Randomised Trials.
Methods for evaluating area-wide and organisation-based interventions in health and health care: a systematic review. Health Technol Assess; 3: iii–.
On design considerations and randomization-based inference for community intervention trials. Stat Med; 15.
Issues in the design and interpretation of studies to evaluate the impact of community-based interventions. Trop Med Int Health; 2.
Campbell MJ. Cluster randomized trials in general family practice research. Stat Methods Med Res; 9: 81–.
Selected methodological issues in evaluating community-based health promotion and disease prevention programs. Annu Rev Public Health; 13.
Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health; 94.
Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol.
Randomization by cluster: sample size requirements and analysis.
Statistical considerations in the design and analysis of community intervention trials. J Clin Epidemiol; 49.
Incorporation of clustering effects for the Wilcoxon rank sum test: a large-sample approach. Biometrics; 59.
Austin PC. A comparison of the statistical power of different methods for the analysis of cluster randomization trials with binary outcomes. Stat Med; 26.
Donner A. A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. Int Stat Rev; 54: 67–.
Estimating intraclass correlation for binary data. Biometrics; 55.
Patterns of intra-cluster correlation from primary care research to inform study design and analysis. J Clin Epidemiol; 57.
Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clin Trials; 2.
Components of variance and intraclass correlations for the design of community-based surveys and intervention studies: data from the Health Survey for England.
Intracluster correlation coefficients and coefficients of variation for perinatal outcomes from five cluster-randomised controlled trials in low and middle-income countries: results and methodological implications. Trials; 12.
Paediatr Perinat Epidemiol; 22.
Parameters to aid in the design and analysis of community trials: intraclass correlations from the Minnesota Heart Health Program. Epidemiology; 5: 88–.
The worksite component of variance: design effects and the Healthy Worker Project. Health Educ Res; 8.
School-level intraclass correlation for physical activity in adolescent girls. Med Sci Sports Exerc; 36.
Murray DM, Short B. Intraclass correlation among measures related to alcohol use by young adults: estimates, correlates and applications in intervention studies. J Stud Alcohol; 56.
Hayes R, Bennett S. Simple sample size calculation for cluster-randomized trials. Int J Epidemiol; 28.
Developments in cluster randomized trials and Statistics in Medicine. Stat Med; 26: 2–.
Shih W. Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations. Biometr J; 39.
Kerry S, Bland J. Trials which randomize practices II: sample size. Fam Pract; 15: 84–.
Connelly LB. Balancing the number and size of sites: an economic approach to the optimal design of cluster samples. Control Clin Trials; 24.
Hsieh F. Sample-size formulas for intervention studies with the cluster as unit of randomisation. Stat Med; 7.
Rosner B, Glynn R. Power and sample size estimation for the clustered Wilcoxon test. Biometrics; 67.
Sample size determination for clustered count data. Stat Med; 32.
Sample-size calculations for studies with correlated ordinal outcomes. Stat Med; 24.
Campbell M, Walters S. How to Design, Analyse and Report Cluster Randomised Trials in Medicine and Health Related Research. Chichester: Wiley.
Whitehead J. Sample size calculations for ordered categorical data. Stat Med; 12.
Schoenfeld D. Sample-size formula for the proportional-hazards regression model. Biometrics; 39.
Gangnon R, Kosorok M. Sample-size formula for clustered survival data using weighted log-rank statistics. Biometrika; 91.
Sample size in cluster-randomized trials with time to event as the primary endpoint.
Byar DP. The design of cancer prevention trials. Recent Results Cancer Res: 34–.
Xie T, Waksman J. Design and sample size estimation in clinical trials with clustered survival times as the primary endpoint. Stat Med; 22.
Manatunga A, Chen S. Sample size estimation for survival outcomes in cluster-randomized studies with small cluster sizes. Biometrics; 56.
Spiegelhalter D. Bayesian methods for cluster randomized trials with continuous responses. Stat Med; 20.
Prior distributions for the intracluster correlation coefficient, based on multiple previous estimates, and their application in cluster randomized trials. Clin Trials; 2.
Allowing for imprecision of the intracluster correlation coefficient in the design of cluster randomized trials. Stat Med; 23.
Feng Z, Grizzle JE. Correlated binomial variates: properties of estimator of intraclass correlation and its effect on sample size calculation. Stat Med; 11.
Exploratory cluster randomised controlled trial of shared care development for long-term mental illness. Br J Gen Pract; 54.
Mukhopadhyay S, Looney S. Quantile dispersion graphs to compare the efficiencies of cluster randomized designs. J Appl Stat; 36.
Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol; 35.
Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations.
Pan W. Sample size and power calculations with correlated binary data. Control Clin Trials; 22.
Liu G, Liang K. Sample size calculations for studies with correlated observations. Biometrics; 53.
Sample size calculation for dichotomous outcomes in cluster randomization trials with varying cluster size. Drug Inform J; 37.
Sample size estimation in cluster randomized studies with varying cluster size. Biometr J; 43: 75–.
Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials.
Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression. Stat Med; 29.
Sample size re-estimation in cluster randomization trials. Stat Med; 21.
Yin G, Shen Y. Adaptive design and estimation in randomized clinical trials with correlated observations. Biometrics; 61.
Liu X. Statistical power and optimum sample allocation ratio for treatment and control having unequal costs per unit of randomization. J Educ Behav Stat; 28.
Hoover D. Power for t-test comparisons of unbalanced cluster exposure studies. J Urban Health; 79.
Statistical Methods.
Some aspects of the design and analysis of cluster randomization trials.
Lui K, Chang K. Test non-inferiority and sample size determination based on the odds ratio under a cluster randomized trial with noncompliance. J Biopharm Stat; 21.
Accounting for expected attrition in the planning of community intervention trials.
Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics; 63.
Sample size determination for testing equality in a cluster randomized trial with noncompliance. J Biopharm Stat; 21: 1–.
J Clin Epidemiol; 62.
Design effects for binary regression models fitted to dependent data.
A simple sample size formula for analysis of covariance in cluster randomized trials. Stat Med; 31.
Planning for the appropriate analysis in school-based drug-use prevention studies. J Consult Clin Psychol; 58.
The importance and role of intracluster correlations in planning cluster trials. Epidemiology; 18.
An integrated population-averaged approach to the design, analysis and sample size determination of cluster-unit trials.
Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med; 13: 61–.
McKinlay S. Cost-efficient designs of cluster unit trials. Prev Med; 23.
Raudenbush S. Statistical analysis and optimal design for cluster randomized trials. Psychol Methods; 2.
Design issues for experiments in multilevel populations. J Educ Behav Stat; 25.
Optimal experimental designs for multilevel logistic models.
Optimal experimental designs for multilevel models with covariates. Commun Stat Theor Stat; 30.
Moerbeek M, Maas C. Optimal experimental designs for multilevel logistic models with two binary predictors. Commun Stat Theor Stat; 34.
Moerbeek M. Power and money in cluster randomized trials: When is it worth measuring a covariate? Stat Med; 25.
Data-analysis and sample size issues in evaluations of community-based health promotion and disease prevention programs: a mixed-model analysis of variance approach. J Clin Epidemiol; 44.
Heo M, Leon A. Sample size requirements to detect an intervention by time interaction in longitudinal cluster randomized clinical trials. Stat Med; 28.
Sizing a trial to alter the trajectory of health behaviours: methods, parameter estimates, and their application.
Sample size and power determination for clustered repeated measurements.

Parallel group-randomized trials (GRTs) are common in animal research, where the units of assignment may be litters of mice or rats, or other collections of animals. The design and analytic issues are the same whether the study involves humans or animals, and whether the research is applied or basic.
Parallel GRTs have a nested or hierarchical design: the groups randomized to each study condition are nested within those study conditions so that each group appears in only one study condition.
The members are nested within those groups so that each member appears in only one group. In cohort GRTs, members are observed repeatedly so that measurements are nested within members; in cross-sectional GRTs, different members are observed in each group at each measurement occasion. In each case, the units of observation are nested within the units of assignment, which are nested within the study conditions.
Parallel GRTs can be employed in a wide variety of settings and populations to address a wide variety of research questions. They are the best comparative design available when the investigator wants to evaluate an intervention that operates at a group level, manipulates the social or physical environment, or cannot be delivered to individuals without serious risk of contamination. Parallel GRTs often involve a limited number of groups randomized to each study condition.
A recent review found that the median number of groups randomized to each study condition in GRTs related to cancer was 25, though many trials were much smaller (Murray et al.). When the number of groups available for randomization is limited, there is a greater risk that potentially confounding variables will be unevenly distributed among the study conditions, and this can threaten the internal validity of the trial.
As a result, when the number of groups to be randomized to each study condition is limited, a priori matching and a priori stratification are widely recommended to help ensure balance across the study conditions on potential confounders (Campbell and Walters; Donner and Klar; Hayes and Moulton; Murray). The more challenging feature of parallel GRTs is that members of the same group usually share some physical, geographic, social, or other connection.
Those connections create the expectation for a positive intraclass correlation (ICC) among observations taken on members of the same group, as members of the same group tend to be more like one another than like members of other groups.
The ICC is simply the average bivariate correlation on the outcome among members of the same group or cluster. A positive ICC reduces the variation among the members of the same group but increases the variation among the groups. As such, the variance of any group-level statistic will be larger in a parallel GRT than in a randomized clinical trial (RCT). Complicating matters further, the degrees of freedom (df) available to estimate the ICC or the group-level component of variance will be based on the number of groups, and so are often limited.
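The practical consequence of a positive ICC is usually summarized with the design effect. The sketch below is a minimal illustration of that standard calculation (the function names and example numbers are mine, not from the source): with equal group sizes, the variance of a condition mean is inflated by 1 + (m − 1) × ICC, so even a small ICC can cost a large share of the nominal sample size when groups are large.

```python
# Minimal sketch of the standard design-effect calculation for a GRT with
# equal group sizes; function names and example values are illustrative.

def design_effect(m, icc):
    """Variance inflation factor with m members per group and a given ICC."""
    return 1.0 + (m - 1) * icc

def effective_sample_size(n_groups, m, icc):
    """Number of independent observations the clustered design is 'worth'."""
    return (n_groups * m) / design_effect(m, icc)

# Example: 20 groups of 100 members with ICC = 0.01.
print(design_effect(100, 0.01))              # 1.99 -> variances nearly double
print(effective_sample_size(20, 100, 0.01))  # ~1005 of the 2000 nominal observations
```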
Any analysis that ignores the extra variation or positive ICC, or the limited df, will have a type I error rate that is inflated, often badly (Campbell and Walters; Donner and Klar; Eldridge and Kerry; Hayes and Moulton; Murray). The recommended solution to these challenges is to employ a priori matching, a priori stratification, or constrained randomization to balance potential confounders; to reflect the hierarchical structure of the design in the analytic plan; and to estimate the sample size for the GRT based on realistic and data-based estimates of the ICC and the other parameters indicated by the analytic plan.
Extra variation and limited df always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the estimation of sample size.
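As a concrete planning sketch, the function below implements the familiar design-effect-based formula for the number of groups to randomize to each of two conditions when comparing means. It is an approximation under assumptions supplied here (z-based formula, equal group sizes, common standard deviation), with one extra group per condition added as a rough allowance for the limited df, in the spirit of Hayes and Bennett's simple formula.

```python
from math import ceil
from scipy.stats import norm

def groups_per_condition(delta, sigma, m, icc, alpha=0.05, power=0.80):
    """
    Approximate groups per condition for a two-condition comparison of means,
    using the design effect 1 + (m - 1) * icc.  delta is the detectable mean
    difference, sigma the total standard deviation, m the members per group.
    """
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    deff = 1 + (m - 1) * icc
    g = (2 * z**2 * sigma**2 * deff) / (m * delta**2)
    return ceil(g) + 1  # +1 group per condition as a small-sample allowance

# Example: detect a 0.25 SD difference with 50 members per group and ICC = 0.02.
print(groups_per_condition(delta=0.25, sigma=1.0, m=50, icc=0.02))  # 11 groups per condition
```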
The sections below provide additional resources for investigators considering a parallel group- or cluster-randomized trial. Use a parallel GRT if you have an intervention that operates at a group level, manipulates the social or physical environment, or simply cannot be delivered to individuals without serious risk of contamination. If you can deliver your intervention to individuals without risk of contamination and can avoid interaction among participants post-randomization, it is more efficient and easier to use a traditional RCT.
A pragmatic trial is one that helps users choose between options for care. These trials are usually done in the real world, under less well-controlled conditions than more traditional clinical trials. There are five published textbooks on the design and analysis of group- or cluster-randomized trials (Campbell and Walters; Donner and Klar; Eldridge and Kerry; Hayes and Moulton; Murray). A recent textbook is devoted to power and sample size calculation for multilevel designs, including parallel GRTs, individually randomized group-treatment trials (IRGTs), and stepped wedge group-randomized trials (Moerbeek and Teerenstra). The most accurate results for power and sample size calculations will be obtained with t-scores.
For studies in which the number of units randomized to the study conditions is 50 or more, z-scores will work well. As the number of randomization units decreases, the df available for the test of the intervention effect also decrease, and the difference between z-scores and t-scores increases.
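The short sketch below makes that point numerically; it assumes a simple group-level analysis in which the df for a two-condition comparison are 2 × (g − 1), where g is the number of groups randomized to each condition. That df formula is an assumption of the sketch rather than a statement from the source.

```python
from scipy.stats import norm, t

# Critical values for a two-sided 5% test: z versus t with df = 2 * (g - 1),
# assuming a simple group-level comparison with g groups per condition.
alpha = 0.05
print(f"z critical value: {norm.ppf(1 - alpha / 2):.3f}")
for g in (5, 10, 25, 50):
    df = 2 * (g - 1)
    print(f"g = {g:3d} per condition, df = {df:3d}, "
          f"t critical value = {t.ppf(1 - alpha / 2, df):.3f}")
```

With 50 or more groups the t and z critical values are nearly identical; with only a handful of groups, the t critical value is noticeably larger, and a z-based calculation will overstate power.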
Sometimes investigators randomize months or weeks within clinics to study conditions. As an example, consider a study in which, over the course of a year, six months are spent delivering the intervention condition and six months are spent delivering the control condition, with the order randomized within each clinic. The unit of assignment in this case is the time block within the clinic, rather than the clinic itself. Patients receive the intervention or control condition appropriate to the time block when they come to the clinic.
While these groups are not structural groups like whole clinics, they are still groups, and this is still a parallel group- or cluster-randomized trial with the time block as the group. The key number in this case for power or sample size calculations is the number of time blocks, not the number of clinics.
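A toy calculation, with a hypothetical number of clinics (the source does not give one), shows which count drives the sample size in this design:

```python
# Hypothetical example: 8 clinics, each contributing 12 randomized month-long
# time blocks (6 intervention, 6 control), as in the example above.
clinics = 8
blocks_per_clinic = 12
total_blocks = clinics * blocks_per_clinic    # 96 units of assignment
blocks_per_condition = total_blocks // 2      # 48 time blocks per condition
# Power and sample size calculations should be keyed to the 96 time blocks
# (and the ICC among patients within a time block), not to the 8 clinics.
print(total_blocks, blocks_per_condition)
```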
In this example, the clinic is crossed with study conditions, as there are both intervention and control participants in each clinic; the clinic can be included in the analysis as a fixed-effect stratification factor, and that may improve power. It is important to distinguish between changing study conditions or study arms and changing groups or clusters. In a parallel GRT or IRGT, it is important to ensure that each participant remains in the study condition to which they were randomized.
Those assigned to the intervention condition should not move to the control condition, and vice versa. Sometimes that is unavoidable, but it should be uncommon. If it does happen, standard practice is to analyze as randomized, under the intention-to-treat principle.
The other possibility is that a participant in a GRT or IRGT would change groups or clusters even as they stay in the same study condition or study arm. In a school-based trial, a participant from one intervention school might move to another intervention school. Or in an IRGT, a participant who usually went to the Tuesday night class might sometimes go to the Saturday morning class. Recent studies have shown that failure to account for changing group membership can result in an inflated type I error rate (Andridge et al.).
Several authors provide methods for analyzing data to account for such changes (Candlish et al.). Standard sources assume that each group or cluster has the same number of observations, but that is almost never true in practice. So long as the ratio of the largest to the smallest group size is modest, such variation can often be ignored. But as the variation grows more marked, analysts risk an inflated type I error rate if they ignore it (Johnson et al.).
In addition, power falls as the variation in group or cluster size increases, so it needs to be addressed in the sample size calculations.
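One common way to reflect unequal cluster sizes at the planning stage is to inflate the design effect using the coefficient of variation of cluster size, along the lines described by Eldridge and colleagues; the sketch below illustrates that adjustment with made-up cluster sizes.

```python
import numpy as np

def design_effect_unequal(cluster_sizes, icc):
    """
    Design effect allowing for variation in cluster size:
    DEFF = 1 + ((cv**2 + 1) * m_bar - 1) * icc,
    where m_bar is the mean cluster size and cv its coefficient of variation.
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    m_bar = sizes.mean()
    cv = sizes.std(ddof=0) / m_bar
    return 1 + ((cv**2 + 1) * m_bar - 1) * icc

# Equal clusters of 40 versus the same average size with marked imbalance.
print(design_effect_unequal([40] * 10, icc=0.02))                                 # 1.78
print(design_effect_unequal([10, 10, 20, 30, 40, 50, 60, 70, 50, 60], icc=0.02))  # ~1.99
```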
We have known for some time that the magnitude of the ICC is inversely related to the level of aggregation (Donner): the smaller the level of aggregation, the larger the ICC. Spouse pairs and family units are small clusters, so their ICCs are often large. Moving to larger aggregates, like worksites or schools, the ICCs are usually smaller. Moving to even larger aggregates, like communities, the ICCs are usually even smaller.
In a school study, for example, the ICC may be much smaller than in a study of spouse pairs or family units. It is important to account not only for the ICC but also for the average number of observations expected in each group randomized to the study conditions, and for the number of groups randomized, as that dictates the df available for the test of the intervention effect. Delivering an intervention to participants in groups, or through shared facilitators or interventionists, can improve the fidelity of implementation, but it also creates groups whose effects must be reflected in the analysis.
In a parallel GRT, the groups are the units of assignment and are nested within study conditions, with different groups in each condition. In an IRGT, the groups are created in the intervention condition to facilitate delivery of the intervention; those groups may be defined by their instructor or facilitator, surgeon, therapist, or other interventionist, or they may be virtual groups.
So long as the groups are nested within study conditions, they must be included in the analysis as levels of a random effect; ignoring them, or including them as levels of a fixed effect, will result in an inflated type I error rate.
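As an illustration of the recommended nested analysis, the sketch below simulates a small parallel GRT and fits a mixed model with a random intercept for group using statsmodels; the data, effect sizes, and variable names are all invented for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a small parallel GRT: 10 groups per condition, 20 members per group,
# with a group-level random intercept inducing the ICC (all values illustrative).
rng = np.random.default_rng(0)
rows = []
for condition in (0, 1):
    for g in range(10):
        group_effect = rng.normal(0, 0.3)  # between-group variation
        for _ in range(20):
            y = 0.25 * condition + group_effect + rng.normal(0, 1.0)
            rows.append({"y": y, "condition": condition, "group": f"c{condition}_g{g}"})
data = pd.DataFrame(rows)

# Nested case: group (cluster) modeled as a random intercept, as recommended above.
fit = smf.mixedlm("y ~ condition", data=data, groups=data["group"]).fit()
print(fit.summary())
```

Note that statsmodels reports large-sample (z-based) tests for the fixed effects; with few groups, software that provides df-corrected t-tests is preferable, in line with the earlier point about t-scores versus z-scores.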
The requirement arises because nested factors must be modeled as random effects. That also points to a potential solution: if the investigator can avoid nesting groups within study conditions, the need to model those groups as levels of a random effect disappears. The alternative to nesting is crossing, so if it is possible to cross the levels of the grouping factor with the study conditions, the grouping factor becomes a stratification factor and the investigator is free to model it as a random effect, as a fixed effect, or to ignore it in the analysis.
For example, if schools are randomized to study conditions, the study is a GRT. But if students within schools are randomized to study conditions, the schools will be crossed with study conditions and we have a stratified RCT; the investigator can model the schools as a random effect, as a fixed effect, or ignore them in the analysis. As another example, if the therapists used to deliver the intervention in an IRGT also deliver an alternative intervention in the control condition, the therapists will be crossed with study condition, and the investigator can model therapist as a random effect, as a fixed effect, or ignore therapist in the analysis.
In either example, the choice between modeling the grouping factor as random, as fixed, or ignoring it will depend on factors like power and generalizability.

The best estimate for the ICC will reflect the circumstances of the trial being planned. That estimate will be from the same target population, so that it reflects the appropriate groups or clusters (e.g., schools, worksites, or communities).
That estimate will derive from data collected for the same outcome using the same measurement methods to be used for the primary outcome in the trial being planned. For example, if planning a trial to improve servings of fruits and vegetables in inner-city third graders, it would be important to get an ICC estimate for servings of fruits and vegetables, measured in the same way as servings would be measured in the trial being planned, from third-graders in inner-city schools like the schools that would be recruited for the trial being planned.
Regression adjustment for covariates can improve precision and power, so it is important to choose covariates carefully. The best covariates will be related to the outcome and unevenly distributed between the study conditions or among the groups or clusters randomized to the study conditions. A priori matching can improve power in a GRT, but it can also reduce power, so investigators need to be thoughtful about a priori matching in their design and analysis.
A priori matching reduces the df for the test of the intervention effect by half, and if the correlation between the matching factor and the outcome is not large enough to overcome the loss of df, power will be lower in the matched analysis than in the unmatched analysis. A priori matching is often used to balance potential confounders, and it is then up to the investigator to decide whether to reflect that a priori matching in the analysis. It is not required, because the type I error rate is unaffected when the matching or stratification factor is ignored in the analysis of intervention effects (Diehr et al.).
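The trade-off between the df lost to matching and the variance removed by a good matching factor can be made concrete with a small power sketch. The model and all numbers below are assumptions for illustration (a group-level t-test, with sigma_g the standard deviation of the group-level means and r the within-pair correlation induced by matching), not values from the source.

```python
import numpy as np
from scipy.stats import t, nct

def power_group_t(delta, sigma_g, g, matched_r=None, alpha=0.05):
    """
    Power of a two-sided group-level t-test with g groups per condition.
    Unmatched analysis: df = 2 * (g - 1).  Matched analysis of pair
    differences: df = g - 1, with the between-group variance reduced by
    (1 - r), where r is the correlation between matched group means.
    """
    if matched_r is None:
        df, var_diff = 2 * (g - 1), 2 * sigma_g**2 / g
    else:
        df, var_diff = g - 1, 2 * sigma_g**2 * (1 - matched_r) / g
    ncp = delta / np.sqrt(var_diff)
    crit = t.ppf(1 - alpha / 2, df)
    return 1 - nct.cdf(crit, df, ncp) + nct.cdf(-crit, df, ncp)

# With only 6 groups per condition, weak matching (r = 0.1) costs power,
# while strong matching (r = 0.6) more than repays the lost df.
print(power_group_t(delta=0.3, sigma_g=0.25, g=6))                 # roughly 0.47, unmatched
print(power_group_t(delta=0.3, sigma_g=0.25, g=6, matched_r=0.1))  # roughly 0.42, weak matching
print(power_group_t(delta=0.3, sigma_g=0.25, g=6, matched_r=0.6))  # roughly 0.74, strong matching
```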