
Commit 9e7f99a

NightlordTW committed: Clarify multiple testing issues
1 parent cdaef65 commit 9e7f99a

File tree

2 files changed: +40 -11 lines

vignettes/intopkg.Rmd

Lines changed: 36 additions & 7 deletions
@@ -107,17 +107,46 @@ One possible solution, if the required sample size is not feasible, is to power

## Testing multiple primary endpoints

-When a trial aims to evaluate equivalence for at least $k$ out of $m$ primary endpoints, multiple tests are required. The total number of tests depends on the way equivalence is evaluated. Specifically, if pairwise comparisons are being considered among $m$ endpoints, the number of tests is calculated as:
+When a trial aims to evaluate equivalence for at least $k$ out of $m$ primary endpoints, multiple tests are required. The total number of tests depends on the approach used to evaluate equivalence. For example, with $m=3$ independent primary endpoints and a significance level of $\alpha = 5\%$, the probability of making a Type I error on at least one hypothesis test is:

-$$ k(k-1)/2 \leq m$$
+$$ 1 - (1-\alpha)^m = 1 - (1-0.05)^3 = 0.1426. $$
+This means that the overall probability of making any false positive error, also known as the **family-wise error rate (FWER)**, increases to approximately 14%.
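
This inflation can be verified with a couple of lines of base R (a minimal sketch using the same $m = 3$ and $\alpha = 0.05$ as above):

```{r}
alpha <- 0.05   # per-test significance level
m     <- 3      # number of independent primary endpoints

# Probability of at least one false positive across the m tests (FWER)
fwer <- 1 - (1 - alpha)^m
fwer
#> [1] 0.142625
```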

-A total of $k(k-1)/2 \leq m$ tests are being considered
-The statistical probability of incorrectly rejecting a true $H_0$ will inflate along with the increased number of simultaneously tested hypotheses
-Strategies to control the type I error when evaluating multiple comparisons
-Directly adjust the observed P value for each hypothesis
-Adjust the cutoff level $\alpha$ to reject each hypothesis
+To address this, adjustments to the significance level are necessary for multiple endpoint comparisons. Various methods have been proposed to handle this issue. In the package under consideration, the following approaches are included:

+### Bonferroni correction
+The most common and easiest procedure for multiplicity adjustment that controls the family-wise error rate (FWER) is the Bonferroni method. Each hypothesis is tested at level
+
+$$\alpha_{bon}= \alpha/m$$
+
+where $m$ is the total number of tests. While simple, this method is conservative, especially when the tests are correlated, as it assumes all tests are independent. In the [sampleSize()](../reference/sampleSize.html) function, the Bonferroni correction can be applied by specifying `adjust = "bon"`.
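
As a simple numeric illustration (a base-R sketch using the $m = 3$, $\alpha = 0.05$ example from above, independent of the package internals):

```{r}
alpha <- 0.05   # overall (family-wise) significance level
m     <- 3      # total number of tests

alpha_bon <- alpha / m   # per-test level under Bonferroni
alpha_bon
#> [1] 0.01666667
```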
+
+### Sidak correction
+
+The Sidak correction is an alternative method for controlling the family-wise error rate (FWER) in hypothesis testing. Like the Bonferroni correction, it assumes that the tests are independent. However, the Sidak correction accounts for the joint probability of all tests being non-significant, making it mathematically less conservative than Bonferroni. The adjusted significance level is calculated as:
+
+$$\alpha_{sid}= 1-(1-\alpha)^{1/m}$$
+
+The Sidak correction can be implemented by specifying `adjust = "sid"` in the [sampleSize()](../reference/sampleSize.html) function.
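
For comparison, a small base-R sketch of the Sidak and Bonferroni levels under the same assumed $m = 3$ and $\alpha = 0.05$, showing that the Sidak level is slightly less strict:

```{r}
alpha <- 0.05
m     <- 3

alpha_sid <- 1 - (1 - alpha)^(1 / m)   # Sidak-adjusted level
alpha_bon <- alpha / m                 # Bonferroni-adjusted level

c(sidak = alpha_sid, bonferroni = alpha_bon)
#>      sidak bonferroni
#> 0.01695243 0.01666667
```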
+
+### K adjustment
+
+This correction explicitly accounts for the scenario where equivalence is required for only $k$ out of $m$ endpoints. Unlike the Bonferroni and Sidak corrections, which assume that all $m$ tests contribute equally to the overall Type I error rate, the *k*-adjustment directly incorporates the number of endpoints ($k$) required for equivalence into the adjustment. The adjusted significance level is calculated as:
+
+$$\alpha_k= \frac{k \cdot \alpha}{m}$$
+
+where $k$ is the number of endpoints required for equivalence, and $m$ is the total number of endpoints evaluated.
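
As a rough illustration, assuming for example that equivalence is required on $k = 2$ of $m = 3$ endpoints:

```{r}
alpha <- 0.05
m     <- 3   # total number of endpoints
k     <- 2   # endpoints required to show equivalence

alpha_k <- k * alpha / m   # k-adjusted significance level
alpha_k
#> [1] 0.03333333
```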
+
+### Sequential adjustment
+
+In this approach, the user specifies which endpoints are primary and which are secondary using the `type_y` vector parameter. Tests are then performed sequentially, starting with the primary endpoints. Only if equivalence is established for the primary endpoints does the procedure proceed to testing the secondary endpoints.
+
+The significance level ($\alpha$) is adjusted separately for each group of endpoints:
+
+* **Primary Endpoints**: A Bonferroni adjustment is applied based on the number of primary endpoints.
+* **Secondary Endpoints**: If the primary endpoints meet the equivalence criteria, the secondary endpoints are tested. These are also adjusted using the Bonferroni method, based on the number of secondary endpoints.
+
+This sequential adjustment ensures that the Type I error is controlled while prioritizing the evaluation of primary endpoints before moving on to secondary ones.
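
A minimal sketch of the resulting per-group levels; the coding of 1 = primary and 2 = secondary in `type_y` is assumed here purely for illustration (see the package documentation for the exact convention):

```{r}
alpha  <- 0.05
type_y <- c(1, 1, 2)   # assumed coding: 1 = primary, 2 = secondary endpoint

# Bonferroni level within each group of endpoints
alpha_primary   <- alpha / sum(type_y == 1)
alpha_secondary <- alpha / sum(type_y == 2)

c(primary = alpha_primary, secondary = alpha_secondary)
#>   primary secondary
#>     0.025     0.050
```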

vignettes/sampleSize_parallel_2A3E.Rmd

Lines changed: 4 additions & 4 deletions
@@ -52,7 +52,7 @@ kableExtra::kable_styling(kableExtra::kable(data,

In the sections below, we explore various strategies for determining the sample size required for a parallel trial to demonstrate equivalence across the three co-primary endpoints. These strategies are based on the Ratio of Means (ROM) approach, with equivalence bounds set between 80\% and 125\%.

-# Independent Testing of PK Measures
+# Independent Testing of Co-Primary Endpoints
A conservative approach to sample size calculation involves testing each pharmacokinetic (PK) measure independently. This method assumes that the endpoints are uncorrelated and that equivalence must be demonstrated for each endpoint separately. Consequently, the overall sample size required for the trial is the sum of the sample sizes for each PK measure.

```{r}
@@ -104,7 +104,7 @@ library(SimTOST)

If we were to test each PK measure independently, we would find a total sample size of `r sim_AUCinf$response$n_total` for AUCinf, `r sim_AUClast$response$n_total` for AUClast, and `r sim_Cmax$response$n_total` for Cmax. This means that we would have to enroll `r sim_AUCinf$response$n_total` + `r sim_AUClast$response$n_total` + `r sim_Cmax$response$n_total` = `r sim_AUCinf$response$n_total + sim_AUClast$response$n_total + sim_Cmax$response$n_total` patients in order to reject $H_0$ at a significance level of 5\%. For context, the original trial was a randomized, single-blind, three-arm, parallel-group study conducted in 159 healthy subjects, slightly more than the `r sim_AUCinf$response$n_total + sim_AUClast$response$n_total + sim_Cmax$response$n_total` patients estimated as necessary. This suggests that the original trial had a small buffer above the calculated sample size requirements.

-# Simultaneous Testing of PK Measures with Independent Endpoints
+# Simultaneous Testing of Independent Co-Primary Endpoints
This approach focuses on simultaneous testing of pharmacokinetic (PK) measures while assuming independence between endpoints. Unlike the previous approach, which evaluated each PK measure independently, this method integrates comparisons across multiple endpoints, accounting for correlations (or lack thereof) between them. By doing so, it enables simultaneous testing for equivalence without inflating the overall Type I error rate.

## Key Assumptions
@@ -172,7 +172,7 @@ We can inspect more detailed sample size requirements as follows:
N_ss$response
```

-# Simultaneous Testing of PK Measures with Correlated Endpoints
+# Simultaneous Testing of Correlated Co-Primary Endpoints

Incorporating the correlation among endpoints into power and sample size calculations for co-primary continuous endpoints offers significant advantages. [@sozu_sample_2015] Without accounting for correlation, adding more endpoints typically reduces the power. However, by including positive correlations in the calculations, power can be increased, and required sample sizes may be reduced.
@@ -198,7 +198,7 @@ If correlations differ between endpoints, they can be specified individually usi
seed = 1234))
```

-Referring to the output above, the required sample size for this setting is `r N_mult_corr$response$n_total`. This is `r N_ss$response$n_SB2 - N_mult_corr$response$n_SB2` fewer patients than the scenario where the endpoints are assumed to be uncorrelated.
+Referring to the output above, the required sample size for this setting is `r N_mult_corr$response$n_total`. This is `r N_ss$response$n_total - N_mult_corr$response$n_total` fewer patients than the scenario where the endpoints are assumed to be uncorrelated.