
Commit 9e7f99a

NightlordTW committed: Clarify multiple testing issues
1 parent cdaef65 commit 9e7f99a

File tree

2 files changed: +40 -11 lines

vignettes/intopkg.Rmd

Lines changed: 36 additions & 7 deletions
@@ -107,17 +107,46 @@ One possible solution, if the required sample size is not feasible, is to power

## Testing multiple primary endpoints

-When a trial aims to evaluate equivalence for at least $k$ out of $m$ primary endpoints, multiple tests are required. The total number of tests depends on the way equivalence is evaluated. Specifically, if pairwise comparisons are being considered among $m$ endpoints, the number of tests is calculated as:
+When a trial aims to evaluate equivalence for at least $k$ out of $m$ primary endpoints, multiple tests are required. The total number of tests depends on the approach used to evaluate equivalence. For example, with $m=3$ independent primary endpoints and a significance level of $\alpha = 5\%$, the probability of making a Type I error on at least one hypothesis test is:

-$$ k(k-1)/2 \leq m$$
+$$ 1 - (1-\alpha)^m = 1 - (1-0.05)^3 = 0.1426. $$
+This means that the overall probability of making any false positive error, also known as the **family-wise error rate (FWER)**, increases to approximately 14%.
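
This inflation can be verified with a couple of lines of base R (a minimal sketch using the same $m = 3$ and $\alpha = 0.05$ as above):

```{r}
alpha <- 0.05   # per-test significance level
m     <- 3      # number of independent primary endpoints

# Probability of at least one false positive across the m tests (FWER)
fwer <- 1 - (1 - alpha)^m
fwer
#> [1] 0.142625
```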

-A total of $k(k-1)/2 \leq m$ tests are being considered
-The statistical probability of incorrectly rejecting a true $H_0$ will inflate along with the increased number of simultaneously tested hypotheses
-Strategies to control the type I error when evaluating multiple comparisons
-Directly adjust the observed P value for each hypothesis
-Adjust the cutoff level $\alpha$ to reject each hypothesis
+To address this, adjustments to the significance level are necessary for multiple endpoint comparisons. Various methods have been proposed to handle this issue. In the package under consideration, the following approaches are included:

+### Bonferroni correction
+The most common and easiest procedure for multiplicity adjustment that controls the family-wise error rate (FWER) is the Bonferroni method. Each hypothesis is tested at level
+
+$$\alpha_{bon}= \alpha/m$$
+
+where $m$ is the total number of tests. While simple, this method is conservative, especially when the tests are correlated, as it assumes all tests are independent. In the [sampleSize()](../reference/sampleSize.html) function, the Bonferroni correction can be applied by specifying `adjust = "bon"`.
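
As a simple numeric illustration (a base-R sketch using the $m = 3$, $\alpha = 0.05$ example from above, independent of the package internals):

```{r}
alpha <- 0.05   # overall (family-wise) significance level
m     <- 3      # total number of tests

alpha_bon <- alpha / m   # per-test level under Bonferroni
alpha_bon
#> [1] 0.01666667
```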
+
+### Sidak correction
+
+The Sidak correction is an alternative method for controlling the family-wise error rate (FWER) in hypothesis testing. Like the Bonferroni correction, it assumes that the tests are independent. However, the Sidak correction accounts for the joint probability of all tests being non-significant, making it mathematically less conservative than Bonferroni. The adjusted significance level is calculated as:
+
+$$\alpha_{sid}= 1-(1-\alpha)^{1/m}$$
+
+The Sidak correction can be implemented by specifying `adjust = "sid"` in the [sampleSize()](../reference/sampleSize.html) function.
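
For comparison, a small base-R sketch of the Sidak and Bonferroni levels under the same assumed $m = 3$ and $\alpha = 0.05$, showing that the Sidak level is slightly less strict:

```{r}
alpha <- 0.05
m     <- 3

alpha_sid <- 1 - (1 - alpha)^(1 / m)   # Sidak-adjusted level
alpha_bon <- alpha / m                 # Bonferroni-adjusted level

c(sidak = alpha_sid, bonferroni = alpha_bon)
#>      sidak bonferroni
#> 0.01695243 0.01666667
```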
+
+### K adjustment
+
+This correction explicitly accounts for the scenario where equivalence is required for only $k$ out of $m$ endpoints. Unlike the Bonferroni and Sidak corrections, which assume that all $m$ tests contribute equally to the overall Type I error rate, the *k*-adjustment directly incorporates the number of endpoints ($k$) required for equivalence into the adjustment. The adjusted significance level is calculated as:
+
+$$\alpha_k= \frac{k \cdot \alpha}{m}$$
+
+where $k$ is the number of endpoints required for equivalence, and $m$ is the total number of endpoints evaluated.
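
As a rough illustration, assuming for example that equivalence is required on $k = 2$ of $m = 3$ endpoints:

```{r}
alpha <- 0.05
m     <- 3   # total number of endpoints
k     <- 2   # endpoints required to show equivalence

alpha_k <- k * alpha / m   # k-adjusted significance level
alpha_k
#> [1] 0.03333333
```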
+
+### Sequential adjustment
+
+In this approach, the user specifies which endpoints are primary and which are secondary using the `type_y` vector parameter. Tests are then performed sequentially, starting with the primary endpoints. Only if equivalence is established for the primary endpoints does the procedure proceed to testing the secondary endpoints.
+
+The significance level ($\alpha$) is adjusted separately for each group of endpoints:
+
+* **Primary Endpoints**: A Bonferroni adjustment is applied based on the number of primary endpoints.
+* **Secondary Endpoints**: If the primary endpoints meet the equivalence criteria, the secondary endpoints are tested. These are also adjusted using the Bonferroni method, based on the number of secondary endpoints.
+
+This sequential adjustment ensures that the Type I error is controlled while prioritizing the evaluation of primary endpoints before moving on to secondary ones.
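
A minimal sketch of the resulting per-group levels; the coding of 1 = primary and 2 = secondary in `type_y` is assumed here purely for illustration (see the package documentation for the exact convention):

```{r}
alpha  <- 0.05
type_y <- c(1, 1, 2)   # assumed coding: 1 = primary, 2 = secondary endpoint

# Bonferroni level within each group of endpoints
alpha_primary   <- alpha / sum(type_y == 1)
alpha_secondary <- alpha / sum(type_y == 2)

c(primary = alpha_primary, secondary = alpha_secondary)
#>   primary secondary
#>     0.025     0.050
```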

vignettes/sampleSize_parallel_2A3E.Rmd

Lines changed: 4 additions & 4 deletions
@@ -52,7 +52,7 @@ kableExtra::kable_styling(kableExtra::kable(data,

In the sections below, we explore various strategies for determining the sample size required for a parallel trial to demonstrate equivalence across the three co-primary endpoints. These strategies are based on the Ratio of Means (ROM) approach, with equivalence bounds set between 80\% and 125\%.

-# Independent Testing of PK Measures
+# Independent Testing of Co-Primary Endpoints
A conservative approach to sample size calculation involves testing each pharmacokinetic (PK) measure independently. This method assumes that the endpoints are uncorrelated and that equivalence must be demonstrated for each endpoint separately. Consequently, the overall sample size required for the trial is the sum of the sample sizes for each PK measure.

```{r}
@@ -104,7 +104,7 @@ library(SimTOST)

If we were to test each PK measure independently, we would find a total sample size of `r sim_AUCinf$response$n_total` for AUCinf, `r sim_AUClast$response$n_total` for AUClast, and `r sim_Cmax$response$n_total` for Cmax. This means that we would have to enroll `r sim_AUCinf$response$n_total` + `r sim_AUClast$response$n_total` + `r sim_Cmax$response$n_total` = `r sim_AUCinf$response$n_total + sim_AUClast$response$n_total + sim_Cmax$response$n_total` patients in order to reject $H_0$ at a significance level of 5\%. For context, the original trial was a randomized, single-blind, three-arm, parallel-group study conducted in 159 healthy subjects, slightly more than the `r sim_AUCinf$response$n_total + sim_AUClast$response$n_total + sim_Cmax$response$n_total` patients estimated as necessary. This suggests that the original trial had a small buffer above the calculated sample size requirements.

-# Simultaneous Testing of PK Measures with Independent Endpoints
+# Simultaneous Testing of Independent Co-Primary Endpoints
This approach focuses on simultaneous testing of pharmacokinetic (PK) measures while assuming independence between endpoints. Unlike the previous approach, which evaluated each PK measure independently, this method integrates comparisons across multiple endpoints, accounting for correlations (or lack thereof) between them. By doing so, it enables simultaneous testing for equivalence without inflating the overall Type I error rate.

## Key Assumptions
@@ -172,7 +172,7 @@ We can inspect more detailed sample size requirements as follows:
N_ss$response
```

-# Simultaneous Testing of PK Measures with Correlated Endpoints
+# Simultaneous Testing of Correlated Co-Primary Endpoints

Incorporating the correlation among endpoints into power and sample size calculations for co-primary continuous endpoints offers significant advantages. [@sozu_sample_2015] Without accounting for correlation, adding more endpoints typically reduces the power. However, by including positive correlations in the calculations, power can be increased, and required sample sizes may be reduced.
@@ -198,7 +198,7 @@ If correlations differ between endpoints, they can be specified individually usi
seed = 1234))
```

-Referring to the output above, the required sample size for this setting is `r N_mult_corr$response$n_total`. This is `r N_ss$response$n_SB2 - N_mult_corr$response$n_SB2` fewer patients than the scenario where the endpoints are assumed to be uncorrelated.
+Referring to the output above, the required sample size for this setting is `r N_mult_corr$response$n_total`. This is `r N_ss$response$n_total - N_mult_corr$response$n_total` fewer patients than the scenario where the endpoints are assumed to be uncorrelated.