Practical: Population-Adjusted Indirect Comparisons with outstandR

Author: University College London (UCL)

Published: May 19, 2025

Introduction

This practical session investigates the use of population-adjusted indirect treatment comparisons (ITCs).

When we want to compare two treatments, say A and B, we ideally use a head-to-head randomized controlled trial (RCT). However, such trials are not always available. Instead, we might have:

1. An RCT comparing A to a common comparator C (the AC trial), for which we have Individual Patient Data (IPD).
2. An RCT comparing B to the same common comparator C (the BC trial), for which we only have Aggregate Level Data (ALD), such as summary statistics from a publication.

If the patient populations in the AC and BC trials differ in characteristics that modify the treatment effect (effect modifiers), a simple indirect comparison (A vs C minus B vs C) can be misleading. Population adjustment methods aim to correct for these differences, providing a more valid comparison of A vs B in a chosen target population (often the population of the BC trial).
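To fix ideas, the simple (unadjusted) anchored indirect comparison can be sketched in a few lines of R. The summary statistics below are purely illustrative, not taken from the trials in this practical:

```r
# Unadjusted (anchored) indirect comparison: a minimal sketch with
# made-up log-odds ratios and standard errors from two trials
d_AC <- -0.9; se_AC <- 0.35   # A vs C, reported by the AC trial
d_BC <- -0.8; se_BC <- 0.30   # B vs C, reported by the BC trial

d_AB  <- d_AC - d_BC                # indirect A vs B estimate
se_AB <- sqrt(se_AC^2 + se_BC^2)    # variances add (independent trials)
ci_AB <- d_AB + c(-1, 1) * qnorm(0.975) * se_AB

c(AB = d_AB, lower = ci_AB[1], upper = ci_AB[2])
```

This comparison is only unbiased if effect modifiers are balanced between the two trial populations; population adjustment is needed precisely when they are not.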
We will use our {outstandR} R package, which provides a suite of tools to perform these adjustments. In this practical, we will:

1. Simulate IPD and ALD for both binary and continuous outcomes.
2. Use {outstandR} to apply methods like Matching-Adjusted Indirect Comparison (MAIC) and G-computation.
3. Explore how to change the outcome scale for reporting.
4. Interpret the basic output from {outstandR}.

Learning Objectives: By the end of this practical, you will be able to:

- Understand the scenario requiring population adjustment.
- Prepare IPD and ALD in the format required by {outstandR}.
- Apply MAIC and G-computation methods for binary and continuous outcomes.
- Interpret and report results on different scales.

Part 0: Setup and Package Loading

First, we need to load the necessary R packages. If you haven't installed them, you'll need to do so. We have created the simcovariates package, used here for data generation, which you'll need to install from GitHub. The outstandR package will also need to be installed.

# Ensure packages are installed:
#
# install.packages(c("dplyr", "tidyr", "boot", "copula", "rstanarm", "remotes"))
#
# remotes::install_github("n8thangreen/simcovariates") # For gen_data
# remotes::install_github("StatisticsHealthEconomics/outstandR")

library(outstandR)
library(simcovariates) # For gen_data
library(dplyr)
library(tidyr)
# library(rstanarm) # Loaded by outstandR if/when needed for Bayesian G-comp
# library(boot)     # Loaded by outstandR if/when needed for MAIC

# For reproducibility of simulated data
set.seed(123)

Part 1: Data Simulation & Preparation - Binary Outcomes

We'll start with a scenario involving a binary outcome (e.g., treatment response: yes/no).

1.1 Simulation Parameters

We use parameters similar to the {outstandR} vignette to define our simulation. These control sample size, treatment effects, covariate effects, and population characteristics.

N <- 200 # Sample size per trial

# Active treatment vs. placebo allocation ratio (2:1 implies ~2/3 on active)
allocation <- 2/3

# Conditional log-OR for active treatment vs. common comparator C
b_trt <- log(0.17)

# Conditional log-OR for each unit increase in prognostic variables (X3, X4)
b_X <- -log(0.5)

# Conditional log-OR for interaction term (treatment * effect modifier) for X1, X2
b_EM <- -log(0.67)

# Mean of prognostic factors (X3, X4) in AC trial
meanX_AC <- c(0.45, 0.45)

# Mean of prognostic factors (X3, X4) in BC trial (DIFFERENT from AC)
meanX_BC <- c(0.6, 0.6)

# Mean of effect modifiers (X1, X2) in AC trial
meanX_EM_AC <- c(0.45, 0.45)

# Mean of effect modifiers (X1, X2) in BC trial (DIFFERENT from AC)
meanX_EM_BC <- c(0.6, 0.6)

sdX <- c(0.4, 0.4)    # Standard deviation of prognostic factors
sdX_EM <- c(0.4, 0.4) # Standard deviation of effect modifiers
corX <- 0.2           # Covariate correlation coefficient
b_0 <- -0.6           # Baseline intercept coefficient on logit scale

Note

Effect Modifiers vs. Prognostic Variables:
- Prognostic variables (X3, X4 here) predict the outcome regardless of treatment.
- Effect modifiers (X1, X2 here) change the magnitude or direction of the treatment effect.

Differences in the distribution of effect modifiers between trials are a key reason for population adjustment.
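To make the roles of these coefficients concrete, here is a hedged sketch of the assumed data-generating model on the logit scale (gen_data() additionally handles correlated covariates internally, so this is illustrative only):

```r
# Sketch of the assumed outcome model for one patient:
# logit(p) = b_0 + b_X*(X3 + X4) + [b_trt + b_EM*(X1 + X2)] * I(treated)
b_trt <- log(0.17); b_X <- -log(0.5); b_EM <- -log(0.67); b_0 <- -0.6

linpred <- function(X1, X2, X3, X4, treated) {
  b_0 + b_X * (X3 + X4) + (b_trt + b_EM * (X1 + X2)) * treated
}

# e.g. response probability for a treated patient at the AC covariate means
plogis(linpred(0.45, 0.45, 0.45, 0.45, treated = 1))  # approx. 0.20
```

Because X1 and X2 enter only through the treatment interaction, shifting their means between trials changes the treatment effect itself, which is why they are the effect modifiers.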
1.2 Generate IPD for AC Trial (Binary Outcome)

We simulate Individual Patient Data (IPD) for a trial comparing treatments A and C.

ipd_trial_bin <- gen_data(N,
                          b_trt,
                          b_X,
                          b_EM,
                          b_0 = b_0,
                          meanX_AC,
                          sdX,
                          meanX_EM_AC,
                          sdX_EM,
                          corX,
                          allocation,
                          family = binomial("logit"))

# Treatment 'trt' is 0 or 1
# We map 0 to 'C' (comparator) and 1 to 'A' (new treatment)
ipd_trial_bin$trt <- factor(ipd_trial_bin$trt, labels = c("C", "A"))

Let's look at the generated data.

head(ipd_trial_bin)

            X1         X2         X3         X4 trt y
1  0.420906647  0.6501898  0.6817174 0.61434770   A 0
2  0.009771062  1.0476893  0.9321016 0.04336125   A 0
3  0.086942077 -0.4289788  0.3807218 0.18401299   A 0
4 -0.039661515  0.7256527  0.4987618 0.54389751   A 1
5  0.585786267  0.2143042  0.2207665 0.64831303   A 0
6  0.600816955 -0.3921163 -0.3156147 0.17139023   A 0

summary(ipd_trial_bin)

       X1                X2                X3                X4
 Min.   :-0.7022   Min.   :-0.7504   Min.   :-0.7311   Min.   :-0.6038
 1st Qu.: 0.1864   1st Qu.: 0.2158   1st Qu.: 0.1819   1st Qu.: 0.1820
 Median : 0.4603   Median : 0.5231   Median : 0.4557   Median : 0.4032
 Mean   : 0.4636   Mean   : 0.4723   Mean   : 0.4390   Mean   : 0.4289
 3rd Qu.: 0.7043   3rd Qu.: 0.7245   3rd Qu.: 0.7171   3rd Qu.: 0.6886
 Max.   : 1.5292   Max.   : 1.5804   Max.   : 1.6463   Max.   : 1.3328
 trt          y
 C: 67   Min.   :0.00
 A:133   1st Qu.:0.00
         Median :0.00
         Mean   :0.32
         3rd Qu.:1.00
         Max.   :1.00

The ipd_trial_bin dataframe contains patient-level data: covariates (X1-X4), treatment assignment (trt), and outcome (y).

1.3 Generate ALD for BC Trial (Binary Outcome)

For the BC trial (comparing B vs C), we only have Aggregate Level Data (ALD). We first simulate IPD for BC and then summarize it. The key here is that meanX_BC and meanX_EM_BC are different from the AC trial, creating a population imbalance.

# Simulate IPD for BC trial (using BC trial's covariate means)
BC_IPD_bin <- gen_data(N,
                       b_trt,
                       b_X,
                       b_EM,
                       b_0,
                       meanX_BC,    # Using BC means
                       sdX,
                       meanX_EM_BC, # Using BC means
                       sdX_EM,
                       corX,
                       allocation,
                       family = binomial("logit"))

BC_IPD_bin$trt <- factor(BC_IPD_bin$trt, labels = c("C", "B")) # 0=C, 1=B

# Now, aggregate BC_IPD_bin to create ald_trial_bin
# This mimics having only published summary statistics.

# Covariate summaries (mean, sd for X1-X4,
# assumed same across arms in BC trial for simplicity)
cov_summary_bin <- BC_IPD_bin %>%
  select(X1, X2, X3, X4) %>% # Select covariate columns
  summarise(across(everything(), list(mean = mean, sd = sd))) %>%
  pivot_longer(everything(), names_to = "stat_var", values_to = "value") %>%
  # 'stat_var' will be like "X1_mean", "X1_sd". We need to separate these.
  separate(stat_var, into = c("variable", "statistic"), sep = "_") %>%
  # Covariate summaries are often reported for the overall trial population
  mutate(trt = NA_character_)

# Outcome summaries (number of events 'sum',
# mean proportion 'mean', sample size 'N' for y by trt)
outcome_summary_bin <- BC_IPD_bin %>%
  group_by(trt) %>%
  summarise(
    sum_y = sum(y),   # Number of events
    mean_y = mean(y), # Proportion of events
    N = n()           # Sample size in this arm
  ) %>%
  ungroup() %>%
  pivot_longer(cols = -trt, names_to = "stat_var", values_to = "value") %>%
  # 'stat_var' will be "sum_y", "mean_y", "N". We need to parse this.
  mutate(
    variable = case_when(
      grepl("_y$", stat_var) ~ "y",    # If it ends with _y, variable is y
      stat_var == "N" ~ NA_character_, # For N, variable can be NA
      TRUE ~ stat_var                  # Default
    ),
    statistic = case_when(
      grepl("sum_", stat_var) ~ "sum",
      grepl("mean_", stat_var) ~ "mean",
      stat_var == "N" ~ "N",
      TRUE ~ stat_var # Default
    )
  ) %>%
  select(variable, statistic, value, trt)

# Combine covariate and outcome summaries for the final ALD structure
ald_trial_bin <- bind_rows(cov_summary_bin, outcome_summary_bin) %>%
  select(variable, statistic, value, trt)

Viewing the data,

print(as.data.frame(ald_trial_bin))

   variable statistic       value  trt
1        X1      mean   0.5961081 <NA>
2        X1        sd   0.4015645 <NA>
3        X2      mean   0.5779233 <NA>
4        X2        sd   0.3895705 <NA>
5        X3      mean   0.5799632 <NA>
6        X3        sd   0.3981054 <NA>
7        X4      mean   0.5944841 <NA>
8        X4        sd   0.4316603 <NA>
9         y       sum  28.0000000    C
10        y      mean   0.4179104    C
11     <NA>         N  67.0000000    C
12        y       sum  32.0000000    B
13        y      mean   0.2406015    B
14     <NA>         N 133.0000000    B

The ald_trial_bin is in a 'long' format with columns: variable (e.g., "X1", "y"), statistic (e.g., "mean", "sd", "sum", "N"), value, and trt (treatment arm, or NA if overall). This is the format {outstandR} expects.

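Before fitting anything, it is worth seeing the imbalance that motivates adjustment. A quick comparison of the IPD covariate means against the reported ALD means, using the objects created above:

```r
# Compare covariate means in the AC IPD with the BC ALD values
ipd_means <- colMeans(ipd_trial_bin[, c("X1", "X2", "X3", "X4")])

ald_means <- subset(as.data.frame(ald_trial_bin),
                    statistic == "mean" & variable %in% c("X1", "X2", "X3", "X4"))

data.frame(variable = ald_means$variable,
           AC_ipd   = round(ipd_means[ald_means$variable], 2),
           BC_ald   = round(ald_means$value, 2))
# The BC means (~0.6) are systematically higher than the AC means (~0.45)
```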

Part 2: Model Fitting - Binary Outcomes

Now we use {outstandR} to perform population adjustments. We'll compare treatment A (from AC trial IPD) with treatment B (from BC trial ALD), using C as the common anchor. The target population for comparison will be the BC trial population.

2.1 Define the Model Formula

The model formula specifies the relationship between the outcome (y), prognostic variables (X3, X4), treatment (trt), and effect modifiers (X1, X2). For a binary outcome with a logit link, the model is:

logit(p_t) = beta_0 + beta_X (X3 + X4) + [beta_t + beta_EM (X1 + X2)] I(t != C)

This translates to the R formula y ~ X3 + X4 + trt + trt:X1 + trt:X2 (the intercept beta_0 is implicit).

lin_form_bin <- as.formula("y ~ X3 + X4 + trt + trt:X1 + trt:X2")

2.2 Matching-Adjusted Indirect Comparison (MAIC)

MAIC reweights the IPD from the AC trial so that the mean covariate values of the effect modifiers match those of the BC trial population.
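The reweighting itself can be sketched by hand. In essence (this is an illustrative method-of-moments sketch, not the package internals), MAIC finds weights w_i = exp(x_i' a) on the centred effect modifiers such that the weighted IPD means equal the ALD means:

```r
# Method-of-moments MAIC weights for the effect modifiers X1, X2
X_em <- as.matrix(ipd_trial_bin[, c("X1", "X2")])

# Target means reported in the BC ALD
bc <- subset(as.data.frame(ald_trial_bin),
             variable %in% c("X1", "X2") & statistic == "mean")
X_c <- sweep(X_em, 2, bc$value)  # centre IPD at the BC means

# Minimising sum(exp(X_c %*% a)) is convex; its solution balances the means
obj   <- function(a) sum(exp(X_c %*% a))
a_hat <- optim(c(0, 0), obj, method = "BFGS")$par
w     <- as.vector(exp(X_c %*% a_hat))

# Check: weighted IPD means now match the BC trial's reported means
colSums(w * X_em) / sum(w)
```

strategy_maic() wraps this estimation together with a weighted outcome analysis and bootstrap standard errors.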

# MAIC involves bootstrapping, which can take a moment.
# The number of bootstrap replicates can sometimes be
# controlled in strategy_maic() for speed,
# e.g. n_boot = 100 for a quick check, but higher
# (e.g., 1000) is better for stable results.
# We'll use the default for now.

out_maic_bin <- outstandR(
  ipd_trial = ipd_trial_bin,
  ald_trial = ald_trial_bin,
  strategy = strategy_maic(
    formula = lin_form_bin,
    family = binomial(link = "logit")
    # If your package allows, you might add:
    # , n_boot = 200 # for faster demo
  )
)

The MAIC results (default: Log-Odds Ratio scale):

print(out_maic_bin)

Object of class 'outstandR'
Model: binomial
Scale: log_odds
Common treatment: C
Individual patient data study: AC
Aggregate level data study: BC
Confidence interval level: 0.95

Contrasts:

# A tibble: 3 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl>      <dbl>      <dbl>
1 AB          -0.0495     0.226     -0.981      0.882
2 AC          -0.868      0.123     -1.56      -0.179
3 BC          -0.818      0.103     -1.45      -0.191

Absolute:

# A tibble: 2 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl> <lgl>      <lgl>
1 A             0.266   0.00183 NA         NA
2 C             0.461   0.00431 NA         NA

The output provides contrasts (e.g., A vs B) and absolute effects in the target (BC) population. By default, for binomial(link = "logit"), the effect measure is the log-odds ratio.

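Since results are reported on the log-odds-ratio scale, they can be back-transformed by hand; for instance (this assumes the $variances element of the output stores variances, so that its square root is a standard error):

```r
# Back-transform the A vs B log-OR to an odds ratio with a 95% CI
log_or <- out_maic_bin$contrasts$means$AB
se     <- sqrt(out_maic_bin$contrasts$variances$AB)

or     <- exp(log_or)                                 # approx. 0.95
or_ci  <- exp(log_or + c(-1, 1) * qnorm(0.975) * se)  # approx. (0.37, 2.42)
```

An odds ratio near 1 with a wide interval indicates little evidence of a difference between A and B in the BC population here.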

2.3 Changing the Outcome Scale (MAIC Example)

Often, we want results on a different scale, like log-relative risk or risk difference. The scale argument in outstandR() allows this. For example, for the risk difference:

out_maic_bin_rd <- outstandR(
  ipd_trial = ipd_trial_bin,
  ald_trial = ald_trial_bin,
  strategy = strategy_maic(
    formula = lin_form_bin,
    family = binomial(link = "logit")
  ),
  scale = "risk_difference" # Key change!
)

print(out_maic_bin_rd)

Object of class 'outstandR'
Model: binomial
Scale: risk_difference
Common treatment: C
Individual patient data study: AC
Aggregate level data study: BC
Confidence interval level: 0.95

Contrasts:

# A tibble: 3 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl>      <dbl>      <dbl>
1 AB          -0.0165   0.433       -1.31      1.27
2 AC          -0.194    0.00684     -0.356    -0.0317
3 BC          -0.177    0.426       -1.46      1.10

Absolute:

# A tibble: 2 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl> <lgl>      <lgl>
1 A             0.266   0.00192 NA         NA
2 C             0.460   0.00486 NA         NA


2.4 Parametric G-computation with Maximum Likelihood (G-comp ML)

G-computation fits an outcome regression model to the IPD (AC trial) and then uses this model to predict outcomes for each patient as if they had received treatment A and as if they had received treatment C, standardized to the covariate distribution of the target (BC) population.

out_gcomp_ml_bin <- outstandR(
  ipd_trial = ipd_trial_bin,
  ald_trial = ald_trial_bin,
  strategy = strategy_gcomp_ml(
    formula = lin_form_bin,
    family = binomial(link = "logit")
  )
)

print(out_gcomp_ml_bin)

Object of class 'outstandR'
Model: binomial
Scale: log_odds
Common treatment: C
Individual patient data study: AC
Aggregate level data study: BC
Confidence interval level: 0.95

Contrasts:

# A tibble: 3 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl>      <dbl>      <dbl>
1 AB           -0.0774     0.230      -1.02      0.863
2 AC           -0.895      0.128      -1.60     -0.195
3 BC           -0.818      0.103      -1.45     -0.191

Absolute:

# A tibble: 2 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl> <lgl>      <lgl>
1 A             0.273   0.00224 NA         NA
2 C             0.477   0.00449 NA         NA
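Conceptually, the G-computation step above can be sketched as follows. This is an illustrative simplification, not the package internals: we draw independent normal covariates from the BC summary statistics, whereas outstandR can use a copula to respect the covariate correlation.

```r
# 1) Fit the outcome model on the AC IPD
fit <- glm(lin_form_bin, family = binomial("logit"), data = ipd_trial_bin)

# 2) Simulate a pseudo-population matching the BC means/sds (independence assumed)
n_sim <- 10000
bc_pop <- data.frame(X1 = rnorm(n_sim, 0.6, 0.4), X2 = rnorm(n_sim, 0.6, 0.4),
                     X3 = rnorm(n_sim, 0.6, 0.4), X4 = rnorm(n_sim, 0.6, 0.4))

# 3) Average predictions under each treatment, then form the marginal contrast
bc_pop$trt <- factor("A", levels = levels(ipd_trial_bin$trt))
p_A <- mean(predict(fit, newdata = bc_pop, type = "response"))
bc_pop$trt <- factor("C", levels = levels(ipd_trial_bin$trt))
p_C <- mean(predict(fit, newdata = bc_pop, type = "response"))

qlogis(p_A) - qlogis(p_C)  # marginal A vs C log-OR in the BC population
```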

Part 3: Adapting for Continuous Outcomes

What if our outcome is continuous, like change in blood pressure or a quality-of-life score? The principles are similar, but we need to adjust the data generation and model specification.

3.1 Simulate Continuous Data

We'll use family = gaussian("identity") for the gen_data function. We might also adjust some coefficients to be more sensible for a continuous scale.

# Adjust some parameters for a continuous outcome
b_0_cont <- 5      # Intercept on the continuous scale
b_trt_cont <- -1.5 # Mean difference for treatment A vs C
b_X_cont <- 0.5    # Effect of prognostic vars on continuous outcome

# Effect of effect modifiers on treatment effect (continuous)
b_EM_cont <- 0.3

# Generate IPD for the AC trial, following the same pattern as the binary case
ipd_trial_cont <- gen_data(N,
                           b_trt_cont,
                           b_X_cont,
                           b_EM_cont,
                           b_0_cont,
                           meanX_AC,
                           sdX,
                           meanX_EM_AC,
                           sdX_EM,
                           corX,
                           allocation,
                           family = gaussian("identity")) # Key change!

ipd_trial_cont$trt <- factor(ipd_trial_cont$trt, labels = c("C", "A"))

head(ipd_trial_cont)

          X1          X2         X3         X4 trt        y
1  0.3347920  1.25173997  0.8251443  0.3626829   A 5.287769
2  1.3518916  0.34102429  0.7105816  0.2199213   A 4.530242
3  0.8390412  0.15676647  0.6917419  0.3092831   A 3.491232
4  0.8418207 -0.17637220 -0.2343619 -0.5699550   A 5.199625
5  0.5758347  0.06594836  0.2838801  0.1641963   A 4.366868
6 -0.3415192  0.58383236  0.4612317  0.1143282   A 3.391340

summary(ipd_trial_cont$y)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.827   3.717   4.536   4.503   5.267   8.399

3.1.2 ALD for BC Trial (Continuous)

BC_IPD_cont <- gen_data(N,
                        b_trt_cont,
                        b_X_cont,
                        b_EM_cont,
                        b_0_cont,
                        meanX_BC,    # Using BC means
                        sdX,
                        meanX_EM_BC, # Using BC means
                        sdX_EM,
                        corX,
                        allocation,
                        family = gaussian("identity")) # Key change!

BC_IPD_cont$trt <- factor(BC_IPD_cont$trt, labels = c("C", "B"))

# Aggregate BC_IPD_cont for ALD
# Covariate summaries structure remains the same
cov_summary_cont <- BC_IPD_cont %>%
  select(X1, X2, X3, X4) %>%
  summarise(across(everything(), list(mean = mean, sd = sd))) %>%
  pivot_longer(everything(), names_to = "stat_var", values_to = "value") %>%
  separate(stat_var, into = c("variable", "statistic"), sep = "_") %>%
  mutate(trt = NA_character_)

# Outcome summaries for continuous data: mean, sd, N for y by trt
outcome_summary_cont <- BC_IPD_cont %>%
  group_by(trt) %>%
  summarise(
    mean_y = mean(y), # Mean outcome
    sd_y = sd(y),     # Standard deviation of outcome
    N = n()           # Sample size
  ) %>%
  ungroup() %>%
  pivot_longer(cols = -trt, names_to = "stat_var", values_to = "value") %>%
  mutate(
    variable = case_when(
      grepl("_y$", stat_var) ~ "y",
      stat_var == "N" ~ NA_character_,
      TRUE ~ stat_var
    ),
    statistic = case_when(
      grepl("mean_", stat_var) ~ "mean",
      grepl("sd_", stat_var) ~ "sd", # Changed from sum to sd
      stat_var == "N" ~ "N",
      TRUE ~ stat_var
    )
  ) %>%
  select(variable, statistic, value, trt)

ald_trial_cont <- bind_rows(cov_summary_cont, outcome_summary_cont) %>%
  select(variable, statistic, value, trt)

print(as.data.frame(ald_trial_cont))

   variable statistic       value  trt
1        X1      mean   0.5941535 <NA>
2        X1        sd   0.3856699 <NA>
3        X2      mean   0.5695096 <NA>
4        X2        sd   0.4234775 <NA>
5        X3      mean   0.5642288 <NA>
6        X3        sd   0.3971081 <NA>
7        X4      mean   0.5739154 <NA>
8        X4        sd   0.3957993 <NA>
9         y      mean   5.3318057    C
10        y        sd   0.8647773    C
11     <NA>         N  67.0000000    C
12        y      mean   4.5914634    B
13        y        sd   0.9994386    B
14     <NA>         N 133.0000000    B

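As a sanity check, the unadjusted B vs C mean difference is recoverable straight from these ALD summaries:

```r
# Unadjusted B vs C mean difference from the reported arm means
y_means <- subset(as.data.frame(ald_trial_cont),
                  variable == "y" & statistic == "mean")

d_BC <- y_means$value[y_means$trt == "B"] - y_means$value[y_means$trt == "C"]
d_BC  # approx. -0.74, i.e. lower mean outcome on B than on C
```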

3.2 Model Fitting for Continuous Outcomes

The model formula structure can remain the same if we assume linear relationships. The key change is in the family argument of the strategy function.

lin_form_cont <- as.formula("y ~ X3 + X4 + trt + trt:X1 + trt:X2")

out_gcomp_ml_cont <- outstandR(
  ipd_trial = ipd_trial_cont,
  ald_trial = ald_trial_cont,
  strategy = strategy_gcomp_ml(
    formula = lin_form_cont,
    family = gaussian(link = "identity") # Key change!
  )
  # For the Gaussian family, the default scale is "mean_difference",
  # which is often what we want. We could state it explicitly:
  # scale = "mean_difference"
)

print(out_gcomp_ml_cont)

Object of class 'outstandR'
Model: gaussian
Scale: mean_difference
Common treatment: C
Individual patient data study: AC
Aggregate level data study: BC
Confidence interval level: 0.95

Contrasts:

# A tibble: 3 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl>      <dbl>      <dbl>
1 AB           -0.557    0.0472     -0.982     -0.131
2 AC           -1.30     0.0285     -1.63      -0.966
3 BC           -0.740    0.0187     -1.01      -0.473

Absolute:

# A tibble: 2 × 5
  Treatments Estimate Std.Error lower.0.95 upper.0.95
  <chr>         <dbl>     <dbl> <lgl>      <lgl>
1 A              4.28   0.00703 NA         NA
2 C              5.58   0.0215  NA         NA


Tip

Your Turn! Try applying MAIC to the continuous outcome data.
1. Use family = gaussian(link = "identity") within strategy_maic().
2. What scale would be appropriate if not the default? (e.g., "mean_difference")

Beyond MAIC and G-comp ML, {outstandR} offers further adjustment strategies:

Bayesian G-computation (G-comp Bayes): Similar to G-comp ML but uses Bayesian methods (e.g., MCMC via rstanarm), which can better propagate uncertainty but is computationally more intensive.

# This would require rstanarm and can be slow.
out_gcomp_stan_bin <- outstandR(
  ipd_trial = ipd_trial_bin,
  ald_trial = ald_trial_bin,
  strategy = strategy_gcomp_stan(
    formula = lin_form_bin,
    family = binomial(link = "logit")
    # For a faster demo if options are passed through:
    # stan_args = list(iter = 500, chains = 2, refresh = 0)
  )
)
print(out_gcomp_stan_bin)

Multiple Imputation Marginalisation (MIM): Another approach for marginalization.

Part 4: Understanding the {outstandR} Output

Let's briefly revisit one of the binary outcome results to understand the structure of the {outstandR} output.

str(out_maic_bin)

List of 2
 $ contrasts:List of 3
  ..$ means    :List of 3
  .. ..$ AB: num -0.0495
  .. ..$ AC: num -0.868
  .. ..$ BC: num -0.818
  ..$ variances:List of 3
  .. ..$ AB: num 0.226
  .. ..$ AC: num 0.123
  .. ..$ BC: num 0.103
  ..$ CI       :List of 3
  .. ..$ AB: num [1:2] -0.981 0.882
  .. ..$ AC: num [1:2] -1.556 -0.179
  .. ..$ BC: num [1:2] -1.446 -0.191
 $ absolute :List of 2
  ..$ means    :List of 2
  .. ..$ A: Named num 0.266
  .. .. ..- attr(*, "names")= chr "mean_A"
  .. ..$ C: Named num 0.461
  .. .. ..- attr(*, "names")= chr "mean_C"
  ..$ variances:List of 2
  .. ..$ A: Named num 0.00183
  .. .. ..- attr(*, "names")= chr "mean_A"
  .. ..$ C: Named num 0.00431
  .. .. ..- attr(*, "names")= chr "mean_C"
 - attr(*, "CI")= num 0.95
 - attr(*, "ref_trt")= chr "C"
 - attr(*, "scale")= chr "log_odds"
 - attr(*, "model")= chr "binomial"
 - attr(*, "class")= chr [1:2] "outstandR" "list"


The output object (here out_maic_bin) is a list containing:

- $contrasts: the estimated treatment effects (e.g., mean difference, log-OR), their variances, and confidence intervals for each pairwise comparison, adjusted to the target population (BC trial).
  - $contrasts$means$AB: the estimated effect of A versus B. This is often the primary interest.
  - $contrasts$means$AC: the estimated effect of A versus C.
  - $contrasts$means$BC: the estimated effect of B versus C (usually derived directly from the ALD).
- $absolute: the estimated mean outcome for each treatment arm modelled from the IPD (here A and C) in the target population. This can be useful for understanding the baseline and treated outcomes.

For example, to extract the estimated log-odds ratio for A vs. B and its variance:

log_or_AB <- out_maic_bin$contrasts$means$AB
variance_log_or_AB <- out_maic_bin$contrasts$variances$AB

cat(paste("Estimated Log-OR for A vs. B:", round(log_or_AB, 3), "\n"))

Estimated Log-OR for A vs. B: -0.05

cat(paste("Variance of Log-OR for A vs. B:", round(variance_log_or_AB, 3), "\n"))

Variance of Log-OR for A vs. B: 0.226

The vignette for {outstandR} (which this practical is based on) shows how to combine results from multiple methods into tables and forest plots for a comprehensive comparison. This is highly recommended for actual analyses.
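A small results table can be assembled directly from the list structure shown above; here is a sketch for the MAIC fit:

```r
# Assemble the MAIC contrasts into a plain results table
res <- data.frame(
  Treatments = names(out_maic_bin$contrasts$means),
  Estimate   = unlist(out_maic_bin$contrasts$means),
  lower      = sapply(out_maic_bin$contrasts$CI, `[`, 1),
  upper      = sapply(out_maic_bin$contrasts$CI, `[`, 2)
)
res
```

Repeating this over several fitted objects (e.g., MAIC and G-comp ML) and row-binding the results gives the input needed for a forest plot across methods.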

Key Takeaways

- Population adjustment is crucial when comparing treatments indirectly using IPD and ALD from trials with different patient characteristics (especially different distributions of effect modifiers).
- The {outstandR} package provides a unified interface (the outstandR() function) to apply various adjustment methods.
- You need to:
  1. Prepare your IPD (for the "anchor" trial, e.g., AC) and ALD (for the "comparator" trial, e.g., BC, which also serves as the target population).
  2. Define an appropriate model formula.
  3. Choose a strategy_*() function corresponding to the desired adjustment method (MAIC, STC, G-comp, etc.).
  4. Specify the outcome family (e.g., binomial(), gaussian()) within the strategy.
  5. Optionally, use the scale argument in outstandR() to transform results to a desired effect measure scale.
- The methods can be adapted for different outcome types (binary, continuous, count, time-to-event), though we only covered binary and continuous here.
\ No newline at end of file
diff --git a/scripts/exercises.pdf b/scripts/exercises.pdf
new file mode 100644
index 0000000..bc82a12
Binary files /dev/null and b/scripts/exercises.pdf differ
diff --git a/scripts/exercises.qmd b/scripts/exercises.qmd
new file mode 100644
index 0000000..3269686
--- /dev/null
+++ b/scripts/exercises.qmd
@@ -0,0 +1,597 @@
+---
+title: "Practical: Population-Adjusted Indirect Comparisons with `outstandR`"
+author: "University College London (UCL)"
+date: "today"
+format:
+ html:
+ toc: true
+code-fold: true
+code-summary: "Show/hide code"
+theme: cosmo
+self-contained: true
+editor_options:
+ chunk_output_type: console
+execute:
+ echo: true
+warning: false
+message: false
+editor:
+ markdown:
+ wrap: sentence
+---
+
+## Introduction
+
+This practical session investigates the use of population-adjusted indirect treatment comparisons (ITCs).
+
+When we want to compare two treatments, say A and B, we ideally use a head-to-head randomized controlled trial (RCT).
+However, such trials are not always available.
+Instead, we might have:
+
+1. An RCT comparing A to a common comparator C (the AC trial), for which we have **Individual Patient Data (IPD)**.
+2. An RCT comparing B to the same common comparator C (the BC trial), for which we only have **Aggregate Level Data (ALD)**, like summary statistics from a publication.
+
+If the patient populations in the AC and BC trials differ in characteristics that modify the treatment effect (effect modifiers), a simple indirect comparison (A vs C minus B vs C) can be misleading.
+**Population adjustment methods** aim to correct for these differences, providing a more valid comparison of A vs B in a chosen target population (often the population of the BC trial).
+
+We will use our `{outstandR}` R package which provides a suite of tools to perform these adjustments.
+In this practical, we will:
+
+1. Simulate IPD and ALD for both binary and continuous outcomes.
+2. Use `{outstandR}` to apply methods like Matching-Adjusted Indirect Comparison (MAIC) and G-computation.
+3. Explore how to change the outcome scale for reporting.
+4. Interpret the basic output from `{outstandR}`.
+
+**Learning Objectives:** By the end of this practical, you will be able to:
+
+- Understand the scenario requiring population adjustment.
+- Prepare IPD and ALD in the format required by `{outstandR}`.
+- Apply MAIC and G-computation methods for binary and continuous outcomes.
+- Interpret and report results on different scales.
+
+## Part 0: Setup and Package Loading
+
+First, we need to load the necessary R packages.
+If you haven't installed them, you'll need to do so.
+We have created the `simcovariates` package to use here for data generation which you'll need to install from GitHub.
+The `outstandR` package will also need to be installed.
+
+```{r setup}
+# Ensure packages are installed:
+#
+# install.packages(c("dplyr", "tidyr", "boot", "copula", "rstanarm", "remotes"))
+#
+# remotes::install_github("n8thangreen/simcovariates") # For gen_data
+# remotes::install_github("StatisticsHealthEconomics/outstandR")
+
+library(outstandR)
+library(simcovariates) # For gen_data
+library(dplyr)
+library(tidyr)
+# library(rstanarm) # Loaded by outstandR if/when needed for Bayesian G-comp
+# library(boot) # Loaded by outstandR if/when needed for MAIC
+
+# For reproducibility of simulated data
+set.seed(123)
+```
+
+## Part 1: Data Simulation & Preparation - Binary Outcomes
+
+We'll start with a scenario involving a binary outcome (e.g., treatment response: yes/no).
+
+### 1.1 Simulation Parameters
+
+We use parameters similar to the `{outstandR}` vignette to define our simulation.
+These control sample size, treatment effects, covariate effects, and population characteristics.
+
+```{r binary-sim-params}
+N <- 200 # Sample size per trial
+
+# Active treatment vs. placebo allocation ratio (2:1 implies ~2/3 on active)
+allocation <- 2/3
+
+# Conditional log-OR for active treatment vs. common comparator C
+b_trt <- log(0.17)
+
+# Conditional log-OR for each unit increase in prognostic variables (X3, X4)
+b_X <- -log(0.5)
+
+# Conditional log-OR for interaction term (treatment * effect modifier) for X1, X2
+b_EM <- -log(0.67)
+
+# Mean of prognostic factors (X3, X4) in AC trial
+meanX_AC <- c(0.45, 0.45)
+
+# Mean of prognostic factors (X3, X4) in BC trial (DIFFERENT from AC)
+meanX_BC <- c(0.6, 0.6)
+meanX_EM_AC <- c(0.45, 0.45) # Mean of effect modifiers (X1, X2) in AC trial
+
+# Mean of effect modifiers (X1, X2) in BC trial (DIFFERENT from AC)
+meanX_EM_BC <- c(0.6, 0.6)
+sdX <- c(0.4, 0.4) # Standard deviation of prognostic factors
+sdX_EM <- c(0.4, 0.4) # Standard deviation of effect modifiers
+corX <- 0.2 # Covariate correlation coefficient
+b_0 <- -0.6 # Baseline intercept coefficient on logit scale
+```
+
+::: callout-note
+**Effect Modifiers vs. Prognostic Variables:**\
+- **Prognostic variables** (`X3`, `X4` here) predict the outcome regardless of treatment.
+- **Effect modifiers** (`X1`, `X2` here) change the magnitude or direction of the treatment effect.
+Differences in the distribution of effect modifiers between trials are a key reason for population adjustment.
+:::
+
+### 1.2 Generate IPD for AC Trial (Binary Outcome)
+
+We simulate Individual Patient Data (IPD) for a trial comparing treatments A and C.
+
+```{r generate-ipd-binary}
+ipd_trial_bin <- gen_data(N,
+ b_trt,
+ b_X,
+ b_EM,
+ b_0 = b_0,
+ meanX_AC,
+ sdX,
+ meanX_EM_AC,
+ sdX_EM,
+ corX,
+ allocation,
+ family = binomial("logit"))
+
+# Treatment 'trt' is 0 or 1
+# We map 0 to 'C' (comparator) and 1 to 'A' (new treatment)
+ipd_trial_bin$trt <- factor(ipd_trial_bin$trt, labels = c("C", "A"))
+```
+
+Lets look at the generated data.
+
+```{r}
+head(ipd_trial_bin)
+
+summary(ipd_trial_bin)
+```
+
+The `ipd_trial_bin` dataframe contains patient-level data: covariates (`X1`-`X4`), treatment assignment (`trt`), and outcome (`y`).
+
+### 1.3 Generate ALD for BC Trial (Binary Outcome)
+
+For the BC trial (comparing B vs C), we only have Aggregate Level Data (ALD).
+We first simulate IPD for BC and then summarize it.
+The key here is that `meanX_BC` and `meanX_EM_BC` are different from the AC trial, creating a population imbalance.
+
+```{r generate-ald-binary}
+# Simulate IPD for BC trial (using BC trial's covariate means)
+BC_IPD_bin <- gen_data(N,
+ b_trt,
+ b_X,
+ b_EM,
+ b_0,
+ meanX_BC, # Using BC means
+ sdX,
+ meanX_EM_BC, # Using BC means
+ sdX_EM,
+ corX,
+ allocation,
+ family = binomial("logit"))
+
+BC_IPD_bin$trt <- factor(BC_IPD_bin$trt, labels = c("C", "B")) # 0=C, 1=B
+
+# Now, aggregate BC_IPD_bin to create ald_trial_bin
+# This mimics having only published summary statistics.
+
+# Covariate summaries (mean, sd for X1-X4,
+# assumed same across arms in BC trial for simplicity)
+cov_summary_bin <- BC_IPD_bin %>%
+ select(X1, X2, X3, X4) %>% # Select covariate columns
+ summarise(across(everything(), list(mean = mean, sd = sd))) %>%
+ pivot_longer(everything(), names_to = "stat_var", values_to = "value") %>%
+ # 'stat_var' will be like "X1_mean", "X1_sd". We need to separate these.
+ separate(stat_var, into = c("variable", "statistic"), sep = "_") %>%
+ # Covariate summaries are often reported for the overall trial population
+ mutate(trt = NA_character_)
+
+# Outcome summaries (number of events 'sum',
+# mean proportion 'mean', sample size 'N' for y by trt)
+outcome_summary_bin <- BC_IPD_bin %>%
+ group_by(trt) %>%
+ summarise(
+ sum_y = sum(y), # Number of events
+ mean_y = mean(y), # Proportion of events
+ N = n() # Sample size in this arm
+ ) %>%
+ ungroup() %>%
+ pivot_longer(cols = -trt, names_to = "stat_var", values_to = "value") %>%
+ # 'stat_var' will be "sum_y", "mean_y", "N". We need to parse this.
+ mutate(
+ variable = case_when(
+ grepl("_y$", stat_var) ~ "y", # If it ends with _y, variable is y
+ stat_var == "N" ~ NA_character_, # For N, variable can be NA
+ TRUE ~ stat_var # Default
+ ),
+ statistic = case_when(
+ grepl("sum_", stat_var) ~ "sum",
+ grepl("mean_", stat_var) ~ "mean",
+ stat_var == "N" ~ "N",
+ TRUE ~ stat_var # Default
+ )
+ ) %>%
+ select(variable, statistic, value, trt)
+
+
+# Combine covariate and outcome summaries for the final ALD structure
+ald_trial_bin <- bind_rows(cov_summary_bin, outcome_summary_bin) %>%
+ select(variable, statistic, value, trt)
+```
+
+Viewing the data,
+
+```{r}
+print(as.data.frame(ald_trial_bin))
+```
+
+The `ald_trial_bin` is in a 'long' format with columns: `variable` (e.g., "X1", "y"), `statistic` (e.g., "mean", "sd", "sum", "N"), `value`, and `trt` (treatment arm, or NA if overall).
+This is the format `{outstandR}` expects.
+
+## Part 2: Model Fitting - Binary Outcomes
+
+Now we use `{outstandR}` to perform population adjustments.
+We'll compare treatment A (from AC trial IPD) with treatment B (from BC trial ALD), using C as the common anchor.
+The target population for comparison will be the BC trial population.
+
+### 2.1 Define the Model Formula
+
+The model formula specifies the relationship between the outcome (`y`), prognostic variables (`X3`, `X4`), treatment (`trt`), and effect modifiers (`X1`, `X2`).
+For a binary outcome with a logit link, the model is:
+
+$$
+\text{logit}(p_{t}) = \beta_0 + \beta_X (X_3 + X_4) + [\beta_{t} + \beta_{EM} (X_1 + X_2)] \; \text{I}(t \neq C)
+$$
+
+This translates to the R formula: `y ~ X3 + X4 + trt + trt:X1 + trt:X2` (The intercept $\beta_0$ is implicit).
+
+```{r define-formula-binary}
+lin_form_bin <- as.formula("y ~ X3 + X4 + trt + trt:X1 + trt:X2")
+```
+
+### 2.2 Matching-Adjusted Indirect Comparison (MAIC)
+
+MAIC reweights the IPD from the AC trial so that the mean covariate values of the effect modifiers match those of the BC trial population.
+
+```{r run-maic-binary}
+# MAIC involves bootstrapping, which can take a moment.
+# The number of bootstrap replicates can sometimes be
+# controlled in strategy_maic() for speed,
+# e.g. n_boot = 100 for a quick check, but higher
+# (e.g., 1000) is better for stable results.
+# We'll use the default for now.
+
+out_maic_bin <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_maic(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ # If your package allows, you might add:
+ # , n_boot = 200 # for faster demo
+ )
+)
+```
+
+The MAIC results (default: Log-Odds Ratio scale):
+
+```{r}
+print(out_maic_bin)
+```
+
+The output provides `contrasts` (e.g., A vs B) and `absolute_effects` in the target (BC) population.
+By default, for `binomial(link="logit")`, the effect measure is the log-odds ratio.
+
+### 2.3 Changing the Outcome Scale (MAIC Example)
+
+Often, we want results on a different scale, like log-relative risk or risk difference.
+The `scale` argument in `outstandR()` allows this.
+
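+All of these scales are functions of the same two arm-level event probabilities. For two hypothetical probabilities:
+
+```{r scales-sketch, eval=FALSE}
+p_A <- 0.25; p_B <- 0.40  # hypothetical event probabilities
+c(log_odds_ratio    = log(p_A / (1 - p_A)) - log(p_B / (1 - p_B)),
+  log_relative_risk = log(p_A / p_B),
+  risk_difference   = p_A - p_B)
+```
+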
+```{r maic-binary-lrr}
+out_maic_bin_lrr <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_maic(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ ),
+ scale = "log_relative_risk" # Key change!
+)
+```
+
+The MAIC results on the log-relative risk scale,
+
+```{r}
+print(out_maic_bin_lrr)
+```
+
+::: callout-tip
+**Your Turn!** Try getting MAIC results on the **risk difference** scale.\
+Hint: `scale = "risk_difference"`.
+:::
+
+```{r maic-binary-rd, eval=TRUE, echo=TRUE}
+out_maic_bin_rd <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_maic(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ ),
+ scale = "risk_difference" # Key change!
+)
+```
+
+The MAIC results on the risk difference scale,
+
+```{r, eval=TRUE, echo=TRUE}
+print(out_maic_bin_rd)
+```
+
+### 2.4 Parametric G-computation with Maximum Likelihood (G-comp ML)
+
+G-computation fits an outcome regression model to the IPD (AC trial) and then uses this model to predict outcomes for each patient *as if* they had received treatment A and *as if* they had received treatment C, but standardized to the covariate distribution of the target (BC) population.
+
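+The same logic can be sketched by hand (illustrative only, not `{outstandR}`'s internals; here the target covariates are drawn as independent normals with the BC-trial means, whereas the vignette setup uses a Gaussian copula):
+
+```{r gcomp-sketch, eval=FALSE}
+fit <- glm(lin_form_bin, family = binomial("logit"), data = ipd_trial_bin)
+
+# pseudo-population with BC covariate means (0.6) and sd 0.4
+X_star <- as.data.frame(replicate(4, rnorm(1e4, mean = 0.6, sd = 0.4)))
+names(X_star) <- c("X1", "X2", "X3", "X4")
+
+# average predicted risk under each treatment
+p_A <- mean(predict(fit, transform(X_star, trt = "A"), type = "response"))
+p_C <- mean(predict(fit, transform(X_star, trt = "C"), type = "response"))
+
+# marginal log-odds ratio for A vs C in the BC population
+log(p_A / (1 - p_A)) - log(p_C / (1 - p_C))
+```
+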
+```{r run-gcomp-ml-binary}
+out_gcomp_ml_bin <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_gcomp_ml(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ )
+)
+```
+
+```{r}
+print(out_gcomp_ml_bin)
+```
+
+## Part 3: Adapting for Continuous Outcomes
+
+What if our outcome is continuous, like change in blood pressure or a quality-of-life score?
+The principles are similar, but we need to adjust the data generation and model specification.
+
+### 3.1 Simulate Continuous Data
+
+We'll use `family = gaussian("identity")` for the `gen_data` function.
+We might also adjust some coefficients to be more sensible for a continuous scale.
+
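+Under the identity link, the analogous data-generating model mirrors the logit model from Part 2, with an additive Normal error term (a sketch of the assumed structure, not taken verbatim from `gen_data()`):
+
+$$
+y = \beta_0 + \beta_X (X_3 + X_4) + [\beta_{t} + \beta_{EM} (X_1 + X_2)] \; \text{I}(t \neq C) + \varepsilon, \qquad \varepsilon \sim \text{Normal}(0, \sigma^2)
+$$
+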
+```{r continuous-sim-params}
+# Adjust some parameters for a continuous outcome
+b_0_cont <- 5 # Intercept on the continuous scale
+b_trt_cont <- -1.5 # Mean difference for treatment A vs C
+b_X_cont <- 0.5 # Effect of prognostic vars on continuous outcome
+
+# Effect of effect modifiers on treatment effect (continuous)
+b_EM_cont <- 0.3
+```
+
+#### 3.1.1 IPD for AC Trial (Continuous)
+
+```{r generate-ipd-continuous}
+ipd_trial_cont <- gen_data(N,
+ b_trt_cont,
+ b_X_cont,
+ b_EM_cont,
+ b_0_cont,
+ meanX_AC,
+ sdX,
+ meanX_EM_AC,
+ sdX_EM,
+ corX,
+ allocation,
+ family = gaussian("identity")) # Key change!
+
+ipd_trial_cont$trt <- factor(ipd_trial_cont$trt, labels = c("C", "A"))
+
+head(ipd_trial_cont)
+summary(ipd_trial_cont$y)
+```
+
+#### 3.1.2 ALD for BC Trial (Continuous)
+
+```{r generate-ald-continuous}
+BC_IPD_cont <- gen_data(N,
+ b_trt_cont,
+ b_X_cont,
+ b_EM_cont,
+ b_0_cont,
+ meanX_BC, # Using BC means
+ sdX,
+ meanX_EM_BC, # Using BC means
+ sdX_EM,
+ corX,
+ allocation,
+ family = gaussian("identity")) # Key change!
+
+BC_IPD_cont$trt <- factor(BC_IPD_cont$trt, labels = c("C", "B"))
+
+# Aggregate BC_IPD_cont for ALD
+# Covariate summaries structure remains the same
+cov_summary_cont <- BC_IPD_cont %>%
+ select(X1, X2, X3, X4) %>%
+ summarise(across(everything(), list(mean = mean, sd = sd))) %>%
+ pivot_longer(everything(), names_to = "stat_var", values_to = "value") %>%
+ separate(stat_var, into = c("variable", "statistic"), sep = "_") %>%
+ mutate(trt = NA_character_)
+
+# Outcome summaries for continuous data: mean, sd, N for y by trt
+outcome_summary_cont <- BC_IPD_cont %>%
+ group_by(trt) %>%
+ summarise(
+ mean_y = mean(y), # Mean outcome
+ sd_y = sd(y), # Standard deviation of outcome
+ N = n() # Sample size
+ ) %>%
+ ungroup() %>%
+ pivot_longer(cols = -trt, names_to = "stat_var", values_to = "value") %>%
+ mutate(
+ variable = case_when(
+ grepl("_y$", stat_var) ~ "y",
+ stat_var == "N" ~ NA_character_,
+ TRUE ~ stat_var
+ ),
+ statistic = case_when(
+ grepl("mean_", stat_var) ~ "mean",
+ grepl("sd_", stat_var) ~ "sd", # Changed from sum to sd
+ stat_var == "N" ~ "N",
+ TRUE ~ stat_var
+ )
+ ) %>%
+ select(variable, statistic, value, trt)
+
+ald_trial_cont <- bind_rows(cov_summary_cont, outcome_summary_cont) %>%
+ select(variable, statistic, value, trt)
+
+print(as.data.frame(ald_trial_cont))
+```
+
+### 3.2 Model Fitting for Continuous Outcomes
+
+The model formula structure can remain the same if we assume linear relationships.
+The key change is in the `family` argument of the strategy function.
+
+```{r define-formula-continuous}
+lin_form_cont <- as.formula("y ~ X3 + X4 + trt + trt:X1 + trt:X2")
+```
+
+Let's use G-computation ML as an example.
+
+```{r run-gcomp-ml-continuous}
+out_gcomp_ml_cont <- outstandR(
+ ipd_trial = ipd_trial_cont,
+ ald_trial = ald_trial_cont,
+ strategy = strategy_gcomp_ml(
+ formula = lin_form_cont,
+ family = gaussian(link = "identity") # Key change!
+ )
+ # For the Gaussian family, the default scale is typically
+ # "mean_difference", which is often what we want.
+ # We could state this explicitly: scale = "mean_difference"
+)
+```
+
+```{r}
+print(out_gcomp_ml_cont)
+```
+
+::: callout-tip
+**Your Turn!** Try applying **MAIC** to the continuous outcome data.
+
+1. Use `family = gaussian(link = "identity")` within `strategy_maic()`.
+2. What `scale` would be appropriate if not the default
+   (e.g., `"mean_difference"`)?
+
+```{r maic-continuous-solution, eval=FALSE, echo=TRUE}
+# Solution for MAIC with continuous data:
+out_maic_cont <- outstandR(
+ ipd_trial = ipd_trial_cont,
+ ald_trial = ald_trial_cont,
+ strategy = strategy_maic(
+ formula = lin_form_cont,
+ family = gaussian(link = "identity")
+ ),
+ scale = "mean_difference"
+)
+print(out_maic_cont)
+```
+:::
+
+### 3.3 Other Methods (Binary Examples)
+
+`{outstandR}` supports other methods.
+Here's how you might call them.
+These are set to `eval=FALSE` to save time in this practical.
+
+- **Simulated Treatment Comparison (STC):** A conventional outcome regression approach.
+
+```{r stc-binary-code, eval=FALSE}
+out_stc_bin <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_stc(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ )
+)
+print(out_stc_bin)
+```
+
+- **Bayesian G-computation (G-comp Bayes):** Similar to G-comp ML but uses Bayesian methods (e.g., MCMC via `rstanarm`), which can better propagate uncertainty but is computationally more intensive.
+
+```{r gcomp-bayes-binary-code, eval=FALSE}
+# This would require rstanarm and can be slow.
+out_gcomp_stan_bin <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_gcomp_stan(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ # For a faster demo if options are passed through:
+ # stan_args = list(iter = 500, chains = 2, refresh = 0)
+ )
+)
+print(out_gcomp_stan_bin)
+```
+
+- **Multiple Imputation Marginalisation (MIM):** Treats marginalization as a missing-data problem, broadly: outcomes are synthesized in the target population and the results pooled using multiple-imputation combining rules.
+
+```{r mim-binary-code, eval=FALSE}
+out_mim_bin <- outstandR(
+ ipd_trial = ipd_trial_bin,
+ ald_trial = ald_trial_bin,
+ strategy = strategy_mim(
+ formula = lin_form_bin,
+ family = binomial(link = "logit")
+ )
+)
+print(out_mim_bin)
+```
+
+## Part 4: Understanding Output & Wrap-up
+
+Let's briefly revisit one of the binary outcome results to understand the structure of the `{outstandR}` output.
+
+```{r revisit-binary-output}
+str(out_maic_bin)
+```
+
+The output object (here `out_maic_bin`) is a list containing:
+
+- `$contrasts`: This list provides the estimated treatment effects (e.g., mean difference, log-OR), their variances, and confidence intervals for each pairwise comparison, adjusted to the target population (BC trial).
+- `$contrasts$means$AB`: The estimated effect of A versus B. This is often the primary interest.
+- `$contrasts$means$AC`: The estimated effect of A versus C.
+- `$contrasts$means$BC`: The estimated effect of B versus C (usually derived directly from the ALD).
+- `$absolute_effects`: This list provides the estimated mean outcome for each treatment (A, B, C) in the target population. This can be useful for understanding the baseline and treated outcomes.
+
+For example, to extract the estimated log-odds ratio for A vs. B and its variance:
+
+```{r extract-results}
+log_or_AB <- out_maic_bin$contrasts$means$AB
+variance_log_or_AB <- out_maic_bin$contrasts$variances$AB
+
+cat(paste("Estimated Log-OR for A vs. B:", round(log_or_AB, 3), "\n"))
+cat(paste("Variance of Log-OR for A vs. B:", round(variance_log_or_AB, 3), "\n"))
+```
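+A Wald-type 95% confidence interval follows directly from these two quantities (this should be close to the interval in the printed output):
+
+```{r ci-from-extracted, eval=FALSE}
+# Normal-approximation 95% CI on the log-OR scale
+ci_AB <- log_or_AB + c(-1, 1) * qnorm(0.975) * sqrt(variance_log_or_AB)
+round(ci_AB, 3)
+```
+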
+
+The vignette for `{outstandR}` (which this practical is based on) shows how to combine results from multiple methods into tables and forest plots for a comprehensive comparison.
+This is highly recommended for actual analyses.
+
+### Key Takeaways
+
+- Population adjustment is crucial when comparing treatments indirectly using IPD and ALD from trials with different patient characteristics (especially different distributions of effect modifiers).
+- The `{outstandR}` package provides a unified interface (`outstandR()` function) to apply various adjustment methods.
+- You need to:
+ 1. Prepare your IPD (for the "anchor" trial, e.g., AC) and ALD (for the "comparator" trial, e.g., BC, which also serves as the target population).
+ 2. Define an appropriate model `formula`.
+ 3. Choose a `strategy_*()` function corresponding to the desired adjustment method (MAIC, STC, G-comp, etc.).
+ 4. Specify the outcome `family` (e.g., `binomial()`, `gaussian()`) within the strategy.
+ 5. Optionally, use the `scale` argument in `outstandR()` to transform results to a desired effect measure scale.
+- The methods can be adapted for different outcome types (binary, continuous, count, time-to-event, though we only covered binary and continuous here).
diff --git a/vignettes/Binary_data_example.Rmd b/vignettes/Binary_data_example.Rmd
index db0112e..91f196f 100644
--- a/vignettes/Binary_data_example.Rmd
+++ b/vignettes/Binary_data_example.Rmd
@@ -20,59 +20,85 @@ knitr::opts_chunk$set(
## Introduction
-Population adjustment methods such as *matching-adjusted indirect
-comparison* (MAIC) are increasingly used to compare marginal treatment
-effects when there are cross-trial differences in effect modifiers and
-limited patient-level data. MAIC is based on propensity score weighting,
-which is sensitive to poor covariate overlap and cannot extrapolate
-beyond the observed covariate space. Current outcome regression-based
-alternatives can extrapolate but target a conditional treatment effect
-that is incompatible in the indirect comparison. When adjusting for
-covariates, one must integrate or average the conditional estimate over
-the relevant population to recover a compatible marginal treatment
-effect.
-
-We propose a marginalization method based on *parametric G-computation*
-that can be easily applied where the outcome regression is a generalized
-linear model or a Cox model. The approach views the covariate adjustment
-regression as a nuisance model and separates its estimation from the
-evaluation of the marginal treatment effect of interest. The method can
-accommodate a Bayesian statistical framework, which naturally integrates
-the analysis into a probabilistic framework. A simulation study provides
-proof-of-principle and benchmarks the method's performance against MAIC
-and the conventional outcome regression. Parametric G-computation
-achieves more precise and more accurate estimates than MAIC,
-particularly when covariate overlap is poor, and yields unbiased
-marginal treatment effect estimates under no failures of assumptions.
-Furthermore, the marginalized regression-adjusted estimates provide
-greater precision and accuracy than the conditional estimates produced
-by the conventional outcome regression.
+Population adjustment methods are increasingly used to compare marginal
+treatment effects when there are cross-trial differences in effect
+modifiers and limited patient-level data.
+
+The `{outstandR}` package allows the implementation of a range of
+methods for this situation including the following:
+
+- *Matching-Adjusted Indirect Comparison (MAIC)* is based on
+ propensity score weighting, which is sensitive to poor covariate
+ overlap and cannot extrapolate beyond the observed covariate space.
+ It reweights the individual patient-level data (IPD) to match the
+ aggregate characteristics of the comparator trial, thereby aligning
+ the populations.
+
+- *Simulated Treatment Comparison (STC)* relies on outcome regression
+ models fitted to IPD, conditioning on covariates to estimate the
+ effect of treatment. These estimates are then applied to the
+ aggregate-level comparator population. Like MAIC, STC is limited by
+ its conditional nature and can produce biased marginal estimates if
+ not properly marginalized.
+
+- *Parametric G-computation with maximum likelihood*: This method fits
+ an outcome model to the IPD using maximum likelihood estimation,
+ then uses that model to predict outcomes in the comparator
+ population. It allows extrapolation beyond the observed covariate
+ space but requires correct specification of the outcome model to
+ avoid bias.
+
+- *Parametric G-computation with Bayesian inference*: Similar to the
+ maximum likelihood version, this approach fits an outcome model but
+ within a Bayesian framework. It allows coherent propagation of
+ uncertainty through prior distributions and posterior inference,
+ enabling probabilistic sensitivity analysis and full uncertainty
+ quantification.
+
+- *Marginalization method based on parametric G-computation*: Current
+ outcome regression-based alternatives can extrapolate but target a
+ conditional treatment effect that is incompatible in the indirect
+ comparison. When adjusting for covariates, one must integrate or
+ average the conditional estimate over the relevant population to
+ recover a compatible marginal treatment effect. This can be easily
+ applied where the outcome regression is a generalized linear model
+ or a Cox model. The approach views the covariate adjustment
+ regression as a nuisance model and separates its estimation from the
+ evaluation of the marginal treatment effect of interest. The method
+ can accommodate a Bayesian statistical framework, which naturally
+ integrates the analysis into a probabilistic framework.
+
+We will now demonstrate the use of the `{outstandR}` package to fit all
+of these types of models to simulated binary data. Other vignettes will
+provide equivalent analyses for continuous and count data.
## General problem
-Consider one trial, for which the company has IPD, comparing treatments *A* and *C*, from herein call the *AC* trial.
-Also, consider a second trial comparing treatments *B* and *C*, similarly called the *BC* trial. For this trial only published aggregate data are available.
-We wish to estimate a comparison of the effects of treatments *A* and *B* on an
+Consider one trial, for which the company has IPD, comparing treatments
+*A* and *C*, hereafter called the *AC* trial. Also, consider a second
+trial comparing treatments *B* and *C*, similarly called the *BC* trial.
+For this trial only published aggregate data are available. We wish to
+estimate a comparison of the effects of treatments *A* and *B* on an
appropriate scale in some target population *P*, denoted by the
parameter $d_{AB(P)}$. We make use of bracketed subscripts to denote a
specific population. Within the *BC* population there are parameters
-$\mu_{B(BC)}$ and $\mu_{C(BC)}$ representing the
-expected outcome on each treatment (including parameters for treatments
-not studied in the *BC* trial, e.g. treatment *A*). The *BC* trial
-provides estimators $\bar{Y}_{B(BC)}$ and $\bar{Y}_{C(BC)}$ of
-$\mu_{B(BC)}$, $\mu_{C(BC)}$, respectively, which are the summary
-outcomes. It is the same situation for the *AC* trial.
-
-For a suitable scale, for example a log-odds ratio, or risk difference, we form
-estimators $\Delta_{BC(BC)}$ and $\Delta_{AC(AC)}$ of the trial level
-(or marginal) relative treatment effects. We shall assume that this is always represented as a difference so, for example,
-for the risk ratio this is on the log scale.
+$\mu_{B(BC)}$ and $\mu_{C(BC)}$ representing the expected outcome on
+each treatment (including parameters for treatments not studied in the
+*BC* trial, e.g. treatment *A*). The *BC* trial provides estimators
+$\bar{Y}_{B(BC)}$ and $\bar{Y}_{C(BC)}$ of $\mu_{B(BC)}$, $\mu_{C(BC)}$,
+respectively, which are the summary outcomes. It is the same situation
+for the *AC* trial.
+
+For a suitable scale, for example a log-odds ratio, or risk difference,
+we form estimators $\Delta_{BC(BC)}$ and $\Delta_{AC(AC)}$ of the trial
+level (or marginal) relative treatment effects. We shall assume that
+this is always represented as a difference so, for example, for the risk
+ratio this is on the log scale.
$$
\Delta_{AB(BC)} = g(\bar{Y}_{B{(BC)}}) - g(\bar{Y}_{A{(BC)}})
$$
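+
+In an anchored comparison, the *A* vs *B* estimate is formed from the two within-trial contrasts via the standard Bucher construction (stated here for reference):
+
+$$
+\hat{\Delta}_{AB(BC)} = \hat{\Delta}_{AC(BC)} - \hat{\Delta}_{BC(BC)}, \qquad
+\widehat{\text{Var}}(\hat{\Delta}_{AB(BC)}) = \widehat{\text{Var}}(\hat{\Delta}_{AC(BC)}) + \widehat{\text{Var}}(\hat{\Delta}_{BC(BC)})
+$$
+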
-
## Example analysis
First, let us load necessary packages.
@@ -81,17 +107,18 @@ First, let us load necessary packages.
library(boot) # non-parametric bootstrap in MAIC and ML G-computation
library(copula) # simulating BC covariates from Gaussian copula
library(rstanarm) # fit outcome regression, draw outcomes in Bayesian G-computation
+library(tidyr)
library(outstandR)
library(simcovariates)
```
### Data
-We consider binary outcomes using the log-odds ratio as the measure of effect.
-For example, the binary outcome may be response to treatment or the occurrence of an
-adverse event. For trials *AC* and *BC*, outcome $y_i$ for subject $i$
-is simulated from a Bernoulli distribution with probabilities of success
-generated from logistic regression.
+We consider binary outcomes using the log-odds ratio as the measure of
+effect. For example, the binary outcome may be response to treatment or
+the occurrence of an adverse event. For trials *AC* and *BC*, outcome
+$y_i$ for subject $i$ is simulated from a Bernoulli distribution with
+probabilities of success generated from logistic regression.
For the *BC* trial, the individual-level covariates and outcomes are
aggregated to obtain summaries. The continuous covariates are summarized
@@ -101,30 +128,31 @@ the RCT publication. The binary outcomes are summarized in an overall
event table. Typically, the published study only provides aggregate
information to the analyst.
-The IPD simulation input parameters are given below.
-
-| Parameter | Description | Value |
-|------------------|--------------------------------------------------------------------|------------------|
-| `N` | Sample size | 200 |
-| `allocation` | Active treatment vs. placebo allocation ratio (2:1) | 2/3 |
-| `b_trt` | Conditional effect of active treatment vs. comparator (log(0.17)) | -1.77196 |
-| `b_X` | Conditional effect of each prognostic variable (-log(0.5)) | 0.69315 |
-| `b_EM` | Conditional interaction effect of each effect modifier (-log(0.67))| 0.40048 |
-| `meanX_AC[1]` | Mean of prognostic factor X3 in AC trial | 0.45 |
-| `meanX_AC[2]` | Mean of prognostic factor X4 in AC trial | 0.45 |
-| `meanX_EM_AC[1]` | Mean of effect modifier X1 in AC trial | 0.45 |
-| `meanX_EM_AC[2]` | Mean of effect modifier X2 in AC trial | 0.45 |
-| `meanX_BC[1]` | Mean of prognostic factor X3 in BC trial | 0.6 |
-| `meanX_BC[2]` | Mean of prognostic factor X4 in BC trial | 0.6 |
-| `meanX_EM_BC[1]` | Mean of effect modifier X1 in BC trial | 0.6 |
-| `meanX_EM_BC[2]` | Mean of effect modifier X2 in BC trial | 0.6 |
-| `sdX` | Standard deviation of prognostic factors (AC and BC) | 0.4 |
-| `sdX_EM` | Standard deviation of effect modifiers | 0.4 |
-| `corX` | Covariate correlation coefficient | 0.2 |
-| `b_0` | Baseline intercept | -0.6 |
-
-
-We shall use the `gen_data()` function available with the [simcovariates](https://github.com/n8thangreen/simcovariates) package.
+The simulation input parameters are given below.
+
+| Parameter | Description | Value |
+|------------------|------------------------------------|------------------|
+| `N` | Sample size | 200 |
+| `allocation` | Active treatment vs. placebo allocation ratio (2:1) | 2/3 |
+| `b_trt` | Conditional effect of active treatment vs. comparator (log(0.17)) | -1.77196 |
+| `b_X` | Conditional effect of each prognostic variable (-log(0.5)) | 0.69315 |
+| `b_EM` | Conditional interaction effect of each effect modifier (-log(0.67)) | 0.40048 |
+| `meanX_AC[1]` | Mean of prognostic factor X3 in AC trial | 0.45 |
+| `meanX_AC[2]` | Mean of prognostic factor X4 in AC trial | 0.45 |
+| `meanX_EM_AC[1]` | Mean of effect modifier X1 in AC trial | 0.45 |
+| `meanX_EM_AC[2]` | Mean of effect modifier X2 in AC trial | 0.45 |
+| `meanX_BC[1]` | Mean of prognostic factor X3 in BC trial | 0.6 |
+| `meanX_BC[2]` | Mean of prognostic factor X4 in BC trial | 0.6 |
+| `meanX_EM_BC[1]` | Mean of effect modifier X1 in BC trial | 0.6 |
+| `meanX_EM_BC[2]` | Mean of effect modifier X2 in BC trial | 0.6 |
+| `sdX` | Standard deviation of prognostic factors (AC and BC) | 0.4 |
+| `sdX_EM` | Standard deviation of effect modifiers | 0.4 |
+| `corX` | Covariate correlation coefficient | 0.2 |
+| `b_0` | Baseline intercept | -0.6 |
+
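+The log-scale coefficient values quoted in the table can be reproduced
+directly:
+
+```{r, eval=FALSE}
+log(0.17)   # b_trt = -1.77196
+-log(0.5)   # b_X   =  0.69315
+-log(0.67)  # b_EM  =  0.40048
+```
+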
+We shall use the `gen_data()` function available with the
+[simcovariates](https://github.com/n8thangreen/simcovariates) package on
+GitHub.
```{r, warning=FALSE, message=FALSE}
library(dplyr)
@@ -151,15 +179,19 @@ ipd_trial <- gen_data(N, b_trt, b_X, b_EM, b_0,
family = binomial("logit"))
```
-The treatment column in the return data is binary and takes values 0 and 1. We will include some extra information about treatment names.
-To do this we will define the lable of the two level factor as `A` for 1 and `C` for 0 as follows.
+The treatment column in the returned data is binary and takes values 0
+and 1. We will add treatment names by defining the labels of the
+two-level factor as `A` for 1 and `C` for 0, as follows.
```{r}
ipd_trial$trt <- factor(ipd_trial$trt, labels = c("C", "A"))
```
-Similarly, to obtain the aggregate data we will simulate IPD but with the additional summarise step.
-We set different mean values `meanX_BC` and `meanX_EM_BC` but otherwise use the same parameter values as for the $AC$ case.
+Similarly, to obtain the aggregate data we will simulate IPD but with
+the additional summarise step. We set different mean values `meanX_BC`
+and `meanX_EM_BC` but otherwise use the same parameter values as for the
+$AC$ trial.
```{r generate-ald-data}
BC.IPD <- gen_data(N, b_trt, b_X, b_EM, b_0,
@@ -168,44 +200,70 @@ BC.IPD <- gen_data(N, b_trt, b_X, b_EM, b_0,
corX, allocation,
family = binomial("logit"))
-cov.X <- BC.IPD %>%
- summarise(across(starts_with("X"),
- list(mean = mean, sd = sd),
- .names = "{fn}.{col}"))
-
-out.B <- dplyr::filter(BC.IPD, trt == 1) %>%
- summarise(y.B.sum = sum(y),
- y.B.bar = mean(y),
- y.B.sd = sd(y),
- N.B = n())
+BC.IPD$trt <- factor(BC.IPD$trt, labels = c("C", "B"))
-out.C <- dplyr::filter(BC.IPD, trt == 0) %>%
- summarise(y.C.sum = sum(y),
- y.C.bar = mean(y),
- y.C.sd = sd(y),
- N.C = n())
-
-ald_trial <- cbind.data.frame(cov.X, out.C, out.B)
+# covariate summary statistics
+# assume same between treatments
+cov.X <-
+ BC.IPD %>%
+ as.data.frame() |>
+ dplyr::select(X1, X2, X3, X4, trt) %>%
+ pivot_longer(cols = starts_with("X"), names_to = "variable", values_to = "value") %>%
+ group_by(variable) %>%
+ summarise(
+ mean = mean(value),
+ sd = sd(value)
+ ) %>%
+ pivot_longer(cols = c("mean", "sd"), names_to = "statistic", values_to = "value") %>%
+ ungroup() |>
+ mutate(trt = NA)
+
+# outcome
+summary.y <-
+ BC.IPD |>
+ as.data.frame() |>
+ dplyr::select(y, trt) %>%
+ pivot_longer(cols = "y", names_to = "variable", values_to = "value") %>%
+ group_by(variable, trt) %>%
+ summarise(
+ mean = mean(value),
+ sd = sd(value),
+ sum = sum(value)
+ ) %>%
+ pivot_longer(cols = c("mean", "sd", "sum"),
+ names_to = "statistic", values_to = "value") %>%
+ ungroup()
+
+# sample sizes
+summary.N <-
+ BC.IPD |>
+ group_by(trt) |>
+ count(name = "N") |>
+ pivot_longer(cols = "N", names_to = "statistic", values_to = "value") |>
+ mutate(variable = NA_character_) |>
+ dplyr::select(variable, statistic, value, trt)
+
+ald_trial <- rbind.data.frame(cov.X, summary.y, summary.N)
```
-This general format of data sets consist of the following.
+The general format of these data sets is a 'long' style consisting of
+the following.
#### `ipd_trial`: Individual patient data
-- `X*`: patient measurements
-- `trt`: treatment ID (integer)
-- `y`: (logical) indicator of whether event was observed
+- `X*`: Patient measurements
+- `trt`: Treatment label (factor)
+- `y`: Indicator of whether event was observed (two level factor)
#### `ald_trial`: Aggregate-level data
-- `mean.X*`: mean patient measurement
-- `sd.X*`: standard deviation of patient measurement
-- `y.*.sum`: total number of events
-- `y.*.bar`: proportion of events
-- `N.*`: total number of individuals
-
-Note that the wildcard `*` here is usually an integer from 1 or the
-trial identifier *B*, *C*.
+- `variable`: Covariate name. In the case of treatment arm sample size
+ this is `NA`
+- `statistic`: Summary statistic name from mean, standard deviation or
+ sum
+- `value`: Numerical value of summary statistic
+- `trt`: Treatment label. Because we assume a common covariate
+ distribution between treatment arms this is `NA`
Our data look like the following.
@@ -213,15 +271,18 @@ Our data look like the following.
head(ipd_trial)
```
-There are 4 correlated continuous covariates generated per subject, simulated from a multivariate normal distribution.
-Treatment `trt` 1 corresponds to new treatment *A*, and 0 is standard of care or status quo *C*. The ITC is 'anchored' via *C*, the common treatment.
+There are 4 correlated continuous covariates generated per subject,
+simulated from a multivariate normal distribution. Treatment `trt` takes
+either new treatment *A* or standard of care / status quo *C*. The ITC
+is 'anchored' via *C*, the common treatment.
```{r}
ald_trial
```
In this case, we have 4 covariate mean and standard deviation values;
-and the event total, average and sample size for each treatment *B*, and *C*.
+and the event total, average and sample size for each treatment *B* and
+*C*.
#### Regression model
@@ -231,14 +292,21 @@ $$
\text{logit}(p_{t}) = \beta_0 + \beta_X (X_3 + X_4) + [\beta_{t} + \beta_{EM} (X_1 + X_2)] \; \text{I}(t \neq C)
$$
-That is, for treatment $C$ the right hand side becomes $\beta_0 + \beta_X (X_3 + X_4)$ and for comparator treatments $A$ or $B$ there is an additional $\beta_t + \beta_{EM} (X_1 + X_2)$ component consisting of the effect modifier terms and the coefficient for the treatment parameter is the log odds-ratio (LOR), $\beta_t$ (or `b_trt` in the R code).
-$p_{t}$ is the probability of experiencing the event of interest for treatment $t$.
+$\text{I}()$ is an indicator function taking value 1 if true and 0
+otherwise. That is, for treatment $C$ the right hand side becomes
+$\beta_0 + \beta_X (X_3 + X_4)$ and for comparator treatments $A$ or $B$
+there is an additional $\beta_t + \beta_{EM} (X_1 + X_2)$ component
+consisting of the effect modifier terms and the coefficient for the
+treatment parameter, $\beta_t$ (or `b_trt` in the R code), i.e. the log
+odds-ratio (LOR) for the logit model. Finally, $p_{t}$ is the
+probability of experiencing the event of interest for treatment $t$.
### Output statistics
-We will implement for MAIC, STC, and G-computation methods to obtain the *marginal treatment effect* and *marginal variance*.
-The definition by which these are calculated depends on the type of data and outcome scale.
-For our current example of binary data and log-odds ratio the marginal treatment effect is
+We will obtain the *marginal treatment effect* and *marginal variance*.
+How these are calculated depends on the type of data and outcome
+scale. For our current example of binary data and the log-odds ratio,
+the marginal treatment effect is
$$
\log\left( \frac{n_B/(N_B-n_B)}{n_C/(N_C-n_C)} \right) = \log(n_B n_{\bar{C}}) - \log(n_C n_{\bar{B}})
@@ -249,16 +317,19 @@ and marginal variance is
$$
\frac{1}{n_C} + \frac{1}{n_{\bar{C}}} + \frac{1}{n_B} + \frac{1}{n_{\bar{B}}}
$$
-where $n_B, n_C$ are the number of events in each arm and $\bar{C}$ is the compliment of $C$, so e.g. $n_{\bar{C}} = N_C - n_c$.
-Other scales will be discussed below.
+
+where $n_B, n_C$ are the number of events in each arm and $\bar{C}$ is
+the complement of $C$, so e.g. $n_{\bar{C}} = N_C - n_C$. Other outcome
+scales will be discussed below.
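+
+Both quantities are simple functions of the 2x2 event counts. A direct
+sketch, with hypothetical counts:
+
+```{r, eval=FALSE}
+n_B <- 35; N_B <- 100   # events and sample size in arm B
+n_C <- 55; N_C <- 100   # events and sample size in arm C
+nbar_B <- N_B - n_B     # non-events in arm B
+nbar_C <- N_C - n_C     # non-events in arm C
+
+log_or  <- log(n_B * nbar_C) - log(n_C * nbar_B)   # marginal log-OR
+var_lor <- 1/n_C + 1/nbar_C + 1/n_B + 1/nbar_B     # marginal variance
+```
+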
## Model fitting in R
The `{outstandR}` package has been written to be easy to use and
-essential consists of a single function, `outstandR()`. This can be used
-to run all of the different types of model, which we will call
-*strategies*. The first two arguments of `outstandR()` are the
-individual and aggregate-level data, respectively.
+essentially consists of a single function, `outstandR()`. This can be
+used to run all of the different types of model which, when combined
+with their specific parameters, we will call *strategies*. The first two
+arguments of `outstandR()` are the individual patient and
+aggregate-level data, respectively.
A `strategy` argument of `outstandR` takes functions called
`strategy_*()`, where the wildcard `*` is replaced by the name of the
@@ -267,14 +338,17 @@ specific example is provided below.
### Model formula
-Defining $X_1, X_2$ as effect modifiers, $X_3, X_4$ as prognostic variables and $Z$ the treatment indicator then the formula used in this model is
+We will take advantage of the in-built R formula object to define the
+models. This will allow us to easily pull out components of the object
+and use it consistently. Defining $X_1, X_2$ as effect modifiers, $X_3, X_4$
+as prognostic variables and $Z$ the treatment indicator then the formula
+used in this model is
$$
y = X_3 + X_4 + Z + Z X_1 + Z X_2
-$$
-
-which corresponds to the following `R` `formula` object passed as an
-argument to the strategy function.
+$$
+
+Notice that this does not include the link function of interest, so it
+appears as a linear regression. This corresponds to the following `R`
+`formula` object passed as an argument to the strategy function.
```{r}
lin_form <- as.formula("y ~ X3 + X4 + trt + trt:X1 + trt:X2")
@@ -286,20 +360,31 @@ Note that the more succinct formula
y ~ X3 + X4 + trt*(X1 + X2)
```
-Would also include $X_1, X_2$ as prognostic factors so in not equivalent, but could be modified as follows.
+would additionally include $X_1, X_2$ as prognostic factors, so it is not
+equivalent, but it could be modified as follows.
```{r, eval=FALSE}
y ~ X3 + X4 + trt*(X1 + X2) - X1 - X2
```
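+
+Whether two formulas expand to the same set of model terms can be
+checked directly:
+
+```{r, eval=FALSE}
+# Compare the expanded term labels of the two specifications
+attr(terms(y ~ X3 + X4 + trt + trt:X1 + trt:X2), "term.labels")
+attr(terms(y ~ X3 + X4 + trt * (X1 + X2) - X1 - X2), "term.labels")
+```
+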
-### MAIC
+We note that the MAIC approach does not strictly use a regression in the
+same way as the other methods, so it is not directly comparable in this
+sense; however, we have chosen a consistent `formula` syntax across all
+models.
+
+### Matching-Adjusted Indirect Comparison (MAIC)
-Using the individual level data for *AC* firstly we perform
-non-parametric bootstrap of the `maic.boot` function with `R = 1000`
-replicates. This function fits treatment coefficient for the marginal
-effect for *A* vs *C*. The returned value is an object of class `boot`
-from the `{boot}` package. We then calculate the bootstrap mean and
-variance in the wrapper function `maic_boot_stats`.
+A single call to `outstandR()` is sufficient to run the model. We pass
+to the `strategy` argument the `strategy_maic()` function with arguments
+`formula = lin_form` as defined above and
+`family = binomial(link = "logit")` for binary data and logistic link.
+
+Internally, using the individual patient level data for *AC*, we first
+perform a non-parametric bootstrap of the `maic.boot` function with
+`R = 1000` replicates. This function fits the treatment coefficient for
+the marginal effect for *A* vs *C*. The returned value is an object of
+class `boot` from the `{boot}` package. We then calculate the bootstrap
+mean and variance in the wrapper function `maic_boot_stats`.
```{r outstandR_maic}
outstandR_maic <-
@@ -315,62 +400,73 @@ The returned object is of class `outstandR`.
str(outstandR_maic)
```
-We see that this is a list object with 3 parts, each containing
+We see that this is a list object with 2 parts. The first contains
statistics between each pair of treatments. These are the mean
contrasts, variances and confidence intervals (CI), respectively. The
-default CI is for 95% but can be altered in `outstandR` with the `CI` argument.
+default CI is for 95% but can be altered in `outstandR` with the `CI`
+argument. The second element of the list contains the absolute effect
+estimates.
-A `print` method is available for `outstandR` objects for more human-readable output
+A `print` method is available for `outstandR` objects for more
+human-readable output
```{r}
outstandR_maic
```
-
#### Outcome scale
-If we do not explicitly specify the outcome scale, the default is that used to fit to the data in the regression model. As we saw, in this case, the default is
-log-odds ratio corresponding to the `"logit"` link function for binary data.
-However, we can change this to some other scale which may be more appropriate for a particular analysis.
-So far implemented in the package, the links and their corresponding relative treatment effect scales are as follows:
+If we do not explicitly specify the outcome scale, the default is the
+scale used to fit the regression model to the data. As we saw, in this
+case, the default is the log-odds ratio, corresponding to the `"logit"`
+link function for binary data. However, we can change this to some other
+scale which may be more appropriate for a particular analysis. The links
+implemented in the package so far, and their corresponding relative
+treatment effect scales, are as follows:
-| Data Type | Model | Scale | Argument |
-|:-----------|---------|:------------------|:-------------------|
-| Binary | `logit` | Log-odds ratio | `log_odds` |
-| Count | `log` | Log-risk ratio | `log_relative_risk`|
-| Continuous | `mean` | Mean difference | `risk_difference` |
+| Data Type | Model | Scale | Argument |
+|:-----------|---------|:----------------|:--------------------|
+| Binary | `logit` | Log-odds ratio | `log_odds` |
+| Count | `log` | Log-risk ratio | `log_relative_risk` |
+| Continuous | `mean` | Mean difference | `risk_difference` |
-The full list of possible transformed treatment effect scales are:
-log-odds ratio, log-risk ratio, mean difference, risk difference, hazard ratio, hazard difference.
+The full list of possible transformed treatment effect scales will be:
+log-odds ratio, log-risk ratio, mean difference, risk difference, hazard
+ratio, hazard difference.
For binary data the marginal treatment effect and variance are
-* __Log-risk ratio__
+- **Log-risk ratio**
Treatment effect is
+
$$
\log(n_B/N_B) - \log(n_A/N_A)
$$
+
and variance
+
$$
\frac{1}{n_B} - \frac{1}{N_B} + \frac{1}{n_A} - \frac{1}{N_A}
$$
-
-* __Risk difference__
+- **Risk difference**
Treatment effect is
+
$$
\frac{n_B}{N_B} - \frac{n_A}{N_A}
$$
+
and variance
+
$$
\frac{n_B}{N_B^2} \left( 1 - \frac{n_B}{N_B} \right) + \frac{n_A}{N_A^2} \left( 1 - \frac{n_A}{N_A} \right)
$$
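As a concrete illustration of these formulae, the following sketch computes both marginal effects and their delta-method variances from hypothetical event counts `n_A`, `n_B` and sample sizes `N_A`, `N_B` (all assumed values for illustration, not taken from the trial data).

```{r}
# hypothetical summary counts
n_A <- 40; N_A <- 100
n_B <- 25; N_B <- 100

# log-risk ratio and its variance
lrr <- log(n_B / N_B) - log(n_A / N_A)
var_lrr <- 1/n_B - 1/N_B + 1/n_A - 1/N_A

# risk difference and its variance
rd <- n_B / N_B - n_A / N_A
var_rd <- (n_B / N_B) * (1 - n_B / N_B) / N_B +
  (n_A / N_A) * (1 - n_A / N_A) / N_A
```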
-
-To change the outcome scale, we can pass the `scale` argument in the `outstandR()` function.
-For example, to change the scale to risk difference, we can use the following code.
+To change the outcome scale, we can pass the `scale` argument in the
+`outstandR()` function. For example, to change the scale to the log
+relative risk, we can use the following code.
```{r outstandR_maic_lrr}
outstandR_maic_lrr <-
@@ -386,29 +482,12 @@ outstandR_maic_lrr
### Simulated treatment comparison
-Simulated treatment comparison (STC) is the conventional outcome regression method. It involves fitting a
-regression model of outcome on treatment and covariates to the IPD. IPD
-effect modifiers are centred at the mean *BC* values.
+Simulated treatment comparison (STC) is the conventional outcome
+regression method. It involves fitting a regression model of outcome on
+treatment and covariates to the IPD.
-$$
-g(\mu_n) = \beta_0 + \beta_X (\boldsymbol{x}_n - \boldsymbol{\theta}) + \boldsymbol{\beta_{EM}} (\beta_t + (\boldsymbol{x_n^{EM}} - \boldsymbol{\theta^{EM}}) ) \; \mbox{I}(t \neq C)
-$$
-
-where $\beta_0$ is the intercept, $\beta_1$ are the covariate
-coefficients, $\beta_z$ and $\beta_2$ are the effect modifier
-coefficients, $z_n$ are the indicator variables of effect alternative
-treatment. $g(\cdot)$ is the link function e.g. $\log$.
-
-As already mentioned, running the STC analysis is almost identical to
-the previous analysis but we now use the `strategy_stc()` strategy
-function instead and a formula with centered covariates.
-
-$$
-y = X_3 + X_4 + Z + Z (X_1 - \bar{X_1}) + Z (X_2 - \bar{X_2})
-$$
-
-However, `outstandR()` knows how to handle this so we can simply pass
-the same (uncentred) formula as before.
+We can simply pass the same formula as before to the modified call with
+`strategy_stc()`.
```{r outstandR_stc}
outstandR_stc <-
@@ -416,10 +495,13 @@ outstandR_stc <-
strategy = strategy_stc(
formula = lin_form,
family = binomial(link = "logit")))
+```
+
+```{r}
outstandR_stc
```
-Change the outcome scale
+Changing the outcome scale to LRR gives
```{r outstandR_stc_lrr}
outstandR_stc_lrr <-
@@ -465,15 +547,8 @@ $$
\hat{\mu}_0 = \int_{x^*} g^{-1} (\hat{\beta}_0 + x^* \hat{\beta}_1 ) p(x^*) \; \text{d}x^*
$$
-To estimate the marginal or population-average treatment effect for *A*
-versus *C* in the linear predictor scale, one back-transforms to this
-scale the average predictions, taken over all subjects on the natural
-outcome scale, and calculates the difference between the average linear
-predictions:
-
-$$
-\hat{\Delta}^{(2)}_{10} = g(\hat{\mu}_1) - g(\hat{\mu}_0)
-$$
+As performed for the previous approaches, call `outstandR()` but change
+the strategy to `strategy_gcomp_ml()`,
```{r outstandR_gcomp_ml}
outstandR_gcomp_ml <-
@@ -481,11 +556,13 @@ outstandR_gcomp_ml <-
strategy = strategy_gcomp_ml(
formula = lin_form,
family = binomial(link = "logit")))
+```
+```{r}
outstandR_gcomp_ml
```
-Change the outcome scale
+Once again, let us change the outcome scale to LRR
```{r outstandR_gcomp_ml_lrr}
outstandR_gcomp_ml_lrr <-
@@ -531,9 +608,8 @@ full Bayesian estimation via Markov chain Monte Carlo (MCMC) sampling.
The average, variance and interval estimates of the marginal treatment
effect can be derived empirically from draws of the posterior density.
-We can draw a vector of size $N^*$ of predicted outcomes $y^*_z$ under
-each set intervention $z^*$ from its posterior predictive distribution
-under the specific treatment.
+The strategy function to plug in to the `outstandR()` call for this
+approach is `strategy_gcomp_stan()`,
```{r outstandR_gcomp_stan, message=FALSE, eval=FALSE}
outstandR_gcomp_stan <-
@@ -556,7 +632,7 @@ xx <- capture.output(
outstandR_gcomp_stan
```
-As before, we can change the outcome scale.
+As before, we can change the outcome scale to LRR.
```{r outstandR_gcomp_stan_lrr, eval=FALSE}
outstandR_gcomp_stan_lrr <-
@@ -583,7 +659,9 @@ outstandR_gcomp_stan_lrr
### Multiple imputation marginalisation
-Marginalized treatment effect for aggregate level data study is obtained by integrating over the covariate distribution from the $BC$ study
+The final method obtains the marginalized treatment effect for the
+aggregate level data study by integrating over the covariate
+distribution from the aggregate level data $BC$ study
$$
\Delta^{\text{marg}} = \mathbb{E}_{X \sim f_{\text{BC}}(X)} \left[ \mu_{T=1}(X) - \mu_{T=0}(X) \right]
@@ -596,14 +674,15 @@ $$
\hat{\Delta}_{BC} \sim \mathcal{N}(\Delta^{\text{marg}}, SE^2)
$$
+The multiple imputation marginalisation strategy function is
+`strategy_mim()`,
+
```{r outstandR_mim, eval=FALSE}
outstandR_mim <-
outstandR(ipd_trial, ald_trial,
strategy = strategy_mim(
formula = lin_form,
family = binomial(link = "logit")))
-
-outstandR_mim
```
```{r outstandR_mim_eval, echo=FALSE}
@@ -619,7 +698,7 @@ xx <- capture.output(
outstandR_mim
```
-Change the outcome scale again.
+Change the outcome scale again for LRR,
```{r outstandR_mim_lrr, eval=FALSE}
outstandR_mim_lrr <-
@@ -644,15 +723,20 @@ xx <- capture.output(
outstandR_mim_lrr
```
-### Model comparison
+## Model comparison
#### $AC$ effect in $BC$ population
-The true $AC$ effect on the log OR scale in the $BC$ (aggregate trial data) population is
-$\beta_t^{AC} + \beta_{EM} (\bar{X}^{AC}_1 + \bar{X}_2^{AC})$. Calculated by
+The true $AC$ effect on the log OR scale in the $BC$ (aggregate trial
+data) population is
+$\beta_t^{AC} + \beta_{EM} (\bar{X}^{BC}_1 + \bar{X}^{BC}_2)$.
+Calculated by
```{r}
-d_AC_true <- b_trt + b_EM * (ald_trial$mean.X1 + ald_trial$mean.X2)
+mean_X1 <- ald_trial$value[ald_trial$statistic == "mean" & ald_trial$variable == "X1"]
+mean_X2 <- ald_trial$value[ald_trial$statistic == "mean" & ald_trial$variable == "X2"]
+
+d_AC_true <- b_trt + b_EM * (mean_X1 + mean_X2)
```
```{r echo=FALSE}
@@ -664,7 +748,8 @@ d_AC_true <- b_trt + b_EM * (ald_trial$mean.X1 + ald_trial$mean.X2)
# d_C_true - d_A_true
```
-The naive approach is to just convert directly from one population to another, ignoring the imbalance in effect modifiers.
+The naive approach is to simply transfer the estimate directly from one
+population to another, ignoring the imbalance in effect modifiers.
```{r}
d_AC_naive <-
@@ -681,33 +766,43 @@ d_AC_naive <-
#### $AB$ effect in $BC$ population
-This is the indirect effect. The true $AB$ effect in the $BC$ population is $\beta_t^{AC} - \beta_t^{BC}$.
+This is the indirect effect. The true $AB$ effect in the $BC$ population
+is $\beta_t^{AC} - \beta_t^{BC}$.
```{r echo=FALSE}
d_AB_true <- 0
```
-Following the simulation study in Remiro et al (2020) these cancel out and the true effect is zero.
+Following the simulation study in Remiro et al. (2020), these cancel
+out and the true effect is zero.
The naive comparison calculating $AB$ effect in the $BC$ population is
```{r}
+# reshape to make extraction easier
+ald_out <- ald_trial %>%
+ filter(variable == "y" | is.na(variable)) %>%
+ mutate(stat_trt = paste0(statistic, "_", trt)) %>%
+ dplyr::select(stat_trt, value) %>%
+ pivot_wider(names_from = stat_trt, values_from = value)
+
d_BC <-
- with(ald_trial, log(y.B.bar/(1-y.B.bar)) - log(y.C.bar/(1-y.C.bar)))
+ with(ald_out, log(mean_B/(1-mean_B)) - log(mean_C/(1-mean_C)))
d_AB_naive <- d_AC_naive$d_AC - d_BC
-var.d.BC <- with(ald_trial, 1/y.B.sum + 1/(N.B - y.B.sum) + 1/y.C.sum + 1/(N.C - y.C.sum))
+var.d.BC <- with(ald_out, 1/sum_B + 1/(N_B - sum_B) + 1/sum_C + 1/(N_C - sum_C))
var.d.AB.naive <- d_AC_naive$var_AC + var.d.BC
```
Of course, the $BC$ contrast is calculated directly.
-
## Results
-We now combine all outputs and present in plots and tables. For a log-odds ratio a table of all contrasts and methods in the $BC$ population is given below.
+We now combine all outputs and present in plots and tables. For a
+log-odds ratio a table of all contrasts and methods in the $BC$
+population is given below.
```{r echo=FALSE}
res_tab <-
@@ -735,9 +830,11 @@ res_tab_var <-
knitr::kable(res_tab)
```
-We can see that the different corresponds reasonably well with one another.
+We can see that the different methods correspond reasonably well with
+one another.
-Next, let us look at the results on the log relative risk scale. First, the true values are calculated as
+Next, let us look at the results on the log relative risk scale. First,
+the true values are calculated as
```{r eval=FALSE}
d_AB_true_lrr <- 0
@@ -775,7 +872,10 @@ knitr::kable(res_tab_lrr)
#### Plots
-The same output can be presented in forest plots is as follows. Each horizontal bar represent a different method and the facets group these by treatment comparison for the $BC$ population. The log-odds ratio and log risk ratio plot are given below.
+The same output can be presented in forest plots as follows. Each
+horizontal bar represents a different method and the facets group these
+by treatment comparison for the $BC$ population. The log-odds ratio and
+log risk ratio plots are given below.
```{r forest-lor, fig.width=8, fig.height=6, warning=FALSE, message=FALSE, echo=FALSE}
library(ggplot2)
@@ -810,8 +910,6 @@ ggplot(aes(x = Estimate, y = id, col = type), data = plotdat) +
```
```{r forest-rr, fig.width=8, fig.height=6, warning=FALSE, message=FALSE, echo=FALSE}
-library(ggplot2)
-
var_dat <-
t(res_tab_var_lrr) |>
as.data.frame() |>
@@ -840,4 +938,3 @@ ggplot(aes(x = Estimate, y = id, col = type), data = plotdat) +
scale_y_reverse(name = "Comparison in BC population",
breaks = NULL, expand = c(0, 0.6))
```
-
diff --git a/vignettes/Continuous_data_example.Rmd b/vignettes/Continuous_data_example.Rmd
index 75664f1..99e3d17 100644
--- a/vignettes/Continuous_data_example.Rmd
+++ b/vignettes/Continuous_data_example.Rmd
@@ -18,8 +18,12 @@ knitr::opts_chunk$set(
)
```
-##TODO
+## Introduction
+This is the vignette for performing population adjustment methods with continuous data, in order to compare marginal
+treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data.
+We will demonstrate how to apply MAIC, STC, G-computation with ML, G-computation with Bayesian inference and multiple imputation marginalisation.
+The document structure follows the binary data example vignette, which should be referred to for more details.
## Example analysis
@@ -30,30 +34,17 @@ library(boot) # non-parametric bootstrap in MAIC and ML G-computation
library(copula) # simulating BC covariates from Gaussian copula
library(rstanarm) # fit outcome regression, draw outcomes in Bayesian G-computation
library(outstandR)
+library(tidyr)
library(simcovariates)
```
### Data
-Next, we load the data to use in the analysis. The data comes from a
-simulation study in Remiro‐Azócar A, Heath A, Baio G (2020). We consider
-binary outcomes using the log-odds ratio as the measure of effect. The
-binary outcome may be response to treatment or the occurrence of an
-adverse event. For trials *AC* and *BC*, outcome $y_n$ for subject $n$
-is simulated from a Bernoulli distribution with probabilities of success
-generated from logistic regression.
+We first simulate both the IPD and ALD data. See the binary data example vignette for more details on how this is implemented.
+The difference with that example is that we change the `family` argument in `gen_data()` to `gaussian(link = "identity")`, corresponding to the continuous data case.
+The `gen_data()` function is available in the [simcovariates](https://github.com/n8thangreen/simcovariates) package on GitHub.
-For the *BC* trial, the individual-level covariates and outcomes are
-aggregated to obtain summaries. The continuous covariates are summarized
-as means and standard deviations, which would be available to the
-analyst in the published study in a table of baseline characteristics in
-the RCT publication. The binary outcomes are summarized in an overall
-event table. Typically, the published study only provides aggregate
-information to the analyst.
-
-Use the `gen_data()` function available in the [simcovariates](https://github.com/n8thangreen/simcovariates) package.
-
-```{r}
+```{r, warning=FALSE, message=FALSE}
library(dplyr)
library(MASS)
@@ -80,52 +71,78 @@ ipd_trial <- gen_data(N, b_trt, b_X, b_EM, b_0,
ipd_trial$trt <- factor(ipd_trial$trt, labels = c("C", "A"))
```
-Similarly, for the aggregate data but with the additional summarise step.
+Similarly, for the aggregate data but with the additional summarise step (see binary data example vignette for code).
-```{r generate-ald-data}
+```{r generate-ald-data, echo=FALSE, warning=FALSE, message=FALSE}
BC.IPD <- gen_data(N, b_trt, b_X, b_EM, b_0,
meanX_BC, sdX,
meanX_EM_BC, sdX_EM,
corX, allocation,
family = gaussian(link = "identity"))
-cov.X <- BC.IPD %>%
- summarise(across(starts_with("X"),
- list(mean = mean, sd = sd),
- .names = "{fn}.{col}"))
-
-out.B <- dplyr::filter(BC.IPD, trt == 1) %>%
- summarise(y.B.sum = sum(y),
- y.B.bar = mean(y),
- y.B.sd = sd(y),
- N.B = n())
-
-out.C <- dplyr::filter(BC.IPD, trt == 0) %>%
- summarise(y.C.sum = sum(y),
- y.C.bar = mean(y),
- y.C.sd = sd(y),
- N.C = n())
-
-ald_trial <- cbind.data.frame(cov.X, out.B, out.C)
+BC.IPD$trt <- factor(BC.IPD$trt, labels = c("C", "B"))
+
+# covariate summary statistics
+# assume same between treatments
+cov.X <-
+ BC.IPD %>%
+ as.data.frame() |>
+ dplyr::select(X1, X2, X3, X4, trt) %>%
+ pivot_longer(cols = starts_with("X"), names_to = "variable", values_to = "value") %>%
+ group_by(variable) %>%
+ summarise(
+ mean = mean(value),
+ sd = sd(value)
+ ) %>%
+ pivot_longer(cols = c("mean", "sd"), names_to = "statistic", values_to = "value") %>%
+ ungroup() |>
+ mutate(trt = NA)
+
+# outcome
+summary.y <-
+ BC.IPD |>
+ as.data.frame() |>
+ dplyr::select(y, trt) %>%
+ pivot_longer(cols = "y", names_to = "variable", values_to = "value") %>%
+ group_by(variable, trt) %>%
+ summarise(
+ mean = mean(value),
+ sd = sd(value),
+ sum = sum(value),
+ ) %>%
+ pivot_longer(cols = c("mean", "sd", "sum"),
+ names_to = "statistic", values_to = "value") %>%
+ ungroup()
+
+# sample sizes
+summary.N <-
+ BC.IPD |>
+ group_by(trt) |>
+ count(name = "N") |>
+ pivot_longer(cols = "N", names_to = "statistic", values_to = "value") |>
+ mutate(variable = NA_character_) |>
+ dplyr::select(variable, statistic, value, trt)
+
+ald_trial <- rbind.data.frame(cov.X, summary.y, summary.N)
```
-This general format of data sets consist of the following.
+The general format of the data sets is a 'long' style consisting of the following.
#### `ipd_trial`: Individual patient data
-- `X*`: patient measurements
-- `trt`: treatment ID (integer)
-- `y`: continuous outcome measure
+- `X*`: Patient measurements
+- `trt`: Treatment label (factor)
+- `y`: Continuous numeric
#### `ald_trial`: Aggregate-level data
-- `mean.X*`: mean patient measurement
-- `sd.X*`: standard deviation of patient measurement
-- `mean.y`:
-- `sd.y`:
-
-Note that the wildcard `*` here is usually an integer from 1 or the
-trial identifier *B*, *C*.
+- `variable`: Covariate name. In the case of treatment arm sample size
+ this is `NA`
+- `statistic`: Summary statistic name from mean, standard deviation or
+ sum
+- `value`: Numerical value of summary statistic
+- `trt`: Treatment label. Because we assume a common covariate
+ distribution between treatment arms this is `NA`
Our data look like the following.
@@ -134,31 +151,20 @@ head(ipd_trial)
```
There are 4 correlated continuous covariates generated per subject,
-simulated from a multivariate normal distribution.
+simulated from a multivariate normal distribution. Treatment `trt` takes
+either the new treatment *A* or the standard of care / status quo *C*.
+The ITC is 'anchored' via *C*, the common treatment.
```{r}
ald_trial
```
In this case, we have 4 covariate mean and standard deviation values;
-and the event total, average and sample size for each treatment *B*, and
+and the total, average and sample size for each treatment *B* and
*C*.
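Because the ALD is stored in long format, individual summary values are obtained by filtering on the `variable` and `statistic` columns rather than by column name. For example, a minimal sketch to pull out the mean of covariate `X1`:

```{r}
ald_trial$value[ald_trial$variable == "X1" & ald_trial$statistic == "mean"]
```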
-### Output statistics
-
-We will implement for MAIC, STC, and G-computation methods to obtain the
-*marginal variance*, defined as
-
-$$
-$$
-
-and the *marginal treatment effect*, defined as the log-odds ratio,
-
-$$
-$$
-
-where $\bar{C}$ is the compliment of $C$ so e.g.
-$n_{\bar{C}} = N_C - n_c$.
+In the following we will implement the MAIC, STC, and G-computation methods to obtain the
+*marginal variance* and the *marginal treatment effect*.
## Model fitting in R
@@ -173,28 +179,17 @@ A `strategy` argument of `outstandR` takes functions called
particular method required, e.g. `strategy_maic()` for MAIC. Each
specific example is provided below.
-### MAIC
-
-Using the individual level data for *AC* firstly we perform
-non-parametric bootstrap of the `maic.boot` function with `R = 1000`
-replicates. This function fits treatment coefficient for the marginal
-effect for *A* vs *C*. The returned value is an object of class `boot`
-from the `{boot}` package. We then calculate the bootstrap mean and
-variance in the wrapper function `maic_boot_stats`.
-
-The formula used in this model is
-
-$$
-y = X_3 + X_4 + \beta_t X_1 + \beta_t X_2
-$$
-
-which corresponds to the following `R` `formula` object passed as an
-argument to the strategy function.
+The formula used in this model, passed as an
+argument to the strategy function, is
```{r}
lin_form <- as.formula("y ~ X3 + X4 + trt + trt:X1 + trt:X2")
```
+### MAIC
+
+As mentioned above, pass the model specific strategy function to the main `outstandR()` function, in this case use `strategy_maic()`.
+
```{r outstandR_maic}
outstandR_maic <-
outstandR(ipd_trial, ald_trial,
@@ -209,37 +204,12 @@ The returned object is of class `outstandR`.
outstandR_maic
```
-We see that this is a list object with 3 parts, each containing
-statistics between each pair of treatments. These are the mean
-contrasts, variances and confidence intervals (CI), respectively. The
-default CI is for 95% but can be altered in `outstandR` with the `CI`
-argument.
-### STC
+### Simulated Treatment Comparison (STC)
STC is the conventional outcome regression method. It involves fitting a
-regression model of outcome on treatment and covariates to the IPD. IPD
-effect modifiers are centred at the mean *BC* values.
-
-$$
-g(\mu_n) = \beta_0 + (\boldsymbol{x}_n - \boldsymbol{\theta}) \beta_1 + (\beta_z + (\boldsymbol{x_n^{EM}} - \boldsymbol{\theta^{EM}}) \boldsymbol{\beta_2}) \; \mbox{I}(z_n=1)
-$$
-
-where $\beta_0$ is the intercept, $\beta_1$ are the covariate
-coefficients, $\beta_z$ and $\beta_2$ are the effect modifier
-coefficients, $z_n$ are the indicator variables of effect alternative
-treatment. $g(\cdot)$ is the link function e.g. $\log$.
-
-As already mentioned, running the STC analysis is almost identical to
-the previous analysis but we now use the `strategy_stc()` strategy
-function instead and a formula with centered covariates.
-
-$$
-y = X_3 + X_4 + \beta_t(X_1 - \bar{X_1}) + \beta_t(X_2 - \bar{X_2})
-$$
-
-However, `outstandR()` knows how to handle this so we can simply pass
-the same (uncentred) formula as before.
+regression model of outcome on treatment and covariates to the IPD. Simply pass
+the same formula as before with the `strategy_stc()` strategy function.
```{r outstandR_stc}
outstandR_stc <-
@@ -250,52 +220,22 @@ outstandR_stc <-
outstandR_stc
```
-For the last two approaches, we perform G-computation firstly with a
-frequentist MLE approach and then a Bayesian approach.
### Parametric G-computation with maximum-likelihood estimation
G-computation marginalizes the conditional estimates by separating the
regression modelling from the estimation of the marginal treatment
-effect for *A* versus *C*. First, a regression model of the observed
-outcome $y$ on the covariates $x$ and treatment $z$ is fitted to the
-*AC* IPD:
-
-$$
-g(\mu_n) = \beta_0 + \boldsymbol{x}_n \boldsymbol{\beta_1} + (\beta_z + \boldsymbol{x_n^{EM}} \boldsymbol{\beta_2}) \; \mbox{I}(z_n = 1)
-$$
-
-In the context of G-computation, this regression model is often called
-the “Q-model.” Having fitted the Q-model, the regression coefficients
-are treated as nuisance parameters. The parameters are applied to the
-simulated covariates $x*$ to predict hypothetical outcomes for each
-subject under both possible treatments. Namely, a pair of predicted
-outcomes, also called potential outcomes, under *A* and under *C*, is
-generated for each subject.
-
-By plugging treatment *C* into the regression fit for every simulated
-observation, we predict the marginal outcome mean in the hypothetical
-scenario in which all units are under treatment *C*:
-
-$$
-\hat{\mu}_0 = \int_{x^*} g^{-1} (\hat{\beta}_0 + x^* \hat{\beta}_1 ) p(x^*) \; \text{d}x^*
-$$
-
-To estimate the marginal or population-average treatment effect for *A*
-versus *C* in the linear predictor scale, one back-transforms to this
-scale the average predictions, taken over all subjects on the natural
-outcome scale, and calculates the difference between the average linear
-predictions:
-
-$$
-\hat{\Delta}^{(2)}_{10} = g(\hat{\mu}_1) - g(\hat{\mu}_0)
-$$
+effect for *A* versus *C*.
+Pass the `strategy_gcomp_ml()` strategy function.
+
```{r outstandR_gcomp_ml}
outstandR_gcomp_ml <-
outstandR(ipd_trial, ald_trial,
- strategy = strategy_gcomp_ml(formula = lin_form,
- family = gaussian(link = "identity")))
+ strategy = strategy_gcomp_ml(
+ formula = lin_form,
+ family = gaussian(link = "identity")))
+
outstandR_gcomp_ml
```
@@ -307,32 +247,7 @@ The Bayesian approach also marginalizes, integrates or standardizes over
the joint posterior distribution of the conditional nuisance parameters
of the outcome regression, as well as the joint covariate distribution.
-Draw a vector of size $N^*$ of predicted outcomes $y^*_z$ under each set
-intervention $z^* \in \{0, 1\}$ from its posterior predictive
-distribution under the specific treatment. This is defined as
-$p(y^*_{z^*} \mid \mathcal{D}_{AC}) = \int_{\beta} p(y^*_{z^*} \mid \beta) p(\beta \mid \mathcal{D}_{AC}) d\beta$
-where $p(\beta \mid \mathcal{D}_{AC})$ is the posterior distribution of
-the outcome regression coefficients $\beta$, which encode the
-predictor-outcome relationships observed in the *AC* trial IPD. This is
-given by:
-
-$$
-p(y^*_{^z*} \mid \mathcal{D}_{AC}) = \int_{x^*} p(y^* \mid z^*, x^*, \mathcal{D}_{AC}) p(x^* \mid \mathcal{D}_{AC})\; \text{d}x^*
-$$
-
-$$
-= \int_{x^*} \int_{\beta} p(y^* \mid z^*, x^*, \beta) p(x^* \mid \beta) p(\beta \mid \mathcal{D}_{AC})\; d\beta \; \text{d}x^*
-$$
-
-In practice, the integrals above can be approximated numerically, using
-full Bayesian estimation via Markov chain Monte Carlo (MCMC) sampling.
-
-The average, variance and interval estimates of the marginal treatment
-effect can be derived empirically from draws of the posterior density.
-
-We can draw a vector of size $N^*$ of predicted outcomes $y^*_z$ under
-each set intervention $z^*$ from its posterior predictive distribution
-under the specific treatment.
+Pass the `strategy_gcomp_stan()` strategy function.
```{r outstandR_gcomp_stan, eval=FALSE}
outstandR_gcomp_stan <-
@@ -357,11 +272,7 @@ outstandR_gcomp_stan
### Multiple imputation marginalisation
-ref
-
-$$
-equation here
-$$
+Finally, the strategy function to pass to `outstandR()` for multiple imputation marginalisation is `strategy_mim()`,
```{r outstandR_mim, eval=FALSE}
outstandR_mim <-
@@ -388,14 +299,60 @@ outstandR_mim
Combine all outputs into a relative effects table of all contrasts and methods.
-```{r}
-knitr::kable(
+```{r table-res, echo=FALSE}
+res_tab <-
data.frame(
+ # d_true = c(d_AB_true, d_AC_true, d_BC),
+ # d_naive = c(d_AB_naive, d_AC_naive$d_AC, d_BC),
`MAIC` = unlist(outstandR_maic$contrasts$means),
`STC` = unlist(outstandR_stc$contrasts$means),
`Gcomp ML` = unlist(outstandR_gcomp_ml$contrasts$means),
`Gcomp Bayes` = unlist(outstandR_gcomp_stan$contrasts$means),
- `MIM` = unlist(outstandR_mim$contrasts$means))
- |>
- round(2))
+ `MIM` = unlist(outstandR_mim$contrasts$means)) |>
+ round(2)
+
+res_tab_var <-
+ data.frame(
+ # d_true = c(NA, NA, NA),
+ # d_naive = c(var.d.AB.naive, d_AC_naive$var_AC, var.d.BC),
+ `MAIC` = unlist(outstandR_maic$contrasts$variances),
+ `STC` = unlist(outstandR_stc$contrasts$variances),
+ `Gcomp ML` = unlist(outstandR_gcomp_ml$contrasts$variances),
+ `Gcomp Bayes` = unlist(outstandR_gcomp_stan$contrasts$variances),
+ `MIM` = unlist(outstandR_mim$contrasts$variances)) |>
+ round(2)
+
+knitr::kable(res_tab)
+```
+
+```{r forest-res, fig.width=8, fig.height=6, warning=FALSE, message=FALSE, echo=FALSE}
+library(ggplot2)
+
+var_dat <-
+ t(res_tab_var) |>
+ as.data.frame() |>
+ tibble::rownames_to_column("type") |>
+ reshape2::melt(variable.name = "Comparison",
+ value.name = "var")
+
+plotdat <-
+ t(res_tab) |>
+ as.data.frame() |>
+ tibble::rownames_to_column("type") |>
+ reshape2::melt(variable.name = "Comparison",
+ value.name = "Estimate") |>
+ mutate(id = 1:n(),
+ type = as.factor(type)) |>
+ merge(var_dat) |>
+ mutate(lo = Estimate + qnorm(0.025) * sqrt(var),
+ hi = Estimate + qnorm(0.975) * sqrt(var))
+
+ggplot(aes(x = Estimate, y = id, col = type), data = plotdat) +
+ geom_vline(xintercept = 0, lty = 2) +
+ geom_point(size = 2) +
+ geom_segment(aes(y = id, yend = id, x = lo, xend = hi), na.rm = TRUE) +
+ xlab("Estimate (Log RR)") +
+ facet_grid(Comparison~., switch = "y", scales = "free_y", space = "free_y") +
+ scale_y_reverse(name = "Comparison in BC population",
+ breaks = NULL, expand = c(0, 0.6))
```
diff --git a/vignettes/Count_data_example.Rmd b/vignettes/Count_data_example.Rmd
index 922d0e6..ee5f5e7 100644
--- a/vignettes/Count_data_example.Rmd
+++ b/vignettes/Count_data_example.Rmd
@@ -14,13 +14,18 @@ vignette: >
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
- eval = FALSE, ##TODO:
+ eval = TRUE,
comment = "#>"
)
```
## Introduction
+This is the vignette for performing population adjustment methods with count data, in order to compare marginal
+treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data.
+We will demonstrate how to apply MAIC, STC, G-computation with ML, G-computation with Bayesian inference and multiple imputation marginalisation.
+The document structure follows the binary data example vignette, which should be referred to for more details.
+
## Example analysis
First, let us load necessary packages.
@@ -30,19 +35,20 @@ library(boot) # non-parametric bootstrap in MAIC and ML G-computation
library(copula) # simulating BC covariates from Gaussian copula
library(rstanarm) # fit outcome regression, draw outcomes in Bayesian G-computation
library(outstandR)
+library(tidyr)
library(simcovariates)
```
### Data
-First, use the `gen_data()` function available in the [simcovariates](https://github.com/n8thangreen/simcovariates) package.
+We first simulate both the IPD and ALD count data. See the binary data example vignette for more details on how this is implemented.
+The difference with that example is that we change the `family` argument in `gen_data()` to `poisson(link = "log")`, corresponding to the count data case.
+The `gen_data()` function is available in the [simcovariates](https://github.com/n8thangreen/simcovariates) package on GitHub.
-```{r warning=FALSE, message=FALSE}
+```{r, warning=FALSE, message=FALSE}
library(dplyr)
library(MASS)
-```
-```{r}
N <- 200
allocation <- 2/3 # active treatment vs. placebo allocation ratio (2:1)
b_trt <- log(0.17) # conditional effect of active treatment vs. common comparator
@@ -66,55 +72,100 @@ ipd_trial <- gen_data(N, b_trt, b_X, b_EM, b_0,
ipd_trial$trt <- factor(ipd_trial$trt, labels = c("C", "A"))
```
-Similarly, for the aggregate data but with the additional summarise step.
+Similarly, for the aggregate data but with the additional summarise step (see binary data example vignette for code).
-```{r generate-ald-data}
+```{r generate-ald-data, echo=FALSE, warning=FALSE, message=FALSE}
BC.IPD <- gen_data(N, b_trt, b_X, b_EM, b_0,
meanX_BC, sdX,
meanX_EM_BC, sdX_EM,
corX, allocation,
family = poisson(link = "log"))
BC.IPD$trt <- factor(BC.IPD$trt, labels = c("C", "B"))

# covariate summary statistics
# assumed common between treatment arms
cov.X <-
  BC.IPD |>
  as.data.frame() |>
  dplyr::select(X1, X2, X3, X4, trt) |>
  pivot_longer(cols = starts_with("X"), names_to = "variable", values_to = "value") |>
  group_by(variable) |>
  summarise(
    mean = mean(value),
    sd = sd(value)) |>
  pivot_longer(cols = c("mean", "sd"), names_to = "statistic", values_to = "value") |>
  ungroup() |>
  mutate(trt = NA)

# outcome summaries by treatment arm
summary.y <-
  BC.IPD |>
  as.data.frame() |>
  dplyr::select(y, trt) |>
  pivot_longer(cols = "y", names_to = "variable", values_to = "value") |>
  group_by(variable, trt) |>
  summarise(
    mean = mean(value),
    sd = sd(value),
    sum = sum(value)) |>
  pivot_longer(cols = c("mean", "sd", "sum"),
               names_to = "statistic", values_to = "value") |>
  ungroup()

# sample sizes by treatment arm
summary.N <-
  BC.IPD |>
  group_by(trt) |>
  count(name = "N") |>
  pivot_longer(cols = "N", names_to = "statistic", values_to = "value") |>
  mutate(variable = NA_character_) |>
  dplyr::select(variable, statistic, value, trt)

ald_trial <- rbind.data.frame(cov.X, summary.y, summary.N)
```
Both data sets follow a general 'long' format, consisting of the following columns.
#### `ipd_trial`: Individual patient data

- `X*`: Patient covariate measurements
- `trt`: Treatment label (factor)
- `y`: Count outcome
#### `ald_trial`: Aggregate-level data

- `variable`: Covariate name; `NA` for the treatment-arm sample size rows
- `statistic`: Summary statistic name: mean, standard deviation (`sd`) or sum
- `value`: Numerical value of the summary statistic
- `trt`: Treatment label; `NA` for covariate rows, since we assume a common covariate distribution between treatment arms
Our data look like the following.
```{r}
head(ipd_trial)
```
There are 4 correlated continuous covariates generated per subject, simulated from a multivariate normal distribution. Treatment `trt` takes either the new treatment *A* or the standard of care / status quo *C*. The ITC is 'anchored' via *C*, the common comparator treatment.
```{r}
ald_trial
```
In this case, we have the mean and standard deviation for each of the 4 covariates, and the sum, mean and sample size for each of the treatments *B* and *C*.
In the following, we implement the MAIC, STC, and G-computation methods to obtain the *marginal variance* and the *marginal treatment effect*.
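
Both feed into the anchored indirect comparison: writing $\hat{d}_{AC}$ and $\hat{d}_{BC}$ for the marginal relative effects on the linear predictor (here $\log$) scale, the *A* vs *B* comparison in the *BC* population follows the standard Bucher formula

$$
\hat{d}_{AB} = \hat{d}_{AC} - \hat{d}_{BC}, \qquad
\widehat{\text{Var}}(\hat{d}_{AB}) = \widehat{\text{Var}}(\hat{d}_{AC}) + \widehat{\text{Var}}(\hat{d}_{BC})
$$

where the variances add because the two trials are independent.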
## Model fitting in R
A `strategy` argument of `outstandR()` takes functions called `strategy_*()`, where `*` is replaced by the name of the particular method required, e.g. `strategy_maic()` for MAIC. Each specific example is provided below.
The formula used in this model, passed as an argument to the strategy function, is
```{r}
lin_form <- as.formula("y ~ X3 + X4 + trt*X1 + trt*X2")
```
### MAIC

As mentioned above, pass the model-specific strategy function to the main `outstandR()` function; in this case use `strategy_maic()`.

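As a reminder of what `strategy_maic()` does internally (a sketch of the standard MAIC formulation, not anything specific to this example): each *AC* patient $n$ is assigned a weight

$$
w_n = \exp(\boldsymbol{x}_n^{EM} \boldsymbol{\alpha})
$$

with $\boldsymbol{\alpha}$ estimated by the method of moments so that the weighted effect-modifier means in the *AC* IPD match those reported for the *BC* trial in `ald_trial`. A weighted outcome regression on treatment then yields the marginal *A* vs *C* effect in the *BC* population.
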
```{r outstandR_maic}
outstandR_maic <-
outstandR(ipd_trial, ald_trial,
            strategy = strategy_maic(
              formula = lin_form,
              family = poisson(link = "log")))
```

The returned object is of class `outstandR`.

```{r}
outstandR_maic
```
We see that this is a list object with 3 parts, each containing statistics between each pair of treatments. These are the mean contrasts, variances and confidence intervals (CI), respectively. The default is a 95% CI but this can be altered in `outstandR()` with the `CI` argument.
### Simulated Treatment Comparison (STC)
STC is the conventional outcome regression method. It involves fitting a
regression model of outcome on treatment and covariates to the IPD, with the IPD effect modifiers centred at the mean *BC* values. `outstandR()` handles the centring internally, so simply pass the same (uncentred) formula as before with the `strategy_stc()` strategy function.
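
The model that `strategy_stc()` fits is of the form

$$
g(\mu_n) = \beta_0 + (\boldsymbol{x}_n - \boldsymbol{\theta}) \boldsymbol{\beta}_1 + (\beta_z + (\boldsymbol{x}_n^{EM} - \boldsymbol{\theta}^{EM}) \boldsymbol{\beta}_2) \; \text{I}(z_n = 1)
$$

where $\boldsymbol{\theta}$ are the mean *BC* covariate values, $\beta_z$ is the conditional treatment effect and $g(\cdot)$ is the link function, here $\log$.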
```{r outstandR_stc}
outstandR_stc <-
  outstandR(ipd_trial, ald_trial,
strategy = strategy_stc(
formula = lin_form,
family = poisson(link = "log")))
outstandR_stc
```
### Parametric G-computation with maximum-likelihood estimation
G-computation marginalizes the conditional estimates by separating the
regression modelling from the estimation of the marginal treatment
effect for *A* versus *C*. First, a regression model of the observed outcome on the covariates and treatment, often called the "Q-model", is fitted to the *AC* IPD. Pass the `strategy_gcomp_ml()` strategy function.

```{r outstandR_gcomp_ml, message=FALSE, warning=FALSE}
outstandR_gcomp_ml <-
outstandR(ipd_trial, ald_trial,
strategy = strategy_gcomp_ml(
formula = lin_form,
family = poisson(link = "log")))
outstandR_gcomp_ml
```
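
Behind this call, the coefficients of the fitted outcome regression (the "Q-model") are applied to covariates $x^*$ simulated from the *BC* population to predict potential outcomes under each treatment. For example, the marginal outcome mean with all units under treatment *C* is

$$
\hat{\mu}_0 = \int_{x^*} g^{-1} (\hat{\beta}_0 + x^* \hat{\beta}_1) \; p(x^*) \; \text{d}x^*
$$

and the marginal *A* vs *C* treatment effect on the linear predictor scale is the difference of the back-transformed average predictions,

$$
\hat{\Delta}_{10} = g(\hat{\mu}_1) - g(\hat{\mu}_0)
$$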
### Bayesian G-computation with MCMC

The Bayesian approach also marginalizes, integrates or standardizes over the joint posterior distribution of the conditional nuisance parameters of the outcome regression, as well as the joint covariate distribution.
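
A vector of predicted outcomes $y^*_{z^*}$ under each intervention $z^* \in \{0, 1\}$ is drawn from its posterior predictive distribution,

$$
p(y^*_{z^*} \mid \mathcal{D}_{AC}) = \int_{x^*} \int_{\beta} p(y^* \mid z^*, x^*, \beta) \; p(x^* \mid \mathcal{D}_{AC}) \; p(\beta \mid \mathcal{D}_{AC}) \; d\beta \; \text{d}x^*
$$

where $p(\beta \mid \mathcal{D}_{AC})$ is the posterior distribution of the outcome regression coefficients given the *AC* trial IPD. In practice these integrals are approximated numerically via MCMC sampling, and the mean, variance and interval estimates of the marginal treatment effect are derived empirically from the posterior draws.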
Pass the `strategy_gcomp_stan()` strategy function.
```{r outstandR_gcomp_stan, eval=FALSE, message=FALSE, warning=FALSE}
outstandR_gcomp_stan <-
outstandR(ipd_trial, ald_trial,
strategy = strategy_gcomp_stan(
@@ -301,7 +257,7 @@ outstandR_gcomp_stan <-
family = poisson(link = "log")))
```
```{r outstandR_gcomp_stan_eval, echo=FALSE, message=FALSE, warning=FALSE}
xx <- capture.output(
outstandR_gcomp_stan <-
outstandR(ipd_trial, ald_trial,
              strategy = strategy_gcomp_stan(
                formula = lin_form,
                family = poisson(link = "log")))))
```

```{r}
outstandR_gcomp_stan
```
### Multiple imputation marginalisation
Finally, the strategy function to pass to `outstandR()` for multiple imputation marginalisation is `strategy_mim()`.
```{r outstandR_mim, eval=FALSE}
outstandR_mim <-
outstandR(ipd_trial, ald_trial,
strategy = strategy_mim(
formula = lin_form,
family = poisson(link = "log")))
```

```{r outstandR_mim_eval, echo=FALSE}
xx <- capture.output(
  outstandR_mim <-
    outstandR(ipd_trial, ald_trial,
              strategy = strategy_mim(
                formula = lin_form,
                family = poisson(link = "log"))))
```

```{r}
outstandR_mim
```
### Model comparison
Combine all outputs into a table of relative effects across all contrasts and methods.
```{r table-res, echo=FALSE}
res_tab <-
data.frame(
`MAIC` = unlist(outstandR_maic$contrasts$means),
`STC` = unlist(outstandR_stc$contrasts$means),
`Gcomp ML` = unlist(outstandR_gcomp_ml$contrasts$means),
`Gcomp Bayes` = unlist(outstandR_gcomp_stan$contrasts$means),
    `MIM` = unlist(outstandR_mim$contrasts$means)) |>
round(2)

res_tab_var <-
  data.frame(
    `MAIC` = unlist(outstandR_maic$contrasts$variances),
    `STC` = unlist(outstandR_stc$contrasts$variances),
    `Gcomp ML` = unlist(outstandR_gcomp_ml$contrasts$variances),
    `Gcomp Bayes` = unlist(outstandR_gcomp_stan$contrasts$variances),
    `MIM` = unlist(outstandR_mim$contrasts$variances)) |>
  round(2)

knitr::kable(res_tab)
```

```{r forest-res, fig.width=8, fig.height=6, warning=FALSE, message=FALSE, echo=FALSE}
library(ggplot2)

var_dat <-
  t(res_tab_var) |>
  as.data.frame() |>
  tibble::rownames_to_column("type") |>
  reshape2::melt(variable.name = "Comparison",
                 value.name = "var")

plotdat <-
  t(res_tab) |>
  as.data.frame() |>
  tibble::rownames_to_column("type") |>
  reshape2::melt(variable.name = "Comparison",
                 value.name = "Estimate") |>
  mutate(id = 1:n(),
         type = as.factor(type)) |>
  merge(var_dat) |>
  mutate(lo = Estimate + qnorm(0.025) * sqrt(var),
         hi = Estimate + qnorm(0.975) * sqrt(var))

ggplot(aes(x = Estimate, y = id, col = type), data = plotdat) +
  geom_vline(xintercept = 0, lty = 2) +
  geom_point(size = 2) +
  geom_segment(aes(y = id, yend = id, x = lo, xend = hi), na.rm = TRUE) +
  xlab("Estimate (Log RR)") +
  facet_grid(Comparison ~ ., switch = "y", scales = "free_y", space = "free_y") +
  scale_y_reverse(name = "Comparison in BC population",
                  breaks = NULL, expand = c(0, 0.6))
```