Add cluster-robust standard errors and multi-level clustering support for clustered sampling designs #475
Open: Copilot wants to merge 30 commits into main from copilot/adjust-seroincidence-likelihood
+1,098 −84
Commits:
1c361e1 Initial plan (Copilot)
b1f321f Add cluster-robust standard error support to est_seroincidence() (Copilot)
46173c1 Fix linting issues and finalize cluster-robust SE implementation (Copilot)
28b7b67 Refactor call to compute_cluster_robust_var (d-morrison)
44a6bbb Merge branch 'main' into copilot/adjust-seroincidence-likelihood (d-morrison)
a19f661 Fix linting issues: replace base R messaging with cli functions (Copilot)
fa216ed Add se_type column and compute_icc() function for clustering analysis (Copilot)
8567a5e Fix est_seroincidence_by to support clustering and add comprehensive … (Copilot)
01a66de Extend compute_icc to work with est_seroincidence_by and add antigen_… (Copilot)
b8a557c Add min/max cluster sizes and support multi-level clustering (Copilot)
1c821c7 Merge branch 'main' into copilot/adjust-seroincidence-likelihood (d-morrison)
6034fcf Address code review feedback: fix cli formatting, line lengths, use m… (Copilot)
5e153a3 Refactor clustering code: decompose into separate files per code orga… (Copilot)
d0fc3d1 Merge branch 'main' into copilot/adjust-seroincidence-likelihood (kaiemjoy)
11c3e00 Remove compute_icc() functionality per user request (Copilot)
81b14c3 Address code review feedback: remove ICC mentions, extract validation… (Copilot)
f7b3426 Add clustering documentation to vignettes and mark man/ as linguist-g… (Copilot)
33a6826 Increment version and perform quality control checks (Copilot)
449f29a Update vignettes per code review: use actual SEES data variables with… (Copilot)
96c583d Merge branch 'main' into copilot/adjust-seroincidence-likelihood (d-morrison)
c275206 Fix workflow failures: increment version and address linting issues (Copilot)
7f824b1 fix lints (d-morrison)
c4cb76b Add version management guidelines to copilot-instructions.md (Copilot)
ab6426a Merge branch 'main' into copilot/adjust-seroincidence-likelihood (d-morrison)
49b95e1 Fix vignette rendering error: filter noise params to Pakistan in clus… (Copilot)
3700ee6 Refactor methodology vignette: move cluster-robust SE content to subfile (Copilot)
e36a5ef Improve cluster-robust SE documentation: add symbol definitions and m… (Copilot)
3fee00a more (d-morrison)
e02d6e7 Enhance vignette: add cross-references and comparisons for clustering… (Copilot)
47aa916 Remove multi-level clustering example from enteric fever vignette (Copilot)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,2 @@ | ||
| NEWS.md merge=union | ||
| man/* linguist-generated |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| #' Compute cluster-robust variance for seroincidence estimates | ||
| #' | ||
| #' @description | ||
| #' Computes cluster-robust (sandwich) variance estimates for seroincidence | ||
| #' parameter estimates when data come from a clustered sampling design. | ||
| #' This adjusts the standard errors to account for within-cluster correlation. | ||
| #' | ||
| #' @param fit a `seroincidence` object from [est_seroincidence()] | ||
| #' @param cluster_var name(s) of the cluster variable(s) in the data. | ||
| #' Can be a single variable or vector of variables for multi-level clustering. | ||
| #' @param stratum_var optional name of the stratum variable | ||
| #' | ||
| #' @return variance of log(lambda) accounting for clustering | ||
| #' @keywords internal | ||
| #' @noRd | ||
| .compute_cluster_robust_var <- function( | ||
| fit, | ||
| cluster_var, | ||
| stratum_var = NULL) { | ||
| # Extract stored data (already split by antigen_iso) | ||
| pop_data_list <- attr(fit, "pop_data") | ||
| sr_params_list <- attr(fit, "sr_params") | ||
| noise_params_list <- attr(fit, "noise_params") | ||
| antigen_isos <- attr(fit, "antigen_isos") | ||
| | ||
| # Get MLE estimate | ||
| log_lambda_mle <- fit$estimate | ||
| | ||
| # Combine pop_data list back into a single data frame | ||
| # to get cluster info | ||
| pop_data_combined <- do.call(rbind, pop_data_list) | ||
| | ||
| # Compute scores (gradients) using numerical differentiation. | ||
| # The score is the derivative of the log-likelihood w.r.t. log(lambda). | ||
| epsilon <- 1e-6 | ||
| | ||
| # Scores are summed within clusters, so each observation | ||
| # must first be mapped to its cluster. | ||
| | ||
| # Handle multiple clustering levels by creating composite cluster ID | ||
| if (length(cluster_var) == 1) { | ||
| cluster_ids <- pop_data_combined[[cluster_var]] | ||
| } else { | ||
| # Create composite cluster ID from multiple variables | ||
| cluster_ids <- interaction( | ||
| pop_data_combined[, cluster_var, drop = FALSE], | ||
| drop = TRUE, | ||
| sep = "_" | ||
| ) | ||
| } | ||
| | ||
| # Get unique clusters | ||
| unique_clusters <- unique(cluster_ids) | ||
| n_clusters <- length(unique_clusters) | ||
| | ||
| # Compute cluster-level scores | ||
| cluster_scores <- numeric(n_clusters) | ||
| | ||
| for (i in seq_along(unique_clusters)) { | ||
| cluster_id <- unique_clusters[i] | ||
| | ||
| # Get observations in this cluster | ||
| cluster_mask <- cluster_ids == cluster_id | ||
| | ||
| # Create temporary pop_data with only this cluster | ||
| pop_data_cluster <- pop_data_combined[cluster_mask, , drop = FALSE] | ||
| | ||
| # Split by antigen | ||
| pop_data_cluster_list <- split( | ||
| pop_data_cluster, | ||
| pop_data_cluster$antigen_iso | ||
| ) | ||
| | ||
| # Ensure all antigen_isos are represented | ||
| # (add empty data frames if missing) | ||
| for (ag in antigen_isos) { | ||
| if (!ag %in% names(pop_data_cluster_list)) { | ||
| # Create empty data frame with correct structure | ||
| pop_data_cluster_list[[ag]] <- pop_data_list[[ag]][0, , drop = FALSE] | ||
| } | ||
| } | ||
| | ||
| # Compute log-likelihood for this cluster at MLE | ||
| ll_cluster_mle <- -(.nll( | ||
| log.lambda = log_lambda_mle, | ||
| pop_data = pop_data_cluster_list, | ||
| antigen_isos = antigen_isos, | ||
| curve_params = sr_params_list, | ||
| noise_params = noise_params_list, | ||
| verbose = FALSE | ||
| )) | ||
| | ||
| # Compute log-likelihood at MLE + epsilon | ||
| ll_cluster_plus <- -(.nll( | ||
| log.lambda = log_lambda_mle + epsilon, | ||
| pop_data = pop_data_cluster_list, | ||
| antigen_isos = antigen_isos, | ||
| curve_params = sr_params_list, | ||
| noise_params = noise_params_list, | ||
| verbose = FALSE | ||
| )) | ||
| | ||
| # Numerical derivative (score for this cluster) | ||
| cluster_scores[i] <- (ll_cluster_plus - ll_cluster_mle) / epsilon | ||
| } | ||
| | ||
| # Compute B matrix (middle of sandwich) | ||
| # B = sum of outer products of cluster scores | ||
| b_matrix <- sum(cluster_scores^2) # nolint: object_name_linter | ||
| | ||
| # Get Hessian (already computed by nlm) | ||
| h_matrix <- fit$hessian # nolint: object_name_linter | ||
| | ||
| # Sandwich variance: V = H^(-1) * B * H^(-1) | ||
| # Since we have a scalar parameter, this simplifies to: | ||
| var_log_lambda_robust <- b_matrix / (h_matrix^2) | ||
| | ||
| return(var_log_lambda_robust) | ||
| } |
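
The scalar sandwich computation in `.compute_cluster_robust_var()` can be tried out on a toy model. The sketch below is not the package's code. It assumes a Poisson log-likelihood with the single parameter log(lambda) as a stand-in for the internal `.nll()`, and the names `toy_loglik`, `cluster_robust_var`, `district`, and `village` are hypothetical. It also uses a central difference for the cluster scores, whereas the diff uses a forward difference. The steps are the same as in the diff: build a composite cluster ID with `interaction()`, differentiate each cluster's log-likelihood at the MLE, and form V = B / H^2.

```r
# Toy log-likelihood: Poisson counts with rate exp(log_lambda).
# Assumption for illustration only; it stands in for the package's .nll().
toy_loglik <- function(log_lambda, y) {
  sum(dpois(y, lambda = exp(log_lambda), log = TRUE))
}

# Scalar cluster-robust (sandwich) variance, V = B / H^2, where
# B is the sum of squared cluster-level scores and
# H is the second derivative of the negative log-likelihood at the MLE.
cluster_robust_var <- function(y, cluster_ids, log_lambda_hat, hessian,
                               epsilon = 1e-6) {
  cluster_scores <- vapply(
    split(y, cluster_ids, drop = TRUE),
    function(y_cluster) {
      # Central-difference approximation to this cluster's score
      (toy_loglik(log_lambda_hat + epsilon, y_cluster) -
         toy_loglik(log_lambda_hat - epsilon, y_cluster)) / (2 * epsilon)
    },
    numeric(1)
  )
  sum(cluster_scores^2) / hessian^2
}

set.seed(1)
dat <- data.frame(
  district = rep(c("A", "B"), each = 30),
  village  = rep(rep(1:3, each = 10), times = 2)
)

# Two-level clustering: one composite ID per district/village combination
composite_id <- interaction(
  dat[, c("district", "village")],
  drop = TRUE,
  sep = "_"
)

# A shared cluster effect induces within-cluster correlation
cluster_effect <- rnorm(nlevels(composite_id), sd = 0.5)
dat$y <- rpois(
  nrow(dat),
  lambda = exp(1 + cluster_effect[as.integer(composite_id)])
)

# For this toy model the MLE of log(lambda) is log(mean(y)), and the
# second derivative of the negative log-likelihood there is sum(y).
log_lambda_hat <- log(mean(dat$y))
hessian <- sum(dat$y)

var_naive  <- 1 / hessian
var_robust <- cluster_robust_var(dat$y, composite_id, log_lambda_hat, hessian)
```

Comparing `var_robust` with `var_naive` shows how much the clustering changes the variance of log(lambda) in this simulated data set. The toy model has a closed-form score (the sum of `y - mean(y)` within each cluster), which makes it easy to check the numerical derivative.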