Skip to content
Open
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
1c361e1
Initial plan
Copilot Jan 13, 2026
b1f321f
Add cluster-robust standard error support to est_seroincidence()
Copilot Jan 13, 2026
46173c1
Fix linting issues and finalize cluster-robust SE implementation
Copilot Jan 13, 2026
28b7b67
Refactor call to compute_cluster_robust_var
d-morrison Jan 14, 2026
44a6bbb
Merge branch 'main' into copilot/adjust-seroincidence-likelihood
d-morrison Jan 15, 2026
a19f661
Fix linting issues: replace base R messaging with cli functions
Copilot Jan 15, 2026
fa216ed
Add se_type column and compute_icc() function for clustering analysis
Copilot Jan 15, 2026
8567a5e
Fix est_seroincidence_by to support clustering and add comprehensive …
Copilot Jan 15, 2026
01a66de
Extend compute_icc to work with est_seroincidence_by and add antigen_…
Copilot Jan 16, 2026
b8a557c
Add min/max cluster sizes and support multi-level clustering
Copilot Jan 16, 2026
1c821c7
Merge branch 'main' into copilot/adjust-seroincidence-likelihood
d-morrison Jan 20, 2026
6034fcf
Address code review feedback: fix cli formatting, line lengths, use m…
Copilot Jan 20, 2026
5e153a3
Refactor clustering code: decompose into separate files per code orga…
Copilot Jan 20, 2026
d0fc3d1
Merge branch 'main' into copilot/adjust-seroincidence-likelihood
kaiemjoy Jan 21, 2026
11c3e00
Remove compute_icc() functionality per user request
Copilot Jan 27, 2026
81b14c3
Address code review feedback: remove ICC mentions, extract validation…
Copilot Jan 27, 2026
f7b3426
Add clustering documentation to vignettes and mark man/ as linguist-g…
Copilot Jan 27, 2026
33a6826
Increment version and perform quality control checks
Copilot Jan 27, 2026
449f29a
Update vignettes per code review: use actual SEES data variables with…
Copilot Jan 27, 2026
96c583d
Merge branch 'main' into copilot/adjust-seroincidence-likelihood
d-morrison Jan 27, 2026
c275206
Fix workflow failures: increment version and address linting issues
Copilot Jan 27, 2026
7f824b1
fix lints
d-morrison Jan 27, 2026
c4cb76b
Add version management guidelines to copilot-instructions.md
Copilot Jan 27, 2026
ab6426a
Merge branch 'main' into copilot/adjust-seroincidence-likelihood
d-morrison Jan 27, 2026
49b95e1
Fix vignette rendering error: filter noise params to Pakistan in clus…
Copilot Jan 27, 2026
3700ee6
Refactor methodology vignette: move cluster-robust SE content to subfile
Copilot Jan 27, 2026
e36a5ef
Improve cluster-robust SE documentation: add symbol definitions and m…
Copilot Jan 27, 2026
3fee00a
more
d-morrison Jan 28, 2026
e02d6e7
Enhance vignette: add cross-references and comparisons for clustering…
Copilot Jan 28, 2026
47aa916
Remove multi-level clustering example from enteric fever vignette
Copilot Jan 28, 2026
2cfeb4b
Update conclusions to include clustering findings and reflect actual …
Copilot Feb 6, 2026
10c99ae
Add cluster-robust CIs and p-value to conclusions with country compar…
Copilot Feb 6, 2026
31e636b
Expand cluster-robust comparison to include all 3 countries with pair…
Copilot Feb 6, 2026
581e276
Fix blank CI bounds in vignette conclusions by correcting column names
Copilot Feb 6, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
NEWS.md merge=union
man/* linguist-generated
68 changes: 68 additions & 0 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,26 @@ To regenerate:
rmarkdown::render("README.Rmd")
```

### Version Management

**CRITICAL**: Always ensure the development version in your PR branch is one version number higher than the main branch.

```r
# Check current version
desc::desc_get_version()

# Increment development version (use this for PRs)
usethis::use_version('dev')
```

**Version Check Workflow**: The `version-check.yaml` workflow will fail if your PR branch version is not higher than the main branch version. Before requesting PR review, always:

1. Check the current version on the main branch (look at DESCRIPTION on main)
2. Ensure your PR branch version is at least one development version higher
3. If main is at 1.4.0.9003, your PR should be at minimum 1.4.0.9004

**Why this matters**: This ensures proper version tracking and prevents conflicts when multiple PRs are merged.

### Package Checking

Run R CMD check to validate the package:
Expand Down Expand Up @@ -469,6 +489,54 @@ expect_false(has_missing_values(complete_data))
4. **Run tests**: Validate all tests pass, updating snapshots only when changes are intentional
5. **Review snapshots**: When snapshots change, review the diff to ensure changes are expected

## Code Organization Policies

**CRITICAL**: Follow these strict code organization policies for all new code and refactoring work:

### File Organization

1. **One function per file**: Each exported function and its associated S3 methods should be in its own file
- File name should match the function name (e.g., `summary.seroincidence.R` for `summary.seroincidence()`)
- S3 methods for the same generic can be in the same file (e.g., `compare_seroincidence.seroincidence()`, `compare_seroincidence.seroincidence.by()`, and `compare_seroincidence.default()` all in `compare_seroincidence.R`)

2. **Internal helper functions**: Move to separate files
- Use descriptive file names (e.g., `compute_cluster_robust_var.R` for `.compute_cluster_robust_var()`)
- Keep related internal functions together when logical
- Internal functions should use `.function_name()` naming convention

3. **Print methods**: Each print method in its own file
- File name: `print.{class_name}.R` (e.g., `print.seroincidence.R`)

4. **Extract anonymous functions**: Convert complex anonymous functions to named helper functions in separate files
- If an anonymous function is longer than ~5 lines, extract it
- Name should describe its purpose (e.g., `.helper_function_name()`)

### Example Organization

1. **Long examples**: Move to `inst/examples/exm-{function_name}.R`
- Use `@example inst/examples/exm-{function_name}.R` in roxygen documentation
- Keep inline `@examples` short (1-3 lines) for simple demonstrations

2. **Example file naming**: `exm-{function_name}.R`
- Example: `exm-est_seroincidence.R` for `est_seroincidence()` examples

### Benefits

- **Easier navigation**: Find functions quickly by file name
- **Better git history**: Changes to one function don't pollute history of unrelated functions
- **Clearer code review**: Reviewers can focus on individual functions
- **Reduced merge conflicts**: Multiple people can work on different functions simultaneously
- **Better organization**: Logical structure makes codebase more maintainable

### Migration Strategy

When refactoring existing code:
1. Extract functions to separate files
2. Update any internal calls if needed
3. Run `devtools::document()` to regenerate documentation
4. Run `devtools::check()` to ensure no breakage
5. Run tests to verify functionality unchanged

## Code Style Guidelines

- **Follow tidyverse style guide**: https://style.tidyverse.org
Expand Down
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: serocalculator
Title: Estimating Infection Rates from Serological Data
Version: 1.4.0.9003
Version: 1.4.0.9004
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version number has been incremented from 1.4.0.9003 to 1.4.0.9004, but according to the PR description, the version was bumped to 1.4.0.9002. This inconsistency suggests either the PR description is outdated or multiple version bumps have occurred. Please ensure version numbers are correctly documented and match the PR description.

Suggested change
Version: 1.4.0.9004
Version: 1.4.0.9002

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6034fcf. Version number corrected to 1.4.0.9002 to match the PR description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always make sure the dev version is one past the main branch. make a note about this in copilot-instructions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c4cb76b. Added "Version Management" section to copilot-instructions.md with clear guidelines:

  • Always ensure PR dev version is one higher than main branch
  • How to check and increment version using usethis::use_version('dev')
  • Explanation of why this matters for version-check.yaml workflow
  • Step-by-step process for version management before PR review

Authors@R: c(
person("Kristina", "Lai", , "kwlai@ucdavis.edu", role = c("aut", "cre")),
person("Chris", "Orwa", role = "aut"),
Expand Down Expand Up @@ -77,7 +77,7 @@ Encoding: UTF-8
Language: en-US
LazyData: true
NeedsCompilation: no
Roxygen: list(markdown = TRUE, roclets = c("collate", "rd", "namespace", "devtag::dev_roclet"))
Roxygen: list(markdown = TRUE, roclets = c("collate", "rd", "namespace"))
RoxygenNote: 7.3.3
Remotes:
d-morrison/snapr
33 changes: 33 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,42 @@
# serocalculator (development version)

## New features

* Added `cluster_var` and `stratum_var` parameters to `est_seroincidence()` and
`est_seroincidence_by()` to support cluster-robust standard error estimation.
When `cluster_var` is specified, `summary.seroincidence()` automatically computes
cluster-robust (sandwich) variance estimates to account for within-cluster
correlation in clustered sampling designs such as household or school-based surveys.
* `cluster_var` parameter now accepts multiple variables (e.g., `c("school", "classroom")`)
for multi-level clustered sampling designs. Cluster-robust standard errors will account
for all specified clustering levels.

## Bug fixes

* Fixed column naming issue in `summary.seroincidence()` where cluster-robust standard
errors caused `[]` notation in column names (`SE[,1]` instead of `SE`).
* Added `se_type` column to `summary.seroincidence()` output to clearly indicate whether
"standard" or "cluster-robust" standard errors are being used.
* Fixed `est_seroincidence_by()` to properly pass cluster and stratum variables through
to stratified analyses. Previously, these variables were dropped during data stratification,
causing errors when trying to use clustering with `est_seroincidence_by()`.

## Code organization

* Refactored clustering-related code following package organization policies:
- Moved `.compute_cluster_robust_var()` to `R/compute_cluster_robust_var.R`
- Each function now in its own file for better maintainability and git history
* Updated copilot-instructions.md with code organization policies

# serocalculator 1.4.0

## New features

* Added support for cluster-robust standard errors in `est_seroincidence()` through
new `cluster_var` and `stratum_var` parameters. When `cluster_var` is specified,
`summary.seroincidence()` automatically computes cluster-robust (sandwich) variance
estimates to account for within-cluster correlation in clustered sampling designs
such as household or school-based surveys.
* Added `compare_seroincidence()` function for statistical comparison of seroincidence rates
- Performs two-sample z-tests to compare seroincidence estimates
- Returns `htest` format when comparing two single estimates
Expand Down
119 changes: 119 additions & 0 deletions R/compute_cluster_robust_var.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
#' Compute cluster-robust variance for seroincidence estimates
#'
#' @description
#' Computes cluster-robust (sandwich) variance estimates for seroincidence
#' parameter estimates when data come from a clustered sampling design.
#' This adjusts the standard errors to account for within-cluster correlation.
#'
#' @param fit a `seroincidence` object from [est_seroincidence()]
#' @param cluster_var name(s) of the cluster variable(s) in the data.
#' Can be a single variable or vector of variables for multi-level clustering.
#' @param stratum_var optional name of the stratum variable
#'
#' @return variance of log(lambda) accounting for clustering
#' @keywords internal
#' @noRd
.compute_cluster_robust_var <- function(
fit,
cluster_var,
stratum_var = NULL) {
# Extract stored data (already split by antigen_iso)
pop_data_list <- attr(fit, "pop_data")
sr_params_list <- attr(fit, "sr_params")
noise_params_list <- attr(fit, "noise_params")
antigen_isos <- attr(fit, "antigen_isos")

# Get MLE estimate
log_lambda_mle <- fit$estimate

# Combine pop_data list back into a single data frame
# to get cluster info
pop_data_combined <- do.call(rbind, pop_data_list)

# Compute score (gradient) using numerical differentiation
# The score is the derivative of log-likelihood w.r.t. log(lambda)
epsilon <- 1e-6

# For each observation, compute the contribution to the score
# We need to identify which cluster each observation belongs to

# Handle multiple clustering levels by creating composite cluster ID
if (length(cluster_var) == 1) {
cluster_ids <- pop_data_combined[[cluster_var]]
} else {
# Create composite cluster ID from multiple variables
cluster_ids <- interaction(
pop_data_combined[, cluster_var, drop = FALSE],
drop = TRUE,
sep = "_"
)
}

# Get unique clusters
unique_clusters <- unique(cluster_ids)
n_clusters <- length(unique_clusters)

# Compute cluster-level scores
cluster_scores <- numeric(n_clusters)

for (i in seq_along(unique_clusters)) {
cluster_id <- unique_clusters[i]

# Get observations in this cluster
cluster_mask <- cluster_ids == cluster_id

# Create temporary pop_data with only this cluster
pop_data_cluster <- pop_data_combined[cluster_mask, , drop = FALSE]

# Split by antigen
pop_data_cluster_list <- split(
pop_data_cluster,
pop_data_cluster$antigen_iso
)

# Ensure all antigen_isos are represented
# (add empty data frames if missing)
for (ag in antigen_isos) {
if (!ag %in% names(pop_data_cluster_list)) {
# Create empty data frame with correct structure
pop_data_cluster_list[[ag]] <- pop_data_list[[ag]][0, , drop = FALSE]
}
}

# Compute log-likelihood for this cluster at MLE
ll_cluster_mle <- -(.nll(
log.lambda = log_lambda_mle,
pop_data = pop_data_cluster_list,
antigen_isos = antigen_isos,
curve_params = sr_params_list,
noise_params = noise_params_list,
verbose = FALSE
))

# Compute log-likelihood at MLE + epsilon
ll_cluster_plus <- -(.nll(
log.lambda = log_lambda_mle + epsilon,
pop_data = pop_data_cluster_list,
antigen_isos = antigen_isos,
curve_params = sr_params_list,
noise_params = noise_params_list,
verbose = FALSE
))

# Numerical derivative (score for this cluster)
cluster_scores[i] <- (ll_cluster_plus - ll_cluster_mle) / epsilon
}

# Compute B matrix (middle of sandwich)
# B = sum of outer products of cluster scores
b_matrix <- sum(cluster_scores^2) # nolint: object_name_linter

# Get Hessian (already computed by nlm)
h_matrix <- fit$hessian # nolint: object_name_linter

# Sandwich variance: V = H^(-1) * B * H^(-1)
# Since we have a scalar parameter, this simplifies to:
var_log_lambda_robust <- b_matrix / (h_matrix^2)

return(var_log_lambda_robust)
}
Loading
Loading