Add NHANES covariate sampling method #3

Merged
roninsightrx merged 13 commits into main from add-nhanes-sampling on Mar 11, 2026
@roninsightrx

Summary

  • Adds sample_covariates_nhanes() to sample clinical trial subjects from the NHANES database using the nhanesA package
  • Supports downloading and merging multiple NHANES tables (joined on SEQN), with conditional filtering and variable selection — the same interface as the existing bootstrap, mice, and mvtnorm methods
  • Optional probability-proportional sampling via NHANES survey weights (WTMEC2YR, use_weights = TRUE) for population-representative samples
  • Survey year "2017-2018" is the default; all cycles from 1999–2020 are supported
  • nhanesA added to Suggests (checked at runtime, not a hard dependency)
  • "nhanes" added to the method argument of the top-level sample_covariates() dispatcher
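
The join, filter, and weighted-sampling steps listed above can be sketched in base R. This is a minimal illustration on mock data, not the code from this PR: the data frames stand in for downloaded NHANES tables, the age filter is a made-up condition, and sampling with replacement is an assumption.

```r
# Mock stand-ins for two NHANES tables; real tables would come from nhanesA.
demo <- data.frame(
  SEQN     = 1:6,
  RIDAGEYR = c(25, 40, 61, 33, 52, 47),
  WTMEC2YR = c(1000, 3000, 500, 2000, 1500, 2500)
)
bmx <- data.frame(
  SEQN  = c(1L, 2L, 3L, 4L, 6L),
  BMXWT = c(70, 82, 65, 90, 77)
)

# Join on the respondent identifier SEQN (inner join: subjects in both tables).
merged <- merge(demo, bmx, by = "SEQN")

# Conditional filter (hypothetical): keep subjects aged 30 to 60.
pool <- merged[merged$RIDAGEYR >= 30 & merged$RIDAGEYR <= 60, ]

# Draw n subjects with probability proportional to the survey weight.
# Sampling with replacement is an assumption made for this sketch.
set.seed(1)
n   <- 10
idx <- sample(nrow(pool), size = n, replace = TRUE, prob = pool$WTMEC2YR)

# Drop SEQN and the weight column from the returned covariates.
out <- pool[idx, c("RIDAGEYR", "BMXWT")]
rownames(out) <- NULL
```

Passing the weights through `prob` makes heavily weighted respondents more likely to be drawn, which is what pushes the sample toward the population the survey represents rather than the set of people who were examined.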

Test plan

  • test-sample_covariates_nhanes.R covers: correct row count, SEQN dropped from output, variables selection, missing-variable error, single/multi-variable conditional filtering, empty-filter error, use_weights sampling, missing-weight error, unsupported year, single-table case, and dispatcher routing
  • All tests mock nhanesA::nhanes() so no network access is required during R CMD check
  • Run devtools::test() locally to confirm all tests pass

🤖 Generated with Claude Code

Adds `sample_covariates_nhanes()` to sample clinical trial subjects from
the NHANES database via the nhanesA package. Supports multi-table merging,
conditional filtering, variable selection, and optional probability-proportional
sampling using NHANES survey weights (WTMEC2YR).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a new NHANES-backed covariate sampling implementation to the existing sample_covariates() family, enabling sampling from downloaded/merged NHANES tables with optional survey-weighted sampling.

Changes:

  • Adds sample_covariates_nhanes() to download, join (by SEQN), filter, and sample from NHANES tables (optionally using WTMEC2YR weights).
  • Extends the sample_covariates() dispatcher to accept method = "nhanes".
  • Adds nhanesA to Suggests and introduces a dedicated test suite for NHANES sampling (with nhanesA::nhanes() mocked).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| R/sample_covariates_nhanes.R | Implements NHANES download/join/filter/sample logic and year-to-suffix mapping. |
| R/sample_covariates.R | Adds "nhanes" to the dispatcher's method argument and updates roxygen docs. |
| DESCRIPTION | Adds nhanesA to Suggests. |
| tests/testthat/test-sample_covariates_nhanes.R | Adds tests covering NHANES sampling behavior and dispatcher routing. |

Adds download_nhanes_cache() to pre-download NHANES tables as local RDS
files. sample_covariates_nhanes() gains a cache_dir parameter that loads
from those files instead of downloading, removing the need for nhanesA or
internet access at sampling time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
roninsightrx marked this pull request as draft March 6, 2026 07:43
roninsightrx and others added 8 commits March 9, 2026 15:17
On first load, .onLoad() downloads DEMO and BMX (2017-2018) into a cache
inside the package installation directory. sample_covariates_nhanes() now
defaults cache_dir to that location, so no internet access is needed for
subsequent calls. Cache misses fall back to nhanesA rather than erroring.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
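
The cache-then-fall-back behavior described in the two commits above might look roughly like the sketch below. The names `load_table()` and `fake_fetch()` are hypothetical, the one-file-per-table layout is an assumption, and the stand-in fetch function replaces the real nhanesA download so the example runs offline.

```r
# Hypothetical loader: read a cached RDS file if present, else fall back.
load_table <- function(name, cache_dir, fetch) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: no download needed
  }
  fetch(name)               # cache miss: fall back instead of erroring
}

# Fresh, empty cache directory for the demonstration.
cache_dir <- tempfile("nhanes_cache_")
dir.create(cache_dir)

# Stand-in for a network download; counts how often it is called.
calls <- 0
fake_fetch <- function(name) {
  calls <<- calls + 1
  data.frame(SEQN = 1:3, BMXWT = c(70, 82, 65))
}

# Pre-download step: store one table in the cache as an RDS file.
cached <- data.frame(SEQN = 1:3, RIDAGEYR = c(25, 40, 61))
saveRDS(cached, file.path(cache_dir, "DEMO_J.rds"))

hit  <- load_table("DEMO_J", cache_dir, fake_fetch)  # served from the cache
miss <- load_table("BMX_J",  cache_dir, fake_fetch)  # served by the fallback
```

Keeping the fallback as a function argument is a choice made for this sketch; it lets the network call be swapped out in tests without touching the loader.
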
download_nhanes_cache() now takes `groups` (DEMO/LAB/EXAM) instead of
individual table names, downloads all tables in each group via
nhanesTables(), and saves a single merged nhanes_<year>.rds per year.

sample_covariates_nhanes() replaces the `tables` argument with `covariates`
(column selection from the merged data) and loads the pre-merged RDS.
.onLoad() downloads DEMO+LAB+EXAM for 2017-2018 on first package load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lists key measurement variables from DEMO_J, BMX_J, BPX_J, GHB_J,
BIOPRO_J, CBC_J, TCHOL_J, HDL_J, TRIGLY_J, and ALB_CR_J with
descriptions sourced from the CDC NHANES 2017-2018 data documentation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allows users to map their own covariate names to NHANES variable names,
e.g. dictionary = list("WT" = "BMXWT", "AGE" = "RIDAGEYR"). Translation
applies to both covariates selection and conditional filtering keys;
output columns are returned under the user-defined names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
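
As a rough illustration of the dictionary translation described above, the sketch below maps user-facing names to NHANES names for both column selection and filter keys, then renames the output. The helper `to_nhanes()` and the min/max form of `conditional` are assumptions made for this example, not the package's confirmed interface.

```r
# Mock merged NHANES data.
nhanes <- data.frame(
  SEQN     = 1:4,
  BMXWT    = c(70, 82, 65, 90),
  RIDAGEYR = c(25, 40, 61, 33)
)
dictionary <- list("WT" = "BMXWT", "AGE" = "RIDAGEYR")

# Hypothetical helper: translate user names to NHANES names;
# names without a dictionary entry pass through unchanged.
to_nhanes <- function(x, dictionary) {
  mapped <- x %in% names(dictionary)
  if (any(mapped)) {
    x[mapped] <- unlist(dictionary[x[mapped]], use.names = FALSE)
  }
  x
}

covariates  <- c("WT", "AGE")
conditional <- list(AGE = c(30, 70))  # assumed filter form: c(min, max)

# Translate both the selected columns and the filter keys.
cols <- to_nhanes(covariates, dictionary)
names(conditional) <- to_nhanes(names(conditional), dictionary)

# Apply each range filter on the NHANES-named columns.
keep <- rep(TRUE, nrow(nhanes))
for (v in names(conditional)) {
  rng  <- conditional[[v]]
  keep <- keep & nhanes[[v]] >= rng[1] & nhanes[[v]] <= rng[2]
}

out <- nhanes[keep, cols, drop = FALSE]
names(out) <- covariates              # return columns under the user's names
```
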
…cking

- nhanes_temp_cache(): pass parent.frame() to local_tempdir() so the
  temp dir lives for the duration of the calling test, not just the helper
- Duplicate-SEQN test: expect the "No tables downloaded" warning that the
  function correctly emits when every table is skipped
- .onLoad tests: split local_mocked_bindings into two calls so that
  nhanes_default_cache_dir is mocked in irxforge (not nhanesA)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
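
The `parent.frame()` fix in the first bullet above can be illustrated with withr directly. In this sketch `make_cache_dir()` and `run_case()` are hypothetical names standing in for the test helper and a test body; only `withr::local_tempdir()` and its `.local_envir` argument are real API.

```r
library(withr)

# Helper that creates a temp dir scoped to its caller rather than to itself.
# Without passing the caller's environment, the directory would be removed
# as soon as the helper returned.
make_cache_dir <- function(env = parent.frame()) {
  force(env)  # resolve the caller's frame before handing it on
  local_tempdir(.local_envir = env)
}

# Stand-in for a test body that uses the helper.
run_case <- function() {
  dir <- make_cache_dir()
  alive_inside <- dir.exists(dir)  # checked while the caller is still running
  list(dir = dir, alive_inside = alive_inside)
}

res <- run_case()
alive_after <- dir.exists(res$dir)  # checked after run_case() has returned
```

The directory's cleanup is registered on the environment given in `.local_envir`, so its lifetime follows the calling function instead of the helper.
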
The test mapped "HT" -> "BMXHT" via dictionary but mock_nhanes did not
contain a BMXHT column, causing a "Covariates not found" error in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each function now accepts seed = NULL; when non-NULL, set.seed() is called
at the top of the function body so results are fully reproducible.
sample_covariates() also gains seed and passes it explicitly to the
dispatched child function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
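
A minimal sketch of the `seed = NULL` pattern described above, using hypothetical functions `sample_rows()` and `dispatch_sample()` in place of the package's samplers and dispatcher:

```r
# Child sampler: seeds the RNG only when a seed is supplied.
sample_rows <- function(data, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  data[sample(nrow(data), n, replace = TRUE), , drop = FALSE]
}

# Dispatcher: forwards the seed explicitly to the chosen child function.
dispatch_sample <- function(data, n, method = "bootstrap", seed = NULL) {
  switch(method,
    bootstrap = sample_rows(data, n, seed = seed),
    stop("Unknown method: ", method)
  )
}

pool <- data.frame(WT = c(70, 82, 65, 90), AGE = c(25, 40, 61, 33))

a <- dispatch_sample(pool, 5, seed = 42)
b <- dispatch_sample(pool, 5, seed = 42)
same <- identical(a, b)  # the same seed reproduces the same draw
```

Note that calling `set.seed()` inside a function changes the global RNG state for the rest of the session, which is a side effect callers may want to be aware of.
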
roninsightrx marked this pull request as ready for review March 10, 2026 04:04
roninsightrx and others added 3 commits March 9, 2026 21:15
Both sample_covariates_nhanes() and sample_covariates_bootstrap() now
accept rm.na = TRUE (default), which filters out rows containing NA in
any of the requested covariates before sampling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
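
The `rm.na` filter described above can be approximated in base R with `complete.cases()`. The function name `drop_incomplete()` is hypothetical; the point of the sketch is that only the requested covariates are checked for missing values.

```r
# Drop rows with NA in any requested covariate; other columns are ignored.
drop_incomplete <- function(data, covariates, rm.na = TRUE) {
  if (!rm.na) return(data)
  ok <- stats::complete.cases(data[, covariates, drop = FALSE])
  data[ok, , drop = FALSE]
}

pool <- data.frame(
  WT  = c(70, NA, 65, 90),
  AGE = c(25, 40, NA, 33),
  HT  = c(NA, 170, 160, 180)
)

# Row 1 has NA only in HT, which was not requested, so it is kept.
kept <- drop_incomplete(pool, c("WT", "AGE"))
all_rows <- drop_incomplete(pool, c("WT", "AGE"), rm.na = FALSE)
```
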
roninsightrx merged commit b2857e3 into main Mar 11, 2026
3 checks passed
roninsightrx deleted the add-nhanes-sampling branch March 11, 2026 04:23