Add NHANES covariate sampling method #3
Merged
roninsightrx merged 13 commits into main on Mar 11, 2026
Adds `sample_covariates_nhanes()` to sample clinical trial subjects from the NHANES database via the nhanesA package. Supports multi-table merging, conditional filtering, variable selection, and optional probability-proportional sampling using NHANES survey weights (WTMEC2YR). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
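The join, filter, and weighted-sampling steps described in this commit can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: `sample_weighted_sketch()` is a hypothetical helper, the two data frames are mock stand-ins for NHANES tables, the `c(min, max)` range format for `conditional` is assumed, and sampling with replacement and dropping `WTMEC2YR` from the output are also assumptions.

```r
# Mock stand-ins for two NHANES tables (real tables would come from nhanesA).
demo <- data.frame(
  SEQN     = 1:6,
  RIDAGEYR = c(25, 40, 67, 33, 52, 71),
  WTMEC2YR = c(1000, 5000, 2000, 8000, 500, 3000)
)
bmx <- data.frame(
  SEQN  = 1:6,
  BMXWT = c(70, 82, 65, 90, 77, 60)
)

# Hypothetical helper; not the implementation merged in this PR.
sample_weighted_sketch <- function(tables, n, conditional = NULL,
                                   use_weights = FALSE) {
  # Join all tables on the respondent identifier SEQN.
  merged <- Reduce(function(x, y) merge(x, y, by = "SEQN"), tables)

  # Assumed format: each entry of `conditional` is a c(min, max) range.
  for (v in names(conditional)) {
    rng <- conditional[[v]]
    keep <- merged[[v]] >= rng[1] & merged[[v]] <= rng[2]
    merged <- merged[keep, , drop = FALSE]
  }
  if (nrow(merged) == 0) stop("No rows left after filtering.")

  # Draw rows with probability proportional to the survey weight, if requested.
  prob <- if (use_weights) merged$WTMEC2YR else NULL
  idx <- sample(seq_len(nrow(merged)), size = n, replace = TRUE, prob = prob)

  # Drop the identifier (and, by assumption, the weight) from the output.
  out <- merged[idx, setdiff(names(merged), c("SEQN", "WTMEC2YR")), drop = FALSE]
  rownames(out) <- NULL
  out
}

set.seed(1)
subjects <- sample_weighted_sketch(
  list(demo, bmx),
  n = 10,
  conditional = list(RIDAGEYR = c(30, 70)),
  use_weights = TRUE
)
```

With `use_weights = TRUE`, respondents carrying larger survey weights are drawn more often, which is what makes the sample approximate the population rather than the survey participants.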
Pull request overview
Adds a new NHANES-backed covariate sampling implementation to the existing sample_covariates() family, enabling sampling from downloaded/merged NHANES tables with optional survey-weighted sampling.
Changes:
- Adds `sample_covariates_nhanes()` to download, join (by `SEQN`), filter, and sample from NHANES tables (optionally using `WTMEC2YR` weights).
- Extends the `sample_covariates()` dispatcher to accept `method = "nhanes"`.
- Adds `nhanesA` to `Suggests` and introduces a dedicated test suite for NHANES sampling (with `nhanesA::nhanes()` mocked).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| R/sample_covariates_nhanes.R | Implements NHANES download/join/filter/sample logic and year-to-suffix mapping. |
| R/sample_covariates.R | Adds "nhanes" to the dispatcher’s method argument and updates roxygen docs. |
| DESCRIPTION | Adds nhanesA to Suggests. |
| tests/testthat/test-sample_covariates_nhanes.R | Adds tests covering NHANES sampling behavior and dispatcher routing. |
Adds download_nhanes_cache() to pre-download NHANES tables as local RDS files. sample_covariates_nhanes() gains a cache_dir parameter that loads from those files instead of downloading, removing the need for nhanesA or internet access at sampling time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On first load, .onLoad() downloads DEMO and BMX (2017-2018) into a cache inside the package installation directory. sample_covariates_nhanes() now defaults cache_dir to that location, so no internet access is needed for subsequent calls. Cache misses fall back to nhanesA rather than erroring. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
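The cache-first lookup with a download fallback described in the two commits above might look roughly like the following. This is a sketch only: `load_cached_table()` and `fake_fetch()` are hypothetical names, the one-file-per-table layout is assumed, and writing the fetched table back to the cache on a miss is an assumption rather than documented behavior.

```r
# Hypothetical helper: read a table from an RDS cache, else call `fetch`.
load_cached_table <- function(name, cache_dir, fetch) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: no network access needed
  }
  tbl <- fetch(name)       # cache miss: fall back to the downloader
  # Assumption: store the result so the next call is a cache hit.
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(tbl, path)
  tbl
}

# Stand-in for a real download; counts how often it is called.
counter <- new.env()
counter$n <- 0
fake_fetch <- function(name) {
  counter$n <- counter$n + 1
  data.frame(SEQN = 1:3, BMXWT = c(70, 82, 65))
}

cache_dir <- file.path(tempdir(), "nhanes_cache_sketch")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

first  <- load_cached_table("BMX_J", cache_dir, fake_fetch)  # downloads
second <- load_cached_table("BMX_J", cache_dir, fake_fetch)  # reads the RDS
```

Passing the downloader in as `fetch` keeps the sketch testable without `nhanesA` or a network connection.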
download_nhanes_cache() now takes `groups` (DEMO/LAB/EXAM) instead of individual table names, downloads all tables in each group via nhanesTables(), and saves a single merged nhanes_<year>.rds per year. sample_covariates_nhanes() replaces the `tables` argument with `covariates` (column selection from the merged data) and loads the pre-merged RDS. .onLoad() downloads DEMO+LAB+EXAM for 2017-2018 on first package load. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lists key measurement variables from DEMO_J, BMX_J, BPX_J, GHB_J, BIOPRO_J, CBC_J, TCHOL_J, HDL_J, TRIGLY_J, and ALB_CR_J with descriptions sourced from the CDC NHANES 2017-2018 data documentation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allows users to map their own covariate names to NHANES variable names,
e.g. dictionary = list("WT" = "BMXWT", "AGE" = "RIDAGEYR"). Translation
applies to both covariates selection and conditional filtering keys;
output columns are returned under the user-defined names.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
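As an illustration of the name translation for covariate selection (not the merged implementation), a base R sketch follows. `apply_dictionary_sketch()` is a hypothetical helper, and passing names that are absent from the dictionary through unchanged is an assumption; translation of `conditional` keys is not shown.

```r
# Hypothetical helper: select NHANES columns via user-defined names.
apply_dictionary_sketch <- function(data, covariates, dictionary = NULL) {
  # Translate each requested name; names without a dictionary entry
  # are assumed to already be NHANES variable names.
  nhanes_names <- unname(vapply(
    covariates,
    function(x) if (!is.null(dictionary[[x]])) dictionary[[x]] else x,
    character(1)
  ))

  missing <- setdiff(nhanes_names, names(data))
  if (length(missing) > 0) {
    stop("Covariates not found: ", paste(missing, collapse = ", "))
  }

  # Return the selected columns under the user-defined names.
  out <- data[, nhanes_names, drop = FALSE]
  names(out) <- covariates
  out
}

# Mock merged NHANES data.
mock <- data.frame(
  BMXWT    = c(70, 82, 65),
  RIDAGEYR = c(25, 40, 67),
  BMXHT    = c(175, 180, 160)
)

res <- apply_dictionary_sketch(
  mock,
  covariates = c("WT", "AGE", "BMXHT"),
  dictionary = list("WT" = "BMXWT", "AGE" = "RIDAGEYR")
)
```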
…cking
- nhanes_temp_cache(): pass parent.frame() to local_tempdir() so the temp dir lives for the duration of the calling test, not just the helper
- Duplicate-SEQN test: expect the "No tables downloaded" warning that the function correctly emits when every table is skipped
- .onLoad tests: split local_mocked_bindings into two calls so that nhanes_default_cache_dir is mocked in irxforge (not nhanesA)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test mapped "HT" -> "BMXHT" via dictionary but mock_nhanes did not contain a BMXHT column, causing a "Covariates not found" error in CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each function now accepts seed = NULL; when non-NULL, set.seed() is called at the top of the function body so results are fully reproducible. sample_covariates() also gains seed and passes it explicitly to the dispatched child function. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
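The `seed = NULL` convention described here can be sketched as follows. Both functions are hypothetical stand-ins for the dispatcher and one child method, written only to show the pattern of calling `set.seed()` at the top and forwarding `seed` explicitly.

```r
# Hypothetical child sampler: seeds the RNG only when a seed is supplied.
sample_rows_sketch <- function(data, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  data[sample(nrow(data), size = n, replace = TRUE), , drop = FALSE]
}

# Hypothetical dispatcher: forwards `seed` explicitly to the child function.
sample_dispatch_sketch <- function(data, n, method = "bootstrap", seed = NULL) {
  method <- match.arg(method, c("bootstrap"))
  switch(method,
    bootstrap = sample_rows_sketch(data, n, seed = seed)
  )
}

pool <- data.frame(AGE = c(25, 40, 67, 33), WT = c(70, 82, 65, 90))

a <- sample_dispatch_sketch(pool, n = 5, seed = 42)
b <- sample_dispatch_sketch(pool, n = 5, seed = 42)
```

One consequence of this design is that a non-NULL `seed` resets the session's global RNG state, so later random draws in the same session are affected as well.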
Both sample_covariates_nhanes() and sample_covariates_bootstrap() now accept rm.na = TRUE (default), which filters out rows containing NA in any of the requested covariates before sampling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
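A minimal sketch of the `rm.na` filter, assuming it amounts to a complete-case check restricted to the requested covariates; `drop_na_rows_sketch()` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper: drop rows with NA in any requested covariate.
drop_na_rows_sketch <- function(data, covariates, rm.na = TRUE) {
  if (!rm.na) return(data)
  complete <- stats::complete.cases(data[, covariates, drop = FALSE])
  data[complete, , drop = FALSE]
}

pool <- data.frame(
  AGE = c(25, NA, 67, 33),
  WT  = c(70, 82, NA, 90),
  HT  = c(NA, 170, 165, 180)
)

# Only AGE and WT are requested, so the NA in HT (row 1) does not drop that row.
kept <- drop_na_rows_sketch(pool, covariates = c("AGE", "WT"))
```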
Summary
- Adds `sample_covariates_nhanes()` to sample clinical trial subjects from the NHANES database using the `nhanesA` package
- … `bootstrap`, `mice`, and `mvtnorm` methods
- Optional survey-weighted sampling (`WTMEC2YR`, `use_weights = TRUE`) for population-representative samples
- `"2017-2018"` is the default; all cycles from 1999–2020 are supported
- `nhanesA` added to `Suggests` (checked at runtime, not a hard dependency)
- `"nhanes"` added to the `method` argument of the top-level `sample_covariates()` dispatcher

Test plan
- `test-sample_covariates_nhanes.R` covers: correct row count, SEQN dropped from output, `variables` selection, missing-variable error, single/multi-variable `conditional` filtering, empty-filter error, `use_weights` sampling, missing-weight error, unsupported year, single-table case, and dispatcher routing
- Tests mock `nhanesA::nhanes()` so no network access is required during `R CMD check`
- Run `devtools::test()` locally to confirm all tests pass

🤖 Generated with Claude Code