Fix R/clean.R for surveys in which only age range is reported for participants#250
Fix R/clean.R for surveys in which only age range is reported for participants#250sbfnk merged 11 commits intoepiforecasts:mainfrom
Conversation
📝 WalkthroughWalkthroughAdds two participant age estimate columns to the cleaning pipeline and a unit test; updates AUTHORS metadata. The new columns Changes
Sequence Diagram(s)(omitted — changes are limited to adding columns and a test; no new multi-component control flow introduced) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
Pre-merge checks✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/clean.R(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
R/**/*.{R,r}
📄 CodeRabbit inference engine (CLAUDE.md)
Run
devtools::document()to regenerate documentation from roxygen comments before committing
Files:
R/clean.R
**/*.{R,r}
📄 CodeRabbit inference engine (CLAUDE.md)
Run
lintr::lint_package()to check for style issues and fix any in files you modified before committing
Files:
R/clean.R
🧠 Learnings (3)
📓 Common learnings
Learnt from: sbfnk
Repo: epiforecasts/socialmixr PR: 248
File: R/contact-matrix-utils.R:206-216
Timestamp: 2025-12-12T19:08:42.596Z
Learning: In the socialmixr package, the add_age() helper function in R/contact-matrix-utils.R intentionally overwrites the age column (e.g., part_age, cnt_age) with values from the exact age column (e.g., part_age_exact, cnt_age_exact) when the exact column exists. This overwriting behavior is by design.
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.
📚 Learning: 2025-09-02T05:04:15.939Z
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.
Applied to files:
R/clean.R
📚 Learning: 2025-12-12T19:08:42.596Z
Learnt from: sbfnk
Repo: epiforecasts/socialmixr PR: 248
File: R/contact-matrix-utils.R:206-216
Timestamp: 2025-12-12T19:08:42.596Z
Learning: In the socialmixr package, the add_age() helper function in R/contact-matrix-utils.R intentionally overwrites the age column (e.g., part_age, cnt_age) with values from the exact age column (e.g., part_age_exact, cnt_age_exact) when the exact column exists. This overwriting behavior is by design.
Applied to files:
R/clean.R
🪛 GitHub Actions: format-suggest.yaml
R/clean.R
[error] 136-139: Reviewdog detected issues with severity >= error in diff context (lines 136-139).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
Thanks @LloydChapman for catching this and suggesting a fix! A few things:
(e.g. in a new file test_that("clean() creates est_min/est_max for age ranges", {
survey <- polymod
# Replace exact ages with ranges
survey$participants$part_age <- "20-30"
survey$participants$part_age_exact <- NULL
cleaned <- clean(survey)
expect_true("part_age_est_min" %in% names(cleaned$participants))
expect_true("part_age_est_max" %in% names(cleaned$participants))
expect_equal(cleaned$participants$part_age_est_min[1], 20)
expect_equal(cleaned$participants$part_age_est_max[1], 30)
}) |
|
Thanks @sbfnk. Sure thing. I've made those changes. |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/testthat/test-clean.R (1)
1-13: Regression test successfully implements the requested validation.The test correctly verifies that
clean()createspart_age_est_minandpart_age_est_maxcolumns from age range strings, which addresses the regression test requirement from the PR review.One minor suggestion: Consider using
expect_equal()instead ofexpect_identical()on lines 11-12. The reviewer's example code usedexpect_equal(), which is more idiomatic for numeric comparisons in testthat and avoids potential type-mismatch issues between integer and numeric values.Optional: Use expect_equal for numeric comparisons
- expect_identical(cleaned$participants$part_age_est_min[1], 20) - expect_identical(cleaned$participants$part_age_est_max[1], 30) + expect_equal(cleaned$participants$part_age_est_min[1], 20) + expect_equal(cleaned$participants$part_age_est_max[1], 30)Optional: Additional test cases for broader coverage
If desired, you could extend coverage with:
- Different age range formats (e.g., "0-5", "65-100")
- Multiple participants with different ranges
- Edge cases (NA values, invalid ranges)
However, this basic regression test is sufficient for the immediate fix.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/testthat/test-clean.R(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.{R,r}
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.{R,r}: Rundevtools::test()to ensure all tests pass before committing
Add tests for bug fixes (regression tests) or new features where appropriate
Files:
tests/testthat/test-clean.R
**/*.{R,r}
📄 CodeRabbit inference engine (CLAUDE.md)
Run
lintr::lint_package()to check for style issues and fix any in files you modified before committing
Files:
tests/testthat/test-clean.R
🧠 Learnings (3)
📓 Common learnings
Learnt from: sbfnk
Repo: epiforecasts/socialmixr PR: 248
File: R/contact-matrix-utils.R:206-216
Timestamp: 2025-12-12T19:08:46.874Z
Learning: In the socialmixr package, the add_age() helper function in R/contact-matrix-utils.R intentionally overwrites the age column (e.g., part_age, cnt_age) with values from the exact age column (e.g., part_age_exact, cnt_age_exact) when the exact column exists. This overwriting behavior is by design.
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.
📚 Learning: 2025-12-12T19:13:48.046Z
Learnt from: CR
Repo: epiforecasts/socialmixr PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-12T19:13:48.046Z
Learning: Applies to tests/**/*.{R,r} : Add tests for bug fixes (regression tests) or new features where appropriate
Applied to files:
tests/testthat/test-clean.R
📚 Learning: 2025-09-02T05:04:15.939Z
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.
Applied to files:
tests/testthat/test-clean.R
Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Suggested fix for #249. This restores including
part_age_est_minandpart_age_est_maxinx$participantsinclean.contact_survey()in clean.R. The code in #249 for the CoMix survey with only participant age ranges reported will then work as the default value ofestimated.participant.age("mean") ensures that the mean ofpart_age_est_minandpart_age_est_maxis taken forpart_ageinsurvey$participantsin contact_matrix.Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.