Fix R/clean.R for surveys in which only age range is reported for participants#250

Merged
sbfnk merged 11 commits into epiforecasts:main from LloydChapman:fix-249
Jan 7, 2026

@LloydChapman LloydChapman commented Dec 13, 2025

Suggested fix for #249. This restores part_age_est_min and part_age_est_max in x$participants in clean.contact_survey() in clean.R. The code in #249 for the CoMix survey, which reports only participant age ranges, will then work, because the default value of estimated.participant.age ("mean") ensures that part_age in survey$participants is set to the mean of part_age_est_min and part_age_est_max in contact_matrix.
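
For illustration, here is a minimal base R sketch of the "mean" imputation described above. It is not the package's implementation: the data frame is made up, impute_age_mean() is a hypothetical helper, and only the column names are taken from this PR. The package may also round or coerce the result differently.

```r
# Hypothetical stand-in for survey$participants after cleaning; only the
# column names come from the PR description.
participants <- data.frame(
  part_id = 1:3,
  part_age_exact = c(34, NA, NA),
  part_age_est_min = c(NA, 20, 65),
  part_age_est_max = c(NA, 30, 70)
)

# Illustrative helper (an assumption, not socialmixr code): use the exact age
# where present, otherwise the midpoint of the estimated range.
impute_age_mean <- function(p) {
  midpoint <- (p$part_age_est_min + p$part_age_est_max) / 2
  p$part_age <- ifelse(is.na(p$part_age_exact), midpoint, p$part_age_exact)
  p
}

participants <- impute_age_mean(participants)
print(participants$part_age)
```

Without the est_min/est_max columns there is nothing to average, which is presumably why range-only surveys ended up with no usable part_age before this fix.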

Summary by CodeRabbit

  • New Features

    • Added estimated participant age-range fields: estimated minimum and estimated maximum alongside the existing exact age to improve reporting, display, and filtering.
  • Tests

    • Added a unit test verifying age-range strings are parsed into the new estimated min/max fields.
  • Chores

    • Updated package metadata to add a new contributor.


coderabbitai bot commented Dec 13, 2025

📝 Walkthrough

Adds two participant age estimate columns to the cleaning pipeline and a unit test; updates AUTHORS metadata. The new columns <participant.age.column>_est_min and <participant.age.column>_est_max are set from ..low and ..high before creating the _exact age column.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Age estimation column addition: R/clean.R | In clean.contact_survey, create <participant.age.column>_est_min (from ..low) and <participant.age.column>_est_max (from ..high) on x$participants; inserted after numeric/NA handling and before <participant.age.column>_exact creation. |
| Tests: tests/testthat/test-clean.R | Adds a unit test asserting that an age range string like "20-30" yields est_min = 20 and est_max = 30 for a participant after clean(). |
| Metadata: DESCRIPTION | Adds author Lloyd Chapman (ctb, email l.chapman4@lancaster.ac.uk) and adjusts Pietro Coletti's author entry punctuation. |
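
As a rough sketch of the range-splitting step summarised in the table, the following base R code turns "low-high" strings into numeric bounds. split_age_range() is a hypothetical helper and the regular expressions are assumptions; the actual code in R/clean.R is not shown in this conversation. How the package treats single values is also not stated here, so the sketch simply uses the value for both bounds.

```r
# Hypothetical helper (not the code in R/clean.R): split "low-high" strings
# into numeric lower and upper bounds.
split_age_range <- function(age) {
  age <- as.character(age)
  is_range <- grepl("^[0-9]+-[0-9]+$", age)
  # For ranges keep the part before/after the hyphen; otherwise keep the value.
  low <- ifelse(is_range, sub("-.*$", "", age), age)
  high <- ifelse(is_range, sub("^.*-", "", age), age)
  data.frame(
    est_min = suppressWarnings(as.numeric(low)),
    est_max = suppressWarnings(as.numeric(high))
  )
}

bounds <- split_age_range(c("20-30", "45", NA))
print(bounds)
```

Missing ages stay NA in both columns, and unparseable strings also become NA because of the numeric coercion.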

Sequence Diagram(s)

(omitted — changes are limited to adding columns and a test; no new multi-component control flow introduced)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I hopped through rows with careful zest,
Split ranges, set min and max to rest,
Columns tucked in tidy line,
Numbers snug and fields align,
A tiny hop — the table's blessed 🥕

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: handling surveys where only participant age ranges (not exact ages) are reported, and the fix to R/clean.R to support this scenario. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb9a952 and c8817ae.

📒 Files selected for processing (1)
  • tests/testthat/test-clean.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/testthat/test-clean.R

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0b456c and 74b9d7b.

📒 Files selected for processing (1)
  • R/clean.R (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
R/**/*.{R,r}

📄 CodeRabbit inference engine (CLAUDE.md)

Run devtools::document() to regenerate documentation from roxygen comments before committing

Files:

  • R/clean.R
**/*.{R,r}

📄 CodeRabbit inference engine (CLAUDE.md)

Run lintr::lint_package() to check for style issues and fix any in files you modified before committing

Files:

  • R/clean.R
🧠 Learnings (3)
📓 Common learnings
Learnt from: sbfnk
Repo: epiforecasts/socialmixr PR: 248
File: R/contact-matrix-utils.R:206-216
Timestamp: 2025-12-12T19:08:42.596Z
Learning: In the socialmixr package, the add_age() helper function in R/contact-matrix-utils.R intentionally overwrites the age column (e.g., part_age, cnt_age) with values from the exact age column (e.g., part_age_exact, cnt_age_exact) when the exact column exists. This overwriting behavior is by design.
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.
📚 Learning: 2025-09-02T05:04:15.939Z
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.

Applied to files:

  • R/clean.R
📚 Learning: 2025-12-12T19:08:42.596Z
Learnt from: sbfnk
Repo: epiforecasts/socialmixr PR: 248
File: R/contact-matrix-utils.R:206-216
Timestamp: 2025-12-12T19:08:42.596Z
Learning: In the socialmixr package, the add_age() helper function in R/contact-matrix-utils.R intentionally overwrites the age column (e.g., part_age, cnt_age) with values from the exact age column (e.g., part_age_exact, cnt_age_exact) when the exact column exists. This overwriting behavior is by design.

Applied to files:

  • R/clean.R
🪛 GitHub Actions: format-suggest.yaml
R/clean.R

[error] 136-139: Reviewdog detected issues with severity >= error in diff context (lines 136-139).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

sbfnk commented Dec 15, 2025

Thanks @LloydChapman for catching this and suggesting a fix! A few things:

  1. We get a lintr error about single vs. double quotes - could you fix this?
  2. Can you add yourself as a contributor (role = "ctb") to the DESCRIPTION file?
  3. Can we add a regression test? Here's a suggestion but happy for you to do something different.

(e.g. in a new file tests/testthat/test-clean.R)

  test_that("clean() creates est_min/est_max for age ranges", {
    survey <- polymod
    # Replace exact ages with ranges
    survey$participants$part_age <- "20-30"
    survey$participants$part_age_exact <- NULL

    cleaned <- clean(survey)

    expect_true("part_age_est_min" %in% names(cleaned$participants))
    expect_true("part_age_est_max" %in% names(cleaned$participants))
    expect_equal(cleaned$participants$part_age_est_min[1], 20)
    expect_equal(cleaned$participants$part_age_est_max[1], 30)
  })

LloydChapman commented

Thanks @sbfnk. Sure thing. I've made those changes.

LloydChapman and others added 4 commits December 18, 2025 14:15
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/testthat/test-clean.R (1)

1-13: Regression test successfully implements the requested validation.

The test correctly verifies that clean() creates part_age_est_min and part_age_est_max columns from age range strings, which addresses the regression test requirement from the PR review.

One minor suggestion: Consider using expect_equal() instead of expect_identical() on lines 11-12. The reviewer's example code used expect_equal(), which is more idiomatic for numeric comparisons in testthat and avoids potential type-mismatch issues between integer and numeric values.

Optional: Use expect_equal for numeric comparisons
-  expect_identical(cleaned$participants$part_age_est_min[1], 20)
-  expect_identical(cleaned$participants$part_age_est_max[1], 30)
+  expect_equal(cleaned$participants$part_age_est_min[1], 20)
+  expect_equal(cleaned$participants$part_age_est_max[1], 30)
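
To make the type-mismatch point concrete, here is a small base R analogue of the two comparison styles. It uses identical() and all.equal() rather than testthat itself, and whether clean() returns integers or doubles for these columns is not stated in this review, so the integer value is an assumption for illustration.

```r
# Base R analogue of the two comparison styles (not testthat itself).
x_int <- 20L   # an integer, as a parser might return (assumption)
x_dbl <- 20    # a double, as written in the test expectation

strict <- identical(x_int, x_dbl)            # type-sensitive comparison
lenient <- isTRUE(all.equal(x_int, x_dbl))   # value comparison with tolerance

print(c(strict = strict, lenient = lenient))
```

The strict comparison is FALSE because the storage types differ, while the lenient one is TRUE because the values match.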
Optional: Additional test cases for broader coverage

If desired, you could extend coverage with:

  • Different age range formats (e.g., "0-5", "65-100")
  • Multiple participants with different ranges
  • Edge cases (NA values, invalid ranges)

However, this basic regression test is sufficient for the immediate fix.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78579ad and fb9a952.

📒 Files selected for processing (1)
  • tests/testthat/test-clean.R (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.{R,r}

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.{R,r}: Run devtools::test() to ensure all tests pass before committing
Add tests for bug fixes (regression tests) or new features where appropriate

Files:

  • tests/testthat/test-clean.R
**/*.{R,r}

📄 CodeRabbit inference engine (CLAUDE.md)

Run lintr::lint_package() to check for style issues and fix any in files you modified before committing

Files:

  • tests/testthat/test-clean.R
🧠 Learnings (3)
📓 Common learnings
Learnt from: sbfnk
Repo: epiforecasts/socialmixr PR: 248
File: R/contact-matrix-utils.R:206-216
Timestamp: 2025-12-12T19:08:46.874Z
Learning: In the socialmixr package, the add_age() helper function in R/contact-matrix-utils.R intentionally overwrites the age column (e.g., part_age, cnt_age) with values from the exact age column (e.g., part_age_exact, cnt_age_exact) when the exact column exists. This overwriting behavior is by design.
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.
📚 Learning: 2025-12-12T19:13:48.046Z
Learnt from: CR
Repo: epiforecasts/socialmixr PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-12T19:13:48.046Z
Learning: Applies to tests/**/*.{R,r} : Add tests for bug fixes (regression tests) or new features where appropriate

Applied to files:

  • tests/testthat/test-clean.R
📚 Learning: 2025-09-02T05:04:15.939Z
Learnt from: njtierney
Repo: epiforecasts/socialmixr PR: 196
File: R/contact-matrix-utils.R:7-13
Timestamp: 2025-09-02T05:04:15.939Z
Learning: In the socialmixr package, functions like est_part_age_mean create new columns (part_age) by imputing from other existing columns (part_age_exact, part_age_est_min, part_age_est_max). The condition checks the source columns, not the target column being created.

Applied to files:

  • tests/testthat/test-clean.R

Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
@sbfnk sbfnk enabled auto-merge January 6, 2026 12:32
@sbfnk sbfnk added this pull request to the merge queue Jan 6, 2026
Merged via the queue into epiforecasts:main with commit 4ffd580 Jan 7, 2026
13 checks passed