Refactor contact_matrix() to use pipeline infrastructure #290
Conversation
Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behaviour can be configured in the review settings.

No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

This PR refactors the contact matrix computation pipeline by consolidating weight normalization logic into reusable helpers, replacing the S3 method …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Survey as Survey Data
    participant AgeGroup as assign_age_groups()
    participant Filter as apply_data_filter()
    participant Limits as agegroups_to_limits()
    participant Weigh as weigh()
    participant Normalise as normalise_weights()
    participant Compute as compute_matrix()
    Survey->>AgeGroup: survey with age data
    AgeGroup->>Filter: survey with age.group assigned
    Filter->>Limits: filtered survey
    Limits->>Weigh: age_limits derived
    Weigh->>Weigh: apply day-of-week weights
    Weigh->>Weigh: apply age weights
    Weigh->>Weigh: apply user-defined weights
    Weigh->>Normalise: weighted participants
    Normalise->>Normalise: truncate extreme weights (threshold)
    Normalise->>Normalise: re-normalise by age.group
    Normalise->>Compute: normalised weights
    Compute->>Compute: aggregate contacts by age groups
```
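The stages in the diagram can be exercised end-to-end with base-R stand-ins. Function names and column names mirror the PR's pipeline, but every implementation below is an illustrative assumption, not the package's code:

```r
# Toy run of the pipeline stages: assign age groups, weigh by day of week,
# normalise weights within age group, then aggregate contacts into a matrix.
participants <- data.frame(
  id        = 1:6,
  age       = c(3, 10, 25, 40, 8, 30),
  dayofweek = c(1, 6, 2, 0, 3, 4)  # 0 = Sunday, 6 = Saturday
)
contacts <- data.frame(
  part_id = c(1, 1, 2, 3, 4, 5, 6),
  cnt_age = c(5, 30, 9, 28, 45, 12, 33)
)
age_limits <- c(0, 15, 30)

# assign_age_groups() stand-in: bucket ages by the lower limits
bucket <- function(age) cut(age, c(age_limits, Inf), right = FALSE)
participants$age.group <- bucket(participants$age)

# weigh() stand-in: day-of-week weights (5/7 of the budget on weekdays)
is_weekday <- participants$dayofweek %in% 1:5
participants$weight <- ifelse(is_weekday, 5 / sum(is_weekday), 2 / sum(!is_weekday))

# normalise_weights() stand-in: rescale to mean 1 within each age group
participants$weight <- ave(participants$weight, participants$age.group,
                           FUN = function(w) w / mean(w))

# compute_matrix() stand-in: weighted contact counts by participant x contact group
contacts$part_group <- participants$age.group[match(contacts$part_id, participants$id)]
contacts$cnt_group  <- bucket(contacts$cnt_age)
contacts$w          <- participants$weight[match(contacts$part_id, participants$id)]
m <- tapply(contacts$w, list(contacts$part_group, contacts$cnt_group), sum)
```

Cells with no observed contacts come out as `NA` in this toy aggregation, which is also why the NA guards discussed below matter for the post-processing steps.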
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 5 passed checks
Force-pushed from 8453a30 to 8cab50c
Actionable comments posted: 5
🧹 Nitpick comments (11)
R/contact-matrix-utils.R (1)
**732-755:** `normalise_weights` duplicates logic in `truncate_renormalise_weights` and `participant_weights`.

`normalise_weights()` effectively subsumes the manual normalisation at line 784 and the `truncate_renormalise_weights()` call at line 787 in `participant_weights()`. Consider consolidating the legacy path to use `normalise_weights(participants, by = "age.group", threshold = weight_threshold)` and removing `truncate_renormalise_weights`.

♻️ Suggested consolidation in participant_weights()

```diff
-  # post-stratification weight standardisation: by age.group
-  participants[, weight := weight / sum(weight) * .N, by = age.group]
-
-  # option to truncate overall participant weights (if not NULL or NA)
-  participants <- truncate_renormalise_weights(participants, weight_threshold)
+  # post-stratification weight standardisation (with optional truncation)
+  participants <- normalise_weights(participants, by = "age.group", threshold = weight_threshold)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/contact-matrix-utils.R` around lines 732 - 755, normalise_weights duplicates the truncation and re-normalisation logic found in truncate_renormalise_weights and the manual renormalisation in participant_weights; replace the legacy path by calling normalise_weights(participants, by = "age.group", threshold = weight_threshold) from participant_weights (instead of the manual weight clipping and the truncate_renormalise_weights call) so the same by/threshold behavior is used, and then remove the now-redundant truncate_renormalise_weights function and the manual renormalisation block to avoid duplication.

tests/testthat/test-subset-contact-survey.R (1)
**43-47:** `expect_named` with exact field list is fragile if the `polymod` dataset gains new fields.

`expect_named(result, c("participants", "contacts", "reference"))` asserts an exact match. The test on lines 90–97 already verifies that extra fields are preserved after filtering, which implicitly tests the subsetting contract. A more forward-compatible assertion here would only check that the mandatory fields are present:

```diff
-  expect_named(result, c("participants", "contacts", "reference"))
+  expect_true(all(c("participants", "contacts", "reference") %in% names(result)))
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-subset-contact-survey.R` around lines 43 - 47, The test currently asserts an exact set of names with expect_named(result, c("participants", "contacts", "reference")), which is fragile; change the assertion to only require the mandatory fields be present after subsetting (for example assert that "participants", "contacts" and "reference" are all in names(result) or use a membership-based expectation) while leaving other fields allowable so the test continues to pass if polymod gains extra columns.

tests/testthat/test-weigh.R (1)
**193-204:** `# nolint end` placement is inside the `test_that` block, which looks like a formatting artefact.

The `# nolint start: nonportable_path_linter` on line 193 is there to suppress a false-positive on the `/` in "mismatched target/groups lengths". Ending the suppression on line 195 (inside the `{` block) is syntactically correct but visually inconsistent. Consider ending it on the same line as the start, or after the closing `}`:

♻️ Proposed cleanup

```diff
-# nolint start: nonportable_path_linter
-test_that("weigh() errors for mismatched target/groups lengths", {
-  # nolint end
+# nolint start: nonportable_path_linter
+test_that("weigh() errors for mismatched target/groups lengths", { # nolint end
   expect_error(
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-weigh.R` around lines 193 - 204, The inline nolint block around the test_that("weigh() errors for mismatched target/groups lengths", { ... }) is split oddly with "# nolint end" inside the test body; move the "# nolint end" so the suppression is visually consistent — either put it on the same line as the start comment (after "nonportable_path_linter") or place it immediately after the closing "}" of the test_that block; update the comment surrounding the test_that (not the test logic) so the nolint scope still covers the quoted string but no longer appears inside the test body.

tests/testthat/test-postprocess-matrix.R (1)
**106-111:** Indirect assertion in `per_capita()` test — consider verifying the actual formula.

`expect_true(all(pc$matrix < result_base$matrix))` only tests a proxy property (per-capita rates should be smaller) rather than the actual transformation. This is fragile: it passes vacuously for any matrix scaled down by a positive factor, and it would give a false positive if the implementation divided by the wrong population column. A stronger test would check the per-capita formula directly, e.g., that `pc$matrix` equals `result_base$matrix` divided by the resolved population size element-wise.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-postprocess-matrix.R` around lines 106 - 111, Update the test for per_capita() to assert the actual transformation rather than an indirect comparison: compute the expected per-capita matrix by dividing result_base$matrix element-wise by the population values resolved for each cell (using the same logic per_capita uses to map survey_pop/pop into a matrix of the same dimensions), then replace expect_true(all(pc$matrix < result_base$matrix)) with an equality check like expect_equal(pc$matrix, expected_matrix, tolerance = <small_tol>) so the test verifies pc$matrix == result_base$matrix / resolved_population element-wise (use pc, per_capita, result_base, and pop to locate relevant variables).

tests/testthat/test-compute-matrix.R (1)
**1-91:** Missing test coverage for the `weight_threshold` parameter.

`compute_matrix(weight_threshold = ...)` is a new parameter with non-trivial normalization semantics (cap + renormalize). There is no test that:

- Passes a non-`NULL` `weight_threshold` to `compute_matrix`, and
- Verifies the resulting matrix differs from the un-thresholded result in the expected direction.

This gap also means the normalisation inconsistency with `contact_matrix` noted in R/compute-matrix.R (global vs `by = "age.group"`) would not be caught by the test suite.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-compute-matrix.R` around lines 1 - 91, Add a unit test in tests/testthat/test-compute-matrix.R that calls compute_matrix(polymod_uk_grouped, weight_threshold = <small positive value>) and compares it to the baseline compute_matrix(polymod_uk_grouped) to ensure the thresholding behavior: assert that all entries in the thresholded matrix are less than or equal to the un-thresholded matrix and that at least one entry is strictly smaller (i.e. matrices are not identical). Also add an assertion that the thresholded result differs from contact_matrix(...) when using legacy weighting settings to catch the global vs by="age.group" normalization inconsistency; reference compute_matrix, weight_threshold and contact_matrix to locate the relevant logic.

R/weigh.R (2)
**106-140:** NA / unmatched participants receive `sum(target) / n_total` — clarify intent with a comment.

When `group_idx` is NA (either from NA column values or values not matching any group), the weight factor is `sum(target) / n_total` (line 134). This means NAs contribute additional total weight beyond the target budget (matched groups sum to `sum(target)`, NAs add more). If the intent is that NAs receive a "neutral average" without inflating total weight, the divisor should arguably be the total number of participants rather than matched-only. Currently it is `n_total` (all participants), which is reasonable but subtly doesn't preserve the total weight budget.

A brief inline comment explaining the design choice (per the bug fix: "NA dayofweek now receive an average weight across all days") would help future readers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/weigh.R` around lines 106 - 140, The branch that assigns weight_factor for NA group_idx values (in the weight_factor calculation using is.na(group_idx) => sum(target) / n_total) should be annotated with a concise inline comment clarifying the design choice: explain that unmatched/NA participants receive the average weight computed as sum(target) divided by n_total (all participants) so they get a "neutral average" and this may not preserve the original target budget exactly; reference the variables group_idx, weight_factor, sum(target), n_total and the update participants[, weight := weight * weight_factor] so future readers understand why unmatched NAs are handled this way.
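The budget point above can be made concrete with toy numbers (values invented; the names `target` and `n_total` follow the review comment):

```r
# Matched groups already receive the full target budget, so NA participants
# weighted at sum(target) / n_total add weight on top of it.
target   <- c(weekday = 5, weekend = 2)  # target weight budget: 7
n_total  <- 10                           # all participants, incl. 2 with NA dayofweek

matched_weight  <- sum(target)           # weight allocated to matched groups: 7
na_weight_each  <- sum(target) / n_total # 0.7 per NA participant
total_weight    <- matched_weight + 2 * na_weight_each  # 8.4 > 7: budget inflated
```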
**143-167:** `table()` excludes NAs by default — `matched_counts` will be NA for NA column values.

On line 145, `table(col_vals)` omits `NA` entries by default. So when `col_vals` contains NAs, `val_counts["NA"]` or `val_counts[NA]` yields `NA`, and `matched_counts` becomes `NA` for those rows. Combined with `matched_target` also being `NA` (since `target[NA]` = `NA`), the `ifelse` on lines 159-163 correctly yields `NA_real_`. The end result is `weight * NA = NA`.

This is consistent behavior (unmatched → NA weight), but it means participants with NA in the `by` column silently get `NA` weight. If that's intended, it's fine; if not, consider `table(col_vals, useNA = "ifany")` and an explicit handling path.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/weigh.R` around lines 143 - 167, In weigh_named, table(col_vals) drops NA counts which causes matched_counts to be NA for rows where by is NA; update val_counts to use table(col_vals, useNA = "ifany") and then add explicit handling for NA keys: compute matched_counts so NA entries get a numeric count (or treat them separately) and decide whether to assign a default weight or keep NA, then apply that logic before computing weight_factor and updating participants[, weight := ...] to avoid silently turning weights into NA for NA group values.

R/subset-contact-survey.R (3)
**96-106:** Duplicate error message is fine but could reference a shared helper string.

The error at lines 100-105 is identical to lines 85-89. Both are valid but cover different cross-table scenarios (same column in both tables vs. different columns from each table). This is fine functionally. If you want to DRY it up later, a shared message template would suffice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/subset-contact-survey.R` around lines 96 - 106, The cli::cli_abort call that emits the cross-table error message is duplicated; extract the shared message into a single helper string and reuse it in both places to DRY up the code: create a descriptive variable (e.g. cross_table_error_msg) near where found_in_part and found_in_cont are checked, build the template once (including {.val {part_cols}} and {.val {cont_cols}} placeholders), and replace the duplicated cli::cli_abort(...) invocations to call cli::cli_abort(cross_table_error_msg) so both branches reference the same message text.
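A shared template could look like the sketch below; the message text and the helper name are made up for illustration, and plain `sprintf` stands in for cli's `{.val}` interpolation:

```r
# Build the cross-table error message once, reuse it in both branches.
cross_table_error_msg <- paste(
  "Cannot filter on columns from both the participants table (%s)",
  "and the contacts table (%s) in a single subset call."
)

make_cross_table_error <- function(part_cols, cont_cols) {
  sprintf(cross_table_error_msg, toString(part_cols), toString(cont_cols))
}

msg <- make_cross_table_error("age.group", "cnt_age")
```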
**56-73:** Numeric indexing (e.g. `survey[1:5]`) is silently caught, but symbols in `i` could still mislead.

`all.vars(quote(1:5))` returns `character(0)`, which correctly hits the abort. Good guard.

However, if a user writes `survey[n]` where `n` is a variable in the calling scope (intending numeric subsetting), `all.vars` returns `"n"`, it won't be found in either table, and the function silently returns the unfiltered survey with only a warning about an unknown column. This could be confusing compared to the explicit abort for literal indexing. Consider whether the `!found_in_part && !found_in_cont` path (lines 108-116) should also abort instead of returning the unfiltered survey, or at least make the warning more prominent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/subset-contact-survey.R` around lines 56 - 73, The subset method `[.contact_survey` currently treats symbol names in i (derived via ref_vars) that are not found in participants or contacts by returning the unfiltered survey with a warning; this can be confusing for cases like survey[n] where n is a caller variable. Update the logic after computing ref_vars and checking found_in_part / found_in_cont (the branch that currently issues a warning when !found_in_part && !found_in_cont) to instead abort with cli::cli_abort giving a clear error that the referenced symbol is not a column in participants or contacts, or at minimum escalate the warning to an abort; ensure the message references the offending symbol(s) and keeps the existing behavior for literal numeric/logical indexing handled earlier (assemble_survey, copy(x$participants), copy(x$contacts)).
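The distinction the comment draws can be checked directly with plain base-R metaprogramming calls:

```r
# all.vars extracts the symbols referenced in an unevaluated expression.
all.vars(quote(1:5))                           # literal indexing: no symbols
all.vars(quote(n))                             # a bare symbol looks like a column name
all.vars(quote(country == "United Kingdom"))   # a genuine filter column
```

So a guard based on `length(all.vars(i)) == 0` catches `survey[1:5]` but cannot tell a caller's variable `n` apart from a would-be column name, which is exactly the ambiguity the comment describes.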
**28-32:** `copy.contact_survey` does not deep-copy the input before mutating it.

The function modifies `x` directly (`x$participants <- copy(…)`) and returns it. In R, list modification triggers copy-on-modify so the caller's original is safe. However, `data.table::copy` generic's documented contract is to return a new deep copy without side-effects on the input. For clarity and to match the documented "deep copy" semantics, consider copying `x` itself first (analogous to how `[.contact_survey` and `weigh` do it):

♻️ Suggested improvement

```diff
 copy.contact_survey <- function(x) {
+  result <- x
+  result$participants <- copy(x$participants)
+  result$contacts <- copy(x$contacts)
+  result
-  x$participants <- copy(x$participants)
-  x$contacts <- copy(x$contacts)
-  x
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/subset-contact-survey.R` around lines 28 - 32, The function copy.contact_survey currently mutates the input by assigning to x$participants and x$contacts; change it to first make a full deep copy of the input (e.g., new_x <- copy(x)) and then modify new_x$participants and new_x$contacts before returning new_x so the original x is never mutated and the function adheres to the deep-copy contract of copy().

R/postprocess-matrix.R (1)
**69-74:** Consider extracting the repeated input validation into a shared helper.

All three exported functions (`symmetrise`, `split_matrix`, `per_capita`) have identical validation blocks checking for `x$matrix` and `x$participants`. A small `check_compute_matrix_result(x)` helper would DRY this up and ensure consistent messaging.

♻️ Example helper

```r
# In an internal utilities file:
check_compute_matrix_result <- function(x, call = rlang::caller_env()) {
  if (!is.list(x) || is.null(x$matrix) || is.null(x$participants)) {
    cli::cli_abort(
      "{.arg x} must be a list with elements {.val matrix} and \\
      {.val participants}, as returned by {.fn compute_matrix}.",
      call = call
    )
  }
}
```

Also applies to: 131-136, 192-198
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/postprocess-matrix.R` around lines 69 - 74, Extract the duplicated validation into a single internal helper (e.g. check_compute_matrix_result) and call it from symmetrise, split_matrix, and per_capita instead of repeating the block; implement check_compute_matrix_result(x, call = rlang::caller_env()) to perform the existing is.list/NULL checks and call cli::cli_abort with the same message (passing through call), put the helper in your internal utilities file (not exported) and replace the three validation blocks at the locations currently in symmetrise, split_matrix, and per_capita with a single call to check_compute_matrix_result(x).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@NAMESPACE`:
- Line 6: The package is missing the S3 generic for copy so copy.contact_survey
is never dispatched and R falls back to data.table::copy; add a proper S3
generic definition (e.g., a function named copy that calls UseMethod("copy"))
and export it so calls like compute_matrix's call to copy(survey) dispatch to
copy.contact_survey; ensure the generic's name matches the S3method entry and
that copy.contact_survey remains exported/registered so the deep-copy behavior
for contact_survey is invoked instead of data.table::copy.
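A minimal, self-contained illustration of the dispatch issue and the suggested local generic; all definitions below are illustrative stand-ins, not the package's real code:

```r
# data.table::copy is a plain function, not an S3 generic, so a registered
# copy.contact_survey method is never reached unless the package defines its
# own generic that calls UseMethod.
copy <- function(x, ...) UseMethod("copy")   # local S3 generic
copy.default <- function(x, ...) x           # fallback for everything else

copy.contact_survey <- function(x, ...) {
  # a real method would deep-copy the participants/contacts tables here
  attr(x, "copied") <- TRUE                  # marker to prove dispatch happened
  x
}

survey <- structure(
  list(participants = data.frame(id = 1)),
  class = "contact_survey"
)
copied <- copy(survey)  # now dispatches to copy.contact_survey
```

Without the local generic, `copy(survey)` would resolve to `data.table::copy` and the class-specific method would silently never run, which is the bug described above.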
In `@NEWS.md`:
- Around line 32-33: Change the American spelling "normalization" to the British
spelling "normalisation" in the NEWS entry that documents compute_matrix()
gaining a weight_threshold parameter; ensure the wording matches existing
British-spelled functions like normalise_weights() and symmetrise(), and update
the sentence mentioning contact_matrix() to use "normalisation" for consistency.
In `@R/compute-matrix.R`:
- Around line 27-31: The defaults for weight_threshold are inconsistent:
compute_matrix uses weight_threshold = NULL while contact_matrix uses
weight_threshold = NA; update contact_matrix to use weight_threshold = NULL so
both APIs match and rely on normalise_weights's guard (normalise_weights checks
if (!is.null(threshold) && !is.na(threshold))). Locate the contact_matrix
function and change its default to NULL, and run tests to ensure no behavior
changes in normalise_weights, compute_matrix, or any callers.
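For illustration, here is a minimal base-R sketch of a NULL/NA guard plus cap-and-renormalise truncation matching the semantics described above; the exact behaviour is an assumption, not the package's implementation:

```r
# Cap weights at `threshold` (after standardising to mean 1), then
# re-normalise so the mean stays 1. NULL and NA both skip truncation,
# mirroring the !is.null(threshold) && !is.na(threshold) guard.
truncate_weights <- function(w, threshold = NULL) {
  w <- w / mean(w)                      # standardise weights to mean 1
  if (!is.null(threshold) && !is.na(threshold)) {
    w <- pmin(w, threshold)             # cap extreme weights
    w <- w / mean(w)                    # re-normalise back to mean 1
  }
  w
}

w <- c(0.5, 0.8, 1.0, 6.0)              # one extreme weight
w_null <- truncate_weights(w, threshold = NULL)  # guard: no truncation
w_na   <- truncate_weights(w, threshold = NA)    # guard: no truncation either
w_cap  <- truncate_weights(w, threshold = 2)     # capped, then renormalised
```

Note the short-circuit: `!is.null(NULL)` is already `FALSE`, so `is.na(NULL)` (which returns `logical(0)`) is never evaluated — which is why the guard works for both defaults.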
In `@R/contact_matrix.R`:
- Around line 308-327: When weigh_dayofweek is TRUE but survey$participants
lacks a dayofweek column, ensure you still create the is.weekday column so later
code (e.g., return_part_weights logic that uses is.weekday in the by= clause
when weigh_age/return_part_weights conditions are met) doesn't fail; update the
else branch that currently only cli_warns to add survey$participants[,
is.weekday := FALSE] (or another sensible default) so is.weekday always exists
regardless of whether dayofweek was present.
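A hedged sketch of the suggested fallback: always create `is.weekday` so later grouping code can rely on the column existing. A plain `data.frame` stands in for the package's data.table, and the warning text is invented:

```r
participants <- data.frame(part_id = 1:4)  # survey lacking a dayofweek column

if ("dayofweek" %in% names(participants)) {
  participants$is.weekday <- participants$dayofweek %in% 1:5
} else {
  warning("no 'dayofweek' column found; day-of-week weighting skipped")
  participants$is.weekday <- FALSE  # default so downstream grouping still works
}
```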
In `@R/postprocess-matrix.R`:
- Around line 192-212: per_capita lacks the NA guard used in symmetrise and
split_matrix which causes matrix_per_capita to propagate NAs; before calling
matrix_per_capita inside per_capita, check x$matrix for missing values (e.g.
anyNA(x$matrix)) and call cli::cli_abort with a clear message if NAs are present
so behavior matches symmetrise/split_matrix. Ensure the check occurs after
validating x and before assigning x$matrix <- matrix_per_capita(...).
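The NA-propagation behaviour behind this comment is easy to reproduce with a toy matrix (numbers invented; `sweep` stands in for the package's per-capita division):

```r
m   <- matrix(c(4, NA, 1, 3), nrow = 2)  # one missing contact-rate entry
pop <- c(100, 50)                        # population per (column) age group

pc <- sweep(m, 2, pop, "/")              # divide each column by its population

anyNA(m)   # the condition a pre-computation guard would check
anyNA(pc)  # without a guard, the NA silently flows into the per-capita matrix
```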
Force-pushed from 8cab50c to d3345eb
Actionable comments posted: 2
🧹 Nitpick comments (2)
R/compute-matrix.R (1)
**62-62:** `normalise_weights` return value is silently discarded.

`normalise_weights` modifies `survey$participants` in-place via `:=`, so discarding the return value is functionally correct. Consider adding an invisible-return convention (`invisible(NULL)`) or updating callers to use `survey$participants <- normalise_weights(...)` for consistency with how `weight_by_age` and similar helpers are called elsewhere in the file.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/compute-matrix.R` at line 62, normalise_weights is mutating survey$participants in-place but its return is ignored; update normalise_weights to return the modified data frame invisibly (e.g., return(invisible(participants)) so callers can continue to ignore the value but get a usable result if needed) or change this call to follow the pattern used by weight_by_age and similar helpers by assigning the result back (survey$participants <- normalise_weights(survey$participants, threshold = weight_threshold)); pick one approach and apply consistently across callers and tests, referencing the normalise_weights function and the survey$participants usage in compute-matrix.R when making the change.

R/contact-matrix-utils.R (1)
**718-755:** `truncate_renormalise_weights` is dead code with no callers — remove it and the hardcoded `normalise_weights` variant.

`truncate_renormalise_weights` (lines 748–755) has no external callers and duplicates the threshold-capping-and-renormalize logic already present in `normalise_weights` (lines 732–745). While `normalise_weights` offers a more flexible `by` parameter, `truncate_renormalise_weights` hardcodes `by = age.group` and is unused. Remove it. Additionally, `participant_weights` (line 758) and `return_participant_weights` (line 1199) also have no external callers and should be reviewed for removal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/contact-matrix-utils.R` around lines 718 - 755, Remove the dead, duplicated truncation helper and hardcoded variant: delete the truncate_renormalise_weights function and the hardcoded variant of normalise_weights that uses age.group only, and consolidate logic into the single flexible normalise_weights(participants, by = "age.group", threshold = NULL) which already handles capping and re-normalising; also search for and remove unused functions participant_weights and return_participant_weights (or confirm callers and refactor if still needed) so there are no duplicate/unused weight-normalisation implementations left in the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@NEWS.md`:
- Around line 20-27: The example uses the native pipe placeholder `_` (polymod
|> _[country == "United Kingdom"] |> ...) which requires R ≥ 4.2.0 and will
parse-fail on older R versions; update the NEWS example by either adding a short
note stating "requires R ≥ 4.2.0" or replace the placeholder pipe expression
with the equivalent non-placeholder form (e.g. use polymod[country == "United
Kingdom"] before piping into assign_age_groups, compute_matrix, and symmetrise
with pop) so users on older R versions can copy-paste without error.
In `@R/compute-matrix.R`:
- Line 33: The call copy(survey) uses data.table::copy (not an S3 generic) so
the registered S3method(copy, contact_survey) never dispatches; either remove
the ineffective S3 registration from NAMESPACE or define a local S3 generic copy
<- function(x, ...) UseMethod("copy") so copy.contact_survey (defined in
R/subset-contact-survey.R) is invoked—update the package NAMESPACE to match the
chosen approach and ensure any calls to copy(...) within the package use the
local generic (or stop registering the S3method if you keep using
data.table::copy).
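The second option described above can be sketched like this (the `copy.contact_survey` body is illustrative only; the real method lives in R/subset-contact-survey.R):

```r
# Local S3 generic: calls to copy() inside the package now dispatch on class
# instead of going straight to data.table::copy()
copy <- function(x, ...) UseMethod("copy")

# Fallback for non-survey objects delegates to data.table explicitly
copy.default <- function(x, ...) data.table::copy(x)

# Deep copy for contact_survey objects (illustrative body only)
copy.contact_survey <- function(x, ...) {
  out <- lapply(x, data.table::copy)  # copy each data.table component
  class(out) <- class(x)
  out
}
```

With this approach the NAMESPACE must carry `export(copy)` plus `S3method(copy,default)` and `S3method(copy,contact_survey)`, as later review comments in this thread spell out.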
---
Duplicate comments:
In `@NEWS.md`:
- Around line 32-33: The NEWS.md entry uses American spelling "normalization"
but the codebase uses British spelling; update the line describing
compute_matrix() (mentioning compute_matrix() and weight_threshold) to replace
"normalization" with "normalisation" so it matches other identifiers like
normalise_weights and normalise_weighted_matrix; ensure the rest of the sentence
remains unchanged and run a quick search to confirm no other occurrences in
NEWS.md use the American variant.
In `@R/compute-matrix.R`:
- Around line 27-31: The default for weight_threshold in compute_matrix is
inconsistent with contact_matrix (NULL vs NA); change compute_matrix's signature
to use weight_threshold = NA so both APIs match, and ensure any internal checks
relying on !is.null(threshold) && !is.na(threshold) still work with the new
default; update references to compute_matrix (and its Roxygen/docs/tests if
present) to reflect the NA default, and verify compatibility with
normalise_weights and truncate_renormalise_weights which already handle NULL/NA
guards.
In `@R/contact_matrix.R`:
- Around line 308-327: When weigh_dayofweek is TRUE but survey$participants
lacks a dayofweek column, add an is.weekday logical column to avoid later
data.table grouping errors in return_part_weights; update the else branch that
currently only issues a cli_warn to set survey$participants[, is.weekday :=
FALSE] (or NA if you prefer unknown) so the is.weekday symbol always exists for
subsequent grouping operations.
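A sketch of the guarded branch this comment asks for (the names `weigh_dayofweek`, `survey$participants`, and `is.weekday` come from the comment; the warning text is invented):

```r
if (weigh_dayofweek) {
  if ("dayofweek" %in% colnames(survey$participants)) {
    survey$participants[, is.weekday := dayofweek %in% 1:5]
  } else {
    cli::cli_warn(
      "No 'dayofweek' column in participant data; day-of-week weights not applied."
    )
    # Create the column anyway so later grouping on is.weekday cannot fail
    survey$participants[, is.weekday := FALSE]
  }
}
```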
---
Nitpick comments:
In `@R/compute-matrix.R`:
- Line 62: normalise_weights is mutating survey$participants in-place but its
return is ignored; update normalise_weights to return the modified data frame
invisibly (e.g., return(invisible(participants)) so callers can continue to
ignore the value but get a usable result if needed) or change this call to
follow the pattern used by weight_by_age and similar helpers by assigning the
result back (survey$participants <- normalise_weights(survey$participants,
threshold = weight_threshold)); pick one approach and apply consistently across
callers and tests, referencing the normalise_weights function and the
survey$participants usage in compute-matrix.R when making the change.
In `@R/contact-matrix-utils.R`:
- Around line 718-755: Remove the dead, duplicated truncation helper and
hardcoded variant: delete the truncate_renormalise_weights function and the
hardcoded variant of normalise_weights that uses age.group only, and consolidate
logic into the single flexible normalise_weights(participants, by = "age.group",
threshold = NULL) which already handles capping and re-normalising; also search
for and remove unused functions participant_weights and
return_participant_weights (or confirm callers and refactor if still needed) so
there are no duplicate/unused weight-normalisation implementations left in the
file.
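Taking the two nitpicks together, a consolidated `normalise_weights()` might look like this (a sketch assuming a `weight` column and data.table semantics, not the package's actual implementation):

```r
normalise_weights <- function(participants, by = "age.group", threshold = NA) {
  if (!is.null(threshold) && !is.na(threshold)) {
    # Cap extreme weights before renormalising
    participants[weight > threshold, weight := threshold]
  }
  # Rescale so weights average to 1 within each grouping level
  participants[, weight := weight / mean(weight), by = by]
  # data.table modifies in place, but return the table invisibly so callers
  # can also use the assignment pattern:
  #   survey$participants <- normalise_weights(survey$participants, threshold = 3)
  invisible(participants)
}
```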
- Use British spelling 'normalisation' in NEWS.md
- Set weigh_dayofweek to FALSE when dayofweek column is missing
- Add NA guard to per_capita() for consistency with symmetrise/split_matrix

Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
- Define copy() S3 generic so copy.contact_survey dispatches correctly
- Add copy.default method that falls back to data.table::copy
- Remove unused participant_weights() and truncate_renormalise_weights()
- Return invisible(participants) from normalise_weights()

Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
NAMESPACE (1)
42-42: ⚠️ Potential issue | 🟠 Major
Remove all stale `@importFrom data.table copy` annotations — not just the one in `[.contact_survey`.
With `export(copy)` added (line 15), the local `copy` generic masks the `data.table::copy` import, creating a NAMESPACE conflict. However, the problem is broader than the review comment indicates: `@importFrom data.table copy` appears in five files (not just `subset-contact-survey.R`):
- `R/subset-contact-survey.R:64` (on `[.contact_survey`)
- `R/weigh.R:43` (on `weigh`)
- `R/contact_matrix.R:35` (on `contact_matrix`)
- `R/compute-matrix.R:24` (on `compute_matrix`)
- `R/deprecated.R:63` (on `.get_survey`)

All bare `copy()` calls in these functions dispatch through the package's generic, which routes to `copy.default` or `copy.contact_survey` — both call `data.table::copy()` explicitly via the `::` operator. None of these annotations are needed. Removing only the one from `subset-contact-survey.R` will not eliminate the NAMESPACE entry; all five must be removed to clear the conflict entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@NAMESPACE` at line 42, Remove every stale `@importFrom` data.table copy annotation from the package sources — not just in subset-contact-survey.R — so the local export(copy) generic no longer conflicts with an imported symbol; specifically delete the `@importFrom` data.table copy lines found on the methods/functions `[.contact_survey`, weigh, contact_matrix, compute_matrix, and .get_survey (they appear in R/subset-contact-survey.R, R/weigh.R, R/contact_matrix.R, R/compute-matrix.R, and R/deprecated.R) because these functions call data.table::copy() explicitly or dispatch through the package generic and do not need the import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@NAMESPACE`:
- Line 42: Remove every stale `@importFrom` data.table copy annotation from the
package sources — not just in subset-contact-survey.R — so the local
export(copy) generic no longer conflicts with an imported symbol; specifically
delete the `@importFrom` data.table copy lines found on the methods/functions
`[.contact_survey`, weigh, contact_matrix, compute_matrix, and .get_survey (they
appear in R/subset-contact-survey.R, R/weigh.R, R/contact_matrix.R,
R/compute-matrix.R, and R/deprecated.R) because these functions call
data.table::copy() explicitly or dispatch through the package generic and do not
need the import.
---
Duplicate comments:
In `@NAMESPACE`:
- Around line 7-15: Add the S3 generic export and default method registration
for copy so R dispatch picks up copy.contact_survey: ensure the NAMESPACE
includes export(copy) and S3method(copy,default) alongside the existing
S3method(copy,contact_survey) so calls to copy(...) from compute_matrix invoke
copy.contact_survey rather than falling back to data.table::copy; update
NAMESPACE to include both export(copy) and S3method(copy,default) if not already
present.
These conflicted with the new copy() S3 generic. Functions now dispatch through the package generic, which calls data.table::copy() explicitly. Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
R/contact_matrix.R (1)
307-337: Add a regression test covering the NA `dayofweek` bug fix.
The stated bug fix ("participants with NA dayofweek were previously treated as weekend; they now receive an average weight across all days") has no corresponding regression test. A test exercising `weigh_dayofweek = TRUE` with a participant whose `dayofweek` is `NA` would confirm the fix holds and guard against regressions.
Based on learnings, regression tests should be added for bug fixes. Would you like me to draft the test case for this scenario?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/contact_matrix.R` around lines 307 - 337, Add a unit/regression test that ensures participants with NA dayofweek are given the average weight across all days when weigh_dayofweek = TRUE: create a minimal survey input (participants data.frame) containing at least one participant with dayofweek = NA and other participants on weekday and weekend, call the contact_matrix (or the same entry point that runs the weigh(...) logic) with weigh_dayofweek = TRUE, then assert that the weight assigned to the NA participant equals the expected mean of the weekday and weekend group weights (or matches the computed average from the non-NA groups). Reference the weigh_dayofweek flag, the survey$participants dayofweek column, and the weigh(...) path that sets is.weekday so the test exercises the exact code path fixed in contact_matrix.R.
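A hypothetical testthat sketch along the lines of that prompt (the fixture construction, the `survey()` constructor call, and the weights slot name are assumptions, since the package internals are not shown here):

```r
test_that("NA dayofweek participants get an average day-of-week weight", {
  participants <- data.frame(
    part_id   = 1:3,
    part_age  = c(30, 35, 40),
    dayofweek = c(2, 6, NA)  # weekday, weekend, unknown
  )
  contacts <- data.frame(part_id = 1:3, cnt_age = c(10, 20, 30))
  svy <- survey(participants, contacts)  # assumed constructor

  cm <- contact_matrix(svy, weigh_dayofweek = TRUE, return.part.weights = TRUE)
  w <- cm$participants.weights  # assumed slot holding per-participant weights

  # The fix: the NA-day participant's weight should fall between the
  # weekday and weekend weights rather than equal the weekend weight
  na_weight <- w$weight[is.na(w$dayofweek)]
  expect_gte(na_weight, min(w$weight[!is.na(w$dayofweek)]))
  expect_lte(na_weight, max(w$weight[!is.na(w$dayofweek)]))
})
```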
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/contact_matrix.R`:
- Around line 307-327: The is.weekday assignment currently treats NA dayofweek
as FALSE causing NA participants to be grouped with weekends; change the
assignment in the weigh_dayofweek block so is.weekday is NA whenever dayofweek
is NA (e.g., using a conditional vectorized assignment or data.table::fifelse
checking is.na(dayofweek) before computing dayofweek %in% 1:5) on
survey$participants; keep the rest of the weigh_dayofweek logic (weigh(),
target/groups) intact so return_part_weights can later distinguish true weekends
from NA dayofweek entries.
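The `fifelse`-based fix described above, shown in isolation:

```r
library(data.table)

dt <- data.table(dayofweek = c(1, 3, 6, NA))

# Old behaviour: NA %in% 1:5 evaluates to FALSE, so participants with an
# unknown day were silently grouped with weekends
dt[, is.weekday_old := dayofweek %in% 1:5]

# Fixed behaviour: keep NA as NA so unknown days remain distinguishable
dt[, is.weekday := fifelse(is.na(dayofweek), NA, dayofweek %in% 1:5)]

dt$is.weekday_old  # TRUE TRUE FALSE FALSE: the NA row looks like a weekend
dt$is.weekday      # TRUE TRUE FALSE NA: the NA row stays NA
```

`fifelse()` requires `yes` and `no` to share a type; here both the literal `NA` and the `%in%` result are logical, so no coercion is needed.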
---
Duplicate comments:
In `@NAMESPACE`:
- Line 7: Ensure the S3 dispatch chain for copy is complete by adding
S3method(copy,default) and export(copy) to the NAMESPACE, remove
importFrom(data.table,copy) (keeping import(data.table)), and confirm
copy.default calls data.table::copy() so copy.contact_survey is dispatched
correctly instead of data.table::copy being invoked directly.
In `@R/compute-matrix.R`:
- Around line 27-31: No action required: the compute_matrix function correctly
adds the weight_threshold parameter (default NULL) and the copy(survey) call
properly dispatches to copy.contact_survey preserving deep-copy behavior; no
code changes needed for compute_matrix, weight_threshold, or
warn_multiple_observations delegation.
---
Nitpick comments:
In `@R/contact_matrix.R`:
- Around line 307-337: Add a unit/regression test that ensures participants with
NA dayofweek are given the average weight across all days when weigh_dayofweek =
TRUE: create a minimal survey input (participants data.frame) containing at
least one participant with dayofweek = NA and other participants on weekday and
weekend, call the contact_matrix (or the same entry point that runs the
weigh(...) logic) with weigh_dayofweek = TRUE, then assert that the weight
assigned to the NA participant equals the expected mean of the weekday and
weekend group weights (or matches the computed average from the non-NA groups).
Reference the weigh_dayofweek flag, the survey$participants dayofweek column,
and the weigh(...) path that sets is.weekday so the test exercises the exact
code path fixed in contact_matrix.R.
Fixed: Removed all stale |
Previously NA dayofweek was incorrectly classified as is.weekday=FALSE (since NA %in% 1:5 returns FALSE). Now uses fifelse to preserve NA, allowing these participants to be distinguished from true weekend days. Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@NAMESPACE`:
- Line 7: Update the roxygen comments for the exported S3 generic so the ?copy
help page documents that this package exports S3method(copy,default) which may
mask data.table::copy depending on load order; edit the roxygen block in
R/subset-contact-survey.R to add a Note section explaining masking behavior and
that copy.default delegates to data.table::copy for internal calls, and instruct
users to call data.table::copy explicitly if they need the data.table
implementation. Also add a short NEWS entry describing the new exported copy
generic and the load-order caveat so users are informed of the potential
masking.
Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
NA / population = NA, so NAs can propagate through per_capita() unlike symmetrise() and split_matrix() which need complete data. Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Avoids masking data.table::copy() and simplifies the code. Internal function, not exported. Co-authored-by: sbfnk-bot <242615673+sbfnk-bot@users.noreply.github.com>
Refactored: Replaced the |
Summary
Fixes #131
This is a follow-up to #287 that refactors `contact_matrix()` to use the pipeline infrastructure internally.
Changes
- `contact_matrix()` now uses `weigh()` for all weighting (day-of-week, age, user-defined weights)
- `warn_multiple_observations()`, `normalise_weights()`
- `compute_matrix()` gains `weight_threshold` parameter to match `contact_matrix()`

Bug fix
Fixed bug where participants with NA `dayofweek` were incorrectly weighted as weekend days. They now receive an average weight across all days.
Summary by CodeRabbit
New Features
- `weight_threshold` parameter to `compute_matrix()` to cap extreme weights before normalisation.

Bug Fixes
API Changes
- `copy.contact_survey()` renamed to `copy_survey()` for consistency.