Skip to content

Reduce cyclomatic complexity of try_merge_additional_files()#292

Merged
sbfnk merged 7 commits intomainfrom
reduce-complexity-try-merge
Feb 24, 2026
Merged

Reduce cyclomatic complexity of try_merge_additional_files()#292
sbfnk merged 7 commits intomainfrom
reduce-complexity-try-merge

Conversation

@sbfnk
Copy link
Collaborator

@sbfnk sbfnk commented Feb 23, 2026

Closes #289.

Extract three focused helpers from try_merge_additional_files() to bring cyclomatic complexity from 37 down to under 15:

  • get_mergeable_files() — identifies which survey files share columns with the main table (was duplicated inline)
  • resolve_longitudinal_key() — validates a user-provided participant_key or auto-detects via find_unique_key()
  • try_merge_one_file() — handles merging a single file including error handling, duplicate detection, and match quality warnings

The outer function becomes a simple orchestrator. All existing tests pass unchanged.

Summary by CodeRabbit

  • New Features
    • Improved multi-file survey merging with automatic detection of joinable files, iterative merging of additional data, and clearer messaging when a longitudinal/participant key is detected and propagated.
  • Bug Fixes
    • Fixed unmatched-merge warning counts so warnings report accurate counts when duplicate keys exist.
  • Refactor
    • Centralized and modularized the merging flow for more consistent results and consolidated user-facing warnings.

sbfnk-bot and others added 2 commits February 23, 2026 19:44
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 23, 2026

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. By default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR refactors the survey-file merging workflow by extracting multiple helper functions from try_merge_additional_files(), centralizing merge logic and warnings, and fixes incorrect unmatched-merge warning counts when duplicate keys are present. The main function now delegates to the new helpers and returns an observation_key with merged surveys.

Changes

Cohort / File(s) Summary
Release notes
NEWS.md
Documents the refactor reducing cyclomatic complexity and the fix for unmatched-merge warning counts.
Global variables registry
R/globals.R
Retagged the ..merge_id documentation entry from <try_merge_additional_files> to <try_merge_one_file> (no behavioral change).
Merge utilities refactoring
R/load-survey-utils.R
Added helpers: get_mergeable_files, resolve_longitudinal_key, try_merge_one_file, warn_merge_quality, merge_all_files, inform_longitudinal_key; refactored try_merge_additional_files to delegate to these helpers, centralize warnings, and return an observation_key alongside merged surveys. Also adjusted orchestration and duplicate-key handling in merges.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰
I hopped through rows and columns bright,
Untangled keys by morning light.
Six helpers stitched each wandering file,
Warnings now true — I pause and smile. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective: reducing cyclomatic complexity of try_merge_additional_files() through helper extraction.
Linked Issues check ✅ Passed The PR fully addresses #289 by extracting helper functions (get_mergeable_files, resolve_longitudinal_key, try_merge_one_file) and refactoring try_merge_additional_files to reduce complexity from 37 to under 15.
Out of Scope Changes check ✅ Passed All changes are directly related to reducing cyclomatic complexity: helper extraction, refactoring the merge workflow, and updating global variable documentation. No unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/load-survey-utils.R`:
- Around line 253-279: The loop block around get_mergeable_files /
try_merge_one_file has lint/style violations; run lintr::lint_package(), fix any
flagged issues in this block (e.g., spacing, assignment style, missing commas,
line-wrapping, unused variables, and NULL vs logical checks) — specifically
review the usages of get_mergeable_files, try_merge_one_file,
main_surveys[[type]], merged_files and setdiff and adjust code to conform to
lintr rules (fix indentation, ensure commas and parentheses are correct, prefer
consistent assignment operators, and avoid constructing NULL vectors
incorrectly); after fixing, run devtools::document() to regenerate roxygen docs
and re-run lintr until no warnings remain before committing.
- Around line 215-230: The unmatched-merge warning is wrong when the main table
has duplicate keys because matched_main counts join pairs; change the logic to
count unique matched IDs so unmatched_merge uses the number of distinct matched
..merge_id values rather than row pairs. Specifically, compute
matched_main_unique as the number of unique non-NA values in
merged[["..merge_id"]] (using unique/na.omit) and use that value when computing
unmatched_merge (nrow(contact_data[[file]]) - matched_main_unique) and in the
related warnings (replace uses of matched_main with matched_main_unique where
indicating unique matched rows).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 112c3e7 and 4600007.

📒 Files selected for processing (6)
  • NEWS.md
  • R/globals.R
  • R/load-survey-utils.R
  • man/get_mergeable_files.Rd
  • man/resolve_longitudinal_key.Rd
  • man/try_merge_one_file.Rd

Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
R/load-survey-utils.R (1)

254-288: Verify lint compliance after the formatting fix commit.

The prior CI failures from air/reviewdog around this block were reported in an earlier review and the latest commit claims to have fixed them. Please confirm lintr::lint_package() still passes cleanly for this file.

#!/bin/bash
# Verify lintr passes for the modified file
rg -n "get_mergeable_files|try_merge_one_file|merged_files" R/load-survey-utils.R | head -40

As per coding guidelines: "Run lintr::lint_package() to check for style issues and fix any in files you modified before committing."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 254 - 288, Run lintr::lint_package() and
address any lint errors in the updated merge loop area: inspect usage of
get_mergeable_files, try_merge_one_file, and the merged_files variable handling
in the merge loop and fix style issues (e.g., spacing, line length, assignment
consistency, NULL checks) so that lintr reports no issues for this region;
re-run lintr::lint_package() and commit the formatting/style fixes.
🧹 Nitpick comments (2)
R/load-survey-utils.R (2)

135-146: resolve_longitudinal_key silently discards an explicitly provided participant_key without warning.

If participant_key is non-NULL but either has columns missing from merged or doesn't uniquely identify rows, the function falls through to find_unique_key() with no diagnostic. The user gets a "Detected longitudinal data…" message citing an auto-detected key with no indication their explicit key was rejected.

Consider emitting a cli::cli_warn() before falling through:

♻️ Proposed fix
 resolve_longitudinal_key <- function(merged, participant_key = NULL) {
   if (!is.null(participant_key)) {
     missing_cols <- setdiff(participant_key, names(merged))
     if (
       length(missing_cols) == 0 &&
         anyDuplicated(merged, by = participant_key) == 0L
     ) {
       return(participant_key)
+    } else if (length(missing_cols) > 0) {
+      cli::cli_warn(
+        "Provided {.arg participant_key} column{?s} not found in merged data: \\
+         {.val {missing_cols}}. Attempting auto-detection."
+      )
+    } else {
+      cli::cli_warn(
+        "Provided {.arg participant_key} does not uniquely identify rows. \\
+         Attempting auto-detection."
+      )
     }
   }
   find_unique_key(merged, "part_id")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 135 - 146, resolve_longitudinal_key
currently ignores a non-NULL participant_key if columns are missing or it is not
unique and falls through to find_unique_key without notifying the user; update
resolve_longitudinal_key to detect these two failure modes (missing_cols !=
NULL/length > 0 and anyDuplicated(merged, by = participant_key) != 0L) and emit
a cli::cli_warn() that includes the provided participant_key and the reason
(missing columns list or non-unique key) before calling find_unique_key, so
callers see why their explicit key was rejected.

162-235: ..merge_id is not cleaned from contact_data[[file]] on early return.

data.table's := at Line 163 mutates the table in-place through R's reference semantics. When try_merge_one_file returns early (Lines 191, 199, or 212), contact_data[[file]] in the caller retains the ..merge_id column. Line 235 only removes it from the freshly-created merged table.

In practice this is harmless — successfully-merged files are dropped from survey_files, and failed files have ..merge_id overwritten on the next retry — but it is a subtle side effect worth guarding against:

♻️ Proposed cleanup on early returns
   if (is.null(merged)) {
+    contact_data[[file]][, ("..merge_id") := NULL]
     return(list(merged = NULL, detected_key = NULL, file = NULL))
   }

   has_duplicates <- anyDuplicated(merged[, "..main_id", with = FALSE]) > 0
   accept_merge <- !has_duplicates
   detected_key <- NULL

   if (has_duplicates && type == "contact") {
+    contact_data[[file]][, ("..merge_id") := NULL]
     return(list(merged = NULL, detected_key = NULL, file = NULL))
   }

   ...

   if (!accept_merge) {
+    contact_data[[file]][, ("..merge_id") := NULL]
     return(list(merged = NULL, detected_key = NULL, file = NULL))
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 162 - 235, The code adds a temporary
..merge_id column to contact_data[[file]] in try_merge_one_file but returns
early in several places (when merged is NULL, when has_duplicates for type
"contact", and when !accept_merge) without removing that column, leaving a
mutated contact_data; fix by avoiding in-place mutation or ensuring cleanup:
either operate on a copy (use data.table::copy(contact_data[[file]]) before
assigning ..merge_id) or register a cleanup with on.exit({
contact_data[[file]][, "..merge_id" := NULL] }, add = TRUE) immediately after
creating ..merge_id so the temporary column is removed regardless of which early
return path (references: contact_data[[file]], ..merge_id, try_merge_one_file,
merged, accept_merge).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@R/load-survey-utils.R`:
- Around line 254-288: Run lintr::lint_package() and address any lint errors in
the updated merge loop area: inspect usage of get_mergeable_files,
try_merge_one_file, and the merged_files variable handling in the merge loop and
fix style issues (e.g., spacing, line length, assignment consistency, NULL
checks) so that lintr reports no issues for this region; re-run
lintr::lint_package() and commit the formatting/style fixes.

---

Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 135-146: resolve_longitudinal_key currently ignores a non-NULL
participant_key if columns are missing or it is not unique and falls through to
find_unique_key without notifying the user; update resolve_longitudinal_key to
detect these two failure modes (missing_cols != NULL/length > 0 and
anyDuplicated(merged, by = participant_key) != 0L) and emit a cli::cli_warn()
that includes the provided participant_key and the reason (missing columns list
or non-unique key) before calling find_unique_key, so callers see why their
explicit key was rejected.
- Around line 162-235: The code adds a temporary ..merge_id column to
contact_data[[file]] in try_merge_one_file but returns early in several places
(when merged is NULL, when has_duplicates for type "contact", and when
!accept_merge) without removing that column, leaving a mutated contact_data; fix
by avoiding in-place mutation or ensuring cleanup: either operate on a copy (use
data.table::copy(contact_data[[file]]) before assigning ..merge_id) or register
a cleanup with on.exit({ contact_data[[file]][, "..merge_id" := NULL] }, add =
TRUE) immediately after creating ..merge_id so the temporary column is removed
regardless of which early return path (references: contact_data[[file]],
..merge_id, try_merge_one_file, merged, accept_merge).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4600007 and 08a2f5b.

📒 Files selected for processing (1)
  • R/load-survey-utils.R

Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
R/load-survey-utils.R (2)

118-124: vapply FUN.VALUE — prefer logical(1) over TRUE.

TRUE works as a template value but logical(1) is the idiomatic form that makes the expected return type explicit.

♻️ Proposed refactor
  can_merge <- vapply(
    survey_files,
    function(x) {
      any(colnames(contact_data[[x]]) %in% main_cols)
    },
-   TRUE
+   logical(1)
  )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 118 - 124, The vapply call that computes
can_merge uses TRUE as the FUN.VALUE template; change that to logical(1) so the
expected return type is explicit — update the vapply over survey_files (the
anonymous function inspecting colnames(contact_data[[x]]) %in% main_cols) to use
FUN.VALUE = logical(1) instead of TRUE.

160-165: ..merge_id added to contact_data[[file]] in place but never cleaned up.

contact_data[[file]][, ..merge_id := seq_len(.N)] uses data.table reference semantics — it modifies the table in place even without the outer <- assignment. After try_merge_one_file returns, ..merge_id persists on the original data.table. It's overwritten safely if the file is retried, and the side-effect is contained within join_possible_files, so this is not a correctness bug. Consider adding on.exit(contact_data[[file]][, ("..merge_id") := NULL]) or an explicit cleanup after warn_merge_quality for hygiene.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 160 - 165, The temporary column
"..merge_id" is added in place to contact_data[[file]] (via
contact_data[[file]][, ..merge_id := seq_len(.N)]) and is never removed; update
try_merge_one_file (or the caller join_possible_files) to always clean up that
column after use—either add on.exit(contact_data[[file]][, ("..merge_id") :=
NULL]) at the start of try_merge_one_file or explicitly remove ("..merge_id") :=
NULL immediately after warn_merge_quality/merge logic so the side-effect on
contact_data[[file]] is eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/load-survey-utils.R`:
- Around line 262-318: Initialize merged_files as character(0) (not NULL) in the
merge loop inside try_merge_one_file flow (symbol: merged_files) and change the
break check to test its length (e.g., if (length(merged_files) == 0) break) so
the grow-with-c() pattern doesn't trigger vector_initialize_linter and avoids an
infinite loop; in inform_longitudinal_key replace the escaped double-quote
construction that builds key_code (currently paste0("\"", detected_key, "\"",
...)) with single-quote usage (e.g., paste0("'", detected_key, "'", collapse =
", ")) to satisfy quotes_linter, then run lintr::lint_package() and the air
formatter to confirm no remaining style issues.
- Around line 133-143: resolve_longitudinal_key currently ignores a
user-supplied participant_key when it's invalid without notifying the caller;
modify resolve_longitudinal_key to validate participant_key (using missing_cols
<- setdiff(participant_key, names(merged)) and anyDuplicated(merged, by =
participant_key)) and if invalid emit a clear warning message stating which
columns are missing or that the key is not unique and that participant_key will
be ignored, then fall back to find_unique_key(merged, "part_id"); keep the
original return behavior but ensure the warning references participant_key,
missing_cols and the uniqueness check so users know why their key was discarded.

---

Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 118-124: The vapply call that computes can_merge uses TRUE as the
FUN.VALUE template; change that to logical(1) so the expected return type is
explicit — update the vapply over survey_files (the anonymous function
inspecting colnames(contact_data[[x]]) %in% main_cols) to use FUN.VALUE =
logical(1) instead of TRUE.
- Around line 160-165: The temporary column "..merge_id" is added in place to
contact_data[[file]] (via contact_data[[file]][, ..merge_id := seq_len(.N)]) and
is never removed; update try_merge_one_file (or the caller join_possible_files)
to always clean up that column after use—either add
on.exit(contact_data[[file]][, ("..merge_id") := NULL]) at the start of
try_merge_one_file or explicitly remove ("..merge_id") := NULL immediately after
warn_merge_quality/merge logic so the side-effect on contact_data[[file]] is
eliminated.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08a2f5b and 5130230.

📒 Files selected for processing (2)
  • NEWS.md
  • R/load-survey-utils.R

Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
R/load-survey-utils.R (2)

167-167: Inconsistent := style for ..-prefixed column names.

Line 167 uses the bare-symbol form (..merge_id := ...) while lines 209 and 213 use the explicit parenthesized-string form (("..main_id") := ...). Both are functionally correct — in data.table's := context, a bare LHS symbol is taken as a literal column name, so ..merge_id creates the column "..merge_id". However, readers familiar with data.table's .. parent-frame lookup convention for column selection may find the bare form confusing. Line 361 has the same pattern.

♻️ Suggested fix — use the parenthesized form consistently
-  contact_data[[file]] <- contact_data[[file]][,
-    ..merge_id := seq_len(.N)
-  ]
+  contact_data[[file]] <- contact_data[[file]][,
+    ("..merge_id") := seq_len(.N)
+  ]

And at line 361:

-    main_surveys[[type]] <- result$merged[, ..main_id := NULL]
+    main_surveys[[type]] <- result$merged[, ("..main_id") := NULL]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` at line 167, The code mixes two styles for creating
columns named with a leading ".." (bare-symbol form like ..merge_id :=
seq_len(.N) vs the explicit parenthesized-string form like ("..main_id") :=
...); update the bare uses (e.g., the ..merge_id assignment and the similar
occurrence around line 361) to the consistent parenthesized-string form so they
become ("..merge_id") := seq_len(.N) to match ("..main_id") := ... elsewhere,
keeping the same RHS logic and ensuring consistent readability across the file.

324-328: Single-element detected_key emits c("x") instead of "x" in the suggested call.

When length(detected_key) == 1, key_code becomes c("sday") rather than the more natural "sday". This is valid R but slightly noisier than necessary in the user-facing message.

♻️ Optional fix
-  key_code <- paste0(
-    "c(",
-    paste0('"', detected_key, '"', collapse = ", "),
-    ")"
-  )
+  key_code <- if (length(detected_key) == 1L) {
+    paste0('"', detected_key, '"')
+  } else {
+    paste0(
+      "c(",
+      paste0('"', detected_key, '"', collapse = ", "),
+      ")"
+    )
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 324 - 328, The generated key_code
currently always wraps detected_key in c(...), producing c("x") for
single-element vectors; change the construction of key_code to format as a plain
quoted string when length(detected_key) == 1 and only use the c("...","...")
form when length > 1 — i.e., inspect detected_key, and set key_code to '"' +
detected_key + '"' for a single value and to 'c("a","b",...)' (using the
existing paste0 collapse logic) otherwise so the user-facing suggestion shows
"sday" instead of c("sday").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/load-survey-utils.R`:
- Around line 133-145: In resolve_longitudinal_key, distinguish the two failure
modes: check missing_cols (setdiff(participant_key, names(merged))) first and if
non-empty emit a warning that names in participant_key are missing and list
missing_cols; otherwise (when all columns exist but anyDuplicated(..., by =
participant_key) != 0L) emit the existing warning that the provided
participant_key does not uniquely identify rows and that you will auto-detect a
key; ensure messages reference the variable participant_key and include the
missing column names when applicable.

---

Nitpick comments:
In `@R/load-survey-utils.R`:
- Line 167: The code mixes two styles for creating columns named with a leading
".." (bare-symbol form like ..merge_id := seq_len(.N) vs the explicit
parenthesized-string form like ("..main_id") := ...); update the bare uses
(e.g., the ..merge_id assignment and the similar occurrence around line 361) to
the consistent parenthesized-string form so they become ("..merge_id") :=
seq_len(.N) to match ("..main_id") := ... elsewhere, keeping the same RHS logic
and ensuring consistent readability across the file.
- Around line 324-328: The generated key_code currently always wraps
detected_key in c(...), producing c("x") for single-element vectors; change the
construction of key_code to format as a plain quoted string when
length(detected_key) == 1 and only use the c("...","...") form when length > 1 —
i.e., inspect detected_key, and set key_code to '"' + detected_key + '"' for a
single value and to 'c("a","b",...)' (using the existing paste0 collapse logic)
otherwise so the user-facing suggestion shows "sday" instead of c("sday").

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5130230 and 0bfde9c.

📒 Files selected for processing (1)
  • R/load-survey-utils.R

…:= style

Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
R/load-survey-utils.R (1)

293-295: Conflicting detected_key values from different files are silently overwritten.

If two separate survey files processed within the same for-loop pass both produce a non-NULL detected_key, the second silently replaces the first without any diagnostic. In practice a single type has at most one longitudinal file, but a guard would catch unexpected cases early.

💡 Optional guard
     if (!is.null(result$detected_key)) {
+      if (!is.null(detected_key) && !setequal(detected_key, result$detected_key)) {
+        cli::cli_warn(
+          "Multiple files produced conflicting longitudinal keys; \\
+          using key from {.file {basename(file)}}.",
+          call = call
+        )
+      }
       detected_key <- result$detected_key
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 293 - 295, The code silently overwrites
an existing detected_key when result$detected_key is non-NULL; update the block
around detected_key assignment to check if detected_key is already set before
overwriting (i.e., test existing detected_key variable), and if it is set then
emit a diagnostic (warning or stop) that names the conflicting files/variables
(or append to a list) so conflicts are surfaced; reference the variables
result$detected_key and detected_key and perform the guard/diagnostic instead of
unconditional reassignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/load-survey-utils.R`:
- Around line 381-386: The warning uses the full path for {.file {file}} but
other messages use basename(file), causing inconsistent UX; update the
cli::cli_warn call in the loop to use basename(file) (i.e., {.file
{basename(file)}}) so it matches warn_merge_quality and the abort in
try_merge_one_file and preserves the same shortened filename presentation.
- Around line 133-155: Add a call parameter to resolve_longitudinal_key (e.g.
call = rlang::caller_env()), pass that call into both cli::cli_warn calls so
warnings report the correct caller, and ensure the function still calls
find_unique_key as before; then update the call site in try_merge_one_file to
forward its call argument into resolve_longitudinal_key (so try_merge_one_file
-> resolve_longitudinal_key receives the same call), matching the pattern used
by other helpers like inform_longitudinal_key.

---

Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 293-295: The code silently overwrites an existing detected_key
when result$detected_key is non-NULL; update the block around detected_key
assignment to check if detected_key is already set before overwriting (i.e.,
test existing detected_key variable), and if it is set then emit a diagnostic
(warning or stop) that names the conflicting files/variables (or append to a
list) so conflicts are surfaced; reference the variables result$detected_key and
detected_key and perform the guard/diagnostic instead of unconditional
reassignment.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bfde9c and 6702271.

📒 Files selected for processing (1)
  • R/load-survey-utils.R

…istently

Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
R/load-survey-utils.R (2)

179-181: data.table reference semantics: ..merge_id is added in-place to the caller's data.

:= mutates the original contact_data[[file]] data.table before the list-level copy-on-modify kicks in, so the caller's contact_data[[file]] permanently gains the ..merge_id column. After a successful merge, only merged[, ("..merge_id") := NULL] (line 226) cleans up the merged result; the source data.table is never cleaned. This is contained within the current flow (the file is dropped from survey_files on success), but it is an invisible side-effect on the caller's contact_data.

To avoid mutating the caller's data, copy before adding the tracking column:

♻️ Proposed fix
-  contact_data[[file]] <- contact_data[[file]][,
-    ("..merge_id") := seq_len(.N)
-  ]
+  file_data <- data.table::copy(contact_data[[file]])[,
+    ("..merge_id") := seq_len(.N)
+  ]

Then replace subsequent references to contact_data[[file]] within this function with file_data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 179 - 181, The code mutates the caller's
data.table by adding ("..merge_id") with := on contact_data[[file]]; to fix,
make a copy first (e.g., assign file_data <- copy(contact_data[[file]])) then
perform file_data[, ("..merge_id") := seq_len(.N)] and use file_data for all
subsequent operations in this function (including merging) so the original
contact_data[[file]] is not modified; keep the existing cleanup merged[,
("..merge_id") := NULL] as-is after merge and ensure any later references that
previously used contact_data[[file]] in this function are switched to file_data.

402-408: Add call parameter to join_possible_files for consistent warning attribution.

Every helper function in this file accepts call = rlang::caller_env() and threads it through to cli_warn/cli_abort. join_possible_files breaks that chain: when it calls try_merge_additional_files without passing call, the default rlang::caller_env() captures join_possible_files's own environment, attributing all warnings to join_possible_files rather than the user-facing caller (load_survey()).

♻️ Proposed fix
 join_possible_files <- function(
   survey_files,
   contact_data,
   main_types,
   main_surveys,
-  participant_key = NULL
+  participant_key = NULL,
+  call = rlang::caller_env()
 ) {
   survey_contact_data <- join_compatible_files(survey_files, contact_data)
   contact_data <- survey_contact_data$contact_data
   survey_files <- survey_contact_data$survey_files

   result <- try_merge_additional_files(
     main_types,
     main_surveys,
     survey_files,
     contact_data,
-    participant_key = participant_key
+    participant_key = participant_key,
+    call = call
   )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/load-survey-utils.R` around lines 402 - 408, The join_possible_files
function should accept a call = rlang::caller_env() parameter and thread it into
any downstream warning/abort invocations so warnings are attributed to the
original caller; update join_possible_files signature to include call, pass call
into try_merge_additional_files where it is invoked, and ensure any
cli_warn/cli_abort calls inside join_possible_files (or calls it makes) use that
call parameter (consistent with other helpers and with load_survey()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 179-181: The code mutates the caller's data.table by adding
("..merge_id") with := on contact_data[[file]]; to fix, make a copy first (e.g.,
assign file_data <- copy(contact_data[[file]])) then perform file_data[,
("..merge_id") := seq_len(.N)] and use file_data for all subsequent operations
in this function (including merging) so the original contact_data[[file]] is not
modified; keep the existing cleanup merged[, ("..merge_id") := NULL] as-is after
merge and ensure any later references that previously used contact_data[[file]]
in this function are switched to file_data.
- Around line 402-408: The join_possible_files function should accept a call =
rlang::caller_env() parameter and thread it into any downstream warning/abort
invocations so warnings are attributed to the original caller; update
join_possible_files signature to include call, pass call into
try_merge_additional_files where it is invoked, and ensure any
cli_warn/cli_abort calls inside join_possible_files (or calls it makes) use that
call parameter (consistent with other helpers and with load_survey()).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6702271 and f8d9de1.

📒 Files selected for processing (1)
  • R/load-survey-utils.R

@sbfnk sbfnk added this pull request to the merge queue Feb 24, 2026
Merged via the queue into main with commit 253c381 Feb 24, 2026
13 checks passed
@sbfnk sbfnk deleted the reduce-complexity-try-merge branch February 24, 2026 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce cyclomatic complexity of try_merge_additional_files

2 participants