Fix trial age filtering for sub-year units and stabilize CTGov cursor across page boundaries #199

Merged
imaurer merged 1 commit into main from add-trial-search-edge-case-tests
Mar 12, 2026
@imaurer commented Mar 12, 2026

Summary

  • Sub-year age eligibility: parse_age_years now converts months, weeks, and days to fractional years so trials with minimum ages like "6 Months" are correctly included or excluded when filtering by patient age. Previously only integer-year values were handled, causing all sub-year age bounds to be silently ignored.

  • CTGov cursor stability: When an offset is fully satisfied within a fetched page, the search loop now stops at the page boundary and carries forward the upstream nextPageToken as-is. This prevents a compound cursor format from appearing and ensures callers can resume cleanly from the correct position.

  • PD-L1+ token matching: contains_keyword_tokens now correctly matches hyphenated biomarker terms with a trailing + (e.g. PD-L1+) against text containing that exact token, while rejecting similar but non-matching text like PD-L1 positive.
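The sub-year age conversion described in the first bullet can be sketched roughly as follows. This is an illustration, not the code in this PR: the signatures, the divisors (12 months, 52 weeks, 365 days per year), and the choice to treat a missing or unparseable bound as "no restriction" are all assumptions.

```rust
/// Hypothetical sketch: convert an age string such as "6 Months" into
/// fractional years. The divisors below are assumptions for illustration.
fn parse_age_years(text: &str) -> Option<f64> {
    let mut parts = text.split_whitespace();
    let value: f64 = parts.next()?.parse().ok()?;
    let unit = parts.next()?.to_ascii_lowercase();
    // Accept both singular and plural unit names ("Month" / "Months").
    let years = match unit.trim_end_matches('s') {
        "year" => value,
        "month" => value / 12.0,
        "week" => value / 52.0,
        "day" => value / 365.0,
        _ => return None,
    };
    Some(years)
}

/// Hypothetical eligibility check. A bound that is absent or cannot be
/// parsed imposes no restriction (an assumption of this sketch).
fn verify_age_eligibility(
    patient_age_years: f64,
    min_age: Option<&str>,
    max_age: Option<&str>,
) -> bool {
    let min_ok = min_age
        .and_then(parse_age_years)
        .map_or(true, |min| patient_age_years >= min);
    let max_ok = max_age
        .and_then(parse_age_years)
        .map_or(true, |max| patient_age_years <= max);
    min_ok && max_ok
}
```

Under these assumptions, a three-month-old patient (0.25 years) is excluded from a trial whose minimum age is "6 Months", while a one-year-old is included.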

Test plan

  • parse_age_years returns correct fractional values for months, weeks, and days
  • verify_age_eligibility correctly filters trials with sub-year minimum and maximum ages
  • contains_keyword_tokens accepts PD-L1+ in matching text and rejects non-matching text
  • CTGov mock test confirms cursor is the upstream token after offset is consumed across a full page
  • All 579 unit tests pass (cargo test)
  • cargo clippy -- -D warnings clean
  • cargo fmt --check clean
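For readers unfamiliar with the token-boundary behavior checked in the test plan, here is a minimal sketch of one way to match a term like PD-L1+ as a whole token. The function `keyword_token_matches` is a hypothetical stand-in, not the real `contains_keyword_tokens`, and the set of characters treated as part of a token is an assumption.

```rust
/// Characters treated as part of a biomarker token (an assumption of this
/// sketch): letters, digits, '-' and '+'.
fn is_token_char(c: char) -> bool {
    c.is_alphanumeric() || c == '-' || c == '+'
}

/// Hypothetical stand-in for the PR's matcher: returns true only when
/// `keyword` occurs in `text` with no token character directly before or
/// after it. Matching is case-insensitive.
fn keyword_token_matches(text: &str, keyword: &str) -> bool {
    if keyword.is_empty() {
        return false;
    }
    let text = text.to_lowercase();
    let keyword = keyword.to_lowercase();
    text.match_indices(&keyword).any(|(start, matched)| {
        let end = start + matched.len();
        let before_ok = text[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_token_char(c));
        let after_ok = text[end..]
            .chars()
            .next()
            .map_or(true, |c| !is_token_char(c));
        before_ok && after_ok
    })
}
```

With this boundary rule, the keyword PD-L1+ matches "PD-L1+ tumors" but not "PD-L1 positive", and the shorter keyword PD-L1 does not match inside PD-L1+ because the trailing + counts as part of the token.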

- Parse sub-year age units (months, weeks, days) in eligibility age
  checks so trials with minimum ages like "6 Months" filter correctly
- Return CTGov's upstream next-page token after an offset is satisfied
  within a page, preventing compound cursor format and keeping pagination
  stable across page boundaries
- Add token-boundary matching for hyphenated biomarker terms with plus
  suffixes (e.g. PD-L1+) so keyword filters neither over- nor under-match

New unit tests cover all four fix paths: parse_age_years for each unit,
verify_age_eligibility for sub-year min/max, PD-L1+ token matching, and
the CTGov cursor regression case.
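The cursor behavior described above can be illustrated with a small mock. This is a sketch under stated assumptions, not the PR's search loop: `Page`, `mock_fetch`, `search_with_offset`, and the token strings are all hypothetical, and it assumes that pages wholly covered by the offset are skipped and that the first page with remaining items ends the loop.

```rust
/// Hypothetical upstream page: items plus the upstream next-page token.
struct Page {
    items: Vec<u32>,
    next_page_token: Option<String>,
}

/// Mock upstream with two pages of three items each (illustrative data).
fn mock_fetch(token: Option<&str>) -> Page {
    match token {
        None => Page { items: vec![1, 2, 3], next_page_token: Some("tok-2".to_string()) },
        Some("tok-2") => Page { items: vec![4, 5, 6], next_page_token: Some("tok-3".to_string()) },
        _ => Page { items: Vec::new(), next_page_token: None },
    }
}

/// Skip `offset` items, then return the rest of the page where the offset
/// is satisfied, together with the upstream token unchanged. No compound
/// cursor (token plus intra-page position) is ever produced.
fn search_with_offset<F>(mut fetch: F, mut offset: usize) -> (Vec<u32>, Option<String>)
where
    F: FnMut(Option<&str>) -> Page,
{
    let mut token: Option<String> = None;
    loop {
        let page = fetch(token.as_deref());
        if offset >= page.items.len() {
            // The whole page is consumed by the offset; move to the next one.
            offset -= page.items.len();
            match page.next_page_token {
                Some(next) => token = Some(next),
                None => return (Vec::new(), None),
            }
            continue;
        }
        // Offset satisfied inside this page: stop at the page boundary.
        let results = page.items.into_iter().skip(offset).collect();
        return (results, page.next_page_token);
    }
}
```

Because the returned cursor is always a plain upstream token, a caller can pass it straight back to the upstream API to resume from the next page.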
@imaurer merged commit b1cf71e into main on Mar 12, 2026
2 checks passed
@imaurer deleted the add-trial-search-edge-case-tests branch on March 13, 2026 20:30