@jazairi jazairi commented Nov 26, 2025

Why these changes are being introduced:

The zipper merge we implemented naively queries n/2 results from each API and interleaves them, where n is the per-page value. This works if both APIs return many results, but it can cause problems in smaller, unbalanced result sets.

For example, the query term `doc edgerton` returns 50 Primo results and 4 TIMDEX results. Page 1 shows only 14 results (4 TIMDEX and 10 Primo), and each subsequent page returns only 10 (all Primo).
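To make the failure mode concrete, here is a hypothetical model of the naive zipper merge (the function name and parameters are illustrative, not the app's actual code):

```ruby
# Illustrative model of the naive zipper merge: each API is asked for
# per_page / 2 results at offset (page - 1) * (per_page / 2), regardless
# of how many hits it actually has.
def naive_page_size(page, per_page:, primo_total:, timdex_total:)
  half = per_page / 2
  offset = (page - 1) * half
  primo_count  = (primo_total - offset).clamp(0, half)
  timdex_count = (timdex_total - offset).clamp(0, half)
  primo_count + timdex_count
end

naive_page_size(1, per_page: 20, primo_total: 50, timdex_total: 4) # => 14
naive_page_size(2, per_page: 20, primo_total: 50, timdex_total: 4) # => 10
```

Every page past the point where the smaller result set is exhausted comes up short, and the shortfall is silent.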

Relevant ticket(s):
- [USE-179](https://mitlibraries.atlassian.net/browse/USE-179)

How this addresses that need:

This implements more sophisticated logic that first checks the number of hits returned by each API and passes that, along with the pagination information, to a Merged Search Paginator class. This service object develops a 'merge plan': it calculates per-API offsets and merges the results for each page.
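As a sketch of the merge-plan idea (a simplified stand-in, not the PR's actual class; the name, the even-interleave policy, and sending odd items to the smaller source are all assumptions):

```ruby
# Simplified sketch of a merge-plan paginator. Given each API's total hit
# count, it computes, for any page, the offset and count to request from
# each source so that no results are skipped or dropped.
class MergedSearchPaginatorSketch
  def initialize(totals, per_page: 20)
    @totals = totals # e.g. { primo: 50, timdex: 4 }
    @per_page = per_page
  end

  # Offsets and counts for one page of the merged list.
  def plan_for(page)
    start  = (page - 1) * @per_page
    finish = [start + @per_page, @totals.values.sum].min
    before = consumed_after(start)
    after  = consumed_after(finish)
    @totals.keys.to_h { |k| [k, { offset: before[k], count: after[k] - before[k] }] }
  end

  private

  # How many records each source has contributed after `n` merged items:
  # interleave evenly until the smaller source runs out, then take the
  # remainder from the larger one. (Odd items go to the smaller source
  # first -- an arbitrary choice for this sketch.)
  def consumed_after(n)
    smaller, larger = @totals.sort_by { |_, total| total }.map(&:first)
    min_total = @totals[smaller]
    if n <= 2 * min_total
      { smaller => (n + 1) / 2, larger => n / 2 }
    else
      { smaller => min_total, larger => [n - min_total, @totals[larger]].min }
    end
  end
end

paginator = MergedSearchPaginatorSketch.new({ primo: 50, timdex: 4 }, per_page: 20)
paginator.plan_for(1) # full first page: all 4 TIMDEX hits plus 16 Primo hits
paginator.plan_for(2) # TIMDEX exhausted: 20 Primo hits at offset 16
```

With the `doc edgerton` totals, all 54 hits surface across three pages, with no short pages and nothing dropped.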

Queries on the 'all' tab now fetch twice from each API: once to determine the total number of hits for the Merged Search Paginator, then again to fetch results at the appropriate offset. While hardly ideal, this was the only option I could find that avoids losing results. I limited these extra calls to queries beyond page 1, the only case where they are needed.
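The two-phase flow might look roughly like this (stub clients; the real Primo/TIMDEX wrappers, method names, and the page-2 offsets are assumptions, with the offsets following the interleave-until-exhausted plan described above):

```ruby
# Hypothetical sketch of the two-phase fetch on the 'all' tab. StubClient
# stands in for the real Primo/TIMDEX API wrappers.
StubClient = Struct.new(:total) do
  def search(per_page:, offset: 0)
    records = (1..total).map { |i| "rec#{i}" }
    { total: total, results: records[offset, per_page] || [] }
  end
end

primo  = StubClient.new(50)
timdex = StubClient.new(4)

# Phase 1: cheap calls just to learn each API's hit count.
primo_total  = primo.search(per_page: 0)[:total]   # => 50
timdex_total = timdex.search(per_page: 0)[:total]  # => 4

# Phase 2: fetch each source at the offset/count the merge plan dictates.
# With totals (50, 4) and 20 per page, page 1 consumed all 4 TIMDEX hits
# and 16 Primo hits, so page 2 is 20 Primo hits starting at offset 16.
page2 = primo.search(per_page: 20, offset: 16)[:results]
page2.length # => 20
```

Whether the phase-1 call can be made cheap depends on each API supporting a hit-count-only (or minimal page size) request.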

Side effects of this change:

  • We now clear the cache before each search controller test. This was done to avoid odd test behavior, but I ran the suite 50 times without any issues, so it may be excessively cautious.
  • The search controller continues to grow with this new logic. I've split it into multiple helper methods so that, if we want to extract more service objects later, it will be easier to do so.
  • A failing cassette has been replaced with a mock.

Developer

Accessibility
  • ANDI or WAVE has been run in accordance with our guide.
  • This PR contains no changes to the view layer.
  • New issues flagged by ANDI or WAVE have been resolved.
  • New issues flagged by ANDI or WAVE have been ticketed (link in the Pull Request details above).
  • No new accessibility issues have been flagged.
New ENV
  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.
Approval beyond code review
  • UXWS/stakeholder approval has been confirmed.
  • UXWS/stakeholder review will be completed retroactively.
  • UXWS/stakeholder review is not needed.
Additional context needed to review

This is a pretty unwieldy changeset, so please reach out if you have questions!

Code Reviewer

Code
  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.
Documentation
  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.
Testing
  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.
