Commit a516eec

Add pagination support to 'all' tab
Why these changes are being introduced:

We intentionally decided to implement pagination for the 'all' tab separately to avoid excessively large tickets.

Relevant ticket(s):
- [USE-178](https://mitlibraries.atlassian.net/browse/USE-178)
- [USE-179](https://mitlibraries.atlassian.net/browse/USE-179)

How this addresses that need:

This updates the Search Controller and Analyzer model to handle pagination from multiple sources (the 'all' tab) in addition to single sources.

Side effects of this change:

I found it very difficult to follow where and how the 'per-page' value is set. I tried to improve the situation by moving this value to env. I'm not sure it fully makes sense yet, but I doubt it will without a full refactor of the data flow architecture.
1 parent 77970ca commit a516eec

6 files changed: +367 -126 lines changed

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -116,6 +116,7 @@ may have unexpected consequences if applied to other TIMDEX UI apps.
 - `REQUEST_PERIOD` - time in minutes used along with `REQUESTS_PER_PERIOD`
 - `REDIRECT_REQUESTS_PER_PERIOD` - number of requests whose query string starts with our legacy redirect parameter that can be made per `REQUEST_PERIOD`
 - `REDIRECT_REQUEST_PERIOD` - time in minutes used along with `REDIRECT_REQUESTS_PER_PERIOD`
+- `RESULTS_PER_PAGE`: The number of results to display per page. Use an even number to avoid peculiarities. Defaults to 20 if unset.
 - `SENTRY_DSN`: Client key for Sentry exception logging.
 - `SENTRY_ENV`: Sentry environment for the application. Defaults to 'unknown' if unset.
 - `TACOS_SOURCE`: If set, this value is sent to TACOS (as the `sourceSystem` value) to distinguish which application
```
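Since the 'all' tab splits each page evenly between Primo and TIMDEX, the per-source page size is `RESULTS_PER_PAGE` halved with integer division (as in the controller changes below). A minimal sketch of why an even value matters; the variable names here are illustrative, not from the app:

```ruby
# Each source gets half of RESULTS_PER_PAGE via integer division.
even_per_page = 20
even_per_source = even_per_page / 2   # 10 + 10 = 20: the full page is used

odd_per_page = 21
odd_per_source = odd_per_page / 2     # 10 + 10 = 20: one result slot is silently lost
```

With an odd value, `2 * (n / 2)` is `n - 1`, so the 'all' tab shows one fewer result per page than configured.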

app/controllers/search_controller.rb

Lines changed: 61 additions & 37 deletions
```diff
@@ -23,7 +23,7 @@ def results
       render 'results_geo'
       return
     end
-
+
     # Route to appropriate search based on active tab
     case @active_tab
     when 'primo'
@@ -65,7 +65,8 @@ def load_geodata_results
     @errors = extract_errors(response)
     return unless @errors.nil?

-    @pagination = Analyzer.new(@enhanced_query, response, :timdex).pagination
+    hits = response.dig(:data, 'search', 'hits') || 0
+    @pagination = Analyzer.new(@enhanced_query, { hits: hits }, :timdex).pagination
     raw_results = extract_results(response)
     @results = NormalizeTimdexResults.new(raw_results, @enhanced_query[:q]).normalize
     @filters = extract_filters(response)
@@ -87,53 +88,66 @@ def load_timdex_results
   end

   def load_all_results
-    # Parallel fetching from both APIs
-    primo_thread = Thread.new { fetch_primo_data }
-    timdex_thread = Thread.new { fetch_timdex_data }
-
-    # Wait for both threads to complete
-    primo_data = primo_thread.value
-    timdex_data = timdex_thread.value
+    # Fetch results from both APIs in parallel
+    primo_data, timdex_data = fetch_all_data

-    # Collect any errors from either API
-    all_errors = []
-    all_errors.concat(primo_data[:errors]) if primo_data[:errors]
-    all_errors.concat(timdex_data[:errors]) if timdex_data[:errors]
-    @errors = all_errors.any? ? all_errors : nil
+    # Combine errors from both APIs
+    @errors = combine_errors(primo_data[:errors], timdex_data[:errors])

     # Zipper merge results from both APIs
-    primo_results = primo_data[:results] || []
-    timdex_results = timdex_data[:results] || []
-    @results = primo_results.zip(timdex_results).flatten.compact
+    @results = merge_results(primo_data[:results], timdex_data[:results])
+
+    # Use Analyzer for combined pagination calculation
+    @pagination = Analyzer.new(@enhanced_query, { hits: timdex_data[:hits] }, :all, { hits: primo_data[:hits] }).pagination

-    # For now, just use primo pagination as a placeholder
-    @pagination = primo_data[:pagination] || {}
-
     # Handle primo continuation for high page numbers
     @show_primo_continuation = primo_data[:show_continuation] || false
   end

+  def fetch_all_data
+    # Parallel fetching from both APIs
+    primo_thread = Thread.new { fetch_primo_data }
+    timdex_thread = Thread.new { fetch_timdex_data }
+
+    [primo_thread.value, timdex_thread.value]
+  end
+
+  def combine_errors(*error_arrays)
+    all_errors = error_arrays.compact.flatten
+    all_errors.any? ? all_errors : nil
+  end
+
+  def merge_results(primo_results, timdex_results)
+    (primo_results || []).zip(timdex_results || []).flatten.compact
+  end
+
   def fetch_primo_data
     current_page = @enhanced_query[:page] || 1
-    per_page = @enhanced_query[:per_page] || 20
+    per_page = if @active_tab == 'all'
+                 ENV.fetch('RESULTS_PER_PAGE', '20').to_i / 2
+               else
+                 ENV.fetch('RESULTS_PER_PAGE', '20').to_i
+               end
     offset = (current_page - 1) * per_page

     # Check if we're beyond Primo API limits before making the request.
     if offset >= Analyzer::PRIMO_MAX_OFFSET
-      return { results: [], pagination: {}, errors: nil, show_continuation: true }
+      return { results: [], pagination: {}, errors: nil, show_continuation: true, hits: 0 }
     end

-    primo_response = query_primo
+    primo_response = query_primo(per_page, offset)
+    hits = primo_response.dig('info', 'total') || 0
     results = NormalizePrimoResults.new(primo_response, @enhanced_query[:q]).normalize
-    pagination = Analyzer.new(@enhanced_query, primo_response, :primo).pagination
+    pagination = Analyzer.new(@enhanced_query, { hits: hits }, :primo).pagination

     # Handle empty results from Primo API. Sometimes Primo will return no results at a given offset,
     # despite claiming in the initial query that more are available. This happens randomly and
     # seemingly for no reason (well below the recommended offset of 2,000). While the bug also
     # exists in Primo UI, sending users there seems like the best we can do.
     show_continuation = false
     errors = nil
-
+
     if results.empty?
       docs = primo_response['docs'] if primo_response.is_a?(Hash)
       if docs.nil? || docs.empty?
@@ -144,23 +158,37 @@ def fetch_primo_data
       end
     end

-    { results: results, pagination: pagination, errors: errors, show_continuation: show_continuation }
+    { results: results, pagination: pagination, errors: errors, show_continuation: show_continuation,
+      hits: hits }
   rescue StandardError => e
-    { results: [], pagination: {}, errors: handle_primo_errors(e), show_continuation: false }
+    { results: [], pagination: {}, errors: handle_primo_errors(e), show_continuation: false, hits: 0 }
   end

   def fetch_timdex_data
-    query = QueryBuilder.new(@enhanced_query).query
+    # For all tab, modify query to use half page size
+    if @active_tab == 'all'
+      per_page = ENV.fetch('RESULTS_PER_PAGE', '20').to_i / 2
+      page = @enhanced_query[:page] || 1
+      from_offset = ((page - 1) * per_page).to_s
+
+      query_builder = QueryBuilder.new(@enhanced_query)
+      query = query_builder.query
+      query['from'] = from_offset
+    else
+      query = QueryBuilder.new(@enhanced_query).query
+    end
+
     response = query_timdex(query)
     errors = extract_errors(response)
-
+
     if errors.nil?
-      pagination = Analyzer.new(@enhanced_query, response, :timdex).pagination
+      hits = response.dig(:data, 'search', 'hits') || 0
+      pagination = Analyzer.new(@enhanced_query, { hits: hits }, :timdex).pagination
       raw_results = extract_results(response)
       results = NormalizeTimdexResults.new(raw_results, @enhanced_query[:q]).normalize
-      { results: results, pagination: pagination, errors: nil }
+      { results: results, pagination: pagination, errors: nil, hits: hits }
     else
-      { results: [], pagination: {}, errors: errors }
+      { results: [], pagination: {}, errors: errors, hits: 0 }
     end
   end

@@ -189,16 +217,12 @@ def query_timdex(query)
     end
   end

-  def query_primo
+  def query_primo(per_page, offset)
    # We generate unique cache keys to avoid naming collisions.
     cache_key = generate_cache_key(@enhanced_query)

     Rails.cache.fetch("#{cache_key}/primo", expires_in: 12.hours) do
       primo_search = PrimoSearch.new
-      per_page = @enhanced_query[:per_page] || 20
-      current_page = @enhanced_query[:page] || 1
-      offset = (current_page - 1) * per_page
-
       primo_search.search(@enhanced_query[:q], per_page, offset)
     end
   end
```
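The zipper merge introduced in `merge_results` interleaves the two result lists with `Array#zip`, then strips the `nil` padding that `zip` adds when the lists are uneven. A standalone sketch of the same one-liner (the sample values are illustrative):

```ruby
# Interleave Primo and TIMDEX results; flatten the pairs and drop nil padding.
def merge_results(primo_results, timdex_results)
  (primo_results || []).zip(timdex_results || []).flatten.compact
end

merge_results(%w[p1 p2 p3], %w[t1 t2])
# => ["p1", "t1", "p2", "t2", "p3"]
```

Note that `zip` is driven by its receiver: if Primo returns fewer rows than TIMDEX, any TIMDEX rows beyond Primo's length are dropped rather than appended.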

app/models/analyzer.rb

Lines changed: 24 additions & 29 deletions
```diff
@@ -1,54 +1,49 @@
 class Analyzer
   attr_accessor :pagination

-  RESULTS_PER_PAGE = 20
-
   # Primo API theoretical maximum recommended offset is 2000 records (per Ex Libris documentation)
   # but in practice, the API often can't deliver results beyond ~960 records for large result sets,
   # likely due to performance constraints.
   PRIMO_MAX_OFFSET = 960

-  def initialize(enhanced_query, response, source)
+  def initialize(enhanced_query, response, source, secondary_response = nil)
     @source = source
+    @enhanced_query = enhanced_query
     @pagination = {}
-    @pagination[:hits] = hits(response)
-    @pagination[:start] = ((enhanced_query[:page] - 1) * RESULTS_PER_PAGE) + 1
-    @pagination[:end] = [enhanced_query[:page] * RESULTS_PER_PAGE, @pagination[:hits]].min
-    @pagination[:prev] = enhanced_query[:page] - 1 if enhanced_query[:page] > 1
-    @pagination[:next] = next_page(enhanced_query[:page], @pagination[:hits]) if next_page(
-      enhanced_query[:page], @pagination[:hits]
-    )
-    @pagination[:per_page] = RESULTS_PER_PAGE
+    calculate_pagination(response, secondary_response)
   end

   private

-  def hits(response)
-    return 0 if response.nil?
-
-    if @source == :primo
-      primo_hits(response)
-    elsif @source == :timdex
-      timdex_hits(response)
+  def calculate_pagination(response, secondary_response = nil)
+    if @source == :all
+      @pagination[:hits] = (secondary_response&.dig(:hits) || 0) + (response&.dig(:hits) || 0)
+      @pagination[:per_page] = ENV.fetch('RESULTS_PER_PAGE', '20').to_i
+      calculate_all_pagination
     else
-      0
+      @pagination[:hits] = response&.dig(:hits) || 0
+      @pagination[:per_page] = ENV.fetch('RESULTS_PER_PAGE', '20').to_i
+      calculate_standard_pagination
     end
   end

-  def primo_hits(response)
-    return 0 unless response.is_a?(Hash)
-
-    response.dig('info', 'total') || 0
+  def calculate_standard_pagination
+    page = @enhanced_query[:page] || 1
+    @pagination[:start] = ((page - 1) * @pagination[:per_page]) + 1
+    @pagination[:end] = [page * @pagination[:per_page], @pagination[:hits]].min
+    @pagination[:prev] = page - 1 if page > 1
+    @pagination[:next] = next_page(page, @pagination[:hits]) if next_page(page, @pagination[:hits])
   end

-  def timdex_hits(response)
-    return 0 unless response.is_a?(Hash) && response.key?(:data) && response[:data].key?('search')
-    return 0 unless response[:data]['search'].is_a?(Hash) && response[:data]['search'].key?('hits')
-
-    response[:data]['search']['hits']
+  def calculate_all_pagination
+    page = @enhanced_query[:page] || 1
+    @pagination[:start] = ((page - 1) * @pagination[:per_page]) + 1
+    @pagination[:end] = [page * @pagination[:per_page], @pagination[:hits]].min
+    @pagination[:prev] = page - 1 if page > 1
+    @pagination[:next] = page + 1 if page * @pagination[:per_page] < @pagination[:hits]
   end

   def next_page(page, hits)
-    page + 1 if page * RESULTS_PER_PAGE < hits
+    page + 1 if page * @pagination[:per_page] < hits
   end
 end
```
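For the 'all' tab, the Analyzer sums the hit counts from both sources and pages over that combined total. A simplified sketch of the window arithmetic in `calculate_all_pagination`, assuming the env default of 20 results per page; the hit counts are illustrative:

```ruby
# Combined pagination window for the 'all' tab.
primo_hits  = 45
timdex_hits = 30
per_page    = 20   # ENV.fetch('RESULTS_PER_PAGE', '20').to_i
page        = 2

hits      = primo_hits + timdex_hits              # 75 total across both sources
start     = ((page - 1) * per_page) + 1           # first result on this page: 21
finish    = [page * per_page, hits].min           # last result on this page: 40
next_page = page + 1 if page * per_page < hits    # 3, since 40 < 75
```

Each source then contributes `per_page / 2` results toward the window, which is why the controller halves the configured page size per API.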

app/models/query_builder.rb

Lines changed: 4 additions & 4 deletions
```diff
@@ -1,7 +1,6 @@
 class QueryBuilder
   attr_reader :query

-  RESULTS_PER_PAGE = 20
   QUERY_PARAMS = %w[q citation contributors fundingInformation identifiers locations subjects title booleanType].freeze
   FILTER_PARAMS = %i[accessToFilesFilter contentTypeFilter contributorsFilter formatFilter languagesFilter
                      literaryFormFilter placesFilter sourceFilter subjectsFilter].freeze
@@ -10,7 +9,8 @@ class QueryBuilder

   def initialize(enhanced_query)
     @query = {}
-    @query['from'] = calculate_from(enhanced_query[:page])
+    @per_page = ENV.fetch('RESULTS_PER_PAGE', '20').to_i
+    @query['from'] = calculate_from(enhanced_query[:page], @per_page)

     if Feature.enabled?(:geodata)
       @query['geobox'] = 'true' if enhanced_query[:geobox] == 'true'
@@ -27,10 +27,10 @@ def initialize(enhanced_query)

   private

-  def calculate_from(page = 1)
+  def calculate_from(page = 1, per_page = ENV.fetch('RESULTS_PER_PAGE', '20').to_i)
     # This needs to return a string because Timdex needs $from to be a String
     page = 1 if page.to_i.zero?
-    ((page - 1) * RESULTS_PER_PAGE).to_s
+    ((page - 1) * per_page).to_s
   end

   def extract_query(enhanced_query)
```
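`calculate_from` converts a 1-indexed page number into a zero-based offset, returned as a String because the TIMDEX API expects `$from` to be one. A standalone sketch of the same logic (with a literal default in place of the env lookup):

```ruby
# Zero-based offset as a String, guarding against a nil or zero page.
def calculate_from(page = 1, per_page = 20)
  page = 1 if page.to_i.zero?
  ((page - 1) * per_page).to_s
end

calculate_from(3, 20)  # => "40"
calculate_from(nil)    # => "0"
```

The `page.to_i.zero?` guard covers both `nil` (since `nil.to_i` is `0`) and an explicit page 0, so the offset never goes negative.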
