@jstone-dev jstone-dev commented Sep 30, 2025

Changes

  • Add a limit option to score set search queries. Require a limit of at most 100 for searches of all score sets, while searches for the user's own score sets can have no limit.
  • Extract the score set search query filter clause logic into a new function.
  • Use limit + 1 in the search query, and if limit is exceeded, run a second query to count available rows. In either case, return the number of available rows with the limited search result.
  • Instead of searching all score sets and then replacing superseded ones with their successors, revise the database query to search only un-superseded score sets.
  • In the main search endpoint (but not in the "my score sets" endpoint), mandate that the search be only for published score sets.
  • Add an endpoint to obtain search filter options based on a given score set search.
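
For readers unfamiliar with the limit + 1 approach described in the list above, here is a minimal sketch of the idea. It uses the standard-library sqlite3 module rather than the project's actual query code, and the table layout and the names `make_demo_db` and `search_with_count` are assumptions made up for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_demo_db(num_rows: int) -> sqlite3.Connection:
    """Build a throwaway in-memory table (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE score_sets (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO score_sets (title) VALUES (?)",
        [(f"Score set {i}",) for i in range(num_rows)],
    )
    return conn


def search_with_count(
    conn: sqlite3.Connection, limit: Optional[int]
) -> Tuple[List[tuple], int]:
    """Return (limited rows, number of available rows)."""
    if limit is None:
        rows = conn.execute("SELECT id, title FROM score_sets ORDER BY id").fetchall()
        return rows, len(rows)

    # Ask for one row more than the limit so we can tell whether more rows exist.
    rows = conn.execute(
        "SELECT id, title FROM score_sets ORDER BY id LIMIT ?", (limit + 1,)
    ).fetchall()

    if len(rows) > limit:
        # Limit exceeded: only now pay for a second query that counts all rows.
        (num_available,) = conn.execute("SELECT COUNT(*) FROM score_sets").fetchone()
        rows = rows[:limit]
    else:
        # Everything fit, so the row count is already known.
        num_available = len(rows)
    return rows, num_available
```

The point of the extra row is that the count query runs only when the result was actually truncated; small searches get their total for free.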

Notes

  • Searches for large result sets return much more quickly than before.
  • Some searches for small result sets appear to be a bit slower than before, almost certainly because of the new clauses that limit search results to un-superseded score sets. This can probably be improved by changing the implementation of supersession, so that the superseded score set has a superseding_score_set_id property.
  • Unpublished superseding score sets now prevent their precursors from appearing in search results. If we adopt the current set of changes, a new issue should be opened to address this soon.
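
To illustrate the supersession note above, here is a rough sketch of the two ways an "un-superseded" filter could be expressed. It runs against a toy sqlite3 table, not the real schema: the column `superseded_score_set_id` and the function names are assumptions for illustration, and `superseding_score_set_id` is the property proposed in the note, not something that necessarily exists today.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE score_sets (
        id INTEGER PRIMARY KEY,
        superseded_score_set_id INTEGER,   -- assumed: set on the newer score set
        superseding_score_set_id INTEGER   -- proposed: set on the older score set
    )
    """
)
# Score set 2 supersedes score set 1; score set 3 has no supersession link.
conn.executemany(
    "INSERT INTO score_sets VALUES (?, ?, ?)",
    [(1, None, 2), (2, 1, None), (3, None, None)],
)


def unsuperseded_via_subquery(conn: sqlite3.Connection) -> List[int]:
    """Exclude rows that some other row points to as its superseded score set."""
    sql = """
        SELECT s.id FROM score_sets AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM score_sets AS t WHERE t.superseded_score_set_id = s.id
        )
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql)]


def unsuperseded_via_column(conn: sqlite3.Connection) -> List[int]:
    """With the proposed column, the same filter is a plain NULL check."""
    sql = "SELECT id FROM score_sets WHERE superseding_score_set_id IS NULL ORDER BY id"
    return [row[0] for row in conn.execute(sql)]
```

Both functions select the same rows here; the second avoids the correlated subquery, which is the kind of simplification the note is hoping for. Whether it is actually faster would need measuring against the real database.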

@jstone-dev jstone-dev changed the base branch from release-2025.4.1 to release-2025.4.2 October 1, 2025 15:32
@jstone-dev jstone-dev marked this pull request as ready for review October 1, 2025 15:32
@bencap bencap left a comment

Thanks for writing this Jeremy, it'll be quite a nice improvement to the UX of landing on the search page.

The only other thing I was wondering is if we might add a page parameter to the search as well. Given the implementation is just slicing the score set list, it seems to me like it would be really easy to add (score_sets[limit * page : limit * (page + 1)]) and would complete the feature. I'm not sure how many people would practically click on a next button on the UI, but it would be annoying to me if I had a search with, say, 150 results and I was categorically denied from viewing the final 50. It would need a few tests too, though.

I know this is meant to be more of a stopgap feature though, so if you don't think it's worth the additional effort to add the tests for it we can leave it off.
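
A minimal sketch of the page slicing suggested above, assuming a zero-based page index. The helper name `page_slice` is hypothetical and not part of the PR; a database-backed version would more likely push the offset into the query than slice a list in memory.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_slice(items: Sequence[T], limit: int, page: int) -> List[T]:
    """Return the zero-based `page` of `items`, with at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if page < 0:
        raise ValueError("page must not be negative")
    start = limit * page
    # The slice end is limit * (page + 1); Python clamps it to len(items).
    return list(items[start : start + limit])
```

With 150 results and a limit of 100, page 0 holds the first 100 items, page 1 holds the remaining 50, and any later page is empty.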

Comment on lines 290 to 291
save_to_logging_context({"matching_resources": len(score_sets)})
logger.debug(msg=f"Score set search yielded {len(score_sets)} matching resources.", extra=logging_context())

These would become num_score_sets since we've already sliced the list. Since the limit is part of the search criteria, I don't think we need to log the number we were limited to.

return {"score_sets": score_sets, "num_score_sets": num_score_sets}


def fetch_score_set_search_filter_options(db: Session, owner_or_contributor: Optional[User], search: ScoreSetsSearch):

I'm struggling a little with the duplication in this function. I'm not sure there's an easy way to abstract the counters in a way that's readable, but we could add a helper that reduces the duplication in counter to value dictionary generation. That would at least get rid of a bunch of the duplicated list comprehensions, which are a little verbose.

def _counterHelper(counter: Counter):
    return [{"value": value, "count": count} for value, count in counter.items()]

...

return {
    "target_gene_categories": _counterHelper(target_category_counter),
    ...
    "publication_journals": _counterHelper(publication_journals),
}
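
For anyone who wants to try the helper idea in isolation, here is a self-contained sketch. The name `counter_to_options` and the sample category values are made up for illustration; it also uses `Counter.most_common()` so that options come back ordered by descending count, which is a variation on the suggestion above and not something the PR necessarily does.

```python
from collections import Counter
from typing import Any, Dict, List


def counter_to_options(counter: Counter) -> List[Dict[str, Any]]:
    """Turn a Counter into a list of {"value", "count"} dictionaries."""
    # most_common() yields (value, count) pairs, highest count first.
    return [{"value": value, "count": count} for value, count in counter.most_common()]


# Hypothetical sample data standing in for values collected from score sets.
target_category_counter = Counter(["regulatory", "protein_coding", "protein_coding"])
options = counter_to_options(target_category_counter)
```

Here `options` lists "protein_coding" with a count of 2 ahead of "regulatory" with a count of 1.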

Comment on lines 155 to 158
# Require a limit of at most 100 when the search query does not include publication identifiers. We allow unlimited
# searches with publication identifiers, presuming that such a search will not have excessive results.
if search.publication_identifiers is None and (search.limit is None or search.limit > 100):
    search.limit = 100

This distinction probably doesn't matter, but might we return a 422 when the limit is set above 100? I'm a little wary of altering something the client explicitly requested. It seems fine to enforce the limit when it isn't explicitly set, though.

This could also use a test.
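
A rough sketch of the behaviour proposed in this thread: apply the default when no limit is given, but reject an explicit limit above 100 instead of silently lowering it. This is plain Python with hypothetical names (`resolve_limit`, `LimitTooLargeError`); in the endpoint itself the error would presumably be turned into a 422 response.

```python
from typing import Optional

MAX_SEARCH_LIMIT = 100


class LimitTooLargeError(ValueError):
    """Raised when the client explicitly requests more than the allowed limit."""


def resolve_limit(requested: Optional[int], has_publication_identifiers: bool) -> Optional[int]:
    """Return the limit to apply, or None for an unlimited search."""
    if has_publication_identifiers:
        # Searches by publication identifier are allowed to be unlimited.
        return requested
    if requested is None:
        # No explicit limit: quietly apply the default cap.
        return MAX_SEARCH_LIMIT
    if requested > MAX_SEARCH_LIMIT:
        # Explicit but too large: refuse instead of changing the request.
        raise LimitTooLargeError(f"limit must be at most {MAX_SEARCH_LIMIT}")
    return requested
```

A test for this would cover at least the four cases: no limit, a limit within range, a limit above 100, and a search that includes publication identifiers.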

Comment on lines +148 to +155
if search.published is False:
    raise HTTPException(
        status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
        detail="Cannot search for private score sets except in the context of the current user's data.",
    )

It'd be nice for this block to have an associated test.

Comment on lines 160 to 166
# Also limit the search to at most 40 publication identifiers, to prevent artificially constructed searches that
# return very large result sets.
if search.publication_identifiers is not None and len(search.publication_identifiers) > 40:
    raise HTTPException(
        status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
        detail="Cannot search for score sets belonging to more than 40 publication identifiers at once.",
    )

It'd be nice for this block to have an associated test.
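
As a starting point for the tests requested on the two 422 blocks, here is a sketch that pulls both checks into a plain function so they can be exercised without an HTTP client. The names `validate_search`, `UnprocessableSearchError`, and `raises_422` are hypothetical; the real tests would more likely call the endpoint and assert on the response status.

```python
from typing import List, Optional

MAX_PUBLICATION_IDENTIFIERS = 40


class UnprocessableSearchError(Exception):
    """Stand-in for an HTTP 422 response in this sketch."""

    status_code = 422


def validate_search(published: Optional[bool], publication_identifiers: Optional[List[str]]) -> None:
    """Raise if the search asks for private score sets or too many publications."""
    if published is False:
        raise UnprocessableSearchError(
            "Cannot search for private score sets except in the context of the current user's data."
        )
    if (
        publication_identifiers is not None
        and len(publication_identifiers) > MAX_PUBLICATION_IDENTIFIERS
    ):
        raise UnprocessableSearchError(
            "Cannot search for score sets belonging to more than 40 publication identifiers at once."
        )


def raises_422(published: Optional[bool], publication_identifiers: Optional[List[str]]) -> bool:
    """Small test helper: True if validation fails with status code 422."""
    try:
        validate_search(published, publication_identifiers)
    except UnprocessableSearchError as error:
        return error.status_code == 422
    return False
```

The boundary cases worth covering are exactly 40 identifiers (accepted) and 41 (rejected), plus `published` being False, True, and unset.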

…pecified, but limit the number of publication IDs.
@jstone-dev jstone-dev force-pushed the jstone-dev/score-set-search-result-optimization branch from 7f0da04 to 3be7945 Compare October 24, 2025 16:21
@bencap bencap left a comment

Looks great, thanks for the tests!

query = db.query(ScoreSet) # \
# .filter(ScoreSet.private.is_(False))
# Limit to unsuperseded score sets.
# TODO#??? Prevent unpublished superseding score sets from hiding their published precursors in search results.

Did we end up opening an issue for this, so we can add its number here?

@jstone-dev jstone-dev changed the base branch from release-2025.4.2 to release-2025.5.0 October 29, 2025 21:20
@jstone-dev jstone-dev merged commit c6a2014 into release-2025.5.0 Nov 5, 2025
6 checks passed
@bencap bencap mentioned this pull request Nov 13, 2025
@bencap bencap deleted the jstone-dev/score-set-search-result-optimization branch November 14, 2025 20:06
Development

Successfully merging this pull request may close these issues.

Long score set search result optimization
