Score set search result optimization #525

jstone-dev · 2025-09-30T16:32:38Z

Changes

Add a limit option to score set search queries. Require a limit of at most 100 for searches of all score sets, while searches for the user's own score sets can have no limit.
Extract the score set search query filter clause logic into a new function.
Use limit + 1 in the search query, and if limit is exceeded, run a second query to count available rows. In either case, return the number of available rows with the limited search result.
Instead of searching all score sets and then replacing superseded ones with their successors, revise the database query to search only un-superseded score sets.
In the main search endpoint (but not in the "my score sets" endpoint), mandate that the search be only for published score sets.
Add an endpoint to obtain search filter options based on a given score set search.
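
The limit + 1 approach described in the list above can be illustrated with a minimal sketch. It uses an in-memory SQLite table rather than the project's SQLAlchemy models, and `search_with_count`, `make_demo_db`, and the single-column `score_sets` table are hypothetical names chosen for illustration, not code from this PR.

```python
import sqlite3


def search_with_count(conn: sqlite3.Connection, limit: int) -> dict:
    """Return at most `limit` rows plus the total number of matching rows."""
    # Ask for one extra row so we can tell whether the limit was exceeded.
    rows = conn.execute(
        "SELECT urn FROM score_sets ORDER BY urn LIMIT ?", (limit + 1,)
    ).fetchall()

    if len(rows) > limit:
        # More rows exist than we may return, so a second query counts them.
        total = conn.execute("SELECT COUNT(*) FROM score_sets").fetchone()[0]
        rows = rows[:limit]
    else:
        # Everything fit within the limit, so the total is already known.
        total = len(rows)

    return {"score_sets": [urn for (urn,) in rows], "num_score_sets": total}


def make_demo_db(n: int) -> sqlite3.Connection:
    """Build a throwaway table with n rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE score_sets (urn TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO score_sets VALUES (?)",
        [(f"urn:demo:{i:03d}",) for i in range(n)],
    )
    return conn
```

The count query only runs when the result was actually truncated, which is presumably why small searches pay for a single query.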

Notes

Searches for large result sets return much more quickly than before.
Some searches for small result sets appear to be a bit slower than before, almost certainly because of the new clauses that limit search results to un-superseded score sets. This can probably be improved by changing the implementation of supersession, so that the superseded score set has a superseding_score_set_id property.
Unpublished superseding score sets now prevent their precursors from appearing in search results. If we adopt the current set of changes, a new issue should be opened to address this soon.
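
To make the supersession note concrete, here is a rough sketch comparing two query shapes on an in-memory SQLite table. It assumes the current layout stores a `superseded_score_set_id` on the successor, so finding un-superseded rows needs a subquery; the `NOT EXISTS` form and the table layout are assumptions for illustration, not the PR's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE score_sets (
        id INTEGER PRIMARY KEY,
        urn TEXT,
        superseded_score_set_id INTEGER,  -- assumed current layout: set on the successor
        superseding_score_set_id INTEGER  -- proposed layout: set on the precursor
    );
    -- demo-1 is superseded by demo-2; demo-3 stands alone.
    INSERT INTO score_sets VALUES (1, 'demo-1', NULL, 2);
    INSERT INTO score_sets VALUES (2, 'demo-2', 1, NULL);
    INSERT INTO score_sets VALUES (3, 'demo-3', NULL, NULL);
    """
)

# Assumed current shape: a row is un-superseded if no other row points back at it.
CURRENT_STYLE = """
    SELECT s.urn FROM score_sets s
    WHERE NOT EXISTS (
        SELECT 1 FROM score_sets successor
        WHERE successor.superseded_score_set_id = s.id
    )
    ORDER BY s.urn
"""

# Proposed shape: the precursor carries the pointer, so a NULL check suffices.
PROPOSED_STYLE = """
    SELECT urn FROM score_sets
    WHERE superseding_score_set_id IS NULL
    ORDER BY urn
"""


def unsuperseded(query: str) -> list:
    """Run one of the two queries and return the matching URNs."""
    return [urn for (urn,) in conn.execute(query)]
```

Both queries select the same rows here; the second avoids the correlated subquery, which is the kind of saving the note seems to anticipate.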

bencap

Thanks for writing this Jeremy, it'll be quite a nice improvement to the UX of landing on the search page.

The only other thing I was wondering is if we might add a page parameter to the search as well. Given the implementation is just slicing the score set list, it seems to me like it would be really easy to add (score_sets[limit * page : limit * (page + 1)]) and would complete the feature. I'm not sure how many people would practically click on a next button on the UI, but it would be annoying to me if I had a search with say 150 results and I was categorically denied from viewing the final 50. It would need a few tests though too.

I know this is meant to be more of a stopgap feature though, so if you don't think it's worth the additional effort to add the tests for it we can leave it off.
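
For reference, a page-based slice takes the half-open range from limit * page to limit * (page + 1). A minimal sketch, with `page_slice` as a hypothetical helper name and pages numbered from zero:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def page_slice(items: Sequence[T], limit: int, page: int) -> Sequence[T]:
    """Return the zero-based `page` of `items`, with `limit` items per page."""
    start = limit * page
    # Slicing past the end is safe in Python; it yields a shorter or empty result.
    return items[start : start + limit]
```

With 150 results and a limit of 100, page 0 holds the first 100 items and page 1 holds the final 50.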

bencap · 2025-10-01T22:08:42Z

src/mavedb/lib/score_sets.py

    save_to_logging_context({"matching_resources": len(score_sets)})
    logger.debug(msg=f"Score set search yielded {len(score_sets)} matching resources.", extra=logging_context())


These would become num_score_sets since we've already sliced the list. Since the limit is part of the search criteria, I don't think we need to log the number we were limited to.

bencap · 2025-10-01T22:34:27Z

src/mavedb/lib/score_sets.py

+    return {"score_sets": score_sets, "num_score_sets": num_score_sets}
+
+
+def fetch_score_set_search_filter_options(db: Session, owner_or_contributor: Optional[User], search: ScoreSetsSearch):


I'm struggling a little with the duplication in this function. I'm not sure there's an easy way to abstract the counters in a way that's readable, but we could add a helper that reduces the duplication in counter to value dictionary generation. That would at least get rid of a bunch of the duplicated list comprehensions, which are a little verbose.

    def _counterHelper(counter: Counter):
        return [{"value": value, "count": count} for value, count in counter.items()]

    ...

    return {
        "target_gene_categories": _counterHelper(target_category_counter),
        ...
        "publication_journals": _counterHelper(publication_journals),
    }

bencap · 2025-10-01T22:46:42Z

src/mavedb/routers/score_sets.py

+    # Require a limit of at most 100 when the search query does not include publication identifiers. We allow unlimited
+    # searches with publication identifiers, presuming that such a search will not have excessive results.
+    if search.publication_identifiers is None and (search.limit is None or search.limit > 100):
+        search.limit = 100


This distinction probably doesn't matter, but might we return a 422 when the limit is set above 100? I'm a little wary of altering the client's request from something they explicitly requested. It seems fine to enforce the limit when it isn't explicitly set though.

This could also use a test.
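
One way to read this suggestion is sketched below in plain Python: apply the default only when no limit was given, and reject explicit limits above the maximum. `resolve_limit`, `MAX_SEARCH_LIMIT`, and `LimitTooLarge` are hypothetical names; in the router, the exception would presumably become an HTTP 422 response.

```python
from typing import Optional

MAX_SEARCH_LIMIT = 100  # value taken from the snippet above; the name is hypothetical


class LimitTooLarge(ValueError):
    """Stand-in for the HTTP 422 response the router would return."""


def resolve_limit(requested: Optional[int], has_publication_identifiers: bool) -> Optional[int]:
    """Decide which limit to apply without silently changing an explicit request."""
    if has_publication_identifiers:
        # Searches by publication identifier may be unlimited, as in the original code.
        return requested
    if requested is None:
        # No explicit limit: quietly fall back to the maximum.
        return MAX_SEARCH_LIMIT
    if requested > MAX_SEARCH_LIMIT:
        # Explicit but too large: refuse rather than rewrite the request.
        raise LimitTooLarge(f"Limit must be at most {MAX_SEARCH_LIMIT}.")
    return requested
```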

bencap · 2025-10-01T22:54:07Z

src/mavedb/routers/score_sets.py

+    if search.published is False:
+        raise HTTPException(
+            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+            detail="Cannot search for private score sets except in the context of the current user's data.",
+        )


It'd be nice for this block to have an associated test.

bencap · 2025-10-01T22:54:32Z

src/mavedb/routers/score_sets.py

+    # Also limit the search to at most 40 publication identifiers, to prevent artificially constructed searches that
+    # return very large result sets.
+    if search.publication_identifiers is not None and len(search.publication_identifiers) > 40:
+        raise HTTPException(
+            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+            detail="Cannot search for score sets belonging to more than 40 publication identifiers at once.",
+        )


It'd be nice for this block to have an associated test.
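
A test for this block might look roughly like the sketch below. It exercises a standalone stand-in for the check, not the actual endpoint: `check_publication_identifiers` and `MAX_PUBLICATION_IDENTIFIERS` are hypothetical names, and a real test would call the router and assert on a 422 status code.

```python
from typing import Optional, Sequence

MAX_PUBLICATION_IDENTIFIERS = 40  # value from the snippet above; the name is hypothetical


def check_publication_identifiers(identifiers: Optional[Sequence[str]]) -> None:
    """Stand-in for the router check; raises where the router would return a 422."""
    if identifiers is not None and len(identifiers) > MAX_PUBLICATION_IDENTIFIERS:
        raise ValueError(
            "Cannot search for score sets belonging to more than 40 publication identifiers at once."
        )


def test_accepts_up_to_forty_identifiers() -> None:
    # Exactly 40 identifiers is the boundary case and should pass.
    check_publication_identifiers([f"id-{i}" for i in range(40)])
    # A search without publication identifiers is not affected by this check.
    check_publication_identifiers(None)


def test_rejects_forty_one_identifiers() -> None:
    try:
        check_publication_identifiers([f"id-{i}" for i in range(41)])
    except ValueError as error:
        assert "more than 40" in str(error)
    else:
        raise AssertionError("expected the 41-identifier search to be rejected")
```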

bencap

Looks great, thanks for the tests!

bencap · 2025-10-25T01:17:20Z

src/mavedb/lib/score_sets.py

-    query = db.query(ScoreSet)  # \
-    # .filter(ScoreSet.private.is_(False))
+    # Limit to unsuperseded score sets.
+    # TODO#??? Prevent unpublished superseding score sets from hiding their published precursors in search results.


Did we end up opening an issue for this, so that we can add its number here?

jstone-dev changed the base branch from release-2025.4.1 to release-2025.4.2 October 1, 2025 15:32

jstone-dev marked this pull request as ready for review October 1, 2025 15:32

bencap reviewed Oct 1, 2025

bencap mentioned this pull request Oct 1, 2025

Score set search result optimization VariantEffect/mavedb-ui#493

Merged

bencap linked an issue Oct 4, 2025 that may be closed by this pull request

Long score set search result optimization #524

Closed

jstone-dev added 17 commits October 24, 2025 09:21

Add an endpoint to obtain search filter options based on a given score set search.

0c1544f

Allow score set search without a row limit when publication IDs are specified, but limit the number of publication IDs.

041bfb9

MyPy: typing for counters

1e5e6ed

Update unit tests to reflect score set search endpoint change.

a3a1372

Code formatting

b6b3720

Unit test fixes

6331d55

Format & test fixes

fd6e701

Test bug fixes

9bcb0cf

Unit test fixes

ea62984

Test bug fix

765f02f

Refactor counter usage for score set search filters.

0ef53bf

Log the total number of matching score sets rather than the number returned.

6a67c84

Add an offset parameter to support full pagination of score set search results.

52cb863

Move score set search limits into constants.

b9ff02b

Supply a default search limit.

db53a0e

Unit tests for new score set search errors.

3be7945

jstone-dev force-pushed the jstone-dev/score-set-search-result-optimization branch from 7f0da04 to 3be7945 Compare October 24, 2025 16:21

jstone-dev added 3 commits October 24, 2025 09:36

Don't import from router in test_score_set.py.

0508a4f

Linting fix

be3d522

Return correct result count in paginated results with offset.

ebde590

bencap approved these changes Oct 25, 2025

jstone-dev changed the base branch from release-2025.4.2 to release-2025.5.0 October 29, 2025 21:20

jstone-dev and others added 2 commits October 29, 2025 14:24

Merge branch 'release-2025.5.0' into jstone-dev/score-set-search-result-optimization

38d974e

Fix after merge

e0abfe0

Return unenriched score sets when include_experiment_score_set_urns_and_count is false.

7f0688b

jstone-dev merged commit c6a2014 into release-2025.5.0 Nov 5, 2025
6 checks passed

bencap mentioned this pull request Nov 13, 2025

Release 2025.5.0 #575

Merged

bencap deleted the jstone-dev/score-set-search-result-optimization branch November 14, 2025 20:06
