-
Notifications
You must be signed in to change notification settings - Fork 2
Score set search result optimization #525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Score set search result optimization #525
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this Jeremy, it'll be quite a nice improvement to the UX of landing on the search page.
The only other thing I was wondering is if we might add a page parameter to the search as well. Given the implementation is just slicing the score set list, it seems to me like it would be really easy to add (score_sets[limit * page : limit * page + 1]) and would complete the feature. I'm not sure how many people would practically click on a next button on the UI, but it would be annoying to me if I had a search with say 150 results and I was categorically denied from viewing the final 50. It would need a few tests though too.
I know this is meant to be more of a stopgap feature though, so if you don't think it's worth the additional effort to add the tests for it we can leave it off.
src/mavedb/lib/score_sets.py
Outdated
| save_to_logging_context({"matching_resources": len(score_sets)}) | ||
| logger.debug(msg=f"Score set search yielded {len(score_sets)} matching resources.", extra=logging_context()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would become num_score_sets since we've already sliced the list. Since the limit is part of the search criteria, I don't think we need to log the number we were limited to.
| return {"score_sets": score_sets, "num_score_sets": num_score_sets} | ||
|
|
||
|
|
||
| def fetch_score_set_search_filter_options(db: Session, owner_or_contributor: Optional[User], search: ScoreSetsSearch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling a little with the duplication in this function. I'm not sure there's an easy way to abstract the counters in a way that's readable, but we could add a helper that reduces the duplication in counter to value dictionary generation. That would at least get rid of a bunch of the duplicated list comprehensions, which are a little verbose.
def _counterHelper(counter: Counter):
return [{"value": value, "count": count} for value, count in counter.items()]
...
return {
"target_gene_categories": _counterHelper(target_category_counter),
...
"publication_journals": _counterHelper(publication_journals),
}
src/mavedb/routers/score_sets.py
Outdated
| # Require a limit of at most 100 when the search query does not include publication identifiers. We allow unlimited | ||
| # searches with publication identifiers, presuming that such a search will not have excessive results. | ||
| if search.publication_identifiers is None and (search.limit is None or search.limit > 100): | ||
| search.limit = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This distinction probably doesn't matter, but might we return a 422 when the limit is set above 100? I'm a little wary of altering the clients request from something they explicitly requested. It seems fine to enforce the limit when it isn't explicitly set though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also use a test.
| if search.published is False: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, | ||
| detail="Cannot search for private score sets except in the context of the current user's data.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice for this block to have an associated test.
src/mavedb/routers/score_sets.py
Outdated
| # Also limit the search to at most 40 publication identifiers, to prevent artificially constructed searches that | ||
| # return very large result sets. | ||
| if search.publication_identifiers is not None and len(search.publication_identifiers) > 40: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, | ||
| detail="Cannot search for score sets belonging to more than 40 publication identifiers at once.", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice for this block to have an associated test.
- Add a limit option to score set search queries. Require a limit of at most 100 for searches of all score sets, while searches for the user's own score sets can have no limit. - Extract the score set search query filter clause logic into a new function. - Use limit + 1 in the search query, and if limit is exceeded, run a second query to count available rows. In either case, return the number of available rows with the limited search result. - Instead of searching all score sets and then replacing un-superseded ones with their successors, revise the database query to search only un-superseded score sets. - In the main search endpoint (but not in the "my score sets" endpoint), mandate that the search be only for published score sets.
…pecified, but limit the number of publication IDs.
7f0da04 to
3be7945
Compare
bencap
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the tests!
| query = db.query(ScoreSet) # \ | ||
| # .filter(ScoreSet.private.is_(False)) | ||
| # Limit to unsuperseded score sets. | ||
| # TODO#??? Prevent unpublished superseding score sets from hiding their published precursors in search results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we end up opening an issue for this we can add the number for?
…nd_count is false.
Changes
Notes