fix title_author search to find 'title author' style searches #3773
ilkka-ollakka wants to merge 1 commit into bookwyrm-social:main from
mouse-reeve
left a comment
I think the impact on title-only searches is too high as-is. Would it be possible to tweak how the query is constructed to avoid this?
If you're interested in some context on the logic behind sometimes retaining common words: #1196
bookwyrm/book_search.py
Outdated
    else:
        search_query |= SearchQuery(search_term, config="simple") | SearchQuery(
            search_term, config="english"
        )
I think the combination of the OR here with splitting the string by word is causing over-matching. Words like "a" and "the" are being explicitly included in the query because the simple config keeps them, instead of being removed by the english config.
For a complete title, the english config changes "a tree grows in brooklyn" to "_ tree grows in brooklyn" (removing the "a" but keeping the length and order of the query string). With this change, "a" is evaluated separately, and since the english config produces a blank result for it, the OR keeps the simple version in place, producing ['a', 'tree', 'grows', 'in', 'brooklyn'] and then searching on "a".
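To make the over-matching concrete, here is a minimal sketch in plain Python. It is a toy model, not PostgreSQL's actual `simple` or `english` text search configs: the stop-word set contains only the two words named above, there is no stemming, and every function name is hypothetical. It only illustrates why OR-ing a per-word `simple` query with a per-word `english` query puts "a" back into the search.

```python
# Toy model only: NOT PostgreSQL's real text search configs.
# Hypothetical stop-word subset: just the two words named in the comment above.
STOP_WORDS = {"a", "the"}


def simple_terms(text):
    """Toy 'simple' config: lowercase and split, keeping every word."""
    return text.lower().split()


def english_terms(text):
    """Toy 'english' config: like simple, but drops stop words (no stemming)."""
    return [word for word in simple_terms(text) if word not in STOP_WORDS]


def per_word_or_terms(text):
    """Terms searched when each word gets its own simple OR english sub-query."""
    terms = []
    for word in text.split():
        # For "a", english_terms() is empty, so only the simple term survives.
        for term in simple_terms(word) + english_terms(word):
            if term not in terms:
                terms.append(term)
    return terms


def matches_any(terms, title):
    """Loosest possible match: true if any term appears as a word in the title."""
    title_words = set(simple_terms(title))
    return any(term in title_words for term in terms)


query = "a tree grows in brooklyn"
unrelated_title = "A Wizard of Earthsea"

print(english_terms(query))      # whole-string english: "a" is dropped
print(per_word_or_terms(query))  # per-word OR: "a" is back in the query
print(matches_any(english_terms(query), unrelated_title))
print(matches_any(per_word_or_terms(query), unrelated_title))
```

In this toy model the per-word OR query matches an unrelated title purely because both contain "a", while the whole-string english terms do not.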
Ah, good point. I'll check that example and see whether it can be fixed easily by AND-ing within each config and OR-ing between the configs.
Either way, I'll go through the examples you provided and figure out how to address the issue.
I reworked the PR, and it seems that splitting only the simple part fixes finding by author without breaking searches for 'a tree grows in brooklyn' style titles.
e1c1853 to 17102df
* Give book authors and titles same weight in searches
* Manually split search terms to separate SearchQuery for simple config

Manually splitting simple terms helps to find 'title author' style of searches.
17102df to 60ab403


Description
Tune the title-author search so that books can be found with 'title author' style searches.
This PR splits the search terms into separate queries for the simple config only, keeping the english config query as is.