
fix title_author search to find 'title author' style searches #3773

Draft
ilkka-ollakka wants to merge 1 commit into bookwyrm-social:main from ilkka-ollakka:tweak/book_author_title_search

@ilkka-ollakka (Contributor) commented Dec 28, 2025

Description

Tune title-author search to allow finding books with 'title author' style searches

  • Give book authors and titles the same weight in searches
  • Manually split search terms into separate SearchQuery objects

This PR splits only the simple-config terms into separate queries, keeping the english configuration as is.

What type of Pull Request is this?

  • Bug Fix
  • Enhancement
  • Plumbing / Internals / Dependencies
  • Refactor

Does this PR change settings or dependencies, or break something?

  • This PR changes or adds default settings, configuration, or .env values
  • This PR changes or adds dependencies
  • This PR introduces other breaking changes

Details of breaking or configuration changes (if any of above checked)

Documentation

  • New or amended documentation will be required if this PR is merged
  • I have created a matching pull request in the Documentation repository
  • I intend to create a matching pull request in the Documentation repository after this PR is merged

Tests

  • My changes do not need new tests
  • All tests I have added are passing
  • I have written tests but need help to make them pass
  • I have not written tests and need help to write them

@mouse-reeve (Member)

I noticed that this leads to wild over-matching on book titles that contain words like "a" and "the". Here's an example of searching for "a tree grows in brooklyn" on main:

[Screenshot 2026-01-01 at 9:40:42 AM: search results on main]

vs. this branch:

[Screenshot 2026-01-01 at 9:40:09 AM: search results on this branch]

On this branch, the desired result is on the second page.

@mouse-reeve (Member) left a comment

I think the impact on title-only searches is too high as-is. Would it be possible to tweak how the query is constructed to avoid this?

If you're interested in some context on the logic behind sometimes retaining common words: #1196

else:
    search_query |= SearchQuery(search_term, config="simple") | SearchQuery(
        search_term, config="english"
    )
Member

I think the combination of the OR ("|") here with splitting the string by word is causing the over-matching. Words like "a" and "the" are explicitly included in the query because the simple config keeps them, instead of being removed by the english config.

For a complete title, the english config turns "a tree grows in brooklyn" into "_ tree grows in brooklyn" (removing the "a" but keeping the length and order of the query string). With this change, however, "a" is evaluated on its own; since the english query for it is empty, the OR keeps the simple version, producing ['a', 'tree', 'grows', 'in', 'brooklyn'] and then matching on "a".
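The over-matching described above can be reproduced in a toy model. This is a sketch under loud assumptions, not PostgreSQL or Django: Python sets stand in for tsvectors and tsqueries, stemming is ignored, ENGLISH_STOP_WORDS is a hypothetical tiny subset of the real english stop-word list, and titles are assumed to be indexed with their stop words kept. It only shows which titles match at all, not how they would be ranked.

```python
# Toy model only (assumption): sets approximate PostgreSQL full-text search.
# Stemming is ignored; ENGLISH_STOP_WORDS is a hypothetical, tiny subset of
# PostgreSQL's real english stop-word list.
ENGLISH_STOP_WORDS = {"a", "an", "the", "in", "of", "and"}


def simple_lexemes(text):
    """Mimic the 'simple' config: lowercase every word and drop nothing."""
    return [word.lower() for word in text.split()]


def english_lexemes(text):
    """Mimic the 'english' config (minus stemming): also drop stop words."""
    return [w for w in simple_lexemes(text) if w not in ENGLISH_STOP_WORDS]


def per_word_or_terms(search):
    """Per word: simple | english; then OR all the words together.

    An empty english query (e.g. for "a") adds nothing to the union,
    so the simple lexeme for the same word survives.
    """
    terms = set()
    for word in search.split():
        terms |= set(simple_lexemes(word)) | set(english_lexemes(word))
    return terms


def matches_any(terms, title):
    """OR semantics: a title matches if it contains ANY of the terms."""
    return bool(terms & set(simple_lexemes(title)))


def whole_string_english_matches(search, title):
    """Whole-string english query: ALL non-stop-word lexemes must be present."""
    return set(english_lexemes(search)) <= set(simple_lexemes(title))


titles = ["A Tree Grows in Brooklyn", "A Wrinkle in Time", "The Hobbit", "Dune"]
search = "a tree grows in brooklyn"

print(sorted(per_word_or_terms(search)))
# ['a', 'brooklyn', 'grows', 'in', 'tree']
print([t for t in titles if matches_any(per_word_or_terms(search), t)])
# ['A Tree Grows in Brooklyn', 'A Wrinkle in Time']
print([t for t in titles if whole_string_english_matches(search, t)])
# ['A Tree Grows in Brooklyn']
```

"A Wrinkle in Time" matches only because "a" and "in" survive the per-word OR, which is the effect the review describes; the whole-string english query does not pick it up.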

@ilkka-ollakka (Contributor, Author)

Ah, good point. I'll look at that example and check whether it can be easily fixed by AND-ing within configs and OR-ing between configs.

Either way, I'll check the examples you provided and figure out how to address the issue.
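One reading of "AND-ing within configs and OR-ing between configs" is: all simple lexemes must be present, OR all english lexemes must be present. The sketch below models that idea with plain Python sets. It is an illustration under stated assumptions, not the PR's code: stemming is ignored, ENGLISH_STOP_WORDS is a hypothetical subset of the real list, and each book's title and author text are assumed to be indexed together in one document (the actual BookWyrm search vector may be built differently).

```python
# Toy model only (assumption): sets stand in for PostgreSQL tsvectors and
# tsqueries, stemming is ignored, and ENGLISH_STOP_WORDS is a hypothetical
# subset of PostgreSQL's real english stop-word list.
ENGLISH_STOP_WORDS = {"a", "an", "the", "in", "of", "and"}


def simple_lexemes(text):
    """'simple' config: lowercase every word and keep them all."""
    return {word.lower() for word in text.split()}


def english_lexemes(text):
    """'english' config (minus stemming): drop stop words."""
    return simple_lexemes(text) - ENGLISH_STOP_WORDS


def and_within_or_between(search, document):
    """(all simple lexemes present) OR (all english lexemes present).

    Terms are AND-ed inside each config, so a stop word in a longer search
    cannot produce a match by itself; the OR only chooses between the two
    complete queries. An empty query is treated as matching nothing.
    """
    doc = simple_lexemes(document)
    simple_terms = simple_lexemes(search)
    english_terms = english_lexemes(search)
    simple_hit = bool(simple_terms) and simple_terms <= doc
    english_hit = bool(english_terms) and english_terms <= doc
    return simple_hit or english_hit


# Hypothetical documents: title and author text concatenated.
books = {
    "A Tree Grows in Brooklyn": "A Tree Grows in Brooklyn Betty Smith",
    "A Wrinkle in Time": "A Wrinkle in Time Madeleine L'Engle",
    "The Hobbit": "The Hobbit J. R. R. Tolkien",
}

for search in ["a tree grows in brooklyn", "hobbit tolkien"]:
    hits = [t for t, doc in books.items() if and_within_or_between(search, doc)]
    print(search, "->", hits)
# a tree grows in brooklyn -> ['A Tree Grows in Brooklyn']
# hobbit tolkien -> ['The Hobbit']
```

In Django terms this would roughly correspond to SearchQuery(search_term, config="simple") | SearchQuery(search_term, config="english") applied to the whole search string rather than per word, since SearchQuery's default "plain" search type already ANDs the words of each query.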

@ilkka-ollakka (Contributor, Author)

I reworked the PR. It seems that splitting only the simple part is enough to find books by author, without breaking searches for titles like 'a tree grows in brooklyn'.

@ilkka-ollakka force-pushed the tweak/book_author_title_search branch 2 times, most recently from e1c1853 to 17102df on January 6, 2026 15:54
* Give book authors and titles the same weight in searches
* Manually split search terms into separate SearchQuery objects for the simple config

Manually splitting the simple terms helps find 'title author' style searches.
@ilkka-ollakka force-pushed the tweak/book_author_title_search branch from 17102df to 60ab403 on January 16, 2026 21:38
@ilkka-ollakka marked this pull request as draft on January 26, 2026 20:53

Development

Successfully merging this pull request may close these issues.

Search for author and title not working

3 participants