@eepMoody
Collaborator

@eepMoody eepMoody commented Jun 8, 2025

Replaces the existing hand-tuned search with the Whoosh framework.

This has major performance benefits but incurs a much higher indexing time. To deal with this, I've adopted a strategy that caches indexes in the repo, unpacks them for quick development, then runs a background indexing process inside the server after deployment. In practice, this means search results may start slightly out of date (if a major content release has not been cached) but will be consistent within approximately 5 minutes of deploy.
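
For reviewers, the cache-then-reindex flow can be sketched roughly as below. This is a minimal illustration using only the standard library, not the code in this branch: `unpack_cached_index`, `start_background_reindex`, the `segment.txt` file, and the `rebuild` callback are all hypothetical names, and the real index files are whatever Whoosh writes.

```python
import tempfile
import threading
import zipfile
from pathlib import Path


def unpack_cached_index(archive: Path, index_dir: Path) -> None:
    """Extract a checked-in index archive so search works right away."""
    index_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(index_dir)


def start_background_reindex(rebuild, index_dir: Path) -> threading.Thread:
    """Run rebuild(index_dir) in a daemon thread so startup is not blocked."""
    worker = threading.Thread(target=rebuild, args=(index_dir,), daemon=True)
    worker.start()
    return worker


def demo() -> tuple[str, str]:
    """Show the index contents before and after the background rebuild."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Stand-in for the zipped index that is checked into the repo.
        archive = root / "index.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("segment.txt", "cached")

        index_dir = root / "index"
        unpack_cached_index(archive, index_dir)
        before = (index_dir / "segment.txt").read_text()

        # Hypothetical rebuild step: a real one would re-index live content.
        def rebuild(target: Path) -> None:
            (target / "segment.txt").write_text("live")

        # join() is only here so the demo is deterministic; a server would
        # let the thread finish on its own while serving cached results.
        start_background_reindex(rebuild, index_dir).join()
        after = (index_dir / "segment.txt").read_text()
        return before, after
```

The point of the split is that the server can answer queries from the unpacked (possibly stale) index immediately, while the slower rebuild catches up in the background.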

Major changes:

  • Replace hand-tuned and ES options with Whoosh
  • Replace vector search with spaCy embeddings for true semantic matching
  • Add management commands, including quickindex, which indexes content and creates zipped index files for checking in
  • Add tests covering all search methods, ensuring they return realistic results
  • Add a background process in the server to index live content in production
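
As a rough illustration of what embedding-based semantic matching means here: documents and the query are each mapped to a vector, and results are ranked by cosine similarity. The sketch below uses toy three-dimensional vectors and hypothetical names (`cosine`, `rank`, `doc_a`, ...) so it runs without a model; with spaCy the vectors would typically come from something like `nlp(text).vector`, which is left out here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (assumption, not project data).
DOC_VECS = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.0, 0.1, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.0]

top_two = rank(QUERY_VEC, DOC_VECS, top_k=2)
```

Unlike the text and fuzzy methods, this ranking does not depend on shared spelling, which is why a query can match a document that uses different words.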

NOTE: This is being kept in draft until #844 is merged, as it is based on that branch.

@augustjohnson
Collaborator

@eepMoody what's the status on this PR? If I understand correctly, a fair amount of the search is already in place in staging... Where does this fit in?

@eepMoody
Collaborator Author

eepMoody commented Sep 23, 2025

Ah sorry, I hit a bit of a wall on getting the actual switch to Elasticsearch (which is a major performance boost) sorted out. What's in production adds the new search methods, but the performance isn't really improved. Might be a little bit worse, but hard to tell until it hits real resources.

The main bit that's outstanding is the deployment setup, since ES needs to run in its own container/server. I haven't quite gotten that into a state where it's functional in production.

If you want to take a peek, maybe there's something obvious that I've overlooked or another approach that might make more sense?

@eepMoody eepMoody changed the title from "Switch to elasticsearch implementation" to "Switch to Whoosh implementation to improve speed" Dec 31, 2025
Use Whoosh for text/fuzzy search and spaCy word embeddings for
semantic similarity. Removes the need for a separate ES service.
@eepMoody eepMoody force-pushed the moody/elasticsearch-implementation branch from f3358bd to 598c9f8 December 31, 2025 19:19