Switch to Whoosh implementation to improve speed #762
@eepMoody what's the status on this PR? If I understand correctly, a fair amount of the search is already in place in staging... Where does this fit in?
Ah sorry, I hit a bit of a wall on getting the actual switch to Elasticsearch (which is a major performance boost) sorted out. What's in production adds the new search methods, but the performance isn't really improved. Might be a little bit worse, but hard to tell until it hits real resources. The main bit that's outstanding is the deployment setup, since ES needs to run in its own container/server. I haven't quite gotten that into a state where it's functional in production. If you want to take a peek, maybe there's something obvious that I've overlooked or another approach that might make more sense?
Use Whoosh for text/fuzzy search and spaCy word embeddings for semantic similarity. Removes the need for a separate ES service.
f3358bd to 598c9f8
Replaces the existing hand-tuned search with the Whoosh framework.
This has major performance benefits, but incurs a much higher indexing time. To deal with this, I've adopted a strategy that caches indexes in the repo, unpacks them for quick development purposes, then runs a background indexing process inside the server after deployment. In practice, this means search results may start slightly out-of-date (if a major content release has not been cached) but will be consistent within approximately 5 minutes of deploy.
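As a rough illustration of the cache, unpack, then rebuild-in-background flow described above, here is a minimal standard-library sketch. It is not the code in this PR: the function names (`pack_index`, `unpack_index`, `start_background_reindex`) and the `build_index` callback are hypothetical, and the callback stands in for whatever actually writes the Whoosh index.

```python
import shutil
import tempfile
import threading
import zipfile
from pathlib import Path


def pack_index(index_dir: Path, archive: Path) -> Path:
    """Zip a built index directory so the archive can be checked in."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(index_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(index_dir).as_posix())
    return archive


def unpack_index(archive: Path, index_dir: Path) -> Path:
    """Replace index_dir with the contents of the cached archive."""
    if index_dir.exists():
        shutil.rmtree(index_dir)
    index_dir.mkdir(parents=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(index_dir)
    return index_dir


def start_background_reindex(build_index, index_dir: Path) -> threading.Thread:
    """Run the slow full build in a daemon thread, then swap it in.

    build_index is a hypothetical callable that writes a fresh index
    into the directory it is given.
    """

    def worker() -> None:
        # Build next to the live index so the final rename stays on one filesystem.
        fresh = Path(tempfile.mkdtemp(prefix="building-", dir=index_dir.parent))
        build_index(fresh)
        stale = index_dir.with_name(index_dir.name + ".stale")
        if stale.exists():
            shutil.rmtree(stale)
        if index_dir.exists():
            index_dir.rename(stale)
        fresh.rename(index_dir)
        shutil.rmtree(stale, ignore_errors=True)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In this sketch the server would call `unpack_index` at startup so searches work immediately against the cached (possibly stale) index, then call `start_background_reindex` to catch up. The swap here is two renames rather than a single atomic operation, so a real deployment would need to decide how readers behave during that brief window.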
Major changes:
quickindex, which will index and create zipped files for checking in

NOTE: being kept in draft until #844 is merged, as this is based on that branch.