feat(tantivy): improve batching, concurrency, and index versioning #2807
Summary
This PR improves the tantivy search plugin with three main changes:
- Concurrency improvements: Replaced `Mutex` with `RwLock` so search operations can run concurrently without blocking writes. Searches now use read locks while write operations use write locks.
- Batching/commit strategy: Implemented the previously unused `auto_commit` config with configurable `commit_interval_ms`. When enabled, writes are batched and only committed when the interval has elapsed, reducing commit overhead for frequent transcript updates.
- Index versioning: Added `SCHEMA_VERSION` constant and version tracking. On startup, if the stored version doesn't match the current schema version, the index is automatically re-created.
- Auto-flush on exit: Pending writes are automatically flushed when the app exits via `RunEvent::ExitRequested`.

Updates since last revision
- `useKeywords.ts` and `chat-shortcuts/index.ts` to pass CI lint checks

Review & Testing Checklist for Human
- `try_write()` in the exit handler (lib.rs:189): if it fails to acquire the lock, pending writes will be silently lost. Consider whether this is acceptable or whether a blocking write should be used.
- `commit_interval_ms: 1000`, writes won't be immediately searchable. Verify this is acceptable for the transcript update use case.
- `flush()` method in ext.rs (lines 585-612) appears to be dead code since the flush command was removed. Consider removing it.
- `SCHEMA_VERSION` and verifying the index is recreated on app restart.

Recommended test plan:
- `SCHEMA_VERSION` to 2, restart, and verify re-indexing occurs

Notes
- `{index_path}/schema_version`

Link to Devin run: https://app.devin.ai/sessions/0a50f5a4f3814dd29fabf7224cee142b
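
For reviewers who want a concrete picture of the version check, here is a minimal sketch using only the standard library. It assumes the version is stored as plain decimal text in `{index_path}/schema_version`. The on-disk format, the helper names `needs_reindex` and `write_schema_version`, and the value `1` are illustrative assumptions, not taken from the PR's code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed current schema version; the value used in the PR may differ.
const SCHEMA_VERSION: u32 = 1;

/// Hypothetical helper: returns true when the index should be re-created.
/// A missing, unreadable, or unparsable version file counts as a mismatch.
fn needs_reindex(index_path: &Path) -> bool {
    let stored = fs::read_to_string(index_path.join("schema_version"))
        .ok()
        .and_then(|text| text.trim().parse::<u32>().ok());
    stored != Some(SCHEMA_VERSION)
}

/// Hypothetical helper: records the current version after the index is (re)built.
fn write_schema_version(index_path: &Path) -> io::Result<()> {
    fs::create_dir_all(index_path)?;
    fs::write(index_path.join("schema_version"), SCHEMA_VERSION.to_string())
}
```

Treating a missing file as a mismatch means an index created before version tracking existed would also be rebuilt once, which may or may not match the behavior in this PR.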
Requested by: yujonglee (@yujonglee)
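
The locking and batching behavior described in the Summary, and the `try_write()` concern from the checklist, can be illustrated with a small standalone sketch. It uses `std::sync::RwLock` and an in-memory stand-in for the index. `IndexState`, `add`, `commit`, `search`, and `flush_on_exit` are hypothetical names for illustration and do not reflect the plugin's actual types or the tantivy API.

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the plugin's index state (not the real tantivy types).
struct IndexState {
    committed: Vec<String>, // documents visible to searches
    pending: Vec<String>,   // documents written but not yet committed
    last_commit: Instant,
    commit_interval: Duration,
}

impl IndexState {
    fn new(commit_interval_ms: u64) -> Self {
        IndexState {
            committed: Vec::new(),
            pending: Vec::new(),
            last_commit: Instant::now(),
            commit_interval: Duration::from_millis(commit_interval_ms),
        }
    }

    /// Buffers a document and commits only if the interval has elapsed.
    /// Returns true when this call triggered a commit.
    fn add(&mut self, doc: String) -> bool {
        self.pending.push(doc);
        if self.last_commit.elapsed() >= self.commit_interval {
            self.commit();
            true
        } else {
            false
        }
    }

    /// Makes all pending documents searchable and resets the timer.
    fn commit(&mut self) {
        self.committed.append(&mut self.pending);
        self.last_commit = Instant::now();
    }
}

/// Searches take a read lock, so several can hold it at the same time.
fn search(state: &RwLock<IndexState>, needle: &str) -> usize {
    let guard = state.read().unwrap();
    guard.committed.iter().filter(|d| d.contains(needle)).count()
}

/// Non-blocking flush, as in the exit-handler concern: returns false
/// (and leaves pending writes uncommitted) if the lock is held elsewhere.
fn flush_on_exit(state: &RwLock<IndexState>) -> bool {
    match state.try_write() {
        Ok(mut guard) => {
            guard.commit();
            true
        }
        Err(_) => false,
    }
}
```

In this sketch a document added before the interval elapses stays invisible to `search` until the next commit, and `flush_on_exit` reports failure instead of waiting. That is the trade-off the checklist asks reviewers to weigh against a blocking `write()`.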