MB-65473: Refactor and Optimize Pre-Filtered Vector Search #2169
abhinavdangeti merged 8 commits into master
- Refactor pre-filtered vector search to enhance performance and reduce memory footprint.
abhinavdangeti left a comment:
Looks good. Let's get some perf results for this; the merge order should be faiss > go-faiss > index_api/segment_api > zapx > bleve.
Pull Request Overview
This PR refactors the pre-filtered vector search functionality to improve performance and reduce memory footprint by replacing legacy filtering mechanisms with a new eligible document selector. Key changes include updating KNN query construction to use an eligibleSelector, refactoring the EligibleCollector to remove unused ID slices and adopt the new filtering model, and removing outdated bitmap filtering logic from the optimize phase.
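To make the selector idea concrete, here is a minimal sketch of what an eligible-document selector could look like. The names `EligibleSelector`, `AddEligible`, `SegmentEligibleDocs`, and `mapSelector` are hypothetical and chosen for illustration; they are not the actual bleve_index_api interface. The sketch assumes the collector already knows each filter hit's segment index and segment-local document number, so it can record them per segment without building a global bitmap.

```go
package main

import (
	"fmt"
	"sort"
)

// EligibleSelector is a hypothetical interface (not the real bleve_index_api
// type) that records which documents passed the pre-filter, grouped by segment.
type EligibleSelector interface {
	// AddEligible records a filter hit by its segment index and
	// segment-local document number.
	AddEligible(segmentIdx int, localDocNum uint64)
	// SegmentEligibleDocs returns the sorted local document numbers that
	// passed the filter within one segment.
	SegmentEligibleDocs(segmentIdx int) []uint64
}

// mapSelector is a hypothetical map-backed implementation of EligibleSelector.
type mapSelector struct {
	perSegment map[int][]uint64
}

func newMapSelector() *mapSelector {
	return &mapSelector{perSegment: make(map[int][]uint64)}
}

func (m *mapSelector) AddEligible(segmentIdx int, localDocNum uint64) {
	m.perSegment[segmentIdx] = append(m.perSegment[segmentIdx], localDocNum)
}

func (m *mapSelector) SegmentEligibleDocs(segmentIdx int) []uint64 {
	docs := m.perSegment[segmentIdx]
	// Sort in place so a consumer (e.g. a per-segment vector index filter)
	// sees ascending local document numbers.
	sort.Slice(docs, func(i, j int) bool { return docs[i] < docs[j] })
	return docs
}

func main() {
	var sel EligibleSelector = newMapSelector()
	// Simulated filter hits as (segment index, local doc number) pairs.
	sel.AddEligible(0, 7)
	sel.AddEligible(1, 2)
	sel.AddEligible(0, 3)
	fmt.Println(sel.SegmentEligibleDocs(0)) // [3 7]
	fmt.Println(sel.SegmentEligibleDocs(1)) // [2]
	fmt.Println(sel.SegmentEligibleDocs(2)) // []
}
```

Keeping the eligible set keyed by segment means each segment's vector search can be handed only its own local IDs, which is the property the overview attributes to the new model.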
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| search_knn.go | Updates to createKNNQuery and runKnnCollector to support eligibleSelector instead of map-based filtering. |
| search/collector/eligible.go | Refactors EligibleCollector to use eligibleSelector and updates the document match handling accordingly. |
| index/scorch/snapshot_vector_index.go | Adapts VectorReader to accept eligibleSelector and removes legacy filtering with bitmaps. |
| search.go | Adds an isMatchAllQuery implementation for query filtering consistency. |
| search/query/knn.go | Removes legacy filterResults and incorporates eligibleSelector; note the spelling inconsistency in the field name. |
| index/scorch/optimize_knn.go | Removes obsolete bitmap logic in favor of the new eligibleSelector filtering support. |
| index/scorch/snapshot_index.go | Simplifies segment index and local document number resolution and removes deprecated globalDocNums. |
| search/searcher/search_knn.go | Updates KNNSearcher to use the eligibleSelector from VectorReader. |
| search_knn_test.go | Modifies tests to match the updated API for KNN query creation. |
| index/scorch/snapshot_index_vr.go | Removes legacy handling of eligibleDocIDs, replacing it with eligibleSelector usage. |
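The table mentions that `snapshot_index.go` simplifies segment index and local document number resolution. As background, the sketch below shows one common way to do that resolution: a binary search over per-segment starting offsets. The function name `segmentAndLocalDocNum` and the `offsets` layout (ascending, `offsets[0] == 0`, `offsets[i]` is the global number of segment `i`'s first document) are assumptions for illustration, not code taken from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// segmentAndLocalDocNum is a hypothetical helper that maps a global document
// number to (segment index, local doc number). It assumes offsets is
// non-empty, ascending, starts at 0, and offsets[i] is the global number of
// the first document in segment i.
func segmentAndLocalDocNum(offsets []uint64, globalDocNum uint64) (int, uint64) {
	// Find the first segment whose starting offset exceeds globalDocNum;
	// the document belongs to the segment just before it.
	idx := sort.Search(len(offsets), func(i int) bool {
		return offsets[i] > globalDocNum
	}) - 1
	return idx, globalDocNum - offsets[idx]
}

func main() {
	// Three segments holding 4, 3, and 5 documents respectively.
	offsets := []uint64{0, 4, 7}
	for _, g := range []uint64{0, 3, 4, 6, 7, 11} {
		seg, local := segmentAndLocalDocNum(offsets, g)
		fmt.Printf("global %d -> segment %d, local %d\n", g, seg, local)
	}
}
```

For example, global document 6 resolves to segment 1, local 2, because segment 1 starts at global offset 4.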
Files not reviewed (1)
- go.mod: Language not supported
Thejas-bhat left a comment:
Yeah, the collector code looks much better now.
…lector (#66) - Refactor interface as per blevesearch/bleve#2169 (comment)
…-gitea#7468)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/blevesearch/bleve/v2](https://github.com/blevesearch/bleve) | require | minor | `v2.4.4` -> `v2.5.0` |

---

### Release Notes

blevesearch/bleve (github.com/blevesearch/bleve/v2)

### [`v2.5.0`](https://github.com/blevesearch/bleve/releases/tag/v2.5.0)

[Compare Source](blevesearch/bleve@v2.4.4...v2.5.0)

##### Bug Fixes

- Exact hits to score higher than fuzzy hits, with blevesearch/bleve#2056
- Fix boosting during hybrid search that involves text + nearest neighbor, with blevesearch/bleve#2127
- Addressed bug in IP field handling while highlighting, with blevesearch/bleve#2142
- Graceful error handling within registry, with blevesearch/bleve#2151
- `http/` package (meant for demo purposes) removed from repository to remove vulnerability [CVE-2022-31022](GHSA-9w9f-6mg8-jp7w), relocated to within https://github.com/blevesearch/bleve-explorer
- Geo radius queries will now advertise distances (within sort values) in readable format, with blevesearch/bleve#2137

##### Improvements

- Vector search requires `faiss` dynamic library to be built from [blevesearch/faiss@352484e](https://github.com/blevesearch/faiss/tree/352484e0fc9d1f8f46737841efe5f26e0f383f71) which is a modified version of [v1.10.0](https://github.com/facebookresearch/faiss/releases/tag/v1.10.0)
- Support for **BM25 scoring**, see: [scoring.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/scoring.md#bm25)
- Support for **synonyms' search**, see: [synonyms.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/synonyms.md)
- **Significant performance improvements in pre-filtered vector search**, with blevesearch/bleve#2169 + dependent changes
- `auto` fuzziness detection with blevesearch/bleve#2060
- Ability to affect ingestion/drain rate by tuning persister workers with blevesearch/bleve#2100
- Additional config in merge policy for improved merger behavior, with blevesearch/bleve#2134
- Geo improvements: footprint reduction for polygons, better validation and graceful error handling, with blevesearch/bleve#2162 + blevesearch/bleve#2158 + blevesearch/bleve#2165
- Upgrade to RoaringBitmap/roaring@v2.4.5, etcd.io/bbolt@v1.4.0
- More metrics

##### Milestone

- [v2.5.0](https://github.com/blevesearch/bleve/milestone/24)

---

### Configuration

📅 **Schedule**: Branch creation - "* 0-3 * * *" (UTC), Automerge - "* 0-3 * * *" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: Gusted <postmaster@gusted.xyz>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7468
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based approach for calculating segment
local document numbers with a more direct method, where the local
document numbers are mapped directly to the segment ID during the
execution of the eligible collector.
- Requires:
  - blevesearch/bleve_index_api#63
  - blevesearch/bleve_index_api#66
  - blevesearch/zapx#317
  - blevesearch/go-faiss#41
  - blevesearch/faiss#49
---------
Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>
… (#2175)
- Refactor pre-filtered vector search to enhance performance and reduce
  memory footprint.
- Replace the current bitmap-based approach for calculating segment
  local document numbers with a more direct method, where the local
  document numbers are mapped directly to the segment ID during the
  execution of the eligible collector.
- Requires:
  - blevesearch/bleve_index_api#67
  - blevesearch/zapx#320
  - blevesearch/go-faiss#41
  - blevesearch/faiss#49
---------
Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>