
MB-65473: Refactor and Optimize Pre-Filtered Vector Search #2169

Merged: abhinavdangeti merged 8 commits into master from prefilterOpt on Apr 2, 2025

Conversation

@CascadingRadium (Member) commented Mar 21, 2025

- Refactor pre-filtered vector search to enhance performance and reduce memory footprint.

@CascadingRadium (Member, Author) commented:

Adding a do-not-merge label for now, as the zapx side is yet to be investigated.

@abhinavdangeti (Member) left a comment:

Looks good. Let's get some perf results for this; the order of merge should be faiss > go-faiss > index_api/segment_api > zapx > bleve.

Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR refactors the pre-filtered vector search functionality to improve performance and reduce memory footprint by replacing legacy filtering mechanisms with a new eligible document selector. Key changes include updating knn query construction to use an eligibleSelector, refactoring the EligibleCollector to remove unused ID slices and adopt the new filtering model, and removing outdated bitmap filtering logic from the optimize phase.
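
As a rough illustration of the selector idea described above, and not the code from this PR, the Go sketch below runs a brute-force nearest-neighbour scan that consults a selector before computing any distance. Every name in it (`eligibleSelector`, `boolSelector`, `preFilteredKNN`) is hypothetical, and the brute-force scan is an assumption made to keep the example self-contained; the actual search in bleve goes through faiss.

```go
package main

import (
	"fmt"
	"sort"
)

// eligibleSelector is a hypothetical interface (not bleve's API): it
// reports whether a document may appear in the kNN results.
type eligibleSelector interface {
	IsEligible(docNum uint64) bool
}

// boolSelector is a hypothetical selector backed by a dense slice
// indexed by document number.
type boolSelector []bool

func (s boolSelector) IsEligible(docNum uint64) bool {
	return docNum < uint64(len(s)) && s[docNum]
}

// hit pairs a document number with its distance to the query vector.
type hit struct {
	DocNum uint64
	Dist   float32
}

// squaredL2 returns the squared Euclidean distance between two vectors
// of equal length.
func squaredL2(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// preFilteredKNN returns up to k eligible documents closest to query.
// Ineligible documents are skipped before any distance is computed.
func preFilteredKNN(vectors [][]float32, query []float32, k int, sel eligibleSelector) []hit {
	var hits []hit
	for i, vec := range vectors {
		docNum := uint64(i)
		if !sel.IsEligible(docNum) {
			continue
		}
		hits = append(hits, hit{DocNum: docNum, Dist: squaredL2(vec, query)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Dist < hits[j].Dist })
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	vectors := [][]float32{{0, 0}, {1, 1}, {2, 2}, {3, 3}}
	sel := boolSelector{false, true, false, true} // only docs 1 and 3 pass the filter
	for _, h := range preFilteredKNN(vectors, []float32{0, 0}, 1, sel) {
		fmt.Println(h.DocNum, h.Dist)
	}
}
```

The point of the sketch is only that the filter is applied as a predicate during the scan, so no intermediate map of eligible IDs has to be built and handed around.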

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| search_knn.go | Updates to createKNNQuery and runKnnCollector to support eligibleSelector instead of map-based filtering. |
| search/collector/eligible.go | Refactors EligibleCollector to use eligibleSelector and updates the document match handling accordingly. |
| index/scorch/snapshot_vector_index.go | Adapts VectorReader to accept eligibleSelector and removes legacy filtering with bitmaps. |
| search.go | Adds an isMatchAllQuery implementation for query filtering consistency. |
| search/query/knn.go | Removes legacy filterResults and incorporates eligibleSelector; note the spelling inconsistency in the field name. |
| index/scorch/optimize_knn.go | Removes obsolete bitmap logic in favor of the new eligibleSelector filtering support. |
| index/scorch/snapshot_index.go | Simplifies segment index and local document number resolution and removes deprecated globalDocNums. |
| search/searcher/search_knn.go | Updates KNNSearcher to use the eligibleSelector from VectorReader. |
| search_knn_test.go | Modifies tests to match the updated API for KNN query creation. |
| index/scorch/snapshot_index_vr.go | Removes legacy handling of eligibleDocIDs, replacing it with eligibleSelector usage. |

Files not reviewed (1)
  • go.mod: Language not supported

@Thejas-bhat (Member) previously approved these changes Mar 27, 2025 and left a comment:

yeah the collector code looks much better now

Thejas-bhat previously approved these changes Apr 1, 2025
abhinavdangeti pushed a commit to blevesearch/bleve_index_api that referenced this pull request Apr 1, 2025
@abhinavdangeti abhinavdangeti merged commit 89e041b into master Apr 2, 2025
9 checks passed
@abhinavdangeti abhinavdangeti deleted the prefilterOpt branch April 2, 2025 15:36
project-mirrors-bot-tu bot pushed a commit to project-mirrors/forgejo-as-gitea-fork that referenced this pull request Apr 6, 2025
…-gitea#7468)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/blevesearch/bleve/v2](https://github.com/blevesearch/bleve) | require | minor | `v2.4.4` -> `v2.5.0` |

---

### Release Notes

<details>
<summary>blevesearch/bleve (github.com/blevesearch/bleve/v2)</summary>

### [`v2.5.0`](https://github.com/blevesearch/bleve/releases/tag/v2.5.0)

[Compare Source](blevesearch/bleve@v2.4.4...v2.5.0)

##### Bug Fixes

-   Exact hits to score higher than fuzzy hits, with blevesearch/bleve#2056
-   Fix boosting during hybrid search that involves text + nearest neighbor, with blevesearch/bleve#2127
-   Addressed bug in IP field handling while highlighting, with blevesearch/bleve#2142
-   Graceful error handling within registry, with blevesearch/bleve#2151
-   `http/` package (meant for demo purposes) removed from repository to remove vulnerability - [CVE-2022-31022](GHSA-9w9f-6mg8-jp7w), relocated to within https://github.com/blevesearch/bleve-explorer
-   Geo radius queries will now advertise distances (within sort values) in readable format, with blevesearch/bleve#2137

##### Improvements

-   Vector search requires `faiss` dynamic library to be built from [blevesearch/faiss@352484e](https://github.com/blevesearch/faiss/tree/352484e0fc9d1f8f46737841efe5f26e0f383f71) which is a modified version of [v1.10.0](https://github.com/facebookresearch/faiss/releases/tag/v1.10.0)
-   Support for **BM25 scoring**, see: [scoring.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/scoring.md#bm25)
-   Support for **synonym search**, see: [synonyms.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/synonyms.md)
-   **Significant performance improvements in pre-filtered vector search**, with blevesearch/bleve#2169 + dependent changes
-   `auto` fuzziness detection with blevesearch/bleve#2060
-   Ability to affect ingestion/drain rate by tuning persister workers with blevesearch/bleve#2100
-   Additional config in merge policy for improved merger behavior, with blevesearch/bleve#2134
-   Geo improvements: footprint reduction for polygons, better validation and graceful error handling, with blevesearch/bleve#2162 + blevesearch/bleve#2158 + blevesearch/bleve#2165
-   Upgrade to RoaringBitmap/roaring@v2.4.5, etcd.io/bbolt@v1.4.0
-   More metrics

##### Milestone

-   [v2.5.0](https://github.com/blevesearch/bleve/milestone/24)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "* 0-3 * * *" (UTC), Automerge - "* 0-3 * * *" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: Gusted <postmaster@gusted.xyz>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7468
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
CascadingRadium added a commit to blevesearch/bleve_index_api that referenced this pull request Apr 7, 2025
CascadingRadium added a commit that referenced this pull request Apr 7, 2025
- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based approach for calculating segment
local document numbers with a more direct method, where the local
document numbers are mapped directly to the segment ID during the
execution of the eligible collector.
- Requires:
    - blevesearch/bleve_index_api#63
    - blevesearch/bleve_index_api#66
    - blevesearch/zapx#317
    - blevesearch/go-faiss#41
    - blevesearch/faiss#49
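
The "more direct method" in the commit message above can be pictured with a minimal Go sketch. It assumes that each segment owns a contiguous range of global document numbers starting at a known offset; that layout, and the names `segmentedEligible`, `Add` and `LocalDocs`, are assumptions made for illustration and are not taken from the bleve code. Each eligible hit is resolved to a (segment, local document number) pair as it is collected, so no global bitmap has to be built and split afterwards.

```go
package main

import (
	"fmt"
	"sort"
)

// segmentedEligible (hypothetical) groups eligible documents by segment,
// storing segment-local document numbers.
type segmentedEligible struct {
	offsets []uint64   // offsets[i] is the first global doc number of segment i
	total   uint64     // total number of documents across all segments
	perSeg  [][]uint64 // perSeg[i] holds eligible local doc numbers of segment i
}

// newSegmentedEligible derives segment offsets from the segment sizes.
func newSegmentedEligible(segmentSizes []uint64) *segmentedEligible {
	offsets := make([]uint64, len(segmentSizes))
	var running uint64
	for i, n := range segmentSizes {
		offsets[i] = running
		running += n
	}
	return &segmentedEligible{
		offsets: offsets,
		total:   running,
		perSeg:  make([][]uint64, len(segmentSizes)),
	}
}

// Add resolves a global doc number to its segment and local doc number
// and records it there. It is meant to be called once per collected hit.
func (s *segmentedEligible) Add(global uint64) error {
	if global >= s.total {
		return fmt.Errorf("doc number %d out of range (total %d)", global, s.total)
	}
	// Binary search for the last segment whose offset is <= global.
	seg := sort.Search(len(s.offsets), func(i int) bool {
		return s.offsets[i] > global
	}) - 1
	s.perSeg[seg] = append(s.perSeg[seg], global-s.offsets[seg])
	return nil
}

// LocalDocs returns the eligible local doc numbers recorded for a segment.
func (s *segmentedEligible) LocalDocs(seg int) []uint64 {
	return s.perSeg[seg]
}

func main() {
	s := newSegmentedEligible([]uint64{3, 2, 4}) // offsets 0, 3, 5
	for _, global := range []uint64{1, 4, 5, 8} {
		if err := s.Add(global); err != nil {
			fmt.Println("error:", err)
		}
	}
	for seg := 0; seg < 3; seg++ {
		fmt.Println(seg, s.LocalDocs(seg))
	}
}
```

A per-segment slice like this can then be handed to whichever component searches that segment, without a second pass over a global structure.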

---------

Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>
abhinavdangeti added a commit that referenced this pull request Apr 8, 2025
… (#2175)

- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based approach for calculating segment
local document numbers with a more direct method, where the local
document numbers are mapped directly to the segment ID during the
execution of the eligible collector.
- Requires:
    - blevesearch/bleve_index_api#67
    - blevesearch/zapx#320
    - blevesearch/go-faiss#41
    - blevesearch/faiss#49

---------


Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>