MB-60971: Improve drain rate of the in-memory segments #2100

Merged
Thejas-bhat merged 12 commits into master from batchFlush
Mar 27, 2025
@Thejas-bhat Thejas-bhat commented Nov 14, 2024

  • Currently, once a certain amount of data has accumulated in memory, the persister performs an in-memory merge and introduces the merged file as part of the new snapshot. There is no bound on how much data the persister can operate on during that merge.
  • This matters when the underlying hardware supports very fast ingestion: if a large number of in-memory segments has accumulated, the costly merge can take a long time and cause memory pressure to spike, in which case the application layer has to stop pumping data into bleve.
  • This PR introduces persister workers and a maximum amount of data (which translates to a bounded number of segments) that each worker can merge in memory, with the workers running in parallel. The merged files are introduced to the system in a single step to avoid having too many files, since too many files puts excessive pressure on the cleanup of files that are no longer referenced.
  • The original behaviour of an unbounded in-memory merge plus flush in one shot is still supported and, to stay compatible with earlier bleve releases, it's the default. To enable the feature proposed in this PR, the application layer can update the config while opening or creating an index by setting NumPersisterWorkers and MaxSizeInMemoryMergePerWorker.

@abhinavdangeti abhinavdangeti added this to the v2.5.0 milestone Nov 21, 2024
@Thejas-bhat Thejas-bhat force-pushed the batchFlush branch 2 times, most recently from c4b9f67 to f45d2eb Compare December 3, 2024 10:43
@Thejas-bhat Thejas-bhat marked this pull request as ready for review December 3, 2024 10:48
@abhinavdangeti

@Thejas-bhat is this something we still want to consider for v2.5.0, and therefore for morpheus?

@abhinavdangeti

@Thejas-bhat have we gotten back results to back this change?

CascadingRadium previously approved these changes Mar 5, 2025
@abhinavdangeti abhinavdangeti left a comment

Hope we've done our due diligence well to ensure that there're no performance regressions with this change.

@abhinavdangeti abhinavdangeti left a comment

Whoops wrong response :)

@Thejas-bhat Thejas-bhat merged commit 065afdf into master Mar 27, 2025
9 checks passed
@CascadingRadium CascadingRadium deleted the batchFlush branch March 27, 2025 08:49
ns-codereview pushed a commit to couchbase/cbft that referenced this pull request Mar 27, 2025
			friendly

- blevesearch/bleve#2100
- blevesearch/bleve#2134

Change-Id: I8e3e6e8f60d16094d3d4ddee44115c07a872fcfb
Reviewed-on: https://review.couchbase.org/c/cbft/+/225100
Reviewed-by: Abhi Dangeti <abhinav@couchbase.com>
Tested-by: <thejas.orkombu@couchbase.com>
Well-Formed: Build Bot <build@couchbase.com>
project-mirrors-bot-tu bot pushed a commit to project-mirrors/forgejo-as-gitea-fork that referenced this pull request Apr 6, 2025
…-gitea#7468)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/blevesearch/bleve/v2](https://github.com/blevesearch/bleve) | require | minor | `v2.4.4` -> `v2.5.0` |

---

### Release Notes

<details>
<summary>blevesearch/bleve (github.com/blevesearch/bleve/v2)</summary>

### [`v2.5.0`](https://github.com/blevesearch/bleve/releases/tag/v2.5.0)

[Compare Source](blevesearch/bleve@v2.4.4...v2.5.0)

##### Bug Fixes

-   Exact hits to score higher than fuzzy hits, with blevesearch/bleve#2056
-   Fix boosting during hybrid search that involves text + nearest neighbor, with blevesearch/bleve#2127
-   Addressed bug in IP field handling while highlighting, with blevesearch/bleve#2142
-   Graceful error handling within registry, with blevesearch/bleve#2151
-   `http/` package (meant for demo purposes) removed from repository to remove vulnerability - [CVE-2022-31022](GHSA-9w9f-6mg8-jp7w), relocated to within https://github.com/blevesearch/bleve-explorer
-   Geo radius queries will now advertise distances (within sort values) in readable format, with blevesearch/bleve#2137

##### Improvements

-   Vector search requires `faiss` dynamic library to be built from [blevesearch/faiss@352484e](https://github.com/blevesearch/faiss/tree/352484e0fc9d1f8f46737841efe5f26e0f383f71) which is a modified version of [v1.10.0](https://github.com/facebookresearch/faiss/releases/tag/v1.10.0)
-   Support for **BM25 scoring**, see: [scoring.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/scoring.md#bm25)
-   Support for **synonyms' search**, see: [synonyms.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/synonyms.md)
-   **Significant performance improvements in pre-filtered vector search**, with blevesearch/bleve#2169 + dependent changes
-   `auto` fuzziness detection with blevesearch/bleve#2060
-   Ability to affect ingestion/drain rate by tuning persister workers with blevesearch/bleve#2100
-   Additional config in merge policy for improved merger behavior, with blevesearch/bleve#2134
-   Geo improvements: footprint reduction for polygons, better validation and graceful error handling, with blevesearch/bleve#2162 + blevesearch/bleve#2158 + blevesearch/bleve#2165
-   Upgrade to RoaringBitmap/roaring@v2.4.5, etcd.io/bbolt@v1.4.0
-   More metrics

##### Milestone

-   [v2.5.0](https://github.com/blevesearch/bleve/milestone/24)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "* 0-3 * * *" (UTC), Automerge - "* 0-3 * * *" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: Gusted <postmaster@gusted.xyz>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7468
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>