feat(indexer): optimize the Milvus indexer to support batch processing and concurrency control #581

lonelymeko · 2025-12-01T09:55:07Z

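Introduce a batch processing mechanism to speed up large-scale document insertion
Add concurrency control that caps the maximum concurrency with a semaphore
Change the default configuration, adding the BatchSize and MaxConcurrency parameters
Move the embedding and data conversion logic into the per-batch goroutines
Keep a single final Flush to ensure data consistency
Update dependency versions, adding the eino-ext component and related libraries
Upgrade several indirect dependencies to improve stability and compatibility
Add a test markdown file; in milvus/examples/main.go, introduce a markdown splitter and pass the split docs to the new Store method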
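A minimal sketch of how the two new config fields might be defaulted, and how BatchSize determines how many batches Store processes. Only the field names BatchSize and MaxConcurrency come from this PR; the default values 100 and 4 and the helpers applyDefaults and numBatches are hypothetical placeholders, not the merged code.

```go
package main

import "fmt"

// IndexerConfig shows only the two fields this PR adds; the real config
// struct has more fields (client, embedding, collection, ...).
type IndexerConfig struct {
	BatchSize      int // documents per batch
	MaxConcurrency int // maximum number of batches in flight at once
}

// Hypothetical placeholder defaults; the merged code may use other values.
const (
	defaultBatchSize      = 100
	defaultMaxConcurrency = 4
)

// applyDefaults fills in unset (zero or negative) fields.
func applyDefaults(c IndexerConfig) IndexerConfig {
	if c.BatchSize <= 0 {
		c.BatchSize = defaultBatchSize
	}
	if c.MaxConcurrency <= 0 {
		c.MaxConcurrency = defaultMaxConcurrency
	}
	return c
}

// numBatches is ceil(n / batchSize): how many batches n documents split into.
func numBatches(n, batchSize int) int {
	return (n + batchSize - 1) / batchSize
}

func main() {
	cfg := applyDefaults(IndexerConfig{BatchSize: 20})
	fmt.Println(cfg.BatchSize, cfg.MaxConcurrency, numBatches(146, cfg.BatchSize)) // 20 4 8
}
```

With the 146 documents from examples/test.md and a BatchSize of 20, the last of the 8 batches holds the remaining 6 documents.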

What type of PR is this?

Check the PR title.

[✅ ] This PR title matches the format: <type>(optional scope): <description>
[✅ ] The description of this PR title is user-oriented and clear enough for others to understand.
Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

feat(milvus): support concurrent batch insertion and optimize performance

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en: This PR introduces a high-performance batch processing mechanism for the Milvus Indexer, significantly improving the efficiency of large-scale document insertions.

Key Changes:

Batch Processing & Concurrency:
Introduced BatchSize and MaxConcurrency parameters in IndexerConfig.
Implemented internal batching logic in the Store method.
Added semaphore-based concurrency control to process Embedding, Conversion, and InsertRows in parallel goroutines.
Optimized the workflow to perform a unified Flush operation only after all batches are processed, avoiding performance bottlenecks caused by frequent flushing.
Refactoring:
Refactored the Store method to handle large document lists automatically, removing the need for callers to manually split data.
Dependency Upgrades:
Updated eino-ext and related indirect dependencies to enhance stability.
Examples & Testing:
Added examples/test.md (146 docs) and updated examples/main.go.
Introduced a markdown splitter in the example to demonstrate the new capability.
Performance Comparison: Testing with examples/test.md (146 documents) in the same environment:

Before (Manual Serial Batching): ~2m 40s (160.3s)

After (Concurrent Batching): ~2.74s
Improvement: ~58x faster
(Screenshots: before, manual serial processing took 2m40s; after, concurrent processing took 2.74s.)
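The ~58x figure follows directly from the two wall-clock times reported above; the short program below reproduces it and also derives the average time per document. The per-document numbers are simple averages computed here, not separate measurements from the PR.

```go
package main

import "fmt"

// speedup is the ratio of the old wall-clock time to the new one.
func speedup(beforeSec, afterSec float64) float64 {
	return beforeSec / afterSec
}

func main() {
	const docs = 146             // documents split from examples/test.md
	before, after := 160.3, 2.74 // seconds, from the comparison in this PR
	fmt.Printf("before: %.2f s/doc\n", before/docs)        // before: 1.10 s/doc
	fmt.Printf("after: %.1f ms/doc\n", after/docs*1000)    // after: 18.8 ms/doc
	fmt.Printf("speedup: %.1fx\n", speedup(before, after)) // speedup: 58.5x
}
```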

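zh (optional): This PR introduces a high-performance batch processing mechanism for the Milvus Indexer, greatly improving the efficiency of large-scale document insertion.

Main changes:

Batch processing and concurrency:
Added the BatchSize and MaxConcurrency options to IndexerConfig.
Implemented automatic batching inside the Store method.
Added semaphore-based concurrency control so that Embedding, data conversion, and InsertRows run in parallel.
Optimized the Flush logic so that a single Flush runs after all batches are inserted, avoiding the performance stalls caused by frequent Flush calls.
Refactoring:
Refactored the Store method: callers no longer need to split data manually and can pass a large number of documents directly.
Dependency upgrades:
Updated the eino-ext component version and related indirect dependencies.
Examples and testing:
Added the test file examples/test.md (146 document chunks) and introduced a markdown splitter in examples/main.go for verification.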
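Performance comparison: tested with examples/test.md (146 document chunks) in the same environment:

Before (manual serial batching): about 2 min 40 s (160.3 s)

After (internal concurrent batching): about 2.74 s

Improvement: about 58x faster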

(Optional) Which issue(s) this PR fixes:

Fixes #579

(optional) The PR that updates user documentation:


CLAassistant · 2025-12-01T09:55:15Z

All committers have signed the CLA.

Merge branch 'main' into main

29bb0e2
