eezz10001

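🐛 Problem description

When importing massive amounts of data into the knowledge base (40 million dataset_collections records), MongoDB comes under heavy load and switches its primary node. The Change Streams then disconnect and cannot reconnect automatically, which causes:

  • Slow data consumption (data is still consumed by the scheduled task, but only at one record per minute)
  • A manual service restart is needed to recover
  • System stability problems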

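✨ Solution

1️⃣ Automatic reconnection for Change Streams

  • Error listening: add handlers for the error and close events
  • Automatic reconnection: implement reconnect logic with exponential backoff
  • Status logging: log connection state changes in detail
  • Unified pattern: use a createWatch function to keep all watchers consistent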
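
The reconnection points above could be sketched roughly as follows. This is not the code from the PR: StreamLike is a simplified stand-in for a MongoDB change stream, and backoffDelay, the open callback, and the 1 s / 30 s defaults are assumptions made for illustration. Only the createWatch name comes from the description.

```typescript
// Simplified stand-in for a change stream (assumption, not the driver's type).
interface StreamLike {
  on(event: 'change' | 'error' | 'close', listener: (arg?: unknown) => void): unknown;
  removeAllListeners(): unknown;
  close(): unknown;
}

interface WatchOptions {
  open: () => StreamLike; // hypothetical factory that opens a fresh stream
  onChange: (change: unknown) => void;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Opens a stream, reopens it after 'error' or 'close', and returns a cleanup function.
function createWatch(opts: WatchOptions): () => void {
  const { open, onChange, baseDelayMs = 1000, maxDelayMs = 30000 } = opts;
  let stream: StreamLike | null = null;
  let timer: ReturnType<typeof setTimeout> | null = null;
  let attempt = 0;
  let stopped = false;

  // Close a stream without letting a sync throw or async rejection escape.
  const safeClose = (s: StreamLike | null) => {
    if (!s) return;
    s.removeAllListeners();
    s.on('error', () => {}); // ignore late errors from the discarded stream
    Promise.resolve().then(() => s.close()).catch(() => {});
  };

  const scheduleReconnect = (reason: string) => {
    // 'error' is often followed by 'close'; only one retry is scheduled.
    if (stopped || timer) return;
    const delay = backoffDelay(attempt++, baseDelayMs, maxDelayMs);
    console.warn(`[watch] ${reason}; reconnecting in ${delay}ms`);
    safeClose(stream);
    stream = null;
    timer = setTimeout(() => {
      timer = null;
      connect();
    }, delay);
  };

  const connect = () => {
    if (stopped) return;
    try {
      const s = open();
      stream = s;
      s.on('change', (change) => {
        attempt = 0; // a delivered event means the stream is healthy again
        onChange(change);
      });
      s.on('error', (err) => scheduleReconnect(`error: ${String(err)}`));
      s.on('close', () => scheduleReconnect('closed'));
    } catch (err) {
      scheduleReconnect(`open failed: ${String(err)}`);
    }
  };

  connect();

  return () => {
    stopped = true;
    if (timer) clearTimeout(timer);
    timer = null;
    safeClose(stream);
    stream = null;
  };
}
```

In a real service the open callback would presumably wrap the existing watch call for each collection, and the returned cleanup function is what a shutdown hook would invoke.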

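2️⃣ Graceful application shutdown

  • Resource cleanup: store and call the cleanup functions of all Change Streams
  • Signal handling: listen for the SIGTERM/SIGINT/SIGUSR2 signals
  • Leak prevention: make sure connections are released properly when the application shuts down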
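
A minimal sketch of the shutdown handling listed above, assuming each watcher exposes a cleanup function. The names createShutdownHandler and registerShutdown are hypothetical and are not taken from instrumentation.ts. Exit code 1 on a failed cleanup is likewise an assumption.

```typescript
type Cleanup = () => void | Promise<void>;

// Builds a handler that runs every cleanup once, then calls `exit`.
// `exit` is injected so the logic can be exercised without ending the process.
function createShutdownHandler(
  cleanups: Cleanup[],
  exit: (code: number) => void
): (signal: string) => Promise<void> {
  let shuttingDown = false;
  return async (signal: string) => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`[shutdown] received ${signal}, closing ${cleanups.length} watcher(s)`);
    let failed = 0;
    for (const cleanup of cleanups) {
      try {
        await cleanup();
      } catch (err) {
        failed++;
        console.error('[shutdown] cleanup failed:', err);
      }
    }
    exit(failed === 0 ? 0 : 1);
  };
}

// Wires the handler to the three signals named in the description.
function registerShutdown(cleanups: Cleanup[]): void {
  const handler = createShutdownHandler(cleanups, (code) => process.exit(code));
  for (const signal of ['SIGTERM', 'SIGINT', 'SIGUSR2'] as const) {
    process.once(signal, () => {
      void handler(signal);
    });
  }
}
```

The cleanup array would hold whatever startMongoWatch returns. Running the cleanups one at a time keeps the log output ordered, at the cost of a slightly slower shutdown.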

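📁 Modified files

  • packages/service/core/ai/config/utils.ts - system model configuration watcher
  • projects/app/src/service/common/system/volumnMongoWatch.ts - system configuration and template watchers
  • projects/app/src/service/core/dataset/training/utils.ts - dataset training watcher
  • projects/app/src/instrumentation.ts - application startup and shutdown handling

🧪 Test scenarios

  • ✅ Automatic reconnection when the MongoDB primary node switches
  • ✅ Connection recovery after a network interruption
  • ✅ Resources cleaned up correctly when the application restarts
  • ✅ Compatible with hot reload in the development environment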

eezz10001 and others added 3 commits October 8, 2025 23:08
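- Problem: when large amounts of data are pushed to the knowledge base, a MongoDB primary node switch kills the Change Streams process, data consumption slows down, and the service has to be restarted
- Fix:
  1. Add error handling and automatic reconnection to every Change Stream implementation
  2. Add error and close event listeners that reconnect after a delay
  3. Improve logging to make troubleshooting easier
  4. Provide cleanup functions so that resources are released correctly
- Affected files:
  - packages/service/core/ai/config/utils.ts (watchSystemModelUpdate)
  - projects/app/src/service/common/system/volumnMongoWatch.ts (reloadConfigWatch, refetchAppTemplates, startMongoWatch)
  - projects/app/src/service/core/dataset/training/utils.ts (createDatasetTrainingMongoWatch)
- Scenario addressed: Change Streams reconnect automatically when the MongoDB primary node switches, avoiding a service interruption
- Store the cleanup function returned by startMongoWatch
- Add process signal listeners (SIGTERM, SIGINT, SIGUSR2)
- Clean up all Change Streams connections correctly when the application shuts down
- Prevent resource leaks and connection build-up

Fixed file: projects/app/src/instrumentation.ts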
Contributor

gru-agent bot commented Oct 8, 2025

TestGru Assignment

Summary

Link CommitId Status Reason
Detail 82105b4 🚫 Skipped No files need to be tested {"packages/service/common/mongo/init.ts":"File path does not match include patterns.","packages/service/core/ai/config/utils.ts":"File path does not match include patterns.","projects/app/src/instrumentation.ts":"Can not find valuable test target.\nregister: Out of scope - This function orchestrates the initialization of the entire system, involving multiple modules and side effects such as database connections, worker preloading, and signal handling. Its correctness depends on interactions between many services and the overall process, making…","projects/app/src/service/common/system/volumnMongoWatch.ts":"Can not find valuable test target.\nstartMongoWatch: Out of scope - This function orchestrates the starting and cleanup of multiple change stream watchers, each of which involves integration with MongoDB and external services. It does not contain core business logic or calculations in itself, and its correctness relies on the behavior…","projects/app/src/se…

History Assignment

Tip

You can @gru-agent and leave your feedback. TestGru will make adjustments based on your input

cla-assistant bot commented Oct 8, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


github-actions bot commented Oct 8, 2025

Preview sandbox Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_sandbox_82105b45d6e37969e901566001d1eb6bf37f7e2c


github-actions bot commented Oct 8, 2025

Preview mcp_server Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_mcp_server_82105b45d6e37969e901566001d1eb6bf37f7e2c

@c121914yu
Collaborator

c121914yu commented Oct 9, 2025

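The code doesn't seem to be generic; it has no effect on the other change streams.
What exactly is the problem? After reconnecting, do the change streams stop working?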

@eezz10001
Author

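Yes. After the reconnection, the change streams process dies and only the scheduled task is left, which consumes just one record at a time, so it is very slow.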

@eezz10001
Author

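隆哥, it's fine if it isn't generic; I can close this PR. But the problem definitely exists: a single-node MongoDB is not affected, but a replica set certainly is. I think this hurts the robustness of the project, so I suggest fixing it.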
