@yurekami
Contributor

Summary

Changes

  1. ServiceGroup.cc: Add TCP connection cleanup alongside RDMA in checkConnectionsRegularly()
  2. TransportPool.cc: Increment connectedCount when creating new connections to match decrement in remove()
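The second change is about keeping a counter symmetric: every decrement in remove() needs a matching increment when the connection is created. The sketch below is a minimal illustration of that pairing, not the actual TransportPool code. `PoolSketch`, `Transport`, `getOrCreate`, and the `std::uint64_t` counter type are assumptions made for the example. With an unsigned counter, a decrement that has no matching increment wraps around to a very large value, which would show up as an overflowing metric.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a transport; not the real 3FS type.
struct Transport {};

// Hypothetical pool that keeps connectedCount_ in step with the map contents.
class PoolSketch {
 public:
  // Returns the transport for `peer`, creating and counting it if absent.
  std::shared_ptr<Transport> getOrCreate(const std::string &peer) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = transports_.find(peer);
    if (it != transports_.end()) return it->second;
    auto transport = std::make_shared<Transport>();
    transports_.emplace(peer, transport);
    connectedCount_.fetch_add(1);  // pairs with the decrement in remove()
    return transport;
  }

  // Erases the transport for `peer`; decrements only if one was erased.
  bool remove(const std::string &peer) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (transports_.erase(peer) == 0) return false;
    connectedCount_.fetch_sub(1);
    return true;
  }

  std::uint64_t connectedCount() const { return connectedCount_.load(); }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<Transport>> transports_;
  std::atomic<std::uint64_t> connectedCount_{0};  // assumed unsigned
};
```

Guarding the decrement on a successful erase means a repeated remove() for the same peer cannot push the counter below zero.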

Test plan

  • Verify idle TCP connections are properly released when reducing max_connections
  • Verify common.net.connected_count metric doesn't overflow

🤖 Generated with Claude Code

yurekami and others added 2 commits December 25, 2025 23:59
Add context manager support and atexit handlers to iovec/ioring classes
to address symlink leakage issues reported in deepseek-ai#334.

Changes:
- Add global symlink registry with thread-safe lock for tracking
- Add atexit handler _cleanup_symlinks() for cleanup on normal exit
- Add context manager protocol (__enter__/__exit__) to iovec class
- Add context manager protocol to ioring class
- Add explicit close() methods with idempotent behavior
- Replace simple __del__ with close() call for safe cleanup

This ensures symlinks in /3fs-virt/iovs/ are cleaned up even when:
- Exceptions occur during I/O operations
- The process exits normally without explicit cleanup
- Users forget to manually unlink symlinks
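The commit above targets the Python iovec/ioring classes. As a rough analog only, the same pattern (a lock-protected registry, an idempotent close(), scope-based cleanup, and an exit-time sweep) can be sketched in C++ with RAII standing in for the context manager protocol. Everything here is hypothetical: `SymlinkRegistry`, `SymlinkGuard`, and `cleanupAll` are names invented for the example and do not appear in the PR.

```cpp
#include <cstddef>
#include <filesystem>
#include <mutex>
#include <set>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical process-wide registry of symlinks that still need cleanup.
class SymlinkRegistry {
 public:
  static SymlinkRegistry &instance() {
    static SymlinkRegistry registry;
    return registry;
  }
  void add(const fs::path &link) {
    std::lock_guard<std::mutex> guard(mutex_);
    links_.insert(link);
  }
  void remove(const fs::path &link) {
    std::lock_guard<std::mutex> guard(mutex_);
    links_.erase(link);
  }
  // Removes every link still registered; suitable for an exit-time hook.
  void cleanupAll() {
    std::lock_guard<std::mutex> guard(mutex_);
    for (const auto &link : links_) {
      std::error_code ec;  // ignore errors: best-effort cleanup
      fs::remove(link, ec);
    }
    links_.clear();
  }
  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return links_.size();
  }

 private:
  std::mutex mutex_;
  std::set<fs::path> links_;
};

// Hypothetical guard: creates the symlink on construction and removes it in
// close() or the destructor, whichever comes first.
class SymlinkGuard {
 public:
  SymlinkGuard(const fs::path &target, fs::path link) : link_(std::move(link)) {
    fs::create_symlink(target, link_);
    SymlinkRegistry::instance().add(link_);
  }
  SymlinkGuard(const SymlinkGuard &) = delete;
  SymlinkGuard &operator=(const SymlinkGuard &) = delete;
  ~SymlinkGuard() { close(); }

  // Idempotent: a second call is a no-op.
  void close() {
    if (closed_) return;
    closed_ = true;
    std::error_code ec;
    fs::remove(link_, ec);
    SymlinkRegistry::instance().remove(link_);
  }

 private:
  fs::path link_;
  bool closed_ = false;
};
```

A program using this sketch could register `SymlinkRegistry::instance().cleanupAll()` through `std::atexit` to cover links whose guards never ran, which mirrors the role the commit describes for `_cleanup_symlinks()`.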

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ek-ai#366)

1. Fix TCP connection leak (deepseek-ai#365):
   - Add TCP connection cleanup in checkConnectionsRegularly()
   - Previously only RDMA connections were checked for expiration

2. Fix metric overflow (deepseek-ai#366):
   - Increment connectedCount when creating new outgoing connections
   - This matches the decrement in remove() for symmetry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
XLOGF(DBG9, "server@{} check connections", fmt::ptr(this));
// Check both RDMA and TCP connections for expiration
ioWorker().checkConnections(Address{0, 0, Address::RDMA}, config_.connection_expiration_time());
ioWorker().checkConnections(Address{0, 0, Address::TCP}, config_.connection_expiration_time());

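When the cluster is large or the number of connections is high, these two checkConnections calls will block coroutine scheduling. It might be worth yielding periodically to give up the CPU.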
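One way to read the suggestion is to scan connections in batches and hand control back between batches. The sketch below illustrates that idea under stated assumptions and is not the project's implementation. `ConnSketch`, `checkConnectionsSketch`, the `batch` parameter, and the `yieldFn` callback are all hypothetical; in a coroutine-based server the callback would be replaced by whatever yield primitive the runtime provides.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical connection record; the real code tracks transports internally.
struct ConnSketch {
  Clock::time_point lastUsed;
  bool closed = false;
};

// Closes connections idle for longer than `expiry`. After every `batch`
// entries (except after the last one) it calls `yieldFn` so that other work
// on the same executor gets a chance to run. Returns the number closed.
inline std::size_t checkConnectionsSketch(std::vector<ConnSketch> &conns,
                                          Clock::time_point now,
                                          Clock::duration expiry,
                                          std::size_t batch,
                                          const std::function<void()> &yieldFn) {
  std::size_t closed = 0;
  for (std::size_t i = 0; i < conns.size(); ++i) {
    ConnSketch &conn = conns[i];
    if (!conn.closed && now - conn.lastUsed > expiry) {
      conn.closed = true;
      ++closed;
    }
    // Yield at batch boundaries; batch == 0 disables yielding.
    if (batch != 0 && (i + 1) % batch == 0 && i + 1 < conns.size()) {
      yieldFn();
    }
  }
  return closed;
}
```

The batch size trades scan latency against scheduling fairness: a smaller batch yields more often but makes a full pass take longer.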
