Skip to content

use unix socket to share fd#634

Merged
Binyang2014 merged 13 commits intomainfrom
binyli/nccl
Sep 25, 2025
Merged

use unix socket to share fd#634
Binyang2014 merged 13 commits intomainfrom
binyli/nccl

Conversation

@Binyang2014
Copy link
Contributor

@Binyang2014 Binyang2014 commented Sep 18, 2025

Use unix socket to share fd to other processes. Used for nvls handle sharing

@Binyang2014 Binyang2014 requested a review from Copilot September 18, 2025 05:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the pidfd mechanism for file descriptor sharing with a Unix socket-based approach. The change eliminates dependencies on Linux-specific system calls (SYS_pidfd_open and SYS_pidfd_getfd) in favor of a more portable Unix domain socket implementation.

  • Introduces UnixSocketServer and UnixSocketClient classes for file descriptor transmission
  • Updates RegisteredMemory and NvlsConnection to use Unix sockets instead of pidfd
  • Integrates Unix socket server lifecycle with TcpBootstrap initialization

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/unix_socket.cc Implements Unix socket server/client for file descriptor transfer
src/include/unix_socket.hpp Defines UnixSocketServer and UnixSocketClient class interfaces
src/registered_memory.cc Replaces pidfd usage with Unix socket client calls
src/nvls.cc Updates NVLS connection to use Unix socket for file descriptor sharing
src/include/registered_memory.hpp Changes TransportInfo structure from pid/fd to fdId/localRankId
src/bootstrap/bootstrap.cc Integrates Unix socket server startup and shutdown with bootstrap
include/mscclpp/env.hpp Adds localRank field to Env class

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Binyang2014 and others added 7 commits September 17, 2025 22:13
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Binyang2014 Binyang2014 changed the title use unix socket to trans fd use unix socket to share fd Sep 22, 2025
@Binyang2014 Binyang2014 marked this pull request as ready for review September 22, 2025 18:04
@Binyang2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Copy link
Contributor

@caiomcbr caiomcbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Binyang2014 Binyang2014 merged commit bf8d424 into main Sep 25, 2025
17 of 29 checks passed
@Binyang2014 Binyang2014 deleted the binyli/nccl branch September 25, 2025 18:40
@Binyang2014 Binyang2014 restored the binyli/nccl branch September 25, 2025 18:41
@Binyang2014 Binyang2014 deleted the binyli/nccl branch October 2, 2025 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants