ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) #15188
base: master
Conversation
…over RPC (macOS & others). Fixes ggml-org#15055
Thank you for the bug report and the patch
ggml/src/ggml-rpc/ggml-rpc.cpp (outdated)
@@ -32,6 +32,8 @@
 namespace fs = std::filesystem;
+
+static constexpr size_t RPC_IO_CHUNK = 1024ull * 1024ull * 1024ull; // 1 GiB
rename to MAX_CHUNK_SIZE
ggml/src/ggml-rpc/ggml-rpc.cpp (outdated)
@@ -323,27 +325,43 @@ static std::shared_ptr<socket_t> create_server_socket(const char * host, int por
 static bool send_data(sockfd_t sockfd, const void * data, size_t size) {
     size_t bytes_sent = 0;
     while (bytes_sent < size) {
-        ssize_t n = send(sockfd, (const char *)data + bytes_sent, size - bytes_sent, 0);
+        size_t size_to_send = size - bytes_sent;
size_t size_to_send = std::max(size - bytes_sent, MAX_CHUNK_SIZE);
I mean std::min, not std::max, sorry
ggml/src/ggml-rpc/ggml-rpc.cpp (outdated)
        if (n < 0) {
#ifndef _WIN32
replace with GGML_LOG_ERROR
Please follow up on how to reproduce this, thanks
…, switch to GGML_LOG_ERROR, handle 0-length send/recv
Force-pushed 514a5ff to 829d6b6
Thank you for the review and suggestions! Applied the requested changes:
Please let me know if you’d like further changes.
ggml/src/ggml-rpc/ggml-rpc.cpp (outdated)
                bytes_sent, size_to_send);
            return false;
        }
        if (n == 0) {
if (n == 0) { |
why do we need this special case for n == 0? if zero bytes are sent, then we should retry again until we send everything or an error occurs (n < 0)
Got it, I’ll remove the special case for n == 0 in send_data() and just retry in the loop as suggested.
Thanks, removed the n == 0 special case in send_data(). recv() is unchanged, as n == 0 correctly indicates a closed connection there.
ggml/src/ggml-rpc/ggml-rpc.cpp (outdated)
            return false;
        }
-        bytes_sent += n;
+        bytes_sent += (size_t)n;
remove the trailing whitespace
Fixes #15055
This PR prevents send()/recv() from being called with extremely large buffers during RPC tensor transfer by chunking I/O into 1 GiB pieces. On macOS this avoids intermittent EINVAL errors that previously caused the client to abort when offloading very large models via RPC.
What’s the symptom?
Loading very large GGUFs via RPC would fail with:
client: send: Invalid argument
server: recv: Invalid argument
Repro seen with DeepSeek-R1-0528-* and Qwen3-480B-* at large quants.
Single-node load (no RPC) worked fine.
Root cause (observed)
OS-level limits on single send()/recv() buffer sizes; very large tensors were transmitted in one shot. Splitting into smaller chunks resolves the issue.
Changes
Add RPC_IO_CHUNK = 1 GiB.
Update send_data() and recv_data() to loop with chunked I/O.
Keep existing error logging; behavior is otherwise unchanged.
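The shape of the chunked send loop can be sketched as follows. This is a standalone illustration, not the PR's exact code: the real function calls the platform send(2) on a socket, while here the call is abstracted behind a send_fn parameter (an assumption, introduced so the loop can be exercised without a socket), and MAX_CHUNK_SIZE follows the rename requested in review.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// 1 GiB cap per send() call, mirroring the constant added in this PR
static constexpr size_t MAX_CHUNK_SIZE = 1024ull * 1024ull * 1024ull;

// send_fn stands in for the platform send(2) call (hypothetical, for
// testability): it returns the number of bytes accepted, or -1 on error.
static bool send_data_chunked(const std::function<long long(const char *, size_t)> & send_fn,
                              const void * data, size_t size) {
    size_t bytes_sent = 0;
    while (bytes_sent < size) {
        // Cap each call so the kernel never sees a buffer larger than
        // MAX_CHUNK_SIZE; oversized single-call buffers triggered EINVAL on macOS.
        size_t size_to_send = std::min(size - bytes_sent, MAX_CHUNK_SIZE);
        long long n = send_fn((const char *)data + bytes_sent, size_to_send);
        if (n < 0) {
            return false; // the real code logs via GGML_LOG_ERROR here
        }
        // n == 0 is not special-cased: keep looping until done or error
        bytes_sent += (size_t)n;
    }
    return true;
}
```

Note the std::min (not std::max) when computing the chunk size, per the review correction above.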
Why 1 GiB?
Empirically below the OS limits that triggered EINVAL on macOS.
Large enough to keep throughput good; easy to tune later if needed.
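The receive side mirrors the send loop, with one difference: a zero return from recv(2) means the peer closed the connection, so it is treated as failure rather than retried. A minimal sketch (recv_fn abstracts recv(2), an assumption made so the loop can run without a socket):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

static constexpr size_t MAX_CHUNK_SIZE = 1024ull * 1024ull * 1024ull; // 1 GiB

// recv_fn stands in for recv(2) (hypothetical, for testability): it returns
// the number of bytes read, 0 if the peer closed the connection, -1 on error.
static bool recv_data_chunked(const std::function<long long(char *, size_t)> & recv_fn,
                              void * data, size_t size) {
    size_t bytes_recv = 0;
    while (bytes_recv < size) {
        // same 1 GiB cap as the send path
        size_t size_to_recv = std::min(size - bytes_recv, MAX_CHUNK_SIZE);
        long long n = recv_fn((char *)data + bytes_recv, size_to_recv);
        if (n < 0) {
            return false; // socket error
        }
        if (n == 0) {
            return false; // peer closed before sending the full payload
        }
        bytes_recv += (size_t)n;
    }
    return true;
}
```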
Testing
macOS (Metal): Previously failing large-model RPC offload now completes. Inference runs.
macOS (non-Metal): Build + basic RPC transfer OK.
Linux/Ubuntu: Not tested yet. Relying on CI and maintainer validation. (Happy to test on request; I can also try Docker later.)
Known quirk (non-blocking)
I still see an occasional non-fatal recv: Invalid argument before the big tensor transfer starts, but the run proceeds and finishes. I suspect a minor size-field mismatch during early handshake. If useful, I can follow up with a tiny patch that always serializes message lengths as uint64_t on the wire.
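Such a follow-up could look like the sketch below: lengths are encoded into a fixed 8-byte little-endian field regardless of the host's size_t width, so a 64-bit client and a differently built server agree on the framing. The helper names write_u64_le/read_u64_le are hypothetical, not functions from the codebase.

```cpp
#include <cstdint>

// Hypothetical helpers: put a message length on the wire as a fixed-width
// little-endian uint64_t, independent of the host's size_t.
static void write_u64_le(uint8_t * buf, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (8 * i)); // least-significant byte first
    }
}

static uint64_t read_u64_le(const uint8_t * buf) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)buf[i] << (8 * i);
    }
    return v;
}
```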
Performance / compatibility
No API changes.
Chunking is per-call looped send/recv; negligible overhead in my tests vs. “one big send”.
Should be safe across platforms.
Thanks!
Update: Verified cross-OS direction as well.
Additional testing
./build/bin/llama-server -m /path/to/DeepSeek-R1-0528-Q4_K_M.gguf --rpc :50052 -c 3000
./build/bin/rpc-server -p 50052 --host
Result
recv: Invalid argument
before the first large transfer; the run proceeds normally.
Notes
A follow-up could serialize message lengths as uint64_t on the wire to silence the early handshake warning.