
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) #15188


Open: Tak-RS wants to merge 4 commits into master from fix/rpc-chunked-io

Conversation

@Tak-RS Tak-RS commented Aug 9, 2025

Fixes #15055

This PR prevents send()/recv() from being called with extremely large buffers during RPC tensor transfer by chunking I/O into 1 GiB pieces. On macOS this avoids intermittent EINVAL errors that previously caused the client to abort when offloading very large models via RPC.

What’s the symptom?
Loading very large GGUFs via RPC would fail with:

client: send: Invalid argument

server: recv: Invalid argument

Reproduced with DeepSeek-R1-0528-* and Qwen3-480B-* models at large quantizations.

Single-node load (no RPC) worked fine.

Root cause (observed)
The OS limits how large a single send()/recv() buffer may be, but very large tensors were transmitted in one call. Splitting the transfer into smaller chunks resolves the issue.

Changes
Add RPC_IO_CHUNK = 1 GiB.

Update send_data() and recv_data() to loop with chunked I/O (sketched below).

Keep existing error logging; behavior is otherwise unchanged.
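
In outline, the fix turns each oversized send() into a capped loop. Below is a minimal sketch of the send side using this PR's initial identifiers (sockfd_t is the file's socket typedef; the review below renames RPC_IO_CHUNK to MAX_CHUNK_SIZE and settles the std::min cap). Error reporting is reduced to the bare minimum here:

// sketch only; requires <algorithm> for std::min plus the platform socket headers
static constexpr size_t RPC_IO_CHUNK = 1024ull * 1024ull * 1024ull; // 1 GiB

static bool send_data(sockfd_t sockfd, const void * data, size_t size) {
    size_t bytes_sent = 0;
    while (bytes_sent < size) {
        // cap each send() so a single syscall never exceeds the OS limit
        size_t size_to_send = std::min(size - bytes_sent, RPC_IO_CHUNK);
        ssize_t n = send(sockfd, (const char *)data + bytes_sent, size_to_send, 0);
        if (n < 0) {
            return false; // hard error; caller logs and aborts the transfer
        }
        bytes_sent += (size_t)n;
    }
    return true;
}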

Why 1 GiB?
Empirically below the limits that triggered EINVAL on macOS.

Large enough to keep throughput good; easy to tune later if needed.

Testing
macOS (Metal): Previously failing large-model RPC offload now completes. Inference runs.

macOS (non-Metal): Build + basic RPC transfer OK.

Linux/Ubuntu: Not tested yet. Relying on CI and maintainer validation. (Happy to test on request; I can also try Docker later.)

Known quirk (non-blocking)
I still see an occasional non-fatal recv: Invalid argument before the big tensor transfer starts, but the run proceeds and finishes. I suspect a minor size-field mismatch during early handshake. If useful, I can follow up with a tiny patch that always serializes message lengths as uint64_t on the wire.
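
For illustration only, fixed-width length framing could look like the sketch below. The helper names are hypothetical (not part of this PR), and it assumes both endpoints share endianness, as the existing RPC code already does:

// hypothetical helpers: always put message lengths on the wire as a
// fixed-width uint64_t so client and server agree on the field's width
static bool send_msg_size(sockfd_t sockfd, uint64_t size) {
    return send_data(sockfd, &size, sizeof(size)); // exactly 8 bytes
}

static bool recv_msg_size(sockfd_t sockfd, uint64_t & size) {
    return recv_data(sockfd, &size, sizeof(size)); // exactly 8 bytes
}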

Performance / compatibility
No API changes.

Chunking is per-call looped send/recv; negligible overhead in my tests vs. “one big send”.

Should be safe across platforms.

Thanks!


Update: verified the cross-OS direction (Linux client to macOS RPC server) as well.

Additional testing

  • Client: Ubuntu 22.04 (glibc), clang/gcc build, commit 0e7aa4e
  • RPC server: macOS (Apple M3 Ultra, 512 GB RAM, Metal enabled)
  • llama.cpp build: Release
  • Model: DeepSeek-R1-0528-Q4_K_M (GGUF format, very large tensor size)
  • Command (client):
    ./build/bin/llama-server \
      -m /path/to/DeepSeek-R1-0528-Q4_K_M.gguf \
      --rpc :50052 -c 3000
  • Command (server):
    ./build/bin/rpc-server -p 50052 --host

Result

  • Large tensor offload succeeds end-to-end. Inference runs normally.
  • No client/server aborts observed.
  • Occasionally still see a non-fatal recv: Invalid argument before the first large transfer; run proceeds normally.

Notes

  • The chunked I/O change fixes the main crash.
  • If you prefer, I can follow up with a tiny patch to always serialize message length fields as uint64_t to silence the early handshake warning.

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Aug 9, 2025
@rgerganov rgerganov (Collaborator) left a comment

Thank you for the bug report and the patch

@@ -32,6 +32,8 @@

 namespace fs = std::filesystem;

+static constexpr size_t RPC_IO_CHUNK = 1024ull * 1024ull * 1024ull; // 1 GiB
Collaborator:

rename to MAX_CHUNK_SIZE

@@ -323,27 +325,43 @@ static std::shared_ptr<socket_t> create_server_socket(const char * host, int por
 static bool send_data(sockfd_t sockfd, const void * data, size_t size) {
     size_t bytes_sent = 0;
     while (bytes_sent < size) {
-        ssize_t n = send(sockfd, (const char *)data + bytes_sent, size - bytes_sent, 0);
+        size_t size_to_send = size - bytes_sent;
Collaborator:

size_t size_to_send = std::max(size - bytes_sent, MAX_CHUNK_SIZE);

Collaborator:

I mean std::min, not std::max, sorry

        if (n < 0) {
#ifndef _WIN32
Collaborator:

replace with GGML_LOG_ERROR
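
For example, the perror/fprintf-style reporting might become something like this sketch (the message text is illustrative; GGML_LOG_ERROR is ggml's printf-style logging macro):

// sketch: report the failed chunk with ggml's logging macro instead of perror
GGML_LOG_ERROR("%s: send failed (bytes_sent=%zu, size_to_send=%zu)\n",
               __func__, bytes_sent, size_to_send);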

@rgerganov (Collaborator) commented:

> I still see an occasional non-fatal recv: Invalid argument before the big tensor transfer starts, but the run proceeds and finishes. I suspect a minor size-field mismatch during early handshake. If useful, I can follow up with a tiny patch that always serializes message lengths as uint64_t on the wire.

Please follow up on how to reproduce this, thanks

…, switch to GGML_LOG_ERROR, handle 0-length send/recv
@Tak-RS Tak-RS force-pushed the fix/rpc-chunked-io branch from 514a5ff to 829d6b6 on August 11, 2025 14:49
@Tak-RS Tak-RS (Author) commented Aug 11, 2025

Thank you for the review and suggestions!

Applied the requested changes:

  • Renamed RPC_IO_CHUNK to MAX_CHUNK_SIZE
  • Switched error logging from perror/fprintf to GGML_LOG_ERROR
  • Corrected chunk size calculation to use std::min(size - bytes_sent, MAX_CHUNK_SIZE) (cap instead of max)
  • Same fix applied in recv_data()
  • Added a check to treat 0-length send/recv as an error

Please let me know if you’d like further changes.

                bytes_sent, size_to_send);
            return false;
        }
        if (n == 0) {
Collaborator:

why do we need this special case for n == 0? if zero bytes are sent, then we should retry again until we send everything or an error occurs (n < 0)

Author:

Got it, I'll remove the special case for n == 0 in send_data() and just retry in the loop as suggested.

@Tak-RS Tak-RS (Author) commented Aug 12, 2025

Thanks — removed the n == 0 special case in send_data(). recv() is unchanged as n == 0 correctly indicates a closed connection there.
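
For context, the asymmetry comes from the socket API itself: send() returning 0 means nothing was written yet and the call can simply be retried, while recv() returning 0 means the peer closed the connection. A sketch of the receive side under that convention, using the post-review identifiers:

// chunked receive loop: n == 0 from recv() signals a closed connection
// and is treated as an error, unlike send() where 0 just means retry
static bool recv_data(sockfd_t sockfd, void * data, size_t size) {
    size_t bytes_recv = 0;
    while (bytes_recv < size) {
        size_t size_to_recv = std::min(size - bytes_recv, MAX_CHUNK_SIZE);
        ssize_t n = recv(sockfd, (char *)data + bytes_recv, size_to_recv, 0);
        if (n <= 0) {
            return false; // 0: peer closed the connection; < 0: error
        }
        bytes_recv += (size_t)n;
    }
    return true;
}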

@rgerganov rgerganov requested a review from slaren August 12, 2025 15:09
         return false;
     }
-    bytes_sent += n;
+    bytes_sent += (size_t)n;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the trailing whitespace

Labels: ggml (changes relating to the ggml tensor library for machine learning)

Successfully merging this pull request may close these issues:

Eval bug: Crash when offloading large models via RPC if model size exceeds ~75% of server RAM