
Conversation

saood06
Collaborator

saood06 commented Feb 8, 2025

I grabbed all of the changes needed for llama.cpp/pull/11047, which were ggml-org/llama.cpp#9912 and ggml-org/llama.cpp#9040.

This compiles, but has not been tested yet.

@ikawrakow
Owner

I never use RPC and have never looked into the RPC code, so I'll have to rely on you for self-review and testing.

@saood06
Collaborator Author

saood06 commented Feb 10, 2025

@jukofyork

> I strongly suspect something funky is going on

There is, see this comment: #180 (comment)

This fork has much faster PP (prompt processing) speeds and has DeepSeek MLA support behind a flag (-mla); this PR should allow RPC to work, and I'm working on porting the option to override model tensor buffers.

This is something I've done for a while on my Windows builds, because on Windows long is not 8 bytes. On Linux this changes nothing, as both types are 8 bytes there.
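
As a minimal illustration (assuming the change in question swaps long for a fixed-width 64-bit type such as int64_t; this snippet is only for context, not part of the diff):

#include <cstdint>
#include <cstdio>

int main() {
    // Windows uses the LLP64 data model, so long is 4 bytes there,
    // while Linux (LP64) makes it 8 bytes. Fixed-width types such as
    // int64_t are 8 bytes on both, which is why the change is a no-op
    // on Linux but matters for Windows builds.
    std::printf("sizeof(long)    = %zu\n", sizeof(long));
    std::printf("sizeof(int64_t) = %zu\n", sizeof(int64_t));
    return 0;
}
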
@saood06
Collaborator Author

saood06 commented Feb 27, 2025

This has now been tested, and it does not currently work. I'm not sure why, as the errors I'm getting don't seem to have been encountered by anyone on llama.cpp.


rpc_msg_get_alloc_size_rsp response;
bool status = send_rpc_cmd(sock, RPC_CMD_GET_ALLOC_SIZE, &request, sizeof(request), &response, sizeof(response));
GGML_ASSERT(status);
Collaborator Author

The RPC client crashes here (this assert fires), which happens because the RPC server hits an issue on its end.
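
For context, here is a minimal sketch of the client-side call that wraps the lines above, modeled on the upstream llama.cpp RPC client (the wrapper name and exact types are assumptions for this port). The point is that GGML_ASSERT(status) is the only error handling on this path, so any server-side failure surfaces as a client crash:

// Client side: ask the RPC server how many bytes this tensor needs in its
// destination buffer. rpc_client_get_alloc_size() is a hypothetical wrapper;
// the helpers it calls (serialize_tensor, send_rpc_cmd) are the ones used in
// the snippet above.
static size_t rpc_client_get_alloc_size(const std::shared_ptr<socket_t> & sock,
                                        const ggml_tensor * tensor) {
    rpc_msg_get_alloc_size_req request;
    request.tensor = serialize_tensor(tensor); // flatten tensor metadata for the wire

    rpc_msg_get_alloc_size_rsp response;
    bool status = send_rpc_cmd(sock, RPC_CMD_GET_ALLOC_SIZE,
                               &request, sizeof(request),
                               &response, sizeof(response));
    // send_rpc_cmd() returns false if the server closes the connection or
    // answers with a failure status -- that is what trips this assert.
    GGML_ASSERT(status);

    return response.alloc_size;
}
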

ggml_tensor * tensor = deserialize_tensor(ctx, &request.tensor);

if (tensor == nullptr) {
GGML_PRINT_DEBUG("Null tensor pointer passed to server get_alloc_size function.\n");
Collaborator Author

I'm fairly certain this is where the RPC server is crashing, although it doesn't print the message since I never ran with GGML_DEBUG enabled.
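
For reference, a sketch of what the corresponding server-side handler likely looks like, modeled on the upstream llama.cpp RPC server (member and helper names are assumptions for this port). If deserialize_tensor() rejects the incoming tensor, the handler returns false, the request fails, and the client-side GGML_ASSERT(status) above fires with nothing printed unless GGML_DEBUG is enabled:

// Server side: compute the allocation size for a tensor sent by the client.
// Sketch only; exact structure in this port may differ.
bool rpc_server::get_alloc_size(const rpc_msg_get_alloc_size_req & request,
                                rpc_msg_get_alloc_size_rsp & response) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    // deserialize_tensor() validates the type and shape coming off the wire;
    // a malformed or unsupported tensor yields nullptr.
    ggml_tensor * tensor = deserialize_tensor(ctx, &request.tensor);
    if (tensor == nullptr) {
        // Only printed in GGML_DEBUG builds, which is why no message showed up.
        GGML_PRINT_DEBUG("Null tensor pointer passed to server get_alloc_size function.\n");
        ggml_free(ctx);
        return false; // reported back to the client as a failed RPC
    }

    // Use the tensor's own buffer type if it has one, else the backend default.
    ggml_backend_buffer_type_t buft = tensor->buffer != nullptr
        ? tensor->buffer->buft
        : ggml_backend_get_default_buffer_type(backend);
    response.alloc_size = ggml_backend_buft_get_alloc_size(buft, tensor);

    ggml_free(ctx);
    return true;
}
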

@ubergarm
Contributor

@saood06

I just came across another llama.cpp fork called prima.cpp which claims to have improved support for multi-device distributed inferencing.

I haven't tried it, just saw it on reddit today. Might be worth a shot given your GPU is in a different system than your big RAM box.

@saood06
Collaborator Author

saood06 commented Apr 12, 2025

> @saood06
>
> I just came across another llama.cpp fork called prima.cpp which claims to have improved support for multi-device distributed inferencing.
>
> I haven't tried it, just saw it on reddit today. Might be worth a shot given your GPU is in a different system than your big RAM box.

Thanks for the link, it is interesting. I think it would work for dense models, but not as well for MoE, because as far as I can tell it doesn't handle -ot (this commit looks relevant). I'd also need Windows support, which is on the roadmap (though I might try building it on my machine to see what the issue is and whether I can fix it), since the GPU machine has to run Windows (my big RAM box runs Clear Linux, and I have other servers that run FreeBSD and Proxmox).

saood06 mentioned this pull request Jun 1, 2025
@saood06
Collaborator Author

saood06 commented Jun 15, 2025

Closed as superseded by #480 / #506

saood06 closed this Jun 15, 2025
