@DajanaV commented Nov 9, 2025

Mirrored from ggml-org/llama.cpp#17116

fix #16657
ref ggml-org/llama.cpp#16276 (review)

This fixes RPC inference when the Metal backend is involved.

Testing:

```sh
# server
make -j && ./bin/rpc-server

# cli
make -j && ./bin/llama-cli -m ../models/gemma-3-4b-it/ggml-model-f16.gguf --rpc localhost:50052 -ngl 99 --no-mmap -no-cnv -p "Hello" --top-k 1 -n 32 -fa on
```

TODO:

  • Check performance impact
  • Cache the responses to avoid extra RPC calls? (see the sketch below)
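
A minimal sketch of the caching idea from the second TODO item. The `rpc_query_alignment()` helper below is a hypothetical stand-in for a single RPC round-trip (the actual llama.cpp RPC API differs); the point is only that a remote device's properties do not change during a session, so the response can be memoized per endpoint and re-queried at most once:

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real RPC round-trip; the actual
// implementation would send a request over the server socket.
static uint64_t rpc_query_alignment(const std::string & endpoint) {
    (void) endpoint;
    return 32; // placeholder value for the sketch
}

// Per-endpoint cache: the first call pays one RPC round-trip,
// subsequent calls are served from the map.
uint64_t cached_alignment(const std::string & endpoint) {
    static std::mutex mtx;
    static std::unordered_map<std::string, uint64_t> cache;
    std::lock_guard<std::mutex> lock(mtx); // guard against concurrent graph evals
    auto it = cache.find(endpoint);
    if (it != cache.end()) {
        return it->second; // cache hit: no extra RPC call
    }
    uint64_t value = rpc_query_alignment(endpoint);
    cache.emplace(endpoint, value);
    return value;
}
```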

@DajanaV closed this Nov 13, 2025
