Closed
Labels
Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), bug-unconfirmed
Description
Name and Version
% build/bin/llama-cli --version
version: 6473 (9dcd200)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081
Problem description & steps to reproduce
System configuration: Mac Studio M4 Max 128GB unified memory. OS: 15.7.1
I was able to run the Qwen3-235B-A22B-Instruct-2507 IQ3 quantised model on this machine without issue prior to PR #15966.
After the removal of the memory pool, I am hitting an out-of-memory error:
ggml_backend_metal_synchronize: error: command buffer 0 failed with status 5 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
I am not sure whether this should be considered a bug, but a workaround would be nice.
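As a possible workaround (a sketch only, not verified on this setup), reducing the memory footprint on the Metal backend may keep the working set under the limit. The two most direct levers are a smaller context size (shrinking the KV cache) and offloading fewer layers to the GPU; `-c`/`--ctx-size` and `-ngl`/`--n-gpu-layers` are standard llama.cpp flags, but the specific values below are guesses for this machine:

```shell
# Sketch of possible workarounds; the values 8192 and 60 are illustrative
# guesses, not tuned for this model or this 128GB Mac Studio.

# Option 1: limit the context size to shrink the KV-cache allocation.
./llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf \
  --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081 \
  -c 8192

# Option 2: offload fewer layers to Metal, keeping the rest on the CPU.
./llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf \
  --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081 \
  -ngl 60
```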
First Bad Commit
commit 9dcd200 (HEAD, tag: b6473)
Relevant log output