Misc. bug: Metal running out of memory after PR #15966 (#16646)

@kunle12

Description

Name and Version

% build/bin/llama-cli --version
version: 6473 (9dcd200)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081

Problem description & steps to reproduce

System configuration: Mac Studio M4 Max with 128 GB unified memory, running macOS 15.7.1.

I was able to run the IQ3-quantised Qwen3-235B-A22B-Instruct-2507 model on this machine without issue prior to PR #15966.

After the removal of the memory pool, I am hitting an out-of-memory error:

ggml_backend_metal_synchronize: error: command buffer 0 failed with status 5 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

I am not sure this should be considered a bug, but a workaround would be nice.
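One possible mitigation (a sketch, not verified on this exact setup): macOS caps how much unified memory the GPU may wire, by default roughly 75% of RAM, and on recent releases that cap is exposed as the iogpu.wired_limit_mb sysctl. Raising it gives Metal more headroom; the value below (~112 GB on a 128 GB machine) is an illustrative assumption, and the setting resets on reboot:

# Assumption: iogpu.wired_limit_mb is available on this macOS release (macOS 14+).
# Value is in MiB; leave headroom for the OS.
sudo sysctl iogpu.wired_limit_mb=114688

If raising the wired limit is not an option, shrinking the KV cache with llama-server's --ctx-size (-c) flag should also reduce Metal memory pressure, e.g. appending -c 8192 to the command line above.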

First Bad Commit

commit 9dcd200 (HEAD, tag: b6473)
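
To confirm the bisection, one sketch (assuming 9dcd200 is the merge commit for PR #15966) is to build its parent and rerun the same command; the steps below follow the standard llama.cpp CMake flow, with Metal enabled by default on macOS:

git checkout 9dcd200~1   # parent of the suspected first bad commit
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081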
