Misc. bug: Metal running out of memory after PR #15966 (#16646)

@kunle12

Description

Name and Version

% build/bin/llama-cli --version
version: 6473 (9dcd200)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081

Problem description & steps to reproduce

System configuration: Mac Studio M4 Max with 128 GB unified memory, running macOS 15.7.1.

I was able to run the IQ3-quantised Qwen3-235B-A22B-Instruct-2507 model on this machine without issue prior to PR #15966.

After the removal of the memory pool, I am hitting an out-of-memory error:

ggml_backend_metal_synchronize: error: command buffer 0 failed with status 5 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

I am not sure this should be considered a bug, but a workaround would be nice.
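One possible mitigation (a sketch, not verified on this exact setup): macOS caps how much unified memory the GPU may wire, by default roughly 75% of RAM, and on recent releases that cap is exposed as the iogpu.wired_limit_mb sysctl. Raising it gives Metal more headroom; the value below (~112 GB on a 128 GB machine) is an illustrative assumption, and the setting resets on reboot:

# Assumption: iogpu.wired_limit_mb is available on this macOS release (macOS 14+).
# Value is in MiB; leave headroom for the OS.
sudo sysctl iogpu.wired_limit_mb=114688

If raising the wired limit is not an option, shrinking the KV cache with llama-server's --ctx-size (-c) flag should also reduce Metal memory pressure, e.g. appending -c 8192 to the command line above.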

First Bad Commit

commit 9dcd200 (HEAD, tag: b6473)
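
To confirm the bisection, one sketch (assuming 9dcd200 is the merge commit for PR #15966) is to build its parent and rerun the same command; the steps below follow the standard llama.cpp CMake flow, with Metal enabled by default on macOS:

git checkout 9dcd200~1   # parent of the suspected first bad commit
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-server -m ~/qwen3/Qwen_Qwen3-235B-A22B-Instruct-2507-IQ3_XS-00001-of-00003.gguf --temp 0.6 --no-webui -t 6 --host 0.0.0.0 --port 8081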
