
Eval bug: cpu: try reducing --n-gpu-layers if you're running out of VRAM #16955

@grigio

Description

Name and Version

app@4970e27c8b47:~$ ./llama-server --version
load_backend: loaded CPU backend from /app/libggml-cpu-icelake.so
version: 6922 (7db35a7)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CPU

Hardware

AMD Ryzen 7 7700

Models

Qwen3-30B-A3B-Q4_K_M.gguf

Problem description & steps to reproduce

It doesn't start. I tried -ngl 0 and -ngl 99, with no change. It worked on CPU in the past, and other models show the same issue. See the relevant log output below.
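Note that gguf_init_from_file already fails to open the file, before any backend setup or VRAM allocation, so the "try reducing --n-gpu-layers" hint in the error message appears to be a red herring on a CPU-only build. A quick sanity check from inside the container (a sketch; the compose service name llama-swap is an assumption taken from the log prefix):

    docker compose exec llama-swap ls -lh /models/Qwen3-30B-A3B-Q4_K_M.gguf
    # a valid model file starts with the 4-byte ASCII magic "GGUF"
    docker compose exec llama-swap head -c 4 /models/Qwen3-30B-A3B-Q4_K_M.gguf

If ls reports "No such file or directory", the volume mount is the likely culprit; if the magic bytes are wrong, the download may be truncated or corrupted.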


First Bad Commit

No response
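Since the model loaded fine on earlier builds, a git bisect toward version 6922 (7db35a7) could pinpoint the regression (a sketch; <good-commit> is a placeholder for the last known-good llama.cpp commit):

    git bisect start
    git bisect bad 7db35a7
    git bisect good <good-commit>
    # at each step, rebuild and retry the load, then mark good/bad
    cmake -B build && cmake --build build -j
    ./build/bin/llama-server -m /models/Qwen3-30B-A3B-Q4_K_M.gguf -ngl 0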

Relevant log output

llama-swap  | main: loading model
llama-swap  | srv    load_model: loading model '/models/Qwen3-30B-A3B-Q4_K_M.gguf'
llama-swap  | gguf_init_from_file: failed to open GGUF file '/models/Qwen3-30B-A3B-Q4_K_M.gguf'
llama-swap  | llama_model_load: error loading model: llama_model_loader: failed to load model from /models/Qwen3-30B-A3B-Q4_K_M.gguf
llama-swap  | llama_model_load_from_file_impl: failed to load model
llama-swap  | common_init_from_params: failed to load model '/models/Qwen3-30B-A3B-Q4_K_M.gguf', try reducing --n-gpu-layers if you're running out of VRAM
llama-swap  | srv    load_model: failed to load model, '/models/Qwen3-30B-A3B-Q4_K_M.gguf'
llama-swap  | srv    operator(): operator(): cleaning up before exit...
llama-swap  | main: exiting due to model loading error
llama-swap  | [WARN] <Qwen3-30B-A3B-Q4_K_M> ExitError >> exit status 1, exit code: 1
llama-swap  | [INFO] <Qwen3-30B-A3B-Q4_K_M> process exited but not StateStopping, current state: starting
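
To rule out llama-swap's process management, the same load can be attempted by invoking llama-server directly (a sketch reusing the binary and model path from the logs above):

    ./llama-server -m /models/Qwen3-30B-A3B-Q4_K_M.gguf -ngl 0

If this reproduces the same gguf_init_from_file failure, the problem lies with the file or its path rather than with llama-swap.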
