
Eval bug: cpu: try reducing --n-gpu-layers if you're running out of VRAM #16955

@grigio

Description

Name and Version

app@4970e27c8b47:~$ ./llama-server --version
load_backend: loaded CPU backend from /app/libggml-cpu-icelake.so
version: 6922 (7db35a7)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CPU

Hardware

AMD Ryzen 7 7700

Models

Qwen3-30B-A3B-Q4_K_M.gguf

Problem description & steps to reproduce

It doesn't start. I tried -ngl 0 and -ngl 99, with no change. It worked on CPU in the past, and other models show the same issue. See the relevant log output below.
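Note that gguf_init_from_file already fails to open the file, before any backend setup or VRAM allocation, so the "try reducing --n-gpu-layers" hint in the error message appears to be a red herring on a CPU-only build. A quick sanity check from inside the container (a sketch; the compose service name llama-swap is an assumption taken from the log prefix):

    docker compose exec llama-swap ls -lh /models/Qwen3-30B-A3B-Q4_K_M.gguf
    # a valid model file starts with the 4-byte ASCII magic "GGUF"
    docker compose exec llama-swap head -c 4 /models/Qwen3-30B-A3B-Q4_K_M.gguf

If ls reports "No such file or directory", the volume mount is the likely culprit; if the magic bytes are wrong, the download may be truncated or corrupted.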


First Bad Commit

No response
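Since the model loaded fine on earlier builds, a git bisect toward version 6922 (7db35a7) could pinpoint the regression (a sketch; <good-commit> is a placeholder for the last known-good llama.cpp commit):

    git bisect start
    git bisect bad 7db35a7
    git bisect good <good-commit>
    # at each step, rebuild and retry the load, then mark good/bad
    cmake -B build && cmake --build build -j
    ./build/bin/llama-server -m /models/Qwen3-30B-A3B-Q4_K_M.gguf -ngl 0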

Relevant log output

llama-swap  | main: loading model
llama-swap  | srv    load_model: loading model '/models/Qwen3-30B-A3B-Q4_K_M.gguf'
llama-swap  | gguf_init_from_file: failed to open GGUF file '/models/Qwen3-30B-A3B-Q4_K_M.gguf'
llama-swap  | llama_model_load: error loading model: llama_model_loader: failed to load model from /models/Qwen3-30B-A3B-Q4_K_M.gguf
llama-swap  | llama_model_load_from_file_impl: failed to load model
llama-swap  | common_init_from_params: failed to load model '/models/Qwen3-30B-A3B-Q4_K_M.gguf', try reducing --n-gpu-layers if you're running out of VRAM
llama-swap  | srv    load_model: failed to load model, '/models/Qwen3-30B-A3B-Q4_K_M.gguf'
llama-swap  | srv    operator(): operator(): cleaning up before exit...
llama-swap  | main: exiting due to model loading error
llama-swap  | [WARN] <Qwen3-30B-A3B-Q4_K_M> ExitError >> exit status 1, exit code: 1
llama-swap  | [INFO] <Qwen3-30B-A3B-Q4_K_M> process exited but not StateStopping, current state: starting
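
To rule out llama-swap's process management, the same load can be attempted by invoking llama-server directly (a sketch reusing the binary and model path from the logs above):

    ./llama-server -m /models/Qwen3-30B-A3B-Q4_K_M.gguf -ngl 0

If this reproduces the same gguf_init_from_file failure, the problem lies with the file or its path rather than with llama-swap.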
