Saving KV cache (using /slots/3?action=save API endpoint) does not work for vision-enabled models #19466

@Lissanro

Name and Version

ggml_cuda_init: found 4 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 7971 (5fa1c19)
built with GNU 14.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

EPYC 7763 + 1 TB RAM + 4x3090 GPUs

Models

Kimi K2.5

Problem description & steps to reproduce

First, run the model as shown below. The issue is likely reproducible with very small vision-enabled models too, but it is much more noticeable with large ones: if a saved cache cannot be restored, prefilling 100K-200K tokens from scratch to return to a dialog or a Roo Code project takes a long time.

numactl --cpunodebind=0 --interleave=all /home/lissanro/pkgs/llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/Kimi-K2.5-Q4_X-VL.gguf \
--mmproj /mnt/neuro/models/Kimi-K2.5/mmproj-Kimi-K2.5-F16.gguf \
--fit on --fit-ctx 262144 -b 4096 -ub 4096 -fa on \
--threads 64 --host 0.0.0.0 --port 5000 \
--jinja \
--slot-save-path /var/cache/llama.cpp/k2.5 --cache-ram 131072 --fit-target 512 \
--min-p 0.01 --top-p 0.95 --temp 1.0 --top-k 100

Then try to save the cache:

curl --header "Content-Type: application/json" --request POST --data '{"filename":"cache.bin"}' "http://localhost:5000/slots/3?action=save"
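For scripted reproduction, the same request can be sketched in Python using only the standard library. The host, port, slot id, and filename are the ones from the curl command above; the helper names `build_save_request` and `send_save_request` are hypothetical and not part of llama.cpp.

```python
import json
import urllib.error
import urllib.request


def build_save_request(base_url: str, slot_id: int, filename: str) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command above."""
    url = f"{base_url}/slots/{slot_id}?action=save"
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_save_request(req: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return (status_code, response_text).

    HTTP error statuses are returned instead of raised, so the error
    body sent by the server can still be inspected.
    """
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


# Example (requires a running llama-server on localhost:5000):
#   req = build_save_request("http://localhost:5000", 3, "cache.bin")
#   status, text = send_save_request(req)
#   print(status, text)
```

Returning the status and body for error responses, rather than letting `urllib` raise, keeps the server's JSON error message available to the caller.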

The request fails to save the cache; see the "Relevant log output" section for the exact error message. For K2.5 support I used #19127, but based on the error message I think this issue is reproducible with any vision model.

First Bad Commit

No response

Relevant log output

{"error":{"code":501,"message":"This feature is not supported by multimodal","type":"not_supported_error"}}
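A client that saves slots automatically could detect this specific failure and skip the save instead of retrying. The following is a minimal sketch that assumes the error body always has the shape shown in the log line above (an `error` object with `code` and `type` fields); the function name `is_not_supported_error` is hypothetical.

```python
import json


def is_not_supported_error(body: str) -> bool:
    """Return True if the response body is a 501 "not_supported_error".

    The expected shape is inferred only from the single log line in
    this report, so treat it as an assumption about the server.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    return error.get("code") == 501 and error.get("type") == "not_supported_error"


# The body reported in this issue:
reported = (
    '{"error":{"code":501,"message":"This feature is not supported by '
    'multimodal","type":"not_supported_error"}}'
)
print(is_not_supported_error(reported))
```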
