Eval bug: llama.cpp/ggml/src/ggml-backend.cpp:750: pre-allocated tensor (cache_k_l32 (view) (copy of cache_k_l32 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY) #13684

@wizardeur

Name and Version

$ sources/llama.cpp/build/bin/llama-server --version
version: 5435 (a4090d1)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

Vulkan

Hardware

Intel(R) Core(TM) Ultra 5 245KF + Radeon RX 7900 XTX, gfx1100 (0x1100)

Models

gemma-3-27b-it-qat-UD-Q4_K_XL.gguf + gemma-3-27b-it-qat-GGUF/mmproj-F16.gguf

Problem description & steps to reproduce

Built with (corrected command):
cmake -S . -B build -DGGML_VULKAN=1 -DCMAKE_BUILD_TYPE=Release && cmake --build build -- -j 16

Command used to start:
sources/llama.cpp/build/bin/llama-server --port 9001 -c 65536 -ctv q8_0 -ctk q8_0 --no-warmup -ngl 99 -fa -m models/unsloth/gemma-3-27b-it-qat-GGUF/gemma-3-27b-it-qat-UD-Q4_K_XL.gguf --mmproj models/unsloth/gemma-3-27b-it-qat-GGUF/mmproj-F16.gguf --jinja

This is the first time I have used this model. I accessed the API via Open WebUI hosted in a Docker container. It was a normal text-only chat, nothing special. I can share the chat contents privately; the server crashes every time I try to regenerate the last message (a reproduction sketch follows below). I recompiled with -DCMAKE_BUILD_TYPE=Debug and the crash was still reproducible, so the full debug log is attached.
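Since the chat itself is private, here is a minimal sketch of the request pattern that triggers it, assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint on the port above; the message contents are placeholders, not the actual conversation:

# Hypothetical reproduction sketch: regenerating in Open WebUI resends the same
# conversation, so replaying an equivalent request against the server should hit
# the same code path. The placeholder messages stand in for the private chat.
curl http://localhost:9001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "<first user message, several thousand tokens>"},
          {"role": "assistant", "content": "<previous model reply>"},
          {"role": "user", "content": "<last user message, resent to regenerate>"}
        ],
        "stream": false
      }'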

Memory utilization is 22052M out of 24560M; nothing else is using the GPU.

The error looks similar to #12045.

llama.cpp-debug.log

First Bad Commit

No response

Relevant log output

slot update_slots: id  0 | task 0 | kv cache rm [2048, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 4076, n_tokens = 2028, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 4076, n_tokens = 2028
/home/wizz/sources/llama.cpp/ggml/src/ggml-backend.cpp:750: pre-allocated tensor (cache_k_l32 (view) (copy of cache_k_l32 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY)
[New LWP 1580837]
[New LWP 1580836]
[New LWP 1580835]
[New LWP 1580834]
[New LWP 1580833]
[New LWP 1580832]
[New LWP 1580831]
[New LWP 1580830]
[New LWP 1580829]
[New LWP 1580828]
[New LWP 1580827]
[New LWP 1580826]
[New LWP 1580825]
[New LWP 1580824]
[New LWP 1580823]
[New LWP 1580822]
[New LWP 1580821]
[New LWP 1580819]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/liblber.so.2
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlidec.so.1
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlicommon.so.1
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/libvulkan_radeon.so
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libtinfo.so.6
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/libvulkan_intel.so
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/libvulkan_intel_hasvk.so
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/libvulkan_virtio.so
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/libvulkan_lvp.so
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/libvulkan_nouveau.so
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libVkLayer_MESA_device_select.so
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007587595107e3 in __GI___wait4 (pid=1580930, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30     ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0  0x00007587595107e3 in __GI___wait4 (pid=1580930, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000075875b3c1682 in ggml_print_backtrace () at /home/wizz/sources/llama.cpp/ggml/src/ggml.c:194
194             waitpid(child_pid, NULL, 0);
#2  0x000075875b3c17b5 in ggml_abort (file=0x75875b4410a0 "/home/wizz/sources/llama.cpp/ggml/src/ggml-backend.cpp", line=750, fmt=0x75875b4414f0 "pre-allocated tensor (%s) in a buffer (%s) that cannot run the operation (%s)") at /home/wizz/sources/llama.cpp/ggml/src/ggml.c:215
215         ggml_print_backtrace();
#3  0x000075875b3da02f in ggml_backend_sched_backend_id_from_cur (sched=0x5fb9db9024d0, tensor=0x5fb9dbcebd00) at /home/wizz/sources/llama.cpp/ggml/src/ggml-backend.cpp:750
750             GGML_ABORT("pre-allocated tensor (%s) in a buffer (%s) that cannot run the operation (%s)", tensor->name, ggml_backend_buffer_name(buffer), ggml_op_name(tensor->op));
#4  0x000075875b3da9c4 in ggml_backend_sched_split_graph (sched=0x5fb9db9024d0, graph=0x5fb9dbad8d00) at /home/wizz/sources/llama.cpp/ggml/src/ggml-backend.cpp:899
899                 *node_backend_id = ggml_backend_sched_backend_id_from_cur(sched, node);
#5  0x000075875b3dde02 in ggml_backend_sched_alloc_graph (sched=0x5fb9db9024d0, graph=0x5fb9dbad8d00) at /home/wizz/sources/llama.cpp/ggml/src/ggml-backend.cpp:1565
1565        ggml_backend_sched_split_graph(sched, graph);
#6  0x000075875b9e6308 in llama_kv_cache_unified::update (this=0x5fb9db909230, lctx=...) at /home/wizz/sources/llama.cpp/src/llama-kv-cache.cpp:427
427                 ggml_backend_sched_alloc_graph(sched, gf);
#7  0x000075875b9eb5ee in llama_kv_cache_unified_iswa::update (this=0x5fb9db9071e0, lctx=...) at /home/wizz/sources/llama.cpp/src/llama-kv-cache.cpp:1761
1761        res = res & kv_swa ->update(lctx);
#8  0x000075875b976c4d in llama_context::kv_self_update (this=0x5fb9d4dfc2b0) at /home/wizz/sources/llama.cpp/src/llama-context.cpp:457
457         need_reserve = kv_self->update(*this);
#9  0x000075875b97931e in llama_context::decode (this=0x5fb9d4dfc2b0, inp_batch=...) at /home/wizz/sources/llama.cpp/src/llama-context.cpp:926
926         kv_self_update();
#10 0x000075875b97fb11 in llama_decode (ctx=0x5fb9d4dfc2b0, batch=...) at /home/wizz/sources/llama.cpp/src/llama-context.cpp:2545
2545        int ret = ctx->decode(batch);
#11 0x00005fb99b780b7b in server_context::update_slots (this=0x7ffdc33254d0) at /home/wizz/sources/llama.cpp/tools/server/server.cpp:3358
3358                    ret = llama_decode(ctx, batch_view);
#12 0x00005fb99b725035 in operator() (__closure=0x7ffdc3326aa8) at /home/wizz/sources/llama.cpp/tools/server/server.cpp:4849
4849            ctx_server.update_slots();
#13 0x00005fb99b7335c4 in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/13/bits/invoke.h:61
61          { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
#14 0x00005fb99b731494 in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/13/bits/invoke.h:111
111             std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
#15 0x00005fb99b72d6bb in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/13/bits/std_function.h:290
290             return std::__invoke_r<_Res>(*_Base::_M_get_pointer(__functor),
#16 0x00005fb99b786d36 in std::function<void ()>::operator()() const (this=0x7ffdc3326aa8) at /usr/include/c++/13/bits/std_function.h:591
591             return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
#17 0x00005fb99b772b83 in server_queue::start_loop (this=0x7ffdc3326988) at /home/wizz/sources/llama.cpp/tools/server/server.cpp:1681
1681                callback_update_slots();
#18 0x00005fb99b7278a0 in main (argc=22, argv=0x7ffdc3326d48) at /home/wizz/sources/llama.cpp/tools/server/server.cpp:4874
4874        ctx_server.queue_tasks.start_loop();
[Inferior 1 (process 1580786) detached]

Labels

Vulkan (Issues specific to the Vulkan backend), bug (Something isn't working)
