Eval bug: Granite 4.0 Invalid diff: '<|tool_call|>["1025202362"]' not found at start of '<|tool_call|>["1350490027"]' #15713

@ExtReMLapin

Description

Name and Version

Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA A100-PCIE-40GB)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD EPYC 7282 16-Core Processor)
load_backend: failed to find ggml_backend_init in /home/pierre/idextend/llama.cpp/build/bin/libggml-cuda.so
load_backend: failed to find ggml_backend_init in /home/pierre/idextend/llama.cpp/build/bin/libggml-cpu.so
version: 6366 (7a8e3c4)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Built from https://github.com/ExtReMLapin/llama.cpp/tree/master2, which is essentially upstream HEAD plus streaming prompt-processing progress and fixes for Hermes 2.

Operating systems

Linux

GGML backends

CUDA

Hardware

3x A100 40GB

Models

ibm-granite_granite-4.0-tiny-preview-Q6_K.gguf

from https://huggingface.co/bartowski/ibm-granite_granite-4.0-tiny-preview-GGUF

Problem description & steps to reproduce

When I run a tool call with tool_choice set to "required", the server crashes. (It also crashes with tool_choice set to "auto".)
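
Roughly, any streaming chat-completion request with tools attached triggers it. A minimal sketch of such a request is below; the port, model alias, and example tool are illustrative placeholders, not values from my setup:

```python
# Hypothetical reproduction sketch: streaming tool-call request against
# llama-server's OpenAI-compatible /v1/chat/completions endpoint.
# Port, model alias, and the example tool are placeholders.
import requests

payload = {
    "model": "granite-4.0-tiny-preview",
    "stream": True,
    "tool_choice": "required",  # also crashes with "auto"
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Hypothetical example tool",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
}

with requests.post(
    "http://localhost:8080/v1/chat/completions",
    json=payload,
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line.decode())
```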

First Bad Commit

No response

Relevant log output

slot update_slots: id  0 | task 10 | prompt processing progress, n_past = 38912, n_tokens = 2048, progress = 0.903522
slot update_slots: id  0 | task 10 | kv cache rm [38912, end)
slot update_slots: id  0 | task 10 | prompt processing progress, n_past = 40960, n_tokens = 2048, progress = 0.951076
slot update_slots: id  0 | task 10 | kv cache rm [40960, end)
slot update_slots: id  0 | task 10 | prompt processing progress, n_past = 43008, n_tokens = 2048, progress = 0.998630
slot update_slots: id  0 | task 10 | kv cache rm [43008, end)
slot update_slots: id  0 | task 10 | prompt processing progress, n_past = 43067, n_tokens = 59, progress = 1.000000
slot update_slots: id  0 | task 10 | prompt done, n_past = 43067, n_tokens = 59
[New LWP 2094729]
[New LWP 2094733]
[New LWP 2094734]
[New LWP 2094735]
[New LWP 2094736]
[New LWP 2094737]
[New LWP 2094738]
[New LWP 2094739]
[New LWP 2094740]
[New LWP 2094741]
[New LWP 2094742]
[New LWP 2094743]
[New LWP 2094744]
[New LWP 2094745]
[New LWP 2094746]
[New LWP 2094747]
[New LWP 2094748]
[New LWP 2094749]
[New LWP 2094750]
[New LWP 2094751]
[New LWP 2094752]
[New LWP 2094753]
[New LWP 2094754]
[New LWP 2094755]
[New LWP 2094756]
[New LWP 2094757]
[New LWP 2094758]
[New LWP 2094759]
[New LWP 2094760]
[New LWP 2094761]
[New LWP 2094762]
[New LWP 2094763]
[New LWP 2094764]
[New LWP 2094765]
[New LWP 2094766]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007ff1a165742f in __GI___wait4 (pid=2095296, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007ff1a165742f in __GI___wait4 (pid=2095296, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x00007ff1a1b43371 in ggml_print_backtrace () at /home/xxxxxx/idextend/llama.cpp/ggml/src/ggml.c:196
196             waitpid(child_pid, NULL, 0);
#2  0x00007ff1a1b58443 in ggml_uncaught_exception () at /home/xxxxxx/idextend/llama.cpp/ggml/src/ggml.cpp:9
9           ggml_print_backtrace();
#3  0x00007ff1a194b20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ff1a194b277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ff1a194b4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00005591def611cf in string_diff (last="<|tool_call|>[\"1025202362\"]", current="<|tool_call|>[\"1350490027\"]") at /home/xxxxxx/idextend/llama.cpp/common/chat.cpp:41
41              throw std::runtime_error("Invalid diff: '" + last + "' not found at start of '" + current + "'");
#7  0x00005591def61e01 in common_chat_msg_diff::compute_diffs (previous_msg=..., new_msg=...) at /home/xxxxxx/idextend/llama.cpp/common/chat.cpp:93
93              diff.content_delta = string_diff(previous_msg.content, new_msg.content);
#8  0x00005591dee22981 in server_slot::update_chat_msg (this=0x5591e5d33b60, diffs=std::vector of length 0, capacity 0) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:1603
1603                diffs = common_chat_msg_diff::compute_diffs(previous_msg, new_msg.empty() ? previous_msg : new_msg);
#9  0x00005591dee2df4e in server_context::send_partial_response (this=0x7ffd40faafb0, slot=..., tkn=...) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:2690
2690            slot.update_chat_msg(res->oaicompat_msg_diffs);
#10 0x00005591dee2cc67 in server_context::process_token (this=0x7ffd40faafb0, result=..., slot=...) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:2487
2487                    send_partial_response(slot, result);
#11 0x00005591dee34f0a in server_context::update_slots (this=0x7ffd40faafb0) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:3818
3818                    if (!process_token(result, slot)) {
#12 0x00005591deddcab5 in operator() (__closure=0x7ffd40fac6a0) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:5231
5231            ctx_server.update_slots();
#13 0x00005591dedead8a in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
61          { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
#14 0x00005591dede8cba in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:111
111             std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
#15 0x00005591dede4e8b in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/11/bits/std_function.h:290
290             return std::__invoke_r<_Res>(*_Base::_M_get_pointer(__functor),
#16 0x00005591dee3a27e in std::function<void ()>::operator()() const (this=0x7ffd40fac6a0) at /usr/include/c++/11/bits/std_function.h:590
590             return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
#17 0x00005591dee25541 in server_queue::start_loop (this=0x7ffd40fac580) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:1897
1897                callback_update_slots();
#18 0x00005591deddf225 in main (argc=25, argv=0x7ffd40fac978) at /home/xxxxxx/idextend/llama.cpp/tools/server/server.cpp:5258
5258        ctx_server.queue_tasks.start_loop();
[Inferior 1 (process 2094728) detached]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: '<|tool_call|>["1025202362"]' not found at start of '<|tool_call|>["1350490027"]'
Aborted (core dumped)
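
For context on the crash path: frame #6 shows string_diff in common/chat.cpp throwing because the previous streamed partial is not a prefix of the new one. A minimal Python sketch of that prefix invariant (not the actual C++ implementation), fed with the exact strings from the log:

```python
def string_diff(last: str, current: str) -> str:
    # Sketch of the invariant enforced at common/chat.cpp:41: each new
    # streamed partial message must start with the previous partial.
    if not current.startswith(last):
        raise RuntimeError(
            f"Invalid diff: '{last}' not found at start of '{current}'"
        )
    return current[len(last):]

# Strings taken from the log above: the Granite tool-call id changes between
# two partial parses, so the prefix check fails and the exception propagates
# up through send_partial_response, terminating the server.
string_diff('<|tool_call|>["1025202362"]', '<|tool_call|>["1350490027"]')
```

So it looks like the partially parsed Granite <|tool_call|> output is not stable from one token to the next, which breaks the assumption that each streamed message extends the previous one.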
