Conversation

@kallewoof kallewoof commented Jul 29, 2025

When the model hits the token limit while generating multibyte UTF-8 content, the server crashes with an uncaught nlohmann::json_abi_v3_12_0::detail::type_error:

terminate called after throwing an instance of 'nlohmann::json_abi_v3_12_0::detail::type_error'
  what():  [json.exception.type_error.316] incomplete UTF-8 string; last byte: 0x99

Thread 1 "llama-server" received signal SIGABRT, Aborted.

The trace:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736686145536) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736686145536) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736686145536, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fffef242476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fffef2287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffef6a2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffef6ae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fffef6ae277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fffef6ae4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x0000555555641997 in nlohmann::json_abi_v3_12_0::detail::serializer<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> >::dump_escaped (this=0x7fffffff8dd0, 
    s="ぶらぶらぶら", <incomplete sequence \351\231>, ensure_ascii=false)
    at /home/me/workspace/llama.cpp/common/../vendor/nlohmann/json.hpp:19326
#10 0x000055555561f0f9 in nlohmann::json_abi_v3_12_0::detail::serializer<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> >::dump (this=0x7fffffff8dd0, val=..., pretty_print=false, ensure_ascii=false, indent_step=0, current_indent=0)
    at /home/me/workspace/llama.cpp/common/../vendor/nlohmann/json.hpp:18971
#11 0x000055555561ec3e in nlohmann::json_abi_v3_12_0::detail::serializer<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> >::dump (this=0x7fffffff8dd0, val=..., pretty_print=false, ensure_ascii=false, indent_step=0, current_indent=0)
    at /home/me/workspace/llama.cpp/common/../vendor/nlohmann/json.hpp:18901
#12 0x00005555555fc8af in nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>::dump (this=0x55556ea26240, indent=-1, indent_char=32 ' ', ensure_ascii=false, error_handler=nlohmann::json_abi_v3_12_0::detail::error_handler_t::strict)
    at /home/me/workspace/llama.cpp/common/../vendor/nlohmann/json.hpp:21327
#13 0x0000555555759656 in common_chat_parse (input="ぶらぶらぶら", <incomplete sequence \351\231>, is_partial=true, 

Specifically, the crash happens on this line:

LOG_DBG("Parsed message: %s\n", common_chat_msgs_to_json_oaicompat<json>({msg}).at(0).dump().c_str());

(i.e. it is possible that this is a debug-only crash).

This PR originally added a UTF-8 truncator helper and truncated any unfinished sequence.

Note: that approach breaks strings that are not UTF-8 encoded. If llama.cpp also allows e.g. UTF-16 encoded strings, and there is no way to distinguish between the two, the truncator would need to e.g. verify that the string is UTF-8 before trimming it.

Update: the code now simply checks whether is_partial is set and skips the debug log if it is. This avoids the hassle of trying to determine whether the string (which may or may not be UTF-8 encoded) is truncated mid-sequence.

The github-actions bot added the python (python script changes) label on Jul 29, 2025.
@kallewoof force-pushed the 202507-broken-utf8 branch from 4c678da to 4a293ba on July 29, 2025 at 07:46.
@kallewoof

Sorry, I accidentally included some unrelated Python code, hence the github-actions python tag there.


CISC commented Jul 29, 2025

Can you check whether this is still a problem if you disable that logging and let msg propagate? If not, maybe only log when the msg is not partial.

@kallewoof

Can you check whether this is still a problem if you disable that logging and let msg propagate? If not, maybe only log when the msg is not partial.

There is no crash. Instead, the partial/invalid UTF-8 sequence is forwarded to the caller, e.g. "ぶらぶら�".
I guess we could skip the log and return the invalid UTF-8 as-is when we think it's broken. For non-UTF-8 content, the worst case is a lack of debug log messages.

@kallewoof

Pushed a commit which switches to simply dropping the log message and always leaving msg alone. This removes the risk of corrupting non-UTF-8 strings.


CISC commented Jul 29, 2025

I think you can remove truncate_incomplete_utf8 and just check is_partial.


kallewoof commented Jul 29, 2025

GGML_ASSERT(is_partial == (msg.content != truncate_incomplete_utf8(msg.content)));

asserts for me, so it seems the two are not entirely equivalent?

Looking at the uses of is_partial, they do not seem related to unfinished UTF-8 sequences either:

        auto new_msg = common_chat_parse(
            generated_text,
            /* is_partial= */ stop != STOP_TYPE_EOS,
            params.oaicompat_chat_syntax);


CISC commented Jul 29, 2025

the two are not entirely equivalent it seems?

Correct, but I think (I may be wrong) that a truncated sequence will always result in is_partial being true.


CISC commented Jul 29, 2025

Anyway, what I'm saying is that the worst-case scenario is disabling the log; it's not that important, and I'd rather drop it than add special handling for it.

@kallewoof

Good point. Adding that entire function just to decide whether to emit a debug log is way overkill.

@CISC merged commit 1a67fcc into ggml-org:master on Jul 29, 2025.
47 checks passed
@kallewoof deleted the 202507-broken-utf8 branch on July 30, 2025 at 11:51.