-
Notifications
You must be signed in to change notification settings - Fork 13.4k
bug-fix: handle broken UTF-8 sequences in common_chat_parse() #14937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4c678da
to
4a293ba
Compare
Sorry, accidentally included some unrelated python code hence the github-actions python tag there. |
Can you check if this is a problem if you disable that logging and letting msg propagate? If not, maybe only log when the msg is not partial. |
There is no crash. Instead the partial/invalid UTF-8 sequence is forwarded to the caller. E.g. |
…8 sequence at the end
Pushed a commit which switches to simply dropping the log message and always leaving |
I think you can remove |
GGML_ASSERT(is_partial == (msg.content != truncate_incomplete_utf8(msg.content))); asserts for me, so the two are not entirely equivalent it seems? Looking at uses of auto new_msg = common_chat_parse(
generated_text,
/* is_partial= */ stop != STOP_TYPE_EOS,
params.oaicompat_chat_syntax); |
Correct, but I think (I may be wrong) that a truncated sequence will always result in |
Anyway, what I'm saying is that the worst-case scenario is disabling the log, it's not that important, and I'd rather not have it than doing special handling for it. |
…due to unfinished UTF-8 sequences
Good point. Adding that entire function just to see if we want to debug-log is way overkill. |
When the model hits the token limit when generating multibyte UTF-8 content, the server crashes due to an assert failure
nlohmann::json_abi_v3_12_0::detail::type_error
:The trace:
Specifically the crash happens on this line:
(i.e. it is possible that this is a debug-only crash).
This code adds a utf8 truncator helper and truncates all unfinished sequences.Note: this breaks non-utf8 encoded strings. If llama.cpp allows e.g. utf-16 encoded strings as well, and there's no way to distinguish between these, then this approach needs to be tweaked to e.g. check if the string is utf-8 before performing truncation.Code now simply checks if
is_partial
is set, and skips the debug log if it is. This avoids the hassle of trying to determine if the string (which may or may not be a UTF-8 encoded string) is or isn't truncated mid-sequence.