
Conversation

Collaborator

@ochafik ochafik commented May 30, 2025

Fixes #13867

(Updated/simplified the webui accordingly. Edit: reverted the UI changes; will let @ngxson update it as a follow-up.)
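For readers following along, here is a minimal TypeScript sketch of what consuming the new streaming shape might look like on the client side: reasoning arrives in separate `reasoning_content` deltas instead of inline `<think>` tags. The exact chunk shape is assumed to follow the OpenAI-compatible streaming format; the type and function names are illustrative, not part of this PR.

```ts
// Sketch: accumulating a streamed assistant message when reasoning is sent
// as separate `reasoning_content` deltas rather than inline <think> tags.
interface StreamDelta {
  content?: string | null;
  reasoning_content?: string | null;
}

interface AccumulatedMessage {
  content: string;
  reasoningContent: string;
}

// Append whichever fields the current delta carries onto the running message.
function applyDelta(msg: AccumulatedMessage, delta: StreamDelta): AccumulatedMessage {
  return {
    content: msg.content + (delta.content ?? ''),
    reasoningContent: msg.reasoningContent + (delta.reasoning_content ?? ''),
  };
}

// Example: three chunks as they might arrive over SSE.
const chunks: StreamDelta[] = [
  { reasoning_content: 'The user asks for 2+2. ' },
  { reasoning_content: 'That is 4.' },
  { content: 'The answer is 4.' },
];

const finalMsg = chunks.reduce(applyDelta, { content: '', reasoningContent: '' });
console.log(finalMsg.reasoningContent); // "The user asks for 2+2. That is 4."
console.log(finalMsg.content);          // "The answer is 4."
```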

@github-actions github-actions bot added the testing, examples, and server labels May 30, 2025
@ochafik ochafik marked this pull request as ready for review May 31, 2025 00:02
@ochafik ochafik requested a review from ngxson as a code owner May 31, 2025 00:02
@github-actions github-actions bot added the python label May 31, 2025
Collaborator

@ngxson ngxson left a comment


I think I will redo the frontend changes in a more structured way; could you maybe revert these changes?

For now, I think it's worth adding a per-request reasoning format control.
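Purely as an illustration of the suggestion above, a per-request control could take the form of an extra field on the chat completion request body. The `reasoning_format` request field shown here is hypothetical and is not something this PR adds; this PR only touches the launch-time server setting.

```ts
// Hypothetical per-request reasoning format override (illustrative only,
// not part of this PR): a client-side request payload with the extra field.
const request = {
  model: 'any-model',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
  stream: true,
  reasoning_format: 'deepseek', // hypothetical field, illustrative only
};

console.log(JSON.stringify(request, null, 2));
```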

Comment on lines 58 to 68
// for reasoning model, we split the message into content and thought
// TODO: implement this as remark/rehype plugin in the future
const { content, thought, isThinking }: SplitMessage = useMemo(() => {
if (msg.content === null || msg.role !== 'assistant') {
return { content: msg.content };
}
let actualContent = '';
let thought = '';
let isThinking = false;
let thinkSplit = msg.content.split('<think>', 2);
actualContent += thinkSplit[0];
Collaborator

If the user doesn't explicitly enable the deepseek reasoning format (which is the default behavior), this code is there to make sure the thinking is always parsed.

Unless we can control the thinking format per request, I don't think we can remove this code block.
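For context, here is a self-contained sketch of what the client-side splitting in the quoted hunk amounts to (the diff excerpt above is truncated). The function and type names are illustrative, not the exact webui code.

```ts
// Illustrative completion of the truncated hunk: split an assistant message
// into visible content and "thought" when the model emits inline
// <think>...</think> tags in its content.
interface SplitResult {
  content: string;
  thought: string;
  isThinking: boolean; // true while the closing </think> has not arrived yet
}

function splitThinkTags(raw: string): SplitResult {
  const openIdx = raw.indexOf('<think>');
  if (openIdx === -1) {
    return { content: raw, thought: '', isThinking: false };
  }
  const before = raw.slice(0, openIdx);
  const rest = raw.slice(openIdx + '<think>'.length);
  const closeIdx = rest.indexOf('</think>');
  if (closeIdx === -1) {
    // Still streaming the thought; nothing visible after it yet.
    return { content: before, thought: rest, isThinking: true };
  }
  return {
    content: before + rest.slice(closeIdx + '</think>'.length),
    thought: rest.slice(0, closeIdx),
    isThinking: false,
  };
}

console.log(splitThinkTags('<think>step 1</think>Answer: 4'));
// { content: 'Answer: 4', thought: 'step 1', isThinking: false }
```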

Collaborator Author

reverted UI changes


export type PendingMessage = Omit<Message, 'content'> & {
content: string | null;
reasoningContent: string | null;
Collaborator

Tbh I'm not very confident about adding this, because it is not future-proof.

If you look at how ChatGPT structures its messages, the content is an array instead of a string, and that is what I wanted to do in the near future. It would allow different message parts; for example, a message could contain reasoning, a text response, and a tool call.
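As a rough sketch of the array-of-parts direction described above (the part shapes are assumed for illustration, not an existing webui type):

```ts
// Sketch: message content as an array of typed parts, so reasoning, text,
// and tool calls can coexist in one assistant message.
type MessagePart =
  | { type: 'reasoning'; text: string }
  | { type: 'text'; text: string }
  | { type: 'tool_call'; name: string; arguments: string };

interface StructuredMessage {
  role: 'assistant' | 'user' | 'system';
  parts: MessagePart[];
}

const example: StructuredMessage = {
  role: 'assistant',
  parts: [
    { type: 'reasoning', text: 'The user wants the weather, so call the tool.' },
    { type: 'tool_call', name: 'get_weather', arguments: '{"city":"Paris"}' },
    { type: 'text', text: 'Let me check the weather in Paris.' },
  ],
};
console.log(example.parts.length); // 3
```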

"(default: deepseek)",
[](common_params & params, const std::string & value) {
/**/ if (value == "deepseek") { params.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK; }
else if (value == "deepseek-legacy") { params.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY; }
Collaborator

should we add a help message to explain this mode?

@ochafik ochafik merged commit c9bbc77 into ggml-org:master Jun 2, 2025
48 checks passed
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request Jun 6, 2025
… diffs) (ggml-org#13933)

* server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
* update unit/test_tool_call.py::test_thoughts

Labels

examples, python (python script changes), server, testing (Everything test related)


Development

Successfully merging this pull request may close these issues.

Misc. bug: Reasoning content is not separated when streaming
