webui: Add a "Continue" Action for Assistant Message #16971
base: master
Conversation
allozaur left a comment
@ngxson @ggerganov lemme know if you think that this logic works for handling the "Continue" action for assistant messages.
@artyfacialintelagent feel free to test this out and give feedback!
Is this supposed to work correctly when pressing Continue after stopping a response while it is generating? I am testing with […]
I've tested it for the edited assistant responses so far. I will take a close look at the stopped generation -> continue flow as well.
When using gpt-oss in LM Studio, the model generates a new response instead of continuing the previous text. This is because of the Harmony parser: uninstalling it resolves this, and the model continues the generation successfully.
force-pushed from f4c3aeb to b8e4bb4
@ggerganov please check the demos I've attached to the PR description and also test this feature on your end. Looking forward to your feedback!
force-pushed from b8e4bb4 to e0d03e2
force-pushed from 4741f81 to c7e23c7
Hm, I wonder why it's done like this. We already have support on the server to continue the assistant message if it is the last one in the request (#13174): llama.cpp/tools/server/utils.hpp, lines 729 to 751 at c7e23c7.
The current approach often does not continue properly, as can be seen in the sample videos.
Using the assistant prefill functionality above would make this work correctly in all cases.
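For context, a minimal sketch of such a request against the server's OpenAI-compatible endpoint (host, port, and message text are placeholders): because the last message has the assistant role, the prefill support from #13174 continues that text instead of starting a fresh turn.

```ts
// Sketch: ask the server to continue a partial assistant message.
// With the assistant prefill from #13174, a trailing assistant message
// is treated as text to be continued, not as a completed turn.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [
      { role: "user", content: "Write a haiku about autumn." },
      // Partial assistant text to continue, e.g. after Stop or an edit:
      { role: "assistant", content: "Crisp leaves underfoot," },
    ],
    stream: true,
  }),
});
```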
Agree with @ggerganov, it's better to use the assistant message prefill from #13174. One thing to note though: I think most templates do not support formatting the reasoning content back into its original form, so that's probably the only case where it will break.
Thanks guys, I missed that! Will patch it and come back to you.
I've updated the logic with 859e496, and I have tested it with a few models; only 1 (…)
For me, both Qwen3 and Gemma3 are able to complete successfully. For example, here is Gemma3 12B IT: webui-continue-0.mp4
It's strange that it didn't work for you. Regarding gpt-oss, I think that "Continue" also has to send the reasoning in this case. Currently it is discarded, and I think it confuses the model.
Should we then address the thinking models differently for now, at least from the WebUI perspective?
I will do some more testing with other instruct models and make sure all is working right. |
It's likely due to the chat template; I suspect some chat templates (especially Jinja ones) add the generation prompt. Can you verify what the chat template looks like with […]?
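One way to inspect it (a sketch, assuming a locally running llama-server on the default port; the /props endpoint reports the loaded chat template):

```ts
// Sketch: dump the chat template reported by a local llama-server, to
// check whether it appends a generation prompt after assistant turns.
const props = await fetch("http://localhost:8080/props").then((r) => r.json());
console.log(props.chat_template);
```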
If it's not too complicated, I'd say change the logic so that "Continue" includes the reasoning of the last assistant message for all reasoning models.
The main issue is that some chat templates actively suppress the reasoning content from assistant messages, so I doubt it will work across all models. Actually, I'm thinking about a more generic approach: we can implement a feature in the backend such that both the "raw" generated text (i.e. with […]
I would say for now, we can put a warning in the webui to tell users that this feature is experimental and doesn't work across all models. We can improve it later if it gets more usage.
Gotcha, @ngxson, let's do that.
I don't want to interfere with you experts; I just want to share my insight, as I also struggled with gpt-oss on this issue (see the video). Of course, the implementation of Harmony might be different in llama.cpp, but this is the only way to get the continue feature working for gpt-oss in LM Studio. If possible, rather than disabling the function for all thinking models, adding an option to disable the Harmony parser might be better.
output.mp4
…g the conversation payload ending with assistant message
force-pushed from 941ec7c to d8f952d
I've added this setting, and for now the "Continue" icon button is rendered only for non-reasoning models.
Maybe we can tackle this in the next iteration of this feature? Idk, @ngxson, do you think it's still worth doing as part of this PR, or do we want to revisit this in the future?
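A rough sketch of that gating, with hypothetical names (not the actual webui code):

```ts
// Hypothetical gating logic: only render the Continue button for
// non-reasoning models, behind an experimental setting.
interface ModelCapabilities {
  reasoning: boolean; // hypothetical flag: model emits reasoning content
}

function showContinueButton(
  caps: ModelCapabilities,
  continueEnabled: boolean, // hypothetical user setting
): boolean {
  // Chat templates often suppress reasoning content from assistant
  // messages, so the stop -> continue flow is unreliable for reasoning
  // models for now.
  return continueEnabled && !caps.reasoning;
}
```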
I'm a bit surprised that it doesn't work in LM Studio. IIRC LM Studio doesn't actually modify the output; they parse it for display while still keeping the original generated content under the hood. CC @mattjcly from the LM Studio team (as this is probably a bug).
As I mentioned earlier in #16971 (comment), we can preserve the raw content by introducing a new flag. But this is currently a low-priority task, and we can do it later if more users need it.
@ngxson I guess this doesn't stop us from having this PR reviewed and eventually merged?
Yes, the current approach in this PR should be enough. Will give it a try a bit later.
Sure, lemme know!




Closes #16097
Add Continue and Save features for chat messages
What's new
- Continue button for assistant messages
- Save button when editing user messages
Technical notes
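A condensed sketch of the Continue flow under the assistant-prefill approach; the function and callback names are hypothetical, and real code would parse the SSE stream properly rather than forwarding raw bytes:

```ts
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical sketch (not the actual webui code): resend the conversation
// ending with the partial assistant message, then append the streamed
// continuation to that same message via the onToken callback.
async function continueAssistantMessage(
  messages: ChatMessage[],
  onToken: (chunk: string) => void,
): Promise<void> {
  const res = await fetch("/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Real code would parse the SSE "data:" lines; this forwards raw text.
    onToken(decoder.decode(value, { stream: true }));
  }
}
```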
Demos
ggml-org/gpt-oss-20b-GGUF: demo1.mp4
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF: demo2.mp4