Conversation

@JohannesGaessler
Collaborator

The goal of this PR is to add the ability to set an assistant prefill via the llama.cpp server webui. I'm not knowledgeable at all when it comes to webdev, so any help would be appreciated. In its current state the prefill seems to work correctly for basic usage (see the request sketch after the issue list below).

Current issues:

  • If the server is started with --no-prefill-assistant, the webui doesn't work properly. I think the correct behavior would be not to use the prefill and to indicate this in the settings menu. However, the server does not expose this information.
  • If the server is running a reasoning model, the prefill does not seem to be used. I think the correct behavior would be to apply the prefill to the reasoning.
  • According to npm audit the packages should be updated due to vulnerabilities. Should this be done in this PR?
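
For reference, a minimal sketch of how a webui client could send the prefill, assuming the server's OpenAI-compatible /v1/chat/completions endpoint and its convention of treating a trailing assistant message as the prefill to continue from. The prompt text and the `prefill` variable are illustrative, not part of this PR:

```ts
// Hedged sketch: send an assistant prefill to the server's
// OpenAI-compatible chat endpoint. A trailing assistant message is
// treated as a prefill that the model continues from (unless the
// server was started with --no-prefill-assistant).
const prefill = "Sure, here is a step-by-step answer:";

const response = await fetch("/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [
      { role: "user", content: "Explain KV caching in one paragraph." },
      { role: "assistant", content: prefill }, // the prefill turn
    ],
  }),
});

const data = await response.json();
// The completion continues from the prefill text.
console.log(prefill + data.choices[0].message.content);
```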

@Hoernchen

Oh, this is just in time; I was about to open a feature request for this, because there are quite a few models with built-in thinking, like https://huggingface.co/TheDrummer/Gemma-3-R1-4B-v1-GGUF, whose usage notes say:

Usage

You'll probably need to prefill <think> at the start of the assistant turn.
Since it's not a special token, you can get creative with the reasoning tags with modifications like <evil_think> or <creative_think>.
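
A minimal sketch of what that looks like with the prefill mechanism sketched above; only the prefill string changes, and the tag variants work because the tag is plain text rather than a special token (the prompt is illustrative):

```ts
// The model continues its reasoning from the prefilled tag.
const messages = [
  { role: "user", content: "What is 17 * 24?" },
  { role: "assistant", content: "<think>" }, // or e.g. "<evil_think>"
];
```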

@ngxson
Collaborator

ngxson commented Aug 29, 2025

FYI, we are moving the frontend to a Svelte-based project, which should land very soon. This feature can be added in a future version (cc @allozaur in case you already have a wishlist)

@JohannesGaessler
Collaborator Author

Alright, then I'll hold off on investing more time into the current webui.
