extend server/public_simplechat with simple minded interactive browser-client side based toolcalling - base logic #16563
Enable streaming by default, to check the handshake before going on to change the code, given that I haven't looked into this for more than a year now and have been busy with totally different stuff. Also updated the user messages used for testing a bit.
Define the meta that needs to be passed to the GenAi Engine. Define the logic that implements the tool call, if called. Implement the flow/structure such that a single tool calls implementation file can define multiple tool calls.
Make the tooljs structure and flow more generic. Add a simple_calculator tool/function call logic. Add an initial skeleton for the main tools.mjs file.
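As a rough illustration of how one tool file could define several tool calls, here is a minimal sketch. The `tc_switch` name comes from the later commit messages; the `handler` / `meta` / `result` field names, the `arithexpr` parameter, the regex guard and the `toolsMeta` helper are assumptions made for this example, not the PR's actual code. The meta entries follow the OpenAI-style `tools` schema.

```javascript
// Hypothetical sketch: a switch table mapping tool names to handler + meta + result.
const tc_switch = {
    simple_calculator: {
        // The handler receives its own toolname so it knows where to write the result.
        handler: (toolname, args) => {
            // Assumption for this sketch: accept plain arithmetic only.
            if (!/^[\d\s+\-*\/%().]+$/.test(args.arithexpr)) {
                tc_switch[toolname].result = "error: unsupported expression";
                return;
            }
            const value = Function(`"use strict"; return (${args.arithexpr});`)();
            tc_switch[toolname].result = String(value);
        },
        // Meta data handed to the GenAi engine in the request's tools array.
        meta: {
            type: "function",
            function: {
                name: "simple_calculator",
                description: "Evaluates a simple arithmetic expression and returns the result",
                parameters: {
                    type: "object",
                    properties: {
                        arithexpr: { type: "string", description: "The arithmetic expression to evaluate" },
                    },
                    required: ["arithexpr"],
                },
            },
        },
        result: "",
    },
};

// Collect the meta of every tool defined in a switch table.
function toolsMeta(tcSwitch) {
    return Object.values(tcSwitch).map((tc) => tc.meta);
}
```

Adding another tool to the same file would then only need another key in the table.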
Changed latestResponse type to an object instead of a string. In turn it contains entries for content, toolname and toolargs. Added a custom clear logic due to the same and used it to replace the previous simple assignment of an empty string to latestResponse. For now, in all places where latestResponse is used, I have replaced it with latestResponse.content. Next, need to handle identifying the field being streamed and in turn append to it. Also need to add logic to call the tool, when a tool_call is triggered by genai.
Update response_extract_stream to check which field is currently being streamed, ie is it normal content, the tool call func name or the tool call func args, and then return the field name and the extracted value. Previously it was always assumed that only normal content would be returned. Currently it is assumed that the server will stream only one of the 3 supported fields at any time and not more than one of them at the same time. TODO: Have to also add logic to extract the reasoning field later, ie wrt gen ai models which give out their thinking. Have updated append_response to expect both the key and the value wrt the latestResponse object that it manipulates. Previously it was always assumed that content is what would be got and in turn appended.
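The field detection described above could look roughly like the following sketch. It assumes OpenAI-style streamed chunks (`choices[0].delta` carrying `content` or `tool_calls[0].function.name` / `.arguments`); the function names `extractStreamField` and `accumulate` are made up for illustration. Like the commit, it assumes only one of the three fields matters per chunk, so if a chunk carries both a name and arguments, only the name is used.

```javascript
// Sketch: work out which of the three supported fields a streamed chunk carries.
function extractStreamField(chunk) {
    const delta = chunk?.choices?.[0]?.delta;
    if (!delta) return null;
    if (typeof delta.content === "string") return { key: "content", value: delta.content };
    const func = delta.tool_calls?.[0]?.function;
    if (func) {
        if (typeof func.name === "string") return { key: "toolname", value: func.name };
        if (typeof func.arguments === "string") return { key: "toolargs", value: func.arguments };
    }
    // Covers the initial role-only chunk, whose content is null.
    return null;
}

// Append each extracted value to the matching field of a latestResponse-like object.
function accumulate(chunks) {
    const latest = { content: "", toolname: "", toolargs: "" };
    for (const chunk of chunks) {
        const field = extractStreamField(chunk);
        if (field !== null) latest[field.key] += field.value;
    }
    return latest;
}
```

Returning `null` for the role-only chunk is what keeps a stray "null" from being prepended to the assistant text.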
I was wrongly checking for finish_reason to be non null before trying to extract the genai content/toolcalls; have fixed this oversight with the new flow in progress. I had added a few debug logs to identify the above issue, need to remove them later. Note: given that debug logs are disabled by replacing the debug function during this program's initialisation, which I had forgotten about, I didn't get the debug messages and had to scratch my head a bit before realising this and the other issue ;) Also, either when I originally implemented simplechat 1+ years back, or later due to changes on the server end, the streaming flow sends an initial null wrt the content, where it only sets the role. This was not handled in my flow on the client side, so a null was getting prepended to the chat messages/responses from the server. This has been fixed now in the new generic flow.
Make latestResponse into a new class based type instance wrt the ai assistant response, which is what it represents. Move clearing, appending fields' values and getting the assistant's response info (irrespective of a content or toolcall response) into this new class and in turn use the same.
Switch the oneshot handler to use AssistantResponse; in turn it currently handles only the normal content in the response. TODO: Any tool_calls in the oneshot response are currently not handled. In turn switch the generic/toplevel handle response logic to use the AssistantResponse class, given that both the oneshot and the multipart/streaming flows use/return it. In turn add a trimmedContent member to the AssistantResponse class and make the generic handle response logic save the trimmed content into this. Update users of trimmed to work with this structure.
As there could be failure wrt getting the response from the ai server some where in between a long response spread over multiple parts, the logic uses the latestResponse to cache the response as it is being received. However once the full response is got, one needs to transfer it to a new instance of AssistantResponse class, so that latestResponse can be cleared, while the new instance can be used in other locations in the flow as needed. Achieve the same now.
Previously if content was empty, it would have always sent the toolcall info related version even if there was no toolcall info in it. Fixed now to return empty string, if both content and toolname are empty.
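Putting the last few commits together, the response class could be sketched as below. This is an illustration only: the class and member names loosely follow the commit messages (`AssistantResponse`, `clear`, `newFrom`), while the internal `response` object, the `contentEquiv` method name and the tagged tool call format it returns are assumptions.

```javascript
// Sketch of a response holder used to cache a streamed reply as it arrives.
class AssistantResponse {
    constructor() {
        this.response = { content: "", toolname: "", toolargs: "" };
    }

    // Copy a finished response so that the cache instance can be cleared safely.
    static newFrom(old) {
        const fresh = new AssistantResponse();
        fresh.response = { ...old.response };
        return fresh;
    }

    clear() {
        this.response = { content: "", toolname: "", toolargs: "" };
    }

    // Append a streamed value to the named field.
    append(key, value) {
        this.response[key] += value;
    }

    // Text equivalent of the response; empty if there is neither content nor a tool call.
    contentEquiv() {
        if (this.response.content !== "") return this.response.content;
        if (this.response.toolname !== "") {
            return `<tool_call>\n<name>${this.response.toolname}</name>\n<args>${this.response.toolargs}</args>\n</tool_call>`;
        }
        return "";
    }
}
```

Usage would be along the lines of: append to `latestResponse` while streaming, call `AssistantResponse.newFrom(latestResponse)` once the full response is in, then `latestResponse.clear()`.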
The implementations of javascript and simple_calculator now use the provided helpers to trap console.log messages when they execute the code / expression provided by GenAi, and in turn store the captured log messages in the newly added result key in tc_switch. This should help trap the output generated by the provided code or expression, as the case may be, and in turn return the same to the GenAi for its further processing.
Checks whether toolname is defined or not in the GenAi's response. If toolname is set, then check if a corresponding tool/func exists, and if so call the same by passing it the GenAi provided toolargs as an object. In turn the text generated by the tool/func is captured and put into the user input entry text box, with a tool_response tag around it.
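The lookup-and-call step described above might be sketched as follows. The `demoTools` table, its `echo_upper` tool and the `toolCall` function are hypothetical stand-ins; only the overall flow (check the tool exists, parse the args into an object, wrap the output in a `tool_response` tag) is taken from the commit message.

```javascript
// Hypothetical tool table with a single demo tool, just to exercise the dispatch.
const demoTools = {
    echo_upper: {
        handler: (toolname, args) => {
            demoTools[toolname].result = String(args.text).toUpperCase();
        },
        result: "",
    },
};

// Call the named tool, if it exists, and return its output wrapped in a tool_response tag.
function toolCall(tcSwitch, toolname, toolargsJson) {
    // hasOwnProperty avoids matching inherited names such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(tcSwitch, toolname)) return undefined;
    const tc = tcSwitch[toolname];
    let text;
    try {
        tc.handler(toolname, JSON.parse(toolargsJson));
        text = tc.result;
    } catch (err) {
        text = `Tool/Function call raised an exception: ${err.message}`;
    }
    return `<tool_response>${text}</tool_response>`;
}
```

The returned string is what would be placed into the user input text box for the user to review before submitting.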
The output generated by any tool/function call is currently placed into the TextArea provided for the end user (for their queries), because the GenAi (engine/LLM) may be expecting the tool response to be sent as user role data with a tool_response tag surrounding the results from the tool call. So now, at the end of the submit btn click handling, the end user input text area is not cleared if a tool call was handled, for the above reasons. Also, given that running a simple arithmetic expression in itself doesn't generate any output, wrap it in a console.log, to help capture the result using the console.log trapping flow that is already set up.
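A minimal sketch of the console.log trapping idea, and of wrapping an arithmetic expression in console.log so that its value is captured, is given below. The helper names (`runWithTrappedConsole`, `calc`) are invented for this example, and running the code via `new Function` is an assumption; the PR's own helpers may differ.

```javascript
// Run the given code string with console.log redirected into a capture buffer.
function runWithTrappedConsole(code) {
    const captured = [];
    const origLog = console.log;
    console.log = (...args) => {
        captured.push(args.map(String).join(" "));
    };
    try {
        new Function(code)();
    } catch (err) {
        captured.push(`error: ${err.message}`);
    } finally {
        // Always restore the real console.log, even if the code threw.
        console.log = origLog;
    }
    return captured.join("\n");
}

// An expression alone produces no output, so wrap it in console.log to capture its value.
function calc(expr) {
    return runWithTrappedConsole(`console.log(${expr})`);
}
```

Note that code run this way still has the full powers of whatever context it runs in, which is why the user gets to verify the tool call first.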
and inform the GenAi/LLM about the same
Should hopefully ensure that the GenAi/LLM will generate appropriate code/expression as the argument to pass to these tool calls, to some extent.
ie in vs code with ts-check
Move tool calling logic into tools module. Try trap async promise failures by awaiting results of tool calling and putting full thing in an outer try catch. Have forgotten the nitty gritties of JS flow, this might help, need to check.
So that when the tool handler writes the result to the tc_switch, it can make use of the same, to write to the right location. NOTE: This also fixes the issue of my forgetting to rename the key in js_run wrt writing of the result.
to better describe how it will be run, so that the genai/llm, while creating the code to run, will hopefully take care of any nuances required.
Also as part of same, wrap the request details in the assistant block using a similar tagging format as the tool_response in user block.
Instead of automatically calling the requested tool with supplied arguments, rather allow user to verify things before triggering the tool. NOTE: User already provided control over tool_response before submitting it to the ai assistant.
Instead of automatically calling any tool requested by the GenAi / llm from the tail end of the handle user submit btn click, now if the GenAi/LLM has requested any tool to be called, enable the Tool Run related UI elements and fill them with the tool name and tool args. In turn the user can verify if they are ok with the tool being called and the arguments being passed to it. They can even fix any errors in the tool usage, like the arithmetic expr to calculate that is being passed to simple_calculator or the javascript code being passed to run_javascript_function_code. If the user is ok with the tool call being requested, then trigger the same. The results, if any, will be automatically placed into the user query text area. The user can cross verify if they are ok with the result and/or modify it suitably if required, and in turn submit the same to the GenAi/LLM.
Also avoid showing Tool calling UI elements, when not needed to be shown.
tools manager/module
* set up the web worker that will help execute the tool call related code in a js environment that is isolated from the browser's main js environment
* pass the web worker to the tool call providers, for them to use
* don't wait for the result from the tool call, as it will be got later asynchronously through a message
* allow users of the tools manager to register a callback, which will be called whenever a message is got from the web worker containing the response wrt a previously requested tool call execution.

simplechat
* decouple toolcall response handling and toolcall requesting logic
* set up a timeout to take back control if the tool call takes up too much time. In turn help alert the ai model that the tool call took up too much time and so was aborted, by placing an appropriately tagged tool response into the user query area.
* register a callback that will be called when a response is got asynchronously wrt any requested tool calls. In turn take care of updating the user query area with the response got wrt the tool call, along with a tool response tag around it.
Had forgotten to specify type as module wrt web worker, in order to allow it to import the toolsconsole module. Had forgotten to maintain the id of the timeout handler, which is needed to clear/stop the timeout handler from triggering, if tool call response is got well in time. As I am currently reverting the console redirection at end of handling a tool call code in the web worker message handler, I need to setup the redirection each time. Also I had forgotten to clear the console.log capture data space, before a new tool call code is executed, this is also fixed by this change. TODO: Need to abort the tool call code execution in the web worker if possible in future, if the client / browser side times out waiting for tool call response, ie if the tool call code is taking up too much time.
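The timeout bookkeeping described in the two commits above can be sketched independently of the worker itself. In the sketch below, `PendingToolCall` and its method names are hypothetical; in the browser, `deliver` would be called from the web worker's message handler (with the worker created via `new Worker(url, { type: "module" })`), which is left out here so the example stays runnable outside a browser.

```javascript
// Sketch: track one in-flight tool call and give up on it after a timeout.
class PendingToolCall {
    constructor(onResult, timeoutMs) {
        this.onResult = onResult;   // callback registered by the chat logic
        this.timeoutMs = timeoutMs;
        this.timerId = undefined;   // must be kept, so the timeout can be cancelled
    }

    // Call right after posting the tool call request to the worker.
    start(id, name) {
        this.timerId = setTimeout(() => {
            this.timerId = undefined;
            this.onResult(id, name, "Tool/Function call took too much time, aborted");
        }, this.timeoutMs);
    }

    // Call when the worker posts back a result; returns false if nothing is pending.
    deliver(id, name, data) {
        if (this.timerId === undefined) return false; // late or duplicate response
        clearTimeout(this.timerId);
        this.timerId = undefined;
        this.onResult(id, name, data);
        return true;
    }
}
```

As the TODO above notes, a timeout like this only takes back control on the client side; it does not stop the code still running inside the worker.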
Tool calling, if enabled, needs access to the last few user queries and ai assistant responses (which will also include the tool call requests and the corresponding results), so that the model can build answers based on its tool call requests and the responses got. Also, most models these days have sufficiently large context windows. So the sliding window context implemented by the SimpleChat logic has been increased by default to include roughly the last 4 queries and their responses.
ie wrt the tool calls provided.
Attached is a sample session with the Gemma3N ai model with tool calling support, where the tool calling support is used to inform the ai model that the current year is no longer the year it assumes based on its training data, so that it can make use of the same for future interactions in that session (provided it remains within the sliding window context).
Attached is another sample session with the Gemma3N ai model with tool calling support, where the tool calling support is used to generate the factorials of a few numbers, as well as to have a chat as to why infinity doesn't seem right and in turn as to why it can't be the real right answer in those cases.
Have updated the logic to now run the tool call related runtime created code within a web worker context of the browser and not the browser's global context/scope.
This patch set also avoids inserting/showing the unneeded null at the beginning of assistant responses.
Hi @ggerganov @ngxson @ericcurtin This is a simple minded continuation/update to my previous tools/server/public_simplechat, which allows one to use the basic tool calls support of ai models from within the browser-client side environment in an interactive user controlled way, without needing any additional mcp host/tool etal for simple things like calculations or basic code based (js in this case) data augmenting / cross checks etal. The team working on the default webui could implement a similar thing in the default webui also, to enable end users to make use of the tool calling support of the latest ai models in useful ways, that too in a simple 0 additional setup way.
Honestly I think simplechat should be a project outside of llama.cpp as it's only maintained mostly by you. More importantly, I don't think the number of users can justify the maintenance cost. The tool calling / code execution capabilities are trivial features to add to the current (more functional) Sveltekit webui. We just need to plan it a bit carefully to make sure MCP is possible in the future. It will eventually be added into the webui, see #13501 (comment) That is to say, I won't spend my time on reviewing PRs related to simplechat. I need to focus my time on areas that are more important in the project. Please only ping me (and maybe other maintainers) when absolutely necessary.
Modify the constructor, newFrom and clear towards this goal.
Rename ChatMessage to ChatMessageEx. Add typedefs for NSToolCall and NSChatMessage; they represent the way the corresponding data is structured in the network handshake. Add logic to build the ChatMessageEx from data got over the network in streaming mode.
Update HasToolCalls and ContentEquiv to work with new structure
Use the equivalent update_stream directly added to ChatMessageEx. update_stream is also more generic to some extent and also directly implemented by the ChatMessageEx class.
response_extract logic moved directly into ChatMessageEx as update oneshot, with suitable adjustments. Inturn use the same directly.
these have been updated to work with ChatMessageEx to an extent
GetSystemLatest and its users updated wrt ChatMessageEx. RecentChat updated wrt ChatMessageEx. Also now irrespective of whether full history is being retrieved or only a subset, both cases refer to the ChatMessageEx instances in SimpleChat.xchat without creating new instances of anything.
Simplify the Add semantic by expecting any validation of stuff before adding to be done by the callers of Add and not by Add itself. Also update it to expect a ChatMessageEx object. Update all users of Add to follow the new syntax and semantic. Remove the old and unused AddSysPromptOnlyAtBegin helper.
Users of recent_chat updated to work with ChatMessageEx. As part of the same, recent_chat_ns also added, for the case where the array of chat messages can be passed as is, ie in the chat mode, provided it has only the network handshake representation of the messages.
wrt ChatMessageEx related required flow as well as avoid warnings
Use HTMLElement's dataset to maintain the tool call id along with the element which maintains the toolname. Pass it along to the tools manager and in turn to the actual tool calls, and through them to the web worker handling the tool call related code, which in turn returns it back as part of the obj which is used to return the tool call result. Embed the tool call id, function name and function result into the content field of the chat message in terms of an xml structure. Also make use of the tool role to send back the tool call result. Do note that currently the id, name and content are all embedded into the content field of the tool role message sent to the ai engine on the server. NOTE: Use the user query entry area for showing the tool call result in the above mentioned xml form, as well as for the user to enter their own queries. Based on the presence of the xml format data at the beginning, the logic will treat it as a tool result, and if not then as a normal user query. The css has been updated to help show tool results/msgs in a lightyellow background.
Expand the xml format id, name and content in the content field of the tool result into appropriate fields in the tool result message sent to the genai/llm engine on the server.
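A rough sketch of the round trip described in the two commits above: embed the id, name and result in an xml-like form for the user query area, then expand it back into a tool role message on submit. The tag names, the helper names and the regex based parsing are all assumptions made for this illustration (the real code may well use the browser's XML parsing instead), and the `tool_call_id` / `name` / `content` fields follow the OpenAI-style tool message layout.

```javascript
// Build the xml-like text shown in the user query area for a tool result.
function toolResultToXml(id, name, content) {
    return `<tool_response>\n<id>${id}</id>\n<name>${name}</name>\n<content>${content}</content>\n</tool_response>`;
}

// Turn the text in the user query area into the message to send to the server.
// Text that does not start with the xml form is treated as a normal user query.
function xmlToMessage(text) {
    const m = text.match(
        /^<tool_response>\s*<id>([\s\S]*?)<\/id>\s*<name>([\s\S]*?)<\/name>\s*<content>([\s\S]*)<\/content>\s*<\/tool_response>\s*$/
    );
    if (m === null) return { role: "user", content: text };
    return { role: "tool", tool_call_id: m[1], name: m[2], content: m[3] };
}
```

This sketch does no escaping, so a tool result that itself contains these tags could confuse the parsing; a real implementation would need to handle that.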
these common helpers avoid needing ignore tagging to ts-check, in places where valid constructs have been used which go beyond strict structured js handling that is tried to be achieved using it, but are still valid and legal.
Also update the sliding window context size to last 9 chat messages so that there is a sufficiently large context for multi turn tool calls based adjusting by ai and user, without needing to go full hog, which has the issue of overflowing the currently set context window wrt the loaded ai model.
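As an illustration of the client side sliding window idea, here is a simplified sketch that keeps the latest system prompt plus the last N non-system messages. The `recentChat` name and the exact counting rule are assumptions; the actual SimpleChat logic may count messages differently.

```javascript
// Sketch: return the latest system message (if any) followed by the last
// recentCount non-system messages. A negative recentCount means full history.
function recentChat(messages, recentCount) {
    if (recentCount < 0) return messages.slice();
    let system;
    for (const msg of messages) {
        if (msg.role === "system") system = msg; // keep the most recent one
    }
    const rest = messages.filter((msg) => msg.role !== "system");
    const tail = recentCount === 0 ? [] : rest.slice(-recentCount);
    return system === undefined ? tail : [system, ...tail];
}
```

One caveat of a plain cut like this is that it can separate an assistant message carrying tool_calls from its matching tool result, which is one reason a larger window helps with multi turn tool calling.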
Updated the flow to use a ChatMessageEx class which handles much of the stuff (code and git log have the details) that was previously spread out in other places, thus cleaning things up, as well as making it even easier to add support for viewing thinking output, or for embedding video or audio for multimodal models, in future. Now the tool calling handshake part follows the openai / standard handshake, like having tool_calls in assistant messages and having tool role messages wrt results. Also the size of the client side sliding window implemented in public_simplechat has been increased to better match the needs of tool calling experiments and basic usage.
No worries Son, I will only ping when I am ready for things to be merged. I cross check things to some extent anyway before I request a merge, given that I make these changes as part of trying to use llama.cpp every once in a while, finding that something I was expecting to try out at that moment is not there or works oddly, and in turn cross checking things. This is the same way this simplechat code originally started out, while looking into chat templates, direct use of llama.cpp by end users, as well as the chat and completion handshakes etal at that time. Yesterday I notified you, given that you all are mentioned in the CodeOwners file wrt server. I had a look at the PR you mentioned. I feel any MCP based tool handshaking on the server side should anyway be relatively orthogonal to any tool call handling, especially in browser based client situations, as the browser runtime provides an implicit environment for having many simple yet useful tools without needing anything beyond the basic llamacpp engine based server. Even if one is implementing transparent MCP based tool routing / handling on the backend, using an additional server or so, the configuration for such a setup can help specify any hybrid paths (ie some tools directly through the backend and some on the client side) if and when needed. So the decision to abandon that PR also seems a bit unneeded, beyond the streaming based handshake stuff. Either way, irrespective of that or any other such decisions wrt the new webui frontend, simplechat provides an alternate mechanism for experimenting and/or basic usage. Best wishes to you all wrt ggml org etal, and thank you to all the open source contributors as always.
Extends my earlier simple minded tools/server/public_simplechat web/browser ui for llama.cpp to include support for a simple minded interactive tool calling which uses the javascript environment of the browser to provide some basic tool / function calls.
Currently it provides simple_calculator and run_javascript_function_code tool calls.
If ToolCalling is enabled in the ui settings, meta data about these tools is passed to the GenAi/LLM model as part of the handshake. In turn, if the ai model used is aware of tool calling and makes a tool_calls request, the user is shown the tool name and the arguments being passed to it. The user can verify the same and trigger the tool call as is, or make changes as needed before triggering the tool call.
The result of the tool call is automatically placed into the user query chat area, with tool_response tag surrounding it. The user can submit the response as is or make suitable changes to the tool response contents before submitting the same to the ai model.
NOTE: This is for a simple minded exploration of tool calling support in newer ai models and some fun along the way as well as occasional practical use like verifying mathematical or logical statements/reasoning made by the ai model during chat sessions by getting it to also create and execute code to verify such stuff and so.
[[OLD NOTE: The ai model created code is currently run in the browser's global scope, so always cross check the tool call before allowing/running it. In a later version will be updating the logic so that the generated tool call is run within a web worker scope, to limit its powers a little bit, but always be careful when using this. OLD]]
The ai model created code is run from within a web worker context in the browser, to try and isolate it from the main browser context. However any shared web worker context, if any, is not isolated. Always cross check the tool call before allowing/running it.
A bit more detail about this feature is in the updated readme.md within public_simplechat.
NOTE: The tool calling has been implemented for the chat streaming mode for now. Will add support for oneshot mode later. Tool calling with this logic's current simple minded idiosyncrasies (noted in readme.md) has been tested with the Gemma3N model for now.