extend server/public_simplechat with simple minded interactive browser-client side based toolcalling - base logic #16563

hanishkvc · 2025-10-13T14:10:21Z

[Updated on 20251024IST2028]
Extends my earlier simple minded tools/server/public_simplechat web/browser ui in llama.cpp with support for simple minded interactive tool calling, which uses the javascript environment of the browser to provide some basic tool / function calls.

Currently it provides the following tool calls

- directly in browser
  - simple_calculator (runs in a web worker)
  - run_javascript_function_code (runs in a web worker)
  - NOTE: The code created by the ai model is run from within a web worker context in the browser, to isolate it from the main browser context. However any shared web worker context, if any, is not isolated.
- using bundled simpleproxy.py (helps bypass browser cors restriction)
  - fetch_web_url_raw
  - fetch_web_url_text (strips out head, style, script, header, footer, nav)
  - NOTE: these tool calls are exposed to the ai engine on the server only if simpleproxy.py is run
  - NOTE: allows a white list of domains to be specified

If ToolCalling is enabled in ui settings, meta data about these tools is handshaked with the GenAi/LLM model. In turn, if the ai model used is aware of tool calling and makes a tool_calls request, the user is shown the tool name and the arguments being passed to it. The user can verify the same and trigger the tool call as is, or make changes as needed before triggering the tool call.

The result of the tool call is automatically placed into the user query chat area, with tool_response tag surrounding it. The user can submit the response as is or make suitable changes to the tool response contents before submitting the same to the ai model.

NOTE: The user can optionally enable automatic triggering of tool calls and submission of their responses.

NOTE: This allows end users to use some basic yet useful tool calls to enhance their ai chat sessions to some extent. It also provides for a simple minded exploration of tool calling support in newer ai models and some fun along the way, as well as occasional practical use like

- verifying mathematical or logical statements/reasoning made by the ai model during chat sessions, by getting it to also create and execute code to verify the same.
- accessing content from the internet and augmenting the ai model's context with additional data as needed, to help generate better responses. This can also be used for
  - getting the latest news summary by fetching from news aggregator sites and collating, organising and summarising the same
  - searching for specific topics and summarising the results

A bit more detail about this feature is in the updated readme.md within public_simplechat.

The tool calling has been implemented for the chat endpoint's streaming and oneshot modes. The same has been checked with Gemma3N, Granite4 and GPT-OSS models for now (reasoning is not handled currently and can mess with the flow in streaming mode).

Enable streaming by default, to check the handshake before going on to change the code, given that I haven't looked into this for more than a year now and have been busy with totally different stuff. Also updated the user messages used for testing a bit.

Define the meta that needs to be passed to the GenAi Engine. Define the logic that implements the tool call, if called. Implement the flow/structure such that a single tool calls implementation file can define multiple tool calls.

Make tooljs structure and flow more generic. Add a simple_calculator tool/function call logic. Add initial skeleton wrt the main tools.mjs file.

Changed latestResponse type to an object instead of a string. In turn it contains entries for content, toolname and toolargs. Added a custom clear logic due to the same and used it to replace the previous simple assigning of an empty string to latestResponse. For now, in all places where latestResponse is used, I have replaced it with latestResponse.content. Next need to handle identifying the field being streamed and in turn append to it. Also need to add logic to call the tool, when a tool_call is triggered by genai.

Update response_extract_stream to check which field is currently being streamed, ie is it normal content or tool call func name or tool call func args, and then return the field name and extracted value. Previously it was always assumed that only normal content will be returned. Currently it is assumed that the server will only stream one of the 3 supported fields at any time and not more than one of them at the same time. TODO: Have to also add logic to extract the reasoning field later, ie wrt gen ai models which give out their thinking. Have updated append_response to expect both the key and the value wrt the latestResponse object, which it will manipulate. Previously it was always assumed that content is what will be got and in turn appended.
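The field detection described above could look roughly like the following sketch. It is not the PR's actual response_extract_stream; the function name extractStreamField is hypothetical, and the chunk layout (choices[0].delta with content or tool_calls[0].function.name / arguments) is an assumption based on the OpenAI-style chat completions streaming format.

```javascript
// Hypothetical helper (not the PR's code): classify one streamed chunk
// and return [fieldName, value], assuming an OpenAI-style delta layout.
function extractStreamField(chunk) {
    const delta = chunk?.choices?.[0]?.delta ?? {};
    const fn = delta.tool_calls?.[0]?.function;
    // Assumes, as the text does, that only one field is streamed at a time.
    if (fn?.name) return ["toolname", fn.name];
    if (fn?.arguments) return ["toolargs", fn.arguments];
    if (typeof delta.content === "string") return ["content", delta.content];
    // e.g. an initial chunk that only sets the role and has content === null;
    // returning "" avoids a literal "null" being appended to the response.
    return ["content", ""];
}

// Usage: accumulate each field separately while chunks arrive.
const latest = { content: "", toolname: "", toolargs: "" };
const chunks = [
    { choices: [{ delta: { role: "assistant", content: null } }] },
    { choices: [{ delta: { content: "Hel" } }] },
    { choices: [{ delta: { content: "lo" } }] },
];
for (const chunk of chunks) {
    const [key, value] = extractStreamField(chunk);
    latest[key] += value;
}
```

Returning the key along with the value is what lets the caller append to the right entry, instead of assuming everything is normal content.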

I was wrongly checking for finish_reason to be non null, before trying to extract the genai content/toolcalls; have fixed this oversight with the new flow in progress. I had added a few debug logs to identify the above issue, need to remove them later. Note: given that debug logs are disabled by replacing the debug function during this program's initialisation, which I had forgotten about, I didn't get the debug messages and had to scratch my head a bit, before realising this and the other issue ;) Also, either when I had originally implemented simplechat 1+ years back, or later due to changes on the server end, the streaming flow sends an initial null wrt the content, where it only sets the role. This was not handled in my flow on the client side, so a null was getting prepended to the chat messages/responses from the server. This has been fixed now in the new generic flow.

Make latestResponse into a new class based type instance wrt ai assistant response, which is what it represents. Move clearing, appending fields' values and getting assistant's response info (irrespective of a content or toolcall response) into this new class and in turn use the same.

Switch oneshot handler to use AssistantResponse; in turn it currently only handles the normal content in the response. TODO: If there are any tool_calls in the oneshot response, they are currently not handled. In turn switch the generic/toplevel handle response logic to use the AssistantResponse class, given that both oneshot and the multipart/streaming flows use/return it. In turn add a trimmedContent member to the AssistantResponse class and make the generic handle response logic save the trimmed content into this. Update users of trimmed to work with this structure.

As there could be failure wrt getting the response from the ai server some where in between a long response spread over multiple parts, the logic uses the latestResponse to cache the response as it is being received. However once the full response is got, one needs to transfer it to a new instance of AssistantResponse class, so that latestResponse can be cleared, while the new instance can be used in other locations in the flow as needed. Achieve the same now.

Previously if content was empty, it would have always sent the toolcall info related version even if there was no toolcall info in it. Fixed now to return empty string, if both content and toolname are empty.
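A minimal sketch of what such an AssistantResponse class could look like is given below. The member and method names (response, append, contentEquiv, from) and the tool_call tag layout are assumptions for illustration, not the PR's actual definitions.

```javascript
// Illustrative sketch only; names and tag layout are assumed, not from the PR.
class AssistantResponse {
    constructor() {
        this.clear();
    }

    // Reset all the tracked fields.
    clear() {
        this.response = { content: "", toolname: "", toolargs: "" };
    }

    // Append a streamed fragment to the named field.
    append(key, value) {
        if (!(key in this.response)) {
            throw new Error(`unknown field: ${key}`);
        }
        this.response[key] += value;
    }

    // Copy into a fresh instance, so that the cache can then be cleared.
    static from(other) {
        const copy = new AssistantResponse();
        Object.assign(copy.response, other.response);
        return copy;
    }

    // Normal content if any, else a tagged tool call, else an empty string.
    contentEquiv() {
        const { content, toolname, toolargs } = this.response;
        if (content !== "") return content;
        if (toolname !== "") {
            return `<tool_call>\n<name>${toolname}</name>\n<args>${toolargs}</args>\n</tool_call>`;
        }
        return "";
    }
}

// Usage: cache while streaming, then hand over a copy and clear the cache.
const latestResponse = new AssistantResponse();
latestResponse.append("toolname", "simple_calculator");
latestResponse.append("toolargs", '{"arithexpr":"1+2"}');
const finished = AssistantResponse.from(latestResponse);
latestResponse.clear();
```

The last branch of contentEquiv is the fix described above: with neither content nor toolname set, it returns an empty string rather than an empty tool call block.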

The implementations of javascript and simple_calculator now use provided helpers to trap console.log messages when they execute the code / expression provided by GenAi, and in turn store the captured log messages in the newly added result key in tc_switch. This should help trap the output generated by the provided code or expression, as the case may be, and in turn return the same to the GenAi for its further processing.

Check whether toolname is defined in the GenAi's response. If toolname is set, then check if a corresponding tool/func exists, and if so call the same by passing it the GenAi provided toolargs as an object. In turn the text generated by the tool/func is captured and put into the user input entry text box, with tool_response tag around it.

The output generated by any tool/function call is currently placed into the TextArea provided for the end user (for their queries), because the GenAi (engine/LLM) may be expecting the tool response to be sent as user role data, with tool_response tag surrounding the results from the tool call. So now, at the end of submit btn click handling, the end user input text area is not cleared if there was a tool call handled. Also, given that running a simple arithmetic expression in itself doesn't generate any output, wrap it in a console.log, to help capture the result using the console.log trapping flow that is already set up.
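The console.log trapping idea can be sketched as below. This is only an illustration: the helper names are hypothetical, and unlike the PR (which runs the code in a web worker) the sketch runs the code in the calling context, so it should not be used with untrusted code as is.

```javascript
// Hypothetical helpers illustrating console.log trapping; not the PR's code.
// NOTE: runs the code in the current context, with no web worker isolation.
function runWithTrappedLog(code) {
    const captured = [];
    const origLog = console.log;
    // Temporarily redirect console.log into the captured list.
    console.log = (...args) => {
        captured.push(args.map(String).join(" "));
    };
    try {
        new Function(code)();
    } catch (err) {
        // Report the failure as part of the result text.
        captured.push(`Error: ${err.message}`);
    } finally {
        console.log = origLog; // always restore the real console.log
    }
    return captured.join("\n");
}

// A bare expression prints nothing by itself, so wrap it in console.log.
function simpleCalculator(expr) {
    return runWithTrappedLog(`console.log(${expr})`);
}

const calcResult = simpleCalculator("2 + 3 * 4");
const codeResult = runWithTrappedLog("for (let i = 0; i < 3; i++) console.log('i =', i);");
```

The captured text is what would then be handed back, inside the tool_response tag, for the user to review and submit.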

Trap any exception raised during a tool call and inform the GenAi/LLM about the same.

Describe the tool calls' execution environment more clearly. This should hopefully ensure that the GenAi/LLM will generate appropriate code/expression as the argument to pass to these tool calls, to some extent.

Clarify some type definitions to avoid warnings, ie in vs code with ts-check.

Move tool calling logic into tools module. Try to trap async promise failures by awaiting results of tool calling and putting the full thing in an outer try catch. Have forgotten the nitty gritties of JS flow, this might help, need to check.

Pass toolname to the tool handler, so that when the tool handler writes the result to the tc_switch, it can make use of the same to write to the right location. NOTE: This also fixes the issue of my forgetting to rename the key in js_run wrt writing of result.

Clean up the function description a bit, to better describe how it will be run, so that the genai/llm, while creating the code to run, will hopefully take care of any nuances required.

Include the tool call handshake info in the normal assistant-user chat flow. Also as part of the same, wrap the request details in the assistant block using a similar tagging format as the tool_response in user block.

Instead of automatically calling the requested tool with supplied arguments, rather allow user to verify things before triggering the tool. NOTE: User already provided control over tool_response before submitting it to the ai assistant.

Instead of automatically calling any tool requested by the GenAi / llm from the tail end of the handle user submit btn click, now if the GenAi/LLM has requested any tool to be called, enable the Tool Run related UI elements and fill them with the tool name and tool args. In turn the user can verify if they are ok with the tool being called and the arguments being passed to it. They can even fix any errors in the tool usage, like the arithmetic expr being passed to simple_calculator or the javascript code being passed to run_javascript_function_code. If the user is ok with the tool call being requested, they can then trigger the same. The results, if any, will be automatically placed into the user query text area. The user can cross verify if they are ok with the result and or modify it suitably if required, and in turn submit the same to the GenAi/LLM.

Also avoid showing Tool calling UI elements, when not needed to be shown.

The config entries should be named same as their equivalent cmdline argument entries but without the -- prefix

Allow fetching from only specified allowed.domains

Had confused between js and python wrt accessing dictionary contents and its consequence on non existent key. Fixed it. Use different error ids to distinguish between failure in common urlreq and the specific urltext and urlraw helpers.

Include a sample config file, with allowed domains set to a few sites in general to show its use. This includes some sites which allow search to be carried out through them as well as provide news aggregation.

Try to mimic a real client using info from the got request, ie include User-Agent, Accept-Language and Accept in the generated request using equivalent values got in the request being proxied.

The tagging of messages wrt ValidateUrl and UrlReq. Also dump req. Move check for --allowed.domains to ValidateUrl. NOTE: Also, with mimicking of user agent et al from the got request to the generated request, yahoo search/news is returning results now, instead of the bland error before.

Mimicking the got req in the generated req helps with duckduckgo also and not just yahoo. Also update allowed.domains to allow a url generated by the ai when trying to access bing's news aggregation url.

Use DOMParser parseFromString in text/html mode rather than text/xml, as it makes it more relaxed without worrying about special chars of xml like & et al.

Update the readme wrt mimicking of the client request, ie during proxying.

Instead of simple concatenating of tool call id, name and result now use browser's dom logic to create the xml structure used for now to store these within content field. This should take care of transforming / escaping any xml special chars in the result, so that extracting them later for putting into different fields in the server handshake doesnt have any problem.

bing raised a challenge for chrome triggered search requests after few requests, which were spread few minutes apart, while still seemingly allowing wget based search to continue (again spread few minutes apart). Added a simple helper to trace this, use --debug True to enable same.

SimpleProxy cleanup: avoid logically duplicate debug log.

Instead of enforcing always explicit user triggered tool calling, now user is given the option whether to use explicit user triggered tool calling or to use auto triggering after showing tool details for a user specified amount of seconds. NOTE: The current logic doesnt account for user clicking the buttons before the autoclick triggers; need to cancel the auto clicks, if user triggers before autoclick, ie in future.

Track and clear the timers related to auto tool calls. Also cleanup the existing toolResponseTimeout timer to be in the same structure and have similar flow convention.
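One way to track such an auto trigger timer, so that it can be cancelled if the user acts first, is sketched below. The class name AutoTrigger, its methods, and the convention that a delay of 0 or less means manual triggering are all assumptions for illustration, not the PR's actual structure.

```javascript
// Hypothetical sketch of a tracked auto-trigger timer; not the PR's code.
class AutoTrigger {
    constructor() {
        this.timerId = undefined;
    }

    // Schedule action after the given seconds; replaces any pending timer.
    // Assumption: seconds <= 0 means auto triggering is disabled.
    schedule(seconds, action) {
        this.cancel();
        if (seconds <= 0) return false;
        this.timerId = setTimeout(() => {
            this.timerId = undefined;
            action();
        }, seconds * 1000);
        return true;
    }

    // Call this when the user clicks manually, before the timer fires.
    cancel() {
        if (this.timerId !== undefined) {
            clearTimeout(this.timerId);
            this.timerId = undefined;
        }
    }

    get pending() {
        return this.timerId !== undefined;
    }
}

// Usage: arm the auto click, then cancel because the user clicked first.
let autoFired = false;
const toolCallAuto = new AutoTrigger();
toolCallAuto.schedule(5, () => { autoFired = true; });
const wasPending = toolCallAuto.pending;
toolCallAuto.cancel();
```

Keeping the timer id in one place is what makes the cancel on manual click straightforward, which is the gap the NOTE above points out.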

hanishkvc · 2025-10-24T14:32:03Z

Now simpleproxy.py

- expects a list of allowed domains (regex based) to be specified through a config file. In turn, whenever a request for fetch web url is got, it will proceed only if the domain being fetched is in the allowed domains list.
- tries to mimic certain characteristics (user-agent, accept, accept-language) of the got proxy request in the corresponding request it generates to the actual web server. This should potentially allow certain web sites to allow the request, instead of blocking it.
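The two checks above can be illustrated with the following sketch. Note that simpleproxy.py itself is written in python; this is only a javascript rendering of the idea, with hypothetical function names, and it assumes each allowed domain pattern has to match the whole hostname, which may differ from what the proxy actually does.

```javascript
// Illustrative only: simpleproxy.py is python, and these names are hypothetical.
// Assumption: each regex pattern must match the entire hostname.
function isDomainAllowed(url, allowedPatterns) {
    let host;
    try {
        host = new URL(url).hostname;
    } catch {
        return false; // not a parseable url, so reject it
    }
    return allowedPatterns.some((p) => new RegExp(`^(?:${p})$`).test(host));
}

// Copy only the client characteristics that are to be mimicked.
// Assumes incoming header names are already lower cased.
function mimicHeaders(incoming) {
    const out = {};
    for (const name of ["user-agent", "accept", "accept-language"]) {
        if (incoming[name] !== undefined) out[name] = incoming[name];
    }
    return out;
}

// Usage with made up values.
const allowed = [".*\\.example\\.com", "news\\.example\\.org"];
const okSite = isDomainAllowed("https://www.example.com/search?q=llama", allowed);
const badSite = isDomainAllowed("https://example.com.evil.test/", allowed);
const outHeaders = mimicHeaders({
    "user-agent": "TestBrowser/1.0",
    "accept": "text/html",
    "cookie": "secret=1",
});
```

Matching against the parsed hostname rather than the raw url string avoids a url like the second one slipping through because an allowed name appears somewhere inside it.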

Allow the user to trigger tool call usage in an automated way, without requiring the user to explicitly trigger the tool call and its response submission.

Other cleanups and updates in general

Cleanup whitespaces identified by llama.cpp editorconfig check

* convert tab to spaces in json config file
* remove extra space at end of line

Add missing newline to ending bracket line of json config file

hanishkvc · 2025-10-24T21:17:17Z

Hi @ggerganov

Given that browsers provide an implicit env for not only showing ui but also running logic, I have updated the tools/server/public_simplechat client ui to use/expose tool calling support provided by the newer ai models to end users of llama.cpp's server in a simple way, without needing to worry about a separate mcp host/router, tools et al, for basic useful tools/functions like a calculator or code execution (javascript in this case). The ai generated math expressions or code are run within a web worker, so that the main browser context can't be manipulated.

Additionally, if users want to fetch web content as part of their ai chat session, I have also implemented two functions related to web url fetching, which work with an included python based simple proxy server. These allow the ai to get either raw web content or stripped web content (which is devoid of html tags as well as head, script, style, header, footer, nav blocks). This simple proxy server also supports a whitelist of allowed domains which the user can update as needed, to ensure that the ai doesn't access sites which the end user doesn't want it to.

As part of working on this PR, I have also cleaned up the simplechat flow a bit, so that in future it is easy to add support for showing reasoning and or handshaking multimodal data like images or so.

I have tested this with Gemma3N, Granite4 (and GPT-OSS, to an extent) for requesting calculations or for getting ai to verify its logical assertions or calculations etal, as well as for getting ai to create news summaries after fetching info from news sites or searching and summarising results and so locally.

As before, I have not used any external libraries or so, when implementing this logic, and instead rely on standard features/modules supported by the respective languages.

This PR is ready for committing to the main branch now.

NOTE: The pyright checker flags the python based simpleproxy logic as requiring python 3.10 or so, because of certain python features that it uses, like match statements or constraining certain variables/arguments to only a subset of possible types rather than python's default variant flexibility. Given that python 3.10 came out a few years back, I assume we can ignore those, but do let me know if you want me to try to be compatible with older versions of python.

NOTE: I havent added myself to CODEOWNERS for tools/server/public_simplechat for now, as by the time this PR gets merged if others also modify CODEOWNERS for other stuff, I didnt want to trigger any merge conflict etc, because otherwise this PR doesnt touch any file outside tools/server/public_simplechat.

NOTE: I wanted to explore the happenings in the current gen ai landscape and in turn tool calling capabilities a bit. But most tool calling discussions/efforts, general purpose or otherwise, seemed to be going around mcp and its kitchen sink of parts or so, and or require online account creation et al for end users. So I did this PR, so that end users can explore and take advantage of tool calling with local gen ai at a basic level without any/too much hassle, to an extent.

Update and cleanup the readme a bit: include info about the auto option within tools. Use nonwrapped text wrt certain sections, so that the markdown readme can be viewed properly wrt the structure of the content in it.

ggerganov · 2025-10-29T08:19:53Z

Honestly I think simplechat should be a project outside of llama.cpp as it's only maintained mostly by you. More importantly, I don't think the number of users can justify the maintaining cost.

@hanishkvc I have to agree with ngxson's assessment here. I don't think this implementation sees any usage at the moment and it's not worth it for us to maintain it further. It's better to focus our efforts on the modern Svelte-based WebUI.

If you decide to host this project somewhere else, will be happy to link from our README as an alternative client.

hanishkvc · 2025-11-16T22:01:41Z

Honestly I think simplechat should be a project outside of llama.cpp as it's only maintained mostly by you. More importantly, I don't think the number of users can justify the maintaining cost.

@hanishkvc I have to agree with ngxson's assessment here. I don't think this implementation sees any usage at the moment and it's not worth it for us to maintain it further. It's better to focus our efforts on the modern Svelte-based WebUI.

If you decide to host this project somewhere else, will be happy to link from our README as an alternative client.

Hi @ggerganov,

Sorry didnt notice your response earlier, as I was in the process of adding the missing features wrt this ie peek at reasoning, vision and tool calling and some associated cleanup, so that I could explore some stuff which I wanted to.

If you look at the newer PR #17142, you will notice that this alternate pure html + css + js based flow (avoids dependence on external / 3rd party libraries in general) now supports reasoning, vision and also tool calling (with a bunch of built in client side based tool calls with zero setup, ++), all within an uncompressed source code size of 300KB (including the python simpleproxy.py for web access and related tool calls). Also the logical ui elements have their own unique id/class, which can help theme the ui, if one wants to.

While the default web ui is around 1.2 MB or so compressed, needs one to understand svelte framework and also track the bundled modules. Also it doesnt support tool calling currently, and the plan is more towards server side / back end MCP based tool call support, if I understand correctly.

Given the above significant differences, it may make sense to continue this as a lightweight alternate ui option within llama.cpp itself, parallel to the default webui. My embedded background also biases me toward simple, flexible and functional options.

NOTE: When I revisited ai after almost a year++ wanting to explore some of the recent ai developments, I couldnt find any sensible zero or minimal setup tool calling supported open source ai clients, so I started on this series of patches/PRs.

Either way, the final decision as to whether you would prefer to apply these into llama.cpp itself or not is up to you and the team of open source developers who work on this proactively, rather than a once in a blue moon me. Do let me know your thoughts.

hanishkvc · 2025-11-20T23:03:56Z

Look at #17415 for the latest PR in this series

hanishkvc · 2025-11-26T19:45:50Z

Look at the newer PRs in this series with more features.

hanishkvc added 30 commits October 13, 2025 18:51

SimpleChatToolCalling: Test/Explore srvr initial hs using cmdline

fa23e9d

SimpleChatTools: Add boolean to allow user control of tools use

75ce9e4

SimpleChatTC: More generic tooljs, SimpCalc, some main skeleton

bbaae70

Make tooljs structure and flow more generic. Add a simple_calculator tool/function call logic. Add initial skeleton wrt the main tools.mjs file.

SimpleChatTC: Bring in the tools meta into the main flow

f091568

SimpleChatTC: use tcpdump to dbg hs; check if ai aware of tools

9d8be85

SimpleChatTC: Show toolcall being generated by ai - Temp

4cbe1d2

SimpleChatTC: Saner/Robust AssistantResponse content_equiv

d7f612f

Previously if content was empty, it would have always sent the toolcall info related version even if there was no toolcall info in it. Fixed now to return empty string, if both content and toolname are empty.

SimpleChatTC: Trap any exception raised during tool call

7a2bcfb

and inform the GenAi/LLM about the same

SimpleChatTC: More clearer description of toolcalls execution env

f10ab96

Should hopefully ensure that the GenAi/LLM will generate appropriate code/expression as the argument to pass to these tool calls, to some extent.

SimpleChatTC: Clarify some type definitions to avoid warnings

a1f1776

ie in vs code with ts-check

SimpleChatTC: Pass toolname to the tool handler

3796306

So that when the tool handler writes the result to the tc_switch, it can make use of the same to write to the right location. NOTE: This also fixes the issue of my forgetting to rename the key in js_run wrt writing of result.

SimpleChatTC: Cleanup the function description a bit

0ed8329

to better describe how it will be run, so that the genai/llm, while creating the code to run, will hopefully take care of any nuances required.

SimpleChatTC: Update the readme.md wrt tool calling a bit

aa81f51

SimpleChatTC: ToolCall hs info in normal assistant-user chat flow

5ed2bc3

Also as part of same, wrap the request details in the assistant block using a similar tagging format as the tool_response in user block.

SimpleChatTC: Update readme with bit more details, Cleaner UI

2aabca2

Also avoid showing Tool calling UI elements, when not needed to be shown.

SimpleChatTC: Tool Calling UI elements use up horizontal space

90b2491

hanishkvc added 17 commits October 23, 2025 17:45

SimpleChatTC:SimpleProxy:Allow for loading json based config file

832d613

The config entries should be named same as their equivalent cmdline argument entries but without the -- prefix

SimpleChatTC:SimpleProxy: Update doc following python convention

a415180

SimpleChatTC:SimpleProxy: AllowedDomains based filtering

289e9e0

Allow fetching from only specified allowed.domains

SimpleChatTC:SimpleProxy: Include a sample config file

05b52c3

with allowed domains set to few sites in general to show its use this includes some sites which allow search to be carried out through them as well as provide news aggregation

SimpleChatTC: Update readme a bit

da99c8b

SimpleChatTC:SimpleProxy: Some debug prints which give info

e42e72e

SimpleChatTC:SimpleProxy:Try mimic real client using got req info

0cb2217

ie include User-Agent, Accept-Language and Accept in the generated request using equivalent values got in the request being proxied.

SimpleChatTC:SimpleProxy: mimicing got req helps wrt duckduckgo

9bd3b35

Mimicking the got req in the generated req helps with duckduckgo also and not just yahoo. Also update allowed.domains to allow a url generated by the ai when trying to access bing's news aggregation url.

SimpleChatTC:ToolCall response relaxed handling

9441798

Use DOMParser parseFromString in text/html mode rather than text/xml as it makes it more relaxed without worrying about special chars of xml like & etal

SimpleChatTC:SimpleProxy: Update readme wrt mimicing client req

200181f

ie during proxying

SimpleChatTC:SimpleProxy:Cleanup

8b18473

avoid logically duplicate debug log

SimpleChatTC:AutoToolCalls: Track and clear related timers

524aa01

also cleanup the existing toolResponseTimeout timer to be in the same structure and have similar flow convention.

hanishkvc added 2 commits October 24, 2025 20:51

SimpleChatTC: Cleanup whitespaces

a4152d1

identified by llama.cpp editorconfig check * convert tab to spaces in json config file * remove extra space at end of line

SimpleChatTC:Cleanup whitespace - github editorconfig checker

cff1de9

Add missing newline to ending bracket line of json config file

SimpleChatTC:Update and cleanup the readme a bit

8481ab4

include info about the auto option within tools. use nonwrapped text wrt certain sections, so that the markdown readme can be viewed properly wrt the structure of the content in it.

This was referenced Oct 26, 2025

server/public_simplechat web search tool call support and bearer token wrt simpleproxy handshake #16791

Closed

server/public_simplechat client ui show reasoning support added, toolcalling with direct builtin tools and beyond also in PR chain #16819

Closed

hanishkvc mentioned this pull request Oct 29, 2025

server/public_simplechat - basic builtin data store related tool calls added - use builtin browser/client side tool calling with minimal setup #16852

Closed

hanishkvc closed this Nov 26, 2025
