
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17506

Updated tools/server/public_simplechat additionally with an initial go at a simple-minded, minimal markdown-to-HTML logic, so that if the AI model outputs markdown instead of plain text, the user gets a basic formatted view of it. If things don't seem right, the user can disable markdown processing from the settings in the UI.

Look into the previous PR #17451 in this series for details on the other features added to tools/server/public_simplechat,
like peeking into reasoning, working with vision models, as well as built-in support for a bunch of useful tool calls on the client side with minimal to no setup.

All features (except for PDF, which has a pypdf dependency) are implemented internally without depending on any external libraries, and in turn should fit within 50KB compressed. Created using pure HTML+CSS+JS in general, with Python additionally for simpleproxy, to bypass the CORS and related restrictions of the browser environment for direct web access.

Mimicking the received request in the generated request helps with DuckDuckGo also, and
not just Yahoo.

Also update allowed.domains to allow a URL generated by the AI when
trying to access Bing's news aggregation URL.
Use DOMParser's parseFromString in text/html mode rather than text/xml,
as it is more relaxed and one needn't worry about XML special characters
like & et al.

Instead of simply concatenating the tool call id, name and result,
now use the browser's DOM logic to create the XML structure used, for
now, to store these within the content field.

This should take care of transforming / escaping any XML special
characters in the result, so that extracting them later for placing
into different fields in the server handshake doesn't cause any
problems.
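A minimal sketch of this approach, with illustrative element names (the actual tags used by simplechat may differ):

```js
// Hypothetical sketch: wrap a tool call id, name and result in XML via the DOM,
// so special characters like & and < get escaped automatically on serialization.
function toolcall_result_to_xml(id, name, result) {
    const doc = document.implementation.createDocument(null, "tool_response", null);
    for (const [tag, value] of [["id", id], ["name", name], ["result", result]]) {
        const el = doc.createElement(tag);
        el.textContent = value; // the DOM handles escaping of XML special chars
        doc.documentElement.appendChild(el);
    }
    return new XMLSerializer().serializeToString(doc);
}

// Extraction uses DOMParser in text/html mode, which is more forgiving than text/xml.
function toolcall_result_from_xml(content) {
    const doc = new DOMParser().parseFromString(content, "text/html");
    return {
        id: doc.querySelector("id")?.textContent ?? "",
        name: doc.querySelector("name")?.textContent ?? "",
        result: doc.querySelector("result")?.textContent ?? "",
    };
}
```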
Bing raised a challenge for Chrome-triggered search requests after a
few requests, which were spread a few minutes apart, while still
seemingly allowing wget-based searches to continue (again spread a
few minutes apart).

Added a simple helper to trace this; use --debug True to enable it.

Avoid a logically duplicate debug log.
Instead of always enforcing explicit user-triggered tool calling,
the user is now given the option of either explicit user-triggered
tool calling or auto-triggering after showing the tool details
for a user-specified number of seconds.

NOTE: The current logic doesn't account for the user clicking the buttons
before the auto-click triggers; the auto-clicks need to be cancelled if the
user triggers first, i.e. in future.
Also clean up the existing toolResponseTimeout timer to use the
same structure and a similar flow convention.
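A minimal sketch of the intended auto-trigger flow; the names used here (schedule_autoclick, gAutoClickTimer, btnSubmit) are illustrative, not the actual identifiers:

```js
// Hypothetical sketch: schedule an automatic click on the tool-call submit button after a
// user-specified delay; keeping the timer id around is what would let a future change
// cancel the auto-click if the user clicks first.
let gAutoClickTimer = null;

function schedule_autoclick(btnSubmit, delaySecs) {
    if (gAutoClickTimer !== null) {
        clearTimeout(gAutoClickTimer);
    }
    gAutoClickTimer = setTimeout(() => {
        gAutoClickTimer = null;
        btnSubmit.click();
    }, delaySecs * 1000);
}
```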
Identified by the llama.cpp editorconfig check:

* Convert tabs to spaces in the JSON config file
* Remove the extra space at end of line

Add the missing newline at the end of the closing-bracket line of the JSON config file.
Include info about the auto option within tools.

Use non-wrapped text for certain sections, so that the markdown
README can be viewed properly with respect to the structure of its
content.
Split the browser JS web-worker-based tool calls from the web-related
tool calls.
Remove the unneeded stuff (belonging to the other file) from the tooljs
and toolweb files.

Update the tools manager to make use of the new toolweb module.
Initial go at implementing a web search tool call, which uses the
existing UrlText support of the bundled simpleproxy.py.

It allows the user to control which search engine is used, by letting
them set the search engine URL template.

The logic comes with search engine URL template strings for
DuckDuckGo, Brave, Bing and Google, with DuckDuckGo set as the default.
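A rough sketch of the template mechanism; the placeholder syntax and exact default URLs here are assumptions, not necessarily what the code ships with:

```js
// Illustrative search engine URL templates; ${QUERY} is a placeholder convention assumed here.
const gSearchUrlTemplates = {
    duckduckgo: "https://duckduckgo.com/html/?q=${QUERY}",
    brave: "https://search.brave.com/search?q=${QUERY}",
    bing: "https://www.bing.com/search?q=${QUERY}",
    google: "https://www.google.com/search?q=${QUERY}",
};

// Substitute the user's query into the selected template before handing the URL
// to the simpleproxy.py UrlText path.
function build_search_url(template, query) {
    return template.replace("${QUERY}", encodeURIComponent(query));
}
```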
Avoid code duplication by creating helpers for setup and tool call.

Also send an indication of the path that will be used, when checking
at runtime setup whether the simpleproxy.py server is running.
If using Wikipedia or the like, remember to have a sufficient context
window in general, for the AI engine as well as for the handshake / chat
endpoint.
Moved it into Me->tools, so that the end user can modify it as
required from the settings UI.

TODO: Currently, if a tool call response arrives after the tool call has
timed out and the user has submitted the default timed-out error response,
the delayed actual response may overwrite any new content in the user
query box when it arrives; this needs to be tackled.
Now both follow a similar mechanism and do the following:

* Exit on finding any issue, so that things are in a known
  state from a usage perspective, without any confusion or oversight.

* Check whether the cmdlineArgCmd/configCmd being processed is a known
  one or not.

* Check that the value of the cmd is of the expected type.

* Have a generic flow which can accommodate more cmds in future
  in a simple way.
Ensure load_config gets called on encountering --config on the cmdline,
so that the user has control over whether the cmdline or the config file
decides the final value of any given parameter.

Ensure that str-type values on the cmdline are picked up directly, without
running them through ast.literal_eval, because otherwise one would have to
ensure, throughout the cmdline arg mechanism, that the string quotes are
retained for literal_eval.

Have the """ function note/description below def line immidiately
so that it is interpreted as a function description.
Add a config entry called bearer.insecure, which will contain a
token used for bearer auth of HTTP requests.

Make bearer.insecure and allowed.domains required configs, and
exit the program if they aren't obtained through the cmdline or
config file.
As noted in the comments in the code, this is a very insecure flow
for now.
Next will be adding a proxyAuth field to tools as well.
The user can configure the bearer token to send.

Instead of using the shared bearer token as-is, hash it with the
current year and use the hash.

Keep the /aum path out of the auth check.

In future, the bearer token could be transformed more often, as well as
with an additional nonce/dynamic token obtained from the server during the
initial /aum handshake, as also a running counter and so on ...

NOTE: All this circus is not good enough, given that currently the
simpleproxy.py handshakes work over HTTP. However, these skeletons are
put in place for the future, if needed.

TODO: There is a once-in-a-blue-moon race when the year transitions
between the client generating the request and the server handling it.
But otherwise year transitions don't matter, because the client always
creates a fresh token, and the server checks for a year change to
generate a fresh token if required.
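A client-side sketch of such a year-hashed token, assuming SHA-256 and hex encoding (the actual transform may differ); note that crypto.subtle is only available in a secure context (https or localhost):

```js
// Hypothetical sketch: derive the bearer token sent to simpleproxy.py by hashing the
// shared secret together with the current year, instead of sending the secret as-is.
async function bearer_token_for_year(sharedSecret) {
    const year = new Date().getFullYear();
    const data = new TextEncoder().encode(`${sharedSecret}:${year}`);
    const digest = await crypto.subtle.digest("SHA-256", data);
    return Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
}
```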
Add a new role, ToolTemp, which is used to maintain any tool call
response on the client UI side without submitting it to the server,
i.e. until the user or the auto-submit triggers the submission of that
tool call response.

Whenever a tool call response is received, create a ToolTemp-role-based
message in the corresponding chat session. And don't directly update
the user query input area; rather, leave it to the updated SimpleChat
show and the new MultiChatUI chat_show helper, and in turn to whether
the chat session currently active in the UI is the same as the one for
which the tool call response has been received.

TODO: Currently the response message is added to the currently
active chat session, but this needs to change: track the
chatId/session through the full tool call cycle, add the tool call
response to the related chat session, and in turn update the UI or not
based on whether that chat session is still the active chat session
in the UI, given that the tool call gets handled asynchronously.

Now, when that tool call response is submitted, promote the equivalent
ToolTemp-role-based message, which should be the last message in the
session's chat history, into a normal tool response message.

SimpleChat.show has been updated to take care of showing any
ToolTemp role message in the user query input area.

A newer chat_show helper has been added to MultiChatUI, which takes care
of calling SimpleChat.show, provided chat_show is being requested
for the chat session currently active in the UI, as well as of
passing both the ChatDiv and elInUser. Users of
SimpleChat.show have been converted to use MultiChatUI.chat_show.
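A simplified sketch of the chat_show idea; member names like curChatId, simpleChats, elDivChat and elInUser are assumed here for illustration:

```js
// Hypothetical sketch: only call SimpleChat.show when the requested session is the one
// currently active in the UI, passing along the chat div and the user input element.
class MultiChatUI {
    chat_show(chatId) {
        if (chatId !== this.curChatId) {
            return; // not the session shown in the UI; skip redrawing
        }
        this.simpleChats[chatId].show(this.elDivChat, this.elInUser);
    }
}
```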
Update the immediate tool-call-triggering-failure and tool-call-response
timeout paths to use the new ToolTemp and MultiChatUI
based chat show logic.

Errors generated by the actual tool call itself are already handled
by the previous commit's changes.
Pass the chatId to the tool call, and use the chatId in the received
tool call response to decide which chat session the async tool call
response belongs to, and in turn whether the auto-submit timer should
be started, if auto is enabled.
This should ensure that tool call responses can be mapped back to
the chat session for which they were triggered.
Avoid separate, duplicated logic for creating the div+label+el based
element, so there is slightly better type checking and less extra code.
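A minimal sketch of such a shared helper (name and signature are illustrative):

```js
// Hypothetical helper: wrap a label and an arbitrary input-like element in a single div,
// so the settings UI builders don't each repeat the same DOM boilerplate.
function ui_div_label_el(labelText, el) {
    const div = document.createElement("div");
    const label = document.createElement("label");
    label.textContent = labelText;
    div.appendChild(label);
    div.appendChild(el);
    return div;
}
```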
Try to identify headings and blocks in markdown and convert them
into their HTML equivalents.

Show the result in the chat message blocks.
Remove the markdown heading markers.

Fix the pre-equivalent blocks of markdown, given that they can have
the block type following the ``` marker.

Remember to add a line break at the end of each line within a pre block.
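A sketch of the heading handling, assuming the usual # markers map to h1..h6 (the actual regex and output may differ):

```js
// Convert a markdown heading line into an HTML heading, dropping the # markers.
function markdown_heading_to_html(line) {
    const m = line.match(/^(#{1,6})\s+(.*)$/);
    if (m === null) {
        return null; // not a heading line
    }
    const level = m[1].length;
    return `<h${level}>${m[2]}</h${level}>`;
}
```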
Ensure '---' is treated as a horizontal rule and doesn't mess with
the unordered list handling.

Take care of unwinding the unordered list everywhere it is needed;
also simplify the flow by using the same logic for emitting the list
content.
Allow for the other valid character-based markers for horizontal rules
and unordered lists.

?Also allow for spaces after the horizontal rule marker, on the same line?
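A sketch of what the marker matching could look like; the exact regexes in the actual code may differ:

```js
// Horizontal rules can use -, * or _ repeated at least three times, optionally space separated.
const reHorizontalRule = /^ {0,3}([-*_])( *\1){2,} *$/;
// Unordered list items can start with -, * or + followed by at least one space.
const reUnorderedItem = /^ {0,3}([-*+]) +(.*)$/;
```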
Allow a fenced code block / pre to be demarcated using either ```
or ~~~.

Ensure the termination line of a fenced block doesn't contain anything
else.

The same starting marker needs to be present at the end as well.
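A sketch of fence handling under those rules (function names are illustrative):

```js
// A fenced block starts with ``` or ~~~, optionally followed by a language/block type.
function fence_start(line) {
    const m = line.match(/^(```|~~~)(.*)$/);
    return m ? { marker: m[1], lang: m[2].trim() } : null;
}

// The terminating line must repeat the same marker and contain nothing else.
function fence_end(line, marker) {
    return line.trim() === marker;
}
```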
Rather, this won't work; need to refresh on regex, it's been too long.

Rather, using split should be simpler.

However, the extraction of the head and body parts, with the separator
in between for the transition, should work.

Rather, the separator is blindly assumed and the corresponding line is
discarded for now.
Switch to the simpler split-based flow.

Include a tr in the table head block also.

Add a CSS entry to have the header cell text align
to the left for now, given that there are no borders, colour shading
or other distinguishing characteristics for the table cells yet.
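A rough sketch of that split-based flow (structure assumed; the real implementation may differ in details such as empty-cell handling):

```js
// Convert a block of markdown table lines: first line is the header row, second is the
// separator (blindly assumed and discarded), the rest are body rows. Cells split on "|".
function markdown_table_to_html(lines) {
    const [head, _sep, ...body] = lines;
    const row_html = (line, cellTag) =>
        "<tr>" +
        line.split("|")
            .map((c) => c.trim())
            .filter((c) => c.length > 0)
            .map((c) => `<${cellTag}>${c}</${cellTag}>`)
            .join("") +
        "</tr>";
    let html = "<table><thead>" + row_html(head, "th") + "</thead><tbody>";
    for (const line of body) {
        html += row_html(line, "td");
    }
    return html + "</tbody></table>";
}
```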
The user can enable or disable the simple-minded brute-force markdown
parsing from the per-session settings.

Add grey shading and left-aligned text for the table headings of
markdown-to-HTML-converted tables.
Save a copy of the data being processed.

Try to sanitize the data passed for markdown-to-HTML conversion,
so that any HTML special characters in the passed markdown content
get translated into harmless text.

This also ensures that such text doesn't disappear because of the
browser trying to interpret it as HTML-tagged content.

Trap any errors during sanitizing and/or processing of the lines
in general, and push them into an errors array. Callers of this
markdown class can decide whether to use the converted HTML or
not based on whether the errors array is empty, or ...
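A minimal sketch of that sanitization step (the actual set of characters handled may differ):

```js
// Replace characters the browser would otherwise treat as HTML markup, so that
// markdown text containing <, >, & or quotes survives as visible text.
function html_escape(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}
```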

Move the processing of unordered lists into a function of its own.
The ordered list can also use the same flow in general, except
for some tiny changes, potentially including to the regex.
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary

Project: llama.cpp
Versions Compared: acf457e6-fec8-49e0-af2c-73481a3746f2 vs aab9b31c-ad35-48ba-b9fe-4c0fd3dc2df2


Summary

This version introduces extensive web UI enhancements to the SimpleChat frontend without modifying core llama.cpp inference engine code. Three utility binaries were removed from the build configuration. No function-level performance changes were detected in core libraries. The changes are confined to the tools/server/public_simplechat/ directory, implementing tool calling, vision support, reasoning display, and markdown rendering capabilities entirely client-side.

Power Consumption Changes:

  • build.bin.llama-cvector-generator: Reduced by 278999 nJ (removed from build)
  • build.bin.llama-run: Reduced by 245370 nJ (removed from build)
  • build.bin.llama-tts: Reduced by 285154 nJ (removed from build)
  • build.bin.libllama.so: Changed by -0.35 nJ (negligible)
  • All other binaries: No measurable change

Inference Performance Impact:
No functions in the tokenization or inference paths were modified. Functions llama_decode, llama_encode, llama_tokenize, llama_model_load_from_file, and other performance-critical components show zero change in response time and throughput. Tokens per second remains unaffected as no inference engine modifications occurred.

The removed binaries represent standalone utilities for control vector generation, inference running, and text-to-speech functionality, not core inference components.

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #327

Overview

This PR introduces comprehensive web UI enhancements to tools/server/public_simplechat without modifying the core llama.cpp inference engine. Analysis confirms zero impact on performance-critical paths.

Performance Impact Assessment

Core Inference Functions: No changes detected

  • llama_decode: 44,752,344 ns response time (0% change, 0 ns delta)
  • llama_tokenize: 899,200 ns response time (0% change, 0 ns delta)
  • llama_encode: Not modified
  • llama_batch_init: 252 ns response time (0% change, 0 ns delta)

Tokens Per Second Impact: None. The reference model (smollm:135m on 12th Gen Intel i7-1255U) maintains baseline performance as no tokenization or inference functions were modified.

Power Consumption Analysis: Negligible changes across all binaries

  • build.bin.libllama.so: 228,844 nJ (delta: -0.45 nJ, -0.0% change)
  • build.bin.llama-cvector-generator: 278,999 nJ (delta: -0.30 nJ, -0.0% change)
  • build.bin.llama-run: 245,370 nJ (delta: +0.13 nJ, +0.0% change)
  • All other binaries (libggml-base.so, libggml-cpu.so, libggml.so, libmtmd.so, llama-bench, llama-quantize, etc.): 0 nJ change

Code Changes Summary

Scope: 380 commits, 6,395 additions, 799 deletions across 29 files in tools/server/public_simplechat/

Implementation: Client-side web interface with:

  • Multi-session chat management with per-session configuration
  • Tool calling framework using Web Workers for isolation
  • Vision support via base64-encoded image data URLs
  • Markdown rendering and reasoning display
  • IndexedDB persistence for chat history
  • Optional Python proxy server for web access tools

Architecture: Modular JavaScript implementation with classes for message handling (NSChatMessage, ChatMessageEx), session management (SimpleChat), UI orchestration (MultiChatUI), and tool coordination (ToolsManager).

The changes are entirely isolated to the web UI layer, utilizing existing /chat/completions and /completions HTTP endpoints without modifications to request handling or server binary.

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #327

Overview

This PR introduces 380 commits across 29 files, adding 6,395 lines and removing 799 lines. All changes are confined to the tools/server/public_simplechat/ directory, consisting entirely of JavaScript, HTML, CSS, and Python proxy server modifications. No C++ source files, build system configurations, or core llama.cpp inference components were modified.

Performance Impact

Zero measurable performance impact on core llama.cpp binaries.

All performance-critical functions show no change:

  • llama_decode: 44,752,504 ns (0% change)
  • llama_encode: 11,254,049 ns (0% change)
  • llama_tokenize: 899,206 ns (0% change)
  • ggml_graph_compute: 1,358,852 ns (0% change)

Power consumption analysis across all binaries shows variations within compiler optimization noise:

  • libllama.so: +0.355 nJ
  • llama-cvector-generator: -0.274 nJ
  • llama-run: +0.158 nJ
  • llama-tts: -0.0003 nJ

Tokens per second: No impact. Since llama_decode, llama_encode, and llama_tokenize response times remain unchanged, inference throughput is unaffected.

Code Changes

The PR transforms the SimpleChat web UI from a basic interface into a feature-rich client supporting tool calling, vision models, reasoning display, markdown rendering, and multi-session management. Changes include:

  • Class-based JavaScript architecture replacing functional approach
  • Tool calling system with Web Worker isolation
  • Python proxy server for CORS bypass
  • IndexedDB-based session persistence
  • Client-side markdown rendering
  • Vision support with base64 image handling

All functionality operates in the browser client layer with no modifications to server-side inference paths.

@loci-dev force-pushed the main branch 4 times, most recently from 92ef8cd to 7dd50b8 on November 26, 2025 at 16:10.