UPSTREAM PR #17506: server/public_simplechat tiny (50KB compressed) web ui++ updated with reasoning, vision, builtin clientside tool calls (and markdown wip) #327
Conversation
Mimicking the received request in the generated request helps with DuckDuckGo also, and not just Yahoo. Also update allowed.domains to allow a URL generated by the AI when trying to access Bing's news aggregation URL.
Use DOMParser parseFromString in text/html mode rather than text/xml (i.e. when handling proxied content), as text/html is more relaxed and avoids worrying about XML special characters like & et al.
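A minimal sketch of the idea (the helper name is hypothetical): parsing the proxied page as text/html recovers from stray characters that would make a text/xml parse fail.

```js
// Hypothetical helper: extract readable text from proxied page content.
function proxiedContentToText(rawText) {
    const parser = new DOMParser();
    // In "text/xml" mode a stray '&' yields a <parsererror> document;
    // "text/html" mode simply recovers and keeps parsing.
    const doc = parser.parseFromString(rawText, "text/html");
    return doc.body ? doc.body.textContent : "";
}
```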
Instead of simply concatenating the tool call id, name and result, now use the browser's DOM logic to create the XML structure used (for now) to store these within the content field. This should take care of transforming/escaping any XML special characters in the result, so that extracting them later into different fields in the server handshake doesn't run into problems.
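A hedged sketch of that flow, with the tag names assumed rather than taken from the actual code: letting the DOM serialize text nodes escapes '&', '<' and '>' in the result automatically.

```js
// Hypothetical sketch: pack a tool call response into an XML string,
// letting the DOM handle escaping of any XML special characters.
function packToolResponse(id, name, result) {
    const doc = document.implementation.createDocument(null, "tool_response", null);
    const root = doc.documentElement;
    for (const [tag, value] of [["id", id], ["name", name], ["result", result]]) {
        const el = doc.createElement(tag);
        el.textContent = value; // '&', '<', '>' get escaped on serialization
        root.appendChild(el);
    }
    return new XMLSerializer().serializeToString(doc);
}
```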
Bing raised a challenge for Chrome-triggered search requests after a few requests, which were spread a few minutes apart, while still seemingly allowing wget-based searches to continue (again spread a few minutes apart). Added a simple helper to trace this; use --debug True to enable it.

Avoid a logically duplicate debug log.
Instead of always enforcing explicit user-triggered tool calling, the user is now given the option of either explicit user-triggered tool calling or auto-triggering after the tool details have been shown for a user-specified number of seconds. NOTE: The current logic doesn't account for the user clicking the buttons before the auto-click triggers; the pending auto-click needs to be cancelled if the user triggers first, i.e. in future.

Also clean up the existing toolResponseTimeout timer to use the same structure and a similar flow convention.
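A rough sketch of such an auto-trigger timer, with names assumed; the cancel path is the TODO noted above.

```js
// Hypothetical sketch: auto-click the tool call submit button after a
// user-configured delay, with a cancel hook for user-triggered clicks.
class ToolAutoTrigger {
    constructor() {
        this.timer = null;
    }
    schedule(btnSubmit, delaySecs) {
        this.cancel();
        this.timer = setTimeout(() => {
            this.timer = null;
            btnSubmit.click();
        }, delaySecs * 1000);
    }
    // TODO per the commit note: call this from the button's own click
    // handler, so a pending auto-click doesn't fire on a stale state.
    cancel() {
        if (this.timer !== null) {
            clearTimeout(this.timer);
            this.timer = null;
        }
    }
}
```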
Identified by the llama.cpp editorconfig check:
* convert tabs to spaces in the json config file
* remove extra space at end of line

Add the missing newline to the closing bracket line of the json config file.
Include info about the auto option within tools. Use non-wrapped text for certain sections, so that the markdown README renders the structure of its content properly.
Split the browser-JS webworker-based tool calls from the web-related tool calls. Remove the unneeded stuff (belonging to the other file) from the tooljs and toolweb files. Update the tools manager to make use of the new toolweb module.
Initial go at implementing a web search tool call, which uses the existing UrlText support of the bundled simpleproxy.py. It lets the user control which search engine to use, by letting them set the search engine URL template. The logic comes with search engine URL template strings for DuckDuckGo, Brave, Bing and Google, with DuckDuckGo set by default.

Avoid code duplication by creating helpers for setup and toolcall. Also send an indication of the path that will be used, when checking at runtime setup whether the simpleproxy.py server is running.
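A minimal sketch of the URL-template idea; the placeholder syntax and the DuckDuckGo template shown here are assumptions, not the PR's actual strings.

```js
// Hypothetical: search engine URL templates, query marked by a placeholder.
const searchUrlTemplates = {
    duckduckgo: "https://duckduckgo.com/html/?q=${QUERY}",
    // brave, bing and google entries along the same lines...
};

// Build the concrete search URL, to be fetched through simpleproxy.py.
function searchUrl(template, query) {
    return template.replace("${QUERY}", encodeURIComponent(query));
}
```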
If using Wikipedia or the like, remember to have a sufficient context window in general, both for the AI engine and for the handshake / chat endpoint.
Moved it into Me->tools, so that the end user can modify it as required from the settings UI. TODO: Currently, if a tool call response arrives after the tool call timed out and the user submitted the default timed-out error response, the delayed actual response may overwrite any new content in the user query box when it arrives; this needs to be tackled.
Now both follow a similar mechanism and do the following:
* exit on finding any issue, so that things are in a known state from a usage perspective, without any confusion/oversight
* check whether the cmdlineArgCmd/configCmd being processed is a known one or not
* check that the value of the cmd is of the expected type
* have a generic flow which can accommodate more cmds in future in a simple way
Ensure load_config gets called on encountering --config on the cmdline, so that the user has control over whether the cmdline or the config file decides the final value of any given parameter. Ensure that str-type values on the cmdline are picked up directly, without running them through ast.literal_eval, because otherwise one would have to ensure through the cmdline arg mechanism that the string quotes are retained for literal_eval. Have the """ function note/description immediately below the def line, so that it is interpreted as the function's description (docstring).
Add a config entry called bearer.insecure which contains a token used for bearer auth of HTTP requests. Make bearer.insecure and allowed.domains required configs, and exit the program if they aren't provided through the cmdline or config file.

As noted in the comments in the code, this is a very insecure flow for now.
Next will be adding a proxyAuth field to tools as well. The user can configure the bearer token to send.
Instead of using the shared bearer token as-is, hash it with the current year and use the hash. Keep the /aum path out of the auth check. In future the bearer token could be transformed more often, as well as combined with an additional nonce/dynamic token got from the server during the initial /aum handshake, a running counter, and so on. NOTE: All this circus is not good enough, given that the simpleproxy.py handshakes currently work over HTTP; however these skeletons are put in place for the future, if needed. TODO: There is a once-in-a-blue-moon race when the year transitions between the client generating the request and the server handling the request. Otherwise year transitions don't matter, because the client always creates a fresh token, and the server checks for a year change to regenerate its token if required.
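A hedged sketch of the client side of such a scheme (the hash construction is assumed; the actual code may combine token and year differently). Note that crypto.subtle is only available in secure contexts, e.g. localhost.

```js
// Hypothetical: derive the token actually sent from the shared secret
// plus the current year, so the raw secret never goes on the wire as-is.
async function yearHashedToken(sharedSecret) {
    const year = new Date().getFullYear().toString();
    const data = new TextEncoder().encode(sharedSecret + year);
    const digest = await crypto.subtle.digest("SHA-256", data);
    // hex-encode the digest for use as a bearer token
    return Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
}
```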
Add a new role, ToolTemp, which is used to maintain any tool call response on the client UI side without submitting it to the server, i.e. until the user or an auto-submit triggers the submitting of that tool call response.

Whenever a tool call response is got, create a ToolTemp-role message in the corresponding chat session. Don't directly update the user query input area; rather, leave it to the updated SimpleChat.show and the new MultiChatUI chat_show helper, and in turn to whether the chat session currently active in the UI is the same as the one for which the tool call response has been received. TODO: Currently the response message is added to the currently active chat session, but this needs to change to tracking the chatId/session through the full tool call cycle and then adding the tool call response to the related chat session, and in turn updating the UI or not based on whether that chat session is still the active one in the UI, given that tool calls are handled asynchronously.

When that tool call response is submitted, promote the equivalent ToolTemp-role message, which should be the last message in the session's chat history, into a normal tool response message.

SimpleChat.show has been updated to take care of showing any ToolTemp-role message in the user query input area. A newer chat_show helper has been added to MultiChatUI, which takes care of calling SimpleChat.show, provided the chat_show is being requested for the chat session currently active in the UI, as well as of passing both the ChatDiv and elInUser. Converts users of SimpleChat.show to use MultiChatUI.chat_show.
Update the immediate tool-call-triggering failure and tool call response timeout paths to use the new ToolTemp and MultiChatUI-based chat show logic. Errors generated by the actual tool call itself are already handled by the previous commit's changes.
Pass the chatId to the tool call, and use the chatId in the received tool call response to decide which chat session the async tool call response belongs to, and in turn whether the auto-submit timer should be started if auto is enabled. This ensures that tool call responses can be mapped back to the chat session for which they were triggered.
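A rough sketch of that routing, with all names assumed, just to show the shape of the chatId round trip:

```js
// Hypothetical handler: route an async tool call response back to the
// session that issued it, touching the visible UI only if that session
// is the one currently shown.
function onToolCallResponse(ui, chatId, toolResp) {
    const chat = ui.sessions[chatId];            // session that made the call
    chat.messages.push({ role: "ToolTemp", content: toolResp });
    if (chatId === ui.curChatId) {
        ui.chat_show(chatId);                    // refresh the visible chat
        if (chat.autoSubmit) {
            ui.autoTrigger.schedule(ui.btnSubmit, chat.autoSecs);
        }
    }
    // for an inactive session, the ToolTemp message simply waits in its
    // history until the user switches back to that session
}
```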
Avoid separate duplicated logic for creating the div+label+el based element. This gives slightly better type checking and less extra code.
Try to identify headings and blocks in markdown and convert them into the equivalent constructs in HTML. Show the result in the chat message blocks.
Remove markdown heading markers. Fix the pre-equivalent blocks of markdown, given that they can have the block type following the ``` marker. Remember to add a line break at the end of each line in a pre block.
Ensure '---' is treated as a horizontal line and doesn't mess with the unordered list handling. Take care of unwinding the unordered list everywhere it is needed. Also make the flow simpler by using the same logic for putting in the list content.
Allow for the other valid character-based markers for horizontal lines and unordered lists. Possibly also allow for spaces after the horizontal line marker on the same line?
Allow a fenced code block / pre to be demarcated using either ``` or ~~~. Ensure the termination line of a fenced block doesn't contain anything else, and that the same marker used at the start is present at the end.
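A tiny sketch of that fence matching (regex and names are assumptions):

```js
// Hypothetical: a fence opens with ``` or ~~~, optionally followed by a
// block type; match[1] is the marker, match[2].trim() the block type.
const reFenceOpen = /^(`{3}|~{3})(.*)$/;

// The fence only closes on a line that is the same bare marker,
// with nothing else on it.
function fenceCloses(line, openMarker) {
    return line.trim() === openMarker;
}
```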
Try to create a table head.

Rather, this won't work; need to refresh on regex, it's been too long. Using split should be simpler. However, the extraction of the head and body parts, with the separator line in between for the transition, should work. That separator is blindly assumed and the corresponding line discarded for now.
Switch to the simpler split-based flow. Include a tr in the table head block also. Add a css entry to try and have the header cell text align left for now, given that there is no border, colour shading or other distinguishing characteristic for the table cells yet.
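A sketch of the split-based row handling (helper name assumed):

```js
// Hypothetical: split a markdown table row into its cell texts, dropping
// the empty outer entries produced by '| a | b |' style rows.
function tableRowCells(line) {
    let cells = line.split("|").map((c) => c.trim());
    if (cells.length && cells[0] === "") cells = cells.slice(1);
    if (cells.length && cells[cells.length - 1] === "") cells = cells.slice(0, -1);
    return cells;
}
```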
The user can enable or disable the simple-minded brute-force markdown parsing from the per-session settings. Add grey shading and left-aligned text to the table headings of markdown-to-html converted tables.
Save a copy of the data being processed. Try to sanitize the data passed for markdown-to-html conversion, so that any html-special characters in the passed markdown content get translated into harmless text. This also ensures that such text doesn't disappear because of the browser trying to interpret it as html-tagged content. Trap any errors during sanitizing and/or processing of the lines in general and push them into an errors array; callers of this markdown class can decide whether to use the converted html or not, based on whether the errors array is empty. Move the processing of the unordered list into a function of its own; the ordered list can potentially use the same flow in general, except for some tiny changes, including to the regex.
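A minimal sketch of that sanitize step (function name assumed); the ordering matters, '&' must be escaped first:

```js
// Hypothetical: escape html-special characters before markdown-to-html
// conversion, so literal '<tag>' text in model output stays visible text.
function sanitizeForHtml(text) {
    return text.replaceAll("&", "&amp;")
               .replaceAll("<", "&lt;")
               .replaceAll(">", "&gt;");
}
```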
Explore the complete analysis inside the Version Insights Performance Analysis Summary.

Project: llama.cpp

Summary: This version introduces extensive web UI enhancements to the SimpleChat frontend without modifying core llama.cpp inference engine code. Three utility binaries were removed from the build configuration. No function-level performance changes were detected in core libraries. The changes are confined to the web UI layer.

Power Consumption Changes: negligible across all binaries.

Inference Performance Impact: The removed binaries represent standalone utilities for control vector generation, inference running, and text-to-speech functionality, not core inference components.
Explore the complete analysis inside the Version Insights Performance Analysis Summary: PR #327

Overview: This PR introduces comprehensive web UI enhancements to the SimpleChat frontend.

Performance Impact Assessment: Core inference functions show no changes detected.

Tokens Per Second Impact: None. The reference model (smollm:135m on a 12th Gen Intel i7-1255U) maintains baseline performance, as no tokenization or inference functions were modified.

Power Consumption Analysis: Negligible changes across all binaries.

Code Changes Summary: 380 commits, 6,395 additions, 799 deletions across 29 files in tools/server/public_simplechat, implementing a client-side web interface.

Architecture: Modular JavaScript implementation with classes for message handling. The changes are entirely isolated to the web UI layer, utilizing existing server functionality.
Explore the complete analysis inside the Version Insights Performance Analysis Summary: PR #327

Overview: This PR introduces 380 commits across 29 files, adding 6,395 lines and removing 799 lines. All changes are confined to the tools/server/public_simplechat web UI.

Performance Impact: Zero measurable performance impact on core llama.cpp binaries; all performance-critical functions show no change. Power consumption analysis across all binaries shows variations within compiler optimization noise.

Tokens per second: No impact. Since llama_decode, llama_encode, and llama_tokenize response times remain unchanged, inference throughput is unaffected.

Code Changes: The PR transforms the SimpleChat web UI from a basic interface into a feature-rich client supporting tool calling, vision models, reasoning display, markdown rendering, and multi-session management. All functionality operates in the browser client layer, with no modifications to server-side inference paths.
Mirrored from ggml-org/llama.cpp#17506
Updated server/public_simplechat, now additionally with an initial go at a simple-minded minimal markdown-to-html logic, so that if the AI model outputs markdown text instead of plain text, the user gets a basic formatted view of it. If things don't seem okay, the user can disable markdown processing from the settings in the UI.
Look into the previous PR #17451 in this series for details of the other features added to tools/server/public_simplechat, like peeking into reasoning, working with vision models, as well as built-in support for a bunch of useful client-side tool calls with minimal to no setup.
All features (except for pdf, which has a pypdf dependency) are implemented internally without depending on any external libraries, and in turn should fit within 50KB compressed. Created using pure html+css+js in general, with additionally python for the simpleproxy, to bypass the cors++ restrictions of the browser environment for direct web access.