PyTorch 2.9+ includes native XPU support, making intel-extension-for-pytorch and the separate oneAPI conda install unnecessary. Closes #7308
- Use config.eos_token_id_list for all EOS tokens as stop conditions (fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before the main model so autosplit accounts for their VRAM usage
- Fix loss computation in ExLlamav3_HF: reuse the cache across chunks so sequences longer than 2048 tokens get correct perplexity values
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning an empty dict when not requested
Models now have 131k+ context lengths. The parameters can still be passed to llama.cpp through --extra-flags.
This sets the minimum acceptable context length, which by default is 4096.
Pull request overview
This PR merges the dev branch and introduces first-class tool-calling support across the UI and OpenAI-compatible API, alongside updates to web fetching/search, reasoning/thinking extraction, and multiple installer/dependency adjustments.
Changes:
- Add selectable tool scripts (user_data/tools/*.py) with prompt parsing/execution loops in chat and the OpenAI API.
- Replace/extend web search & page fetching (trafilatura extraction + SSRF-style URL validation/redirect handling).
- Update multiple runtime/installer defaults (ctx-size auto behavior, ROCm versions, Gradio wheel, UI/JS/CSS refinements, logprobs/logit_bias support).
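The tool scripts listed in the table below appear to share a common entry-point convention: the fetch_webpage.py snippet reviewed further down suggests each module exposes an execute(arguments) function that receives a dict of parsed tool-call arguments. A minimal hypothetical tool in that shape (the tool's behavior and names here are illustrative, not code from the PR):

```python
# Hypothetical sketch of a user_data/tools/*.py script, assuming the
# execute(arguments) convention visible in the reviewed fetch_webpage.py diff.

def execute(arguments):
    """Echo tool: returns the given text reversed, or an error string."""
    text = arguments.get("text", "")
    if not text:
        return "Error: 'text' argument is required"
    return text[::-1]
```

Returning an error string rather than raising mirrors the defensive style of the reviewed snippets, which validate arguments before doing any work.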
Reviewed changes
Copilot reviewed 71 out of 71 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| user_data/tools/web_search.py | Adds a tool wrapper for DuckDuckGo search results (title/url). |
| user_data/tools/roll_dice.py | Adds a dice-rolling tool. |
| user_data/tools/get_datetime.py | Adds a date/time tool. |
| user_data/tools/fetch_webpage.py | Adds a webpage fetch/read tool calling the new extractor. |
| user_data/tools/calculate.py | Adds a “safe” arithmetic evaluator tool. |
| user_data/presets/min_p.yaml | Removes preset file. |
| user_data/presets/Top-P.yaml | Adds a Top-P preset file. |
| user_data/presets/Instruct.yaml | Removes preset file. |
| start_windows.bat | Forces UTF-8 mode via PYTHONUTF8=1. |
| server.py | Refactors startup/env setup and theme autosave synchronization. |
| requirements/portable/requirements_vulkan.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_nowheels.txt | Updates Gradio wheel refs, replaces html2text with trafilatura. |
| requirements/portable/requirements_cuda131.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_cpu_only.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_apple_silicon.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_apple_intel.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_amd.txt | Updates Gradio wheel refs, llama.cpp binaries/ROCm, replaces html2text with trafilatura. |
| requirements/portable/requirements.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_nowheels.txt | Bumps diffusers, updates Gradio wheel refs, replaces html2text with trafilatura. |
| requirements/full/requirements_cpu_only.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_apple_silicon.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_apple_intel.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_amd.txt | Bumps diffusers, updates Gradio wheel refs/binaries/ROCm, replaces html2text with trafilatura. |
| requirements/full/requirements.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| one_click.py | Updates ROCm install path, Intel install flow, wheel state tracking, Windows python path handling. |
| modules/web_search.py | Switches to requests + trafilatura, adds URL safety checks + redirect validation, optional content fetch. |
| modules/ui_parameters.py | Changes sampler priority UI to DragDrop and reorders cutoff controls. |
| modules/ui_model_menu.py | Updates ctx-size/fit-target UI text and ctx-size behavior for truncation. |
| modules/ui_image_generation.py | Adds extra token marker stripping for prompt variation. |
| modules/ui_chat.py | Adds incognito chat option + tools selection UI and refresh/sync behavior. |
| modules/ui.py | Delegates model element listing to loaders; persists selected_tools; tweaks autosave keys. |
| modules/transformers_loader.py | Adjusts logprob collection and removes RoPE scaling args wiring. |
| modules/training.py | Blocks LoRA training for llama.cpp loader with user-facing error. |
| modules/tool_use.py | New: loads tool scripts from user_data/tools and executes them. |
| modules/tool_parsing.py | New: robust multi-format tool call parsing + streaming buffering heuristics. |
| modules/text_generation.py | Preserves logits_processor across deep-copies; trims token ban parsing. |
| modules/shared.py | Adjusts CLI defaults (ctx-size=0, fit-target), adds selected_tools, template/tool formatting, warnings copy. |
| modules/reasoning.py | New: shared reasoning/thinking extraction (multiple formats). |
| modules/presets.py | Adds top_n_sigma default preset value ordering. |
| modules/models_settings.py | Removes RoPE metadata wiring; moves instruction template loading local; uses loaders for params. |
| modules/models.py | Normalizes ctx_size default per-loader; syncs truncation length from llama.cpp server auto ctx. |
| modules/loaders.py | Moves model element list here; adds gradio lazy imports; adds sampler list entries. |
| modules/llama_cpp_server.py | Adds n_ctx capture, logprobs/logit_bias forwarding, fit ctx defaults, improved ctx logging. |
| modules/html_generator.py | Centralizes reasoning extraction; adds tool-call accordion rendering; table wrapper styling hook. |
| modules/extensions.py | Lazily imports gradio in UI functions. |
| modules/exllamav3_hf.py | Adjusts chunked forward labels path to reuse cache. |
| modules/exllamav3.py | Adds logit_bias filter, logprobs capture, better iterate-loop error handling, vision-before-main load. |
| modules/chat.py | Major: tool execution loop for UI, incognito chat, tool_sequence metadata, safer history saving, template helpers. |
| js/main.js | Updates highlight theme sync, scroll logic, tools refresh link injection, incognito hover menu, ResizeObserver. |
| js/global_scope_js.js | Debounces morphdom updates via rAF; preserves open/scroll state; auto-scroll and padding updates. |
| extensions/openai/utils.py | Removes duplicated tool parsing helpers (now in modules/tool_parsing.py). |
| extensions/openai/typing.py | Adds stream_options, tool_choice, logprobs/top_logprobs, max_completion_tokens mapping. |
| extensions/openai/script.py | Adds OpenAIError handler, streaming n restriction, [DONE] emission, listen_host handling, /v1 URLs. |
| extensions/openai/models.py | Limits model list output to currently loaded model; uses loaders.list_model_elements for allowed args. |
| extensions/openai/completions.py | Adds reasoning_content streaming, tool parsing integration, usage-in-stream support, logprobs formatting. |
| css/main.css | Layout/typography tweaks, tools list styling, new chat hover menu, table wrapper styles. |
| css/html_instruct_style.css | Adjusts max width. |
| css/chat_style-wpp.css | Aligns widths and line-height handling with global styles. |
| css/chat_style-messenger.css | Width tweak and dark-mode header/link color overrides. |
| css/chat_style-cai-chat.css | Width tweak and line-height handling with global styles. |
| css/chat_style-cai-chat-square.css | Width tweak. |
| css/chat_style-TheEncrypted777.css | Width tweak and line-height handling with global styles. |
| css/chat_style-Dark.css | Width tweak and relies more on global spacing rules. |
| cmd_windows.bat | Forces UTF-8 mode via PYTHONUTF8=1. |
| README.md | Updates feature list/docs (tool-calling/API), installation notes, and flag list excerpts. |
| .github/workflows/build-portable-release-rocm.yml | Renames ROCm portable artifacts to “rocm7.2”. |
Comment on lines 70 to 77:

```diff
 def __call__(self, input_ids: torch.LongTensor, logits: torch.FloatTensor) -> torch.FloatTensor:
     if self.logprobs is not None:  # 0-5
         log_e_probabilities = F.log_softmax(logits, dim=1)
-        top_values, top_indices = torch.topk(log_e_probabilities, k=self.logprobs + 1)
+        top_values, top_indices = torch.topk(log_e_probabilities, k=self.logprobs)
         top_tokens = [get_reply_from_output_ids([tok]) for tok in top_indices[0]]
         top_probs = [float(x) for x in top_values[0]]
         self.token_alternatives = dict(zip(top_tokens, top_probs))
         self.token_alternatives_history.append(self.token_alternatives)
```
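The comment above concerns how many alternatives are kept after a log-softmax followed by top-k. To illustrate those semantics independently of torch, here is a dependency-free sketch (the function name and list-based shapes are illustrative, not from the PR):

```python
import math

def top_logprobs(logits, k):
    """Log-softmax over one row of logits, then keep the k largest entries,
    mirroring the topk-over-log_softmax pattern in the diff (pure-Python sketch)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    logps = [x - log_z for x in logits]
    ranked = sorted(range(len(logits)), key=lambda i: logps[i], reverse=True)
    return {i: logps[i] for i in ranked[:k]}
```

With k equal to the user's requested logprobs count, exactly that many alternatives come back, which is why dropping the `+ 1` matters for OpenAI-style responses.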
Comment on lines 15 to 23:

```python
def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = _eval(node.left)
        right = _eval(node.right)
        if isinstance(node.op, ast.Pow) and isinstance(right, (int, float)) and abs(right) > 10000:
            raise ValueError("Exponent too large (max 10000)")
        return OPERATORS[type(node.op)](left, right)
```
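The snippet references an OPERATORS table defined outside the visible range. Assuming it maps AST operator types to the corresponding functions from Python's operator module, the whole pattern can be sketched end to end as follows (safe_eval and the exact operator set are illustrative, not the PR's actual code):

```python
import ast
import operator

# Assumed operator table; the actual calculate.py's OPERATORS may differ.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def safe_eval(expression):
    """Evaluate a basic arithmetic expression without calling eval().

    Anything other than numeric constants and whitelisted binary
    operators (function calls, names, attribute access) raises ValueError.
    """
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            left = _eval(node.left)
            right = _eval(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 10000:
                raise ValueError("Exponent too large (max 10000)")
            return OPERATORS[type(node.op)](left, right)
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

Because the walker only accepts Constant and BinOp nodes, an input like `__import__('os')` parses to a Call node and is rejected before anything executes, which is the point of the "safe" evaluator.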
Comment on lines 20 to 23:

```python
def execute(arguments):
    url = arguments.get("url", "")
    max_tokens = arguments.get("max_tokens", 2048)
    if not url:
```
Comment on lines 17 to 35:

```python
def _validate_url(url):
    """Validate that a URL is safe to fetch (not targeting private/internal networks)."""
    parsed = urlparse(url)
    if parsed.scheme not in ('http', 'https'):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme}")

    hostname = parsed.hostname
    if not hostname:
        raise ValueError("No hostname in URL")

    # Resolve hostname and check all returned addresses
    try:
        for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
            ip = ipaddress.ip_address(sockaddr[0])
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                raise ValueError(f"Access to private/internal address {ip} is blocked")
    except socket.gaierror:
        raise ValueError(f"Could not resolve hostname: {hostname}")
```
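The check above resolves every address returned for the hostname and rejects private, loopback, link-local, and reserved ranges. A self-contained boolean variant of the same idea (is_url_safe is a hypothetical name; the PR's _validate_url raises instead of returning False):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_safe(url):
    """Return True only when the scheme is http(s) and every resolved
    address is public; minimal sketch of the SSRF guard reviewed above."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

A check of this kind still leaves a window between resolution and the actual connection (DNS rebinding), so also validating redirect targets, as the PR summary says modules/web_search.py does, is a sensible complement.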
```diff
 shared.gradio['do_sample'] = gr.Checkbox(value=shared.settings['do_sample'], label='do_sample')
 shared.gradio['temperature_last'] = gr.Checkbox(value=shared.settings['temperature_last'], label='temperature_last', info='Moves temperature/dynamic temperature/quadratic sampling to the end of the sampler stack, ignoring their positions in "Sampler priority".')
-shared.gradio['sampler_priority'] = gr.Textbox(value=shared.settings['sampler_priority'], lines=10, label='Sampler priority', info='Parameter names separated by new lines or commas.', elem_classes=['add_scrollbar'])
+shared.gradio['sampler_priority'] = gr.DragDrop(value=shared.settings['sampler_priority'], label='Sampler priority', info='Parameter names separated by new lines or commas.', elem_classes=['add_scrollbar'])
```