PyTorch 2.9+ includes native XPU support, making intel-extension-for-pytorch and the separate oneAPI conda install unnecessary. Closes #7308
- Use config.eos_token_id_list for all EOS tokens as stop conditions (fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before the main model so autosplit accounts for their VRAM usage
- Fix loss computation in ExLlamav3_HF: reuse the cache across chunks so sequences longer than 2048 tokens get correct perplexity values
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning an empty dict when not requested
Models now have 131k+ context lengths. The parameters can still be passed to llama.cpp through --extra-flags.
This sets the minimum acceptable context length, which by default is 4096.
Pull request overview
This PR merges the dev branch and introduces first-class tool-calling support across the UI and OpenAI-compatible API, alongside updates to web fetching/search, reasoning/thinking extraction, and multiple installer/dependency adjustments.
Changes:
- Add selectable tool scripts (user_data/tools/*.py) with prompt parsing/execution loops in chat and the OpenAI API.
- Replace/extend web search & page fetching (trafilatura extraction + SSRF-style URL validation/redirect handling).
- Update multiple runtime/installer defaults (ctx-size auto behavior, ROCm versions, Gradio wheel, UI/JS/CSS refinements, logprobs/logit_bias support).
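The tool scripts listed in the table below appear to share a common entry-point convention: the fetch_webpage.py snippet reviewed further down suggests each module exposes an execute(arguments) function that receives a dict of parsed tool-call arguments. A minimal hypothetical tool in that shape (the tool's behavior and names here are illustrative, not code from the PR):

```python
# Hypothetical sketch of a user_data/tools/*.py script, assuming the
# execute(arguments) convention visible in the reviewed fetch_webpage.py diff.

def execute(arguments):
    """Echo tool: returns the given text reversed, or an error string."""
    text = arguments.get("text", "")
    if not text:
        return "Error: 'text' argument is required"
    return text[::-1]
```

Returning an error string rather than raising mirrors the defensive style of the reviewed snippets, which validate arguments before doing any work.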
Reviewed changes
Copilot reviewed 71 out of 71 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| user_data/tools/web_search.py | Adds a tool wrapper for DuckDuckGo search results (title/url). |
| user_data/tools/roll_dice.py | Adds a dice-rolling tool. |
| user_data/tools/get_datetime.py | Adds a date/time tool. |
| user_data/tools/fetch_webpage.py | Adds a webpage fetch/read tool calling the new extractor. |
| user_data/tools/calculate.py | Adds a “safe” arithmetic evaluator tool. |
| user_data/presets/min_p.yaml | Removes preset file. |
| user_data/presets/Top-P.yaml | Adds a Top-P preset file. |
| user_data/presets/Instruct.yaml | Removes preset file. |
| start_windows.bat | Forces UTF-8 mode via PYTHONUTF8=1. |
| server.py | Refactors startup/env setup and theme autosave synchronization. |
| requirements/portable/requirements_vulkan.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_nowheels.txt | Updates Gradio wheel refs, replaces html2text with trafilatura. |
| requirements/portable/requirements_cuda131.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_cpu_only.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_apple_silicon.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_apple_intel.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/portable/requirements_amd.txt | Updates Gradio wheel refs, llama.cpp binaries/ROCm, replaces html2text with trafilatura. |
| requirements/portable/requirements.txt | Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_nowheels.txt | Bumps diffusers, updates Gradio wheel refs, replaces html2text with trafilatura. |
| requirements/full/requirements_cpu_only.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_apple_silicon.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_apple_intel.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| requirements/full/requirements_amd.txt | Bumps diffusers, updates Gradio wheel refs/binaries/ROCm, replaces html2text with trafilatura. |
| requirements/full/requirements.txt | Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura. |
| one_click.py | Updates ROCm install path, Intel install flow, wheel state tracking, Windows python path handling. |
| modules/web_search.py | Switches to requests + trafilatura, adds URL safety checks + redirect validation, optional content fetch. |
| modules/ui_parameters.py | Changes sampler priority UI to DragDrop and reorders cutoff controls. |
| modules/ui_model_menu.py | Updates ctx-size/fit-target UI text and ctx-size behavior for truncation. |
| modules/ui_image_generation.py | Adds extra token marker stripping for prompt variation. |
| modules/ui_chat.py | Adds incognito chat option + tools selection UI and refresh/sync behavior. |
| modules/ui.py | Delegates model element listing to loaders; persists selected_tools; tweaks autosave keys. |
| modules/transformers_loader.py | Adjusts logprob collection and removes RoPE scaling args wiring. |
| modules/training.py | Blocks LoRA training for llama.cpp loader with user-facing error. |
| modules/tool_use.py | New: loads tool scripts from user_data/tools and executes them. |
| modules/tool_parsing.py | New: robust multi-format tool call parsing + streaming buffering heuristics. |
| modules/text_generation.py | Preserves logits_processor across deep-copies; trims token ban parsing. |
| modules/shared.py | Adjusts CLI defaults (ctx-size=0, fit-target), adds selected_tools, template/tool formatting, warnings copy. |
| modules/reasoning.py | New: shared reasoning/thinking extraction (multiple formats). |
| modules/presets.py | Adds top_n_sigma default preset value ordering. |
| modules/models_settings.py | Removes RoPE metadata wiring; moves instruction template loading local; uses loaders for params. |
| modules/models.py | Normalizes ctx_size default per-loader; syncs truncation length from llama.cpp server auto ctx. |
| modules/loaders.py | Moves model element list here; adds gradio lazy imports; adds sampler list entries. |
| modules/llama_cpp_server.py | Adds n_ctx capture, logprobs/logit_bias forwarding, fit ctx defaults, improved ctx logging. |
| modules/html_generator.py | Centralizes reasoning extraction; adds tool-call accordion rendering; table wrapper styling hook. |
| modules/extensions.py | Lazily imports gradio in UI functions. |
| modules/exllamav3_hf.py | Adjusts chunked forward labels path to reuse cache. |
| modules/exllamav3.py | Adds logit_bias filter, logprobs capture, better iterate-loop error handling, vision-before-main load. |
| modules/chat.py | Major: tool execution loop for UI, incognito chat, tool_sequence metadata, safer history saving, template helpers. |
| js/main.js | Updates highlight theme sync, scroll logic, tools refresh link injection, incognito hover menu, ResizeObserver. |
| js/global_scope_js.js | Debounces morphdom updates via rAF; preserves open/scroll state; auto-scroll and padding updates. |
| extensions/openai/utils.py | Removes duplicated tool parsing helpers (now in modules/tool_parsing.py). |
| extensions/openai/typing.py | Adds stream_options, tool_choice, logprobs/top_logprobs, max_completion_tokens mapping. |
| extensions/openai/script.py | Adds OpenAIError handler, streaming n restriction, [DONE] emission, listen_host handling, /v1 URLs. |
| extensions/openai/models.py | Limits model list output to currently loaded model; uses loaders.list_model_elements for allowed args. |
| extensions/openai/completions.py | Adds reasoning_content streaming, tool parsing integration, usage-in-stream support, logprobs formatting. |
| css/main.css | Layout/typography tweaks, tools list styling, new chat hover menu, table wrapper styles. |
| css/html_instruct_style.css | Adjusts max width. |
| css/chat_style-wpp.css | Aligns widths and line-height handling with global styles. |
| css/chat_style-messenger.css | Width tweak and dark-mode header/link color overrides. |
| css/chat_style-cai-chat.css | Width tweak and line-height handling with global styles. |
| css/chat_style-cai-chat-square.css | Width tweak. |
| css/chat_style-TheEncrypted777.css | Width tweak and line-height handling with global styles. |
| css/chat_style-Dark.css | Width tweak and relies more on global spacing rules. |
| cmd_windows.bat | Forces UTF-8 mode via PYTHONUTF8=1. |
| README.md | Updates feature list/docs (tool-calling/API), installation notes, and flag list excerpts. |
| .github/workflows/build-portable-release-rocm.yml | Renames ROCm portable artifacts to “rocm7.2”. |
Comment on lines 70 to 77:

```diff
 def __call__(self, input_ids: torch.LongTensor, logits: torch.FloatTensor) -> torch.FloatTensor:
     if self.logprobs is not None:  # 0-5
         log_e_probabilities = F.log_softmax(logits, dim=1)
-        top_values, top_indices = torch.topk(log_e_probabilities, k=self.logprobs + 1)
+        top_values, top_indices = torch.topk(log_e_probabilities, k=self.logprobs)
         top_tokens = [get_reply_from_output_ids([tok]) for tok in top_indices[0]]
         top_probs = [float(x) for x in top_values[0]]
         self.token_alternatives = dict(zip(top_tokens, top_probs))
         self.token_alternatives_history.append(self.token_alternatives)
```
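The comment above concerns how many alternatives are kept after a log-softmax followed by top-k. To illustrate those semantics independently of torch, here is a dependency-free sketch (the function name and list-based shapes are illustrative, not from the PR):

```python
import math

def top_logprobs(logits, k):
    """Log-softmax over one row of logits, then keep the k largest entries,
    mirroring the topk-over-log_softmax pattern in the diff (pure-Python sketch)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    logps = [x - log_z for x in logits]
    ranked = sorted(range(len(logits)), key=lambda i: logps[i], reverse=True)
    return {i: logps[i] for i in ranked[:k]}
```

With k equal to the user's requested logprobs count, exactly that many alternatives come back, which is why dropping the `+ 1` matters for OpenAI-style responses.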
Comment on lines 15 to 23:

```python
def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = _eval(node.left)
        right = _eval(node.right)
        if isinstance(node.op, ast.Pow) and isinstance(right, (int, float)) and abs(right) > 10000:
            raise ValueError("Exponent too large (max 10000)")
        return OPERATORS[type(node.op)](left, right)
```
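The snippet references an OPERATORS table defined outside the visible range. Assuming it maps AST operator types to the corresponding functions from Python's operator module, the whole pattern can be sketched end to end as follows (safe_eval and the exact operator set are illustrative, not the PR's actual code):

```python
import ast
import operator

# Assumed operator table; the actual calculate.py's OPERATORS may differ.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def safe_eval(expression):
    """Evaluate a basic arithmetic expression without calling eval().

    Anything other than numeric constants and whitelisted binary
    operators (function calls, names, attribute access) raises ValueError.
    """
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            left = _eval(node.left)
            right = _eval(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 10000:
                raise ValueError("Exponent too large (max 10000)")
            return OPERATORS[type(node.op)](left, right)
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

Because the walker only accepts Constant and BinOp nodes, an input like `__import__('os')` parses to a Call node and is rejected before anything executes, which is the point of the "safe" evaluator.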
Comment on lines 20 to 23:

```python
def execute(arguments):
    url = arguments.get("url", "")
    max_tokens = arguments.get("max_tokens", 2048)
    if not url:
```
Comment on lines 17 to 35:

```python
def _validate_url(url):
    """Validate that a URL is safe to fetch (not targeting private/internal networks)."""
    parsed = urlparse(url)
    if parsed.scheme not in ('http', 'https'):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme}")

    hostname = parsed.hostname
    if not hostname:
        raise ValueError("No hostname in URL")

    # Resolve hostname and check all returned addresses
    try:
        for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
            ip = ipaddress.ip_address(sockaddr[0])
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                raise ValueError(f"Access to private/internal address {ip} is blocked")
    except socket.gaierror:
        raise ValueError(f"Could not resolve hostname: {hostname}")
```
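The check above resolves every address returned for the hostname and rejects private, loopback, link-local, and reserved ranges. A self-contained boolean variant of the same idea (is_url_safe is a hypothetical name; the PR's _validate_url raises instead of returning False):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_safe(url):
    """Return True only when the scheme is http(s) and every resolved
    address is public; minimal sketch of the SSRF guard reviewed above."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

A check of this kind still leaves a window between resolution and the actual connection (DNS rebinding), so also validating redirect targets, as the PR summary says modules/web_search.py does, is a sensible complement.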
```diff
 shared.gradio['do_sample'] = gr.Checkbox(value=shared.settings['do_sample'], label='do_sample')
 shared.gradio['temperature_last'] = gr.Checkbox(value=shared.settings['temperature_last'], label='temperature_last', info='Moves temperature/dynamic temperature/quadratic sampling to the end of the sampler stack, ignoring their positions in "Sampler priority".')
-shared.gradio['sampler_priority'] = gr.Textbox(value=shared.settings['sampler_priority'], lines=10, label='Sampler priority', info='Parameter names separated by new lines or commas.', elem_classes=['add_scrollbar'])
+shared.gradio['sampler_priority'] = gr.DragDrop(value=shared.settings['sampler_priority'], label='Sampler priority', info='Parameter names separated by new lines or commas.', elem_classes=['add_scrollbar'])
```