
Merge dev branch#7425

Merged
oobabooga merged 117 commits into main from dev
Mar 16, 2026

Conversation

@oobabooga
Owner

No description provided.

oobabooga added 30 commits March 7, 2026 22:05
PyTorch 2.9+ includes native XPU support, making
intel-extension-for-pytorch and the separate oneAPI conda
install unnecessary.

Closes #7308
- Use config.eos_token_id_list for all EOS tokens as stop conditions
  (fixes models like Llama-3 that define multiple EOS token IDs)
- Load vision/draft models before main model so autosplit accounts
  for their VRAM usage
- Fix loss computation in ExLlamav3_HF: use cache across chunks so
  sequences longer than 2048 tokens get correct perplexity values
- Forward logit_bias and logprobs natively to llama.cpp
- Support n>1 completions with seed increment for diversity
- Fix logprobs returning empty dict when not requested
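The multi-EOS fix in the first bullet amounts to treating every id in `config.eos_token_id_list` as a stop condition rather than comparing against a single id. A minimal stdlib sketch of that check (function name and token ids are illustrative, not the project's actual API):

```python
def should_stop(token_id, eos_token_id_list):
    """Stop generation when the sampled token is ANY of the model's EOS ids.

    Llama-3-style configs define more than one EOS id, so comparing
    against a single eos_token_id misses some legitimate stops.
    """
    return token_id in set(eos_token_id_list)


# Illustrative Llama-3-style ids:
llama3_eos = [128001, 128009]
assert should_stop(128009, llama3_eos)   # second EOS id stops generation
assert not should_stop(42, llama3_eos)   # ordinary token keeps going
```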
Contributor

Copilot AI left a comment


Pull request overview

This PR merges the dev branch and introduces first-class tool-calling support across the UI and OpenAI-compatible API, alongside updates to web fetching/search, reasoning/thinking extraction, and multiple installer/dependency adjustments.

Changes:

  • Add selectable tool scripts (user_data/tools/*.py) with prompt parsing/execution loops in chat and OpenAI API.
  • Replace/extend web search & page fetching (trafilatura extraction + SSRF-style URL validation/redirect handling).
  • Update multiple runtime/installer defaults (ctx-size auto behavior, ROCm versions, Gradio wheel, UI/JS/CSS refinements, logprobs/logit_bias support).
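For context, a tool script is just a Python file dropped into `user_data/tools/`. Judging from the bundled tools in this PR (e.g. `fetch_webpage.py`, which defines `execute(arguments)`), the minimal shape is a module-level `execute` function; the example below is hypothetical and assumes that interface:

```python
# user_data/tools/word_count.py -- hypothetical example tool.
# Assumption: the tool loader calls a module-level execute(arguments)
# with a dict of parsed tool-call arguments and expects a string back,
# as the bundled fetch_webpage.py tool appears to do.

def execute(arguments):
    """Count the words in the supplied text."""
    text = arguments.get("text", "")
    if not text:
        return "Error: no text provided"
    return f"{len(text.split())} words"
```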

Reviewed changes

Copilot reviewed 71 out of 71 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
user_data/tools/web_search.py Adds a tool wrapper for DuckDuckGo search results (title/url).
user_data/tools/roll_dice.py Adds a dice-rolling tool.
user_data/tools/get_datetime.py Adds a date/time tool.
user_data/tools/fetch_webpage.py Adds a webpage fetch/read tool calling the new extractor.
user_data/tools/calculate.py Adds a “safe” arithmetic evaluator tool.
user_data/presets/min_p.yaml Removes preset file.
user_data/presets/Top-P.yaml Adds a Top-P preset file.
user_data/presets/Instruct.yaml Removes preset file.
start_windows.bat Forces UTF-8 mode via PYTHONUTF8=1.
server.py Refactors startup/env setup and theme autosave synchronization.
requirements/portable/requirements_vulkan.txt Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura.
requirements/portable/requirements_nowheels.txt Updates Gradio wheel refs, replaces html2text with trafilatura.
requirements/portable/requirements_cuda131.txt Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura.
requirements/portable/requirements_cpu_only.txt Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura.
requirements/portable/requirements_apple_silicon.txt Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura.
requirements/portable/requirements_apple_intel.txt Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura.
requirements/portable/requirements_amd.txt Updates Gradio wheel refs, llama.cpp binaries/ROCm, replaces html2text with trafilatura.
requirements/portable/requirements.txt Updates Gradio wheel refs, llama.cpp binaries, replaces html2text with trafilatura.
requirements/full/requirements_nowheels.txt Bumps diffusers, updates Gradio wheel refs, replaces html2text with trafilatura.
requirements/full/requirements_cpu_only.txt Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura.
requirements/full/requirements_apple_silicon.txt Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura.
requirements/full/requirements_apple_intel.txt Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura.
requirements/full/requirements_amd.txt Bumps diffusers, updates Gradio wheel refs/binaries/ROCm, replaces html2text with trafilatura.
requirements/full/requirements.txt Bumps diffusers, updates Gradio wheel refs/binaries, replaces html2text with trafilatura.
one_click.py Updates ROCm install path, Intel install flow, wheel state tracking, Windows python path handling.
modules/web_search.py Switches to requests + trafilatura, adds URL safety checks + redirect validation, optional content fetch.
modules/ui_parameters.py Changes sampler priority UI to DragDrop and reorders cutoff controls.
modules/ui_model_menu.py Updates ctx-size/fit-target UI text and ctx-size behavior for truncation.
modules/ui_image_generation.py Adds extra token marker stripping for prompt variation.
modules/ui_chat.py Adds incognito chat option + tools selection UI and refresh/sync behavior.
modules/ui.py Delegates model element listing to loaders; persists selected_tools; tweaks autosave keys.
modules/transformers_loader.py Adjusts logprob collection and removes RoPE scaling args wiring.
modules/training.py Blocks LoRA training for llama.cpp loader with user-facing error.
modules/tool_use.py New: loads tool scripts from user_data/tools and executes them.
modules/tool_parsing.py New: robust multi-format tool call parsing + streaming buffering heuristics.
modules/text_generation.py Preserves logits_processor across deep-copies; trims token ban parsing.
modules/shared.py Adjusts CLI defaults (ctx-size=0, fit-target), adds selected_tools, template/tool formatting, warnings copy.
modules/reasoning.py New: shared reasoning/thinking extraction (multiple formats).
modules/presets.py Adds top_n_sigma default preset value ordering.
modules/models_settings.py Removes RoPE metadata wiring; moves instruction template loading local; uses loaders for params.
modules/models.py Normalizes ctx_size default per-loader; syncs truncation length from llama.cpp server auto ctx.
modules/loaders.py Moves model element list here; adds gradio lazy imports; adds sampler list entries.
modules/llama_cpp_server.py Adds n_ctx capture, logprobs/logit_bias forwarding, fit ctx defaults, improved ctx logging.
modules/html_generator.py Centralizes reasoning extraction; adds tool-call accordion rendering; table wrapper styling hook.
modules/extensions.py Lazily imports gradio in UI functions.
modules/exllamav3_hf.py Adjusts chunked forward labels path to reuse cache.
modules/exllamav3.py Adds logit_bias filter, logprobs capture, better iterate-loop error handling, vision-before-main load.
modules/chat.py Major: tool execution loop for UI, incognito chat, tool_sequence metadata, safer history saving, template helpers.
js/main.js Updates highlight theme sync, scroll logic, tools refresh link injection, incognito hover menu, ResizeObserver.
js/global_scope_js.js Debounces morphdom updates via rAF; preserves open/scroll state; auto-scroll and padding updates.
extensions/openai/utils.py Removes duplicated tool parsing helpers (now in modules/tool_parsing.py).
extensions/openai/typing.py Adds stream_options, tool_choice, logprobs/top_logprobs, max_completion_tokens mapping.
extensions/openai/script.py Adds OpenAIError handler, streaming n restriction, [DONE] emission, listen_host handling, /v1 URLs.
extensions/openai/models.py Limits model list output to currently loaded model; uses loaders.list_model_elements for allowed args.
extensions/openai/completions.py Adds reasoning_content streaming, tool parsing integration, usage-in-stream support, logprobs formatting.
css/main.css Layout/typography tweaks, tools list styling, new chat hover menu, table wrapper styles.
css/html_instruct_style.css Adjusts max width.
css/chat_style-wpp.css Aligns widths and line-height handling with global styles.
css/chat_style-messenger.css Width tweak and dark-mode header/link color overrides.
css/chat_style-cai-chat.css Width tweak and line-height handling with global styles.
css/chat_style-cai-chat-square.css Width tweak.
css/chat_style-TheEncrypted777.css Width tweak and line-height handling with global styles.
css/chat_style-Dark.css Width tweak and relies more on global spacing rules.
cmd_windows.bat Forces UTF-8 mode via PYTHONUTF8=1.
README.md Updates feature list/docs (tool-calling/API), installation notes, and flag list excerpts.
.github/workflows/build-portable-release-rocm.yml Renames ROCm portable artifacts to “rocm7.2”.
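The heart of `modules/tool_parsing.py` is extracting a structured call from free-form model output. A much-simplified sketch of that idea follows; this is not the module's actual code (which, per the table above, also buffers partial output during streaming and handles more formats):

```python
import json
import re


def parse_tool_call(text):
    """Best-effort extraction of a {"name": ..., "arguments": {...}} tool
    call from model output. Handles bare JSON and ```json fenced blocks.
    Returns (name, arguments) or None if no call is found."""
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text.strip()
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and "name" in obj:
        return obj["name"], obj.get("arguments", {})
    return None
```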


Comment on lines 70 to +77
```diff
     def __call__(self, input_ids: torch.LongTensor, logits: torch.FloatTensor) -> torch.FloatTensor:
         if self.logprobs is not None:  # 0-5
             log_e_probabilities = F.log_softmax(logits, dim=1)
-            top_values, top_indices = torch.topk(log_e_probabilities, k=self.logprobs + 1)
+            top_values, top_indices = torch.topk(log_e_probabilities, k=self.logprobs)
             top_tokens = [get_reply_from_output_ids([tok]) for tok in top_indices[0]]
             top_probs = [float(x) for x in top_values[0]]
             self.token_alternatives = dict(zip(top_tokens, top_probs))
             self.token_alternatives_history.append(self.token_alternatives)
```
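The change drops the off-by-one `k=self.logprobs + 1`, so exactly `logprobs` alternatives come back. The log-softmax-then-top-k step itself can be sketched without torch (pure stdlib, one row of logits; illustrative only, not the project's code):

```python
import math


def top_logprobs(logits, k):
    """Log-softmax over a 1-D list of logits, then the k most likely
    (index, logprob) pairs -- the shape an OpenAI-style logprobs field wants."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    logps = [x - log_z for x in logits]
    ranked = sorted(enumerate(logps), key=lambda p: p[1], reverse=True)
    return ranked[:k]


pairs = top_logprobs([2.0, 1.0, 0.1], k=2)
# the highest-logit index ranks first, and every logprob is negative
```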
Comment on lines +15 to +23
```python
def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = _eval(node.left)
        right = _eval(node.right)
        if isinstance(node.op, ast.Pow) and isinstance(right, (int, float)) and abs(right) > 10000:
            raise ValueError("Exponent too large (max 10000)")
        return OPERATORS[type(node.op)](left, right)
```
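Completed into a runnable form, the AST-whitelist approach looks like this. A sketch only, not the tool's exact code: the `OPERATORS` table contents and the unary-minus branch are assumptions.

```python
import ast
import operator

# Whitelisted binary operators -- anything else is rejected.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def safe_eval(expression):
    """Evaluate arithmetic via the AST, never eval(); names and calls raise."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            left = _eval(node.left)
            right = _eval(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 10000:
                raise ValueError("Exponent too large (max 10000)")
            return OPERATORS[type(node.op)](left, right)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

Because only `Constant`, whitelisted `BinOp`, and unary minus are accepted, expressions like `__import__('os')` fall through to the `ValueError` instead of executing.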
Comment on lines +20 to +23
```python
def execute(arguments):
    url = arguments.get("url", "")
    max_tokens = arguments.get("max_tokens", 2048)
    if not url:
```
Comment on lines +17 to +35
```python
def _validate_url(url):
    """Validate that a URL is safe to fetch (not targeting private/internal networks)."""
    parsed = urlparse(url)
    if parsed.scheme not in ('http', 'https'):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme}")

    hostname = parsed.hostname
    if not hostname:
        raise ValueError("No hostname in URL")

    # Resolve hostname and check all returned addresses
    try:
        for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
            ip = ipaddress.ip_address(sockaddr[0])
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                raise ValueError(f"Access to private/internal address {ip} is blocked")
    except socket.gaierror:
        raise ValueError(f"Could not resolve hostname: {hostname}")
```
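One general caveat on the resolve-then-check pattern: the addresses are validated at `getaddrinfo` time, but if the later HTTP request resolves the hostname again, a DNS-rebinding attacker can pass the check and then serve a private address; pinning the validated IP for the actual connection closes that gap (whether this PR does so is not shown here). The address classification itself is plain stdlib `ipaddress`:

```python
import ipaddress


def is_blocked(ip_string):
    """Same private/loopback/link-local/reserved test as _validate_url above."""
    ip = ipaddress.ip_address(ip_string)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved
```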

```diff
 shared.gradio['do_sample'] = gr.Checkbox(value=shared.settings['do_sample'], label='do_sample')
 shared.gradio['temperature_last'] = gr.Checkbox(value=shared.settings['temperature_last'], label='temperature_last', info='Moves temperature/dynamic temperature/quadratic sampling to the end of the sampler stack, ignoring their positions in "Sampler priority".')
-shared.gradio['sampler_priority'] = gr.Textbox(value=shared.settings['sampler_priority'], lines=10, label='Sampler priority', info='Parameter names separated by new lines or commas.', elem_classes=['add_scrollbar'])
+shared.gradio['sampler_priority'] = gr.DragDrop(value=shared.settings['sampler_priority'], label='Sampler priority', info='Parameter names separated by new lines or commas.', elem_classes=['add_scrollbar'])
```
@oobabooga oobabooga merged commit 88a3188 into main Mar 16, 2026