
[update] support for qwen3-4b#86

Open
astridesa wants to merge 7 commits into main from 85

Conversation


@astridesa astridesa commented Oct 13, 2025

Summary by CodeRabbit

  • New Features

    • Added support for the Qwen 3 template and the latest Qwen 4B Instruct model.
    • Expanded template mappings for broader model compatibility.
  • Bug Fixes

    • Improved robustness when processing tools, gracefully handling empty or absent tool data.
    • Standardized function-call content handling to prevent formatting issues.
  • Refactor

    • Streamlined tool formatting and prompt construction for consistency.
  • Chores

    • Updated dependency version ranges for core ML libraries to newer compatible releases.


coderabbitai bot commented Oct 13, 2025

Walkthrough

Updates dependency pins, expands supported base models, registers a new “qwen3” template, aliases multiple model names to templates, and adjusts dataset tool/function-call handling to use JSON-based formatting and conditional processing.

Changes

Cohort / File(s) Summary
Dependencies
requirements.txt
Relaxed/raised versions: huggingface-hub → >=0.34.0,<1.0; transformers → >=4.51.0. Others unchanged.
Model constants & template mappings
src/core/constant.py
Added "Qwen/Qwen3-4B-Instruct-2507" to SUPPORTED_BASE_MODELS. Introduced MODEL_TEMPLATE_MAP mapping multiple model IDs to template names, including mapping the new Qwen model to "qwen3".
Template registration & aliasing
src/core/template.py
Registered new qwen3 template with system/user/assistant/tool/function/observation formats and stop words. Imported MODEL_TEMPLATE_MAP. Added loops to alias template_dict[model_name] = template_dict[template_name] for all mappings (appears twice in file).
Dataset tool/function handling
src/core/dataset.py
Tool guard now checks only key presence; inner handling runs if parsed tools are truthy. Replaced tool_formater with json.dumps(tools). For function_call, removed function_formatter(json.loads(content)); uses raw content for tool_calls. Retains encoding/token steps.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant DS as Dataset Builder
  participant EX as Example (JSON)
  participant TP as Template
  participant TK as Tokenizer

  DS->>EX: Read conversation + optional "tools"
  alt "tools" key present
    DS->>DS: Parse tools (JSON)
    opt tools is truthy
      DS->>DS: json.dumps(tools)
      DS->>TP: Inject tool schema text
    end
  end
  loop Messages
    alt role=function_call
      DS->>DS: Use raw content as tool_calls (no function_formatter)
    else other roles
      DS->>TP: Format message segments
    end
  end
  DS->>TK: Encode composed prompt(s)
  TK-->>DS: Token IDs
sequenceDiagram
  autonumber
  participant CT as Constants
  participant TM as Template Registry
  participant CL as Client Code

  CT-->>TM: MODEL_TEMPLATE_MAP {model_name -> template_name}
  TM->>TM: register_template("qwen3", ...)
  loop For each mapping
    TM->>TM: template_dict[model_name] = template_dict[template_name]
  end
  CL->>TM: Resolve template for model_name
  TM-->>CL: Effective template instance

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Update constant.py #84 — Also edits SUPPORTED_BASE_MODELS in src/core/constant.py, indicating adjacent work on model support lists.

Poem

I twitch my whiskers, patch by patch,
New Qwen hops in, a shiny match.
Tools now speak in JSON streams,
Templates map like burrowed schemes.
Dependencies grow, carrots align—
Ship the prompt, the outputs shine. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title "[update] support for qwen3-4b" clearly and concisely captures the main change, adding support for the Qwen3-4B model.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b6f484 and 42da60c.

📒 Files selected for processing (4)
  • requirements.txt (1 hunks)
  • src/core/constant.py (2 hunks)
  • src/core/dataset.py (2 hunks)
  • src/core/template.py (3 hunks)
🔇 Additional comments (6)
src/core/constant.py (2)

17-17: LGTM! New model added correctly.

The Qwen3-4B model is properly added to the supported models list.


55-109: LGTM! Comprehensive template mapping introduced.

The new MODEL_TEMPLATE_MAP provides a clear and maintainable mapping from model identifiers to their respective template names. This approach:

  • Centralizes model-to-template associations
  • Makes it easy to add new models in the future
  • Enables using model names directly as template identifiers (via the aliasing in template.py)

The mapping is well-organized by model family and includes all models from SUPPORTED_BASE_MODELS.
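As a rough sketch of the pattern (the Qwen entry below is the one named in this PR; the other mappings live in src/core/constant.py and are not reproduced here):

```python
# Illustrative shape of MODEL_TEMPLATE_MAP; only the entry added by this
# PR is shown, the rest are hypothetical placeholders.
MODEL_TEMPLATE_MAP = {
    "Qwen/Qwen3-4B-Instruct-2507": "qwen3",
    # ... other model-ID -> template-name pairs, grouped by model family
}

def template_name_for(model_id: str) -> str:
    # Fall back to the model ID itself when no mapping exists; this mirrors
    # the effect of the aliasing in template.py, where unmapped names are
    # looked up directly in the template registry.
    return MODEL_TEMPLATE_MAP.get(model_id, model_id)

print(template_name_for("Qwen/Qwen3-4B-Instruct-2507"))  # qwen3
```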

src/core/template.py (3)

3-3: LGTM! Import enables model-to-template aliasing.

The import of MODEL_TEMPLATE_MAP enables the aliasing logic at the end of the file, allowing model names to be used directly as template identifiers.


71-88: LGTM! Qwen3 template registered with enhanced tool support.

The new "qwen3" template is well-defined with:

  • Proper system, user, and assistant format matching the qwen1.5 structure
  • Enhanced tool_format with XML tags (<tools>, <tool_call>) and clear instructions for function calling
  • Appropriate observation_format using <tool_response> tags
  • Correct stop word and system message

This template aligns with Qwen3's expected formatting for tool/function calling scenarios.
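For illustration only (not copied from the template source, and the exact delimiters may differ), a Qwen3-style tool-call exchange using the tags named above typically looks like:

```text
<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temp_c": 18}
</tool_response><|im_end|>
```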


206-207: LGTM! Elegant aliasing enables using model names as template identifiers.

The loop creates aliases in template_dict so that model names (e.g., "Qwen/Qwen3-4B-Instruct-2507") can be used directly as template identifiers, which will resolve to their corresponding template (e.g., "qwen3"). This improves usability by allowing users to specify model names instead of template names.

Note: The AI summary mentions "two for-loops" appearing in two locations, but only one loop is visible in the provided code. This may be a summary inconsistency or the second loop may exist outside the shown hunks.
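The aliasing pattern can be sketched as follows (stand-in objects here; the real code aliases the Template instances registered earlier in the file):

```python
# Stand-ins for the registered template and the mapping from constant.py.
template_dict = {"qwen3": object()}
MODEL_TEMPLATE_MAP = {"Qwen/Qwen3-4B-Instruct-2507": "qwen3"}

# The aliasing loop: every model name becomes a key pointing at the same
# template object as its template name, so lookups by either succeed.
for model_name, template_name in MODEL_TEMPLATE_MAP.items():
    template_dict[model_name] = template_dict[template_name]

assert template_dict["Qwen/Qwen3-4B-Instruct-2507"] is template_dict["qwen3"]
```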

src/core/dataset.py (1)

70-71: Validate raw function-call content against function_format.
By replacing function_formatter(json.loads(content)) with raw content, we now feed the template unformatted JSON. Confirm that content matches what template.function_format expects (e.g. same slots/quoting) and doesn’t break existing prompts or datasets—consider adding a validation step or unit test to catch mismatches early.
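One possible shape for such a validation step (a sketch only; the field names checked here are a hypothetical schema, not taken from the repository):

```python
import json

def validate_function_call(content: str) -> dict:
    """Best-effort check that raw function-call content is JSON carrying
    the fields a tool-call template typically expects (hypothetical
    schema: a dict with at least a 'name' key)."""
    try:
        call = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"function_call content is not valid JSON: {exc}")
    if not isinstance(call, dict) or "name" not in call:
        raise ValueError("function_call content missing 'name' field")
    return call

call = validate_function_call('{"name": "search", "arguments": {"q": "qwen3"}}')
print(call["name"])  # search
```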

Comment on lines +2 to +3
huggingface-hub>=0.34.0,<1.0
transformers>=4.51.0
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify dependency compatibility and check for breaking changes.

The dependency versions have been updated with significant changes:

  • huggingface-hub: from exact pin 0.29.1 to the range >=0.34.0,<1.0 (now accepts any pre-1.0 release at or above 0.34.0)
  • transformers: from exact pin 4.49.0 to the open-ended minimum >=4.51.0 (no upper bound)

These changes relax version constraints, which improves flexibility but may introduce compatibility issues. Verify that:

  • The new versions are compatible with the codebase and don't introduce breaking changes.
  • The version ranges are tested and known to work.
  • Other dependencies (torch, peft, etc.) remain compatible with the new versions.

Run the following script to check for known security vulnerabilities and verify the latest stable versions:


🏁 Script executed:

#!/bin/bash
# Description: Check for security advisories and latest versions

# Check for security advisories in huggingface-hub
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "huggingface-hub") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

# Check for security advisories in transformers
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "transformers") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

# Verify latest versions
curl -s https://pypi.org/pypi/huggingface-hub/json | jq -r '.info.version'
curl -s https://pypi.org/pypi/transformers/json | jq -r '.info.version'

Length of output: 1382


Restrict transformers version to avoid ReDoS and breaking changes
Transformers <4.53.0 are vulnerable to multiple ReDoS issues; update requirement to transformers>=4.53.0,<5.0.0.
huggingface-hub >=0.34.0,<1.0 has no known advisories.

🤖 Prompt for AI Agents
In requirements.txt around lines 2 to 3, the transformers pin is too permissive
and includes vulnerable versions; update the transformers requirement to
"transformers>=4.53.0,<5.0.0" (leave huggingface-hub as
"huggingface-hub>=0.34.0,<1.0") so the package range excludes known
ReDoS-vulnerable releases while preventing major breaking upgrades.
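To sanity-check candidate versions against the suggested range, a minimal stdlib-only sketch (simplified: it assumes plain X.Y.Z versions and ignores pre-releases; a real resolver would use the packaging library):

```python
# Compare dotted versions as integer tuples against ">=4.53.0,<5.0.0".
def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

def in_range(version: str, lo: str = "4.53.0", hi: str = "5.0.0") -> bool:
    # Half-open range: lo <= version < hi.
    return parse(lo) <= parse(version) < parse(hi)

for candidate in ["4.51.0", "4.53.0", "4.56.2", "5.0.0"]:
    print(candidate, in_range(candidate))
# 4.51.0 False / 4.53.0 True / 4.56.2 True / 5.0.0 False
```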

Comment on lines +46 to +54
    if "tools" in data.keys():
        tools = json.loads(data["tools"])
        if tools:
            # tool_prompt = tool_formater(tools)
            tool_prompt = json.dumps(tools)
            tool_text = self.tool_format.format(content=tool_prompt)
            tool_tokens = self.tokenizer.encode(tool_text, add_special_tokens=False)
            input_ids = input_ids + tool_tokens
            target_mask = target_mask + [0] * len(tool_tokens)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the behavioral changes in tool formatting.

The tool processing logic has been modified:

  1. Added a nested if tools: check (line 48) - this is good defensive coding to skip processing when tools is null/empty.
  2. Replaced tool_formater(tools) with json.dumps(tools) (line 50) - this changes how tools are serialized.

The second change alters the output format. Ensure that:

  • The new JSON format is compatible with the qwen3 template expectations.
  • Existing datasets or models relying on tool_formater output are not broken.
  • Unit tests cover this behavior change.

Run the following script to check if tool_formater is still used elsewhere:


🏁 Script executed:

#!/bin/bash
# Description: Verify if tool_formater is used in other parts of the codebase

# Search for tool_formater usage
rg -n "tool_formater" --type=py

Length of output: 236


🏁 Script executed:

#!/bin/bash
echo "Occurrences of function_formatter:"
rg -n "function_formatter" --type=py

echo
echo "Listing test files:"
find . -type f -name "*test*.py"

Length of output: 399


Add tests and validate JSON tool serialization

  • Confirm json.dumps(tools) output aligns with qwen3 template requirements and existing pipelines
  • Add unit tests for the tools branch in src/core/dataset.py (cover both JSON serialization and the commented‐out function_formatter path)
  • Remove unused tool_formater/function_formatter imports if no longer needed
🤖 Prompt for AI Agents
In src/core/dataset.py around lines 46 to 54, the current handling of the
"tools" branch simply json.dumps the tools and uses that text which may not
match the qwen3 template and the commented function_formatter path; update the
code and tests as follows: 1) ensure the serialized tool output matches the
qwen3 template (either adapt json.dumps with the required keys/format or
re-enable/use the function_formatter/tool_formater path to produce the exact
template output), 2) add unit tests that cover the "tools" branch for both
JSON-serialized output and the function_formatter path (include validation that
the resulting string tokens and masks match expected shapes and contents), and
3) remove any unused imports (tool_formater/function_formatter) if you choose to
keep the json.dumps approach, or restore and use the formatter and its tests if
you opt for that path.
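A sketch of the kind of unit test requested above, using a stand-in tokenizer and a helper that mirrors (but is not) the actual dataset code; the helper name, the FakeTokenizer, and the tool_format string are all hypothetical:

```python
import json

class FakeTokenizer:
    def encode(self, text, add_special_tokens=False):
        # One token per whitespace-separated piece; enough for shape checks.
        return list(range(len(text.split())))

def append_tools(data, tool_format, tokenizer, input_ids, target_mask):
    # Mirrors the "tools" branch from src/core/dataset.py (simplified):
    # guard on key presence, skip falsy tools, serialize with json.dumps.
    if "tools" in data:
        tools = json.loads(data["tools"])
        if tools:
            tool_text = tool_format.format(content=json.dumps(tools))
            toks = tokenizer.encode(tool_text, add_special_tokens=False)
            input_ids = input_ids + toks
            target_mask = target_mask + [0] * len(toks)
    return input_ids, target_mask

# Non-empty tools: tokens appended, all masked out of the target.
ids, mask = append_tools(
    {"tools": '[{"name": "search"}]'},
    "<tools>\n{content}\n</tools>",
    FakeTokenizer(), [], [],
)
assert len(ids) == len(mask) and all(m == 0 for m in mask)

# Empty tools list: nothing appended.
ids2, mask2 = append_tools({"tools": "[]"}, "{content}", FakeTokenizer(), [], [])
assert ids2 == [] and mask2 == []
```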
