
qwen docs + new config #3499

Open

ved1beta wants to merge 9 commits into axolotl-ai-cloud:main from ved1beta:qwen/docss

Conversation

Contributor

ved1beta commented Mar 16, 2026

examples + docs

Summary by CodeRabbit

Release Notes

  • New Features

    • Added new fine-tuning configuration examples for Qwen3.5 models (27B text-only and 9B multimodal)
  • Documentation

    • Updated README with fine-tuning configurations, peak VRAM requirements, and new training commands
    • Expanded getting started section with full fine-tuning guidance and memory considerations

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a564454d-9d27-4d8d-be37-fe3f3ea1b348

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This pull request adds two new Qwen3.5 fine-tuning configuration files — a text-only FFT setup for the 27B model (vision encoder frozen) and a multimodal FFT setup for the 9B model — and updates the 9B LoRA configuration to reflect model version changes. The README is expanded with documentation on these new configurations, training commands, and FFT-specific guidance.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **New Qwen3.5 FFT configurations**<br>`examples/qwen3.5/27b-fft.yaml`, `examples/qwen3.5/9b-fft-vision.yaml` | Introduces two new YAML configuration files for full fine-tuning: `27b-fft.yaml` for text-only FFT with the vision encoder frozen using gradient hooks, and `9b-fft-vision.yaml` for multimodal FFT on image+text data. Both define model, dataset, training parameters, and optimization settings. |
| **LoRA configuration update**<br>`examples/qwen3.5/9b-lora-vision.yaml` | Updates `base_model` from `Qwen/Qwen3.5-7B` to `Qwen/Qwen3.5-9B`, along with the corresponding comments and notes, to reflect the new model version as the threshold for early-fusion VLMs. |
| **Documentation expansion**<br>`examples/qwen3.5/README.md` | Expands the configuration table with a Peak VRAM column, adds entries for `27b-fft.yaml` and `7b-fft-vision.yaml`, includes new training commands, and extends the Getting Started and Tips sections with FFT-specific guidance and memory considerations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

ready to merge

Suggested reviewers

  • winglian
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title "qwen docs + new config" is vague and generic; it doesn't convey meaningful information about the specific changes (new YAML configs for Qwen3.5 models and documentation updates). | Consider a more specific title like "Add Qwen3.5 FFT configs and update documentation" to clearly describe the main changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/qwen3.5/27b-fft.yaml`:
- Around line 36-38: The YAML patterns in unfrozen_parameters are over-escaped
and thus don't match when parsed by LayerNamePattern in
src/axolotl/utils/freeze.py; update the entries so they use unescaped dots
(e.g., change "model\.language_model\..*" to "model.language_model.*" and
"lm_head\..*" to "lm_head.*") so LayerNamePattern correctly matches and leaves
the intended parameters unfrozen.

In `@examples/qwen3.5/9b-fft-vision.yaml`:
- Line 29: Update the wandb_project entry to reflect the correct project for
this 9B config: replace the stale/typoed value "7b-visionn" in the
examples/qwen3.5/9b-fft-vision.yaml file (the wandb_project field) with the
intended project name (e.g., "9b-vision" or your team's canonical 9B project
name) so runs are correctly grouped under the 9B project.

In `@examples/qwen3.5/9b-lora-vision.yaml`:
- Around line 4-6: Update the model-family note so it matches other Qwen3.5
examples: change the comment that currently reads "Qwen3.5-9B and above are
early-fusion VLMs... Note: Qwen3.5-2B is a text-only model — the smallest VLM is
Qwen3.5-9B." to reflect the same smallest VLM referenced elsewhere (replace "9B"
with "7B" or otherwise match the project's canonical smallest VLM), ensuring the
lines mentioning "Qwen3.5-9B and above", "Qwen3.5-2B", and "smallest VLM is
Qwen3.5-9B" are made consistent with the other Qwen3.5 examples.

In `@examples/qwen3.5/README.md`:
- Around line 14-15: Update the README to reference the actual config filenames:
replace any occurrence of "7b-fft-vision.yaml" with "9b-fft-vision.yaml"
(including the table entry and the commands that currently use the non-existent
file), and change the incorrect "27b-fft-vision.yaml" reference to
"27b-fft.yaml" while adding a brief note that "27b-fft.yaml" is text-only (not
multimodal) so readers don’t expect a vision+FFT config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b84242b2-9c02-4d1f-8adc-96aba900b40a

📥 Commits

Reviewing files that changed from the base of the PR and between 7da5f94 and e7e14ec.

📒 Files selected for processing (4)
  • examples/qwen3.5/27b-fft.yaml
  • examples/qwen3.5/9b-fft-vision.yaml
  • examples/qwen3.5/9b-lora-vision.yaml
  • examples/qwen3.5/README.md

Comment on lines +36 to +38

```yaml
unfrozen_parameters:
  - model\.language_model\..*
  - lm_head\..*
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "Patterns configured in examples/qwen3.5/27b-fft.yaml"
rg -n 'unfrozen_parameters|model\\\.language_model\\\.\.\*|lm_head\\\.\.\*' examples/qwen3.5/27b-fft.yaml -C2

echo
echo "Escaping logic in src/axolotl/utils/freeze.py"
rg -n 're\.sub\(r"\\\.\(\?!\\\+\)", "\\\\\.", name_pattern\)' src/axolotl/utils/freeze.py -C2

echo
echo "Behavior check of current vs suggested pattern transformation"
python - <<'PY'
import re
name = "model.language_model.layers.0.self_attn.q_proj.weight"
current = r"model\.language_model\..*"
suggested = r"^model.language_model.+"

def transform(p):
    return re.sub(r"\.(?!\+)", "\\.", p)

for label, p in [("current", current), ("suggested", suggested)]:
    rx = re.compile(transform(p))
    print(f"{label}: transformed={rx.pattern!r}, matches={bool(rx.match(name))}")
PY
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 1189


Fix the unfrozen_parameters patterns to match intended parameters.

The patterns model\.language_model\..* and lm_head\..* are over-escaped and fail to match parameter names when processed by LayerNamePattern in src/axolotl/utils/freeze.py (line 187). The escaping transformation causes no parameters to match, silently freezing the entire model instead of just the vision encoder.

Proposed fix

```diff
 unfrozen_parameters:
-  - model\.language_model\..*
-  - lm_head\..*
+  - ^model.language_model.+
+  - ^lm_head.+
```
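The escaping pitfall described in this finding can be reproduced in isolation. The sketch below mirrors the `re.sub` transform quoted from `src/axolotl/utils/freeze.py` in the verification script above; the parameter name is illustrative:

```python
import re

def transform(pattern: str) -> str:
    # Escaping step applied by LayerNamePattern (per the review thread):
    # every "." not followed by "+" is replaced with a literal "\.".
    return re.sub(r"\.(?!\+)", r"\.", pattern)

name = "model.language_model.layers.0.self_attn.q_proj.weight"

# Over-escaped pattern from the config: its pre-existing backslashes survive
# the transform, so the compiled regex expects literal backslash characters
# that never appear in parameter names.
broken = re.compile(transform(r"model\.language_model\..*"))
assert broken.match(name) is None

# Suggested pattern: plain dots get escaped by the transform, while ".+"
# is left intact as a wildcard, so real parameter names match.
fixed = re.compile(transform(r"^model.language_model.+"))
assert fixed.match(name) is not None
```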

```yaml
sequence_len: 4096
pad_to_sequence_len: false

wandb_project: 7b-visionn
```

⚠️ Potential issue | 🟡 Minor

wandb_project looks stale/typoed for this 9B config.

7b-visionn is likely a carry-over and may mix runs under an unintended project name.

Suggested fix

```diff
-wandb_project: 7b-visionn
+wandb_project: 9b-vision
```

Comment on lines +4 to +6

```diff
 # Qwen3.5-9B and above are early-fusion VLMs (Qwen3_5ForConditionalGeneration).
 # Vision and text tokens are processed together by the same transformer layers.
-# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-7B.
+# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-9B.
```

⚠️ Potential issue | 🟡 Minor

Keep the model-family note consistent across Qwen3.5 examples.

This file says the smallest VLM is 9B, while other docs/examples in this PR still reference 7B vision recipes. Please align these statements so users don’t pick the wrong config path.


Comment on lines +14 to +15
| `7b-fft-vision.yaml` | Qwen3.5-7B/9B | Vision+text FFT, single GPU | ~61 GiB |


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "README references:"
rg -n '7b-fft-vision\.yaml|9b-fft-vision\.yaml|27b-fft-vision\.yaml' examples/qwen3.5/README.md -C1

echo
echo "Available vision FFT config files in examples/qwen3.5:"
fd '.*fft-vision\.yaml' examples/qwen3.5
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 1050


🏁 Script executed:

```bash
fd 'fft.*\.yaml' examples/qwen3.5 | sort
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 133


Update README file references to match actual config files.

The README references non-existent config files in three locations:

  • Line 14: 7b-fft-vision.yaml does not exist; should reference 9b-fft-vision.yaml
  • Lines 47-48: Command uses non-existent 7b-fft-vision.yaml; should use 9b-fft-vision.yaml
  • Line 56: References non-existent 27b-fft-vision.yaml (actual file is 27b-fft.yaml, which is text-only FFT, not multimodal)

The only available vision+FFT configs are 9b-fft-vision.yaml and 27b-fft.yaml (text-only). Update the documentation and copy-paste commands to reference the correct files.


Collaborator

NanoCode012 left a comment

Could you also check if there are any other places where you can reduce comments / simplify?

Comment on lines +4 to +8

```yaml
# Full fine-tune (FFT) of Qwen3.5-9B with image+text (multimodal) data.
# Designed for a single 80GB GPU (A100/H100).

# Memory estimate (bf16): ~14 GB weights + ~7 GB 8-bit Adam = ~21 GB base,
# leaving ample room for activations at sequence_len 4096.
```

Remove

Comment on lines +203 to +222
**Text-only FFT with vision encoder frozen (single 80 GB GPU)**

Use `unfrozen_parameters` to restrict gradient updates to the language model, freezing
`model.visual.*` and avoiding wasted optimizer state for parameters that receive no
gradient from text-only data.

```yaml
unfrozen_parameters:
- ^model\.language_model\..*
- ^lm_head\..*
```

Measured peak VRAM — Qwen3.5-27B, `adamw_bnb_8bit`, `sequence_len: 2048`:

| Metric | Value |
|---|---|
| Max active | 52.89 GiB |
| Device reserved | 53.31 GiB |

See [examples/qwen3.5/27b-fft.yaml](https://github.com/axolotl-ai/axolotl/blob/main/examples/qwen3.5/27b-fft.yaml).
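As a minimal sketch of how such patterns partition parameter names, the snippet below applies the two README patterns with plain `re` semantics (the parameter names are illustrative, not actual Qwen3.5 names, and the escaping transform discussed in the earlier finding is not applied here):

```python
import re

# Hypothetical parameter names for an early-fusion VLM (illustrative only).
param_names = [
    "model.visual.patch_embed.weight",
    "model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.layers.0.self_attn.q_proj.weight",
    "model.language_model.layers.0.mlp.gate_proj.weight",
    "lm_head.weight",
]

# unfrozen_parameters patterns from the README snippet above.
unfrozen = [re.compile(p) for p in (r"^model\.language_model\..*", r"^lm_head\..*")]

# Language-model and lm_head parameters stay trainable; everything else
# (here, the vision encoder) is frozen and accrues no optimizer state.
trainable = [n for n in param_names if any(rx.match(n) for rx in unfrozen)]
frozen = [n for n in param_names if n not in trainable]

assert all(n.startswith(("model.language_model.", "lm_head.")) for n in trainable)
assert all(n.startswith("model.visual.") for n in frozen)
```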

These extra details and parts below should be in the README.md and not here

Comment on lines +7 to +9

```yaml
# Qwen3.5-27B is early-fusion VLM.
# To freeze vision encoder for text-only training:
# For multimodal (image+text) fine-tuning, see 9b-lora-vision.yaml.
```

Let's not put this here? Let's just explain on readme.

Comment on lines +4 to +6

```diff
 # Qwen3.5-9B and above are early-fusion VLMs (Qwen3_5ForConditionalGeneration).
 # Vision and text tokens are processed together by the same transformer layers.
-# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-7B.
+# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-9B.
```

Simplify/Remove

```yaml
sample_packing: true

# Freeze the vision encoder; train only the language model.
# model.visual.* parameters have requires_grad set to False via gradient hooks.
```

Suggested change (delete this line):

```diff
-# model.visual.* parameters have requires_grad set to False via gradient hooks.
```

