
qwen docs + new config #3499

Open

ved1beta wants to merge 9 commits into axolotl-ai-cloud:main from ved1beta:qwen/docss

Conversation

Contributor

ved1beta commented Mar 16, 2026

examples + docs

Summary by CodeRabbit

Release Notes

  • New Features

    • Added new fine-tuning configuration examples for Qwen3.5 models (27B text-only and 9B multimodal)
  • Documentation

    • Updated README with fine-tuning configurations, peak VRAM requirements, and new training commands
    • Expanded getting started section with full fine-tuning guidance and memory considerations

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a564454d-9d27-4d8d-be37-fe3f3ea1b348

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This pull request adds two new Qwen3.5 fine-tuning configuration files — a text-only FFT setup for the 27B model (vision encoder frozen) and a multimodal FFT setup for the 9B model — and updates the 9B LoRA configuration to reflect model version changes. The README is expanded with documentation on these new configurations, training commands, and FFT-specific guidance.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **New Qwen3.5 FFT configurations**<br>`examples/qwen3.5/27b-fft.yaml`, `examples/qwen3.5/9b-fft-vision.yaml` | Introduces two new YAML configuration files for full fine-tuning: `27b-fft.yaml` for text-only FFT with the vision encoder frozen using gradient hooks, and `9b-fft-vision.yaml` for multimodal FFT on image+text data. Both define model, dataset, training parameters, and optimization settings. |
| **LoRA configuration update**<br>`examples/qwen3.5/9b-lora-vision.yaml` | Updates `base_model` from `Qwen/Qwen3.5-7B` to `Qwen/Qwen3.5-9B`, along with the corresponding comments and notes, to reflect the new model version as the threshold for early-fusion VLMs. |
| **Documentation expansion**<br>`examples/qwen3.5/README.md` | Expands the configuration table with a Peak VRAM column, adds entries for `27b-fft.yaml` and `7b-fft-vision.yaml`, includes new training commands, and extends the Getting Started and Tips sections with FFT-specific guidance and memory considerations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

ready to merge

Suggested reviewers

  • winglian
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title "qwen docs + new config" is vague and generic; it doesn't convey meaningful information about the specific changes (new YAML configs for Qwen3.5 models and documentation updates). | Consider a more specific title like "Add Qwen3.5 FFT configs and update documentation" to clearly describe the main changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/qwen3.5/27b-fft.yaml`:
- Around line 36-38: The YAML patterns in unfrozen_parameters are over-escaped
and thus don't match when parsed by LayerNamePattern in
src/axolotl/utils/freeze.py; update the entries so they use unescaped dots
(e.g., change "model\.language_model\..*" to "model.language_model.*" and
"lm_head\..*" to "lm_head.*") so LayerNamePattern correctly matches and leaves
the intended parameters unfrozen.

In `@examples/qwen3.5/9b-fft-vision.yaml`:
- Line 29: Update the wandb_project entry to reflect the correct project for
this 9B config: replace the stale/typoed value "7b-visionn" in the
examples/qwen3.5/9b-fft-vision.yaml file (the wandb_project field) with the
intended project name (e.g., "9b-vision" or your team's canonical 9B project
name) so runs are correctly grouped under the 9B project.

In `@examples/qwen3.5/9b-lora-vision.yaml`:
- Around line 4-6: Update the model-family note so it matches other Qwen3.5
examples: change the comment that currently reads "Qwen3.5-9B and above are
early-fusion VLMs... Note: Qwen3.5-2B is a text-only model — the smallest VLM is
Qwen3.5-9B." to reflect the same smallest VLM referenced elsewhere (replace "9B"
with "7B" or otherwise match the project's canonical smallest VLM), ensuring the
lines mentioning "Qwen3.5-9B and above", "Qwen3.5-2B", and "smallest VLM is
Qwen3.5-9B" are made consistent with the other Qwen3.5 examples.

In `@examples/qwen3.5/README.md`:
- Around line 14-15: Update the README to reference the actual config filenames:
replace any occurrence of "7b-fft-vision.yaml" with "9b-fft-vision.yaml"
(including the table entry and the commands that currently use the non-existent
file), and change the incorrect "27b-fft-vision.yaml" reference to
"27b-fft.yaml" while adding a brief note that "27b-fft.yaml" is text-only (not
multimodal) so readers don’t expect a vision+FFT config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b84242b2-9c02-4d1f-8adc-96aba900b40a

📥 Commits

Reviewing files that changed from the base of the PR and between 7da5f94 and e7e14ec.

📒 Files selected for processing (4)
  • examples/qwen3.5/27b-fft.yaml
  • examples/qwen3.5/9b-fft-vision.yaml
  • examples/qwen3.5/9b-lora-vision.yaml
  • examples/qwen3.5/README.md

Comment on lines +36 to +38

```yaml
unfrozen_parameters:
  - model\.language_model\..*
  - lm_head\..*
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "Patterns configured in examples/qwen3.5/27b-fft.yaml"
rg -n 'unfrozen_parameters|model\\\.language_model\\\.\.\*|lm_head\\\.\.\*' examples/qwen3.5/27b-fft.yaml -C2

echo
echo "Escaping logic in src/axolotl/utils/freeze.py"
rg -n 're\.sub\(r"\\\.\(\?!\\\+\)", "\\\\\.", name_pattern\)' src/axolotl/utils/freeze.py -C2

echo
echo "Behavior check of current vs suggested pattern transformation"
python - <<'PY'
import re
name = "model.language_model.layers.0.self_attn.q_proj.weight"
current = r"model\.language_model\..*"
suggested = r"^model.language_model.+"

def transform(p):
    return re.sub(r"\.(?!\+)", "\\.", p)

for label, p in [("current", current), ("suggested", suggested)]:
    rx = re.compile(transform(p))
    print(f"{label}: transformed={rx.pattern!r}, matches={bool(rx.match(name))}")
PY
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 1189


Fix the unfrozen_parameters patterns to match intended parameters.

The patterns model\.language_model\..* and lm_head\..* are over-escaped and fail to match parameter names when processed by LayerNamePattern in src/axolotl/utils/freeze.py (line 187). The escaping transformation causes no parameters to match, silently freezing the entire model instead of just the vision encoder.

Proposed fix

```diff
 unfrozen_parameters:
-  - model\.language_model\..*
-  - lm_head\..*
+  - ^model.language_model.+
+  - ^lm_head.+
```
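The escaping pitfall described in this finding can be reproduced in isolation. The sketch below mirrors the `re.sub` transform quoted from `src/axolotl/utils/freeze.py` in the verification script above; the parameter name is illustrative:

```python
import re

def transform(pattern: str) -> str:
    # Escaping step applied by LayerNamePattern (per the review thread):
    # every "." not followed by "+" is replaced with a literal "\.".
    return re.sub(r"\.(?!\+)", r"\.", pattern)

name = "model.language_model.layers.0.self_attn.q_proj.weight"

# Over-escaped pattern from the config: its pre-existing backslashes survive
# the transform, so the compiled regex expects literal backslash characters
# that never appear in parameter names.
broken = re.compile(transform(r"model\.language_model\..*"))
assert broken.match(name) is None

# Suggested pattern: plain dots get escaped by the transform, while ".+"
# is left intact as a wildcard, so real parameter names match.
fixed = re.compile(transform(r"^model.language_model.+"))
assert fixed.match(name) is not None
```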

```yaml
sequence_len: 4096
pad_to_sequence_len: false

wandb_project: 7b-visionn
```

⚠️ Potential issue | 🟡 Minor

wandb_project looks stale/typoed for this 9B config.

7b-visionn is likely a carry-over and may mix runs under an unintended project name.

Suggested fix

```diff
-wandb_project: 7b-visionn
+wandb_project: 9b-vision
```

Comment on lines +4 to +6

```diff
 # Qwen3.5-9B and above are early-fusion VLMs (Qwen3_5ForConditionalGeneration).
 # Vision and text tokens are processed together by the same transformer layers.
-# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-7B.
+# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-9B.
```

⚠️ Potential issue | 🟡 Minor

Keep the model-family note consistent across Qwen3.5 examples.

This file says the smallest VLM is 9B, while other docs/examples in this PR still reference 7B vision recipes. Please align these statements so users don’t pick the wrong config path.


Comment on lines +14 to +15
| `7b-fft-vision.yaml` | Qwen3.5-7B/9B | Vision+text FFT, single GPU | ~61 GiB |


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "README references:"
rg -n '7b-fft-vision\.yaml|9b-fft-vision\.yaml|27b-fft-vision\.yaml' examples/qwen3.5/README.md -C1

echo
echo "Available vision FFT config files in examples/qwen3.5:"
fd '.*fft-vision\.yaml' examples/qwen3.5
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 1050


🏁 Script executed:

```bash
fd 'fft.*\.yaml' examples/qwen3.5 | sort
```

Repository: axolotl-ai-cloud/axolotl

Length of output: 133


Update README file references to match actual config files.

The README references non-existent config files in three locations:

  • Line 14: 7b-fft-vision.yaml does not exist; should reference 9b-fft-vision.yaml
  • Lines 47-48: Command uses non-existent 7b-fft-vision.yaml; should use 9b-fft-vision.yaml
  • Line 56: References non-existent 27b-fft-vision.yaml (actual file is 27b-fft.yaml, which is text-only FFT, not multimodal)

The only available vision+FFT configs are 9b-fft-vision.yaml and 27b-fft.yaml (text-only). Update the documentation and copy-paste commands to reference the correct files.


Collaborator

NanoCode012 left a comment

Could you also check if there are any other places where you can reduce comments / simplify?

Comment on lines +4 to +8

```yaml
# Full fine-tune (FFT) of Qwen3.5-9B with image+text (multimodal) data.
# Designed for a single 80GB GPU (A100/H100).

# Memory estimate (bf16): ~14 GB weights + ~7 GB 8-bit Adam = ~21 GB base,
# leaving ample room for activations at sequence_len 4096.
```

Remove

Comment on lines +203 to +222
**Text-only FFT with vision encoder frozen (single 80 GB GPU)**

Use `unfrozen_parameters` to restrict gradient updates to the language model, freezing
`model.visual.*` and avoiding wasted optimizer state for parameters that receive no
gradient from text-only data.

```yaml
unfrozen_parameters:
- ^model\.language_model\..*
- ^lm_head\..*
```

Measured peak VRAM — Qwen3.5-27B, `adamw_bnb_8bit`, `sequence_len: 2048`:

| Metric | Value |
|---|---|
| Max active | 52.89 GiB |
| Device reserved | 53.31 GiB |

See [examples/qwen3.5/27b-fft.yaml](https://github.com/axolotl-ai/axolotl/blob/main/examples/qwen3.5/27b-fft.yaml).
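As a minimal sketch of how such patterns partition parameter names, the snippet below applies the two README patterns with plain `re` semantics (the parameter names are illustrative, not actual Qwen3.5 names, and the escaping transform discussed in the earlier finding is not applied here):

```python
import re

# Hypothetical parameter names for an early-fusion VLM (illustrative only).
param_names = [
    "model.visual.patch_embed.weight",
    "model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.layers.0.self_attn.q_proj.weight",
    "model.language_model.layers.0.mlp.gate_proj.weight",
    "lm_head.weight",
]

# unfrozen_parameters patterns from the README snippet above.
unfrozen = [re.compile(p) for p in (r"^model\.language_model\..*", r"^lm_head\..*")]

# Language-model and lm_head parameters stay trainable; everything else
# (here, the vision encoder) is frozen and accrues no optimizer state.
trainable = [n for n in param_names if any(rx.match(n) for rx in unfrozen)]
frozen = [n for n in param_names if n not in trainable]

assert all(n.startswith(("model.language_model.", "lm_head.")) for n in trainable)
assert all(n.startswith("model.visual.") for n in frozen)
```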

These extra details and parts below should be in the README.md and not here

Comment on lines +7 to +9

```yaml
# Qwen3.5-27B is early-fusion VLM.
# To freeze vision encoder for text-only training:
# For multimodal (image+text) fine-tuning, see 9b-lora-vision.yaml.
```

Let's not put this here? Let's just explain on readme.

Comment on lines +4 to +6

```diff
 # Qwen3.5-9B and above are early-fusion VLMs (Qwen3_5ForConditionalGeneration).
 # Vision and text tokens are processed together by the same transformer layers.
-# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-7B.
+# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-9B.
```

Simplify/Remove

```yaml
sample_packing: true

# Freeze the vision encoder; train only the language model.
# model.visual.* parameters have requires_grad set to False via gradient hooks.
```

Suggested change (delete this line):

```diff
-# model.visual.* parameters have requires_grad set to False via gradient hooks.
```

