Conversation
📝 Walkthrough

This pull request introduces three new Qwen3.5 fine-tuning configuration files: a text-only FFT setup for 27B (freezing the vision encoder), a multimodal FFT setup for 9B, and an update to the 9B LoRA configuration to reflect model version changes. The README is expanded with documentation on these new configurations, training commands, and FFT-specific guidance.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/qwen3.5/27b-fft.yaml`:
- Around line 36-38: The YAML patterns in unfrozen_parameters are over-escaped
and thus don't match when parsed by LayerNamePattern in
src/axolotl/utils/freeze.py; update the entries so they use unescaped dots
(e.g., change "model\.language_model\..*" to "model.language_model.*" and
"lm_head\..*" to "lm_head.*") so LayerNamePattern correctly matches and leaves
the intended parameters unfrozen.
In `@examples/qwen3.5/9b-fft-vision.yaml`:
- Line 29: Update the wandb_project entry to reflect the correct project for
this 9B config: replace the stale/typoed value "7b-visionn" in the
examples/qwen3.5/9b-fft-vision.yaml file (the wandb_project field) with the
intended project name (e.g., "9b-vision" or your team's canonical 9B project
name) so runs are correctly grouped under the 9B project.
In `@examples/qwen3.5/9b-lora-vision.yaml`:
- Around line 4-6: Update the model-family note so it matches other Qwen3.5
examples: change the comment that currently reads "Qwen3.5-9B and above are
early-fusion VLMs... Note: Qwen3.5-2B is a text-only model — the smallest VLM is
Qwen3.5-9B." to reflect the same smallest VLM referenced elsewhere (replace "9B"
with "7B" or otherwise match the project's canonical smallest VLM), ensuring the
lines mentioning "Qwen3.5-9B and above", "Qwen3.5-2B", and "smallest VLM is
Qwen3.5-9B" are made consistent with the other Qwen3.5 examples.
In `@examples/qwen3.5/README.md`:
- Around line 14-15: Update the README to reference the actual config filenames:
replace any occurrence of "7b-fft-vision.yaml" with "9b-fft-vision.yaml"
(including the table entry and the commands that currently use the non-existent
file), and change the incorrect "27b-fft-vision.yaml" reference to
"27b-fft.yaml" while adding a brief note that "27b-fft.yaml" is text-only (not
multimodal) so readers don’t expect a vision+FFT config.
ℹ️ Review info

⚙️ Run configuration

- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: b84242b2-9c02-4d1f-8adc-96aba900b40a
📒 Files selected for processing (4)

- examples/qwen3.5/27b-fft.yaml
- examples/qwen3.5/9b-fft-vision.yaml
- examples/qwen3.5/9b-lora-vision.yaml
- examples/qwen3.5/README.md
In `examples/qwen3.5/27b-fft.yaml`:

```yaml
unfrozen_parameters:
  - model\.language_model\..*
  - lm_head\..*
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "Patterns configured in examples/qwen3.5/27b-fft.yaml"
rg -n 'unfrozen_parameters|model\\\.language_model\\\.\.\*|lm_head\\\.\.\*' examples/qwen3.5/27b-fft.yaml -C2
echo
echo "Escaping logic in src/axolotl/utils/freeze.py"
rg -n 're\.sub\(r"\\\.\(\?!\\\+\)", "\\\\\.", name_pattern\)' src/axolotl/utils/freeze.py -C2
echo
echo "Behavior check of current vs suggested pattern transformation"
python - <<'PY'
import re
name = "model.language_model.layers.0.self_attn.q_proj.weight"
current = r"model\.language_model\..*"
suggested = r"^model.language_model.+"
def transform(p):
    return re.sub(r"\.(?!\+)", "\\.", p)
for label, p in [("current", current), ("suggested", suggested)]:
    rx = re.compile(transform(p))
    print(f"{label}: transformed={rx.pattern!r}, matches={bool(rx.match(name))}")
PY
```

Repository: axolotl-ai-cloud/axolotl
Length of output: 1189
Fix the unfrozen_parameters patterns to match intended parameters.
The patterns `model\.language_model\..*` and `lm_head\..*` are over-escaped and fail to match parameter names when processed by `LayerNamePattern` in `src/axolotl/utils/freeze.py` (line 187). The escaping transformation causes no parameters to match, silently freezing the entire model instead of just the vision encoder.
Proposed fix:

```diff
 unfrozen_parameters:
-  - model\.language_model\..*
-  - lm_head\..*
+  - ^model.language_model.+
+  - ^lm_head.+
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
unfrozen_parameters:
  - ^model.language_model.+
  - ^lm_head.+
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/qwen3.5/27b-fft.yaml` around lines 36 - 38, The YAML patterns in
unfrozen_parameters are over-escaped and thus don't match when parsed by
LayerNamePattern in src/axolotl/utils/freeze.py; update the entries so they use
unescaped dots (e.g., change "model\.language_model\..*" to
"model.language_model.*" and "lm_head\..*" to "lm_head.*") so LayerNamePattern
correctly matches and leaves the intended parameters unfrozen.
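The silent-freeze failure mode can be reproduced in isolation. The sketch below assumes the escaping step reported from `src/axolotl/utils/freeze.py` in the analysis script above (`re.sub(r"\.(?!\+)", ...)`); the real `LayerNamePattern` class may differ in detail.

```python
import re

def transform(pattern: str) -> str:
    # Escaping step as reported from src/axolotl/utils/freeze.py:
    # every "." not already followed by "+" becomes a literal "\.".
    return re.sub(r"\.(?!\+)", r"\\.", pattern)

name = "model.language_model.layers.0.self_attn.q_proj.weight"

over_escaped = r"model\.language_model\..*"  # pattern shipped in 27b-fft.yaml
anchored = r"^model.language_model.+"        # pattern from the proposed fix

# Escaping the pre-escaped pattern turns "\." into "\\.", which demands a
# literal backslash inside the parameter name, so nothing ever matches.
print(bool(re.match(transform(over_escaped), name)))  # False
# The unescaped, anchored form survives the transform and matches as intended.
print(bool(re.match(transform(anchored), name)))      # True
```

Because a non-matching pattern list simply leaves every parameter frozen rather than raising an error, the misconfiguration is silent, which is what makes a quick check like this worthwhile.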
examples/qwen3.5/9b-fft-vision.yaml (Outdated)

```yaml
sequence_len: 4096
pad_to_sequence_len: false

wandb_project: 7b-visionn
```
wandb_project looks stale/typoed for this 9B config.
`7b-visionn` is likely a carry-over and may mix runs under an unintended project name.
Suggested fix:

```diff
-wandb_project: 7b-visionn
+wandb_project: 9b-vision
```

📝 Committable suggestion
```yaml
wandb_project: 9b-vision
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/qwen3.5/9b-fft-vision.yaml` at line 29, Update the wandb_project
entry to reflect the correct project for this 9B config: replace the
stale/typoed value "7b-visionn" in the examples/qwen3.5/9b-fft-vision.yaml file
(the wandb_project field) with the intended project name (e.g., "9b-vision" or
your team's canonical 9B project name) so runs are correctly grouped under the
9B project.
examples/qwen3.5/9b-lora-vision.yaml (Outdated)

```diff
 # Qwen3.5-9B and above are early-fusion VLMs (Qwen3_5ForConditionalGeneration).
 # Vision and text tokens are processed together by the same transformer layers.
-# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-7B.
+# Note: Qwen3.5-2B is a text-only model — the smallest VLM is Qwen3.5-9B.
```
Keep the model-family note consistent across Qwen3.5 examples.
This file says the smallest VLM is 9B, while other docs/examples in this PR still reference 7B vision recipes. Please align these statements so users don’t pick the wrong config path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/qwen3.5/9b-lora-vision.yaml` around lines 4 - 6, Update the
model-family note so it matches other Qwen3.5 examples: change the comment that
currently reads "Qwen3.5-9B and above are early-fusion VLMs... Note: Qwen3.5-2B
is a text-only model — the smallest VLM is Qwen3.5-9B." to reflect the same
smallest VLM referenced elsewhere (replace "9B" with "7B" or otherwise match the
project's canonical smallest VLM), ensuring the lines mentioning "Qwen3.5-9B and
above", "Qwen3.5-2B", and "smallest VLM is Qwen3.5-9B" are made consistent with
the other Qwen3.5 examples.
examples/qwen3.5/README.md (Outdated)

```markdown
| `7b-fft-vision.yaml` | Qwen3.5-7B/9B | Vision+text FFT, single GPU | ~61 GiB |
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "README references:"
rg -n '7b-fft-vision\.yaml|9b-fft-vision\.yaml|27b-fft-vision\.yaml' examples/qwen3.5/README.md -C1
echo
echo "Available vision FFT config files in examples/qwen3.5:"
fd '.*fft-vision\.yaml' examples/qwen3.5
```

Repository: axolotl-ai-cloud/axolotl
Length of output: 1050
🏁 Script executed:

```bash
fd 'fft.*\.yaml' examples/qwen3.5 | sort
```

Repository: axolotl-ai-cloud/axolotl
Length of output: 133
Update README file references to match actual config files.

The README references non-existent config files in three locations:

- Line 14: `7b-fft-vision.yaml` does not exist; should reference `9b-fft-vision.yaml`
- Lines 47-48: the command uses the non-existent `7b-fft-vision.yaml`; should use `9b-fft-vision.yaml`
- Line 56: references the non-existent `27b-fft-vision.yaml` (the actual file is `27b-fft.yaml`, which is text-only FFT, not multimodal)

The only FFT configs that exist are `9b-fft-vision.yaml` (multimodal) and `27b-fft.yaml` (text-only). Update the documentation and copy-paste commands to reference the correct files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/qwen3.5/README.md` around lines 14 - 15, Update the README to
reference the actual config filenames: replace any occurrence of
"7b-fft-vision.yaml" with "9b-fft-vision.yaml" (including the table entry and
the commands that currently use the non-existent file), and change the incorrect
"27b-fft-vision.yaml" reference to "27b-fft.yaml" while adding a brief note that
"27b-fft.yaml" is text-only (not multimodal) so readers don’t expect a
vision+FFT config.
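A check for this kind of README/config drift can be scripted so it fails fast in review or CI. The snippet below is a sketch against a throwaway fixture directory (the file names are hypothetical); in practice you would point `dir` at `examples/qwen3.5` and drop the fixture setup.

```shell
set -eu
# Fixture setup (hypothetical): a README referencing two configs, only one of
# which exists on disk.
dir=$(mktemp -d)
printf '| `9b-fft-vision.yaml` | ... |\n| `7b-fft-vision.yaml` | ... |\n' > "$dir/README.md"
touch "$dir/9b-fft-vision.yaml"

# Collect every *.yaml filename mentioned in the README and flag missing ones.
missing=$(grep -o '[0-9A-Za-z._-]*\.yaml' "$dir/README.md" | sort -u |
  while read -r f; do [ -e "$dir/$f" ] || printf '%s\n' "$f"; done)
echo "missing configs: $missing"  # → missing configs: 7b-fft-vision.yaml
```

Running it against the real directory would have surfaced the stale `7b-fft-vision.yaml` references before the docs shipped.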
NanoCode012 left a comment:
Could you also check if there's any other places you can reduce comments/ simplify ?
examples/qwen3.5/9b-fft-vision.yaml (Outdated)

```yaml
# Full fine-tune (FFT) of Qwen3.5-9B with image+text (multimodal) data.
# Designed for a single 80GB GPU (A100/H100).

# Memory estimate (bf16): ~14 GB weights + ~7 GB 8-bit Adam = ~21 GB base,
# leaving ample room for activations at sequence_len 4096.
```
docs/multimodal.qmd (Outdated)

**Text-only FFT with vision encoder frozen (single 80 GB GPU)**

Use `unfrozen_parameters` to restrict gradient updates to the language model, freezing `model.visual.*` and avoiding wasted optimizer state for parameters that receive no gradient from text-only data.

```yaml
unfrozen_parameters:
  - ^model\.language_model\..*
  - ^lm_head\..*
```

Measured peak VRAM — Qwen3.5-27B, `adamw_bnb_8bit`, `sequence_len: 2048`:

| Metric | Value |
|---|---|
| Max active | 52.89 GiB |
| Device reserved | 53.31 GiB |

See [examples/qwen3.5/27b-fft.yaml](https://github.com/axolotl-ai/axolotl/blob/main/examples/qwen3.5/27b-fft.yaml).
There was a problem hiding this comment.
examples/qwen3.5/27b-fft.yaml (Outdated)

```yaml
# Qwen3.5-27B is early-fusion VLM.
# To freeze vision encoder for text-only training:
# For multimodal (image+text) fine-tuning, see 9b-lora-vision.yaml.
```
Let's not put this here? Let's just explain on readme.
examples/qwen3.5/27b-fft.yaml (Outdated)

```yaml
sample_packing: true

# Freeze the vision encoder; train only the language model.
# model.visual.* parameters have requires_grad set to False via gradient hooks.
```
Suggested change (delete the line):

```diff
-# model.visual.* parameters have requires_grad set to False via gradient hooks.
```
examples + docs

Summary by CodeRabbit

Release Notes:
- New Features
- Documentation