Fix Gemma 4, Qwen 2.5 VL GRPO, Qwen 3.5 27B handling in update_all_notebooks.py and run it (#232)
This PR fixes three bugs in update_all_notebooks.py that were silently
corrupting notebooks, ports a Qwen 2.5 VL GRPO hotfix into the template
so the script stops reverting it, and commits the result of a clean
re-run of the fixed script.
Script fixes
------------
1. _get_base_name_from_filename now recognizes Gemma 4. It already had
explicit cases for Gemma 3 and Gemma 3n but no case for Gemma 4, so
Gemma 4 notebooks fell through to the generic path which stripped
the "4" and returned just "gemma". That caused the model-path
rewrite to rename gemma_4_lora / gemma_4_finetune to gemma_lora /
gemma_finetune on every run across the four Gemma 4 Text notebooks.
Added _RE_GEMMA4 and a return "gemma_4" branch.
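The ordering problem and its fix can be sketched like this (a minimal illustration, not the script's actual code; the regex patterns and the generic fallback are assumptions, only the `_RE_GEMMA4` name and the `"gemma_4"` return value come from this PR):

```python
import re

# Illustrative patterns; the script's real regexes may differ.
# The (?!\d) lookahead keeps "gemma_4" from matching e.g. "gemma_40".
_RE_GEMMA4 = re.compile(r"gemma[-_ ]?4(?!\d)")
_RE_GEMMA3N = re.compile(r"gemma[-_ ]?3n")
_RE_GEMMA3 = re.compile(r"gemma[-_ ]?3(?![0-9n])")

def get_base_name(filename: str) -> str:
    """Map a notebook filename to the base name used in model save paths."""
    name = filename.lower()
    if _RE_GEMMA4.search(name):   # the new explicit case: must fire
        return "gemma_4"          # before the generic fallback below
    if _RE_GEMMA3N.search(name):
        return "gemma_3n"
    if _RE_GEMMA3.search(name):
        return "gemma_3"
    if "gemma" in name:           # generic fallback: the version digit is
        return "gemma"            # dropped, yielding bare "gemma"
    return name
```

Without the explicit case, every Gemma 4 filename hits the bare-"gemma" fallback, which is exactly the gemma_4_lora -> gemma_lora corruption described above.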
2. Added a dedicated installation_gemma4_content block and a dispatch
case in update_notebook_sections matching paths containing "gemma4".
The default installation_content forces transformers==4.56.2 and
appends --no-deps trl==0.22.2 via update_or_append_pip_install, both
of which are wrong for Gemma 4. The new block contains the required
--no-deps transformers==5.5.0, !pip install torchcodec, and
torch._dynamo.config.recompile_limit = 64 lines, and does NOT go
through update_or_append_pip_install so the bad pins are never
applied.
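A sketch of the dispatch shape (helper and constant names are illustrative, and the path normalization is an assumption; only the three key lines of the Gemma 4 block come from this PR):

```python
# Abridged Gemma 4 install block; the real installation_gemma4_content
# cell also contains the usual unsloth install preamble.
INSTALLATION_GEMMA4 = "\n".join([
    "!pip install --no-deps transformers==5.5.0",
    "!pip install torchcodec",
    "import torch; torch._dynamo.config.recompile_limit = 64",
])

def pick_installation_block(notebook_path: str, default_block: str) -> str:
    """Route Gemma 4 notebooks to their dedicated install block so the
    default transformers==4.56.2 / trl==0.22.2 pins are never applied."""
    # Normalization (lowercase, drop underscores) is an assumption so that
    # both "Gemma4" and "Gemma_4" spellings hit the dedicated block.
    normalized = notebook_path.lower().replace("_", "")
    if "gemma4" in normalized:
        return INSTALLATION_GEMMA4
    return default_block
```

The important property is that the Gemma 4 branch returns its block verbatim instead of routing through update_or_append_pip_install, so nothing can append the bad pins afterwards.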
3. Qwen 3.5 dispatch now matches both "qwen3_5" and "qwen_3_5" so that
the inconsistently-named Qwen_3_5_27B_A100(80GB).ipynb (with an
underscore between Qwen and 3) also hits its proper installation
block. Previously it fell through to the default installation_content
which clobbered the custom torch==2.8.0 / xformers==0.0.32.post2 /
flash-linear-attention / causal_conv1d==1.6.0 install block, which
would make the notebook fail to load the model (Qwen 3.5 has mamba
layers and needs causal_conv1d).
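The two-spelling match is simple but easy to miss; a sketch (the predicate name and the second test filename are hypothetical):

```python
def is_qwen_3_5(notebook_path: str) -> bool:
    """Match both spellings seen in the repo: Qwen3_5_* and the
    inconsistently named Qwen_3_5_* variant."""
    lowered = notebook_path.lower()
    return "qwen3_5" in lowered or "qwen_3_5" in lowered
```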
Template patch
--------------
4. original_template/Qwen2_5_7B_VL_GRPO.ipynb now contains the same
isinstance(prompt, list) guard that was added to the three nb/
siblings in c75716f ("Fix fast_generate crash in Qwen2.5-VL GRPO
notebooks with TRL >= 0.24.0"). The hotfix was applied directly to
nb/ but never propagated to original_template/, so every subsequent
run of this script was reverting it. With the template patched,
the script regenerates all three sibling notebooks with the guard
intact. Affected code appears at dataset rows 100 and 165 and is
a ~10-line change per cell that renders the multimodal prompt via
tokenizer.apply_chat_template before passing it to fast_generate.
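The shape of the guard, simplified (the real cells also handle images and then call fast_generate; this sketch shows only the isinstance branch):

```python
def render_prompt(prompt, tokenizer):
    """With TRL >= 0.24.0 the dataset 'prompt' column may arrive as a list
    of chat messages rather than a pre-rendered string; render it to text
    before handing it to fast_generate. Simplified sketch of the hotfix."""
    if isinstance(prompt, list):
        prompt = tokenizer.apply_chat_template(
            prompt,
            add_generation_prompt=True,
            tokenize=False,  # return the rendered string, not token ids
        )
    return prompt
```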
Changes produced by re-running the fixed script (275 files, +7777 / -1433)
-------------------------------------------------------------------------
- update_all_notebooks.py: script fixes 1-3 above.
- original_template/Qwen2_5_7B_VL_GRPO.ipynb: +8 / -2 for the hotfix.
- .gitignore: new file (the repo had none). Covers __pycache__ / *.pyc.
- README.md: +17 lines of new Vision / Audio rows.
- nb/*.ipynb: 17 notebooks touched.
- python_scripts/*.py: 243 regenerated + 11 new (missing Gemma 4
exports + Openenv_wordle_grpo.py).
~72% of the line churn (5608 / 7777 added lines) is the 11 brand-new
python_scripts/ files.
Verification
------------
Gemma 4 Text notebooks (string occurrence counts per notebook):
| notebook     | gemma_4_lora | gemma_lora | torchcodec | --no-deps transformers==5.5.0 | trl==0.22.2 (bad) |
| ------------ | ------------ | ---------- | ---------- | ----------------------------- | ----------------- |
| 26B_A4B-Text | 8            | 0          | 1          | 1                             | 0                 |
| 31B-Text     | 8            | 0          | 1          | 1                             | 0                 |
| E2B-Text     | 8            | 0          | 1          | 1                             | 0                 |
| E4B-Text     | 8            | 0          | 1          | 1                             | 0                 |
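Counts like those above can be reproduced by scanning each notebook's JSON; a minimal sketch (not the verification commands actually used):

```python
import json

def count_marker(notebook_json: str, marker: str) -> int:
    """Count occurrences of a marker string across all cell sources of a
    notebook given as raw JSON text (nbformat stores source as a list of
    line strings)."""
    nb = json.loads(notebook_json)
    return sum(
        "".join(cell.get("source", [])).count(marker)
        for cell in nb.get("cells", [])
    )
```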
Qwen 2.5 VL GRPO siblings all contain the isinstance(prompt, list)
hotfix (2 occurrences each in Qwen2_5_7B_VL_GRPO.ipynb,
HuggingFace Course-Qwen2_5_7B_VL_GRPO.ipynb, and
Kaggle-Qwen2_5_7B_VL_GRPO.ipynb).
Qwen_3_5_27B_A100(80GB).ipynb has a net 1-line diff (transformers 5.2.0
-> 5.3.0 from _normalize_transformers_v5_pin) and preserves torch==2.8.0,
xformers==0.0.32.post2, flash-linear-attention, and causal_conv1d==1.6.0.
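For reference, _normalize_transformers_v5_pin presumably bumps stale transformers==5.x pins up to a floor version; a hedged sketch consistent with the observed 5.2.0 -> 5.3.0 diff (the floor constant, regex, and function body are assumptions):

```python
import re

_MIN_V5 = "5.3.0"  # assumed floor; the diff above only shows 5.2.0 -> 5.3.0

def normalize_transformers_v5_pin(line: str) -> str:
    """Bump any transformers==5.x pin older than the floor up to the floor;
    leave non-5.x pins and newer pins untouched."""
    m = re.search(r"transformers==(5\.\d+\.\d+)", line)
    if m:
        found = tuple(map(int, m.group(1).split(".")))
        floor = tuple(map(int, _MIN_V5.split(".")))
        if found < floor:
            return line.replace("transformers==" + m.group(1),
                                "transformers==" + _MIN_V5)
    return line
```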
Representative hunks from the regenerated notebooks
---------------------------------------------------
A regenerated output cell gains a text/html rendering alongside the same
text/plain output (a Gemma Fibonacci-sequence completion):

    "['<bos><|turn>user\\nContinue the sequence: 1, 1, 2, 3, 5, 8,<turn|>\\n<|turn>model\\n<|channel>thought\\n<channel|>13, 21, 34, 55, 89, 144, ...\\n\\nThis is the **Fibonacci sequence**, where each number is the sum of the two preceding ones.<turn|>']"

Cosmetic spacing normalization in two model.generate calls:

    -    " use_cache=True,\n",
    +    " use_cache = True,\n",
         " # Recommended Gemma-3 settings!\n",
         " temperature = 1.0, top_p = 0.95, top_k = 64,\n",

The Gemma 4 installation cell as now written by installation_gemma4_content:

    %%capture
    import os, re
    if "COLAB_" not in "".join(os.environ.keys()):
        !pip install unsloth  # Do this in local & cloud setups
    else:
        import torch; v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
        xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
        !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
        !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
    !pip install --no-deps transformers==5.5.0
    !pip install torchcodec
    import torch; torch._dynamo.config.recompile_limit = 64

A markdown typo fix in the vision training cell:

    -    "Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!!\n",
    +    "Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!\n",