Commit 53db0d4

Fix stale Gemma-3 references and wrong chat_template in Gemma4 notebooks (#228)
* Fix stale Gemma-3 references and wrong chat_template in Gemma4 notebooks

  Several Gemma4 notebooks still contained leftover Gemma-3 wording that was never updated when the notebooks were forked from their Gemma-3 predecessors:

  - "According to the `Gemma-3` team, the recommended settings..."
  - "# Recommended Gemma-3 settings!"
  - "apply the chat template for `Gemma-3` onto the conversations"

  Fix all of these to reference Gemma-4. The list of supported chat template names (`..., gemma3, gemma-4`) is intentionally left alone, since `gemma3` is a separate, still-supported template distinct from `gemma-4`.

  Also fix Gemma4_(26B_A4B)-Text, which was using `chat_template = "gemma-4-thinking"`. Only the 31B notebooks should use the thinking template; the rest should use `gemma-4`.

  Total: 14 Gemma-3 -> Gemma-4 string fixes across 4 notebooks, plus 2 chat_template fixes in the 26B_A4B Text notebook.

* Revert 26B_A4B-Text chat_template change; keep gemma-4-thinking

  The 26B_A4B model should also use the gemma-4-thinking chat template, not gemma-4. Only the E2B / E4B notebooks should use plain gemma-4.
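The template assignment described in the commit message can be restated as a small Python sketch. The notebook-to-template mapping below only repeats what the commit says (the actual notebooks pass these names to Unsloth's chat-template API, which is not called here), and `strip_bos` illustrates the `removeprefix('<bos>')` step the diffs reference:

```python
# Chat template each Gemma4 notebook should use, per this commit:
# the 31B and 26B_A4B Text notebooks keep the thinking template,
# while the E2B / E4B notebooks use plain "gemma-4".
CHAT_TEMPLATES = {
    "Gemma4_(31B)-Text": "gemma-4-thinking",
    "Gemma4_(26B_A4B)-Text": "gemma-4-thinking",
    "Gemma4_(E2B)-Audio": "gemma-4",
    "Gemma4_(E4B)-Audio": "gemma-4",
}

def strip_bos(text: str) -> str:
    """Drop a single leading <bos> token before finetuning: the
    Processor re-adds it at training time and the model expects
    exactly one."""
    return text.removeprefix("<bos>")

print(CHAT_TEMPLATES["Gemma4_(26B_A4B)-Text"])  # gemma-4-thinking
print(strip_bos("<bos><start_of_turn>user\nHello"))
```

`str.removeprefix` (Python 3.9+) is a no-op when the prefix is absent, so running the cell twice cannot strip a second token.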
1 parent 5e5b93f commit 53db0d4

File tree

4 files changed: +14 -14 lines changed


nb/Gemma4_(26B_A4B)-Text.ipynb

Lines changed: 5 additions & 5 deletions
@@ -376,7 +376,7 @@
 "id": "8Xs0LXio7rfd"
 },
 "source": [
-"We now have to apply the chat template for `Gemma-3` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one."
+"We now have to apply the chat template for `Gemma-4` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one."
 ]
 },
 {
@@ -594,7 +594,7 @@
 "source": [
 "<a name=\"Inference\"></a>\n",
 "### Inference\n",
-"Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`"
+"Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`"
 ]
 },
 {
@@ -628,7 +628,7 @@
 " **inputs,\n",
 " max_new_tokens = 64, # Increase for longer outputs!\n",
 " use_cache=True,\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 ")\n",
 "tokenizer.batch_decode(outputs)"
@@ -668,7 +668,7 @@
 " **inputs,\n",
 " max_new_tokens = 64, # Increase for longer outputs!\n",
 " use_cache=True,\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 " streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
 ")"
@@ -742,7 +742,7 @@
 "_ = model.generate(\n",
 " **inputs,\n",
 " max_new_tokens = 128, # Increase for longer outputs!\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 " streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
 ")"

nb/Gemma4_(31B)-Text.ipynb

Lines changed: 5 additions & 5 deletions
@@ -376,7 +376,7 @@
 "id": "8Xs0LXio7rfd"
 },
 "source": [
-"We now have to apply the chat template for `Gemma-3` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one."
+"We now have to apply the chat template for `Gemma-4` onto the conversations, and save it to `text`. We remove the `<bos>` token using removeprefix(`'<bos>'`) since we're finetuning. The Processor will add this token before training and the model expects only one."
 ]
 },
 {
@@ -594,7 +594,7 @@
 "source": [
 "<a name=\"Inference\"></a>\n",
 "### Inference\n",
-"Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`"
+"Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`"
 ]
 },
 {
@@ -628,7 +628,7 @@
 " **inputs,\n",
 " max_new_tokens = 64, # Increase for longer outputs!\n",
 " use_cache=True,\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 ")\n",
 "tokenizer.batch_decode(outputs)"
@@ -668,7 +668,7 @@
 " **inputs,\n",
 " max_new_tokens = 64, # Increase for longer outputs!\n",
 " use_cache=True,\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 " streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
 ")"
@@ -743,7 +743,7 @@
 " **inputs,\n",
 " max_new_tokens = 128, # Increase for longer outputs!\n",
 " use_cache=True,\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 " streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
 ")"

nb/Gemma4_(E2B)-Audio.ipynb

Lines changed: 2 additions & 2 deletions
@@ -1117,7 +1117,7 @@
 "source": [
 "<a name=\"Inference\"></a>\n",
 "### Inference\n",
-"Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR."
+"Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR."
 ]
 },
 {
@@ -1268,7 +1268,7 @@
 "_ = model.generate(\n",
 " **inputs,\n",
 " max_new_tokens = 128, # Increase for longer outputs!\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 " streamer = TextStreamer(processor, skip_prompt = True),\n",
 ")"

nb/Gemma4_(E4B)-Audio.ipynb

Lines changed: 2 additions & 2 deletions
@@ -628,7 +628,7 @@
 "source": [
 "<a name=\"Inference\"></a>\n",
 "### Inference\n",
-"Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR."
+"Let's run the model via Unsloth native inference! According to the `Gemma-4` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64` but for this example we use `do_sample=False` for ASR."
 ]
 },
 {
@@ -760,7 +760,7 @@
 " **inputs,\n",
 " max_new_tokens = 128, # Increase for longer outputs!\n",
 " use_cache=True,\n",
-" # Recommended Gemma-3 settings!\n",
+" # Recommended Gemma-4 settings!\n",
 " temperature = 1.0, top_p = 0.95, top_k = 64,\n",
 " streamer = TextStreamer(processor, skip_prompt = True),\n",
 ")"
