Commit 36acf60

Fixing nits
Signed-off-by: Vladimir Suvorov <[email protected]>
1 parent 3291b8c commit 36acf60

1 file changed, 2 additions and 2 deletions

src/MaxText/examples/rl_llama3_demo.ipynb

Lines changed: 2 additions & 2 deletions
@@ -4,9 +4,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# GRPO/GSPO Llama3.1-8B Demo\n",
+    "# Llama3.1-8B-Instruct Reinforcement Learning Demo\n",
     "\n",
-    "This notebook demonstrates GRPO (Group Relative Policy Optimization) training using the unified `rl_train` function or GSPO (Group Sequence Policy Optimization) - the change is in loss function which is a parameter\n",
+    "This notebook demonstrates training on Llama3.1-8B-Instruct model with either GRPO (Group Relative Policy Optimization) or GSPO (Group Sequence Policy Optimization).\n",
     "\n",
     "## What is GRPO/GSPO?\n",
     "\n",
