"[Open in Colab](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)\n"
],
"metadata": {
"id": "yuOEagMH86WV"
},
"id": "yuOEagMH86WV"
},
{
"cell_type": "markdown",
"id": "0f798657",
"metadata": {
"id": "0f798657"
},
"source": [
"This notebook demonstrates DeepSeek's Qwen3-8B model with GRPO (Group Relative Policy Optimization) for interactive conversational reasoning tasks.\n",
"It simulates lightweight, agent-style reasoning in an accessible, interpretable way."
]
},
{
"cell_type": "markdown",
"id": "80f3de9e",
"metadata": {
"id": "80f3de9e"
},
"source": [
"## Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d1c7f6c",
"metadata": {
"id": "8d1c7f6c"
},
"outputs": [],
"source": [
"!pip install -q transformers accelerate"
]
},
{
"cell_type": "markdown",
"id": "78603e7b",
"metadata": {
"id": "78603e7b"
},
"source": [
"## Tools"
]
},
{
"cell_type": "markdown",
"id": "88e97fbc",
"metadata": {
"id": "88e97fbc"
},
"source": [
"- `transformers`: For model loading and interaction\n",
"- `AutoModelForCausalLM`, `AutoTokenizer`: Interfaces for DeepSeek's LLM"
]
},
{
"cell_type": "markdown",
"id": "37d9bd54",
"metadata": {
"id": "37d9bd54"
},
"source": [
"## YAML Prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adf5cae5",
"metadata": {
"id": "adf5cae5"
},
"outputs": [],
"source": [
"\n",
"prompt:\n",
" task: \"Reasoning over multi-step instructions\"\n",
" context: \"User provides a math problem or logical question.\"\n",