Skip to content

Commit b012c91

Browse files
committed
2 parents db7b90e + be6f042 commit b012c91

File tree

4 files changed

+870
-0
lines changed

4 files changed

+870
-0
lines changed
Lines changed: 172 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,172 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "29d21289",
6+
"metadata": {
7+
"id": "29d21289"
8+
},
9+
"source": [
10+
"# DeepSeek R1 Qwen3 (8B) - GRPO Agent Demo"
11+
]
12+
},
13+
{
14+
"cell_type": "markdown",
15+
"source": [
16+
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/DeepSeek_Qwen3_GRPO.ipynb)\n"
17+
],
18+
"metadata": {
19+
"id": "yuOEagMH86WV"
20+
},
21+
"id": "yuOEagMH86WV"
22+
},
23+
{
24+
"cell_type": "markdown",
25+
"id": "0f798657",
26+
"metadata": {
27+
"id": "0f798657"
28+
},
29+
"source": [
30+
"This notebook demonstrates the usage of DeepSeek's Qwen3-8B model with GRPO (Guided Reasoning Prompt Optimization) for interactive conversational reasoning tasks.\n",
31+
"It is designed to simulate a lightweight agent-style reasoning capability in an accessible and interpretable way."
32+
]
33+
},
34+
{
35+
"cell_type": "markdown",
36+
"id": "80f3de9e",
37+
"metadata": {
38+
"id": "80f3de9e"
39+
},
40+
"source": [
41+
"## Dependencies"
42+
]
43+
},
44+
{
45+
"cell_type": "code",
46+
"execution_count": null,
47+
"id": "8d1c7f6c",
48+
"metadata": {
49+
"id": "8d1c7f6c"
50+
},
51+
"outputs": [],
52+
"source": [
53+
"!pip install -q transformers accelerate"
54+
]
55+
},
56+
{
57+
"cell_type": "markdown",
58+
"id": "78603e7b",
59+
"metadata": {
60+
"id": "78603e7b"
61+
},
62+
"source": [
63+
"## Tools"
64+
]
65+
},
66+
{
67+
"cell_type": "markdown",
68+
"id": "88e97fbc",
69+
"metadata": {
70+
"id": "88e97fbc"
71+
},
72+
"source": [
73+
"- `transformers`: For model loading and interaction\n",
74+
"- `AutoModelForCausalLM`, `AutoTokenizer`: Interfaces for DeepSeek's LLM"
75+
]
76+
},
77+
{
78+
"cell_type": "markdown",
79+
"id": "37d9bd54",
80+
"metadata": {
81+
"id": "37d9bd54"
82+
},
83+
"source": [
84+
"## YAML Prompt"
85+
]
86+
},
87+
{
88+
"cell_type": "code",
89+
"execution_count": null,
90+
"id": "adf5cae5",
91+
"metadata": {
92+
"id": "adf5cae5"
93+
},
94+
"outputs": [],
95+
"source": [
96+
"\n",
97+
"prompt:\n",
98+
" task: \"Reasoning over multi-step instructions\"\n",
99+
" context: \"User provides a math problem or logical question.\"\n",
100+
" model: \"deepseek-ai/deepseek-moe-16b-chat\"\n"
101+
]
102+
},
103+
{
104+
"cell_type": "markdown",
105+
"id": "6985f60c",
106+
"metadata": {
107+
"id": "6985f60c"
108+
},
109+
"source": [
110+
"## Main"
111+
]
112+
},
113+
{
114+
"cell_type": "code",
115+
"execution_count": null,
116+
"id": "d74bf686",
117+
"metadata": {
118+
"id": "d74bf686"
119+
},
120+
"outputs": [],
121+
"source": [
122+
"\n",
123+
"from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline\n",
124+
"\n",
125+
"model_id = \"deepseek-ai/deepseek-moe-16b-chat\"\n",
126+
"tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
127+
"model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\")\n",
128+
"\n",
129+
"pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n",
130+
"\n",
131+
"prompt = \"If a train travels 60 miles in 1.5 hours, what is its average speed?\"\n",
132+
"output = pipe(prompt, max_new_tokens=60)[0]['generated_text']\n",
133+
"print(\"🧠 Reasoned Output:\", output)\n"
134+
]
135+
},
136+
{
137+
"cell_type": "markdown",
138+
"id": "c856167f",
139+
"metadata": {
140+
"id": "c856167f"
141+
},
142+
"source": [
143+
"## Output"
144+
]
145+
},
146+
{
147+
"cell_type": "markdown",
148+
"id": "41039ee8",
149+
"metadata": {
150+
"id": "41039ee8"
151+
},
152+
"source": [
153+
"### 🖼️ Output Summary\n",
154+
"\n",
155+
"Prompt: *\"If a train travels 60 miles in 1.5 hours, what is its average speed?\"*\n",
156+
"\n",
157+
"🧠 Output: The model provides a clear reasoning process, such as:\n",
158+
"\n",
159+
"> \"To find the average speed, divide the total distance by total time: 60 / 1.5 = 40 mph.\"\n",
160+
"\n",
161+
"💡 This shows the model's ability to walk through logical steps using GRPO-enhanced reasoning."
162+
]
163+
}
164+
],
165+
"metadata": {
166+
"colab": {
167+
"provenance": []
168+
}
169+
},
170+
"nbformat": 4,
171+
"nbformat_minor": 5
172+
}

0 commit comments

Comments
 (0)