Commit 25e64d3

Merge branch 'agentscope-ai:main' into main
2 parents 12f2d8f + fb51994 commit 25e64d3

File tree

15 files changed

+2078
-60
lines changed

cookbook/en/training_sandbox.md

Lines changed: 162 additions & 32 deletions
@@ -27,10 +27,9 @@ training, tools for dataset calling, and real-time Reward verification.
2727
The Training Sandbox primarily implements high-concurrency data calling through Ray, supporting external Agents to
2828
create, execute, and evaluate instances for different samples after sandbox creation.
2929

30-
+ [APPWorld](https://github.com/StonyBrookNLP/appworld): APPWorld is an advanced, high-fidelity environment designed to
31-
test and evaluate autonomous AI agents' ability to perform complex, multi-step tasks using realistic APIs and user
32-
scenarios. It serves as a crucial testing ground for the AI agents, allowing them to learn and adapt to real-world
33-
scenarios.
30+
+ [APPWorld](https://github.com/StonyBrookNLP/appworld): APPWorld is an advanced, high-fidelity environment designed to test and evaluate autonomous AI agents' ability to perform complex, multi-step tasks using realistic APIs and user
31+
scenarios. It serves as a crucial testing ground for the AI agents, allowing them to learn and adapt to real-world scenarios.
32+
+ [BFCL](https://github.com/ShishirPatil/gorilla): Berkeley Function Calling Leaderboard (BFCL) is the first comprehensive and executable function call evaluation dedicated to assessing Large Language Models' (LLMs) ability to invoke functions. Unlike previous evaluations, BFCL accounts for various forms of function calls, diverse scenarios, and executability.
3433

3534
## Install
3635

@@ -42,7 +41,9 @@ First, install AgentScope Runtime with sandbox support:
4241
pip install "agentscope-runtime[sandbox]"
4342
```
4443

45-
## Prepare Docker Image
44+
### Appworld Example
45+
46+
#### Prepare Docker Image
4647

4748
Pull the image from DockerHub. If the pull fails, we also provide a script for building the Docker image
4849
locally.
@@ -58,27 +59,29 @@ All Docker images are hosted on Alibaba Cloud Container Registry (ACR) for optim
5859

5960
```bash
6061
# Pull and tag Appworld ARM64 architecture image
61-
docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm64 && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm agentscope/runtime-sandbox-appworld:latest-arm
62+
docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm64 && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm64 agentscope/runtime-sandbox-appworld:latest-arm64
6263

6364
# Pull and tag Appworld X86_64 architecture image
6465
docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest agentscope/runtime-sandbox-appworld:latest
6566
```
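
The two pull commands above differ only in the image tag. As a minimal sketch, the choice can be derived from the host architecture. The helper name `sandbox_image` is hypothetical and not part of AgentScope Runtime; the registry prefix and tags are copied from the commands above.

```python
import platform

# Registry prefix copied from the pull commands above.
REGISTRY = "agentscope-registry.ap-southeast-1.cr.aliyuncs.com"


def sandbox_image(env, machine=None):
    """Return (remote, local) image names for `env`, e.g. "appworld".

    Hypothetical helper: picks the `latest-arm64` tag on ARM64 hosts
    and `latest` everywhere else.
    """
    machine = (machine or platform.machine()).lower()
    tag = "latest-arm64" if machine in ("arm64", "aarch64") else "latest"
    local = f"agentscope/runtime-sandbox-{env}:{tag}"
    return f"{REGISTRY}/{local}", local


remote, local = sandbox_image("appworld")
# Print the shell command instead of running it, so nothing is pulled here.
print(f"docker pull {remote} && docker tag {remote} {local}")
```

Passing `machine` explicitly (for example `"aarch64"`) makes it possible to generate the command for a different host than the one running the script.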
6667

67-
### Verify Installation
68+
#### Verify Installation
6869

6970
You can verify that everything is set up correctly by calling `get_env_profile`. If the call succeeds, it returns a list of training IDs, and the example prints the first one:
7071

71-
```{code-cell}
72-
from agentscope_runtime.sandbox.box.training_box.training_box import (
73-
TrainingSandbox,
74-
)
72+
```python
73+
from agentscope_runtime.sandbox.box.training_box.training_box import APPWorldSandbox
7574

76-
with TrainingSandbox() as box:
77-
profile_list = box.get_env_profile(env_type="appworld", split="train")
78-
print(profile_list[0])
75+
box = APPWorldSandbox()
80+
profile_list = box.get_env_profile(env_type="appworld", split="train")
81+
print(profile_list[0])
7982
```
8083

81-
### (Optional) Built the Docker Images from Scratch
84+
#### (Optional) Build the Docker Image from Scratch
8285

8386
If you prefer to build images locally via `Dockerfile` or need custom modifications, you can build them from scratch.
8487
Please refer to {doc}`sandbox_advanced` for detailed instructions.
@@ -92,30 +95,25 @@ For Appworld:
9295
docker build -f src/agentscope_runtime/sandbox/box/training_box/environments/appworld/Dockerfile -t agentscope/runtime-sandbox-appworld:latest .
9396
```
9497

95-
## Utilize Training Sample from Sandbox
96-
97-
You can create a specific training sandbox (default is `Appworld`), then create multiple different training samples in
98-
parallel, and execute and evaluate them separately.
99-
100-
### Review Dataset Sample
98+
#### Review Dataset Sample
10199

102100
After building the Docker image, we first review the dataset samples.
103101

104102
For example, we can use the `get_env_profile` method to get a list of training IDs.
105103

106-
```{code-cell}
104+
```python
107105
from agentscope_runtime.sandbox.box.training_box.training_box import (
108-
TrainingSandbox,
106+
APPWorldSandbox,
109107
)
110108

111109
# Create the training sandbox
112-
box = TrainingSandbox()
110+
box = APPWorldSandbox()
113111

114112
profile_list = box.get_env_profile(env_type="appworld", split="train")
115113
print(profile_list)
116114
```
117115

118-
## Get training Sample Query
116+
#### Get Training Sample Query
119117

120118
We can select one task from the training set as an example and display its query along with the system prompt using
121119
the `create_instance` method.
@@ -126,7 +124,7 @@ instances for parallel training.
126124
The prompt (`system prompt`) and actual question (`user prompt`) provided by the training set will be returned as a
127125
`Message List`, located in the state of the return value.
128126

129-
```{code-cell}
127+
```python
130128

131129
profile_list = box.get_env_profile(env_type="appworld", split="train")
132130
init_response = box.create_instance(
@@ -139,15 +137,15 @@ print(f"Created instance {instance_id} with query: {query}")
139137

140138
```
141139

142-
## Agent Action Step
140+
#### Agent Action Step
143141

144142
We first feed the initial state information to the LLM agent and then pass the agent's response back to the Sandbox. This
145143
exchange can be repeated with the `step` method. An `instance_id` is required to identify each session during
146144
training or inference. A basic reward is returned after each step.
147145

148146
This method currently only supports input in `Message` format; we recommend sending it with `"role": "assistant"`.
149147

150-
```{code-cell}
148+
```python
151149
action = {
152150
"role": "assistant",
153151
"content": "```python\nprint('hello appworld!!')\n```",
@@ -161,12 +159,12 @@ print(result)
161159

162160
```
163161

164-
## Eval Trajectory
162+
#### Eval Trajectory
165163

166164
Use the `evaluate` method to assess the status of an instance and obtain a `Reward`. Different datasets may have
167165
additional evaluation parameters, passed through `params`.
168166

169-
```{code-cell}
167+
```python
170168
action = {
171169
"role": "assistant",
172170
"content": "```python\nprint('hello appworld!!')\n```",
@@ -179,12 +177,144 @@ result = box.step(
179177
print(result)
180178
```
181179

182-
## Release Sample
180+
#### Release Sample
183181

184182
You can also release an instance manually with the `release_instance` method if needed.
185183
Otherwise, instances are auto-released after 5 minutes.
186184

187-
```{code-cell}
185+
```python
188186
success = box.release_instance(instance_id)
189187
print(f"Instance released: {success}")
190188
```
189+
### BFCL Example
190+
#### Prepare Docker Image
191+
Pull the image from DockerHub. If the pull fails, we also provide a script for building the Docker image
192+
locally.
193+
194+
To ensure a complete sandbox experience with all features enabled, follow the steps below to pull and tag the necessary
195+
Docker images from our repository:
196+
197+
```{note}
198+
**Image Source: Alibaba Cloud Container Registry**
199+
200+
All Docker images are hosted on Alibaba Cloud Container Registry (ACR) for optimal performance and reliability worldwide. Images are pulled from ACR and tagged with standard names for seamless integration with the AgentScope runtime environment.
201+
```
202+
203+
```bash
204+
# Pull and tag BFCL ARM64 architecture image
205+
docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest-arm64 && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest-arm64 agentscope/runtime-sandbox-bfcl:latest-arm64
206+
207+
# Pull and tag BFCL X86 architecture image
208+
docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest agentscope/runtime-sandbox-bfcl:latest
209+
```
210+
211+
<details><summary> (Optional) Building your own docker image</summary>
212+
From the repository root, run the following command:
213+
214+
```bash
215+
docker build -f src/agentscope_runtime/sandbox/box/training_box/environments/bfcl/Dockerfile -t agentscope/runtime-sandbox-bfcl:latest .
216+
```
217+
218+
</details>
219+
220+
#### Initialize
221+
BFCL has multiple sub-datasets: *all, all_scoring, multi_turn, single_turn, live, non_live, non_python, python*.
222+
Decide which subset to test before initializing the sandbox. `OPENAI_API_KEY` is required for the evaluation process.
223+
224+
225+
```python
226+
227+
# Choose the subset and make sure OPENAI_API_KEY is set if you need to step through and evaluate samples.
228+
import os
229+
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before creating the sandbox"
230+
os.environ["DATASET_SUB_TYPE"] = "multi_turn"
231+
# os.environ["DATASET_SUB_TYPE"] can be one of the following: "all","all_scoring","multi_turn","single_turn","live","non_live","non_python","python"
232+
233+
from agentscope_runtime.sandbox.box.training_box.training_box import BFCLSandbox
234+
235+
#initialize sandbox
236+
box = BFCLSandbox()
237+
profile_list = box.get_env_profile(env_type="bfcl")
238+
init_response = box.create_instance(
239+
env_type="bfcl",
240+
task_id=profile_list[0],
241+
)
242+
inst_id = init_response["info"]["instance_id"]
243+
query = init_response["state"]
244+
```
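
Because the subset is passed through an environment variable, a misspelled name is easy to miss. The sketch below validates the name against the eight subsets listed above before exporting it. The helper `configure_bfcl` is hypothetical and not part of AgentScope Runtime; only `DATASET_SUB_TYPE` and `OPENAI_API_KEY` come from this guide.

```python
import os

# Subset names as listed in the Initialize section above.
BFCL_SUBSETS = (
    "all", "all_scoring", "multi_turn", "single_turn",
    "live", "non_live", "non_python", "python",
)


def configure_bfcl(subset):
    """Validate `subset`, export it as DATASET_SUB_TYPE, and report key status.

    Hypothetical helper: returns True when OPENAI_API_KEY is present,
    which this guide says is needed for stepping and evaluation.
    """
    if subset not in BFCL_SUBSETS:
        raise ValueError(
            f"unknown BFCL subset {subset!r}; expected one of {BFCL_SUBSETS}"
        )
    os.environ["DATASET_SUB_TYPE"] = subset
    return bool(os.environ.get("OPENAI_API_KEY"))


has_key = configure_bfcl("multi_turn")
print(f"DATASET_SUB_TYPE={os.environ['DATASET_SUB_TYPE']}, key set: {has_key}")
```

Call it before constructing `BFCLSandbox()`, since the guide sets the environment variables ahead of sandbox initialization.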
245+
246+
247+
#### Agent Action Step
248+
The following messages are a simulated sample to start the action step:
249+
<details>
250+
<summary>Click to show messages</summary>
251+
252+
```python
253+
254+
ASSISTANT_MESSAGES = [
255+
# ── Turn-1 ──
256+
{
257+
"role": "assistant",
258+
"content": '<tool_call>\n{"name": "cd", "arguments": {"folder": "document"}}\n</tool_call>\n<tool_call>\n{"name": "mkdir", "arguments": {"dir_name": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "mv", "arguments": {"source": "final_report.pdf", "destination": "temp"}}\n</tool_call>'
259+
},
260+
{
261+
"role": "assistant",
262+
"content": 'ok.1'
263+
},
264+
# ── Turn-2 ──
265+
{
266+
"role": "assistant",
267+
"content": '<tool_call>\n{"name": "cd", "arguments": {"folder": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "grep", "arguments": {"file_name": "final_report.pdf", "pattern": "budget analysis"}}\n</tool_call>'
268+
},
269+
{
270+
"role": "assistant",
271+
"content": 'ok.2'
272+
},
273+
# ── Turn-3 ──
274+
{
275+
"role": "assistant",
276+
"content": '<tool_call>\n{"name": "sort", "arguments": {"file_name": "final_report.pdf"}}\n</tool_call>'
277+
},
278+
{
279+
"role": "assistant",
280+
"content": 'ok.2'
281+
},
282+
# ── Turn-4 ──
283+
{
284+
"role": "assistant",
285+
"content": '<tool_call>\n{"name": "cd", "arguments": {"folder": ".."}}\n</tool_call>\n<tool_call>\n{"name": "mv", "arguments": {"source": "previous_report.pdf", "destination": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "cd", "arguments": {"folder": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "diff", "arguments": {"file_name1": "final_report.pdf", "file_name2": "previous_report.pdf"}}\n</tool_call>'
286+
},
287+
{
288+
"role": "assistant",
289+
"content": 'ok.2'
290+
},
291+
]
292+
293+
```
294+
295+
</details>
296+
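
The simulated messages pack several calls into one `content` string. As a rough illustration of that layout, the sketch below extracts the calls from such a string, assuming each `<tool_call>` block wraps exactly one JSON object with `name` and `arguments` keys, as in the sample above. The function `parse_tool_calls` is hypothetical; the sandbox's own parser may behave differently.

```python
import json
import re

# Matches the text between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(content):
    """Return the JSON objects found in each <tool_call> block, in order."""
    return [json.loads(block) for block in TOOL_CALL_RE.findall(content)]


# First two calls of the Turn-1 message above.
sample = (
    '<tool_call>\n{"name": "cd", "arguments": {"folder": "document"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "mkdir", "arguments": {"dir_name": "temp"}}\n</tool_call>'
)

for call in parse_tool_calls(sample):
    print(call["name"], call["arguments"])
```

A plain-text reply such as `'ok.1'` contains no `<tool_call>` block, so the function returns an empty list for it.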
297+
```python
298+
for turn_no, msg in enumerate(ASSISTANT_MESSAGES, 1):
299+
res = box.step(
300+
inst_id,
301+
msg
302+
)
303+
print(
304+
f"\n[TURN {turn_no}] term={res['is_terminated']} "
305+
f"reward={res['reward']}\n state: {res.get('state', {})}"
306+
)
307+
if res["is_terminated"]:
308+
break
309+
```
310+
311+
#### Evaluate
312+
```python
313+
score = box.evaluate(inst_id, params={"sparse": True})
314+
print(f"\n[RESULT] sparse_score = {score}")
315+
316+
```
317+
#### Release Instance
318+
```python
319+
box.release_instance(inst_id)
320+
```
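
The create, step, evaluate, and release calls shown above can be combined into one function that always releases the instance, even if a step raises. This is a minimal sketch under stated assumptions: `run_episode` and `FakeBox` are hypothetical, `FakeBox` is an in-memory stand-in so the example runs without Docker, and its return values only mimic the keys used in this guide (`info.instance_id`, `is_terminated`, `reward`).

```python
def run_episode(box, env_type, task_id, actions, eval_params=None):
    """Create an instance, replay `actions`, evaluate, and always release."""
    init = box.create_instance(env_type=env_type, task_id=task_id)
    inst_id = init["info"]["instance_id"]
    try:
        for action in actions:
            res = box.step(inst_id, action)
            if res["is_terminated"]:
                break
        return box.evaluate(inst_id, params=eval_params or {})
    finally:
        # Runs on success and on error, so no instance is left behind.
        box.release_instance(inst_id)


class FakeBox:
    """Hypothetical stand-in for a sandbox; terminates after two steps."""

    def __init__(self):
        self.steps = 0
        self.released = []

    def create_instance(self, env_type, task_id):
        return {"info": {"instance_id": f"{env_type}-{task_id}"}, "state": []}

    def step(self, instance_id, action):
        self.steps += 1
        return {"is_terminated": self.steps >= 2, "reward": 0.0, "state": {}}

    def evaluate(self, instance_id, params=None):
        return 1.0  # placeholder score; the real return type may differ

    def release_instance(self, instance_id):
        self.released.append(instance_id)
        return True


box = FakeBox()
messages = [{"role": "assistant", "content": "ok"}] * 3
score = run_episode(box, "bfcl", "task_0", messages, {"sparse": True})
print(score, box.released)
```

With a real sandbox, `box` would be the `BFCLSandbox()` or `APPWorldSandbox()` object and `actions` the assistant messages to replay.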
