@@ -27,10 +27,9 @@ training, tools for dataset calling, and real-time Reward verification.
The Training Sandbox primarily implements high-concurrency data calling through Ray, supporting external Agents to
create, execute, and evaluate instances for different samples after sandbox creation.

- + [APPWorld](https://github.com/StonyBrookNLP/appworld): APPWorld is an advanced, high-fidelity environment designed to
-   test and evaluate autonomous AI agents' ability to perform complex, multi-step tasks using realistic APIs and user
-   scenarios. It serves as a crucial testing ground for the AI agents, allowing them to learn and adapt to real-world
-   scenarios.
+ + [APPWorld](https://github.com/StonyBrookNLP/appworld): APPWorld is an advanced, high-fidelity environment designed to test and evaluate autonomous AI agents' ability to perform complex, multi-step tasks using realistic APIs and user
+   scenarios. It serves as a crucial testing ground for the AI agents, allowing them to learn and adapt to real-world scenarios.
+ + [BFCL](https://github.com/ShishirPatil/gorilla): Berkeley Function Calling Leaderboard (BFCL) is the first comprehensive and executable function call evaluation dedicated to assessing Large Language Models' (LLMs) ability to invoke functions. Unlike previous evaluations, BFCL accounts for various forms of function calls, diverse scenarios, and executability.

## Install

@@ -42,7 +41,9 @@ First, install AgentScope Runtime with sandbox support:
pip install "agentscope-runtime[sandbox]"
```

- ## Prepare Docker Image
+ ### Appworld Example
+
+ #### Prepare Docker Image

Pull the image from DockerHub. If the pull fails, you can instead build the Docker image locally with the script we
provide.
@@ -58,27 +59,29 @@ All Docker images are hosted on Alibaba Cloud Container Registry (ACR) for optim

```bash
# Pull and tag Appworld ARM64 architecture image
- docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm64 && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm agentscope/runtime-sandbox-appworld:latest-arm
+ docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm64 && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest-arm64 agentscope/runtime-sandbox-appworld:latest-arm64

# Pull and tag Appworld X86_64 architecture image
docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-appworld:latest agentscope/runtime-sandbox-appworld:latest
```
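
The two pull commands differ only in the tag suffix, so the matching local tag can be derived from the host architecture. A minimal sketch, assuming only the `latest` and `latest-arm64` tags shown above; the helper name `sandbox_image_tag` is hypothetical and not part of AgentScope Runtime:

```python
import platform


def sandbox_image_tag(env_name, machine=None):
    """Return the local image tag for a training sandbox environment.

    `env_name` is e.g. "appworld"; `machine` defaults to the host
    architecture as reported by platform.machine().
    """
    machine = (machine or platform.machine()).lower()
    # ARM hosts typically report "arm64" (macOS) or "aarch64" (Linux).
    suffix = "-arm64" if machine in {"arm64", "aarch64"} else ""
    return f"agentscope/runtime-sandbox-{env_name}:latest{suffix}"


print(sandbox_image_tag("appworld"))
```

Passing `machine` explicitly makes the choice reproducible, for example when preparing images for a different host.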

- ### Verify Installation
+ #### Verify Installation

You can verify that everything is set up correctly by calling `get_env_profile`. If the call succeeds, you get a training ID:

- ```{code-cell}
- from agentscope_runtime.sandbox.box.training_box.training_box import (
-     TrainingSandbox,
- )
+ ```python
+ from agentscope_runtime.sandbox.box.training_box.training_box import APPWorldSandbox

- with TrainingSandbox() as box:
-     profile_list = box.get_env_profile(env_type="appworld", split="train")
-     print(profile_list[0])
+ box = APPWorldSandbox()
+ profile_list = box.get_env_profile(env_type="appworld", split="train")
+ print(profile_list[0])
```

- ### (Optional) Built the Docker Images from Scratch
+ #### (Optional) Build the Docker Image from Scratch

If you prefer to build images locally via `Dockerfile` or need custom modifications, you can build them from scratch.
Please refer to {doc}`sandbox_advanced` for detailed instructions.
@@ -92,30 +95,25 @@ For Appworld:
docker build -f src/agentscope_runtime/sandbox/box/training_box/environments/appworld/Dockerfile -t agentscope/runtime-sandbox-appworld:latest .
```

- ## Utilize Training Sample from Sandbox
-
- You can create a specific training sandbox (default is `Appworld`), then create multiple different training samples in
- parallel, and execute and evaluate them separately.
-
- ### Review Dataset Sample
+ #### Review Dataset Sample

After building the Docker image, we first review the dataset samples.

For example, we can use the `get_env_profile` method to get a list of training IDs.

- ```{code-cell}
+ ```python
from agentscope_runtime.sandbox.box.training_box.training_box import (
-     TrainingSandbox,
+     APPWorldSandbox,
)

# create training sandbox
- box = TrainingSandbox()
+ box = APPWorldSandbox()

profile_list = box.get_env_profile(env_type='appworld', split='train')
print(profile_list)
```

- ## Get training Sample Query
+ #### Get Training Sample Query

We can select one task from the training set as an example and display its query along with the system prompt using
the `create_instance` method.
@@ -126,7 +124,7 @@ instances for parallel training.
The prompt (`system prompt`) and actual question (`user prompt`) provided by the training set will be returned as a
`Message List`, located in the state of the return value.

- ```{code-cell}
+ ```python

profile_list = box.get_env_profile(env_type="appworld", split="train")
init_response = box.create_instance(
@@ -139,15 +137,15 @@ print(f"Created instance {instance_id} with query: {query}")

```

- ## Agent Action Step
+ #### Agent Action Step

We first feed the initial state to the LLM agent, then send its response back to the sandbox with the `step` method.
This exchange can be repeated as often as needed. The `instance_id` is required to identify the session during
training or inference. A basic reward is returned after each step.

This method currently accepts only `Message`-format input; `"role": "assistant"` is recommended.

- ```{code-cell}
+ ```python
action = {
    "role": "assistant",
    "content": "```python\nprint('hello appworld!!')\n```",
@@ -161,12 +159,12 @@ print(result)

```

- ## Eval Trajectory
+ #### Eval Trajectory

Use the `evaluate` method to assess the status of an instance and obtain a `Reward`. Different datasets may have
additional evaluation parameters, passed through `params`.

- ```{code-cell}
+ ```python
action = {
    "role": "assistant",
    "content": "```python\nprint('hello appworld!!')\n```",
@@ -179,12 +177,144 @@ result = box.step(
print(result)
```

- ## Release Sample
+ #### Release Sample

You can also release instances manually with the `release_instance` method if needed.
Otherwise, instances are auto-released after 5 minutes.

- ```{code-cell}
+ ```python
success = box.release_instance(instance_id)
print(f"Instance released: {success}")
```
+ ### BFCL Example
+ #### Prepare Docker Image
+ Pull the image from DockerHub. If the pull fails, you can instead build the Docker image locally with the script we
+ provide.
+
+ To ensure a complete sandbox experience with all features enabled, follow the steps below to pull and tag the necessary
+ Docker images from our repository:
+
+ ```{note}
+ **Image Source: Alibaba Cloud Container Registry**
+
+ All Docker images are hosted on Alibaba Cloud Container Registry (ACR) for optimal performance and reliability worldwide. Images are pulled from ACR and tagged with standard names for seamless integration with the AgentScope runtime environment.
+ ```
+
+ ```bash
+ # Pull and tag BFCL ARM64 architecture image
+ docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest-arm64 && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest-arm64 agentscope/runtime-sandbox-bfcl:latest-arm64
+
+ # Pull and tag BFCL X86 architecture image
+ docker pull agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest && docker tag agentscope-registry.ap-southeast-1.cr.aliyuncs.com/agentscope/runtime-sandbox-bfcl:latest agentscope/runtime-sandbox-bfcl:latest
+ ```
+
+ <details><summary>(Optional) Building your own Docker image</summary>
+ From the repository root, run the following command:
+
+ ```bash
+ docker build -f src/agentscope_runtime/sandbox/box/training_box/environments/bfcl/Dockerfile -t agentscope/runtime-sandbox-bfcl:latest .
+ ```
+
+ </details>
+
+ #### Initialize
+ BFCL has multiple sub-datasets: *all, all_scoring, multi_turn, single_turn, live, non_live, non_python, python*.
+ Decide which subset to test before initializing the sandbox. `OPENAI_API_KEY` is required for the evaluation process.
+
+
+ ```python
+
+ # Choose the subset, and make sure the OpenAI key is set if you need to step and evaluate samples.
+ import os
+ if not os.environ.get("OPENAI_API_KEY"):
+     raise RuntimeError("Set OPENAI_API_KEY before creating the sandbox.")
+ os.environ["DATASET_SUB_TYPE"] = "multi_turn"
+ # os.environ["DATASET_SUB_TYPE"] can be one of the following: "all","all_scoring","multi_turn","single_turn","live","non_live","non_python","python"
+
+ from agentscope_runtime.sandbox.box.training_box.training_box import BFCLSandbox
+
+ # initialize sandbox
+ box = BFCLSandbox()
+ profile_list = box.get_env_profile(env_type="bfcl")
+ init_response = box.create_instance(
+     env_type="bfcl",
+     task_id=profile_list[0],
+ )
+ inst_id = init_response["info"]["instance_id"]
+ query = init_response["state"]
+ ```
+
+
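
Because the subset is read from an environment variable, a typo in `DATASET_SUB_TYPE` may only surface later, inside the sandbox. A minimal sketch of an up-front check, assuming the eight subset names listed above are the complete set; `check_bfcl_env` is a hypothetical helper and not part of AgentScope Runtime:

```python
import os

# Subset names as listed in this section (assumed to be exhaustive).
BFCL_SUBSETS = {
    "all", "all_scoring", "multi_turn", "single_turn",
    "live", "non_live", "non_python", "python",
}


def check_bfcl_env(env=None):
    """Validate the BFCL-related variables and return the chosen subset."""
    env = os.environ if env is None else env
    sub_type = env.get("DATASET_SUB_TYPE")
    if sub_type not in BFCL_SUBSETS:
        raise ValueError(
            f"DATASET_SUB_TYPE={sub_type!r} is not one of {sorted(BFCL_SUBSETS)}"
        )
    if not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY must be set for stepping and evaluation")
    return sub_type


# Example with an explicit mapping instead of the real environment.
print(check_bfcl_env({"DATASET_SUB_TYPE": "multi_turn", "OPENAI_API_KEY": "sk-placeholder"}))
```

Calling `check_bfcl_env()` with no argument inspects the real process environment, so it can be run just before the sandbox is created.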
+ #### Agent Action Step
+ The following messages are a simulated sample to start the action step:
+ <details>
+ <summary>Click to show messages</summary>
+
+ ```python
+
+ ASSISTANT_MESSAGES = [
+     # ── Turn-1 ──
+     {
+         "role": "assistant",
+         "content": '<tool_call>\n{"name": "cd", "arguments": {"folder": "document"}}\n</tool_call>\n<tool_call>\n{"name": "mkdir", "arguments": {"dir_name": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "mv", "arguments": {"source": "final_report.pdf", "destination": "temp"}}\n</tool_call>'
+     },
+     {
+         "role": "assistant",
+         "content": 'ok.1'
+     },
+     # ── Turn-2 ──
+     {
+         "role": "assistant",
+         "content": '<tool_call>\n{"name": "cd", "arguments": {"folder": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "grep", "arguments": {"file_name": "final_report.pdf", "pattern": "budget analysis"}}\n</tool_call>'
+     },
+     {
+         "role": "assistant",
+         "content": 'ok.2'
+     },
+     # ── Turn-3 ──
+     {
+         "role": "assistant",
+         "content": '<tool_call>\n{"name": "sort", "arguments": {"file_name": "final_report.pdf"}}\n</tool_call>'
+     },
+     {
+         "role": "assistant",
+         "content": 'ok.2'
+     },
+     # ── Turn-4 ──
+     {
+         "role": "assistant",
+         "content": '<tool_call>\n{"name": "cd", "arguments": {"folder": ".."}}\n</tool_call>\n<tool_call>\n{"name": "mv", "arguments": {"source": "previous_report.pdf", "destination": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "cd", "arguments": {"folder": "temp"}}\n</tool_call>\n<tool_call>\n{"name": "diff", "arguments": {"file_name1": "final_report.pdf", "file_name2": "previous_report.pdf"}}\n</tool_call>'
+     },
+     {
+         "role": "assistant",
+         "content": 'ok.2'
+     },
+ ]
+
+ ```
+
+ </details>
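
Each tool-calling message above packs one or more JSON payloads into `<tool_call>` tags. A minimal sketch of how such content can be split into dictionaries, for example to log what an agent asked for; `parse_tool_calls` is a hypothetical helper, and the parser BFCL itself uses may differ:

```python
import json
import re

# Captures the JSON payload between <tool_call> and </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(content):
    """Return the tool calls embedded in an assistant message as dicts."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(content)]


message = {
    "role": "assistant",
    "content": (
        '<tool_call>\n{"name": "cd", "arguments": {"folder": "document"}}\n</tool_call>\n'
        '<tool_call>\n{"name": "mkdir", "arguments": {"dir_name": "temp"}}\n</tool_call>'
    ),
}

for call in parse_tool_calls(message["content"]):
    print(call["name"], call["arguments"])
```

Messages without tags, such as the plain `ok.1` replies, simply yield an empty list.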
+
+ ```python
+ for turn_no, msg in enumerate(ASSISTANT_MESSAGES, 1):
+     res = box.step(
+         inst_id,
+         msg
+     )
+     print(
+         f"\n[TURN {turn_no}] term={res['is_terminated']} "
+         f"reward={res['reward']}\nstate: {res.get('state', {})}"
+     )
+     if res["is_terminated"]:
+         break
+ ```
+
+ #### Evaluate
+ ```python
+ score = box.evaluate(inst_id, params={"sparse": True})
+ print(f"\n[RESULT] sparse_score = {score}")
+
+ ```
+ #### Release Instance
+ ```python
+ box.release_instance(inst_id)
+ ```
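
The step, evaluate, and release calls can be wrapped so that an instance is released even when a step raises. A minimal sketch, assuming `step` returns a dict with an `is_terminated` key as in the loop above; `run_episode` and the stand-in `FakeBox` are hypothetical and exist only so the example runs without Docker:

```python
class FakeBox:
    """Stand-in for a sandbox; terminates after a fixed number of steps."""

    def __init__(self, max_steps=2):
        self.max_steps = max_steps
        self.steps = 0
        self.released = []

    def step(self, instance_id, action):
        self.steps += 1
        return {"is_terminated": self.steps >= self.max_steps, "reward": 0.0}

    def evaluate(self, instance_id, params=None):
        return 1.0

    def release_instance(self, instance_id):
        self.released.append(instance_id)
        return True


def run_episode(box, instance_id, messages, eval_params=None):
    """Step through `messages`, evaluate, and always release the instance."""
    try:
        for msg in messages:
            res = box.step(instance_id, msg)
            if res["is_terminated"]:
                break
        return box.evaluate(instance_id, params=eval_params)
    finally:
        # Runs on success and on error alike.
        box.release_instance(instance_id)


box = FakeBox()
messages = [{"role": "assistant", "content": "ok"}] * 3
print(run_episode(box, "demo-instance", messages, eval_params={"sparse": True}))
```

With a real sandbox, `box` would be the `BFCLSandbox` instance and `instance_id` the value returned by `create_instance`.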