File: src/agentlab/benchmarks/osworld.md
Lines changed: 20 additions & 2 deletions
@@ -31,7 +31,7 @@ The main entry point `experiments/run_osworld.py` is currently configured with h
 2. **Environment Variables:**
    - `AGENTLAB_DEBUG=1`: Automatically runs the debug subset (7 tasks from `osworld_debug_task_ids.json`)
 
-### Running OSWorld Tasks
+### Task subsets
 
 We provide different subsets of tasks:
 
@@ -42,10 +42,28 @@ We provide different subsets of tasks:
 ### Example Commands
 
 ```bash
-# Run with default debug subset (7 tasks)
+# Run with default debug subset using sequential execution in a VMware VM
 python experiments/run_osworld.py
 ```
 
+### Parallel Execution with Docker
+To run OSWorld in parallel using Docker, ensure you have Docker installed and configured.
+To install it, follow the section from the OSWorld README on [Docker setup](https://github.com/xlang-ai/OSWorld?tab=readme-ov-file#docker-server-with-kvm-support-for-better-performance).
+Ensure that your Docker installation supports KVM, as OSWorld requires it for running VMs.
+We also recommend pulling the latest Docker image for OSWorld before running the benchmark:
+
+```bash
+docker pull happysixd/osworld-docker
+```
+
+After setting up Docker, you can change the `use_vmware` parameter in the script to `False` and run:
+
+```bash
+python experiments/run_osworld.py
+```
+You can control the number of parallel jobs by setting the `n_jobs` parameter in the script; the default is 4.
+We recommend setting `n_jobs` to `your_number_of_cpu_cores - 2` to leave some resources for the host system and the benchmark itself.
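
The two Docker prerequisites above (KVM support and an `n_jobs` value of CPU cores minus 2) can be checked before a run with a short script. The following is a minimal sketch, not part of the PR: the helper names `kvm_device_available` and `recommended_n_jobs` are hypothetical, the check assumes a Linux host where KVM is exposed as `/dev/kvm`, and `os.cpu_count()` reports logical CPUs, which may be higher than the number of physical cores.

```python
import os


def kvm_device_available(path: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and is readable and writable.

    Assumption: a Linux host exposing KVM at /dev/kvm. This checks the host
    only; it does not verify that Docker passes the device to containers.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def recommended_n_jobs(reserve: int = 2) -> int:
    """Return the CPU count minus `reserve`, never less than 1."""
    # os.cpu_count() may return None when the count cannot be determined.
    cores = os.cpu_count() or 1
    return max(1, cores - reserve)


if __name__ == "__main__":
    print(f"KVM device available: {kvm_device_available()}")
    print(f"Suggested n_jobs: {recommended_n_jobs()}")
```

The suggested value still has to be set by hand as the `n_jobs` parameter in `experiments/run_osworld.py`; the sketch only computes it.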