Commit 45baf9a

docs: rai simbench docs update (#665)
1 parent 4c31d5d commit 45baf9a

19 files changed, +509 -367 lines changed
docs/simulation_and_benchmarking/rai_bench.md

Lines changed: 2 additions & 3 deletions
@@ -73,16 +73,14 @@ score = (correctly_placed_now - correctly_placed_initially) / initially_incorrec
 
 You can find predefined scene configs in `rai_bench/manipulation_o3de/predefined/configs/`.
 
-Predefined scenarios can be imported like:
+Predefined scenarios can be imported, for example, choosing tasks by difficulty:
 
 ```python
 from rai_bench.manipulation_o3de import get_scenarios
 
 get_scenarios(levels=["easy", "medium"])
 ```
 
-Choose which task you want by selecting the difficulty, from trivial to very hard scenarios.
-
 ## Tool Calling Agent Benchmark
 
 Evaluates agent performance independently from any simulation, based only on tool calls that the agent makes. To make it independent from simulations, this benchmark introduces tool mocks which can be adjusted for different tasks. This makes the benchmark more universal and a lot faster.
@@ -106,6 +104,7 @@ The `Validator` class can combine single or multiple subtasks to create a single
 
 - OrderedCallsValidator - requires a strict order of subtasks. The next subtask will be validated only when the previous one was completed. Validator passes when all subtasks pass.
 - NotOrderedCallsValidator - doesn't enforce order of subtasks. Every subtask will be validated against every tool call. Validator passes when all subtasks pass.
+- OneFromManyValidator - passes when any one of the given subtasks passes.
 
 ### Task
 
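The pass conditions of the three validators listed in this diff can be illustrated with a minimal, standalone sketch. It does not use the `rai_bench` API: the helper names (`calls_tool`, `ordered_pass`, `not_ordered_pass`, `one_from_many_pass`) and the dict-based tool-call shape are hypothetical, chosen only to show the described semantics. In particular, ignoring unrelated tool calls between ordered subtasks is an assumption and may differ from the real `OrderedCallsValidator`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shapes, not the rai_bench types: a tool call is a plain dict,
# and a subtask is a predicate that says whether a tool call satisfies it.
ToolCall = Dict[str, Any]
Subtask = Callable[[ToolCall], bool]


def calls_tool(name: str) -> Subtask:
    """Build a subtask that passes when a tool with the given name is called."""
    return lambda call: call.get("name") == name


def ordered_pass(subtasks: List[Subtask], calls: List[ToolCall]) -> bool:
    """Strict order: a subtask is checked only after the previous one passed."""
    idx = 0
    for call in calls:
        # Assumption: calls that do not match the current subtask are skipped.
        if idx < len(subtasks) and subtasks[idx](call):
            idx += 1
    return idx == len(subtasks)


def not_ordered_pass(subtasks: List[Subtask], calls: List[ToolCall]) -> bool:
    """No order: every subtask is checked against every call; all must pass."""
    return all(any(st(c) for c in calls) for st in subtasks)


def one_from_many_pass(subtasks: List[Subtask], calls: List[ToolCall]) -> bool:
    """Passes as soon as any one of the given subtasks passes."""
    return any(any(st(c) for c in calls) for st in subtasks)


calls = [{"name": "get_pose"}, {"name": "move_arm"}]
wanted = [calls_tool("move_arm"), calls_tool("get_pose")]

print(ordered_pass(wanted, calls))      # False: get_pose came before move_arm
print(not_ordered_pass(wanted, calls))  # True: both tools were called
print(one_from_many_pass([calls_tool("grasp"), calls_tool("move_arm")], calls))  # True
```

The same list of subtasks fails the ordered check but passes the unordered one, which is the practical difference between the first two validators; the third only needs a single match.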