Commit 5004530

add documentation

1 parent 1a37f37 commit 5004530

2 files changed: +75 −2 lines changed

docs/explorer/How-To-Guide/search_and_filter.md

Lines changed: 3 additions & 2 deletions
@@ -1,11 +1,12 @@
 # Search
 
 Often, there is a large number of traces in the dataset. In this section, we describe how to search over these traces efficiently
-and filter them according to different criteria.
+and filter them according to different criteria. For instance, searching for `maps` would return all traces containing the word `maps` somewhere in the trace.
 
 ### Exact search
 
-Any text put in the search box, including spaces, is searched for exactly (but case-insensitively) in the trace.
+The simplest form of search is an exact search, which looks for an exact string (but case-insensitively) in the trace.
+The set of traces containing the query is then displayed: ![search](assets/images/search_screenshot.png)
 
 ### Filters
 
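The exact search described above behaves like a case-insensitive substring match over each serialized trace. A minimal Python sketch of that behavior (the `matches` helper and the `json.dumps`-based serialization are illustrative assumptions, not the explorer's actual implementation):

```python
import json

def matches(trace: list[dict], query: str) -> bool:
    """Case-insensitive exact (substring) match anywhere in the trace."""
    # Serialize each event so the query can match any field, not just `content`.
    return any(query.lower() in json.dumps(event).lower() for event in trace)

trace = [
    {"role": "user", "content": "Open Maps and find a cafe."},
    {"role": "assistant", "content": "Sure, opening it now."},
]
assert matches(trace, "maps")        # case-insensitive hit
assert not matches(trace, "weather") # no occurrence anywhere in the trace
```

Note that spaces are part of the query, so `"a cafe"` would match above while `"a  cafe"` would not.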

docs/explorer/traces_datasets.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
---
title: Traces and Datasets
---

# Traces

<div class='subtitle'>Format used to represent actions and interactions of an AI agent</div>

Running an agent produces a trace, which is a sequence of events such as user messages, assistant messages, tool calls, and tool outputs.

## Types of events in traces

The user can prompt the agent with a question, e.g. "Hello, how are you?":
> <b>User message</b> <br/><br/>
The trace typically starts with a user message, which is the prompt given to the agent. This is sometimes the only interaction between the user and the agent, which then autonomously tries to solve the task, but such messages can also occur in the middle of the trace (e.g. to provide feedback).
```json
{
    "role": "user",
    "content": "Hello, how are you?"
}
```

The agent can reply to a question, e.g. "Thanks, I am doing great!":
> <b>Assistant message</b> <br/><br/>
Similarly to the user message, the assistant message is the response of the agent to the user, which can also contain the agent's internal thoughts and reasoning for performing actions.
```json
{
    "role": "assistant",
    "content": "Thanks, I am doing great!"
}
```

The agent can decide to use tools to take actions in the real world. For example, the agent could decide to send an email to [email protected] with the subject "Running late, sorry!":
> <b>Tool calls</b> <br/><br/>
Tool calls are special actions that the agent performs to solve the task.
Here, in addition to the `role` and `content` fields (same as in the assistant message), the `tool_calls` field is a list used to represent the tool calls made by the agent.
Each tool call is a dictionary with a `type` field, which indicates the type of tool call, and a `function` field, which is a dictionary containing the name of the function to call and its arguments.
Arguments can be passed either as a dictionary or as a JSON string.
```json
{
    "role": "assistant",
    "content": "Sending an email to your mom now.",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "send_email",
                "arguments": {
                    "subject": "Running late, sorry!"
                }
            }
        }
    ]
}
```
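Because arguments may arrive either as a dictionary or as a JSON-encoded string, consumers of a trace typically normalize them to a dictionary before use. A minimal Python sketch (the `parse_arguments` helper is illustrative, not part of any trace API):

```python
import json

def parse_arguments(raw):
    """Return tool-call arguments as a dict, decoding JSON strings if needed."""
    return json.loads(raw) if isinstance(raw, str) else raw

# Both representations normalize to the same dictionary.
as_dict = parse_arguments({"subject": "Running late, sorry!"})
as_string = parse_arguments('{"subject": "Running late, sorry!"}')
assert as_dict == as_string == {"subject": "Running late, sorry!"}
```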

After the tool calls are executed, the agent can observe the output of the tool call, e.g. "Email sent successfully.":
> <b>Tool outputs</b> <br/><br/>
Tool outputs are the results of the tool calls. Here, the `content` field contains the output of the tool call.
```json
{
    "role": "tool",
    "content": "Email sent successfully."
}
```
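Taken together, a trace is just a list of role-tagged dictionaries, which makes it straightforward to scan programmatically. A minimal Python sketch combining the four event types above (the extraction logic is illustrative, not a library API):

```python
trace = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Thanks, I am doing great!"},
    {
        "role": "assistant",
        "content": "Sending an email to your mom now.",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "send_email",
                    "arguments": {"subject": "Running late, sorry!"},
                },
            }
        ],
    },
    {"role": "tool", "content": "Email sent successfully."},
]

# Collect the names of all tools invoked anywhere in the trace.
tool_names = [
    call["function"]["name"]
    for event in trace
    for call in event.get("tool_calls", [])
]
assert tool_names == ["send_email"]
```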
68+
69+
## Datasets
70+
71+
A dataset is a collection of traces obtained by running an agent on a set of related tasks (e.g. coding tasks).
72+
For instance, this is dataset containing 500 traces which result from running the OpenHands agent on SWE-Bench: [https://explorer.invariantlabs.ai/u/invariant/swe-bench--OpenHands---CodeAct-v2-1--claude-3-5-sonnet-20241022-/t/4](https://explorer.invariantlabs.ai/u/invariant/swe-bench--OpenHands---CodeAct-v2-1--claude-3-5-sonnet-20241022-/t/4). Dataset can contain its own metadata, e.g. accuracy of the agent on the dataset.
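A dataset can thus be modeled as a list of traces plus a metadata dictionary. A minimal Python sketch (all field names and values here are illustrative placeholders, not the explorer's actual schema):

```python
# Hypothetical in-memory representation: metadata alongside a list of traces.
dataset = {
    "metadata": {"agent": "OpenHands", "benchmark": "SWE-Bench"},
    "traces": [
        [{"role": "user", "content": "Fix the failing test."}],
        [{"role": "user", "content": "Resolve the reported issue."}],
    ],
}

assert len(dataset["traces"]) == 2
assert dataset["metadata"]["agent"] == "OpenHands"
```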
