| 1 | +--- |
| 2 | +title: Traces and Datasets |
| 3 | +--- |
| 4 | + |
| 5 | +# Traces |
| 6 | + |
| 7 | +<div class='subtitle'>Format used to represent actions and interactions of an AI agent</div> |
| 8 | + |
| 9 | +Running an agent produces a trace, which is a sequence of events such as user messages, assistant messages, tool calls, and tool outputs. |
| 10 | + |
| 11 | +## Types of events in traces |
| 12 | + |
| 13 | +A user can prompt the agent with a question, e.g. "Hello, how are you?": |
| 14 | +> <b>User message</b> <br/><br/> |
| 15 | + The trace typically starts with a user message, which is the prompt given to the agent. Sometimes this is the only interaction between the user and the agent, which |
| 16 | + then tries to solve the task autonomously, but user messages can also occur in the middle of the trace (e.g. to provide feedback). |
| 17 | + ```json |
| 18 | + { |
| 19 | + "role": "user", |
| 20 | + "content": "Hello, how are you?" |
| 21 | + } |
| 22 | + ``` |
| 23 | + |
| 24 | +The agent can reply to a question, e.g. "Thanks, I am doing great!": |
| 25 | +> <b>Assistant message</b> <br/><br/> |
| 26 | + Similar to the user message, the assistant message is the agent's response to the user. It can also contain the agent's internal thoughts and its reasoning for performing actions. |
| 27 | + ```json |
| 28 | + { |
| 29 | + "role": "assistant", |
| 30 | + "content": "Thanks, I am doing great!" |
| 31 | + } |
| 32 | + ``` |
| 33 | + |
| 34 | +The agent can decide to use tools to take actions in the real world. For example, the agent could decide to send an email to [email protected] with the subject "Running late, sorry!": |
| 35 | +> <b>Tool calls</b> <br/><br/> |
| 36 | + Tool calls are special actions that the agent performs to solve the task. |
| 37 | + Here, in addition to the `role` and `content` fields (same as in the assistant message), the `tool_calls` field is a list used to represent the tool calls made by the agent. |
| 38 | + Each tool call is a dictionary with a `type` field, which indicates the type of tool call, and a `function` field, which is a dictionary containing the name of the function to call and its arguments. |
| 39 | + Arguments can be passed either as a dictionary or as a JSON string. |
| 40 | + ```json |
| 41 | + { |
| 42 | + "role": "assistant", |
| 43 | + "content": "Sending an email to your mom now.", |
| 44 | + "tool_calls": [ |
| 45 | + { |
| 46 | + "type": "function", |
| 47 | + "function": { |
| 48 | + "name": "send_email", |
| 49 | + "arguments": { |
| 50 | + |
| 51 | + "subject": "Running late, sorry!" |
| 52 | + } |
| 53 | + } |
| 54 | + } |
| 55 | + ] |
| 56 | + } |
| 57 | + ``` |
| 58 | + |
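Because arguments may arrive either as a dictionary or as a JSON string, code that reads traces usually needs to handle both forms. The following is a minimal sketch of one way to do that; `normalize_arguments` is a hypothetical helper name, not part of any documented API.

```python
import json
from typing import Any, Dict


def normalize_arguments(arguments: Any) -> Dict[str, Any]:
    """Return tool-call arguments as a dictionary (hypothetical helper).

    Accepts either a dictionary or a JSON string that encodes an object.
    """
    # A string is assumed to be JSON text; anything else is used as-is.
    parsed = json.loads(arguments) if isinstance(arguments, str) else arguments
    if not isinstance(parsed, dict):
        raise TypeError("tool-call arguments must decode to a dictionary")
    # Copy so that callers can modify the result without touching the trace.
    return dict(parsed)


# Both forms of the same arguments normalize to the same dictionary.
as_dict = normalize_arguments({"subject": "Running late, sorry!"})
as_string = normalize_arguments('{"subject": "Running late, sorry!"}')
```

Normalizing once, at the point where the trace is read, keeps later code free of type checks on the `arguments` field.
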
| 59 | +After the tool calls are executed, the agent can observe the output of the tool call, e.g. "Email sent successfully.": |
| 60 | +> <b>Tool outputs</b> <br/><br/> |
| 61 | + Tool outputs are the results of the tool calls. Here, the `content` field contains the output of the tool call. |
| 62 | + ```json |
| 63 | + { |
| 64 | + "role": "tool", |
| 65 | + "content": "Email sent successfully." |
| 66 | + } |
| 67 | + ``` |
| 68 | + |
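To see how the four event types fit together, the sketch below assembles the examples above into a single trace and labels each event. It assumes a trace is stored as a plain list of event dictionaries, which is one natural reading of "a sequence of events"; `event_kind` is a hypothetical helper, and the recipient argument of `send_email` is left out rather than guessed.

```python
from typing import Any, Dict, List


def event_kind(event: Dict[str, Any]) -> str:
    """Classify an event by its `role` and `tool_calls` fields (hypothetical helper)."""
    role = event.get("role")
    if role == "user":
        return "user message"
    if role == "assistant":
        # An assistant event with a non-empty `tool_calls` list is a tool call.
        return "tool calls" if event.get("tool_calls") else "assistant message"
    if role == "tool":
        return "tool output"
    raise ValueError(f"unknown role: {role!r}")


# A trace assumed to be a list of events, in the order they occurred.
trace: List[Dict[str, Any]] = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "Thanks, I am doing great!"},
    {
        "role": "assistant",
        "content": "Sending an email to your mom now.",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "send_email",
                    "arguments": {"subject": "Running late, sorry!"},
                },
            }
        ],
    },
    {"role": "tool", "content": "Email sent successfully."},
]

kinds = [event_kind(event) for event in trace]
```
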
| 69 | +## Datasets |
| 70 | + |
| 71 | +A dataset is a collection of traces obtained by running an agent on a set of related tasks (e.g. coding tasks). |
| 72 | +For instance, this is a dataset containing 500 traces that result from running the OpenHands agent on SWE-Bench: [https://explorer.invariantlabs.ai/u/invariant/swe-bench--OpenHands---CodeAct-v2-1--claude-3-5-sonnet-20241022-/t/4](https://explorer.invariantlabs.ai/u/invariant/swe-bench--OpenHands---CodeAct-v2-1--claude-3-5-sonnet-20241022-/t/4). A dataset can also contain its own metadata, e.g. the accuracy of the agent on the dataset. |
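
As a rough illustration, a dataset can be modeled as a named collection of traces plus a metadata dictionary. The `Dataset` class, its field names, and all values below are hypothetical and made up for the example; they do not describe the storage format of the dataset linked above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# A trace is assumed to be a list of event dictionaries.
Trace = List[Dict[str, Any]]


@dataclass
class Dataset:
    """Hypothetical container: a collection of traces with optional metadata."""

    name: str
    traces: List[Trace] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


# Toy dataset with two short traces; the accuracy value is invented.
dataset = Dataset(
    name="toy-dataset",
    traces=[
        [
            {"role": "user", "content": "Hello, how are you?"},
            {"role": "assistant", "content": "Thanks, I am doing great!"},
        ],
        [
            {"role": "user", "content": "Hello, how are you?"},
        ],
    ],
    metadata={"accuracy": 0.5},
)

# Simple statistics derived from the traces themselves.
summary = {
    "num_traces": len(dataset.traces),
    "num_events": sum(len(trace) for trace in dataset.traces),
}
```
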