File changed: content/en/llm_observability/experiments/_index.md (21 additions, 7 deletions)
````diff
@@ -23,13 +23,9 @@ LLM Observability [Experiments][9] supports the entire lifecycle of building LLM
 Install Datadog's LLM Observability Python SDK:
 
 ```shell
-pip install ddtrace>=3.14.0
+pip install ddtrace>=3.15.0
 ```
 
-### Cookbooks
-
-To see in-depth examples of what you can do with LLM Experiments, you can check these [jupyter notebooks][10]
-
 ### Setup
 
 Enable LLM Observability:
````
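As context for reviewers, here is a minimal sketch of the "Enable LLM Observability" step this hunk leads into, assuming the SDK's `LLMObs.enable` entry point; the app name and the environment variable below are illustrative placeholders, not content from this diff.

```python
import os

from ddtrace.llmobs import LLMObs

# Enable LLM Observability in agentless mode (no local Datadog Agent).
# "my-experiment-app" and DD_API_KEY are illustrative placeholders.
LLMObs.enable(
    ml_app="my-experiment-app",
    api_key=os.environ.get("DD_API_KEY"),
    agentless_enabled=True,
)
```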
````diff
@@ -221,6 +217,9 @@ Evaluators are functions that measure how well the model or agent performs by co
 - score: returns a numeric value (float)
 - categorical: returns a labeled category (string)
 
+### Summary Evaluators
+Summary evaluators are optionally defined functions that measure how well the model or agent performs overall, providing an aggregated score across the entire dataset, the outputs, and the evaluation results. The supported evaluator types are the same as above.
+
 ### Creating an experiment
 
 1. Load a dataset
````
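To make the evaluator types above concrete, here is a hedged sketch of one evaluator per return type, reusing the `input_data`/`output_data`/`expected_output` parameter names cited later in this diff. `exact_match` is the Boolean evaluator the summary-evaluator paragraph below refers to; the other two function bodies are invented illustrations.

```python
def exact_match(input_data, output_data, expected_output) -> bool:
    # Boolean evaluator: True when the output equals the expected output.
    return output_data == expected_output

def output_length(input_data, output_data, expected_output) -> float:
    # Score evaluator: returns a numeric value (float).
    return float(len(str(output_data)))

def length_category(input_data, output_data, expected_output) -> str:
    # Categorical evaluator: returns a labeled category (string).
    return "short" if len(str(output_data)) < 100 else "long"
```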
````diff
@@ -266,7 +265,17 @@ Evaluators are functions that measure how well the model or agent performs by co
        return fake_llm_call
    ```
    Evaluator functions can take any non-null type as `input_data` (string, number, Boolean, object, array); `output_data` and `expected_output` can be any type.
-   Evaluators can only return a string, number, Boolean.
+   Evaluators can only return a string, a number, or a Boolean.
+   If defined and provided to the experiment, summary evaluator functions are executed after all evaluators have finished running. Summary evaluator functions can take a list of any non-null type as `inputs` (string, number, Boolean, object, array); `outputs` and `expected_outputs` can be lists of any type. `evaluators_results` is a dictionary of lists of evaluator results, keyed by the name of the evaluator function. For example, in the above code snippet the summary evaluator `num_exact_matches` uses the results (a list of Booleans) from the `exact_match` evaluator to count the number of exact matches.
+   Summary evaluators can only return a string, a number, or a Boolean.
 
 6. Create and run the experiment.
    ```python
````
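A minimal sketch of the summary evaluator the added paragraph describes, assuming the `inputs`, `outputs`, `expected_outputs`, and `evaluators_results` parameters it cites; the exact signature is an assumption, since the code snippet the paragraph references is not shown in this diff.

```python
def num_exact_matches(inputs, outputs, expected_outputs, evaluators_results) -> float:
    # evaluators_results maps an evaluator's name to its list of per-row
    # results, so the exact_match entry is a list of Booleans to count.
    return float(sum(evaluators_results["exact_match"]))
```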
````diff
@@ -275,6 +284,7 @@ Evaluators are functions that measure how well the model or agent performs by co
````