`scripts/ai_assistance/README.md` (3 additions, 0 deletions):
```diff
@@ -4,6 +4,9 @@ This directory contains scripts for the prompt iteration & evaluation process fo
 Mainly, the `auto-run/auto-run.ts` script takes example URLs, runs the examples, and outputs the results to the `auto-run/data/` directory. Then, the HTML page in the `eval/` folder takes these results and presents them in a UI for evaluation.
 
+**NOTE: looking for the automatic evaluation suite?**
+
+As of September 2025, we also have an evaluation suite where we can define evaluations to apply to an output and have them automatically evaluated, including using an LLM as judge. See the README in `suite/` for more detail on this.
```
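For context, a hypothetical invocation of the existing flow described above (the entry point and argument shape are assumptions; check `auto-run/auto-run.ts` for the actual flags):

```sh
# Hypothetical invocation; the URL argument is illustrative.
node scripts/ai_assistance/auto-run/auto-run.ts https://example.com/my-test-page
# Results land in scripts/ai_assistance/auto-run/data/ and can then be
# reviewed via the HTML page in scripts/ai_assistance/eval/.
```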
`scripts/ai_assistance/suite/README.md` (10 additions, 3 deletions):
```diff
@@ -6,7 +6,7 @@ At this time, this is being heavily iterated on and may change rapidly. Chat to
 ## Getting started
 
-### 1: get the outputs from GCP
+### 1: download the outputs from GCP
 
 The actual output files you need to run the suite are hosted in a GCP bucket. The contents are fetched for you by `gclient sync` but only if you set the `checkout_ai_evals` arg in your `.gclient` config:
```
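For reference, a minimal sketch of what that `.gclient` config might look like, assuming the arg is set via `custom_vars`; the solution `name` and `url` shown here are illustrative, not taken from this diff:

```
solutions = [
  {
    "name": "devtools-frontend",
    "url": "https://chromium.googlesource.com/devtools/devtools-frontend.git",
    "custom_vars": {
      # Assumption: checkout_ai_evals is toggled via custom_vars.
      "checkout_ai_evals": True,
    },
  },
]
```

After adding this, re-run `gclient sync` so the bucket contents are fetched.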
```diff
@@ -37,7 +37,9 @@ Run `cd scripts/ai_assistance && npm run eval-suite` to execute the suite.
 ## Adding new outputs
 
-Once you have new outputs you want to put into the set, move them into the right place in the `suite/outputs/outputs` folder.:
+To get outputs, you should use the auto-run tool but pass the `--eval` flag. This will cause it to output a secondary file named `*.eval.json` that contains the output in the format the evaluation suite expects.
+
+Once you have new outputs you want to put into the set, move them into the right place in the `suite/outputs/outputs` folder.
 
 The structure of files in this folder is like so: `outputs/type/YYYY-MM-DD/label-XYZ.json`.
```
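A sketch of the `--eval` step (only the flag itself and the `*.eval.json` naming come from this README; the entry point and URL argument are assumptions, as in the earlier sketch):

```sh
# Illustrative: --eval emits a secondary my-example.eval.json alongside
# the normal output in auto-run/data/.
node scripts/ai_assistance/auto-run/auto-run.ts --eval https://example.com/my-test-page
```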
````diff
@@ -51,7 +53,12 @@ Then, run (from the DevTools root directory in this case, but it doesn't matter)
 node scripts/ai_assistance/suite/upload_to_gcp.ts
 ```
 
-This will upload the changes to the GCP bucket and update the `DEPS` file for you, which you should ensure you commit in a CL.
+This will upload the changes to the GCP bucket and update the `DEPS` file for you, which you should ensure you commit in a CL. The best workflow is:
+
+1. Generate your new output file(s).
+2. Move any new files into `suite/outputs/...`.
+3. Use the `upload_to_gcp.ts` script.
+4. Commit the `DEPS` change and send the CL for review.
 
 If you get any authorisation errors, run `gsutil.py config` to refresh your authentication status.
````
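As a single sequence of commands, that workflow might look like this, run from the DevTools root (file names, placeholders, and the commit message are illustrative; `git cl upload` assumes the usual Chromium review flow):

```sh
# 1. Generate new output file(s) via auto-run with --eval (see the sketch above).
# 2. Move them into place; TYPE, LABEL, and the file name are placeholders.
TYPE=my-type
LABEL=my-label
mv scripts/ai_assistance/auto-run/data/my-example.eval.json \
   "scripts/ai_assistance/suite/outputs/outputs/${TYPE}/2025-09-15/${LABEL}-001.json"
# 3. Upload to the GCP bucket; this also updates the DEPS file.
node scripts/ai_assistance/suite/upload_to_gcp.ts
# 4. Commit the DEPS change and send the CL for review.
git add DEPS
git commit -m "Update AI assistance eval outputs"
git cl upload
```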