AGENTS.md
Lines changed: 5 additions & 0 deletions
@@ -37,3 +37,8 @@ These are considered priority 0 issues for this repo, in addition to the normal
 - Check that code identifiers remain descriptive (no leftover placeholder names) and that repeated values are factored into constants when practical.
 - Ensure notebooks or scripts document any required environment variables instead of hard-coding secrets or keys.
 - Confirm metadata files (`registry.yaml`, `authors.yaml`) stay in sync with new or relocated content.
+
+## Recent Learnings
+
+- **Realtime eval shared imports can resolve the wrong module under pytest** -> Add `shared/__init__.py` and ensure tests prepend `examples/evals/realtime_evals` to `sys.path` before importing `shared.*` -> Prevents collection failures caused by unrelated installed packages named `shared`.
+- **Run-level grades can be overweighted by long simulations** -> Store turn-level grades on the matching turn and trace-level grades on one row per simulation instead of copying them onto every row -> Keeps `results.csv` row semantics intact and prevents summary means from favoring longer conversations.
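The first learning above (prepending the realtime evals directory to `sys.path`) could be sketched roughly as follows. This is a minimal illustration, not the repo's actual test setup: the helper name `prepend_to_sys_path` and the idea of calling it from a `conftest.py` are assumptions; only the directory `examples/evals/realtime_evals` and the `shared` package name come from the diff.

```python
import sys
from pathlib import Path


def prepend_to_sys_path(directory: Path) -> None:
    """Move `directory` to the front of sys.path (hypothetical helper).

    Putting the directory first means `import shared` finds the local
    `shared/__init__.py` before any installed package of the same name.
    """
    resolved = str(directory.resolve())
    # Drop an existing entry so the directory is not listed twice.
    if resolved in sys.path:
        sys.path.remove(resolved)
    sys.path.insert(0, resolved)


# Assumed usage near the top of a conftest.py, before any `shared.*` import:
#   prepend_to_sys_path(Path("examples/evals/realtime_evals"))
```

Note that path order only affects imports that have not happened yet. If an unrelated `shared` module is already cached in `sys.modules`, it would still be returned, so the call needs to run before the first `shared.*` import.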
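The second learning can be illustrated with a small arithmetic sketch. The column names `simulation_id`, `turn`, and `trace_grade` are hypothetical (the real `results.csv` schema is not shown in this diff), and the two helper functions are illustrative only.

```python
from statistics import mean

# Hypothetical rows: one short simulation graded 0.0 and one long simulation
# graded 1.0, with the trace-level grade copied onto every turn row.
rows = [
    {"simulation_id": "short", "turn": 0, "trace_grade": 0.0},
    {"simulation_id": "long", "turn": 0, "trace_grade": 1.0},
    {"simulation_id": "long", "turn": 1, "trace_grade": 1.0},
    {"simulation_id": "long", "turn": 2, "trace_grade": 1.0},
]


def mean_of_present(rows, column):
    """Average a column, skipping rows where the value is missing."""
    return mean(row[column] for row in rows if row.get(column) is not None)


def keep_first_per_simulation(rows, column):
    """Keep `column` on the first row of each simulation and blank the rest."""
    seen = set()
    result = []
    for row in rows:
        new_row = dict(row)
        if row["simulation_id"] in seen:
            new_row[column] = None
        else:
            seen.add(row["simulation_id"])
        result.append(new_row)
    return result


copied_mean = mean_of_present(rows, "trace_grade")  # 0.75
per_simulation_mean = mean_of_present(
    keep_first_per_simulation(rows, "trace_grade"), "trace_grade"
)  # 0.5
```

With the grade copied onto every row, the three-turn simulation counts three times and pulls the mean to 0.75. With one row per simulation, each simulation counts once and the mean is 0.5.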
examples/evals/realtime_evals/README.md
Lines changed: 34 additions & 11 deletions
@@ -11,20 +11,33 @@ Depending on your realtime eval maturity, point Codex (or your preferred coding

 ## Quickstart

-Python 3.9+ required.
+Python 3.12+ required.

 ```bash
-pip install -r requirements.txt
+make install
+source .venv/bin/activate
 export OPENAI_API_KEY="your_api_key"
 ```

-Run a first command per harness:
+`make install` creates the local `.venv` and installs both runtime and dev dependencies. It uses `uv` when available and otherwise falls back to `python -m venv` plus `pip install -r requirements.txt -r requirements-dev.txt`.
-`uv run python walk_harness/run_realtime_evals.py`
+- Run: `uv run python run_harness/run_realtime_evals.py --max-examples 1`
+
+## Dev commands
+Use the root `Makefile` for common checks. Run `make install` first to create `.venv`. These targets work with or without `uv`: when `uv` is installed they run through `uv run`, and otherwise they use the matching tool binaries from the local `.venv`.
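The "with or without `uv`" behavior described for the Makefile targets can be sketched in Python as a small dispatch function. This is an illustration of the decision logic only, not the Makefile's contents: the function name `tool_command`, the example tool `pytest`, and the POSIX `.venv/bin` layout are all assumptions.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def tool_command(
    tool: str,
    args: List[str],
    venv_dir: Path = Path(".venv"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build the argv for a dev tool (hypothetical helper).

    Runs through `uv run` when `uv` is on PATH; otherwise uses the binary
    from the local virtual environment (POSIX layout assumed).
    """
    if which("uv") is not None:
        return ["uv", "run", tool, *args]
    return [str(venv_dir / "bin" / tool), *args]


# Example with a hypothetical tool name; pass the result to subprocess.run.
command = tool_command("pytest", ["-q"])
```

Injecting `which` keeps the choice testable without touching the real `PATH`.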