
Commit bfe6088

synthetic trace generation + e2e test
1 parent 3fcd67b commit bfe6088

13 files changed (+2491, -6 lines)


.cursor/rules/use-just-recipes.mdc

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+description: When running things like unit tests, e2e tests, migrations, first look for a corresponding just recipe @justfile
+alwaysApply: false
+---

client/tests/e2e/assisted-facilitation.spec.ts

Lines changed: 438 additions & 0 deletions
Large diffs are not rendered by default.

client/tests/e2e/discovery-invite-traces.spec.ts

Lines changed: 6 additions & 4 deletions
@@ -109,11 +109,13 @@ test('discovery blocks until multiple participants complete; facilitator-driven

await expect(p.getByTestId('discovery-phase-title')).toBeVisible();

-await p.locator('#question1').fill('Clear but slightly verbose.');
-await p
-  .locator('#question2')
-  .fill('If it included account recovery steps for locked-out users, it would be better.');
+// TraceViewerDemo renders discovery questions with ids like `dq-q_1`
+const q1 = p.locator('#dq-q_1');
+await expect(q1).toBeVisible();
+await q1.fill('Clear but slightly verbose. Consider account recovery steps for locked-out users.');
+await q1.blur(); // autosave happens onBlur

+// Single-trace discovery: the navigation button shows "Complete"
await p.getByRole('button', { name: /^Complete$/i }).click();
await expect(p.getByTestId('complete-discovery-phase-button')).toBeVisible();
await p.getByTestId('complete-discovery-phase-button').click();

client/tests/e2e/rubric-creation.spec.ts

Lines changed: 2 additions & 2 deletions
@@ -144,8 +144,8 @@ test('rubric creation: facilitator can advance from discovery and create a rubri
page.getByRole('button', { name: /^Save$/i }).click(),
]);

-// Assert UI shows rubric summary
-await expect(page.getByText(/Rubric Summary/i)).toBeVisible();
+// Assert UI is on the rubric editor and the criterion exists
+await expect(page.getByText(/Evaluation Criteria/i)).toBeVisible();
await expect
  .poll(async () => {
    return page.locator('input').evaluateAll(

client/tests/fixtures/discovery-traces.json

Lines changed: 236 additions & 0 deletions
Large diffs are not rendered by default.

doc/DISCOVERY.md

Lines changed: 10 additions & 0 deletions
@@ -23,6 +23,16 @@ The process can be more art than science. It's often messy, and can suffer from

Assisted facilitation helps participants go deeper on each example and helps facilitators guide discussion without needing to be a domain expert.

+## Development: DSPy tracing (optional)
+
+If you want to capture **DSPy/Discovery LLM call traces** in MLflow during development, set:
+
+- **`MLFLOW_DSPY_DEV_EXPERIMENT_ID`**: MLflow experiment id to log DSPy traces to (dev-only, separate from the workshop’s MLflow intake experiment).
+
+Notes:
+- This only affects discovery’s DSPy calls (question generation + summaries) and is a **no-op** when unset.
+- Your MLflow tracking/auth still needs to be configured (e.g., Databricks `DATABRICKS_HOST` / `DATABRICKS_TOKEN` in environments that use `mlflow.set_tracking_uri("databricks")`).
+
### During participant review (per example)

- **Start simple, then go deeper**: each example begins with a baseline prompt (“what makes this effective or ineffective?”). As a participant responds, the application can propose a small number of follow-up questions that encourage deeper thinking (edge cases, missing info, boundary conditions, failure modes).
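For context, a minimal sketch of how a development process could honor this variable, assuming MLflow's standard DSPy autologging (`mlflow.dspy.autolog()`) and the Databricks tracking setup noted above; the surrounding wiring is illustrative, not the app's actual startup code:

```python
import os

import mlflow

# Dev-only: send DSPy/Discovery LLM call traces to a separate MLflow experiment.
# The env var name comes from doc/DISCOVERY.md; the wiring below is an assumption.
dev_experiment_id = os.environ.get("MLFLOW_DSPY_DEV_EXPERIMENT_ID")
if dev_experiment_id:
    mlflow.set_tracking_uri("databricks")  # requires DATABRICKS_HOST / DATABRICKS_TOKEN
    mlflow.set_experiment(experiment_id=dev_experiment_id)
    mlflow.dspy.autolog()  # capture question-generation and summary calls
# When the variable is unset, nothing is configured (the documented no-op).
```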

notebooks/README.md

Lines changed: 57 additions & 0 deletions
# Notebooks

This directory contains Jupyter notebooks for data generation, exploration, and testing workflows that are **not** part of the core server application.

## Notebooks

### `generate_discovery_traces.ipynb`

Generates synthetic **Code Assistant** traces designed to stress test all 6 discovery question categories:

| Category | Description | Code Assistant Examples |
|----------|-------------|------------------------|
| `themes` | General quality patterns | Code readability, best practices, documentation |
| `edge_cases` | Unusual inputs/scenarios | Empty arrays, unicode strings, deeply nested structures |
| `boundary_conditions` | Limits and thresholds | Off-by-one errors, array bounds, integer overflow |
| `failure_modes` | Ways the system can fail | Missing error handling, security flaws, incorrect logic |
| `missing_info` | Ambiguous or incomplete context | Unclear requirements, missing type info, vague intent |
| `disagreements` | Multiple valid approaches | Style preferences, performance vs readability trade-offs |

**Use cases:**
- User testing of the assisted facilitation flow
- Generating E2E test fixtures for the discovery phase
- Future DSPy optimization using coverage metrics

## Setup

1. Install notebook dependencies:
```bash
uv pip install jupyter ipykernel
```

2. Configure Databricks/MLflow credentials (if exporting to MLflow):
```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"
```

3. Run the notebook:
```bash
uv run jupyter notebook notebooks/generate_discovery_traces.ipynb
```

## Output Formats

The notebook can export traces in two formats:

1. **MLflow Traces**: Direct upload to an MLflow experiment for workshop ingestion
2. **JSON Fixtures**: Static files for E2E tests in `client/tests/fixtures/`
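For example, the two export paths might look roughly like this (the trace payload, span name, and experiment id are placeholders rather than the notebook's actual code):

```python
import json

import mlflow

# One synthetic example; field names are illustrative.
trace = {
    "request": "Write a function that parses ISO-8601 timestamps.",
    "response": "def parse_timestamp(value: str): ...",
    "target_categories": ["edge_cases", "missing_info"],
}

# 1) MLflow trace: log the exchange as a span in the intake experiment.
mlflow.set_experiment(experiment_id="<workshop-intake-experiment-id>")
with mlflow.start_span(name="code_assistant", span_type="CHAIN") as span:
    span.set_inputs({"prompt": trace["request"]})
    span.set_outputs({"response": trace["response"]})

# 2) JSON fixture: write a static file for the Playwright E2E tests.
with open("client/tests/fixtures/discovery-traces.json", "w") as f:
    json.dump([trace], f, indent=2)
```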
## DSPy Signatures

The `synthetic_trace_dspy.py` module defines DSPy signatures for:

- `GenerateSyntheticTrace`: Generates traces targeting specific discovery categories
- `ScoreTraceCoverage`: Evaluates how well a trace elicits target categories (for optimization)

These signatures can be used with DSPy optimizers (e.g., `BootstrapFewShot`) to self-improve trace generation based on actual workshop outcomes.
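As a rough sketch of what such signatures might look like (field names here are assumptions; the real definitions live in `synthetic_trace_dspy.py`):

```python
import dspy

# Illustrative sketch only; not the module's actual field names.
class GenerateSyntheticTrace(dspy.Signature):
    """Generate a synthetic code-assistant exchange that targets one discovery category."""

    target_category = dspy.InputField(
        desc="themes | edge_cases | boundary_conditions | failure_modes | missing_info | disagreements"
    )
    user_request = dspy.OutputField(desc="The user's coding request")
    assistant_response = dspy.OutputField(desc="Assistant answer seeded with category-relevant issues")


class ScoreTraceCoverage(dspy.Signature):
    """Score how strongly a trace elicits the target discovery category."""

    user_request = dspy.InputField()
    assistant_response = dspy.InputField()
    target_category = dspy.InputField()
    coverage_score = dspy.OutputField(desc="Float between 0 and 1")


# Usage: any configured LM works; the model name is only an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
generate = dspy.ChainOfThought(GenerateSyntheticTrace)
example = generate(target_category="edge_cases")
```

An optimizer such as `BootstrapFewShot` could then tune `GenerateSyntheticTrace` against a metric derived from `ScoreTraceCoverage`.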
