Commit e5fda10

docs: add AGENTS.md; align README with implementation (compiled demos env, GET /trace, calculator limits, remove unimplemented flags/endpoints)
1 parent bd3ec27 commit e5fda10

2 files changed: +72 -9 lines changed

AGENTS.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# AGENTS.md — Guidance for agents working in this repo
+
+This repository implements a minimal agent runtime on top of DSPy. It is intentionally small and opinionated: use DSPy modules for planning/tool-calls and a thin Python loop for orchestration, tracing, and evaluation.
+
+What to preserve
+- Keep the runtime loop thin (see `micro_agent/agent.py`). Avoid adding heavyweight frameworks.
+- Respect provider modes:
+- OpenAI → native tool-calls via DSPy `PlanWithTools` + `JSONAdapter`.
+- Others (e.g., Ollama) → robust JSON decision loop with few-shot demos and JSON repair.
+- Keep observability simple and durable: append JSONL records under `TRACES_DIR` (default `traces/`).
+- Do not remove the JSON repair/lenient parsing fallback — models can drift.
+
+Policy and behavior (planning/acting)
+- Tools are required when the question implies:
+- math → must use `calculator` before finalize
+- time/date → must use `now` before finalize
+- Violations are recorded as steps (`tool: "⛔️policy_violation"`) and the loop continues.
+- Tool args are validated against JSON Schema before execution; invalid inputs add a `⛔️validation_error` step. The loop then requests a corrected call next iteration.
+- One tool per step. Favor strict JSON decisions.
+- For OpenAI, the agent composes final answers from tool results (calculator/now) when available to preserve key values.
+
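The policy rules above can be illustrated with a small check that runs before a finalize decision is accepted. This is a minimal sketch, not the code in `micro_agent/agent.py`: the keyword heuristics and the names `required_tools` and `check_finalize` are hypothetical. Only the tool names and the `⛔️policy_violation` marker come from the text.

```python
import re
from typing import List, Optional, Set

# Hypothetical intent heuristics; the real agent may detect math/time differently.
MATH_HINT = re.compile(r"\d\s*[-+*/^]\s*\d|\b(sum|product|factorial|sqrt)\b", re.I)
TIME_HINT = re.compile(r"\b(time|date|today|now)\b", re.I)


def required_tools(question: str) -> Set[str]:
    """Return the tools the policy requires before finalizing."""
    needed = set()
    if MATH_HINT.search(question):
        needed.add("calculator")
    if TIME_HINT.search(question):
        needed.add("now")
    return needed


def check_finalize(question: str, steps: List[dict]) -> Optional[dict]:
    """Return a policy-violation step if a required tool has not run, else None."""
    used = {step["tool"] for step in steps}
    missing = sorted(required_tools(question) - used)
    if not missing:
        return None
    # Recorded as an ordinary step so the loop can continue and self-correct.
    return {
        "tool": "⛔️policy_violation",
        "args": {},
        "observation": "must call " + ", ".join(missing) + " before finalize",
    }
```

Returning the violation as a step, rather than raising, matches the described behavior where the loop keeps going and the model sees the violation in its history.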
+Provider configuration
+- Environment variables:
+- `LLM_PROVIDER`: `openai` | `ollama` | `mock` (default: `openai`)
+- OpenAI: `OPENAI_API_KEY`, `OPENAI_MODEL` (default `gpt-4o-mini`)
+- Ollama: `OLLAMA_HOST` (default `http://localhost:11434`), `OLLAMA_MODEL`
+- Optimization demos: `COMPILED_DEMOS_PATH` (default `opt/plan_demos.json`)
+- Costs (OpenAI): `OPENAI_INPUT_PRICE_PER_1K`, `OPENAI_OUTPUT_PRICE_PER_1K` (defaults provided for 4o/4o-mini/4.1)
+- `configure_lm()` turns on `track_usage` and falls back in order: Ollama → OpenAI → mock LM.
+
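To make the environment variables above concrete, here is a minimal sketch of how they might be read with their documented defaults. It is not the repository's `configure_lm()`: `load_settings` and `estimate_cost_usd` are hypothetical helpers, and the per-1K cost formula is an assumption inferred from the variable names.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect provider settings using the defaults listed above."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in {"openai", "ollama", "mock"}:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    return {
        "provider": provider,
        "openai_api_key": env.get("OPENAI_API_KEY"),  # no default; may be None
        "openai_model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
        "ollama_host": env.get("OLLAMA_HOST", "http://localhost:11434"),
        "ollama_model": env.get("OLLAMA_MODEL"),  # no default stated in the text
        "compiled_demos_path": env.get("COMPILED_DEMOS_PATH", "opt/plan_demos.json"),
    }


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Assumed formula: tokens / 1000 * price, summed over input and output."""
    return (input_tokens / 1000) * input_price_per_1k + (
        output_tokens / 1000) * output_price_per_1k
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.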
+Tools and safety
+- Tools are defined in `micro_agent/tools.py` with a small registry and JSON Schemas.
+- Runtime arg validation uses `jsonschema`. Unknown/malformed inputs do not crash the loop; they are surfaced in the trace for self-correction.
+- Calculator constraints: factorial ≤ 12, exponent bound, AST node-count limit, and number magnitude cap. The operator set is allow‑listed.
+- Plugin loader: set `TOOLS_MODULES="pkg1.tools,pkg2.tools"` to merge additional tool dicts.
+
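The calculator constraints listed above can be sketched with Python's `ast` module. This is an illustration of the technique, not the calculator in `micro_agent/tools.py`. Only the factorial cap of 12 comes from the text; the other limits (`MAX_EXPONENT`, `MAX_NODES`, `MAX_MAGNITUDE`) and the name `safe_calc` are assumed for the example.

```python
import ast
import math
import operator

MAX_FACTORIAL = 12     # from the text
MAX_EXPONENT = 64      # assumed value
MAX_NODES = 50         # assumed value
MAX_MAGNITUDE = 1e12   # assumed value

# Allow-listed operators; any other node type is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _check(value):
    """Reject values whose magnitude exceeds the cap."""
    if abs(value) > MAX_MAGNITUDE:
        raise ValueError("number magnitude too large")
    return value


def _eval(node):
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return _check(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _check(BIN_OPS[type(node.op)](left, right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return _check(UNARY_OPS[type(node.op)](_eval(node.operand)))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "factorial"
            and len(node.args) == 1 and not node.keywords):
        n = _eval(node.args[0])
        if not isinstance(n, int) or not 0 <= n <= MAX_FACTORIAL:
            raise ValueError("factorial argument must be an integer in 0..12")
        return math.factorial(n)
    raise ValueError(f"disallowed expression: {type(node).__name__}")


def safe_calc(expression: str):
    """Evaluate a small arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    # Bound the expression size before doing any work.
    if sum(1 for _ in ast.walk(tree)) > MAX_NODES:
        raise ValueError("expression too complex")
    try:
        return _eval(tree.body)
    except (ZeroDivisionError, OverflowError) as exc:
        raise ValueError(f"arithmetic error: {exc}") from exc
```

Because every failure surfaces as a `ValueError`, a caller can turn it into a trace observation instead of crashing the loop.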
+Tracing and API
+- Every ask appends a record to `traces/<id>.jsonl`: `{id, ts, question, steps, answer}`.
+- Steps are `{tool, args, observation}`.
+- CLI: `micro-agent replay --path traces/<id>.jsonl --index -1`.
+- HTTP: `POST /ask` and `GET /trace/{id}` (CORS enabled) via `micro_agent/server.py`.
+
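The trace format above is an append-only JSONL file, which a short sketch can show. The helpers `append_trace` and `load_record` are hypothetical stand-ins for the repository's tracing and replay code, and storing `ts` as epoch seconds is an assumption; the record keys follow the text.

```python
import json
import time
import uuid
from pathlib import Path


def append_trace(traces_dir, question, steps, answer, trace_id=None):
    """Append one record as a single JSON line and return (id, path)."""
    trace_id = trace_id or uuid.uuid4().hex
    path = Path(traces_dir) / f"{trace_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "id": trace_id,
        "ts": time.time(),  # assumed format: epoch seconds
        "question": question,
        "steps": steps,     # each step: {"tool", "args", "observation"}
        "answer": answer,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return trace_id, path


def load_record(path, index=-1):
    """Return the record at `index` (default: the last one) from a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return records[index]
```

The `index=-1` default mirrors the `--index -1` flag in the CLI line above, selecting the most recent record in the file.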
+Optimization (teleprompting)
+- `micro-agent optimize` compiles a few-shot set for the OpenAI planner using DSPy `BootstrapFewShot`:
+- Strict metric for this repo’s sample tasks: finals must contain the expected substring for math; time tasks must call `now`.
+- Saves demos to JSON; the agent auto-loads them if present.
+
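The strict metric described above could look roughly like the function below. This is a sketch over plain dicts, not the metric in `micro_agent/optimize.py`: the field names `kind`, `expected`, `answer`, and `steps` are hypothetical, and a real DSPy metric would typically receive example and prediction objects rather than dicts.

```python
def strict_metric(example: dict, prediction: dict, trace=None) -> bool:
    """Pass/fail check mirroring the two rules described above."""
    kind = example.get("kind")
    if kind == "math":
        # The final answer must contain the expected substring.
        return example["expected"] in prediction.get("answer", "")
    if kind == "time":
        # The run must include at least one call to the `now` tool.
        return any(step.get("tool") == "now" for step in prediction.get("steps", []))
    # Unknown task kinds are rejected so they never become demos.
    return False
```

A strict boolean metric keeps only demonstrations that clearly followed the policy, at the cost of discarding borderline runs.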
+Evals
+- `evals/run_evals.py` runs a small dataset and prints:
+- `success_rate`, `contains_hit_rate`, `key_hit_rate`, `avg_latency_sec`, `avg_lm_calls`, `avg_tool_calls`, `avg_steps`, `avg_cost_usd`, `n`.
+- Scoring combines substring hits and key-in-observation hits per `evals/rubrics.yaml`.
+
+Directory map (key files)
+- `micro_agent/agent.py` — runtime loop, provider branches, policy, usage aggregation.
+- `micro_agent/signatures.py` — DSPy signatures (`PlanWithTools`, etc.).
+- `micro_agent/tools.py` — tool registry, validation, safe calculator, plugin loader, dspy.Tool bridge.
+- `micro_agent/runtime.py` — tracing & robust JSON extraction (json_repair fallback).
+- `micro_agent/config.py` — LM configuration and fallbacks; `track_usage=True`.
+- `micro_agent/server.py` — FastAPI app (`POST /ask`, `GET /trace/{id}`) with CORS.
+- `micro_agent/optimize.py` — compile few‑shot demos and save for the OpenAI planner.
+- `evals/run_evals.py` — metrics harness with cost/latency/usage.
+
+Conventions
+- Keep changes minimal and focused; match code style and existing patterns.
+- Prefer explicit, strict JSON outputs from models; use repair/lenient paths as fallbacks.
+- Avoid adding non‑deterministic behavior that would complicate traces/replay.
+

README.md

Lines changed: 5 additions & 9 deletions
@@ -37,7 +37,7 @@ pip install -e .
 - Optional tuning: `TEMPERATURE` (default `0.2`), `MAX_TOKENS` (default `1024`)
 - Tool plugins: `TOOLS_MODULES="your_pkg.tools,other_pkg.tools"` to load extra tools (see Tools below)
 - Traces location: `TRACES_DIR` (default `traces/`)
-- Function-calls override: `USE_TOOL_CALLS=1|0` to force-enable/disable OpenAI function-calls mode
+- Compiled demos (OpenAI planner): `COMPILED_DEMOS_PATH` (default `opt/plan_demos.json`)
 
 Examples:
 ```bash
@@ -55,9 +55,6 @@ export OLLAMA_HOST=http://localhost:11434
 - `micro-agent ask --question <text> [--utc] [--max-steps N]`
 - `--utc` appends a hint to prefer UTC when time is used.
 - Saves a JSONL trace under `traces/<id>.jsonl` and prints the path.
-- Function-calls control:
-- `--func-calls` forces OpenAI-native function-calls when available.
-- `--no-func-calls` disables function-calls and uses robust JSON planning.
 - `micro-agent replay --path traces/<id>.jsonl [--index -1]`
 - Pretty-prints a saved record from the JSONL file.
 
@@ -73,7 +70,6 @@ micro-agent replay --path traces/<id>.jsonl --index -1
 - Endpoint: `POST /ask`
 - Request JSON: `{ "question": "...", "max_steps": 6 }`
 - Response JSON: `{ "answer": str, "trace_id": str, "trace_path": str, "steps": [...] }`
-- Optional: `use_tool_calls: true|false` to force function-calls behavior.
 
 Example:
 ```bash
@@ -85,9 +81,6 @@ curl -s http://localhost:8000/ask \
 OpenAPI:
 - FastAPI publishes `/openapi.json` and interactive docs at `/docs`.
 - Schemas reflect `AskRequest` and `AskResponse` models in `micro_agent/server.py`.
-- Health: `GET /health` returns `{status, provider, model, max_steps}`.
-- Minimal health: `GET /healthz` returns `{status: "ok"}`.
-- Version: `GET /version` returns `{name, version}`.
 
 ## Tools
 - Built-ins live in `micro_agent/tools.py`:
@@ -107,6 +100,9 @@ Tool(
 Runtime validation
 - Tool args are validated against the JSON Schema before execution; invalid args add a `⛔️validation_error` step and the agent requests a correction in the next loop. See `micro_agent/tools.py` (run_tool) and `micro_agent/agent.py` (validation error handling).
 
+Calculator limits
+- Factorial capped at 12; exponent size bounded; AST node count limited; large magnitudes rejected to prevent runaway compute. Only a small set of arithmetic nodes is allowed.
+
 
 ## Provider Modes
 - OpenAI: uses DSPy `PlanWithTools` with `JSONAdapter` to enable native function-calls. The model may return `tool_calls` or a `final` answer; tool calls are executed via our registry.
@@ -191,7 +187,7 @@ The agent loads these demos on OpenAI providers and attaches them to the `PlanWi
 - Optional install: `pip install -e .[repair]`
 
 ## Limitations and Next Steps
-- Costs/usage are not recorded; you can plumb LM usage metadata into the eval harness if your wrapper exposes it.
+- Usage/cost capture is best-effort: exact numbers depend on provider support; otherwise the agent estimates from text.
 - The finalization step often composes from tool results for reliability; you can swap in a DSPy `Finalize` predictor if preferred.
 - Add persistence to a DB instead of JSONL by replacing `dump_trace`.
 - Add human-in-the-loop, budgets, retries, or branching per your needs.
