| 1 | +# AGENTS.md — Guidance for agents working in this repo |
| 2 | + |
| 3 | +This repository implements a minimal agent runtime on top of DSPy. It is intentionally small and opinionated: use DSPy modules for planning/tool-calls and a thin Python loop for orchestration, tracing, and evaluation. |
| 4 | + |
| 5 | +What to preserve |
| 6 | +- Keep the runtime loop thin (see `micro_agent/agent.py`). Avoid adding heavyweight frameworks. |
| 7 | +- Respect provider modes: |
| 8 | + - OpenAI → native tool-calls via DSPy `PlanWithTools` + `JSONAdapter`. |
| 9 | + - Others (e.g., Ollama) → robust JSON decision loop with few-shot demos and JSON repair. |
| 10 | +- Keep observability simple and durable: append JSONL records under `TRACES_DIR` (default `traces/`). |
| 11 | +- Do not remove the JSON repair/lenient parsing fallback — models can drift. |
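The lenient parsing fallback mentioned above could look roughly like the sketch below. It is a stdlib-only illustration, not the repo's implementation: the function name `extract_json_object` is hypothetical, and the real code reportedly also uses a `json_repair` fallback, which this sketch does not attempt.

```python
import json
import re


def extract_json_object(text: str):
    """Return the first JSON object found in `text`, or None (hypothetical helper)."""
    # Strict path: the whole reply is already valid JSON.
    try:
        value = json.loads(text)
        if isinstance(value, dict):
            return value
    except json.JSONDecodeError:
        pass

    # Lenient path: the model wrapped the JSON in prose or a code fence,
    # so try to decode an object starting at each '{' in turn.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            value, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None
```

Trying the strict path first keeps well-behaved models on the fast route; the scan only runs when the reply has drifted.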
| 12 | + |
| 13 | +Policy and behavior (planning/acting) |
| 14 | +- Tools are required when the question implies: |
| 15 | + - math → must use `calculator` before finalize |
| 16 | + - time/date → must use `now` before finalize |
| 17 | +- Violations are recorded as steps (`tool: "⛔️policy_violation"`) and the loop continues. |
| 18 | +- Tool args are validated against JSON Schema before execution; invalid inputs add a `⛔️validation_error` step. The loop then requests a corrected call on the next iteration. |
| 19 | +- One tool per step. Favor strict JSON decisions. |
| 20 | +- For OpenAI, the agent composes final answers from tool results (calculator/now) when available to preserve key values. |
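As a rough illustration of the tool-required policy above, the check before finalize might be sketched as follows. The regex heuristics and the names `required_tools` and `check_finalize` are assumptions made for this example; the detection rules in `micro_agent/agent.py` may differ.

```python
from __future__ import annotations

import re

# Hypothetical heuristics for "the question implies math / time".
MATH_PATTERN = re.compile(r"\d\s*[-+*/^]\s*\d|\b(sum|product|factorial|sqrt)\b", re.I)
TIME_PATTERN = re.compile(r"\b(time|date|today|now)\b", re.I)


def required_tools(question: str) -> set[str]:
    """Return the tools that must be called before the agent may finalize."""
    required = set()
    if MATH_PATTERN.search(question):
        required.add("calculator")
    if TIME_PATTERN.search(question):
        required.add("now")
    return required


def check_finalize(question: str, steps: list[dict]) -> dict | None:
    """Return a policy-violation step if a required tool was skipped, else None."""
    used = {step["tool"] for step in steps}
    missing = sorted(required_tools(question) - used)
    if not missing:
        return None
    # The violation is recorded as an ordinary step so the loop can continue.
    return {
        "tool": "⛔️policy_violation",
        "args": {},
        "observation": f"must call {', '.join(missing)} before finalize",
    }
```

Returning a step instead of raising keeps the violation visible in the trace and lets the next iteration self-correct.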
| 21 | + |
| 22 | +Provider configuration |
| 23 | +- Environment variables: |
| 24 | + - `LLM_PROVIDER`: `openai` | `ollama` | `mock` (default: `openai`) |
| 25 | + - OpenAI: `OPENAI_API_KEY`, `OPENAI_MODEL` (default `gpt-4o-mini`) |
| 26 | + - Ollama: `OLLAMA_HOST` (default `http://localhost:11434`), `OLLAMA_MODEL` |
| 27 | + - Optimization demos: `COMPILED_DEMOS_PATH` (default `opt/plan_demos.json`) |
| 28 | + - Costs (OpenAI): `OPENAI_INPUT_PRICE_PER_1K`, `OPENAI_OUTPUT_PRICE_PER_1K` (defaults provided for 4o/4o-mini/4.1) |
| 29 | +- `configure_lm()` turns on `track_usage` and falls back in order: Ollama → OpenAI → mock LM. |
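For orientation, reading the variables above with their documented defaults could be sketched like this. The helper names `read_provider_config` and `estimate_cost_usd` are hypothetical and not part of `micro_agent/config.py`; the per-1K prices are passed in as arguments because the built-in defaults are not listed here.

```python
import os


def read_provider_config(env=os.environ) -> dict:
    """Collect provider settings, applying the defaults documented above."""
    return {
        "provider": env.get("LLM_PROVIDER", "openai").lower(),
        "openai_model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
        "ollama_host": env.get("OLLAMA_HOST", "http://localhost:11434"),
        "ollama_model": env.get("OLLAMA_MODEL"),  # no documented default
        "demos_path": env.get("COMPILED_DEMOS_PATH", "opt/plan_demos.json"),
    }


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_1k: float, output_per_1k: float) -> float:
    """Cost estimate from token counts and per-1K-token prices."""
    return input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k
```

Passing `env` explicitly makes the function easy to exercise in tests without mutating the process environment.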
| 30 | + |
| 31 | +Tools and safety |
| 32 | +- Tools are defined in `micro_agent/tools.py` with a small registry and JSON Schemas. |
| 33 | +- Runtime arg validation uses `jsonschema`. Unknown/malformed inputs do not crash the loop; they are surfaced in the trace for self-correction. |
| 34 | +- Calculator constraints: factorial ≤ 12, exponent bound, AST node-count limit, and number magnitude cap. The operator set is allow‑listed. |
| 35 | +- Plugin loader: set `TOOLS_MODULES="pkg1.tools,pkg2.tools"` to merge additional tool dicts. |
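The calculator constraints above can be illustrated with a small AST-based evaluator. This is a sketch under stated assumptions, not the code in `micro_agent/tools.py`: only the factorial limit of 12 comes from this document, while `MAX_NODES`, `MAX_EXPONENT`, `MAX_MAGNITUDE`, and the exact operator set are placeholder values chosen for the example.

```python
import ast
import math
import operator

MAX_FACTORIAL = 12      # from the guidance above
MAX_NODES = 50          # hypothetical AST node-count limit
MAX_EXPONENT = 10       # hypothetical exponent bound
MAX_MAGNITUDE = 1e12    # hypothetical number magnitude cap

# Allow-listed operators; anything else is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _check(value):
    """Reject complex results and values beyond the magnitude cap."""
    if isinstance(value, complex) or abs(value) > MAX_MAGNITUDE:
        raise ValueError("number out of range")
    return value


def _eval(node):
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return _check(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _check(BIN_OPS[type(node.op)](left, right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return _check(UNARY_OPS[type(node.op)](_eval(node.operand)))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "factorial"
            and len(node.args) == 1 and not node.keywords):
        n = _eval(node.args[0])
        if not isinstance(n, int) or not 0 <= n <= MAX_FACTORIAL:
            raise ValueError("factorial argument out of range")
        return math.factorial(n)
    raise ValueError(f"disallowed expression: {type(node).__name__}")


def safe_calculate(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    if sum(1 for _ in ast.walk(tree)) > MAX_NODES:
        raise ValueError("expression too complex")
    return _eval(tree.body)
```

A caller would typically catch `ValueError`, `SyntaxError`, and `ZeroDivisionError` and record the message as the step's observation, so a bad expression surfaces in the trace instead of crashing the loop.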
| 36 | + |
| 37 | +Tracing and API |
| 38 | +- Every ask appends a record to `<TRACES_DIR>/<id>.jsonl` (default `traces/<id>.jsonl`): `{id, ts, question, steps, answer}`. |
| 39 | +- Steps are `{tool, args, observation}`. |
| 40 | +- CLI: `micro-agent replay --path traces/<id>.jsonl --index -1`. |
| 41 | +- HTTP: `POST /ask` and `GET /trace/{id}` (CORS enabled) via `micro_agent/server.py`. |
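The append-only trace format and the `--index -1` replay semantics can be sketched as below. The function names `append_trace` and `load_record` are hypothetical, and the `ts` encoding (a Unix timestamp here) is an assumption; the real logic lives in `micro_agent/runtime.py`.

```python
import json
import time
import uuid
from pathlib import Path


def append_trace(traces_dir, question, steps, answer, trace_id=None) -> Path:
    """Append one {id, ts, question, steps, answer} record as a JSONL line."""
    trace_id = trace_id or uuid.uuid4().hex
    path = Path(traces_dir) / f"{trace_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "id": trace_id,
        "ts": time.time(),  # assumption: the repo may use another timestamp format
        "question": question,
        "steps": steps,     # each step is {tool, args, observation}
        "answer": answer,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def load_record(path, index=-1) -> dict:
    """Return the record at `index`; -1 selects the most recent one."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    return json.loads(lines[index])
```

Opening the file in append mode means earlier records are never rewritten, which is what keeps traces durable and replayable.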
| 42 | + |
| 43 | +Optimization (teleprompting) |
| 44 | +- `micro-agent optimize` compiles a few-shot set for the OpenAI planner using DSPy `BootstrapFewShot`: |
| 45 | + - Strict metric for this repo’s sample tasks: for math tasks, the final answer must contain the expected substring; time tasks must call `now`. |
| 46 | + - Saves demos to JSON; the agent auto-loads them if present. |
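A strict metric of the kind described above might be sketched as follows. DSPy metrics conventionally take `(example, pred, trace=None)`; the field names `kind`, `expected`, `answer`, and `steps` are assumptions for this illustration and may not match `micro_agent/optimize.py`.

```python
from types import SimpleNamespace


def strict_metric(example, pred, trace=None) -> bool:
    """Accept a prediction only if it satisfies the task's strict rule."""
    tools_used = {step["tool"] for step in getattr(pred, "steps", [])}
    if example.kind == "time":
        # Time tasks must have called the `now` tool.
        return "now" in tools_used
    if example.kind == "math":
        # Math tasks must contain the expected substring in the final answer.
        return example.expected in (getattr(pred, "answer", "") or "")
    return False


# Usage with stand-in objects (real runs would pass DSPy examples/predictions).
math_example = SimpleNamespace(kind="math", expected="14")
good_pred = SimpleNamespace(answer="The result is 14.", steps=[{"tool": "calculator"}])
print(strict_metric(math_example, good_pred))
```

Keeping the metric boolean and strict means only clean trajectories are kept as demos during bootstrapping.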
| 47 | + |
| 48 | +Evals |
| 49 | +- `evals/run_evals.py` runs a small dataset and prints: |
| 50 | + - `success_rate`, `contains_hit_rate`, `key_hit_rate`, `avg_latency_sec`, `avg_lm_calls`, `avg_tool_calls`, `avg_steps`, `avg_cost_usd`, `n`. |
| 51 | +- Scoring combines substring hits and key-in-observation hits per `evals/rubrics.yaml`. |
| 52 | + |
| 53 | +Directory map (key files) |
| 54 | +- `micro_agent/agent.py` — runtime loop, provider branches, policy, usage aggregation. |
| 55 | +- `micro_agent/signatures.py` — DSPy signatures (`PlanWithTools`, etc.). |
| 56 | +- `micro_agent/tools.py` — tool registry, validation, safe calculator, plugin loader, dspy.Tool bridge. |
| 57 | +- `micro_agent/runtime.py` — tracing & robust JSON extraction (json_repair fallback). |
| 58 | +- `micro_agent/config.py` — LM configuration and fallbacks; `track_usage=True`. |
| 59 | +- `micro_agent/server.py` — FastAPI app (`POST /ask`, `GET /trace/{id}`) with CORS. |
| 60 | +- `micro_agent/optimize.py` — compile few‑shot demos and save for the OpenAI planner. |
| 61 | +- `evals/run_evals.py` — metrics harness with cost/latency/usage. |
| 62 | + |
| 63 | +Conventions |
| 64 | +- Keep changes minimal and focused; match code style and existing patterns. |
| 65 | +- Prefer explicit, strict JSON outputs from models; use repair/lenient paths as fallbacks. |
| 66 | +- Avoid adding non‑deterministic behavior that would complicate traces/replay. |
| 67 | + |