I’m curious how people here are thinking about managing agentic LLM systems once they’re running in production. #10379
Replies: 4 comments
Great topic — this is exactly what I've been working on. Here's what's worked for me:

**1. Shared State as the Observability Layer**

Instead of parsing logs, make the coordination mechanism itself the audit trail:

```python
state = {
    "run_id": "abc123",
    "agent_decisions": {
        "retriever": {"action": "query", "tokens": 150, "latency_ms": 230},
        "analyzer": {"action": "summarize", "tokens": 890, "latency_ms": 1200}
    },
    "total_cost": 0.0042,
    "errors": []
}
```

Every agent reads from and writes to this state, so the full run history is captured as a side effect of coordination rather than bolted on afterward.
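As a minimal sketch of that pattern, each agent could record its decision through a small helper that also rolls up cost. The helper name `record_decision` and the per-token price are my own illustrative assumptions, not from the linked repo:

```python
# Hypothetical helper for the shared-state pattern: each agent appends its
# decision and the run's total cost is updated in one place.
COST_PER_TOKEN = 0.000005  # assumed flat price, for illustration only

def record_decision(state, agent, action, tokens, latency_ms):
    """Write one agent's decision into the shared state and roll up cost."""
    state["agent_decisions"][agent] = {
        "action": action,
        "tokens": tokens,
        "latency_ms": latency_ms,
    }
    state["total_cost"] += tokens * COST_PER_TOKEN
    return state

state = {"run_id": "abc123", "agent_decisions": {}, "total_cost": 0.0, "errors": []}
record_decision(state, "retriever", "query", tokens=150, latency_ms=230)
record_decision(state, "analyzer", "summarize", tokens=890, latency_ms=1200)
```

After both calls, `state` holds the complete audit trail for the run and `total_cost` reflects the 1,040 tokens consumed, with no separate logging layer.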
**2. Token Budgets with Circuit Breakers**

```python
if state["token_usage"]["total"] > BUDGET_LIMIT:
    state["status"] = "paused"
    state["pause_reason"] = "Budget exceeded"
    # Alert or escalate
```

**3. Structured Diffs for Debugging**

Instead of comparing logs, compare state snapshots:

```python
diff = compare_states(run_v1_state, run_v2_state)
# Shows exactly what changed between runs
```

**Key Insight**

The coordination layer (shared state) becomes the observability layer. You don't add monitoring on top — it's built into how agents coordinate. This pattern reduced our debugging time dramatically.

Working implementation: https://github.com/KeepALifeUS/autonomous-agents

Would love to hear what others are building!
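One plausible way `compare_states` could work is a recursive dict diff that reports every leaf whose value changed between runs. This is a sketch under that assumption, not the implementation from the linked repo:

```python
# Recursive diff of two state snapshots: returns {dotted.path: (old, new)}
# for every leaf value that differs.
def compare_states(old, new, prefix=""):
    diff = {}
    for key in set(old) | set(new):
        path = f"{prefix}.{key}" if prefix else str(key)
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diff.update(compare_states(a, b, path))  # recurse into nested state
        elif a != b:
            diff[path] = (a, b)
    return diff

run_v1 = {"total_cost": 0.0042, "agent_decisions": {"retriever": {"tokens": 150}}}
run_v2 = {"total_cost": 0.0061, "agent_decisions": {"retriever": {"tokens": 310}}}
diff = compare_states(run_v1, run_v2)
```

Here `diff` pinpoints exactly two changes (`total_cost` and the retriever's token count), which is far easier to scan than two interleaved log streams.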
Biggest friction for me has been that debugging and optimization are two different problems, but everyone treats them as one.

Debugging = "what went wrong on this specific run" (tracing, logs, replay). Existing tools like Langfuse handle this well.

Optimization = "which model/provider should I be using right now for this task type, based on how things are actually performing." Nobody has great tooling for this, because it requires continuous feedback, not just post-hoc analysis.

I've been working on Kalibr for the optimization side. You define success at the outcome level and it routes traffic toward what's working. The idea is that you shouldn't have to read dashboards and manually swap models when something degrades.
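The "route traffic toward what's working" idea can be sketched as a simple epsilon-greedy bandit over models. To be clear, this is a toy illustration of the general technique, not Kalibr's actual API; the class name, model names, and simulated outcomes are all made up:

```python
import random

class OutcomeRouter:
    """Toy outcome-based router: mostly exploit the model with the best
    observed success rate, but keep exploring alternatives."""

    def __init__(self, models, epsilon=0.1):
        self.stats = {m: {"wins": 0, "calls": 0} for m in models}
        self.epsilon = epsilon

    def pick(self):
        # Explore occasionally; otherwise exploit the best observed rate.
        if random.random() < self.epsilon:
            return random.choice(list(self.stats))
        return max(self.stats,
                   key=lambda m: self.stats[m]["wins"] / max(self.stats[m]["calls"], 1))

    def report(self, model, success):
        # Continuous feedback: every outcome updates the routing stats.
        self.stats[model]["calls"] += 1
        self.stats[model]["wins"] += int(success)

random.seed(0)  # deterministic for the demo
router = OutcomeRouter(["model-a", "model-b"])
for _ in range(100):
    m = router.pick()
    router.report(m, success=(m == "model-a"))  # pretend model-a always succeeds
```

After 100 simulated calls the router has shifted almost all traffic to the model that actually succeeds, without anyone reading a dashboard; real systems would replace the hard-coded `success` with an outcome signal and something smarter than epsilon-greedy.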
Beyond basic observability, things like instrumentation, runtime control, and cost management seem to get complicated quickly as soon as you have multiple agents, tools, and models involved. In particular, it feels hard to reason about cost and token usage at the agent level, apply guardrails or budgets at runtime, or debug and compare agent runs in a structured way rather than just reading logs after the fact.
I’m interested in hearing how others are approaching this today. What parts are you building yourselves, what’s working, and where are you still feeling friction? This is just for discussion and learning, not pitching anything.