I’m curious how people here are thinking about managing agentic LLM systems once they’re running in production. #10379
Replies: 4 comments
Great topic — this is exactly what I've been working on. Here's what's worked for me:

**1. Shared State as the Observability Layer**

Instead of parsing logs, make the coordination mechanism itself the audit trail:

```python
state = {
    "run_id": "abc123",
    "agent_decisions": {
        "retriever": {"action": "query", "tokens": 150, "latency_ms": 230},
        "analyzer": {"action": "summarize", "tokens": 890, "latency_ms": 1200}
    },
    "total_cost": 0.0042,
    "errors": []
}
```

Every agent reads from and writes to this state, so the full run history is captured as a side effect of coordination rather than bolted on afterward.
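As a minimal sketch of that pattern, each agent could record its decision through a small helper that also rolls up cost. The helper name `record_decision` and the per-token price are my own illustrative assumptions, not from the linked repo:

```python
# Hypothetical helper for the shared-state pattern: each agent appends its
# decision and the run's total cost is updated in one place.
COST_PER_TOKEN = 0.000005  # assumed flat price, for illustration only

def record_decision(state, agent, action, tokens, latency_ms):
    """Write one agent's decision into the shared state and roll up cost."""
    state["agent_decisions"][agent] = {
        "action": action,
        "tokens": tokens,
        "latency_ms": latency_ms,
    }
    state["total_cost"] += tokens * COST_PER_TOKEN
    return state

state = {"run_id": "abc123", "agent_decisions": {}, "total_cost": 0.0, "errors": []}
record_decision(state, "retriever", "query", tokens=150, latency_ms=230)
record_decision(state, "analyzer", "summarize", tokens=890, latency_ms=1200)
```

After both calls, `state` holds the complete audit trail for the run and `total_cost` reflects the 1,040 tokens consumed, with no separate logging layer.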
**2. Token Budgets with Circuit Breakers**

```python
if state["token_usage"]["total"] > BUDGET_LIMIT:
    state["status"] = "paused"
    state["pause_reason"] = "Budget exceeded"
    # Alert or escalate
```

**3. Structured Diffs for Debugging**

Instead of comparing logs, compare state snapshots:

```python
diff = compare_states(run_v1_state, run_v2_state)
# Shows exactly what changed between runs
```

**Key Insight**

The coordination layer (shared state) becomes the observability layer. You don't add monitoring on top — it's built into how agents coordinate. This pattern reduced our debugging time dramatically.

Working implementation: https://github.com/KeepALifeUS/autonomous-agents

Would love to hear what others are building!
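One plausible way `compare_states` could work is a recursive dict diff that reports every leaf whose value changed between runs. This is a sketch under that assumption, not the implementation from the linked repo:

```python
# Recursive diff of two state snapshots: returns {dotted.path: (old, new)}
# for every leaf value that differs.
def compare_states(old, new, prefix=""):
    diff = {}
    for key in set(old) | set(new):
        path = f"{prefix}.{key}" if prefix else str(key)
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diff.update(compare_states(a, b, path))  # recurse into nested state
        elif a != b:
            diff[path] = (a, b)
    return diff

run_v1 = {"total_cost": 0.0042, "agent_decisions": {"retriever": {"tokens": 150}}}
run_v2 = {"total_cost": 0.0061, "agent_decisions": {"retriever": {"tokens": 310}}}
diff = compare_states(run_v1, run_v2)
```

Here `diff` pinpoints exactly two changes (`total_cost` and the retriever's token count), which is far easier to scan than two interleaved log streams.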
Biggest friction for me has been that debugging and optimization are two different problems, but everyone treats them as one.

Debugging = "what went wrong on this specific run" (tracing, logs, replay). Existing tools like Langfuse handle this well.

Optimization = "which model/provider should I be using right now for this task type, based on how things are actually performing." Nobody has great tooling for this, because it requires continuous feedback, not just post-hoc analysis.

I've been working on Kalibr for the optimization side. You define success at the outcome level and it routes traffic toward what's working. The idea is that you shouldn't have to read dashboards and manually swap models when something degrades.
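The "route traffic toward what's working" idea can be sketched as a simple epsilon-greedy bandit over models. To be clear, this is a toy illustration of the general technique, not Kalibr's actual API; the class name, model names, and simulated outcomes are all made up:

```python
import random

class OutcomeRouter:
    """Toy outcome-based router: mostly exploit the model with the best
    observed success rate, but keep exploring alternatives."""

    def __init__(self, models, epsilon=0.1):
        self.stats = {m: {"wins": 0, "calls": 0} for m in models}
        self.epsilon = epsilon

    def pick(self):
        # Explore occasionally; otherwise exploit the best observed rate.
        if random.random() < self.epsilon:
            return random.choice(list(self.stats))
        return max(self.stats,
                   key=lambda m: self.stats[m]["wins"] / max(self.stats[m]["calls"], 1))

    def report(self, model, success):
        # Continuous feedback: every outcome updates the routing stats.
        self.stats[model]["calls"] += 1
        self.stats[model]["wins"] += int(success)

random.seed(0)  # deterministic for the demo
router = OutcomeRouter(["model-a", "model-b"])
for _ in range(100):
    m = router.pick()
    router.report(m, success=(m == "model-a"))  # pretend model-a always succeeds
```

After 100 simulated calls the router has shifted almost all traffic to the model that actually succeeds, without anyone reading a dashboard; real systems would replace the hard-coded `success` with an outcome signal and something smarter than epsilon-greedy.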
Beyond basic observability, things like instrumentation, runtime control, and cost management seem to get complicated quickly as soon as you have multiple agents, tools, and models involved. In particular, it feels hard to reason about cost and token usage at the agent level, apply guardrails or budgets at runtime, or debug and compare agent runs in a structured way rather than just reading logs after the fact.
I’m interested in hearing how others are approaching this today. What parts are you building yourselves, what’s working, and where are you still feeling friction? This is just for discussion and learning, not pitching anything.