Open
Description
Summary
Add persistence support for agents so LangGraph thread context and state survive process restarts and can be resumed reliably.
Problem
Current agent runs do not persist LangGraph thread state durably, which makes recovery, resumability, and long-running workflows fragile.
Proposed Scope
- Persist LangGraph thread identifiers and state snapshots for agent runs
- Restore persisted thread/state on subsequent invocations
- Define lifecycle rules for creating/updating/resuming thread state
- Add tests covering persistence and resume behavior
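The create/update/resume lifecycle above can be sketched with a small store interface. This is a minimal illustration only, not LangGraph's checkpointer API: `FileThreadStore`, `run_step`, and the one-JSON-file-per-thread layout are hypothetical names and choices, and thread IDs are assumed to be filesystem-safe. The same shape would apply to a Postgres-backed store.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


class FileThreadStore:
    """Hypothetical store: one JSON snapshot file per thread_id."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, thread_id: str) -> Path:
        return self.root / f"{thread_id}.json"

    def save(self, thread_id: str, state: dict[str, Any]) -> None:
        # Write to a temp file first, then swap it in, so a crash
        # mid-write does not leave a truncated snapshot behind.
        target = self._path(thread_id)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(target)

    def load(self, thread_id: str) -> Optional[dict[str, Any]]:
        path = self._path(thread_id)
        if not path.exists():
            return None  # no persisted state: caller creates a new thread
        return json.loads(path.read_text())


def run_step(store: FileThreadStore, thread_id: str, message: str) -> dict[str, Any]:
    """Resume persisted state if present, otherwise create it, then update."""
    state = store.load(thread_id) or {"messages": []}
    state["messages"].append(message)
    store.save(thread_id, state)
    return state


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    run_step(FileThreadStore(root), "user-1", "hello")
    # A fresh store object stands in for a process restart.
    print(run_step(FileThreadStore(root), "user-1", "again"))
```

Constructing a second `FileThreadStore` over the same directory simulates a restart, which is the path the persistence and resume tests would need to cover.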
Acceptance Criteria
- Agent thread/state persists across process restarts
- Existing workflows can resume from persisted state without regressions
- Tests validate persistence, retrieval, and resume paths
- Documentation updated for configuration and operational behavior
Architecture Notes (Feb 26, 2026)
Why Postgres over SQLite
- We have one thread per user and expect many threads to run in parallel.
- Concurrency guards in the current system mainly prevent duplicate in-flight work on the same thread; different threads still run concurrently and write concurrently.
- SQLite's single-writer model is likely to become a bottleneck under sustained multi-thread write load (target scale: 100-1000 users).
- Decision: use Postgres for production durability and concurrent write throughput.
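The single-writer concern can be reproduced with Python's standard `sqlite3` module. The sketch below is an illustration under assumptions, not a benchmark: the table layout and the `second_writer_is_blocked` helper are hypothetical, and `timeout=0` is chosen so the second writer fails immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked() -> bool:
    """Return True if a second connection cannot start a write
    transaction while the first one holds the write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "threads.db")
        # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves;
        # timeout=0 makes a blocked writer fail at once rather than wait.
        writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer_a.execute(
                "CREATE TABLE state (thread_id TEXT PRIMARY KEY, snapshot TEXT)"
            )
            # Writer A takes the database-wide write lock for thread user-1.
            writer_a.execute("BEGIN IMMEDIATE")
            writer_a.execute("INSERT INTO state VALUES (?, ?)", ("user-1", "{}"))
            try:
                # Writer B wants to write a different thread, but the lock
                # covers the whole database file, not a single row.
                writer_b.execute("BEGIN IMMEDIATE")
            except sqlite3.OperationalError:
                return True
            writer_b.execute("ROLLBACK")
            return False
        finally:
            writer_a.close()
            writer_b.close()


if __name__ == "__main__":
    print(second_writer_is_blocked())
```

In practice a nonzero timeout makes the second writer wait rather than fail, so the cost shows up as queued writes and added latency as the number of concurrently writing threads grows.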
.langgraph_api status right now
- `.langgraph_api` is acceptable for the current low-scale phase and development.
- It is sufficient short-term while user concurrency is low (early beta), but is not the target persistence strategy for 100-1000 concurrent users.
Runtime / deployment notes
- Current stack runs `langgraph dev` in Docker for all agents.
- Keep agents isolated by design: each agent should have its own independent persistence infrastructure and scale independently.
- For production-grade scaling, run one Postgres instance per agent (shared by that agent's replicas).
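One way to enforce the one-instance-per-agent rule is at configuration load time. The sketch below assumes a hypothetical convention in which each agent reads its connection string from an environment variable named `<AGENT>_POSTGRES_URI`; the agent names, variable names, and helper functions are illustrative and not part of the current stack.

```python
import os
from typing import Iterable, Mapping


def postgres_uri_for(agent: str, env: Mapping[str, str] = os.environ) -> str:
    """Look up the agent's own connection string (hypothetical naming scheme)."""
    key = f"{agent.upper().replace('-', '_')}_POSTGRES_URI"
    try:
        return env[key]
    except KeyError:
        raise RuntimeError(
            f"missing {key}: each agent needs its own Postgres instance"
        ) from None


def load_agent_uris(agents: Iterable[str], env: Mapping[str, str]) -> dict[str, str]:
    """Resolve every agent's URI and reject configurations where two
    agents point at the same connection string."""
    uris = {agent: postgres_uri_for(agent, env) for agent in agents}
    if len(set(uris.values())) != len(uris):
        raise RuntimeError("agents must not share a Postgres instance")
    return uris


if __name__ == "__main__":
    example_env = {
        "RESEARCH_AGENT_POSTGRES_URI": "postgresql://research-db:5432/langgraph",
        "BILLING_AGENT_POSTGRES_URI": "postgresql://billing-db:5432/langgraph",
    }
    print(load_agent_uris(["research-agent", "billing-agent"], example_env))
```

Replicas of the same agent would read the same variable and therefore share that agent's instance, while a copy-pasted URI across agents fails fast at startup.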
Redis notes
- Redis is not strictly required for the immediate `langgraph dev` path.
- If we adopt the full Agent Server / `langgraph up` style architecture, Redis is part of that model (coordination/streaming), alongside Postgres.
Phased plan
- Phase 0 (now): continue with `langgraph dev` + `.langgraph_api` while validating product with low concurrency.
- Phase 1 (scale-up): migrate each agent to Postgres-backed persistence before high-concurrency rollout.
- Phase 2 (optional): add Redis where needed for server-side stream/coordination features and operational resilience.