Description
Problem Statement
Current S3 Session Manager Issues:
- Loads the entire conversation history (thousands of messages) before trimming
- Stores the full message history linearly, causing:
  - Slow initial load times for long-running agents
  - High memory pressure during deserialization
  - Increasing storage costs as conversations grow
  - Inefficient reads (everything is loaded even when only recent context is needed)
Broader Issue:
The current session representation is conversation-centric (a list of all messages) rather than checkpoint-centric (a single state snapshot).
This makes it:
- Expensive to store (redundant information across messages)
- Expensive to load (deserialize everything before use)
- Expensive to operate (storage costs scale linearly with conversation length)
Impact: Latency, memory usage, and storage costs grow without bound for long-running agents.
Proposed Solution
Move from conversation-based session storage to checkpoint-based session storage, similar to LangGraph's checkpointer pattern.
Key Concept:
Instead of storing the entire conversation history, store compact state snapshots (checkpoints) that represent the agent's state at a point in time.
Current (Conversation-based):
Session = [msg1, msg2, msg3, ..., msg1000]  → Load all 1000 messages
Proposed (Checkpoint-based):
Session = Checkpoint{
    state: {...},
    recent_context: [msg998, msg999, msg1000],  → Load only what's needed
    metadata: {...}
}
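For concreteness, here is a minimal sketch of what such a checkpoint object could look like in Python; the class and field names simply mirror the outline above and are illustrative, not an existing SDK type:
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Checkpoint:
    """Illustrative checkpoint: a compact snapshot of agent state at a point in time."""
    state: dict[str, Any]                   # minimal agent state needed for resumption
    recent_context: list[dict[str, Any]]    # only the last N messages, not the full history
    metadata: dict[str, Any] = field(default_factory=dict)  # e.g. session_id, turn count, timestamps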
Core Changes:
- Checkpoint as Primary Representation
  - Session state is a single checkpoint object
  - Contains minimal state needed for resumption
  - Recent conversation context included (not full history)
  - Older messages archived separately if needed
- Lazy Loading Architecture (see the sketch after this list)
session_manager = S3SessionManager(
    session_id=context.session_id,
    max_recent_messages=50,  # Only load last N messages
    lazy_load=True           # Don't load until needed
)
- Incremental Updates
  - Update checkpoint incrementally instead of rewriting full history
  - Only write state deltas on each turn
  - Compress older conversation history
- Configurable Retention
session_manager = S3SessionManager(
    checkpoint_strategy="recent",  # Only recent context
    archive_after_messages=100,    # Archive older messages
    compression=True               # Compress archived history
)
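None of the parameters shown above (max_recent_messages, lazy_load, checkpoint_strategy, archive_after_messages, compression) exist in the current S3SessionManager; they are proposed knobs. A rough sketch of the behavior they could drive, assuming each message is stored as its own S3 object under a per-session prefix with sortable (zero-padded) keys:
import gzip
import json
import boto3

s3 = boto3.client("s3")

def load_recent_messages(bucket: str, session_prefix: str, max_recent: int = 50) -> list[dict]:
    """Fetch only the last `max_recent` message objects instead of the full history."""
    keys: list[str] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{session_prefix}/messages/"):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    recent_keys = sorted(keys)[-max_recent:]  # zero-padded turn numbers sort lexicographically
    return [json.loads(s3.get_object(Bucket=bucket, Key=k)["Body"].read()) for k in recent_keys]

def archive_older_messages(bucket: str, session_prefix: str,
                           messages: list[dict], archive_after: int = 100) -> list[dict]:
    """Move everything except the most recent `archive_after` messages into a
    compressed archive object; return the messages that stay in the checkpoint."""
    if len(messages) <= archive_after:
        return messages
    older, recent = messages[:-archive_after], messages[-archive_after:]
    s3.put_object(
        Bucket=bucket,
        Key=f"{session_prefix}/archive/messages-{len(older):08d}.json.gz",
        Body=gzip.compress(json.dumps(older).encode("utf-8")),
    )
    return recent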
Use Case
Use Case 1: Long-Running Customer Support Agent
Scenario: Support agent with 500+ message conversation
Current Behavior:
Turn 501:
- Load all 500 messages from S3 (5 seconds)
- Deserialize 500 messages into memory (2GB)
- Invoke model with context window (uses last 20 messages)
- Conversation manager trims to 20 messages
- Write all 501 messages back to S3
Total: 7 seconds, 2GB memory, high S3 cost
With Checkpoint Model:
Turn 501:
- Load checkpoint with last 50 messages (0.5 seconds)
- Deserialize checkpoint (100MB)
- Invoke model with context window (uses last 20 messages)
- Update checkpoint with new message
- Write checkpoint delta to S3
Total: 1 second, 100MB memory, 10x lower S3 cost
Benefit: 7x faster, 20x less memory, 10x lower storage costs
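A minimal sketch of what a single turn could look like under this model; load_checkpoint and write_checkpoint_delta are hypothetical method names used only to illustrate the flow above:
def run_turn(session_manager, agent, user_message: str):
    """One turn under the checkpoint model; every method shown here is hypothetical."""
    checkpoint = session_manager.load_checkpoint()      # load ~50 recent messages, not all 500
    agent.messages = checkpoint.recent_context          # resume from the compact context
    result = agent(user_message)                         # model still only sees its context window
    checkpoint.recent_context = agent.messages[-50:]     # keep only the tail for the next turn
    session_manager.write_checkpoint_delta(checkpoint)   # write a small delta, not the full history
    return result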
Use Case 2: Multi-Day Agent Sessions
Scenario: Research agent that pauses and resumes over multiple days
Current:
- Day 1: 100 messages stored
- Day 2: Load 100, add 50 → 150 messages stored
- Day 3: Load 150, add 50 → 200 messages stored
- Storage grows linearly, load time increases each day
With Checkpoint:
- Day 1: Checkpoint with recent 30 messages, archive rest
- Day 2: Load checkpoint (30 messages), update checkpoint
- Day 3: Load checkpoint (30 messages), update checkpoint
- Storage stays constant, load time consistent
Benefit: Predictable performance and costs regardless of session length
Use Case 3: High-Volume Production Agents
Scenario: 10,000 active sessions, each averaging 200 messages
Current Storage:
10,000 sessions × 200 messages × 5KB per message = 10GB
Monthly S3 cost: ~$0.25
Monthly GET operations: 1M requests = $0.40
Total: $0.65/month per 10K sessions
With Checkpoint:
10,000 sessions × 1 checkpoint × 250KB = 2.5GB
Monthly S3 cost: ~$0.06
Monthly GET operations: 100K requests = $0.04
Total: $0.10/month per 10K sessions
Benefit: ~85% cost reduction at scale ($0.65 → $0.10 per 10K sessions per month)
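For reference, these estimates roughly reproduce under standard S3 Standard pricing assumptions (about $0.023 per GB-month for storage and $0.0004 per 1,000 GET requests; actual prices vary by region and tier):
# Rough reproduction of the estimates above; pricing figures are assumptions.
STORAGE_PER_GB_MONTH = 0.023   # USD, S3 Standard
GET_PER_1000 = 0.0004          # USD per 1,000 GET requests

def monthly_cost(total_gb: float, get_requests: int) -> float:
    return total_gb * STORAGE_PER_GB_MONTH + get_requests / 1000 * GET_PER_1000

sessions = 10_000
current = monthly_cost(sessions * 200 * 5 / 1e6, 1_000_000)   # 10 GB + 1M GETs   -> ~$0.63
proposed = monthly_cost(sessions * 250 / 1e6, 100_000)        # 2.5 GB + 100K GETs -> ~$0.10
print(f"${current:.2f} -> ${proposed:.2f}: "
      f"{100 * (1 - proposed / current):.0f}% reduction")     # ~85%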
Use Case 4: Agents with Token Limits
Scenario: Agent using Claude with 200K token context window
Current:
- Load all messages (maybe 500K tokens of history)
- Trim to 200K tokens before invoking model
- Wasted bandwidth and processing
With Checkpoint:
- Checkpoint stores only the relevant context (at most the 200K-token window)
- No trimming needed
- Direct invocation
Benefit: Faster invocations, lower bandwidth
Use Case 5: Session Recovery After Failures
Scenario: Agent crashes mid-execution, needs to resume
Current:
- Load entire conversation history
- Rebuild state from all messages
- Time-consuming for long sessions
With Checkpoint:
- Load latest checkpoint (already contains state)
- Immediate resumption
- Fast recovery
Benefit: Better availability, faster recovery
Use Case 6: Audit and Compliance
Scenario: Need full conversation history for compliance, but not for operations
Current:
- Full history loaded on every turn (even when not needed)
- Performance penalty for compliance requirement
With Checkpoint:
- Checkpoint used for operations (fast)
- Full history archived separately (S3 Glacier for compliance)
- Load archive only when needed for audit
Benefit: Separate operational and compliance concerns
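A sketch of how the compliance copy could be kept separate from the operational checkpoint, assuming the full history is written to a colder storage class (the key layout and StorageClass choice here are illustrative, not part of the current SDK):
import gzip
import json
import boto3

def archive_for_compliance(bucket: str, session_id: str, full_history: list[dict]) -> None:
    """Write the complete conversation to a compliance archive; the agent's
    hot path only ever touches the checkpoint, never this object."""
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"audit/{session_id}/full-history.json.gz",
        Body=gzip.compress(json.dumps(full_history).encode("utf-8")),
        StorageClass="GLACIER",  # or DEEP_ARCHIVE / a lifecycle rule, depending on retrieval SLAs
    )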
Alternative Solutions
No response
Additional Context
No response