Skip to content

[FEATURE] Optimize Session State Representation with Checkpoint Model #1230

@siwachabhi

Description

@siwachabhi

Problem Statement

Current S3 Session Manager Issues:

  • Loads entire conversation history (1000s of messages) before trimming
  • Stores full message history linearly, causing:
    • Slow initial load times for long-running agents
    • High memory pressure during deserialization
    • Increasing storage costs as conversations grow
    • Inefficient reads (load everything even when only recent context needed)

Broader Issue:
Current session representation is conversation-centric (list of all messages) rather than checkpoint-centric (single state snapshot).
This makes it:

  • Expensive to store (redundant information across messages)
  • Expensive to load (deserialize everything before use)
  • Expensive to operate (storage costs scale linearly with conversation length)

Impact: Latency, memory usage, and storage costs increase unboundedly for long-running agents.

Proposed Solution

Proposed Solution

Move from conversation-based session storage to checkpoint-based session storage, similar to Langgraph's checkpointer pattern.

Key Concept:
Instead of storing entire conversation history, store compact state snapshots (checkpoints) that represent agent state at a point in
time.

  Current (Conversation-based):
  Session = [msg1, msg2, msg3, ..., msg1000]  → Load all 1000 messages

  Proposed (Checkpoint-based):
  Session = Checkpoint{
    state: {...},
    recent_context: [msg998, msg999, msg1000],  → Load only what's needed
    metadata: {...}
  }

Core Changes:

  1. Checkpoint as Primary Representation
    - Session state is a single checkpoint object
    - Contains minimal state needed for resumption
    - Recent conversation context included (not full history)
    - Older messages archived separately if needed
  2. Lazy Loading Architecture
  session_manager = S3SessionManager(
      session_id=context.session_id,
      max_recent_messages=50,  # Only load last N messages
      lazy_load=True  # Don't load until needed
  )
  1. Incremental Updates
    - Update checkpoint incrementally instead of rewriting full history
    - Only write state deltas on each turn
    - Compress older conversation history
  2. Configurable Retention
session_manager = S3SessionManager(
    checkpoint_strategy="recent",  # Only recent context
    archive_after_messages=100,    # Archive older messages
    compression=True               # Compress archived history
)

Use Case

Use Case 1: Long-Running Customer Support Agent

Scenario: Support agent with 500+ message conversation

Current Behavior:
Turn 501:

  1. Load all 500 messages from S3 (5 seconds)
  2. Deserialize 500 messages into memory (2GB)
  3. Invoke model with context window (uses last 20 messages)
  4. Conversation manager trims to 20 messages
  5. Write all 501 messages back to S3

Total: 7 seconds, 2GB memory, high S3 cost

With Checkpoint Model:
Turn 501:

  1. Load checkpoint with last 50 messages (0.5 seconds)
  2. Deserialize checkpoint (100MB)
  3. Invoke model with context window (uses last 20 messages)
  4. Update checkpoint with new message
  5. Write checkpoint delta to S3

Total: 1 second, 100MB memory, 10x lower S3 cost

Benefit: 7x faster, 20x less memory, 10x lower storage costs


Use Case 2: Multi-Day Agent Sessions

Scenario: Research agent that pauses and resumes over multiple days

Current:

  • Day 1: 100 messages stored
  • Day 2: Load 100, add 50 → 150 messages stored
  • Day 3: Load 150, add 50 → 200 messages stored
  • Storage grows linearly, load time increases each day

With Checkpoint:

  • Day 1: Checkpoint with recent 30 messages, archive rest
  • Day 2: Load checkpoint (30 messages), update checkpoint
  • Day 3: Load checkpoint (30 messages), update checkpoint
  • Storage stays constant, load time consistent

Benefit: Predictable performance and costs regardless of session length


Use Case 3: High-Volume Production Agents

Scenario: 10,000 active sessions, each averaging 200 messages

Current Storage:
10,000 sessions × 200 messages × 5KB per message = 10GB
Monthly S3 cost: ~$0.25
Monthly GET operations: 1M requests = $0.40
Total: $0.65/month per 10K sessions

With Checkpoint:
10,000 sessions × 1 checkpoint × 250KB = 2.5GB
Monthly S3 cost: ~$0.06
Monthly GET operations: 100K requests = $0.04
Total: $0.10/month per 10K sessions

Benefit: 75% cost reduction at scale


Use Case 4: Agents with Token Limits

Scenario: Agent using Claude with 200K token context window

Current:

  • Load all messages (maybe 500K tokens of history)
  • Trim to 200K tokens before invoking model
  • Wasted bandwidth and processing

With Checkpoint:

  • Checkpoint stores only relevant context (200K tokens)
  • No trimming needed
  • Direct invocation

Benefit: Faster invocations, lower bandwidth


Use Case 5: Session Recovery After Failures

Scenario: Agent crashes mid-execution, needs to resume

Current:

  • Load entire conversation history
  • Rebuild state from all messages
  • Time-consuming for long sessions

With Checkpoint:

  • Load latest checkpoint (already contains state)
  • Immediate resumption
  • Fast recovery

Benefit: Better availability, faster recovery


Use Case 6: Audit and Compliance

Scenario: Need full conversation history for compliance, but not for operations

Current:

  • Full history loaded on every turn (even when not needed)
  • Performance penalty for compliance requirement

With Checkpoint:

  • Checkpoint used for operations (fast)
  • Full history archived separately (S3 Glacier for compliance)
  • Load archive only when needed for audit

Benefit: Separate operational and compliance concerns

Alternatives Solutions

No response

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-persistenceSession management or checkpointingenhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions