AgentCoreMemorySaver: Add "end-of-workflow" checkpoint mode to reduce API call overhead #806

@kamrul-finlexgmbh

Description


Package

langgraph-checkpoint-aws

Checked other resources

  • I added a descriptive title to this issue
  • I searched the LangChain documentation with the integrated search
  • I used the GitHub search to find a similar issue and didn't find it
  • I am sure this is a feature request and not a bug report or question

Feature Description

Add a configuration option to AgentCoreMemorySaver that allows checkpointing only at the end of workflow execution, instead of after every graph node (super-step).

Currently, LangGraph checkpoints after every node, which results in a large number of API calls to AgentCore Memory.

For a typical 6-node conversational workflow with tool calling, this generates 62 API calls (49 createEvent + 13 listEvents), adding ~8.7 seconds of latency per user request.

Use Case

We're building a real-time conversational agent using LangGraph with AgentCoreMemorySaver for session persistence. Our workflow:

User Message → LLM → Tool Call → LLM → Tool Call → LLM → Response
Current behavior: a checkpoint is saved after each of the 6 nodes = 62 API calls = ~8.7s overhead
Desired behavior: a single checkpoint saved at the end = 2 API calls = ~300ms overhead

For our use case:

  • We don't need mid-workflow fault tolerance or recovery
  • We only need the final conversation state persisted for session continuity
  • Response latency is critical for user experience

Proposed Implementation (optional)

Add a checkpoint_mode parameter to AgentCoreMemorySaver:

```python
from langgraph_checkpoint_aws import AgentCoreMemorySaver

# Current behavior (default)
checkpointer = AgentCoreMemorySaver(
    MEMORY_ID,
    region_name="us-east-1"
)

# New: checkpoint only at end of workflow
checkpointer = AgentCoreMemorySaver(
    MEMORY_ID,
    region_name="us-east-1",
    checkpoint_mode="end_of_workflow"  # or "deferred" / "batch"
)
```

Implementation options:
1. Buffer writes internally - Override put() and put_writes() to buffer checkpoint data, then flush only when the graph execution completes
2. Expose LangGraph's checkpoint hooks - If LangGraph supports conditional checkpointing, expose that configuration
3. Add a flush() method - Let users manually control when checkpoints are written
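Option 1 (combined with option 3's `flush()`) could be sketched as a thin wrapper that buffers checkpoint writes in memory and forwards only the final state once the workflow finishes. This is a minimal illustration, not the real AgentCoreMemorySaver API: the wrapped `inner` saver, the simplified method signatures, and the `flush()` entry point are all assumptions.

```python
class DeferredCheckpointSaver:
    """Sketch of the proposal: buffer checkpoints, persist once at the end.

    `inner` stands in for the real saver (e.g. AgentCoreMemorySaver); the
    signatures below are simplified assumptions, not the library's API.
    """

    def __init__(self, inner):
        self._inner = inner
        self._pending_puts = []    # buffered put() calls, in order
        self._pending_writes = []  # buffered put_writes() calls

    def put(self, config, checkpoint, metadata, new_versions):
        # Record the checkpoint instead of issuing a createEvent call.
        self._pending_puts.append((config, checkpoint, metadata, new_versions))
        return config

    def put_writes(self, config, writes, task_id):
        self._pending_writes.append((config, writes, task_id))

    def flush(self):
        # Persist only the most recent checkpoint: one write instead of one
        # per super-step. Intermediate writes are discarded, which matches
        # the "only the final conversation state" use case above.
        if self._pending_puts:
            self._inner.put(*self._pending_puts[-1])
        self._pending_puts.clear()
        self._pending_writes.clear()
```

The caller would invoke `flush()` after the graph run returns; a `checkpoint_mode="end_of_workflow"` option would make that implicit.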

Additional Context

Performance impact:

  • Current (every node): 62 API calls, ~8.7s latency overhead
  • End-of-workflow: 2 API calls, ~300ms latency overhead

Breakdown of current API calls:

  • createEvent: 49 calls × ~84ms average = 4.1s total
  • listEvents: 13 calls × ~350ms average = 4.6s total
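For reference, the per-call averages implied by the reported call counts and totals can be derived directly (figures taken from the breakdown above):

```python
# Derive the average latency per call from the reported counts and totals.
breakdown = {"createEvent": (49, 4.1), "listEvents": (13, 4.6)}

for api, (calls, total_s) in breakdown.items():
    print(f"{api}: {total_s / calls * 1000:.0f} ms/call")

total = sum(total_s for _, total_s in breakdown.values())
print(f"combined overhead: {total:.1f} s")  # matches the ~8.7 s headline figure
```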

Environment:

  • langgraph-checkpoint-aws version: latest
  • LangGraph workflow: 6 nodes (3 LLM calls, 2 tool executions, 1 conditional)
  • Region: eu-west-1

Related: the LangGraph checkpointing docs indicate that per-node checkpointing is by design, for fault tolerance, but many real-time use cases don't require mid-workflow recovery. No existing configuration for end-of-workflow checkpointing was found.
