Skip to content

Conversation

@zastrowm
Copy link
Member

Description

Unblock customers who would like to explicitly control snapshots of an agent. Related issue: strands-agents/sdk-python#1138


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@strands-agent
Copy link
Contributor

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/


# Save to file
snapshot = agent.save_snapshot(metadata={"user_id": "123"})
with open("checkpoint.json", "w") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep our terminology consistent. I'm fine using checkpoint.json to match yours, but would you be open to snapshot.json instead?

  • Snapshot: Replica of the serialized version of an agent / multi-agent
  • Transcript: Append-only historical Message history (Extendable in future)
  • Checkpoint: Abstract concept represents the lifecycle moment where we create a Snapshot artifact

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep; this should have been snapshot.json; updated.

In this case the filename is just an example and would be caller determined; but the example should match the api name so updated to snapshot.json

@strands-agent
Copy link
Contributor

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/

JackYPCOnline
JackYPCOnline previously approved these changes Jan 28, 2026

### Future Concerns

- Snapshotting for MultiAgent constructs: This proposal would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this line is unfinished


### Behavior

Snapshots capture **agent state** (data), not **runtime behavior** (code):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, it is not easy to capture and persist code, and I dont think strands should try to do this.

However, we should explore how one would restore an agent from a snapshot, and load lets say tools back into the agent after persisting it. I would like to see an example devex of what this looks like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view the tool state as a feature that we'd be adding to the agent to make "enabled" tools into a state on the agent. So, if we had that I imagine it would be something like:

agent = Agent(tools=[tool1, tool2, tool3, tool4], enabled_tools=["tool1"])

Where only tool1 would be enabled/available on the agent. Then to enable other tools something would eventually trigger:

agent.enabled_tools = ["tool1", "tool3"]

and for restoring an agent with specific tools, it would be the same as

agent2 = Agent(tools=[tool3, tool4])
agent2.load_snapshot(snapshot)

and the snapshot would be restoring the enabled_tools state back into the agent.


### Contract

- **`metadata`** — Caller-owned. Strands does not read, modify, or manage this field. Use it to store checkpoint labels, timestamps, or any application-specific data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could be used to store metadata about certain tools that were attached to an agent before persisting, and then loading those tools back?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could if the application wanted to do that; it could also be date/time, a "name" for the snapshot, or other application specific metadata.

The intent here is to allow applications to include data that strands isn't managing. So that if they chose to just serialize the session to disk, they wouldn't - for example - need to store another file for metadata associated with it.

@strands-agent
Copy link
Contributor

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/

### API Changes

```python
class Snapshot:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have this as TypedDict so that it can be serialized easily? I saw in DevX that we need to explicitly call asdict

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either TypedDict or implementing json serialization explicitly yeah; will update the example(s)

Snapshots capture **agent state** (data), not **runtime behavior** (code):

- **Agent State** — Data persisted as part of session-management: conversation messages, context, and other JSON-serializable data. This is what snapshots save and restore.
- **Runtime Behavior** — Configuration that defines how the agent operates: model, tools, ConversationManager, etc. These are *not* included in snapshots and must be set separately when creating or restoring an agent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow these components to expose "snapshot-able data"? e.g. I am a conv manager developer, I want my data to be restored with snapshots

What's the recommendation? Keeping that data in agent state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the recommendation? Keeping that data in agent state?

Yeah; the recommendation is agent state

Do we allow these components to expose "snapshot-able data"?

It should be Agent State (AgentState directly; or if we're missing something, an equivalent thereof). The idea that I'm trying to get across in this section is "Snapshots do not represent anything other than what already exists in agent state/session-management, it just provides a more direct api to control it".

agent2 = Agent(tools=[tool3, tool4])
agent2.load_snapshot(snapshot)
result2 = agent2("What is the weather?")
# Compare result1 and result2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use couple more sentences here to see what the expectation is, and maybe do we want to enforce tools?

For example, if you restore a list of messages with a toolset of (1,2) to an agent with toolset of (3,4); you are a lot more likely to get hallucinations. The agent tries to follow the examples in message history, as you essentially turn your context into "few-shot"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update it to better illustrate that it's evaluating the result of result1 and result2.

and maybe do we want to enforce tools?

...if you restore a list of messages with a toolset of (1,2) to an agent with toolset of (3,4); you are a lot more likely to get hallucinations.

ACK that this is a concern, but IMO this is not the goal of the snapshot api. Snapshots are intended to only save/load the agent state - transformation of state or normalizing would be something that could be built on top of the low level primitive. If I "resume" an agent from a snapshot or a session-management, it shouldn't be doing any conversion/munging of behavior on the way in or out.

@strands-agent
Copy link
Contributor

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-497/


Further justification: these three properties are also what SessionManagement persists today, so this API aligns with existing behavior.

**Open question:** Is this the right boundary? Are there other properties that should be considered "evolving state"?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be the right boundary, but I want to understand the devex a bit more for restoring "Agent Definition" after loading a snapshot. I get you can do this:

agent = Agent(tools=[tool1, tool2])
snapshot = agent.save_snapshot()

result = agent("What is the weather?")

# ...

agent = Agent(tools=[tool1, tool2])
agent.load_snapshot(snapshot)

Im thinking about defining custom serializers and deserializers you can pass into save_snapshot and load_snapshot, but I guess that doesnt really make sense since the customer can do that themselves anyway today like this:

snapshot = custom_serializer(agent)

agent = custom_deserializer(agent)

Maybe this is where AgentConfig comes in to save the day?

agent = Agent.from_config(config)
agent.load_snapshot(snapshot)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants