The middleware operates on two levels during each agent run:
**Tool output truncation**: When `max_tool_output_tokens` is set, the middleware intercepts tool results via the `after_tool_call` hook and truncates any output that exceeds the token limit, keeping configurable head and tail lines.
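The head/tail strategy can be sketched as follows. This is a minimal illustration, not the middleware's actual implementation; the helper name, the truncation marker text, and the rough 4-characters-per-token counter are all assumptions:

```python
def truncate_output(
    text: str,
    max_tokens: int,
    head_lines: int = 5,
    tail_lines: int = 5,
    count_tokens=lambda s: len(s) // 4,  # crude approximation, ~4 chars/token
) -> str:
    """Truncate oversized tool output, keeping head and tail lines."""
    if count_tokens(text) <= max_tokens:
        return text  # within budget, pass through untouched
    lines = text.splitlines()
    if len(lines) <= head_lines + tail_lines:
        return text  # too few lines to meaningfully truncate
    omitted = len(lines) - head_lines - tail_lines
    return "\n".join(
        lines[:head_lines]
        + [f"... [{omitted} lines truncated] ..."]
        + lines[-tail_lines:]
    )
```

Keeping both ends preserves the command invocation context (head) and the final result or error (tail), which are usually the most informative parts of tool output.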
**Message persistence**: When `messages_path` is set, all messages are saved to a JSON file on every history processor call. This provides a permanent, uncompressed record of the full conversation — ideal for session resume.
## Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `max_tokens` | `int \| None` | `None` | Maximum token budget. `None` auto-detects from genai-prices (falls back to 200,000) |
| `model_name` | `str \| None` | `None` | Model name for auto-detecting `max_tokens` (e.g., `"openai:gpt-4.1"`) |
| `compress_threshold` | `float` | `0.9` | Fraction of `max_tokens` at which auto-compression triggers (0.0, 1.0] |
| `keep` | `ContextSize` | `("messages", 0)` | How much context to retain after compression. `0` = only the summary survives |
| `summarization_model` | `str` | `"openai:gpt-4.1-mini"` | Model used for generating summaries |
| `token_counter` | `TokenCounter` | `count_tokens_approximately` | Function to count tokens (sync or async) |
| `summary_prompt` | `str` | `DEFAULT_SUMMARY_PROMPT` | Prompt template for summary generation |
| `trim_tokens_to_summarize` | `int` | `4000` | Max tokens to include when generating the summary |
| `max_input_tokens` | `int \| None` | `None` | Model max input tokens (required for fraction-based keep) |
    # Re-inject instructions that must survive compression
    return CRITICAL_INSTRUCTIONS


middleware = create_context_manager_middleware(
    on_after_compress=on_after_compress,
)
```
This is inspired by Claude Code's SessionStart hook with the compact matcher; it ensures that critical rules survive context compression.
## UsageCallback
The callback receives three arguments:
Both sync and async callables are supported. If the callable returns an awaitable, it will be awaited automatically.
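The sync-or-async dispatch described above is a common pattern; a self-contained sketch under the assumption that the middleware does something equivalent internally (the `invoke_callback` helper is illustrative, not part of the library's API):

```python
import asyncio
import inspect


async def invoke_callback(callback, *args):
    """Call a sync or async callable; await the result if it is awaitable."""
    result = callback(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_cb(x):
    return x + 1


async def async_cb(x):
    return x + 2
```

Because `inspect.isawaitable` checks the returned value rather than the callable itself, this also handles sync functions that return coroutines.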
## Message Persistence
When `messages_path` is set, all messages are written to a JSON file on every history processor call:
```python
middleware = create_context_manager_middleware(
    messages_path="/tmp/session/messages.json",
)
```
The file contains the full, uncompressed conversation history. On compression, the summary message is appended — the file is always the permanent record.
To resume a session, load the file and pass it as `message_history`:
```python
from pathlib import Path

from pydantic_ai.messages import ModelMessagesTypeAdapter

raw = Path("/tmp/session/messages.json").read_bytes()
history = list(ModelMessagesTypeAdapter.validate_json(raw))
result = await agent.run("Continue...", message_history=history)
```
## Guided Compaction
Both `compact()` and `request_compact()` accept a `focus` parameter to guide the summary:
```python
# Direct compaction (for CLI commands)
history = await middleware.compact(history, focus="Focus on the API design decisions")

# Request compaction on the next __call__ (deferred)
middleware.request_compact(focus="Focus on the debugging session")
```
The focus string is appended to the summary prompt, telling the LLM what to prioritize in the summary.
177
+
80
178
## Basic Usage
```python
from pydantic_ai_middleware import MiddlewareAgent
from pydantic_ai_summarization import create_context_manager_middleware