# Automatic Conversation Summarization and History Management
Intelligent Summarization — LLM-powered context compression • Sliding Window — zero-cost message trimming • Context Manager — real-time token tracking + tool truncation • Safe Cutoff — preserves tool call pairs
Context Management for Pydantic AI helps your Pydantic AI agents handle long conversations without exceeding model context limits. Choose between intelligent LLM summarization or fast sliding window trimming.
Need a full framework? Check out Pydantic Deep Agents — a complete agent framework with planning, filesystem, subagents, and skills.
| What You Want to Build | How This Library Helps |
|---|---|
| Long-Running Agent | Automatically compress history when context fills up |
| Customer Support Bot | Preserve key details while discarding routine exchanges |
| Code Assistant | Keep recent code context, summarize older discussions |
| High-Throughput App | Zero-cost sliding window for maximum speed |
| Cost-Sensitive App | Choose between quality (summarization) or free (sliding window) |
## Installation

```bash
pip install summarization-pydantic-ai
```

Or with uv:

```bash
uv add summarization-pydantic-ai
```

For accurate token counting:

```bash
pip install "summarization-pydantic-ai[tiktoken]"
```

For real-time token tracking and tool output truncation:

```bash
pip install "summarization-pydantic-ai[hybrid]"
```

## Quick Start

```python
from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

agent = Agent(
    "openai:gpt-4o",
    history_processors=[processor],
)

# Run inside an async context (e.g., under asyncio.run)
result = await agent.run("Hello!")
```

That's it. Your agent now:
- Monitors conversation size on every turn
- Summarizes older messages when limits are reached
- Preserves tool call/response pairs so a call is never separated from its result (see the sketch below)
- Keeps recent context intact
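The safe-cutoff guarantee matters because pydantic-ai represents a tool call (`ToolCallPart` in a `ModelResponse`) and its result (`ToolReturnPart` in the following `ModelRequest`) as separate messages, and most providers reject a history that contains a tool result without its matching call. The snippet below is an illustrative sketch of the idea, not this library's internal implementation:

```python
from pydantic_ai.messages import ModelMessage, ModelRequest, ToolReturnPart

def safe_cutoff(messages: list[ModelMessage], keep_last: int) -> list[ModelMessage]:
    """Illustrative only: slide the cut point back so a kept tool result
    is never separated from the tool call that produced it."""
    cut = max(len(messages) - keep_last, 0)
    # If the first kept message is a request carrying tool results, the
    # matching tool calls live in the ModelResponse just before it, so
    # move the boundary back to keep the pair intact.
    while cut > 0 and isinstance(messages[cut], ModelRequest) and any(
        isinstance(part, ToolReturnPart) for part in messages[cut].parts
    ):
        cut -= 1
    return messages[cut:]
```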
## Processors

| Processor | LLM Cost | Latency | Context Preservation |
|---|---|---|---|
| `SummarizationProcessor` | High | High | Intelligent summary |
| `SlidingWindowProcessor` | Zero | ~0ms | Discards old messages |
| `ContextManagerMiddleware` | Per compression | Low (tracking) / high (compression) | Intelligent summary |
### SummarizationProcessor

Uses an LLM to create summaries of older messages:

```python
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # When to summarize
    keep=("messages", 20),       # What to keep
)
```

### SlidingWindowProcessor

Simply discards old messages — no LLM calls:

```python
from pydantic_ai_summarization import create_sliding_window_processor

processor = create_sliding_window_processor(
    trigger=("messages", 100),  # When to trim
    keep=("messages", 50),      # What to keep
)
```

### ContextManagerMiddleware

Dual-protocol middleware combining token tracking, auto-compression, and tool output truncation:

```python
from pydantic_ai import Agent
from pydantic_ai_summarization import create_context_manager_middleware

middleware = create_context_manager_middleware(
    max_tokens=200_000,
    compress_threshold=0.9,
    on_usage_update=lambda pct, cur, mx: print(f"{pct:.0%} used ({cur:,}/{mx:,})"),
)

agent = Agent(
    "openai:gpt-4o",
    history_processors=[middleware],
)
```

Requires `pip install "summarization-pydantic-ai[hybrid]"`.
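If `print` is too blunt, the same `(fraction, current, max)` callback signature shown above can feed standard logging. A small variant, with an arbitrary 80% warning threshold chosen for illustration:

```python
import logging

from pydantic_ai_summarization import create_context_manager_middleware

logger = logging.getLogger("context")

def usage_alert(pct: float, current: int, maximum: int) -> None:
    # Same (fraction, current, max) signature as the lambda above.
    if pct >= 0.8:  # arbitrary warning threshold for illustration
        logger.warning("Context %.0f%% full (%d/%d tokens)", pct * 100, current, maximum)

middleware = create_context_manager_middleware(
    max_tokens=200_000,
    compress_threshold=0.9,
    on_usage_update=usage_alert,
)
```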
## Trigger and Keep Values

### Trigger Types

| Type | Example | Description |
|---|---|---|
| `messages` | `("messages", 50)` | Trigger when message count exceeds threshold |
| `tokens` | `("tokens", 100000)` | Trigger when token count exceeds threshold |
| `fraction` | `("fraction", 0.8)` | Trigger at a percentage of `max_input_tokens` |
### Keep Types

| Type | Example | Description |
|---|---|---|
| `messages` | `("messages", 20)` | Keep the last N messages |
| `tokens` | `("tokens", 10000)` | Keep the last N tokens' worth of messages |
| `fraction` | `("fraction", 0.2)` | Keep the last N% of context |
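Fraction values for both triggers and keeps are resolved against `max_input_tokens`. The arithmetic below is illustrative only, using the 128k context window from the example later in this README:

```python
# Illustrative arithmetic: how fraction values resolve for a
# model with a 128k-token context window.
max_input_tokens = 128_000

trigger_tokens = int(max_input_tokens * 0.8)  # ("fraction", 0.8) -> 102,400 tokens
keep_tokens = int(max_input_tokens * 0.2)     # ("fraction", 0.2) -> 25,600 tokens
```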
## Advanced Usage

### Multiple Triggers (OR Logic)

Pass a list of triggers; summarization runs as soon as any one of them fires:

```python
from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=[
        ("messages", 50),    # OR 50+ messages
        ("tokens", 100000),  # OR 100k+ tokens
    ],
    keep=("messages", 10),
)
```

### Fraction-Based Limits

```python
processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=("fraction", 0.8),  # 80% of context window
    keep=("fraction", 0.2),     # Keep last 20%
    max_input_tokens=128000,    # GPT-4o's context window
)
```

### Custom Token Counter

```python
def my_token_counter(messages):
    # Rough heuristic: about four characters per token
    return sum(len(str(msg)) for msg in messages) // 4

processor = create_summarization_processor(
    token_counter=my_token_counter,
)
```
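For exact counts, the optional `[tiktoken]` extra suggests the library can use tiktoken automatically; the sketch below shows how a custom counter could do the same explicitly, assuming counting the stringified messages is close enough for triggering purposes:

```python
import tiktoken

from pydantic_ai_summarization import create_summarization_processor

encoding = tiktoken.encoding_for_model("gpt-4o")  # resolves to o200k_base

def tiktoken_counter(messages):
    # Encode each message's string form and sum the token counts.
    return sum(len(encoding.encode(str(msg))) for msg in messages)

processor = create_summarization_processor(token_counter=tiktoken_counter)
```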
### Azure OpenAI and Custom Models

```python
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai_summarization import create_summarization_processor

azure_model = OpenAIModel(
    "gpt-4o",
    provider=OpenAIProvider(
        base_url="https://my-resource.openai.azure.com/openai/deployments/gpt-4o",
        api_key="your-azure-api-key",
    ),
)

processor = create_summarization_processor(
    model=azure_model,
    trigger=("tokens", 100000),
    keep=("messages", 20),
)
```

### Custom Summary Prompt

The prompt template receives the conversation via a `{messages}` placeholder:

```python
processor = create_summarization_processor(
    summary_prompt="""
    Extract key information from this conversation.
    Focus on: decisions made, code written, pending tasks.

    Conversation:
    {messages}
    """,
)
```

## Features

| Feature | Description |
|---|---|
| Two Strategies | Intelligent summarization or fast sliding window |
| Flexible Triggers | Message count, token count, or fraction-based |
| Safe Cutoff | Never breaks tool call/response pairs |
| Custom Counters | Bring your own token counting logic |
| Custom Prompts | Control how summaries are generated |
| Token Tracking | Real-time usage monitoring with callbacks |
| Tool Truncation | Automatic truncation of large tool outputs |
| Custom Models | Use any pydantic-ai Model (Azure, custom providers) |
| Lightweight | Only requires pydantic-ai-slim (no extra model SDKs) |
## Related Packages

| Package | Description |
|---|---|
| Pydantic Deep Agents | Full agent framework (uses this library) |
| pydantic-ai-backend | File storage and Docker sandbox |
| pydantic-ai-todo | Task planning toolset |
| subagents-pydantic-ai | Multi-agent orchestration |
| pydantic-ai | The foundation — agent framework by Pydantic |
## Development

```bash
git clone https://github.com/vstorm-co/summarization-pydantic-ai.git
cd summarization-pydantic-ai
make install
make test  # 100% coverage required
```

## License

MIT — see LICENSE.
Built with ❤️ by vstorm-co