# Releases: vstorm-co/summarization-pydantic-ai
## 0.0.4

[0.0.4] - 2026-02-25

### Added
- `on_before_compress` callback on `ContextManagerMiddleware` — called with `(messages_to_discard, cutoff_index)` before compression summarizes and discards messages. Enables persistent history archival (e.g. saving the full conversation to files before pruning).
- `on_after_compress` callback — called with the compressed messages after compression. Return a string to re-inject it into context as a `SystemPromptPart` (inspired by Claude Code's SessionStart hook with the compact matcher).
- Continuous message persistence via `messages_path` on `ContextManagerMiddleware` — every message (user input, agent responses, tool calls) is saved to a single `messages.json` file on every history processor call. On compression, the summary is appended to the same file. The file is the permanent, uncompressed record of the full conversation. Supports session resume (loads existing history on init).
- Guided compaction — `_compress()` and `_create_summary()` accept a `focus` parameter (e.g. "Focus on the API changes") appended to the summary prompt.
- `request_compact(focus)` method — request manual compaction on the next `__call__`, with optional focus instructions.
- `compact(messages, focus)` method — directly compact messages with LLM summarization (for CLI `/compact` commands).
- `max_tokens` auto-detection from `genai-prices` — when `max_tokens=None` (the new default), the middleware resolves the model's context window automatically via `genai-prices`. Falls back to 200,000 if not found.
- `resolve_max_tokens(model_name)` function exported from the package — standalone lookup of context windows from `genai-prices`.
- `model_name` parameter on `ContextManagerMiddleware` and the factory — used for auto-detection of `max_tokens` when not explicitly set.
- Async token counting — the `TokenCounter` type now accepts both sync and async callables (`Callable[..., int] | Callable[..., Awaitable[int]]`). Enables use of provider token-counting APIs (e.g. Anthropic's `/count_tokens` endpoint) or pydantic-ai's `count_tokens()` method. (#6)
- `async_count_tokens()` helper function exported from the package.
- `BeforeCompressCallback` and `AfterCompressCallback` type aliases exported.
- `messages_path`, `model_name`, `on_before_compress`, and `on_after_compress` parameters added to the `create_context_manager_middleware()` factory.
- Examples — 6 runnable examples in `examples/` covering all features: auto-compression, persistence, callbacks, auto-detection, interactive chat, standalone processors.
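The archival use case for `on_before_compress` can be sketched as a plain callable. The `(messages_to_discard, cutoff_index)` signature comes from the notes above; the dict message shape and JSONL file layout here are illustrative assumptions, not the library's actual types:

```python
import json
from pathlib import Path


def make_archiver(archive_path: str):
    """Build an on_before_compress-style callback that appends
    soon-to-be-discarded messages to a JSONL archive file.

    The (messages_to_discard, cutoff_index) signature follows the
    changelog; the dict message shape is a simplifying assumption.
    """
    path = Path(archive_path)

    def on_before_compress(messages_to_discard, cutoff_index):
        # Append one JSON line per message so the archive survives pruning.
        with path.open("a", encoding="utf-8") as f:
            for msg in messages_to_discard:
                f.write(json.dumps({"cutoff": cutoff_index, "message": msg}) + "\n")

    return on_before_compress
```

Appending (rather than rewriting) keeps the archive a monotonic record across repeated compressions, matching the "permanent, uncompressed record" goal described above.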
### Changed
- `max_tokens` default changed from `200_000` to `None` (auto-detect from `genai-prices`, fallback to 200,000).
- `keep` default changed from `("messages", 20)` to `("messages", 0)` — on compression, only the LLM summary survives (like Claude Code). This produces the most compact context after compression.
- Validation now allows `0` for messages/tokens keep and trigger values (previously required > 0). Negative values are still rejected.
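The relaxed validation rule (zero now allowed, negatives still rejected) can be illustrated with a minimal check; the function name and error message here are hypothetical, not the library's actual API:

```python
def validate_keep_value(value: int) -> int:
    """Illustrative version of the relaxed rule: 0 is a valid
    keep/trigger value (with keep=("messages", 0), only the LLM
    summary survives compression), while negative values are
    still rejected. Hypothetical helper, not the package's code.
    """
    if value < 0:
        raise ValueError(f"keep/trigger value must be >= 0, got {value}")
    return value
```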
### Dependencies
- `genai-prices` used for auto-detection of context windows (already a transitive dependency via pydantic-ai-middleware).
## 0.0.3

[0.0.3] - 2025-02-15

### Added
- `ContextManagerMiddleware` — dual-protocol middleware for real-time context management
  - Acts as a pydantic-ai `history_processor` for token tracking and auto-compression
  - Acts as a pydantic-ai-middleware `AgentMiddleware` for tool output truncation
  - `on_usage_update` callback for real-time usage tracking
  - `max_tool_output_tokens` for limiting individual tool output sizes
  - `create_context_manager_middleware()` factory function
  - Requires the `hybrid` extra: `pip install summarization-pydantic-ai[hybrid]`
- Shared cutoff algorithms — new internal `_cutoff.py` module extracted from the processors
  - `validate_context_size()` - context configuration validation
  - `should_trigger()` - trigger condition evaluation
  - `determine_cutoff_index()` - retention-aware cutoff calculation
  - `find_safe_cutoff()` - tool call/response pair preservation
  - `find_token_based_cutoff()` - binary-search token cutoff
  - `is_safe_cutoff_point()` - safety validation for cutoff points
  - `validate_triggers_and_keep()` - configuration normalization
  - Reduces code duplication between `SummarizationProcessor` and `SlidingWindowProcessor`
- `ModelType` type alias (`str | Model | KnownModelName`) exported from the package for convenience.
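As an illustration of what a function like `find_token_based_cutoff()` does, here is a standalone sketch: given per-message token counts and a retention budget, it binary-searches for the earliest index whose suffix fits the budget. This is an assumed reconstruction of the idea, not the module's actual code:

```python
def find_token_based_cutoff(token_counts: list[int], keep_tokens: int) -> int:
    """Return the smallest index i such that messages[i:] fit within
    keep_tokens. Sketch of the binary-search idea named in the
    changelog, not the library's implementation.
    """
    n = len(token_counts)
    # suffix[i] = total tokens in messages[i:]; non-increasing in i.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + token_counts[i]
    # Binary search for the first index whose suffix fits the budget.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix[mid] <= keep_tokens:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Everything before the returned index is what a summarization or sliding-window processor would discard; a real implementation would additionally shift the cutoff to a safe point so tool call/response pairs are never split (the job of `find_safe_cutoff()`).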
### Changed
- Lightweight dependency: replaced `pydantic-ai` with `pydantic-ai-slim` to avoid pulling in unnecessary model-specific SDKs (openai, anthropic, etc.). (#4)
- Custom model support: `SummarizationProcessor.model`, `ContextManagerMiddleware.summarization_model`, and the factory functions now accept `str | Model | KnownModelName` — enabling custom providers such as Azure OpenAI. (#3)
- Code refactoring: extracted common logic from `processor.py` and `sliding_window.py` into the shared `_cutoff.py` module
- README: updated with `ContextManagerMiddleware`, the `hybrid` extra, and new features
### Dependencies
- Added `hybrid` extra: `pydantic-ai-middleware>=0.2.0` (optional)
- `pydantic-ai-middleware` added to dev dependencies
- Replaced `pydantic-ai>=0.1.0` with `pydantic-ai-slim>=0.1.0`
## 0.0.2

[0.0.2] - 2025-01-22

### Changed
- README: Complete rewrite with centered header, badges, Use Cases table, and vstorm-co branding
- Documentation: Updated styling to match the pydantic-deep pink theme
  - Inter font for text, JetBrains Mono for code
  - Pink accent color scheme
  - Custom CSS and announcement bar
- `mkdocs.yml`: Updated with full Material theme configuration
### Added
- Custom Styling: `docs/overrides/main.html`, `docs/stylesheets/extra.css`
- Abbreviations: `docs/includes/abbreviations.md` for markdown expansions
## 0.0.1-fix

[0.0.1] - 2025-01-20

### Added
- `SummarizationProcessor` - history processor that uses an LLM to intelligently summarize older messages when context limits are reached
  - Configurable triggers: message count, token count, or fraction of the context window
  - Configurable retention: keep the last N messages, tokens, or fraction
  - Custom token counter support
  - Custom summary prompt support
  - Safe cutoff detection - never splits tool call/response pairs
- `SlidingWindowProcessor` - zero-cost history processor that simply discards old messages
  - Same trigger and retention options as `SummarizationProcessor`
  - No LLM calls - instant, deterministic processing
  - Ideal for high-throughput scenarios
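The core of a zero-cost sliding window is just list slicing. The following standalone function mirrors the idea; the trigger/keep semantics are taken from the descriptions above, while the function itself is an illustrative sketch, not the processor's real code:

```python
def sliding_window(messages: list, trigger_count: int, keep_count: int) -> list:
    """Discard old messages once len(messages) exceeds trigger_count,
    keeping only the last keep_count. No LLM calls: pure slicing,
    deterministic and effectively free. Illustrative sketch only.
    """
    if len(messages) <= trigger_count:
        return messages  # under the trigger: leave history untouched
    return messages[-keep_count:] if keep_count > 0 else []
```

Because the result depends only on list length, the processor is deterministic and adds no latency, which is what makes it suitable for high-throughput scenarios where summarization cost is unacceptable.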
- Factory functions for convenient processor creation:
  - `create_summarization_processor()` - with sensible defaults
  - `create_sliding_window_processor()` - with sensible defaults
- Utility functions:
  - `count_tokens_approximately()` - heuristic token counter (~4 chars per token)
  - `format_messages_for_summary()` - formats messages for LLM summarization
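A ~4-characters-per-token heuristic like the one `count_tokens_approximately()` describes fits in a few lines; this is a sketch of the heuristic, not the package's exact implementation:

```python
import math


def count_tokens_approximately(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token count: roughly one token per 4 characters.
    Cheap and model-agnostic, but only an approximation; a sketch of
    the idea behind the changelog's helper, not its exact code.
    """
    return math.ceil(len(text) / chars_per_token)
```

For exact counts, a custom `TokenCounter` (e.g. a model provider's tokenizer) can be supplied instead; the heuristic trades accuracy for zero dependencies and constant-time cost.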
- Type definitions:
  - `ContextSize` - union type for trigger/keep configuration
  - `ContextFraction`, `ContextTokens`, `ContextMessages` - specific context size types
  - `TokenCounter` - callable type for custom token counters
- Documentation:
  - Full MkDocs documentation with Material theme
  - Concepts, examples, and API reference
  - Integration examples with pydantic-ai