feat: Implement hybrid token-based conversation history system #22
Conversation
## Summary

Implemented comprehensive token-based conversation history management that respects both record count and token limits (50K tokens max). The system uses a hybrid approach with efficient two-level filtering for optimal performance.

## Key Features Added

### 1. Token Calculation & Storage
- Added `tokens` field to ConversationRecord model for storing the combined input+output token count
- Created `token_utils.py` with token calculation utilities (1 token ≈ 4 characters)
- Automatic token calculation and storage on every record save

### 2. Hybrid Database Cleanup (Save-time)
- Enhanced `_cleanup_old_messages()` with an efficient two-step process:
  1. If record count > max_records, remove the 1 oldest record (since records are added one-by-one)
  2. If total tokens > 50K, remove oldest records until within the limit
- Maintains both the record count limit (20) AND the token limit (50K) in persistent storage
- Sessions can have fewer than 20 records if they contain large records

### 3. LLM Context Filtering (Load-time)
- Updated `load_context_for_enrichment()` to filter history for LLM context
- Ensures history + current prompt fits within token limits
- Filters the in-memory list without modifying the database
- Two-level approach: the DB enforces storage limits, the load path enforces LLM context limits

### 4. Constants & Configuration
- Added `MAX_CONTEXT_TOKENS = 50000` constant
- Token limit integrated into the filtering utilities for consistent usage

## Files Modified

### Core Implementation
- `src/mcp_as_a_judge/constants.py` - Added MAX_CONTEXT_TOKENS constant
- `src/mcp_as_a_judge/db/interface.py` - Added tokens field to ConversationRecord
- `src/mcp_as_a_judge/db/providers/sqlite_provider.py` - Enhanced with hybrid cleanup logic
- `src/mcp_as_a_judge/db/conversation_history_service.py` - Updated load logic for LLM context

### New Utilities
- `src/mcp_as_a_judge/utils/__init__.py` - Created utils package
- `src/mcp_as_a_judge/utils/token_utils.py` - Token calculation and filtering utilities

### Comprehensive Testing
- `tests/test_token_based_history.py` - New comprehensive test suite (10 tests)
- `tests/test_conversation_history_lifecycle.py` - Enhanced existing tests with token verification

## Technical Improvements

### Performance Optimizations
- Simplified record count cleanup to remove exactly 1 record (matches the one-by-one addition pattern)
- Removed unnecessary parameter passing (limit=None) by using method defaults
- Efficient two-step cleanup process instead of recalculating everything

### Architecture Benefits
- **Write Heavy, Read Light**: Enforce constraints at save time, simplify loads
- **Two-level filtering**: Storage limits and LLM context limits serve different purposes
- **FIFO consistency**: Oldest records removed first in both cleanup phases
- **Hybrid approach**: Respects whichever limit (record count or tokens) is more restrictive

## Test Coverage
- ✅ Token calculation accuracy (1 token ≈ 4 characters)
- ✅ Database token storage and retrieval
- ✅ Record count limit enforcement
- ✅ Token limit enforcement with FIFO removal
- ✅ Hybrid behavior (record vs token limits)
- ✅ Mixed record sizes handling
- ✅ Edge cases and error conditions
- ✅ Integration with existing lifecycle tests
- ✅ Database cleanup during save operations
- ✅ LLM context filtering during load operations

## Backward Compatibility
- All existing functionality preserved
- Existing tests continue to pass
- Database schema extended (not breaking)
- API remains the same for consumers

## Usage Example

```python
# System automatically handles both limits:
service = ConversationHistoryService(config)

# Save: Enforces storage limits (record count + tokens)
await service.save_tool_interaction(session_id, tool, input, output)

# Load: Filters for LLM context (history + prompt ≤ 50K tokens)
context = await service.load_context_for_enrichment(session_id)
```

The implementation provides a robust, efficient, and well-tested foundation for token-aware conversation history management.
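For illustration, here is a minimal sketch of the token utilities described above (the 4-characters-per-token estimate and the load-time FIFO filtering). The simplified record shape and exact signatures are assumptions for the example and may differ from the actual code in `token_utils.py`:

```python
# Minimal sketch of the token utilities described above.
# ConversationRecord here is a simplified stand-in for the real model.
from dataclasses import dataclass

CHARS_PER_TOKEN = 4          # 1 token ≈ 4 characters
MAX_CONTEXT_TOKENS = 50_000  # mirrors the constant added in constants.py


@dataclass
class ConversationRecord:
    input: str
    output: str
    tokens: int = 0


def calculate_tokens_in_string(text: str) -> int:
    """Approximate token count using the character heuristic (ceiling division)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def calculate_tokens_in_record(record: ConversationRecord) -> int:
    """Combined input + output token count, stored on the record at save time."""
    return calculate_tokens_in_string(record.input) + calculate_tokens_in_string(record.output)


def filter_records_by_token_limit(
    records: list[ConversationRecord],   # ordered oldest -> newest
    prompt_tokens: int,
    limit: int = MAX_CONTEXT_TOKENS,
) -> list[ConversationRecord]:
    """Drop the oldest records (FIFO) until history + prompt fits within the limit."""
    budget = limit - prompt_tokens
    kept: list[ConversationRecord] = []
    total = 0
    for record in reversed(records):     # walk newest first
        if total + record.tokens > budget:
            break
        kept.append(record)
        total += record.tokens
    return list(reversed(kept))          # restore chronological order
```

The save-time cleanup applies the same FIFO idea against the stored `tokens` column, deleting the oldest rows until the per-session total drops back under the limit.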
src/mcp_as_a_judge/server.py
Outdated
```diff
  messages=messages,
  ctx=ctx,
- max_tokens=5000,
+ max_tokens=MAX_CONTEXT_TOKENS,
```
Is this valid @zvi?
@doriwal - it should be MAX_TOKENS; this field is used for the output tokens (not related to this task).
This is less recommended.
I suggest: https://docs.litellm.ai/docs/completion/token_usage#3-token_counter
If the client provides LLM_API_KEY, we can get the model name.
Otherwise, if it uses sampling, it's a bit more tricky:
```python
# Sampling path: request one message and cache the model name it returns
from mcp.types import SamplingMessage, TextContent

result = await ctx.session.create_message(
    messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],
    max_tokens=100,
)
# Cache the model name for token counting
model_name = result.model
```
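Following that suggestion, a hedged sketch of how the cached model name could feed LiteLLM's `token_counter`; the wrapper function and fallback path here are illustrative, not the PR's actual code:

```python
# Sketch only: use LiteLLM's token_counter when the model name is known,
# otherwise fall back to the character-based approximation.
from litellm import token_counter


def count_tokens(text: str, model_name: str | None) -> int:
    if model_name:
        # Model-specific tokenizer resolved by LiteLLM
        return token_counter(model=model_name, messages=[{"role": "user", "content": text}])
    # Unknown model: 1 token ≈ 4 characters
    return len(text) // 4
```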
feat: implement dynamic token limits with model-specific context management

This commit introduces a comprehensive token management system that replaces hardcoded limits with dynamic, model-specific token limits while maintaining backward compatibility.

## Key Features Added

### Dynamic Token Limits (NEW)
- `src/mcp_as_a_judge/db/dynamic_token_limits.py`: New module providing model-specific token limits with LiteLLM integration
- Initialization pattern: start with hardcoded defaults, upgrade from cache or the LiteLLM API if available, return whatever is available
- Caching system to avoid repeated API calls for model information

### Enhanced Token Calculation
- `src/mcp_as_a_judge/db/token_utils.py`: Upgraded to async functions with accurate LiteLLM token counting and character-based fallback
- Unified model detection from LLM config or MCP sampling context
- Functions: `calculate_tokens_in_string`, `calculate_tokens_in_record`, `filter_records_by_token_limit` (all now async)

### Two-Level Token Management
- **Database Level**: Storage limits enforced during save operations
  - Record count limit (20 per session)
  - Token count limit (dynamic based on model, fallback to 50K)
  - LRU session cleanup (50 total sessions max)
- **Load Level**: LLM context limits enforced during retrieval
  - Ensures history + current prompt fits within the model's input limit
  - FIFO removal of oldest records when limits are exceeded

### Updated Service Layer
- `src/mcp_as_a_judge/db/conversation_history_service.py`: Added await for the async token filtering function
- `src/mcp_as_a_judge/db/providers/sqlite_provider.py`: Integrated dynamic token limits in cleanup operations

### Test Infrastructure
- `tests/test_helpers/`: New test utilities package
- `tests/test_helpers/token_utils_helpers.py`: Helper functions for token calculation testing and model cache management
- `tests/test_improved_token_counting.py`: Comprehensive async test suite
- Updated existing tests to support async token functions

## Implementation Details

### Model Detection Strategy
1. Try LLM configuration (fast, synchronous)
2. Try MCP sampling detection (async, requires context)
3. Fall back to None with hardcoded limits

### Token Limit Logic
- **On Load**: Check total history + current prompt tokens against the model's max input
- **On Save**: Two-step cleanup (record count limit, then token limit)
- **FIFO Removal**: Always remove oldest records first to preserve recent context

### Backward Compatibility
- All existing method signatures preserved with alias support
- Graceful fallback when model information is unavailable
- No breaking changes to existing functionality

## Files Changed
- Modified: 5 core files (service, provider, token utils, server)
- Added: 3 new files (dynamic limits, test helpers)
- Enhanced: 2 test files with async support

## Testing
- All 160 tests pass (1 skipped for integration-only)
- Comprehensive coverage of token calculation, limits, and cleanup logic
- Edge cases and error handling verified

This implementation follows the user's preferred patterns:
- Configuration-based approach with rational fallbacks
- Clean separation of concerns between storage and LLM limits
- Efficient FIFO cleanup maintaining recent conversation context
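As a rough illustration of the initialization pattern this commit describes (hardcoded default, then LiteLLM lookup, with caching), assuming `litellm.get_model_info` is available; the function and cache names are hypothetical:

```python
# Illustrative sketch of the dynamic-limit lookup: default first,
# upgrade via LiteLLM when the model is known, cache the answer.
import litellm

DEFAULT_MAX_CONTEXT_TOKENS = 50_000   # hardcoded fallback (50K)
_limit_cache: dict[str, int] = {}     # avoids repeated lookups per model


def get_max_context_tokens(model_name: str | None) -> int:
    if not model_name:
        return DEFAULT_MAX_CONTEXT_TOKENS
    if model_name in _limit_cache:
        return _limit_cache[model_name]
    try:
        info = litellm.get_model_info(model=model_name)
        limit = info.get("max_input_tokens") or DEFAULT_MAX_CONTEXT_TOKENS
    except Exception:
        # Unknown or unsupported model: keep the hardcoded default
        limit = DEFAULT_MAX_CONTEXT_TOKENS
    _limit_cache[model_name] = limit
    return limit
```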