feat: Implement hybrid token-based conversation history system #22
Conversation
## Summary

Implemented comprehensive token-based conversation history management that respects both record count and token limits (50K tokens max). The system uses a hybrid approach with efficient two-level filtering for optimal performance.

## Key Features Added

### 1. Token Calculation & Storage
- Added `tokens` field to ConversationRecord model for storing the combined input+output token count
- Created `token_utils.py` with token calculation utilities (1 token ≈ 4 characters)
- Automatic token calculation and storage on every record save

### 2. Hybrid Database Cleanup (Save-time)
- Enhanced `_cleanup_old_messages()` with an efficient two-step process:
  1. If record count > max_records, remove the 1 oldest record (since records are added one-by-one)
  2. If total tokens > 50K, remove oldest records until within the limit
- Maintains both the record count limit (20) AND the token limit (50K) in persistent storage
- Sessions can have fewer than 20 records if they contain large records

### 3. LLM Context Filtering (Load-time)
- Updated `load_context_for_enrichment()` to filter history for LLM context
- Ensures history + current prompt fits within token limits
- Filters the in-memory list without modifying the database
- Two-level approach: the DB enforces storage limits, the load path enforces LLM context limits

### 4. Constants & Configuration
- Added `MAX_CONTEXT_TOKENS = 50000` constant
- Token limit integrated into the filtering utilities for consistent usage

## Files Modified

### Core Implementation
- `src/mcp_as_a_judge/constants.py` - Added MAX_CONTEXT_TOKENS constant
- `src/mcp_as_a_judge/db/interface.py` - Added tokens field to ConversationRecord
- `src/mcp_as_a_judge/db/providers/sqlite_provider.py` - Enhanced with hybrid cleanup logic
- `src/mcp_as_a_judge/db/conversation_history_service.py` - Updated load logic for LLM context

### New Utilities
- `src/mcp_as_a_judge/utils/__init__.py` - Created utils package
- `src/mcp_as_a_judge/utils/token_utils.py` - Token calculation and filtering utilities

### Comprehensive Testing
- `tests/test_token_based_history.py` - New comprehensive test suite (10 tests)
- `tests/test_conversation_history_lifecycle.py` - Enhanced existing tests with token verification

## Technical Improvements

### Performance Optimizations
- Simplified record count cleanup to remove exactly 1 record (matches the one-by-one addition pattern)
- Removed unnecessary parameter passing (limit=None) by using method defaults
- Efficient two-step cleanup process instead of recalculating everything

### Architecture Benefits
- **Write Heavy, Read Light**: Enforce constraints at save time, simplify loads
- **Two-level filtering**: Storage limits and LLM context limits serve different purposes
- **FIFO consistency**: Oldest records removed first in both cleanup phases
- **Hybrid approach**: Respects whichever limit (record count or tokens) is more restrictive

## Test Coverage
- ✅ Token calculation accuracy (1 token ≈ 4 characters)
- ✅ Database token storage and retrieval
- ✅ Record count limit enforcement
- ✅ Token limit enforcement with FIFO removal
- ✅ Hybrid behavior (record vs token limits)
- ✅ Mixed record sizes handling
- ✅ Edge cases and error conditions
- ✅ Integration with existing lifecycle tests
- ✅ Database cleanup during save operations
- ✅ LLM context filtering during load operations

## Backward Compatibility
- All existing functionality preserved
- Existing tests continue to pass
- Database schema extended (not breaking)
- API remains the same for consumers

## Usage Example

```python
# System automatically handles both limits:
service = ConversationHistoryService(config)

# Save: Enforces storage limits (record count + tokens)
await service.save_tool_interaction(session_id, tool, input, output)

# Load: Filters for LLM context (history + prompt ≤ 50K tokens)
context = await service.load_context_for_enrichment(session_id)
```

The implementation provides a robust, efficient, and well-tested foundation for token-aware conversation history management.
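For illustration, here is a minimal sketch of the token utilities described above (the 4-characters-per-token estimate and the load-time FIFO filtering). The simplified record shape and exact signatures are assumptions for the example and may differ from the actual code in `token_utils.py`:

```python
# Minimal sketch of the token utilities described above.
# ConversationRecord here is a simplified stand-in for the real model.
from dataclasses import dataclass

CHARS_PER_TOKEN = 4          # 1 token ≈ 4 characters
MAX_CONTEXT_TOKENS = 50_000  # mirrors the constant added in constants.py


@dataclass
class ConversationRecord:
    input: str
    output: str
    tokens: int = 0


def calculate_tokens_in_string(text: str) -> int:
    """Approximate token count using the character heuristic (ceiling division)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def calculate_tokens_in_record(record: ConversationRecord) -> int:
    """Combined input + output token count, stored on the record at save time."""
    return calculate_tokens_in_string(record.input) + calculate_tokens_in_string(record.output)


def filter_records_by_token_limit(
    records: list[ConversationRecord],   # ordered oldest -> newest
    prompt_tokens: int,
    limit: int = MAX_CONTEXT_TOKENS,
) -> list[ConversationRecord]:
    """Drop the oldest records (FIFO) until history + prompt fits within the limit."""
    budget = limit - prompt_tokens
    kept: list[ConversationRecord] = []
    total = 0
    for record in reversed(records):     # walk newest first
        if total + record.tokens > budget:
            break
        kept.append(record)
        total += record.tokens
    return list(reversed(kept))          # restore chronological order
```

The save-time cleanup applies the same FIFO idea against the stored `tokens` column, deleting the oldest rows until the per-session total drops back under the limit.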
src/mcp_as_a_judge/server.py
Outdated
```diff
  messages=messages,
  ctx=ctx,
- max_tokens=5000,
+ max_tokens=MAX_CONTEXT_TOKENS,
```
Is this valid @zvi?
@doriwal - it should be MAX_TOKENS; this field is used for the output tokens (not related to this task).
This is less recommended.
I suggest: https://docs.litellm.ai/docs/completion/token_usage#3-token_counter
If the client provides LLM_API_KEY, we can get the model name.
Otherwise, if it uses sampling, it's a bit more tricky:
```python
# Sampling path: request one message and cache the model name it returns
from mcp.types import SamplingMessage, TextContent

result = await ctx.session.create_message(
    messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],
    max_tokens=100,
)
# Cache the model name for token counting
model_name = result.model
```
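Following that suggestion, a hedged sketch of how the cached model name could feed LiteLLM's `token_counter`; the wrapper function and fallback path here are illustrative, not the PR's actual code:

```python
# Sketch only: use LiteLLM's token_counter when the model name is known,
# otherwise fall back to the character-based approximation.
from litellm import token_counter


def count_tokens(text: str, model_name: str | None) -> int:
    if model_name:
        # Model-specific tokenizer resolved by LiteLLM
        return token_counter(model=model_name, messages=[{"role": "user", "content": text}])
    # Unknown model: 1 token ≈ 4 characters
    return len(text) // 4
```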
feat: implement dynamic token limits with model-specific context management

This commit introduces a comprehensive token management system that replaces hardcoded limits with dynamic, model-specific token limits while maintaining backward compatibility.

## Key Features Added

### Dynamic Token Limits (NEW)
- `src/mcp_as_a_judge/db/dynamic_token_limits.py`: New module providing model-specific token limits with LiteLLM integration
- Initialization pattern: start with hardcoded defaults, upgrade from cache or the LiteLLM API if available, return whatever is available
- Caching system to avoid repeated API calls for model information

### Enhanced Token Calculation
- `src/mcp_as_a_judge/db/token_utils.py`: Upgraded to async functions with accurate LiteLLM token counting and character-based fallback
- Unified model detection from LLM config or MCP sampling context
- Functions: `calculate_tokens_in_string`, `calculate_tokens_in_record`, `filter_records_by_token_limit` (all now async)

### Two-Level Token Management
- **Database Level**: Storage limits enforced during save operations
  - Record count limit (20 per session)
  - Token count limit (dynamic based on model, fallback to 50K)
  - LRU session cleanup (50 total sessions max)
- **Load Level**: LLM context limits enforced during retrieval
  - Ensures history + current prompt fits within the model's input limit
  - FIFO removal of oldest records when limits are exceeded

### Updated Service Layer
- `src/mcp_as_a_judge/db/conversation_history_service.py`: Added await for the async token filtering function
- `src/mcp_as_a_judge/db/providers/sqlite_provider.py`: Integrated dynamic token limits in cleanup operations

### Test Infrastructure
- `tests/test_helpers/`: New test utilities package
- `tests/test_helpers/token_utils_helpers.py`: Helper functions for token calculation testing and model cache management
- `tests/test_improved_token_counting.py`: Comprehensive async test suite
- Updated existing tests to support async token functions

## Implementation Details

### Model Detection Strategy
1. Try LLM configuration (fast, synchronous)
2. Try MCP sampling detection (async, requires context)
3. Fall back to None with hardcoded limits

### Token Limit Logic
- **On Load**: Check total history + current prompt tokens against the model's max input
- **On Save**: Two-step cleanup (record count limit, then token limit)
- **FIFO Removal**: Always remove oldest records first to preserve recent context

### Backward Compatibility
- All existing method signatures preserved with alias support
- Graceful fallback when model information is unavailable
- No breaking changes to existing functionality

## Files Changed
- Modified: 5 core files (service, provider, token utils, server)
- Added: 3 new files (dynamic limits, test helpers)
- Enhanced: 2 test files with async support

## Testing
- All 160 tests pass (1 skipped for integration-only)
- Comprehensive coverage of token calculation, limits, and cleanup logic
- Edge cases and error handling verified

This implementation follows the user's preferred patterns:
- Configuration-based approach with rational fallbacks
- Clean separation of concerns between storage and LLM limits
- Efficient FIFO cleanup maintaining recent conversation context
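As a rough illustration of the initialization pattern this commit describes (hardcoded default, then LiteLLM lookup, with caching), assuming `litellm.get_model_info` is available; the function and cache names are hypothetical:

```python
# Illustrative sketch of the dynamic-limit lookup: default first,
# upgrade via LiteLLM when the model is known, cache the answer.
import litellm

DEFAULT_MAX_CONTEXT_TOKENS = 50_000   # hardcoded fallback (50K)
_limit_cache: dict[str, int] = {}     # avoids repeated lookups per model


def get_max_context_tokens(model_name: str | None) -> int:
    if not model_name:
        return DEFAULT_MAX_CONTEXT_TOKENS
    if model_name in _limit_cache:
        return _limit_cache[model_name]
    try:
        info = litellm.get_model_info(model=model_name)
        limit = info.get("max_input_tokens") or DEFAULT_MAX_CONTEXT_TOKENS
    except Exception:
        # Unknown or unsupported model: keep the hardcoded default
        limit = DEFAULT_MAX_CONTEXT_TOKENS
    _limit_cache[model_name] = limit
    return limit
```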