feat: Implement hybrid token-based conversation history system
## Summary
Implemented token-based conversation history management that respects both a record-count limit and a token limit (50K tokens max). The system uses a hybrid approach with efficient two-level filtering for good performance.
## Key Features Added
### 1. Token Calculation & Storage
- Added `tokens` field to ConversationRecord model for storing combined input+output token count
- Created `token_utils.py` with token calculation utilities (1 token ≈ 4 characters)
- Automatic token calculation and storage on every record save
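The estimation rule above can be sketched as follows; this is an illustrative version of what `token_utils.py` might contain (function names and rounding behavior are assumptions, not the actual module API):

```python
# Hypothetical sketch of the 1 token ~= 4 characters heuristic described
# above; the real token_utils.py may differ in naming and rounding.
CHARS_PER_TOKEN = 4

def calculate_tokens(text: str) -> int:
    """Estimate the token count of a string (4 characters per token)."""
    if not text:
        return 0
    # Round up so short non-empty strings still count as at least one token.
    return max(1, (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN)

def record_tokens(input_text: str, output_text: str) -> int:
    """Combined input+output estimate stored on each ConversationRecord."""
    return calculate_tokens(input_text) + calculate_tokens(output_text)
```

The estimate is computed once at save time and persisted on the record, so cleanup and load-time filtering never have to re-scan message bodies.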
### 2. Hybrid Database Cleanup (Save-time)
- Enhanced `_cleanup_old_messages()` with efficient two-step process:
1. If record count > max_records, remove the single oldest record (records are added one at a time, so the count can exceed the limit by at most one)
2. If total tokens > 50K, remove oldest records until within limit
- Maintains both record count (20) AND token limits (50K) in persistent storage
- Sessions can have fewer than 20 records if they contain large records
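The two-step cleanup can be sketched as below. This is a simplified in-memory model of the logic, not the actual `sqlite_provider` code; `records` is assumed to be an oldest-first list of `(record_id, tokens)` pairs:

```python
# Illustrative sketch of the save-time hybrid cleanup; data shapes and
# names are assumptions, not the real _cleanup_old_messages() signature.
MAX_RECORDS = 20
MAX_CONTEXT_TOKENS = 50_000

def cleanup_old_messages(records: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Enforce record-count and token limits, evicting oldest first (FIFO)."""
    # Step 1: records arrive one at a time, so at most one record
    # needs to be dropped to restore the count limit.
    if len(records) > MAX_RECORDS:
        records = records[1:]
    # Step 2: drop oldest records until the total token count fits.
    while records and sum(t for _, t in records) > MAX_CONTEXT_TOKENS:
        records = records[1:]
    return records
```

Step 2 is what allows a session to hold fewer than 20 records: a handful of large records can exhaust the 50K budget before the count limit is ever reached.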
### 3. LLM Context Filtering (Load-time)
- Updated `load_context_for_enrichment()` to filter history for LLM context
- Ensures history + current prompt fits within token limits
- Filters in-memory list without modifying database
- Two-level approach: DB enforces storage limits, load enforces LLM context limits
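The load-time side can be sketched as follows, again as an assumed simplification rather than the real `load_context_for_enrichment()` internals: keep the newest records whose tokens, together with the current prompt, fit the budget, and leave the database untouched.

```python
# Hedged sketch of load-time context filtering; `history` is assumed to be
# a chronological list of (text, tokens) pairs with precomputed tokens.
MAX_CONTEXT_TOKENS = 50_000

def filter_history_for_context(
    history: list[tuple[str, int]], prompt_tokens: int
) -> list[tuple[str, int]]:
    """Return the newest suffix of history that fits alongside the prompt."""
    budget = MAX_CONTEXT_TOKENS - prompt_tokens
    kept: list[tuple[str, int]] = []
    total = 0
    # Walk newest-to-oldest so the most recent context is preferred.
    for text, tokens in reversed(history):
        if total + tokens > budget:
            break
        kept.append((text, tokens))
        total += tokens
    kept.reverse()  # restore chronological order for the LLM
    return kept
```

Because storage limits are already enforced at save time, this pass usually keeps everything; it only trims when a large current prompt shrinks the remaining budget.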
### 4. Constants & Configuration
- Added `MAX_CONTEXT_TOKENS = 50000` constant
- Token limit integrated into filtering utilities for consistent usage
## Files Modified
### Core Implementation
- `src/mcp_as_a_judge/constants.py` - Added MAX_CONTEXT_TOKENS constant
- `src/mcp_as_a_judge/db/interface.py` - Added tokens field to ConversationRecord
- `src/mcp_as_a_judge/db/providers/sqlite_provider.py` - Enhanced with hybrid cleanup logic
- `src/mcp_as_a_judge/db/conversation_history_service.py` - Updated load logic for LLM context
### New Utilities
- `src/mcp_as_a_judge/utils/__init__.py` - Created utils package
- `src/mcp_as_a_judge/utils/token_utils.py` - Token calculation and filtering utilities
### Comprehensive Testing
- `tests/test_token_based_history.py` - New comprehensive test suite (10 tests)
- `tests/test_conversation_history_lifecycle.py` - Enhanced existing tests with token verification
## Technical Improvements
### Performance Optimizations
- Simplified record count cleanup to remove exactly 1 record (matches one-by-one addition pattern)
- Removed redundant `limit=None` arguments by relying on method defaults
- Efficient two-step cleanup process instead of recalculating everything
### Architecture Benefits
- **Write Heavy, Read Light**: Enforce constraints at save time, simplify loads
- **Two-level filtering**: Storage limits vs LLM context limits serve different purposes
- **FIFO consistency**: Oldest records removed first in both cleanup phases
- **Hybrid approach**: Respects whichever limit (record count or tokens) is more restrictive
## Test Coverage
- ✅ Token calculation accuracy (1 token ≈ 4 characters)
- ✅ Database token storage and retrieval
- ✅ Record count limit enforcement
- ✅ Token limit enforcement with FIFO removal
- ✅ Hybrid behavior (record vs token limits)
- ✅ Mixed record sizes handling
- ✅ Edge cases and error conditions
- ✅ Integration with existing lifecycle tests
- ✅ Database cleanup during save operations
- ✅ LLM context filtering during load operations
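As a flavor of what the suite covers, the FIFO token-limit case might look like the following self-contained sketch (the actual tests in `tests/test_token_based_history.py` exercise the real service; names here are hypothetical):

```python
# Hypothetical pytest-style sketch of the FIFO eviction behavior; the
# inline eviction loop stands in for the real cleanup implementation.
def test_token_limit_evicts_oldest_first():
    limit = 50_000
    records = [("old", 30_000), ("mid", 15_000), ("new", 15_000)]  # 60K total
    # FIFO eviction: drop the oldest record until the budget is met.
    while sum(t for _, t in records) > limit:
        records.pop(0)
    assert [name for name, _ in records] == ["mid", "new"]
    assert sum(t for _, t in records) <= limit
```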
## Backward Compatibility
- All existing functionality preserved
- Existing tests continue to pass
- Database schema extended (not breaking)
- API remains the same for consumers
## Usage Example
```python
# System automatically handles both limits:
service = ConversationHistoryService(config)
# Save: Enforces storage limits (record count + tokens)
await service.save_tool_interaction(session_id, tool, input, output)
# Load: Filters for LLM context (history + prompt ≤ 50K tokens)
context = await service.load_context_for_enrichment(session_id)
```
The implementation provides a robust, efficient, and well-tested foundation for token-aware conversation history management.