
Conversation

@doriwal (Collaborator) commented Sep 11, 2025

Summary

Implemented token-based conversation history management that respects both a record-count limit and a token limit (50K tokens max). The system uses a hybrid, two-level approach: storage limits are enforced at save time and LLM context limits are enforced at load time.

Key Features Added

1. Token Calculation & Storage

  • Added a tokens field to the ConversationRecord model, storing the combined input+output token count
  • Created token_utils.py with token calculation utilities (1 token ≈ 4 characters; sketched below)
  • Tokens are calculated and stored automatically on every record save
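
A minimal sketch of the character-based heuristic described above; the function names echo the token_utils.py utilities mentioned in this PR, but the exact signatures are assumptions:

```python
def calculate_tokens_in_string(text: str) -> int:
    """Approximate token count assuming 1 token ≈ 4 characters (rounding is an assumption)."""
    return max(1, len(text) // 4)


def calculate_tokens_in_record(input_text: str, output_text: str) -> int:
    """Combined input + output estimate stored on each ConversationRecord."""
    return calculate_tokens_in_string(input_text) + calculate_tokens_in_string(output_text)
```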

2. Hybrid Database Cleanup (Save-time)

  • Enhanced _cleanup_old_messages() with an efficient two-step process (sketched below):
    1. If the record count exceeds max_records, remove the single oldest record (records are added one at a time, so at most one can exceed the limit)
    2. If total tokens exceed 50K, remove the oldest records until the session is within the limit
  • Maintains both the record count limit (20) AND the token limit (50K) in persistent storage
  • Sessions can hold fewer than 20 records if those records are large
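
An illustrative sketch of the two-step cleanup above; the helper names are assumptions, not the actual sqlite_provider internals:

```python
async def _cleanup_old_messages(self, session_id: str) -> None:
    """Enforce storage limits after each save: record count first, then total tokens."""
    # Step 1: records are saved one at a time, so at most one record can exceed
    # the count limit; remove the single oldest record if needed.
    if await self._count_records(session_id) > self.max_records:
        await self._delete_oldest_records(session_id, count=1)

    # Step 2: FIFO removal until the session's total tokens fit the 50K limit.
    while await self._total_tokens(session_id) > MAX_CONTEXT_TOKENS:
        await self._delete_oldest_records(session_id, count=1)
```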

3. LLM Context Filtering (Load-time)

  • Updated load_context_for_enrichment() to filter history for the LLM context
  • Ensures the history plus the current prompt fit within the token limit
  • Filters the in-memory list without modifying the database
  • Two-level approach: the database enforces storage limits, while load enforces LLM context limits (see the sketch below)
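
A sketch of the load-time filter: keep the newest records that still fit alongside the current prompt. The records list and its tokens field follow this PR's model; the function body itself is an assumption:

```python
def filter_records_by_token_limit(records: list, prompt_tokens: int,
                                  max_tokens: int = MAX_CONTEXT_TOKENS) -> list:
    """Return the newest records whose combined tokens, plus the prompt, fit the limit.

    Assumes `records` are ordered oldest-first; the database is never modified here.
    """
    budget = max_tokens - prompt_tokens
    kept = []
    for record in reversed(records):    # walk newest -> oldest
        if record.tokens > budget:
            break                        # adding an older record would overflow the budget
        kept.append(record)
        budget -= record.tokens
    kept.reverse()                       # restore chronological (oldest-first) order
    return kept
```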

4. Constants & Configuration

  • Added MAX_CONTEXT_TOKENS = 50000 constant
  • Token limit integrated into filtering utilities for consistent usage

Files Modified

Core Implementation

  • src/mcp_as_a_judge/constants.py - Added MAX_CONTEXT_TOKENS constant
  • src/mcp_as_a_judge/db/interface.py - Added tokens field to ConversationRecord
  • src/mcp_as_a_judge/db/providers/sqlite_provider.py - Enhanced with hybrid cleanup logic
  • src/mcp_as_a_judge/db/conversation_history_service.py - Updated load logic for LLM context

New Utilities

  • src/mcp_as_a_judge/utils/__init__.py - Created utils package
  • src/mcp_as_a_judge/utils/token_utils.py - Token calculation and filtering utilities

Comprehensive Testing

  • tests/test_token_based_history.py - New comprehensive test suite (10 tests)
  • tests/test_conversation_history_lifecycle.py - Enhanced existing tests with token verification

Technical Improvements

Performance Optimizations

  • Simplified the record count cleanup to remove exactly 1 record, matching the one-by-one addition pattern
  • Removed unnecessary parameter passing (limit=None) by relying on method defaults
  • Efficient two-step cleanup instead of recalculating everything on every save

Architecture Benefits

  • Write Heavy, Read Light: Enforce constraints at save time, simplify loads
  • Two-level filtering: Storage limits vs LLM context limits serve different purposes
  • FIFO consistency: Oldest records removed first in both cleanup phases
  • Hybrid approach: Respects whichever limit (record count or tokens) is more restrictive

Test Coverage

  • ✅ Token calculation accuracy (1 token ≈ 4 characters)
  • ✅ Database token storage and retrieval
  • ✅ Record count limit enforcement
  • ✅ Token limit enforcement with FIFO removal
  • ✅ Hybrid behavior (record vs token limits)
  • ✅ Mixed record sizes handling
  • ✅ Edge cases and error conditions
  • ✅ Integration with existing lifecycle tests
  • ✅ Database cleanup during save operations
  • ✅ LLM context filtering during load operations

Backward Compatibility

  • All existing functionality preserved
  • Existing tests continue to pass
  • Database schema extended (not breaking)
  • API remains the same for consumers

Usage Example

```python
# The system automatically handles both limits:
service = ConversationHistoryService(config)

# Save: enforces storage limits (record count + tokens)
await service.save_tool_interaction(session_id, tool, input, output)

# Load: filters for LLM context (history + prompt ≤ 50K tokens)
context = await service.load_context_for_enrichment(session_id)
```

The implementation provides a robust, efficient, and well-tested foundation for token-aware conversation history management.

Description

Brief description of the changes in this PR.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔧 Maintenance/refactoring
  • ⚡ Performance improvement
  • 🧪 Test improvement

Testing

  • Tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Related Issues

Closes #(issue number)

Screenshots (if applicable)

Additional Notes

Any additional information that reviewers should know.

@doriwal doriwal requested a review from OtherVibes as a code owner September 11, 2025 10:01
    messages=messages,
    ctx=ctx,
-   max_tokens=5000,
+   max_tokens=MAX_CONTEXT_TOKENS,
Collaborator Author

Is this valid @zvi?

Owner

@doriwal - it should be MAX_TOKENS; this field is used for the output tokens (not related to this task).

@OtherVibes OtherVibes assigned OtherVibes and doriwal and unassigned OtherVibes Sep 11, 2025
Owner

This is less recommended.
I suggest: https://docs.litellm.ai/docs/completion/token_usage#3-token_counter
If the client provides LLM_API_KEY, we can get the model name.
Otherwise, if it uses sampling, it's a bit trickier:

    result = await ctx.session.create_message(  
        messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],  
        max_tokens=100,  
    )  
      
    # Cache the model name for token counting  
    model_name = result.model 
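
For illustration only, a hedged sketch of feeding the cached model name into LiteLLM's token counter, with the character heuristic as a fallback when no model is known (the wrapper function itself is not project code):

    import litellm

    def count_tokens(text: str, model_name: str | None = None) -> int:
        """Accurate count via LiteLLM when the model is known; ~4 chars/token otherwise."""
        if model_name:
            return litellm.token_counter(
                model=model_name,
                messages=[{"role": "user", "content": text}],
            )
        return max(1, len(text) // 4)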

dori added 9 commits September 11, 2025 15:17
feat: implement dynamic token limits with model-specific context management

This commit introduces a comprehensive token management system that replaces
hardcoded limits with dynamic, model-specific token limits while maintaining
backward compatibility.

## Key Features Added:

### Dynamic Token Limits (NEW)
- `src/mcp_as_a_judge/db/dynamic_token_limits.py`: New module providing
  model-specific token limits with LiteLLM integration
- Initialization pattern: start with hardcoded defaults, upgrade from cache
  or LiteLLM API if available, return whatever is available
- Caching system to avoid repeated API calls for model information
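
A hedged sketch of this defaults → cache → LiteLLM upgrade path (names here are illustrative, not the module's actual API):

```python
import litellm

_DEFAULT_MAX_INPUT_TOKENS = 50_000          # hardcoded fallback
_limit_cache: dict[str, int] = {}           # avoids repeated LiteLLM lookups

def max_input_tokens_for(model_name: str | None) -> int:
    """Start with the hardcoded default, upgrade from cache or LiteLLM if available."""
    if not model_name:
        return _DEFAULT_MAX_INPUT_TOKENS
    if model_name in _limit_cache:
        return _limit_cache[model_name]
    try:
        info = litellm.get_model_info(model_name)   # raises for unknown models
        limit = info.get("max_input_tokens") or _DEFAULT_MAX_INPUT_TOKENS
    except Exception:
        limit = _DEFAULT_MAX_INPUT_TOKENS
    _limit_cache[model_name] = limit
    return limit
```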

### Enhanced Token Calculation
- `src/mcp_as_a_judge/db/token_utils.py`: Upgraded to async functions with
  accurate LiteLLM token counting and character-based fallback
- Unified model detection from LLM config or MCP sampling context
- Functions: `calculate_tokens_in_string`, `calculate_tokens_in_record`,
  `filter_records_by_token_limit` (all now async)

### Two-Level Token Management
- **Database Level**: Storage limits enforced during save operations
  - Record count limit (20 per session)
  - Token count limit (dynamic based on model, fallback to 50K)
  - LRU session cleanup (50 total sessions max)
- **Load Level**: LLM context limits enforced during retrieval
  - Ensures history + current prompt fits within model's input limit
  - FIFO removal of oldest records when limits exceeded

### Updated Service Layer
- `src/mcp_as_a_judge/db/conversation_history_service.py`: Added await for
  async token filtering function
- `src/mcp_as_a_judge/db/providers/sqlite_provider.py`: Integrated dynamic
  token limits in cleanup operations

### Test Infrastructure
- `tests/test_helpers/`: New test utilities package
- `tests/test_helpers/token_utils_helpers.py`: Helper functions for token
  calculation testing and model cache management
- `tests/test_improved_token_counting.py`: Comprehensive async test suite
- Updated existing tests to support async token functions

## Implementation Details:

### Model Detection Strategy:
1. Try LLM configuration (fast, synchronous)
2. Try MCP sampling detection (async, requires context)
3. Fallback to None with hardcoded limits
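
An illustrative sketch of this detection order (the wiring and parameter names are assumptions; only the `create_message` sampling call mirrors the snippet discussed earlier in the review):

```python
from mcp.types import SamplingMessage, TextContent

async def detect_model_name(llm_config, ctx) -> str | None:
    # 1. LLM configuration (fast, synchronous)
    if llm_config is not None and getattr(llm_config, "model_name", None):
        return llm_config.model_name
    # 2. MCP sampling detection (async, requires context): issue a tiny sampling
    #    request and read the model name from the result.
    if ctx is not None:
        try:
            result = await ctx.session.create_message(
                messages=[SamplingMessage(role="user",
                                          content=TextContent(type="text", text="ping"))],
                max_tokens=1,
            )
            return result.model
        except Exception:
            return None
    # 3. Fallback to None; callers keep the hardcoded limits
    return None
```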

### Token Limit Logic:
- **On Load**: Check total history + current prompt tokens against model max input
- **On Save**: Two-step cleanup (record count limit, then token limit)
- **FIFO Removal**: Always remove oldest records first to preserve recent context

### Backward Compatibility:
- All existing method signatures preserved with alias support
- Graceful fallback when model information unavailable
- No breaking changes to existing functionality

## Files Changed:
- Modified: 5 core files (service, provider, token utils, server)
- Added: 3 new files (dynamic limits, test helpers)
- Enhanced: 2 test files with async support

## Testing:
- All 160 tests pass (1 skipped for integration-only)
- Comprehensive coverage of token calculation, limits, and cleanup logic
- Edge cases and error handling verified

This implementation follows the user's preferred patterns:
- Configuration-based approach with rational fallbacks
- Clean separation of concerns between storage and LLM limits
- Efficient FIFO cleanup maintaining recent conversation context
@OtherVibes OtherVibes merged commit 3b73bbc into main Sep 11, 2025
9 checks passed
@OtherVibes OtherVibes deleted the feat/remain-history-by-token branch September 11, 2025 22:27