Commit 23abe58

Peter and claude committed
feat: optimize MCP tool descriptions for token efficiency
- Add tiktoken dependency for accurate token counting
- Create token validation utilities in validation.py
- Optimize startPreIngestion tutorial from 191 to ~60 tokens
- Add comprehensive unit tests for token efficiency
- Update living memory files with optimization findings

Key findings:
- Existing tutorials already well-optimized (all under 200 tokens)
- Implemented soft validation with warnings for token limits
- Added fallback to character estimation for robustness

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent f124eb1 · commit 23abe58

15 files changed: +1234 −310 lines

Architecture.md

Lines changed: 114 additions & 0 deletions
@@ -2825,6 +2825,120 @@ The MCP manifest schema has been updated to use anyOf patterns for numeric param

This ensures compatibility with various MCP client implementations that may serialize parameters differently while maintaining strict type safety and validation.

## Token Optimization

### Token Counting Utilities

The system implements token optimization utilities in the validation module to ensure efficient context usage across the application:

```mermaid
graph TB
    subgraph "Token Optimization System"
        TIKTOKEN[tiktoken Library<br/>GPT-4 compatible encoding<br/>cl100k_base default]
        COUNT_FUNC["count_tokens() Function<br/>Text tokenization with fallback"]
        CHAR_TRUNC[Character Truncation<br/>100k character limit<br/>Stack overflow prevention]
        VALIDATE_FUNC["validate_tutorial_tokens() Function<br/>Tutorial content validation<br/>200 token limit"]
    end

    subgraph "Fallback Strategy"
        TIKTOKEN_FAIL[tiktoken Import/Execution Failure]
        CHAR_EST[Character Estimation<br/>1 token ≈ 4 characters]
        LOGGING[Warning Logging<br/>Fallback notifications]
    end

    subgraph "Tutorial Validation Pipeline"
        TUTORIAL_INPUT[Tutorial Content Input]
        CHAR_CHECK[Character Limit Check<br/>5 chars per token upper bound]
        TOKEN_COUNT[Actual Token Counting<br/>tiktoken or fallback]
        LIMIT_CHECK[200 Token Limit Validation]
        WARNING[Warning Logging<br/>Limit exceeded notifications]
    end

    TIKTOKEN --> COUNT_FUNC
    COUNT_FUNC --> CHAR_TRUNC
    COUNT_FUNC --> VALIDATE_FUNC

    COUNT_FUNC -.->|Import/Runtime Error| TIKTOKEN_FAIL
    TIKTOKEN_FAIL --> CHAR_EST
    CHAR_EST --> LOGGING

    TUTORIAL_INPUT --> CHAR_CHECK
    CHAR_CHECK --> TOKEN_COUNT
    TOKEN_COUNT --> LIMIT_CHECK
    LIMIT_CHECK --> WARNING

    TOKEN_COUNT --> COUNT_FUNC
```
### Core Token Optimization Functions

**count_tokens(text: str, encoding: str = "cl100k_base") -> int**
- Uses the tiktoken library with the GPT-4-compatible cl100k_base encoding
- Truncates input to 100k characters before tokenization to prevent stack overflow
- Falls back to character estimation (1 token ≈ 4 characters) when tiktoken fails
- Logs a warning whenever the fallback path is taken

**validate_tutorial_tokens(value: str | None, max_tokens: int = 200) -> str | None**
- Validates tutorial content against a strict 200-token limit
- Uses two-stage validation: a cheap character estimate followed by precise token counting
- Handles None gracefully for optional tutorial fields
- Logs a detailed warning when the limit is exceeded
- Returns the validated content, or None for empty/whitespace-only input
### Token Optimization Architecture

```mermaid
graph LR
    subgraph "Input Processing"
        TEXT_INPUT[Text Input<br/>Tutorial/Content strings]
        NULL_CHECK[None/Empty Validation<br/>Early return optimization]
        CHAR_LIMIT[Character Limit Check<br/>100k truncation safety]
    end

    subgraph "Token Counting Engine"
        TIKTOKEN_ENC[tiktoken Encoding<br/>cl100k_base for GPT-4]
        ENCODE_TOKENS[Token Encoding<br/>Precise count generation]
        FALLBACK_EST[Fallback Estimation<br/>Character-based approximation]
    end

    subgraph "Validation & Limits"
        TUTORIAL_LIMIT[Tutorial Token Limit<br/>200 tokens maximum]
        CHAR_ESTIMATE[Character Estimation<br/>5 chars/token upper bound]
        WARNING_LOG[Warning Logging<br/>Limit violation notifications]
    end

    TEXT_INPUT --> NULL_CHECK
    NULL_CHECK --> CHAR_LIMIT
    CHAR_LIMIT --> TIKTOKEN_ENC

    TIKTOKEN_ENC --> ENCODE_TOKENS
    TIKTOKEN_ENC -.->|Import/Runtime Error| FALLBACK_EST

    ENCODE_TOKENS --> TUTORIAL_LIMIT
    FALLBACK_EST --> TUTORIAL_LIMIT
    TUTORIAL_LIMIT --> CHAR_ESTIMATE
    CHAR_ESTIMATE --> WARNING_LOG
```
### Performance Characteristics

- **Token Counting**: tiktoken encoding optimized for batch processing
- **Memory Safety**: 100k character truncation prevents stack overflow during tokenization
- **Graceful Degradation**: character-based estimation fallback keeps counting reliable when tiktoken is unavailable
- **Tutorial Validation**: the 200-token limit keeps MCP tool documentation context-efficient
- **Error Resilience**: exception handling prevents validation failures from breaking application flow

### Integration Points

The token optimization system integrates with several key components:

1. **MCP Tool Documentation**: tutorial field validation ensures token budget compliance
2. **Content Generation**: smart snippet extraction uses token counting for optimal context usage
3. **Validation Pipeline**: centralized token utilities support consistent token management across request models
4. **Error Handling**: token limit violations generate informative warnings without breaking functionality

This architecture keeps context usage efficient while preserving reliability through robust fallbacks and thorough error handling.
2941+
28282942
## Security Model
28292943

28302944
```mermaid

ResearchFindings.json

Lines changed: 57 additions & 3 deletions
@@ -49,7 +49,8 @@
     "uvicorn": "0.27.1",
     "ijson": "3.4.0",
     "pygments": "2.19.2",
-    "slowapi": "0.1.10"
+    "slowapi": "0.1.10",
+    "tiktoken": "0.11.0"
   },
   "perf_metrics": {
     "sqlite_vec_query": "<100ms for 1M+ vectors",
@@ -807,6 +808,52 @@
       "patterns": {"org": "Group related patterns", "progression": "Simple→advanced", "errors": "Common conditions", "practices": "Optimal usage"}
     }
   },
+  "token_optimization": {
+    "counting_libraries": {
+      "tiktoken": {
+        "ver": "0.11.0",
+        "purpose": "Standard OpenAI token counting library",
+        "encoding": "cl100k_base for GPT-4 models",
+        "performance_warning": "Stack overflow risk with >100k character inputs",
+        "best_practice": "Truncate text before tokenization to avoid performance issues"
+      },
+      "character_estimation": {
+        "fallback_ratio": "1 token ≈ 4 characters",
+        "accuracy": "Approximate - varies by language and content type",
+        "use_case": "Quick estimation when tiktoken unavailable"
+      }
+    },
+    "optimization_strategies": {
+      "writing_patterns": {
+        "bullet_points": "Reduce token count vs paragraph format",
+        "technical_abbreviations": "Use standard abbreviations when clear",
+        "imperative_voice": "More concise than descriptive prose"
+      },
+      "content_structuring": {
+        "hierarchical": "Use headers and sections for scanability",
+        "key_information_first": "Front-load critical details",
+        "eliminate_redundancy": "Remove repeated information"
+      }
+    },
+    "mcp_manifest_issues": {
+      "token_overflow": {
+        "range": "23K-93K tokens for large responses",
+        "cause": "Extensive tool manifests with detailed schemas",
+        "mitigation": "Optimize tool descriptions and parameter documentation"
+      },
+      "best_practices": {
+        "description_length": "50-125 characters per tool description",
+        "parameter_docs": "20-80 characters per parameter",
+        "tutorial_limits": "200-token limit per tutorial section"
+      }
+    },
+    "performance_considerations": {
+      "tiktoken_limitations": "Memory and CPU intensive for large inputs",
+      "preprocessing": "Apply truncation before token counting",
+      "caching": "Cache token counts for frequently processed content",
+      "monitoring": "Track token usage to identify optimization opportunities"
+    }
+  },
  "mcp_documentation": {
    "protocol_principles": {
      "api_design": "Token-efficient, self-documenting APIs with JSON schemas",
@@ -840,7 +887,9 @@
     "p2: Adding UNIQUE constraints requires 12-step rebuild",
     "p2: Use ijson #{ijson} with yajl2_c backend for #{ijson_speedup}",
     "p2: Use tree-sitter for #{treesitter_speedup} code extraction",
-    "p2: Maintain strong asyncio task references"
+    "p2: Maintain strong asyncio task references",
+    "p2: Truncate text inputs before tiktoken processing to prevent stack overflow",
+    "p2: Monitor MCP manifest token usage to prevent 23K-93K token overflow"
   ],
   "performance": [
     "#{sqlite_vec_query} with sqlite-vec",
@@ -881,7 +930,12 @@
     "Apply fuzzy normalization in Pydantic field validators for query preprocessing",
     "Use regex-based word boundary matching for spelling normalization",
     "Configure dynamic fetch_k with 1.5x multiplier when MMR is enabled",
-    "Maintain <500ms P95 latency with all ranking optimizations enabled"
+    "Maintain <500ms P95 latency with all ranking optimizations enabled",
+    "Use tiktoken 0.11.0 with cl100k_base encoding for accurate GPT-4 token counting",
+    "Apply 1 token ≈ 4 characters estimation as fallback for quick calculations",
+    "Implement text truncation before tiktoken processing to prevent performance issues",
+    "Use bullet points and imperative voice for token-efficient documentation",
+    "Cache token counts for frequently processed content to improve performance"
   ],
   "production": [
     "Distributed rate limiting with Redis for multi-worker",
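The "Cache token counts for frequently processed content" guideline above can be sketched with a stdlib LRU cache. `cached_token_count` is a hypothetical helper (not shown in this diff), and it falls back to the 4-characters-per-token estimate so it runs even without tiktoken installed:

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_token_count(text: str) -> int:
    """Cache counts for frequently processed strings (hashable cache keys)."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        # Fallback estimate: 1 token ≈ 4 characters.
        return max(1, len(text) // 4)
```

With this in place, repeated manifest renders of the same tool descriptions hit the cache instead of re-encoding the text each time.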

0 commit comments