A proof-of-concept implementation for lazy loading MCP servers and tools in Claude Code, reducing initial context consumption by 95% (from 108k to 5k tokens).
Claude Code currently loads all MCP servers, tools, and agents at startup:
- Before: 108k tokens (54% of 200k limit) consumed immediately
- Result: Only 92k tokens available for actual work
- Impact: Complex tasks become impossible with multiple tools configured
This lazy loading system loads tools only when needed:
- After: 5k tokens (2.5%) for lightweight registry
- Result: 195k tokens available for work
- Savings: 103k tokens (95% reduction!)
| Component | Before | After | Savings |
|---|---|---|---|
| MCP tools | 39.8k tokens | On-demand | ~39k |
| Custom agents | 9.7k tokens | On-demand | ~9k |
| System tools | 22.6k tokens | On-demand | ~22k |
| Memory files | 36.0k tokens | Optimized | ~33k |
| Total at startup | 108k (54%) | 5k (2.5%) | 103k (51.5%) |
# Clone this repository
git clone https://github.com/machjesusmoto/claude-lazy-loading.git
cd claude-lazy-loading
# Copy to Claude configuration
cp -r optimization ~/.claude/# Create lightweight index of your tools
python3 ~/.claude/optimization/generate-index.py# Set up alias
alias cl='python3 ~/.claude/optimization/lazy-loader.py'
# Check what would load for a task
cl analyze "I need to build a React component"
# Output: Would load context7, magic (3.5k tokens)
# Check current stats
cl stats
# Output: 0 tools loaded, 103k tokens saved
# Preload for specific workflow
cl preload react # Loads React development tools
cl preload wordpress # Loads WordPress tools- Scans Claude configuration for MCP servers
- Extracts trigger keywords intelligently
- Creates minimal index (~500 tokens)
- Analyzes user input for keywords
- Loads only required tools
- Caches for session duration
- Groups related tools
Profiles for common workflows:
react: Context7, Magic, Frontend toolswordpress: SSH-WordPress, TayloredFocustesting: Playwright, Test toolsbackend: API, database, server tools
The system uses smart keyword detection:
{
"mcp_servers": {
"sequential-thinking": {
"trigger_keywords": ["complex", "analyze", "debug"],
"token_cost": 1300,
"auto_load": false
},
"magic": {
"trigger_keywords": ["ui", "component", "/21"],
"token_cost": 2800,
"auto_load": false
}
}
}Testing with real-world scenarios:
| Scenario | Traditional | Lazy Loading | Improvement |
|---|---|---|---|
| Simple query | 108k tokens | 5k tokens | 95% reduction |
| React dev | 108k tokens | 8.5k tokens | 92% reduction |
| WordPress | 108k tokens | 11k tokens | 90% reduction |
| Complex analysis | 108k tokens | 15k tokens | 86% reduction |
Perfect for:
- Power users with many MCP servers
- Long conversations requiring maximum context
- Complex tasks needing multiple tools
- Development workflows with specialized tools
Currently this is a proof-of-concept that demonstrates the potential. Full integration requires Claude Code native support.
- ✅ Registry generation from your MCP configs
- ✅ Keyword-based tool detection
- ✅ Manual lazy loading simulation
- ✅ Token usage tracking
- ⏳ Automatic lazy loading at runtime
- ⏳ Seamless tool injection on-demand
- ⏳ Native registry support
- ⏳ Hooks for pre-prompt analysis
This is a community effort to improve Claude Code. Contributions welcome!
- Test with your MCP configuration
- Report token savings achieved
- Suggest keyword improvements
- Add new preload profiles
- Feature Request: anthropics/claude-code#7336
- Discussion: Share your experience and results
- Contact: [Your contact info]
MIT - Free to use and modify
Note: This is a proof-of-concept demonstrating the potential for 95% context reduction. Full implementation requires native support in Claude Code.