Status: Open
Labels: bug (Something isn't working)
Description
Bug Report: MCP Tool Response Limit - Model-Specific Issue
Problem Description
The MCP tool gemini-cli is returning a token limit error with specific models, even for very small prompts:
Error: MCP tool "ask-gemini" response (45735 tokens) exceeds maximum allowed tokens (25000)
Issue Details
- Configured limit: 25,000 tokens
- Actual response: 45,735 tokens (always the same number)
- Issue: Occurs regardless of prompt size
- Behavior: Even 10-word prompts generate 45k+ token responses
Model-Specific Analysis
After thorough testing, this is a model-specific bug:
| Model | Status | Behavior |
|---|---|---|
| gemini-2.5-pro (default) | ❌ BROKEN | Always returns 45,735 tokens |
| gemini-2.5-flash | ✅ WORKING | Normal response sizes |
| gemini-2.0-flash-thinking | ❌ 404 ERROR | Model not found |
Reproduction Steps
1. Use the gemini-cli MCP tool with the default model (gemini-2.5-pro):
   `/gemini-cli:analyze "What is 2+2?"`
   Result: 45,735-token error
2. Use the gemini-cli MCP tool with the flash model:
   `/gemini-cli:analyze -m gemini-2.5-flash "What is 2+2?"`
   Result: works as expected (normal response size)
Environment
- Tool: gemini-cli MCP tool v1.1.1
- Context: Claude Code interface
- Node.js: v20.19.3
- Google Gemini CLI: v0.1.10
- Configuration: Properly configured with claude_desktop_config.json
Installation Verification
All installation requirements are met:
- ✅ Node.js ≥ v16.0.0 (have v20.19.3)
- ✅ Google Gemini CLI installed and configured
- ✅ MCP server configured correctly via NPX method
- ✅ claude_desktop_config.json properly set up
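For reference, an NPX-based MCP server entry in claude_desktop_config.json typically takes the shape below. The package name `gemini-mcp-tool` is an assumption based on the tool's name; check the project README for the exact value.

```json
{
  "mcpServers": {
    "gemini-cli": {
      "command": "npx",
      "args": ["-y", "gemini-mcp-tool"]
    }
  }
}
```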
Expected Behavior
- Response should respect the 25,000 token limit
- All models should work consistently
- Large responses should be truncated or paginated
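One way the tool could enforce the limit is to truncate oversized responses before returning them. A minimal sketch in TypeScript, assuming a rough chars-per-token estimate; the function names and the ~4-characters-per-token heuristic are illustrative, not part of the actual MCP tool:

```typescript
// Rough token estimate: ~4 characters per token (a common heuristic,
// not the tokenizer the Gemini API actually uses).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Truncate a response so its estimated token count stays under the limit,
// appending a marker so the caller knows content was dropped.
function truncateToTokenLimit(text: string, maxTokens: number): string {
  if (estimateTokens(text) <= maxTokens) return text;
  const maxChars = maxTokens * 4;
  return text.slice(0, maxChars) + "\n[response truncated]";
}

const big = "x".repeat(200_000); // ~50,000 estimated tokens
const small = truncateToTokenLimit(big, 25_000);
console.log(estimateTokens(small)); // stays near the 25,000-token limit
```

Pagination would be a friendlier variant of the same idea: split on the estimated boundary and let the client request subsequent chunks.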
Workaround
Use gemini-2.5-flash model explicitly:
/gemini-cli:analyze -m gemini-2.5-flash "your prompt here"
Root Cause Analysis
This appears to be a model-specific bug in gemini-2.5-pro:
- The model returns a hardcoded response size of 45,735 tokens
- This happens regardless of the actual prompt content
- The issue is in the Gemini API response for the Pro model, not the MCP tool itself
Suggested Solutions
- Fix gemini-2.5-pro model to return appropriate response sizes
- Implement model-specific token limits in the MCP tool
- Add automatic fallback to gemini-2.5-flash when Pro model fails
- Update documentation to recommend using Flash model for analysis tasks
- Add model validation to prevent using broken models
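The automatic-fallback idea above could look roughly like this. This is a sketch assuming a generic `callModel` function; the name, signature, and wiring are hypothetical and the real MCP server's internals differ:

```typescript
// Hypothetical model-call signature; the real MCP tool's internals differ.
type ModelCall = (model: string, prompt: string) => Promise<string>;

const TOKEN_LIMIT = 25_000;
// Same rough heuristic as above: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Try the preferred model first; if its response blows past the token
// limit, retry once with the fallback model.
async function askWithFallback(
  call: ModelCall,
  prompt: string,
  preferred = "gemini-2.5-pro",
  fallback = "gemini-2.5-flash",
): Promise<string> {
  const first = await call(preferred, prompt);
  if (estimateTokens(first) <= TOKEN_LIMIT) return first;
  return call(fallback, prompt);
}
```

For example, with a stub that mimics the reported behavior (Pro always oversized, Flash normal), `askWithFallback` returns the Flash response:

```typescript
const stub: ModelCall = async (model, _prompt) =>
  model === "gemini-2.5-pro" ? "x".repeat(200_000) : "4";

askWithFallback(stub, "What is 2+2?").then((answer) => {
  console.log(answer); // the Flash response, since Pro exceeded the limit
});
```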
Additional Context
- This is not an installation issue - all components are properly configured
- The MCP tool works perfectly with compatible models
- The bug is isolated to the gemini-2.5-pro model specifically
- Logs show MCP server initializes correctly
Related
- Consider making gemini-2.5-flash the default model
- Add model health checks before processing requests
- Update troubleshooting documentation with model-specific issues
Update Status
Installation verified as correct; the issue is confirmed as a model-specific bug in gemini-2.5-pro.