
Bug: MCP Tool Response Exceeds Token Limit (45k tokens) Even for Small Prompts #6

@tjazevedo

Description


Bug Report: MCP Tool Response Limit - Model-Specific Issue

Problem Description

The gemini-cli MCP tool returns a token-limit error with specific models, even for very small prompts:

Error: MCP tool "ask-gemini" response (45735 tokens) exceeds maximum allowed tokens (25000)

Issue Details

  • Configured limit: 25,000 tokens
  • Actual response: 45,735 tokens (always the same number)
  • Issue: Occurs regardless of prompt size
  • Behavior: Even 10-word prompts generate 45k+ token responses

Model-Specific Analysis

Thorough testing indicates that this is a model-specific bug:

| Model | Status | Behavior |
| --- | --- | --- |
| gemini-2.5-pro (default) | ❌ Broken | Always returns 45,735 tokens |
| gemini-2.5-flash | ✅ Working | Normal response sizes |
| gemini-2.0-flash-thinking | ⚠️ 404 error | Model not found |

Reproduction Steps

  1. Use gemini-cli MCP tool with default model (gemini-2.5-pro):

    /gemini-cli:analyze "What is 2+2?"
    

    Result: fails with the 45,735-token limit error

  2. Use gemini-cli MCP tool with flash model:

    /gemini-cli:analyze -m gemini-2.5-flash "What is 2+2?"
    

    Result: Works perfectly (normal response size)

Environment

  • Tool: gemini-cli MCP tool v1.1.1
  • Context: Claude Code interface
  • Node.js: v20.19.3
  • Google Gemini CLI: v0.1.10
  • Configuration: Properly configured with claude_desktop_config.json

Installation Verification

All installation requirements are met:

  • ✅ Node.js ≥ v16.0.0 (have v20.19.3)
  • ✅ Google Gemini CLI installed and configured
  • ✅ MCP server configured correctly via NPX method
  • ✅ claude_desktop_config.json properly set up
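
For reference, an NPX-based entry in claude_desktop_config.json typically looks roughly like the following (the package name shown is an assumption for illustration; substitute the one from the tool's README):

```json
{
  "mcpServers": {
    "gemini-cli": {
      "command": "npx",
      "args": ["-y", "gemini-mcp-tool"]
    }
  }
}
```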

Expected Behavior

  • Response should respect the 25,000 token limit
  • All models should work consistently
  • Large responses should be truncated or paginated
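
The truncation behavior expected above could be sketched as follows; the character-per-token heuristic, constants, and function name are assumptions for illustration, not the tool's actual implementation:

```javascript
// Illustrative sketch: approximate token count as chars / 4 and cut the
// response before returning it to the client, instead of rejecting it.
// CHARS_PER_TOKEN is a rough heuristic, not the real tokenizer.
const TOKEN_LIMIT = 25000;
const CHARS_PER_TOKEN = 4;

function truncateToLimit(text) {
  const maxChars = TOKEN_LIMIT * CHARS_PER_TOKEN;
  if (text.length <= maxChars) {
    return text;
  }
  // Keep the first maxChars characters and flag the cut explicitly.
  return text.slice(0, maxChars) + "\n[response truncated at 25,000 tokens]";
}
```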

Workaround

Use gemini-2.5-flash model explicitly:

/gemini-cli:analyze -m gemini-2.5-flash "your prompt here"

Root Cause Analysis

This appears to be a model-specific bug in gemini-2.5-pro:

  • The model consistently produces a response of exactly 45,735 tokens
  • This happens regardless of the actual prompt content
  • The issue appears to lie in the Gemini API response for the Pro model, not in the MCP tool itself

Suggested Solutions

  1. Fix gemini-2.5-pro model to return appropriate response sizes
  2. Implement model-specific token limits in the MCP tool
  3. Add automatic fallback to gemini-2.5-flash when Pro model fails
  4. Update documentation to recommend using Flash model for analysis tasks
  5. Add model validation to prevent using broken models
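
Suggestion 3 above could be sketched roughly as follows; `callModel`, its return shape, and the limit constant are all assumptions for illustration, not the tool's actual API:

```javascript
// Illustrative sketch of an automatic fallback: if the primary model's
// response exceeds the configured token limit, retry once with the Flash
// model. `callModel` stands in for whatever function the MCP server uses
// to invoke the Gemini CLI; its (model, prompt) -> { model, tokens, text }
// shape is assumed for this sketch.
const TOKEN_LIMIT = 25000;

function askWithFallback(callModel, prompt) {
  const primary = callModel("gemini-2.5-pro", prompt);
  if (primary.tokens <= TOKEN_LIMIT) {
    return primary;
  }
  // Primary response blew the limit (e.g. the constant 45,735-token bug):
  // retry with the model that is known to behave.
  return callModel("gemini-2.5-flash", prompt);
}
```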

Additional Context

  • This is not an installation issue - all components are properly configured
  • The MCP tool works perfectly with compatible models
  • The bug is isolated to the gemini-2.5-pro model specifically
  • Logs show MCP server initializes correctly

Related

  • Consider making gemini-2.5-flash the default model
  • Add model health checks before processing requests
  • Update troubleshooting documentation with model-specific issues

Update Status

Installation verified as correct - Issue is confirmed as model-specific bug in gemini-2.5-pro.
