
Bug: MCP Tool Response Exceeds Token Limit (45k tokens) Even for Small Prompts #6

@tjazevedo

Description


Bug Report: MCP Tool Response Limit - Model-Specific Issue

Problem Description

The gemini-cli MCP tool returns a token-limit error with specific models, even for very small prompts:

Error: MCP tool "ask-gemini" response (45735 tokens) exceeds maximum allowed tokens (25000)

Issue Details

  • Configured limit: 25,000 tokens
  • Actual response: 45,735 tokens (always the same number)
  • Issue: Occurs regardless of prompt size
  • Behavior: Even 10-word prompts generate 45k+ token responses

Model-Specific Analysis

Thorough testing indicates that this is a model-specific bug:

| Model | Status | Behavior |
| --- | --- | --- |
| gemini-2.5-pro (default) | ❌ Broken | Always returns 45,735 tokens |
| gemini-2.5-flash | ✅ Working | Normal response sizes |
| gemini-2.0-flash-thinking | ⚠️ 404 error | Model not found |

Reproduction Steps

  1. Use gemini-cli MCP tool with default model (gemini-2.5-pro):

    /gemini-cli:analyze "What is 2+2?"
    

    Result: fails with the 45,735-token limit error

  2. Use gemini-cli MCP tool with flash model:

    /gemini-cli:analyze -m gemini-2.5-flash "What is 2+2?"
    

    Result: Works perfectly (normal response size)

Environment

  • Tool: gemini-cli MCP tool v1.1.1
  • Context: Claude Code interface
  • Node.js: v20.19.3
  • Google Gemini CLI: v0.1.10
  • Configuration: Properly configured with claude_desktop_config.json

Installation Verification

All installation requirements are met:

  • ✅ Node.js ≥ v16.0.0 (have v20.19.3)
  • ✅ Google Gemini CLI installed and configured
  • ✅ MCP server configured correctly via NPX method
  • ✅ claude_desktop_config.json properly set up
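
For reference, an NPX-based entry in claude_desktop_config.json typically looks roughly like the following (the package name shown is an assumption for illustration; substitute the one from the tool's README):

```json
{
  "mcpServers": {
    "gemini-cli": {
      "command": "npx",
      "args": ["-y", "gemini-mcp-tool"]
    }
  }
}
```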

Expected Behavior

  • Response should respect the 25,000 token limit
  • All models should work consistently
  • Large responses should be truncated or paginated
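
The truncation behavior expected above could be sketched as follows; the character-per-token heuristic, constants, and function name are assumptions for illustration, not the tool's actual implementation:

```javascript
// Illustrative sketch: approximate token count as chars / 4 and cut the
// response before returning it to the client, instead of rejecting it.
// CHARS_PER_TOKEN is a rough heuristic, not the real tokenizer.
const TOKEN_LIMIT = 25000;
const CHARS_PER_TOKEN = 4;

function truncateToLimit(text) {
  const maxChars = TOKEN_LIMIT * CHARS_PER_TOKEN;
  if (text.length <= maxChars) {
    return text;
  }
  // Keep the first maxChars characters and flag the cut explicitly.
  return text.slice(0, maxChars) + "\n[response truncated at 25,000 tokens]";
}
```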

Workaround

Use gemini-2.5-flash model explicitly:

/gemini-cli:analyze -m gemini-2.5-flash "your prompt here"

Root Cause Analysis

This appears to be a model-specific bug in gemini-2.5-pro:

  • The model consistently produces a response of exactly 45,735 tokens
  • This happens regardless of the actual prompt content
  • The issue appears to lie in the Gemini API response for the Pro model, not in the MCP tool itself

Suggested Solutions

  1. Fix gemini-2.5-pro model to return appropriate response sizes
  2. Implement model-specific token limits in the MCP tool
  3. Add automatic fallback to gemini-2.5-flash when Pro model fails
  4. Update documentation to recommend using Flash model for analysis tasks
  5. Add model validation to prevent using broken models
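
Suggestion 3 above could be sketched roughly as follows; `callModel`, its return shape, and the limit constant are all assumptions for illustration, not the tool's actual API:

```javascript
// Illustrative sketch of an automatic fallback: if the primary model's
// response exceeds the configured token limit, retry once with the Flash
// model. `callModel` stands in for whatever function the MCP server uses
// to invoke the Gemini CLI; its (model, prompt) -> { model, tokens, text }
// shape is assumed for this sketch.
const TOKEN_LIMIT = 25000;

function askWithFallback(callModel, prompt) {
  const primary = callModel("gemini-2.5-pro", prompt);
  if (primary.tokens <= TOKEN_LIMIT) {
    return primary;
  }
  // Primary response blew the limit (e.g. the constant 45,735-token bug):
  // retry with the model that is known to behave.
  return callModel("gemini-2.5-flash", prompt);
}
```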

Additional Context

  • This is not an installation issue - all components are properly configured
  • The MCP tool works perfectly with compatible models
  • The bug is isolated to the gemini-2.5-pro model specifically
  • Logs show MCP server initializes correctly

Related

  • Consider making gemini-2.5-flash the default model
  • Add model health checks before processing requests
  • Update troubleshooting documentation with model-specific issues

Update Status

Installation verified as correct - Issue is confirmed as model-specific bug in gemini-2.5-pro.
