
[Bug]: GLM gateway sessions can undercount request size, overflow late, and persist guessed fallback context limits #2599

@paraddox

Description


Bug Description

Long, tool-heavy gateway sessions using glm-5-turbo can hit provider-side context overflow even when Hermes believes the request is still under the compaction threshold.

This appears to be a combination of three related problems:

  1. Hermes can undercount the real request size before the API call by reasoning mainly from the conversation transcript while the actual payload also includes large tool schemas.
  2. Z.AI returns only a generic overflow message, Prompt exceeds max length, which needs to be treated as a context-overflow signal.
  3. If Hermes steps down to a lower fallback tier after a generic overflow, that guessed lower tier can end up influencing future behavior more than it should unless it is clearly treated as a temporary fallback.

There is also a related gateway UX issue:

  • the post-compaction token number can be misleading if it reflects only the stripped transcript rather than the full next request payload.
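Problem 1 can be sketched as follows. This is a minimal illustration, not Hermes's actual estimator: the helper name and the 4-characters-per-token heuristic are assumptions, but it shows why counting only the transcript undercounts the real payload.

```python
import json
import math

def estimate_request_tokens(messages, tools, chars_per_token=4):
    """Rough full-payload token estimate (hypothetical helper).

    A transcript-only estimate misses the tool schemas that are also
    serialized into every request; counting both narrows the gap between
    the preflight estimate and what the provider actually sees. The
    chars_per_token=4 divisor is a coarse heuristic, not a tokenizer.
    """
    payload = json.dumps(messages) + json.dumps(tools)
    return math.ceil(len(payload) / chars_per_token)

messages = [{"role": "user", "content": "patch the gateway module"}]
tools = [{"name": "read_file",
          "parameters": {"type": "object",
                         "properties": {"path": {"type": "string"}}}}]

# The tool schemas ride along on every request, so the full-request
# estimate is strictly larger than the transcript-only one:
transcript_only = estimate_request_tokens(messages, [])
full_request = estimate_request_tokens(messages, tools)
```

With dozens of large tool schemas, the gap between `transcript_only` and `full_request` is exactly the headroom that evaporates in long sessions.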

Steps to Reproduce

  1. Configure Hermes to use:
    • provider: zai
    • model: glm-5-turbo
    • base URL: https://api.z.ai/api/coding/paas/v4
  2. Use Hermes via a gateway platform (I observed this on Discord DM) in one long session with many tool calls, file reads, patches, and searches.
  3. Keep working in the same session until Hermes starts reporting context pressure / compaction pressure.
  4. Continue the same session.
  5. Observe that the provider can still reject the request with:
    • Error code: 400 - {'error': {'code': '1261', 'message': 'Prompt exceeds max length'}}

Expected Behavior

  • Hermes should compact based on the full request shape it actually sends, including tool schemas.
  • Provider-specific overflow messages like Prompt exceeds max length should trigger context-overflow recovery.
  • Temporary fallback step-downs should not be treated as confirmed provider limits unless the provider actually reported a numeric limit.
  • Gateway post-compaction reporting should describe the real next-request estimate, not only the stripped transcript.
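The second bullet could look something like the sketch below. The marker table and function name are hypothetical, not Hermes's existing error-handling hooks; the Z.AI strings come from the traceback in this report.

```python
# Hypothetical provider-specific overflow detection. Markers are
# matched case-insensitively against the raw error body.
OVERFLOW_MARKERS = {
    # Z.AI / GLM: generic HTTP 400 with code 1261
    "zai": ("prompt exceeds max length", "'code': '1261'"),
}

def is_context_overflow(provider: str, status_code: int, error_body: str) -> bool:
    """Treat known provider-specific messages as context-overflow signals."""
    if status_code != 400:
        return False
    body = error_body.lower()
    return any(marker in body for marker in OVERFLOW_MARKERS.get(provider, ()))

body = "{'error': {'code': '1261', 'message': 'Prompt exceeds max length'}}"
# is_context_overflow("zai", 400, body) → True
```

Keying the table by provider keeps the generic GLM string from being misread as an overflow signal for providers that use the same wording for unrelated 400s.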

Actual Behavior

  • Hermes can believe a session is still below the compaction threshold, but the provider rejects the next request anyway.
  • A generic GLM overflow can push Hermes toward a lower fallback context tier.
  • Gateway compaction output can be misleading. Example shape:

    Session is large (171 messages, ~124,701 tokens). Auto-compressing...
    Compressed: 171 → 7 messages, ~124,701 → ~402 tokens

    That post-compaction ~402 tokens number does not reflect the full next request payload.
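A post-compaction report that reflects the next request rather than the stripped transcript might be built like this. The helper names and the chars-per-token heuristic are illustrative assumptions, not the gateway's actual reporting code.

```python
import json
import math

def compaction_report(before_msgs, after_msgs, tools, chars_per_token=4):
    """Build a post-compaction line describing the *next request*,
    not just the stripped transcript (hypothetical helper)."""
    def toks(msgs, extra):
        # Serialize both the messages and any extra payload (tool
        # schemas) so the estimate matches what will be sent.
        return math.ceil((len(json.dumps(msgs)) + len(json.dumps(extra)))
                         / chars_per_token)

    transcript = toks(after_msgs, [])
    next_request = toks(after_msgs, tools)  # tool schemas ride along
    return (f"Compressed: {len(before_msgs)} → {len(after_msgs)} messages; "
            f"next request ≈ {next_request} tokens "
            f"(transcript alone ≈ {transcript})")
```

Reporting both numbers makes it obvious when a tiny post-compaction transcript still produces a large next request.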

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp), Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

Discord

Operating System

Ubuntu 25.10

Python Version

3.11.13

Hermes Version

0.4.0

Relevant Logs / Traceback

Error code: 400 - {'error': {'code': '1261', 'message': 'Prompt exceeds max length'}}

Root Cause Analysis (optional)

Observed contributing factors:

# | Issue | Area
1 | Real request size can be underestimated when tool schemas are large | run_agent.py preflight/request estimation
2 | Z.AI overflow string is generic and needs explicit handling | provider-specific context-overflow detection
3 | Fallback step-downs can be confused with confirmed provider metadata if not handled carefully | context-length caching / probing
4 | Gateway post-compaction reporting can describe transcript-only size instead of full request size | gateway session hygiene messaging

Proposed Fix (optional)

  • Estimate the full request payload for compaction decisions, not just the transcript.
  • Treat Z.AI's Prompt exceeds max length error as a context-overflow signal.
  • Only persist provider-confirmed numeric context limits.
  • Keep guessed fallback step-downs temporary unless later confirmed.
  • Make gateway post-compaction reporting use a full-request estimate.
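The third and fourth bullets amount to tagging every cached limit with where it came from. A minimal sketch, assuming a hypothetical cache record (field names are illustrative, not Hermes's actual schema):

```python
from dataclasses import dataclass

@dataclass
class ContextLimit:
    tokens: int
    source: str  # "provider" = numeric limit reported by the API;
                 # "fallback" = guessed step-down after a generic overflow

    def should_persist(self) -> bool:
        # Only provider-confirmed numeric limits survive the session;
        # guessed step-downs stay in memory and expire with it.
        return self.source == "provider"

def step_down_after_overflow(current_tokens: int) -> ContextLimit:
    """Guess a lower tier after a generic overflow, flagged as temporary.
    Halving is an arbitrary illustrative policy."""
    return ContextLimit(tokens=current_tokens // 2, source="fallback")
```

Because the fallback record can never pass `should_persist()`, a single generic overflow cannot silently rewrite the cached limit for all future sessions.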

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Milestone: no milestone