Labels: bug
Description
Bug Description
Long, tool-heavy gateway sessions using glm-5-turbo can still hit provider-side context overflow even when Hermes believes the request is under the compaction threshold.
This appears to be a combination of three related problems:
- Hermes can undercount the real request size before the API call by reasoning mainly from the conversation transcript while the actual payload also includes large tool schemas.
- Z.AI returns only a generic overflow message, "Prompt exceeds max length", which needs to be treated as a context-overflow signal.
- If Hermes steps down to a lower fallback tier after a generic overflow, that guessed lower tier can end up influencing future behavior more than it should unless it is clearly treated as a temporary fallback.
There is also a related gateway UX issue:
- the post-compaction token number can be misleading if it reflects only the stripped transcript rather than the full next request payload.
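To illustrate the first problem, here is a minimal sketch of full-payload estimation, assuming a hypothetical `count_tokens` helper (a real implementation would use the model's tokenizer): a transcript-only estimate ignores the serialized tool schemas that ship with every request.

```python
import json

def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer; assumes ~4 chars/token.
    return max(1, len(text) // 4)

def estimate_request_tokens(messages: list, tools: list) -> int:
    """Estimate the size of the full request payload, not just the transcript."""
    transcript = sum(count_tokens(json.dumps(m)) for m in messages)
    # Tool schemas are serialized into every request and can be large.
    schemas = sum(count_tokens(json.dumps(t)) for t in tools)
    return transcript + schemas

# Illustrative data only: a long tool-heavy session with many registered tools.
messages = [{"role": "user", "content": "patch file X"}] * 50
tools = [{"name": f"tool_{i}", "parameters": {"type": "object"}} for i in range(30)]

transcript_only = sum(count_tokens(json.dumps(m)) for m in messages)
full = estimate_request_tokens(messages, tools)
assert full > transcript_only  # schemas add real payload size on top of the transcript
```

A preflight check that compares `transcript_only` against the compaction threshold will undercount exactly by the schema overhead, which matches the failure mode described above.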
Steps to Reproduce
- Configure Hermes to use:
  - provider: zai
  - model: glm-5-turbo
  - base URL: https://api.z.ai/api/coding/paas/v4
- Use Hermes via a gateway platform (I observed this on Discord DM) in one long session with many tool calls, file reads, patches, and searches.
- Keep working in the same session until Hermes starts reporting context pressure / compaction pressure.
- Continue the same session.
- Observe that the provider can still reject the request with:
Error code: 400 - {'error': {'code': '1261', 'message': 'Prompt exceeds max length'}}
Expected Behavior
- Hermes should compact based on the full request shape it actually sends, including tool schemas.
- Provider-specific overflow messages like "Prompt exceeds max length" should trigger context-overflow recovery.
- Temporary fallback step-downs should not be treated as confirmed provider limits unless the provider actually reported a numeric limit.
- Gateway post-compaction reporting should describe the real next-request estimate, not only the stripped transcript.
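One possible shape for the overflow detection is a small predicate over the error body; this is a sketch (the function name and marker list are hypothetical), with the "Prompt exceeds max length" string and code '1261' taken from the log in this report:

```python
# Substrings/codes that indicate provider-side context overflow.
# "Prompt exceeds max length" / code '1261' come from the Z.AI 400 in this report.
OVERFLOW_MARKERS = ("prompt exceeds max length", "context length", "maximum context")
OVERFLOW_CODES = {"1261"}

def is_context_overflow(status: int, body: dict) -> bool:
    if status != 400:
        return False
    err = body.get("error", {})
    msg = str(err.get("message", "")).lower()
    return err.get("code") in OVERFLOW_CODES or any(m in msg for m in OVERFLOW_MARKERS)

zai_body = {"error": {"code": "1261", "message": "Prompt exceeds max length"}}
assert is_context_overflow(400, zai_body)
```

Matching on both the numeric code and the message substring keeps the check robust if Z.AI changes one but not the other.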
Actual Behavior
- Hermes can believe a session is still below the compaction threshold, but the provider rejects the next request anyway.
- A generic GLM overflow can push Hermes toward a lower fallback context tier.
- Gateway compaction output can be misleading. Example shape:
Session is large (171 messages, ~124,701 tokens). Auto-compressing...
Compressed: 171 → 7 messages, ~124,701 → ~402 tokens
That post-compaction ~402 tokens figure does not reflect the full next request payload.
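A hedged sketch of what the post-compaction report could say instead (the 171/7/~124,701/~402 numbers come from the example above; the ~18,000-token schema overhead and the function name are illustrative assumptions):

```python
def format_compaction_report(n_before: int, n_after: int, tok_before: int,
                             tok_transcript: int, tok_overhead: int) -> str:
    # Report the full next-request estimate, not just the stripped transcript.
    tok_next = tok_transcript + tok_overhead
    return (f"Compressed: {n_before} → {n_after} messages, "
            f"~{tok_before:,} → ~{tok_transcript:,} transcript tokens "
            f"(~{tok_next:,} est. next request incl. tool schemas)")

print(format_compaction_report(171, 7, 124_701, 402, 18_000))
```

Surfacing the full-request estimate in the same message keeps the user from concluding the session is nearly empty when the tool schemas still dominate the payload.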
Affected Component
Gateway (Telegram/Discord/Slack/WhatsApp), Agent Core (conversation loop, context compression, memory)
Messaging Platform (if gateway-related)
Discord
Operating System
Ubuntu 25.10
Python Version
3.11.13
Hermes Version
0.4.0
Relevant Logs / Traceback
Error code: 400 - {'error': {'code': '1261', 'message': 'Prompt exceeds max length'}}
Root Cause Analysis (optional)
Observed contributing factors:
| # | Issue | Area |
|---|---|---|
| 1 | Real request size can be underestimated when tool schemas are large | run_agent.py preflight/request estimation |
| 2 | Z.AI overflow string is generic and needs explicit handling | provider-specific context overflow detection |
| 3 | Fallback step-down behavior can be confused with confirmed provider metadata if not handled carefully | context-length caching / probing |
| 4 | Gateway post-compaction reporting can describe transcript-only size instead of full request size | gateway session hygiene messaging |
Proposed Fix (optional)
- Estimate the full request payload for compaction decisions, not just the transcript.
- Treat Z.AI Prompt exceeds max length as a context-overflow signal.
- Only persist provider-confirmed numeric context limits.
- Keep guessed fallback step-downs temporary unless later confirmed.
- Make gateway post-compaction reporting use a full-request estimate.
Are you willing to submit a PR for this?
- I'd like to fix this myself and submit a PR