feat: Make max iteration limit resumable without reminders #2453
utkarsh-in wants to merge 16 commits into OpenHands:main from
Conversation
- Add MAX_ITERATIONS_REACHED status to ConversationExecutionStatus enum
- Mark MAX_ITERATIONS_REACHED as terminal state
- Add ConversationIterationLimitEvent for better error handling
- Modify send_message() to reset MAX_ITERATIONS_REACHED to IDLE on new user input
- Modify run() to allow restarting from MAX_ITERATIONS_REACHED state
- Remove final step reminder message injection when max iterations reached
- Add budget information to agent system prompt
- Add budget warning messages at 80% and 95% of max iterations
- Update tests to cover new status transitions

This change allows conversations to resume after hitting max iterations when a new user message is sent, instead of permanently stopping. Removes the intrusive reminder message that was previously injected.
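The transitions described in the commit message can be sketched roughly like this (a simplified stand-in built only from the PR description; the class shapes are illustrative, not the actual OpenHands SDK):

```python
from enum import Enum


class ConversationExecutionStatus(str, Enum):
    # Simplified subset of the statuses mentioned in the PR.
    IDLE = "idle"
    RUNNING = "running"
    MAX_ITERATIONS_REACHED = "max_iterations_reached"


class Conversation:
    """Minimal stand-in illustrating the resume behavior."""

    def __init__(self, max_iterations: int = 100):
        self.max_iterations = max_iterations
        self.iterations = 0
        self.status = ConversationExecutionStatus.IDLE

    def send_message(self, text: str) -> None:
        # New user input resets the terminal iteration-limit state back to
        # IDLE, so run() can be called again instead of staying stopped.
        if self.status is ConversationExecutionStatus.MAX_ITERATIONS_REACHED:
            self.iterations = 0
            self.status = ConversationExecutionStatus.IDLE

    def run(self) -> None:
        self.status = ConversationExecutionStatus.RUNNING
        while self.iterations < self.max_iterations:
            self.iterations += 1  # one agent step per iteration
        self.status = ConversationExecutionStatus.MAX_ITERATIONS_REACHED
```

The key point is that `MAX_ITERATIONS_REACHED` is terminal only until the next user message arrives; no reminder is injected into the context.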
…ify budget warning logic
force-pushed from 8deea23 to de58827
Co-authored-by: openhands <openhands@all-hands.dev>
Thank you for trying this! It's very interesting.
I do wonder about one or two things though: first, benchmarks, like any non-interactive runs, really do stop at max_iterations; they have to, so that they don't go on forever. But interactive runs don't have to stop, they only need to pause: the user can say something via a UI, and it will continue. That might mean the user experience could be affected if the LLM is told that it'd better hurry, instead of running normally.
So idk, I think maybe it's worth asking: do we want this applied only to non-interactive runs? do we know we are in a non-interactive run?
Secondly, I'm a bit curious. I'd like to know what in-context reminders tell the LLM in other agentic tools. I seem to recall off-hand that OpenAI has had a reminder since a long time ago, at least.
Reminding it of fragments of the system prompt or instructions, like telling it again how the process works, is fine IMHO; telling it the state of its runtime, like the number of iterations remaining, is fine; idk, WDYT, is hurrying it or nudging it to completion fine?
Hi @enyst, I agree that nudging it to completion can be removed.
Remove nudge to wrap up task from budget warning message. The warning now only informs the agent about remaining steps without suggesting they should complete the task. Co-authored-by: openhands <openhands@all-hands.dev>
@VascoSch92 I'd love your thoughts on this discussion.
After the discussion in the relevant issue, I think we don't want to have a reminder for the iteration limit. At least, not now. However, I think it would be interesting (at least for me, for subagents) to have a new error type, i.e., MAX_ITERATIONS_REACHED, because right now this type of error falls under the generic ERROR category, which makes it indistinguishable. Can we just make this change in the PR?
I have already used context budget pressure on agents, but it is difficult to say whether it was effective. I think it is effective when you have a very limited number of iterations, but that is not really our case: at least so far, I have never seen a task fail due to an iteration overflow in a benchmark using the OpenHands agent.
@VascoSch92 @enyst just to clarify
Is that correct?
Thanks!
I think maybe it's not necessary; it's a local variable used only for the reminder?
- Remove 80% and 95% budget warning messages from conversation execution
- Remove budget info from system prompt dynamic context
- Update prompt caching test to expect no dynamic context when no agent context
- Revert all package versions from 1.15.0 back to 1.14.0
- Update uv.lock accordingly

Co-authored-by: openhands <openhands@all-hands.dev>
Hi @enyst, thanks!
…iterations to max_iteration_per_run
- Changed ConversationExecutionStatus to inherit from StrEnum instead of str, Enum
- Renamed max_iterations field to max_iteration_per_run in ConversationIterationLimitEvent
- Updated corresponding test assertion to use the new field name

Co-authored-by: openhands <openhands@all-hands.dev>
[Automatic Post]: It has been a while since there was any activity on this PR. @utkarsh-in, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.
@OpenHands Understand this PR. Its CI is failing; verify why, and fix precommit (note that it's a fork, so set your remote correctly and push). Then let's focus on the API workflow failures. I'm confused why it fails: I see a notice about additive changes, which don't error, and then I see... some additive changes which do result in an error? Understand the problems, then take a step back and tell me: do you think the workflow is too strict, or are the PR changes too breaking? Respond in a comment on GitHub.
I'm on it! enyst can track my progress at all-hands.dev
I dug through the PR, the linked issue, and the failing CI.

**What I fixed**
**Why the fresh CI is not showing results yet**

Because this PR comes from a fork, the newly triggered workflow runs on the updated SHA are currently landing in …

**REST API workflow: what is actually failing?**

The confusing part is the log output. The … What is actually fatal is the addition of the new … So the workflow is effectively saying:
The reason it looks contradictory is that after printing the notice, the script still prints the full raw oasdiff change list under …

**My take**

I think the messaging is too confusing, but the workflow itself is not too strict here. Adding a new response enum value is a real REST contract break for clients that generated closed enums from the OpenAPI schema or otherwise assume the documented value set is exhaustive. Since … So my read is:
If we want …

This comment was created by an AI assistant (OpenHands) on behalf of @enyst.
Hi, I'm OpenHands-GPT-5.4. I wanted to make the compatibility concern more concrete, because I think the key point is easy to miss from the raw workflow output.

**Why can adding a new enum value be a breaking change?**

Because once an API schema says a field is an … That happens in a few common ways:
So even though adding a value looks "additive" from the server side, it is not always additive from the client side.

**Why that matters specifically for …**

This is not just adding a decorative label. Clients can easily have logic like:
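A hypothetical sketch of the kind of closed-set client handling meant here (the status strings and handler actions are invented for illustration, not taken from a real OpenHands client):

```python
def handle_status(status: str) -> str:
    """Dispatch on a conversation status assumed to come from the REST API.

    A client generated against the old schema treats the value set as
    closed and raises on anything it does not recognize.
    """
    known = {
        "idle": "show ready indicator",
        "running": "show spinner",
        "error": "show error banner",
    }
    try:
        return known[status]
    except KeyError:
        # A new server-side enum value lands here and crashes the client.
        raise ValueError(f"unexpected conversation status: {status!r}")
```

Such a client works fine until the server starts returning a value like `"max_iterations_reached"`, at which point previously valid responses become errors.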
A client built against the old schema may not know what to do with …
And in this PR, that new value is semantically important:
So I think the right interpretation is:
Put differently: the workflow output is a bit confusing, but the underlying concern is legitimate.

This comment was created by an AI assistant (OpenHands) on behalf of @enyst.
Hi, I'm OpenHands-GPT-5.4. Since I think this is a real REST compatibility break, here are the practical options I see.

**1. Keep the new SDK/internal status, but preserve the current REST contract**

This looks like the best short-term option to me.

**Idea**
**Why it works**
**Tradeoff**
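A minimal sketch of what option 1's boundary mapping might look like (the function name, the assumed set of documented REST values, and the choice to fold the new status into "paused" are all illustrative assumptions, not the actual OpenHands API):

```python
# Internal status introduced by this PR.
INTERNAL_MAX_ITERATIONS_REACHED = "max_iterations_reached"

# Values the published REST schema already documents (assumed set).
REST_STATUSES = {"idle", "running", "paused", "error"}


def to_rest_status(internal_status: str) -> str:
    """Map internal execution statuses onto the existing REST contract.

    The new internal status is folded into an already-documented value,
    so old clients keep working while the SDK can still distinguish the
    iteration-limit case internally.
    """
    if internal_status == INTERNAL_MAX_ITERATIONS_REACHED:
        return "paused"
    if internal_status in REST_STATUSES:
        return internal_status
    raise ValueError(f"unknown internal status: {internal_status!r}")
```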
My recommendation for this PR: this is probably the best path.

**2. Add a new additive field now, deprecate the old semantics, migrate over 5 minors**

This feels like the cleanest policy-perfect migration path.

**Idea**

Keep current …
For example:
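One hedged sketch of such an additive response shape (the `stop_reason` field name and the "paused" legacy value are invented for illustration; the real field names would be whatever the API review settles on):

```python
def serialize_conversation(internal_status: str) -> dict:
    """Serialize a conversation for the REST response.

    The old `status` field keeps exactly the values the current contract
    documents, while the new detail goes into an additive field that old
    clients can safely ignore.
    """
    if internal_status == "max_iterations_reached":
        return {"status": "paused", "stop_reason": "max_iterations_reached"}
    return {"status": internal_status, "stop_reason": None}
```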
Then:
**Why it works**
**Tradeoff**
**3. Version the REST contract**

This makes sense if we really want …

**Idea**

Introduce a versioned API shape, e.g. a v2 route or versioned response contract.
**Why it works**
**Tradeoff**
**4. Treat this as a deliberate breaking API change and do the full deprecation runway**

This is policy-compliant, but it is slower and in practice usually collapses into option 2 or 3.

**Idea**

If we really want …
**Problem**

For a field-level response change, this usually still means introducing either:
So while this is conceptually valid, it usually turns into option 2 or option 3 in implementation.

**5. Relax the CI checker to allow enum additions**

This is possible, but I would not recommend doing it globally.

**Why not**

If the checker learns that "adding enum values is fine", we weaken protection exactly where strongly-typed generated clients are most vulnerable.

**Narrow version that might be acceptable**

The only version I would seriously consider is something explicit and narrow, e.g. only allowing enum additions for fields intentionally marked as extensible in OpenAPI (for example via a vendor extension like …). That would make the compatibility intent explicit instead of silently weakening the rule for every enum.

**Tradeoff**
**What I would do here**

My recommendation for this PR would be:
That gets us:
So my summary is:
This comment was created by an AI assistant (OpenHands) on behalf of @enyst.
[Automatic Post]: It has been a while since there was any activity on this PR. @utkarsh-in, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.
#2406
Summary
This change allows conversations to resume after hitting max iterations when a new user message is sent, instead of permanently stopping.
Checklist