
Fix workflow stuck in refining/evaluation loops#120

Merged
neuromechanist merged 3 commits into develop from
119-fix-workflow-stuck-in-refiningevaluation-loops
Mar 4, 2026

Conversation

@neuromechanist
Member

Summary

  • Make evaluation informational-only when run_assessment=False (default), preventing the evaluation-driven refinement loop that caused the "stuck" behavior
  • Add 15s LLM call timeout via request_timeout on ChatLiteLLM to prevent hanging on slow providers
  • Default evaluation parsing to ACCEPT when ambiguous instead of REFINE
  • Derive max_total_iterations from max_validation_attempts + 1 (was hardcoded 10)
  • Add per-node timing (time.monotonic()) to all workflow nodes for diagnosing slowness
  • Switch default evaluation model to openai/gpt-oss-120b on groq
  • Lower LangGraph recursion_limit from 100 to 50
  • Update default max_validation_attempts from 5 to 3
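The per-node timing change can be sketched as a small decorator. This is an illustrative reconstruction only: the PR adds `time.monotonic()` calls inside each workflow node, and the decorator name below is hypothetical, not from the codebase.

```python
import functools
import time


def timed_node(fn):
    """Wrap a workflow node so its wall-clock duration is logged.

    Hypothetical sketch: the PR adds similar time.monotonic() timing
    inside each node for diagnosing slowness; this decorator is an
    illustration, not the actual implementation.
    """
    @functools.wraps(fn)
    def wrapper(state):
        start = time.monotonic()
        result = fn(state)
        elapsed = time.monotonic() - start
        # Emitted to server logs so slow nodes are easy to spot.
        print(f"[timing] {fn.__name__} took {elapsed:.2f}s")
        return result
    return wrapper


@timed_node
def evaluate(state):
    # Stand-in node body; real nodes call agents and return updated state.
    return {**state, "evaluated": True}
```

Using `time.monotonic()` rather than `time.time()` keeps the measurements immune to system-clock adjustments.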

Behavior Change

| Setting | Before | After |
| --- | --- | --- |
| run_assessment=False | Evaluation could loop back to refinement | Evaluation is informational only; always ends |
| run_assessment=True | Same as above | Evaluation can trigger refinement (capped) |
| Default max iterations | 10 (hardcoded) | max_validation_attempts + 1 (4 by default) |
| LLM call timeout | None (infinite) | 15 seconds |
| Eval model | qwen/qwen3-235b on Cerebras | openai/gpt-oss-120b on groq |

All settings remain tunable via the frontend (Max Validation Attempts dropdown, Run Assessment checkbox).
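The routing rule summarized in the table can be sketched as a single conditional-edge function. This is a hedged sketch: the state keys (`run_assessment`, `max_total_iterations`, `max_validation_attempts`) follow the PR text, but the function itself and its return labels are illustrative, not the actual code.

```python
def route_after_evaluation(state):
    """Decide the next workflow step after the evaluation node.

    Illustrative sketch of the rule this PR describes: when
    run_assessment is False, evaluation is informational only and the
    graph always ends; when True, a REFINE decision may loop back to
    refinement until the iteration cap is reached.
    """
    # Cap auto-derives from the validation budget when not set explicitly.
    cap = state.get("max_total_iterations") or state["max_validation_attempts"] + 1
    if not state.get("run_assessment", False):
        return "end"  # informational-only: never loop back
    if state["decision"] == "REFINE" and state["iteration"] < cap:
        return "refine"
    return "end"
```

In LangGraph terms, a function like this would be wired in via `add_conditional_edges`, with `"end"` mapped to `END`.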

Test plan

  • 415 unit tests pass, 0 failures
  • Manual test: simple description completes in 1-2 iterations
  • Manual test: run_assessment=False never triggers refinement from evaluation
  • Manual test: run_assessment=True allows refinement but caps at max_validation_attempts + 1
  • Verify per-node timing appears in server logs

Closes #119

- Make evaluation informational-only when run_assessment=False
- Add 15s LLM call timeout via request_timeout on ChatLiteLLM
- Default evaluation parsing to ACCEPT when ambiguous
- Derive max_total_iterations from max_validation_attempts + 1
- Add per-node timing to all workflow nodes
- Switch eval model default to openai/gpt-oss-120b on groq
- Lower recursion_limit from 100 to 50
- Update default max_validation_attempts from 5 to 3

Closes #119
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 4, 2026

Deploying hedit with Cloudflare Pages

Latest commit: 6cf0986
Status: ✅  Deploy successful!
Preview URL: https://228724b8.hedit.pages.dev
Branch Preview URL: https://119-fix-workflow-stuck-in-re.hedit.pages.dev


- Move re import to module level in evaluation_agent.py
- Add missing "Entering assess node" log for consistency
- Centralize max_total_iterations derivation in state.py and workflow.py
  (was duplicated 3x in main.py, now defaults to max_validation_attempts + 1)
- Update create_initial_state defaults (was stale at 5/10)
- Fix ty warnings: remove unused type: ignore comments
- Fix ty errors: add type: ignore for LangGraph/Starlette typing limitations
- Fix return type on get_default_path (-> str | None)
- Update test_state to match new default
@codecov

codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 26.53061% with 72 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/agents/workflow.py 13.33% 25 Missing and 1 partial ⚠️
src/api/main.py 31.57% 26 Missing ⚠️
src/agents/evaluation_agent.py 23.07% 10 Missing ⚠️
src/agents/assessment_agent.py 28.57% 5 Missing ⚠️
src/agents/feedback_summarizer.py 28.57% 5 Missing ⚠️


- Add try/except with logging to evaluation, assessment, and feedback agents
- Map timeouts to HTTP 504, rate limits to HTTP 429 in API endpoints
- Add error_type field to streaming SSE error events
- Sanitize error messages to avoid leaking internal details
- Add debug log for silent ACCEPT fallback in evaluation parsing
@neuromechanist
Member Author

PR Review Summary

Three specialized review agents analyzed this PR. Here is a summary of all findings and how each was addressed.

Silent Failure Hunter

CRITICAL - No timeout error handling in LLM calls (evaluation, assessment, feedback agents)

  • Fixed in commit 6cf0986: Added try/except with logging around llm.ainvoke() calls in evaluation_agent.py, assessment_agent.py, and feedback_summarizer.py. Errors are logged with full traceback and re-raised.
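The pattern this fix describes can be sketched as a thin async wrapper. Illustrative only: `llm` stands for any client exposing an async `ainvoke()` (such as ChatLiteLLM), and the helper name is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


async def call_llm_with_logging(llm, messages):
    """Invoke an LLM, logging failures with full traceback before re-raising.

    Sketch of the commit's pattern: errors are never swallowed — they are
    logged (logger.exception captures the traceback) and then re-raised so
    the workflow surfaces them instead of failing silently.
    """
    try:
        return await llm.ainvoke(messages)
    except Exception:
        logger.exception("LLM call failed")
        raise
```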

HIGH - API endpoints return generic 500 for all errors

  • Fixed in commit 6cf0986: Added specific exception handlers for APITimeoutError (maps to HTTP 504) and RateLimitError (maps to HTTP 429) in all four annotation endpoints. Streaming SSE error events now include an error_type field (timeout, rate_limit, internal). Raw exception strings no longer leak to clients.
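A minimal sketch of the status mapping described above. Assumptions are flagged: the real handlers catch litellm's `APITimeoutError` and `RateLimitError` types directly, whereas this self-contained example matches by class name so it runs without the library.

```python
def classify_llm_error(exc):
    """Map a provider exception to an (HTTP status, error_type) pair.

    Illustrative sketch of the commit's mapping: timeouts become 504,
    rate limits become 429, everything else is a generic 500 "internal".
    Matching by class name is a stand-in for catching litellm's
    APITimeoutError / RateLimitError, used here only to keep the
    example dependency-free.
    """
    name = type(exc).__name__
    if name == "APITimeoutError":
        return 504, "timeout"
    if name == "RateLimitError":
        return 429, "rate_limit"
    return 500, "internal"
```

The `error_type` half of the pair is what the streaming SSE error events carry, so clients can distinguish retryable conditions without seeing raw exception strings.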

MEDIUM - Silent ACCEPT fallback in evaluation parsing

  • Fixed in commit 6cf0986: Added logger.debug() when _parse_decision() falls through to the default ACCEPT, so ambiguous responses are traceable in logs.
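A hypothetical reconstruction of the parsing fallback. The real `_parse_decision()` may differ; the regex and function name below are assumptions made to illustrate the default-to-ACCEPT behavior and its debug log.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed pattern: the evaluation model is expected to emit ACCEPT or REFINE.
_DECISION_RE = re.compile(r"\b(ACCEPT|REFINE)\b", re.IGNORECASE)


def parse_decision(response_text):
    """Extract an ACCEPT/REFINE decision from an evaluation response.

    Illustrative sketch: an ambiguous reply defaults to ACCEPT (with a
    debug log) rather than REFINE, so a mis-formatted LLM response cannot
    trap the workflow in a refinement loop.
    """
    match = _DECISION_RE.search(response_text)
    if match:
        return match.group(1).upper()
    logger.debug("Ambiguous evaluation response; defaulting to ACCEPT")
    return "ACCEPT"
```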

Code Reviewer

Stale defaults in create_initial_state() (max_validation_attempts=5, max_total_iterations=10)

  • Fixed in commit c89dcac: Updated to max_validation_attempts=3, max_total_iterations=None (auto-derived as max_validation_attempts + 1).

import re inside method body instead of module-level

  • Fixed in commit c89dcac: Moved to module-level import.

Missing "Entering assess node" log

  • Fixed in commit c89dcac: Added a print statement consistent with the other nodes.

Test assertion using old default (5 instead of 3)

  • Fixed in commit c89dcac: Updated test_create_initial_state assertion.

Code Simplifier

max_total_iterations derivation duplicated 3x in main.py

  • Fixed in commit c89dcac: Centralized derivation in create_initial_state() and workflow.run() with None default and auto-derivation. All call sites in main.py simplified.
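The centralized derivation could look like the helper below. This is a sketch: the actual signature of `create_initial_state()` is not shown in this PR, and the helper name is hypothetical.

```python
def resolve_max_total_iterations(max_validation_attempts=3, max_total_iterations=None):
    """Derive the overall iteration cap from the validation budget.

    Illustrative sketch of the centralized rule: a None cap auto-derives
    as max_validation_attempts + 1, so callers no longer repeat the
    formula (it was previously duplicated 3x in main.py).
    """
    if max_total_iterations is None:
        return max_validation_attempts + 1
    return max_total_iterations
```

With the new defaults (`max_validation_attempts=3`), this yields a cap of 4, matching the "4 by default" figure in the behavior table.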

Pre-existing Issues (not introduced by this PR)

  • test_cli_integration.py::test_annotate_complex_description fails on develop (server error, unrelated to this PR)
  • Dependabot vulnerability on default branch (pre-existing)

@neuromechanist neuromechanist merged commit e67d2e4 into develop Mar 4, 2026
12 of 13 checks passed
@neuromechanist neuromechanist deleted the 119-fix-workflow-stuck-in-refiningevaluation-loops branch March 4, 2026 10:52