@OtherVibes
Owner

Description

Major changes:

  • Add request_plan_approval tool for human plan approval workflow
  • Implement PLAN_PENDING_APPROVAL state for plan approval workflow
  • Add judge response repair system for non-JSON LLM responses
  • Add markdown-to-JSON coercion fallback for judge responses
  • Enhance workflow guidance so that research requirements are visible
  • Fix task-size-based plan evaluation criteria (simplified for M, comprehensive for L/XL)
  • Implement unified workflow: all tasks go through planning phase
  • Add comprehensive test coverage for new functionality
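
To make the judge response repair and the markdown-to-JSON coercion fallback concrete, here is a minimal sketch of how such a fallback chain might look. It is not the code from this PR: the function names (`repair_judge_response`, `_extract_json_object`, `_coerce_markdown`) and the field names in the usage example are hypothetical, and the real implementation may repair responses differently (for example by re-prompting the LLM).

```python
import json
import re
from typing import Any, Optional

# Built dynamically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

TRUE_WORDS = {"true", "yes", "approved"}
FALSE_WORDS = {"false", "no", "rejected"}


def _extract_json_object(text: str) -> Optional[dict[str, Any]]:
    """Try strict JSON, then a fenced block, then the outermost {...} span."""
    candidates = [text.strip()]
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    fenced = re.search(pattern, text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1).strip())
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def _coerce_markdown(text: str) -> Optional[dict[str, Any]]:
    """Last resort: read 'Key: value' lines and '- item' bullets."""
    result: dict[str, Any] = {}
    current_list: Optional[list[str]] = None
    for raw in text.splitlines():
        # Drop heading markers and bold markers before matching.
        line = raw.strip().lstrip("#").replace("**", "").strip()
        if not line:
            continue
        bullet = re.match(r"^[-*•]\s+(.*)$", line)
        if bullet and current_list is not None:
            current_list.append(bullet.group(1).strip())
            continue
        pair = re.match(r"^([A-Za-z_ ]+):\s*(.*)$", line)
        if not pair:
            continue
        key = pair.group(1).strip().lower().replace(" ", "_")
        value = pair.group(2).strip()
        current_list = None
        if not value:
            # A key with no value opens a list filled by the bullets below it.
            current_list = []
            result[key] = current_list
        elif value.lower() in TRUE_WORDS:
            result[key] = True
        elif value.lower() in FALSE_WORDS:
            result[key] = False
        else:
            result[key] = value
    return result or None


def repair_judge_response(text: str) -> dict[str, Any]:
    """Return a dict from a judge response, or raise if nothing is recoverable."""
    parsed = _extract_json_object(text)
    if parsed is None:
        parsed = _coerce_markdown(text)
    if parsed is None:
        raise ValueError("judge response could not be repaired into JSON")
    return parsed
```

Under these assumptions, a markdown reply such as `**Approved:** yes` followed by `**Feedback:** Looks fine` would be coerced to `{"approved": True, "feedback": "Looks fine"}`, while a reply with no recognizable structure raises `ValueError` so the caller can decide how to recover.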

Key Features:

  • Human approval step before judge validation
  • Robust fallback handling for judge response parsing failures
  • Task-size-appropriate validation criteria
  • Enhanced workflow guidance with research requirement transparency
  • Improved error handling and response repair mechanisms
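
As an illustration of task-size-appropriate validation, the sketch below selects a criteria set by task size. Only the M and L/XL behavior comes from this PR; the `TaskSize` enum, the criteria strings, and the treatment of sizes below M (assumed here to use the simplified set) are assumptions made for the example.

```python
from enum import Enum


class TaskSize(Enum):
    # Hypothetical size scale; the PR text names only M, L and XL.
    XS = "xs"
    S = "s"
    M = "m"
    L = "l"
    XL = "xl"


# Placeholder criteria; the real judge criteria are not listed in the PR.
SIMPLIFIED_CRITERIA = (
    "plan addresses the stated requirements",
    "steps are concrete and actionable",
)
COMPREHENSIVE_CRITERIA = SIMPLIFIED_CRITERIA + (
    "research findings are reflected in the design",
    "risks and alternatives are discussed",
    "testing strategy is described",
)


def plan_criteria(size: TaskSize) -> tuple[str, ...]:
    """Return the plan evaluation criteria for a task of the given size."""
    if size in (TaskSize.L, TaskSize.XL):
        return COMPREHENSIVE_CRITERIA
    # Assumption: M and anything smaller use the simplified set.
    return SIMPLIFIED_CRITERIA
```

Because every task goes through the planning phase under the unified workflow, a lookup like this would be called for all sizes rather than only for large tasks.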

This establishes the foundation for human-in-the-loop plan approval while maintaining robust automated validation and fallback handling.
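
The ordering described above (human approval first, judge validation second) can be sketched as a small state machine. `PLAN_PENDING_APPROVAL` and the `request_plan_approval` name come from this PR; the other states, the `Task` dataclass, the function signatures, and the helper names are hypothetical and only illustrate the gating.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskState(Enum):
    PLANNING = "planning"  # hypothetical state name
    PLAN_PENDING_APPROVAL = "plan_pending_approval"  # named in this PR
    PLAN_APPROVED = "plan_approved"  # hypothetical state name


@dataclass
class Task:
    title: str
    state: TaskState = TaskState.PLANNING
    plan: Optional[str] = None
    feedback: Optional[str] = None


def request_plan_approval(task: Task, plan: str) -> None:
    """Submit a plan and park the task until a human responds."""
    if task.state is not TaskState.PLANNING:
        raise ValueError(f"cannot request approval from {task.state.name}")
    task.plan = plan
    task.state = TaskState.PLAN_PENDING_APPROVAL


def record_human_decision(task: Task, approved: bool, feedback: str = "") -> None:
    """Apply the human decision: approve, or send the task back to planning."""
    if task.state is not TaskState.PLAN_PENDING_APPROVAL:
        raise ValueError("no plan is awaiting approval")
    task.feedback = feedback
    task.state = TaskState.PLAN_APPROVED if approved else TaskState.PLANNING


def ready_for_judge(task: Task) -> bool:
    """Judge validation is allowed only after human approval."""
    return task.state is TaskState.PLAN_APPROVED
```

In this sketch a rejected plan returns to `PLANNING` with the reviewer's feedback attached, so the plan can be revised and resubmitted before the judge ever sees it.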

@OtherVibes OtherVibes mentioned this pull request Sep 25, 2025
20 tasks
@OtherVibes OtherVibes requested a review from doriwal September 25, 2025 07:30
- Enhanced test mocking in test_small_task_follows_unified_workflow to properly mock messaging providers
- Applied ruff formatting to all source files
- All tests now pass (183/183)
- All quality checks pass (ruff, mypy, bandit)
- Act (local GitHub Actions) setup is working correctly
@OtherVibes OtherVibes merged commit 88f48ca into main Sep 26, 2025
14 of 15 checks passed
@OtherVibes OtherVibes deleted the feat/human-plan-approval branch September 26, 2025 05:27

2 participants