Skip to content

Conversation

@frankbria
Copy link
Owner

@frankbria frankbria commented Jan 11, 2026

Summary

  • Connect task approval flow to multi-agent execution engine (P0 blocker fix)
  • After POST /api/projects/{id}/tasks/approve, system now triggers LeadAgent.start_multi_agent_execution() in background
  • Uses FastAPI BackgroundTasks pattern for non-blocking async execution
  • Broadcasts development_failed events via WebSocket on errors/timeout

Changes

File Description
codeframe/ui/routers/tasks.py Add start_development_execution background task, integrate into approve_tasks
tests/ui/test_task_approval_execution.py New: 15 unit tests for execution functionality
tests/ui/test_task_approval.py Update 21 tests with mock_background_tasks fixture
tests/e2e/test_task_approval.spec.ts Add E2E tests for multi-agent execution
tests/config/test_config.py Fix environment isolation issues

Architecture Flow

User clicks "Approve Tasks"
    ↓
POST /api/projects/{id}/tasks/approve
    ↓
approve_tasks() validates & transitions phase to "active"
    ↓
background_tasks.add_task(start_development_execution, ...)
    ↓
Response returns immediately (non-blocking)
    ↓
Background: LeadAgent.start_multi_agent_execution()
    ↓
WebSocket broadcasts: agent_created, task_assigned events

Test plan

  • Run unit tests: uv run pytest tests/ui/test_task_approval*.py -v (36 tests passing)
  • Run config tests: uv run pytest tests/config/test_config.py -v (13 tests passing)
  • Run E2E tests: npx playwright test test_task_approval.spec.ts (18 passed, 2 webkit login timeouts - pre-existing issue)
  • Manual: Approve tasks in UI, verify agents are created and tasks assigned

Summary by CodeRabbit

  • New Features

    • Task approvals now schedule non‑blocking background multi‑agent execution and broadcast real‑time success/failure events; execution is skipped when the required API key is absent and approvals return quickly.
  • Tests

    • Expanded test coverage: isolated env handling, API‑key gated suites, serialized UI flows, direct API contract checks, dedicated execution/integration tests, and mocks for background scheduling and WebSocket broadcasts.
  • Documentation

    • Added E2E test architecture guidance on gating, visibility assertions, and anti‑patterns.

✏️ Tip: You can customize this high-level summary in your review settings.

Connect task approval flow to multi-agent execution engine. After tasks
are approved via POST /api/projects/{id}/tasks/approve, the system now
triggers LeadAgent.start_multi_agent_execution() in the background.

Changes:
- Add start_development_execution background task to tasks.py
- Integrate BackgroundTasks into approve_tasks endpoint
- Broadcast development_failed events on errors/timeout
- Add 15 unit tests for new execution functionality
- Update 21 existing tests with mock_background_tasks fixture
- Add E2E tests for multi-agent execution verification
- Fix config test environment isolation issues
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 11, 2026

Walkthrough

Schedules non-blocking background multi-agent execution after task approval by adding start_development_execution to instantiate a LeadAgent, run start_multi_agent_execution with timeout/retry controls and WebSocket broadcasts; updates approve_tasks to accept BackgroundTasks and schedule the job when ANTHROPIC_API_KEY is present. (≤50 words)

Changes

Cohort / File(s) Summary
Core Feature Implementation
codeframe/ui/routers/tasks.py
Adds start_development_execution(project_id, db, ws_manager, api_key) to instantiate LeadAgent, run start_multi_agent_execution with concurrency/retry/timeout handling and WebSocket broadcasts; updates approve_tasks(...) signature to accept BackgroundTasks and schedule the background job when ANTHROPIC_API_KEY exists.
Unit Tests — Approval Flow
tests/ui/test_task_approval.py
Adds mock_background_tasks fixture and updates tests to pass background_tasks into approve_tasks() ensuring deterministic behavior when ANTHROPIC_API_KEY is cleared.
Integration Tests — Execution
tests/ui/test_task_approval_execution.py
New test module validating start_development_execution() behavior: LeadAgent instantiation, start_multi_agent_execution invocation, timeout/general-exception handling, WebSocket broadcast contents, API key presence checks, and non-blocking scheduling from approve_tasks().
E2E Tests
tests/e2e/test_task_approval.spec.ts
Reworks Task Approval E2E: consolidates tests serially, simplifies UI assertions, adds API-key gated suites, and adds direct API tests validating non-blocking approval and backend request/response structure when multi-agent execution is expected.
Config Test Isolation
tests/config/test_config.py
Tests now monkeypatch environment loading and construct GlobalConfig(_env_file=None) and clear env vars to avoid .env side effects during validation tests.
Docs / Policy
CLAUDE.md
Adds E2E Test Architecture Policy guidance and examples concerning gating, visibility assertions, and skipping anti-patterns.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Approval as approve_tasks Endpoint
    participant DB as Database
    participant BG as BackgroundTasks Scheduler
    participant Dev as start_development_execution (BG task)
    participant LA as LeadAgent
    participant WS as WebSocket Manager

    Client->>Approval: POST /tasks/approve
    Approval->>DB: mark tasks approved -> Development
    Approval->>BG: schedule start_development_execution(project_id,...)
    Approval-->>Client: 200 OK (immediate)

    BG->>Dev: run background task
    Dev->>LA: instantiate LeadAgent(api_key, db, ws_manager)
    Dev->>LA: call start_multi_agent_execution()
    LA->>DB: load approved tasks & dependencies
    LA->>WS: broadcast development_started
    alt execution success
        LA->>WS: broadcast agent_created (per agent)
        Dev->>WS: broadcast development_completed (summary)
    else timeout
        Dev->>WS: broadcast development_failed (timeout)
    else exception
        Dev->>WS: broadcast development_failed (error)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I nudged the plan and set it free,
LeadAgent scurries, agents agree.
Webs ping bright with each small feat,
I nibble logs and tap my feet.
A hop, a buzz — development’s sweet.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly summarizes the main change: adding multi-agent execution triggering after task approval, which is the core objective of this changeset.
Docstring Coverage ✅ Passed Docstring coverage is 97.96% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/task-approval-multi-agent-execution

Comment @coderabbitai help to get the list of available commands and usage tips.

@macroscopeapp
Copy link

macroscopeapp bot commented Jan 11, 2026

Trigger non-blocking multi-agent execution after task approval in POST /api/projects/{project_id}/tasks/approve via ui.routers.tasks.start_development_execution

Add BackgroundTasks scheduling in approve_tasks to start LeadAgent.start_multi_agent_execution asynchronously; handle timeout/errors by broadcasting development_failed with ISO UTC timestamps; update tests and E2E policy to assert UI and API behavior.

📍Where to Start

Start with approve_tasks and the start_development_execution background task in tasks.py.


Macroscope summarized 92cbe18.

@github-actions
Copy link
Contributor

Code Review: Task Approval → Multi-Agent Execution

Overall Assessment

LGTM with minor suggestions - This is a well-implemented P0 blocker fix with excellent test coverage. The code follows existing patterns and has proper error handling.

Strengths

  1. Clean Architecture: Separated background task function () from endpoint logic
  2. Non-blocking Design: FastAPI pattern ensures API returns quickly
  3. Graceful Degradation: Works without , logs warning instead of failing
  4. Excellent Test Coverage: 15 new unit tests + E2E tests + 21 updated tests
  5. Proper Error Handling: Catches and general exceptions, broadcasts to WebSocket
  6. Good Logging: Uses emojis (🚀, ✅, ❌, ⚠️) for quick visual scanning
  7. WebSocket Integration: Broadcasts events on errors

Suggestions (Not Blocking)

1. WebSocket Broadcast Consistency ()

Issue: The broadcast is done inline rather than using a helper function.

Current:

Suggestion: Add to for consistency with other broadcasts (e.g., , ).

2. Database Connection Lifecycle ()

Question: Is the instance safe for use in background threads/tasks?

FastAPI runs in a separate thread pool. If the database connection is not thread-safe or is closed before the task executes, this could cause issues.

Recommendation: Document or validate that the dependency is thread-safe and won't be closed while the background task is running.

3. Configurable Timeout ()

Suggestion: Make timeout configurable:

4. Error Message Sanitization ()

Concern: Broadcasting raw exception messages could expose implementation details or sensitive data.

Suggestion: Consider sanitizing or using a generic message for client-facing errors while logging the full exception:

5. Type Annotation ()

Suggestion: Consider using a protocol or type alias for better type safety:

6. Test Comments ()

The comment is a bit unclear:

Suggestion: Make the comment more descriptive:

Test Coverage Notes

  • ✅ Unit tests for background execution (15 tests)
  • ✅ Integration tests for approve_tasks flow
  • ✅ WebSocket event validation
  • ✅ E2E tests for multi-agent execution
  • ✅ Edge case coverage (no API key, empty API key, rejection)

Performance Considerations

  • ✅ Non-blocking API response - good for UX
  • ⚠️ Background task could consume significant resources (multi-agent execution)
  • 💡 Consider: Adding a task queue with rate limiting for concurrent approvals

Security Considerations

Summary

This PR is ready to merge. The suggestions above are minor improvements that could be addressed in follow-up commits. The core functionality is well-implemented and thoroughly tested.

@github-actions
Copy link
Contributor

Code Review: Task Approval → Multi-Agent Execution

Overall Assessment

LGTM with minor suggestions - This is a well-implemented P0 blocker fix with excellent test coverage. The code follows existing patterns and has proper error handling.

Strengths

  1. Clean Architecture: Separated background task function (start_development_execution) from endpoint logic
  2. Non-blocking Design: FastAPI BackgroundTasks pattern ensures API returns quickly
  3. Graceful Degradation: Works without ANTHROPIC_API_KEY, logs warning instead of failing
  4. Excellent Test Coverage: 15 new unit tests + E2E tests + 21 updated tests
  5. Proper Error Handling: Catches asyncio.TimeoutError and general exceptions, broadcasts to WebSocket
  6. Good Logging: Uses emojis (🚀, ✅, ❌, ⚠️) for quick visual scanning
  7. WebSocket Integration: Broadcasts development_failed events on errors

Suggestions (Not Blocking)

1. WebSocket Broadcast Consistency (tasks.py:107-130)

Issue: The development_failed broadcast is done inline rather than using a helper function.

Suggestion: Add broadcast_development_failed() to websocket_broadcasts.py for consistency with other broadcasts (e.g., broadcast_planning_failed, broadcast_development_started).

2. Database Connection Lifecycle (tasks.py:43-131)

Question: Is the Database instance safe for use in background threads/tasks?

FastAPI BackgroundTasks runs in a separate thread pool. If the database connection is not thread-safe or is closed before the task executes, this could cause issues.

Recommendation: Document or validate that the db dependency is thread-safe and won't be closed while the background task is running.

3. Configurable Timeout (tasks.py:93)

The timeout value is hardcoded to 300 seconds. Consider making this configurable via environment variable or config for flexibility.

4. Error Message Sanitization (tasks.py:111, 126)

Broadcasting raw exception messages (str(e)) could expose implementation details or sensitive data to clients.

Recommendation: Consider sanitizing or using a generic message for client-facing errors while logging the full exception with exc_info=True.

5. Type Annotation (tasks.py:46)

The ws_manager: Any type annotation could be more specific. Consider using a protocol or type alias for better type safety.

6. Test Comments (test_config.py:150-151)

The comment could be more descriptive about why environment isolation is needed:

# Prevent load_environment from reading .env files to ensure clean test environment

Test Coverage Notes

  • Unit tests for background execution (15 tests)
  • Integration tests for approve_tasks flow
  • WebSocket event validation
  • E2E tests for multi-agent execution
  • Edge case coverage (no API key, empty API key, rejection)

Performance Considerations

  • Non-blocking API response - good for UX
  • Background task could consume significant resources (multi-agent execution)
  • Consider: Adding a task queue with rate limiting for concurrent approvals

Security Considerations

Summary

This PR is ready to merge. The suggestions above are minor improvements that could be addressed in follow-up commits. The core functionality is well-implemented and thoroughly tested.

@github-actions
Copy link
Contributor

Review posted: #246 (comment)

Summary

LGTM with minor suggestions - This is a well-implemented P0 blocker fix connecting task approval to multi-agent execution.

Key Strengths

  • Clean architecture with background task pattern
  • Excellent test coverage (15 new unit tests + E2E tests)
  • Proper error handling with graceful degradation
  • Non-blocking API response design

Suggestions (Non-blocking)

  1. WebSocket broadcast consistency - Consider adding broadcast_development_failed() helper for consistency
  2. Database connection lifecycle - Verify db dependency is thread-safe for background tasks
  3. Configurable timeout - The 300s timeout is hardcoded
  4. Error message sanitization - Broadcasting raw str(e) may leak implementation details
  5. Type annotation - ws_manager: Any could use a Protocol for better type safety

New%20session%20-%202026-01-11T06%3A21%3A58.048Z
opencode session  |  github run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/ui/test_task_approval.py (1)

73-88: Make tests deterministic w.r.t. ANTHROPIC_API_KEY.

approve_tasks now conditionally schedules work based on os.environ["ANTHROPIC_API_KEY"]. These tests don’t pin that env var, so behavior can vary by developer machine/CI.

Suggestion: in mock_background_tasks (or an autouse fixture), monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False) unless a test explicitly needs it.

Also applies to: 96-111, 116-131, 140-155, 160-175, 191-205

🤖 Fix all issues with AI agents
In @tests/config/test_config.py:
- Around line 150-167: Remove the unused local variable original_get_global and
simplify the monkeypatch: delete the assignment original_get_global =
config.get_global and keep only the patched_get_global function and the
monkeypatch.setattr(config, "get_global", patched_get_global) so the test no
longer defines an unused symbol; ensure references to Config and GlobalConfig
remain and that patched_get_global sets config._global_config =
GlobalConfig(_env_file=None) when absent.

In @tests/e2e/test_task_approval.spec.ts:
- Around line 412-533: The test currently can exit silently when key UI elements
or API responses are missing (e.g., tasksTab not visible, approveButton not
visible, or approval response non-200), so add explicit outcomes for those
paths: where tasksTab.isVisible() is false call test.skip('Project not in
planning phase or tasks tab missing') or assert with
expect(tasksTab).toBeVisible() depending on intent; where
approveButton.isVisible() is false call test.skip('Approve button not visible -
project not in planning phase'); where the approval response is null or
response.status() !== 200 call test.skip('Approval failed or returned non-200
status'); keep the existing authToken check (test.skip when missing) and retain
the positive assertions for developmentStartedReceived and agents checks
(developmentStartedReceived, approveButton, tasksTab, response) so every branch
either asserts or explicitly skips.
🧹 Nitpick comments (3)
codeframe/ui/routers/tasks.py (1)

336-354: Avoid scheduling execution when there’s nothing to execute.

If all tasks are excluded (approved_count=0), consider skipping background_tasks.add_task(...) to reduce noisy development_failed events (e.g., LeadAgent may raise “no tasks found”).

tests/e2e/test_task_approval.spec.ts (1)

480-519: Replace fixed sleeps with polling to reduce flake.

The waitForTimeout(2000) / waitForTimeout(5000) will be flaky under load. Prefer await expect.poll(() => page.evaluate(...)).toBe(true) or page.waitForFunction on window.__developmentStartedReceived / window.__agentCreatedReceived.

tests/ui/test_task_approval_execution.py (1)

326-353: Timing assertion (<1s) may be flaky; prefer behavioral assertion.

Rather than elapsed < 1.0, consider asserting:

  • mock_background_tasks.add_task.assert_called_once() and
  • no inline execution (e.g., LeadAgent not instantiated in approve_tasks), which proves “non-blocking” without wall-clock dependence.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 064bf0e and 9aa2e61.

📒 Files selected for processing (5)
  • codeframe/ui/routers/tasks.py
  • tests/config/test_config.py
  • tests/e2e/test_task_approval.spec.ts
  • tests/ui/test_task_approval.py
  • tests/ui/test_task_approval_execution.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_task_approval.spec.ts
codeframe/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

codeframe/**/*.py: Use Python 3.11+ for backend development with FastAPI, AsyncAnthropic, SQLite with async support (aiosqlite), and tiktoken for token counting
Use token counting via tiktoken library for token budget management with ~50,000 token limit per conversation
Use asyncio patterns with AsyncAnthropic for async/await in Python backend for concurrent operations
Implement quality gates with multi-stage pre-completion checks (tests → type → coverage → review) and Git + SQLite + context snapshots for project state rollback
Use tiered memory system (HOT/WARM/COLD) with importance scoring using hybrid exponential decay algorithm for context management with 30-50% token reduction
Implement session lifecycle management with auto-save/restore using file-based storage at .codeframe/session_state.json

Files:

  • codeframe/ui/routers/tasks.py
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to tests/e2e/**/*.ts : Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/web-ui/**/__tests__/**/*.test.{ts,tsx} : Create JavaScript test files colocated or in __tests__/ as *.test.ts

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
  • tests/ui/test_task_approval_execution.py
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/tests/**/*.py : Use pytest fixtures for Python testing and avoid over-mocking

Applied to files:

  • tests/ui/test_task_approval.py
🧬 Code graph analysis (4)
tests/e2e/test_task_approval.spec.ts (3)
tests/e2e/test-utils.ts (4)
  • setupErrorMonitoring (64-104)
  • ExtendedPage (25-27)
  • loginUser (483-500)
  • checkTestErrors (192-216)
tests/e2e/e2e-config.ts (2)
  • FRONTEND_URL (14-14)
  • BACKEND_URL (11-11)
codeframe/cli.py (1)
  • agents (164-169)
tests/ui/test_task_approval_execution.py (3)
codeframe/ui/shared.py (1)
  • broadcast (154-185)
codeframe/agents/lead_agent.py (1)
  • start_multi_agent_execution (1597-1635)
codeframe/ui/routers/tasks.py (3)
  • start_development_execution (43-130)
  • approve_tasks (236-366)
  • TaskApprovalRequest (220-223)
codeframe/ui/routers/tasks.py (1)
codeframe/agents/lead_agent.py (2)
  • LeadAgent (28-2333)
  • start_multi_agent_execution (1597-1635)
tests/config/test_config.py (1)
codeframe/core/config.py (3)
  • GlobalConfig (84-191)
  • Config (210-302)
  • get_global (244-255)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: opencode-review
  • GitHub Check: E2E Smoke Tests (Chromium)
🔇 Additional comments (4)
tests/config/test_config.py (1)

12-23: Good env isolation; consider centralizing/expanding the cleared env set.

Using GlobalConfig(_env_file=None) plus explicit monkeypatch.delenv(...) makes this test deterministic across dev machines/CI.

tests/ui/test_task_approval.py (1)

15-27: Nice: BackgroundTasks is mocked cleanly and reused.

The mock_background_tasks fixture is a straightforward way to adapt the suite to the new endpoint signature.

tests/ui/test_task_approval_execution.py (1)

65-186: Solid coverage for the new orchestration + error broadcasts.

This test module hits the key contracts: LeadAgent wiring, timeout/exception broadcasts, and background scheduling with/without ANTHROPIC_API_KEY.

Also applies to: 235-353, 370-549

codeframe/ui/routers/tasks.py (1)

43-131: Remove the critical issue about use-after-close. The db parameter passed to BackgroundTasks is not request-scoped and will not be closed after the response. The get_db dependency returns request.app.state.db, a long-lived application singleton initialized at app startup (line 171 of server.py) and only closed at app shutdown (line 198). The underlying sqlite3.Connection is created with check_same_thread=False, explicitly permitting safe concurrent access from background tasks.

Likely an incorrect or invalid review comment.

@github-actions
Copy link
Contributor

Review Summary

LGTM with one critical and one minor issue to address.

Critical: Unguarded broadcasts (tasks.py:107, 122)

Both macroscopeapp and coderabbitai flagged this. If fails in exception handlers, the original error is masked. Wrap each broadcast in try/except.

Minor: Unused variable (test_config.py:159)

is assigned but never used. Remove it.

Strengths

  • Clean separation of concerns with background task pattern
  • Excellent test coverage (15 new unit tests + E2E)
  • Proper error handling and graceful degradation
  • Non-blocking API response design

@github-actions
Copy link
Contributor

Review posted: #246 (comment)

New%20session%20-%202026-01-11T06%3A31%3A32.317Z
opencode session  |  github run

- Make test_task_approval.py deterministic by clearing ANTHROPIC_API_KEY
  in mock_background_tasks fixture
- Remove unused original_get_global variable in test_config.py
- Add explicit test.skip() calls in E2E tests for missing UI elements
  instead of silently passing
@github-actions
Copy link
Contributor

Code Review

LGTM - well-implemented P0 blocker fix with excellent test coverage.

Critical Issue (Consistent with Prior Reviews)

Unguarded broadcasts in exception handlers (tasks.py:107-130)

Both macroscopeapp and coderabbitai flagged this: if ws_manager.broadcast() fails in the exception handlers, the original error is masked and the background task crashes silently.

# Current (tasks.py:107-115) - NOT GUARDED
except asyncio.TimeoutError:
    logger.error(f"❌ Multi-agent execution timed out for project {project_id} after 300s")
    await ws_manager.broadcast({...})  # If this fails, error is lost

# Current (tasks.py:122-130) - NOT GUARDED  
except Exception as e:
    logger.error(f"❌ Failed to start multi-agent execution for project {project_id}: {e}", exc_info=True)
    await ws_manager.broadcast({...})  # If this fails, error is lost

Suggestion: Follow the pattern from broadcast_planning_failed() (websocket_broadcasts.py:943-947) and broadcast_development_started() (websocket_broadcasts.py:971-978) which wrap broadcasts in try/except:

except asyncio.TimeoutError:
    logger.error(f"❌ Multi-agent execution timed out for project {project_id} after 300s")
    try:
        await ws_manager.broadcast({...})
    except Exception:
        logger.exception("Failed to broadcast development_failed (timeout)")

Minor Issue

Unused variable (tests/config/test_config.py:159)

original_get_global = config.get_global is assigned but never used. Remove it to clean up the test.

Strengths

  1. Clean architecture - separated start_development_execution background task from endpoint logic
  2. Non-blocking design with FastAPI BackgroundTasks pattern
  3. Graceful degradation when ANTHROPIC_API_KEY is missing
  4. Excellent test coverage: 15 new unit tests + E2E tests + 21 updated tests
  5. Proper error handling with WebSocket notifications
  6. Good logging with emoji indicators (🚀, ✅, ❌, ⚠️)

Test Coverage Notes

  • Unit tests for background execution (15 tests) ✓
  • Integration tests for approve_tasks flow ✓
  • WebSocket event validation ✓
  • E2E tests for multi-agent execution ✓
  • Edge case coverage (no API key, empty API key, rejection) ✓

Summary

The core functionality is solid. The critical issue is consistent with prior reviews and should be addressed before merge. The minor cleanup is optional.

@github-actions
Copy link
Contributor

Posted review comment: #246 (comment)

Summary

LGTM - This is a well-implemented P0 blocker fix connecting task approval to multi-agent execution.

Critical Issue (Consistent with Prior Reviews)

Unguarded broadcasts (tasks.py:107-130) - Both macroscopeapp and coderabbitai flagged: if ws_manager.broadcast() fails in exception handlers, the original error is masked. Follow the pattern from broadcast_planning_failed() which wraps broadcasts in try/except.

Minor Issue

Unused variable (test_config.py:159) - original_get_global assigned but never used.

Strengths

  • Clean separation of concerns with background task pattern
  • Excellent test coverage (15 new unit tests + E2E)
  • Proper error handling and graceful degradation
  • Non-blocking API response design

New%20session%20-%202026-01-11T06%3A35%3A14.184Z
opencode session  |  github run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/config/test_config.py (2)

12-23: Good test isolation; consider DRYing env var cleanup to reduce future misses.

This is a solid move to prevent .env bleed-through (GlobalConfig(_env_file=None) + explicit delenv). To keep this from getting stale as defaults evolve, consider collecting the keys into a list and looping, so adding new default-sensitive vars is a 1-line change.


150-166: Simplify/clarify the .env bypass patching (and avoid truthy checks).

The intent is right (you need to bypass both load_environment() and GlobalConfig()’s .env loading), but this is a bit intricate. At least tighten the cache check to is None and drop the redundant in-test import.

Proposed tweak (same behavior, slightly clearer)
-        # Patch get_global to not read from .env
-        from codeframe.core.config import GlobalConfig
-
-        def patched_get_global():
-            if not config._global_config:
-                config._global_config = GlobalConfig(_env_file=None)
-            return config._global_config
+        # Patch get_global to not read from .env
+        def patched_get_global():
+            if config._global_config is None:
+                config._global_config = GlobalConfig(_env_file=None)
+            return config._global_config
tests/e2e/test_task_approval.spec.ts (1)

372-590: WebSocket interception is useful, but the window.WebSocket = function(...) { ... } override is a bit risky + sleep-based waits may flake.

Consider (1) using a subclass to preserve instanceof WebSocket semantics, and (2) replacing fixed waitForTimeout() with waitForFunction() on the injected flags.

Proposed refactor (safer WebSocket hook + event-based waits)
-    await page.addInitScript(() => {
-      // Store original WebSocket
-      const OriginalWebSocket = window.WebSocket;
-
-      // Override WebSocket constructor
-      (window as any).WebSocket = function(url: string, protocols?: string | string[]) {
-        const ws = new OriginalWebSocket(url, protocols);
-
-        const originalOnMessage = ws.onmessage;
-        ws.onmessage = function(event: MessageEvent) {
-          try {
-            const data = JSON.parse(event.data);
-            // Store events for later inspection
-            (window as any).__wsEvents = (window as any).__wsEvents || [];
-            (window as any).__wsEvents.push(data);
-
-            // Track specific events
-            if (data.type === 'development_started') {
-              (window as any).__developmentStartedReceived = true;
-            }
-            if (data.type === 'agent_created') {
-              (window as any).__agentCreatedReceived = true;
-            }
-          } catch {}
-
-          if (originalOnMessage) {
-            originalOnMessage.call(ws, event);
-          }
-        };
-
-        return ws;
-      } as any;
-    });
+    await page.addInitScript(() => {
+      const OriginalWebSocket = window.WebSocket;
+
+      class InterceptedWebSocket extends OriginalWebSocket {
+        constructor(url: string | URL, protocols?: string | string[]) {
+          super(url, protocols);
+          const originalOnMessage = this.onmessage;
+          this.onmessage = (event: MessageEvent) => {
+            try {
+              const data = JSON.parse(event.data as string);
+              (window as any).__wsEvents = (window as any).__wsEvents || [];
+              (window as any).__wsEvents.push(data);
+              if (data.type === 'development_started') (window as any).__developmentStartedReceived = true;
+              if (data.type === 'agent_created') (window as any).__agentCreatedReceived = true;
+            } catch {}
+            if (originalOnMessage) originalOnMessage.call(this, event);
+          };
+        }
+      }
+
+      (window as any).WebSocket = InterceptedWebSocket;
+    });
@@
-    // Give time for background task to start and broadcast events
-    await page.waitForTimeout(2000);
-
-    // Check if development_started event was received
-    developmentStartedReceived = await page.evaluate(
-      () => (window as any).__developmentStartedReceived || false
-    );
+    await page.waitForFunction(() => (window as any).__developmentStartedReceived === true, null, {
+      timeout: 15000
+    });
+    developmentStartedReceived = true;
@@
-      // Wait longer for agents to be created (they take time)
-      await page.waitForTimeout(5000);
-
-      agentCreatedReceived = await page.evaluate(
-        () => (window as any).__agentCreatedReceived || false
-      );
+      await page.waitForFunction(() => (window as any).__agentCreatedReceived === true, null, {
+        timeout: 60000
+      });
+      agentCreatedReceived = true;

As per coding guidelines, good call using loginUser() for authentication.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1167390 and 6fe801d.

📒 Files selected for processing (3)
  • tests/config/test_config.py
  • tests/e2e/test_task_approval.spec.ts
  • tests/ui/test_task_approval.py
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_task_approval.spec.ts
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/tests/**/*.py : Use pytest fixtures for Python testing and avoid over-mocking

Applied to files:

  • tests/ui/test_task_approval.py
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to tests/e2e/**/*.ts : Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/web-ui/**/__tests__/**/*.test.{ts,tsx} : Create JavaScript test files colocated or in __tests__/ as *.test.ts

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
🧬 Code graph analysis (3)
tests/ui/test_task_approval.py (2)
codeframe/ui/routers/tasks.py (2)
  • approve_tasks (236-366)
  • TaskApprovalRequest (220-223)
tests/ui/test_task_approval_execution.py (2)
  • mock_background_tasks (228-232)
  • mock_background_tasks (466-470)
tests/e2e/test_task_approval.spec.ts (3)
tests/e2e/test-utils.ts (4)
  • setupErrorMonitoring (64-104)
  • ExtendedPage (25-27)
  • loginUser (483-500)
  • checkTestErrors (192-216)
tests/e2e/e2e-config.ts (2)
  • FRONTEND_URL (14-14)
  • BACKEND_URL (11-11)
codeframe/cli.py (1)
  • agents (164-169)
tests/config/test_config.py (1)
codeframe/core/config.py (2)
  • GlobalConfig (84-191)
  • Config (210-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Backend Unit Tests
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
🔇 Additional comments (2)
tests/ui/test_task_approval.py (2)

15-34: Deterministic mock_background_tasks fixture looks good; watch fixture ordering for API-key tests.

Clearing ANTHROPIC_API_KEY in mock_background_tasks is a good default for stability, but it means any future “API key present” tests must either (a) set the env var in the test body, or (b) make a dedicated fixture that depends on mock_background_tasks to avoid ordering surprises. Also, MagicMock(spec=BackgroundTasks) is a solid choice to keep mocks honest. Based on learnings, avoid over-mocking.


80-215: Good propagation of background_tasks through approve_tasks calls.

Passing background_tasks=mock_background_tasks everywhere aligns the unit tests with the new FastAPI signature and should prevent accidental scheduling during unrelated tests (given the env var is cleared).

Also applies to: 220-315, 495-702

BREAKING: E2E tests now fail instead of silently skipping when
preconditions aren't met. This surfaces real bugs instead of hiding them.

Policy documented in CLAUDE.md:
- Never use test.skip() inside test logic - it masks bugs
- Use expect().toBeVisible() to assert UI elements exist
- Only use test.skip() at describe level for environmental conditions
- Test data (TEST_PROJECT_IDS.PLANNING) guarantees preconditions

Changes to test_task_approval.spec.ts:
- Replace all conditional test.skip() with proper assertions
- Convert auth token checks from skip to expect().toBeTruthy()
- Move API key skip to describe level (correct pattern)
- Simplify multi-agent execution tests to direct API tests
- Remove tests that can't be properly verified in E2E

Tests now fail fast when:
- Tasks tab not visible (bug in test data or UI)
- Approve button not visible (project not in planning phase)
- Login fails (auth system broken)
- API returns unexpected status (contract violation)
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @CLAUDE.md:
- Around line 344-395: The "E2E Test Architecture Policy" needs two edits: add
an explicit sentence that the "Never use test.skip() inside test logic" rule
also applies to beforeEach/afterEach hooks (i.e., avoid calling test.skip or
conditional skips inside those hooks), and remove the duplicated policy block by
keeping this single authoritative "E2E Test Architecture Policy" section (or
merge differences) and replace other copies with a cross-reference note pointing
to this section; reference the policy heading "E2E Test Architecture Policy" and
the examples/anti-patterns in the same section when making the edits.

In @tests/e2e/test_task_approval.spec.ts:
- Around line 226-281: Remove the page.on('response') handler and the external
async mutation of approvalResponse; instead use the awaited response from
page.waitForResponse(...) to synchronously inspect status and body: call
page.waitForResponse filtering for r.url().includes('/tasks/approve') then check
response.status() (throw an Error with the parsed body if 422), parse
response.json() into approvalResponse (or null on parse failure) and run the
existing assertions against that local approvalResponse; update references to
approvalResponse to use the response-derived value and remove the race-prone
event handler and its inner logic.
🧹 Nitpick comments (1)
tests/e2e/test_task_approval.spec.ts (1)

300-372: Direct API contract tests look solid; consider narrowing “acceptable statuses” to reduce false positives.

Allowing 400/403/404 is pragmatic, but it also means a real regression could slip through (e.g., auth scope change returning 403). If feasible, make the environment guarantee stronger (known seeded project + known test user) and expect 200 deterministically.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fe801d and c1632dd.

📒 Files selected for processing (2)
  • CLAUDE.md
  • tests/e2e/test_task_approval.spec.ts
🧰 Additional context used
📓 Path-based instructions (3)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_task_approval.spec.ts
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Documentation files must be sized to fit in a single agent context window (spec.md ~200-400 lines, plan.md ~300-600 lines, tasks.md ~400-800 lines)

Files:

  • CLAUDE.md
{README.md,CODEFRAME_SPEC.md,CHANGELOG.md,SPRINTS.md,CLAUDE.md,AGENTS.md,TESTING.md,CONTRIBUTING.md}

📄 CodeRabbit inference engine (AGENTS.md)

Root-level documentation must include: README.md (project intro), CODEFRAME_SPEC.md (architecture, ~800 lines), CHANGELOG.md (user-facing changes), SPRINTS.md (timeline index), CLAUDE.md (coding standards), AGENTS.md (navigation guide), TESTING.md (test standards), and CONTRIBUTING.md (contribution guidelines)

Files:

  • CLAUDE.md
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2025-11-25T19:08:54.154Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T19:08:54.154Z
Learning: Applies to specs/*/tasks.md : Feature task files (tasks.md) must include phase-by-phase task breakdown with unique task identifiers (T001, T002, etc.), acceptance criteria per task, beads issue references, and estimated effort

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/web-ui/**/__tests__/**/*.test.{ts,tsx} : Create JavaScript test files colocated or in __tests__/ as *.test.ts

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
  • CLAUDE.md
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to tests/e2e/**/*.ts : Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
  • CLAUDE.md
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to web-ui/src/app/page.tsx : Implement automatic project discovery start after project creation with loading state transitions and 'Start Discovery' button for idle projects

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:54.154Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T19:08:54.154Z
Learning: Applies to {README.md,CODEFRAME_SPEC.md,CHANGELOG.md,SPRINTS.md,CLAUDE.md,AGENTS.md,TESTING.md,CONTRIBUTING.md} : Root-level documentation must include: README.md (project intro), CODEFRAME_SPEC.md (architecture, ~800 lines), CHANGELOG.md (user-facing changes), SPRINTS.md (timeline index), CLAUDE.md (coding standards), AGENTS.md (navigation guide), TESTING.md (test standards), and CONTRIBUTING.md (contribution guidelines)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to web-ui/src/lib/websocketMessageMapper.ts : Implement WebSocket event mapping with 9 event types (agent_created, task_assigned, etc.) with automatic reconnection using exponential backoff (1s → 30s)

Applied to files:

  • CLAUDE.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: Backend Unit Tests
  • GitHub Check: opencode-review
  • GitHub Check: E2E Smoke Tests (Chromium)

Comment on lines +344 to +395
### E2E Test Architecture Policy

**CRITICAL: Never use `test.skip()` inside test logic.**

Using `test.skip()` within test execution masks bugs. If UI doesn't show up due to a bug, the test silently skips instead of failing.

**Correct patterns:**
```typescript
// ✅ Skip at describe level for environmental conditions
test.describe('Feature requiring API key', () => {
test.skip(!process.env.ANTHROPIC_API_KEY, 'Requires ANTHROPIC_API_KEY');

test('does something with API', async () => {
// Test runs OR entire suite is skipped - never silently passes
});
});

// ✅ Assert UI elements exist - FAIL if missing
test('should show approve button', async ({ page }) => {
const button = page.getByRole('button', { name: /approve/i });
await expect(button).toBeVisible(); // FAILS if not visible
await button.click();
});

// ✅ Use separate test projects for different states
const PROJECT_ID = TEST_PROJECT_IDS.PLANNING; // Seeded in planning phase
```

**Anti-patterns to avoid:**
```typescript
// ❌ NEVER: Skip inside test logic
test('approves tasks', async ({ page }) => {
const button = page.getByRole('button', { name: /approve/i });
if (!(await button.isVisible())) {
test.skip(true, 'Button not visible'); // MASKS BUGS
return;
}
// ...
});

// ❌ NEVER: Silent pass on missing elements
if (await element.isVisible()) {
// do assertions
}
// Test passes even if element never shows up!
```

**Test data guarantees:**
- `TEST_PROJECT_IDS.PLANNING` is seeded in planning phase with tasks
- `TEST_PROJECT_IDS.ACTIVE` is seeded in active phase with agents
- If test data doesn't match expectations, that's a bug in `seed-test-data.py`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Good policy; tighten scope wording + avoid duplicated policy text in-file.

  • Consider explicitly stating the rule also applies to beforeEach/afterEach hooks (same “inside test execution” pitfall).
  • AI summary mentions the section is inserted twice; if so, prefer one authoritative section + cross-links to prevent drift.
🤖 Prompt for AI Agents
In @CLAUDE.md around lines 344 - 395, The "E2E Test Architecture Policy" needs
two edits: add an explicit sentence that the "Never use test.skip() inside test
logic" rule also applies to beforeEach/afterEach hooks (i.e., avoid calling
test.skip or conditional skips inside those hooks), and remove the duplicated
policy block by keeping this single authoritative "E2E Test Architecture Policy"
section (or merge differences) and replace other copies with a cross-reference
note pointing to this section; reference the policy heading "E2E Test
Architecture Policy" and the examples/anti-patterns in the same section when
making the edits.

@github-actions
Copy link
Contributor

Review Summary

LGTM with one critical issue to address

Critical Issue: Unguarded broadcasts in exception handlers (tasks.py:107-130)

Both macroscopeapp and coderabbitai flagged this: if ws_manager.broadcast() fails in the exception handlers (lines 107-115, 122-130), the original error is masked and the background task crashes silently.

# Current - NOT GUARDED
except asyncio.TimeoutError:
    logger.error(...)
    await ws_manager.broadcast({...})  # If this fails, error is lost

Recommended fix (following existing pattern):

except asyncio.TimeoutError:
    logger.error(...)
    try:
        await ws_manager.broadcast({...})
    except Exception:
        logger.exception("Failed to broadcast development_failed (timeout)")

This follows the same pattern as broadcast_planning_failed() and broadcast_development_started() in codeframe/ui/websocket_broadcasts.py (lines 943-947, 971-978).

Minor Issue (Already Fixed)

  • original_get_global unused variable was removed in commit 6fe801d

Strengths

  • Clean architecture with background task pattern
  • Excellent test coverage (15 new unit tests + E2E tests)
  • Proper error handling with graceful degradation
  • Non-blocking API response design
  • Good logging with emoji indicators (🚀, ✅, ❌, ⚠️)

@github-actions
Copy link
Contributor

Review posted: #246 (comment)

LGTM with one critical issue to address:

Unguarded broadcasts in exception handlers (tasks.py:107-130) - if ws_manager.broadcast() fails in the TimeoutError/Exception handlers, the original error is masked. Follow the existing pattern from broadcast_planning_failed() which wraps broadcasts in try/except.

The original_get_global unused variable was already fixed in commit 6fe801d.

New%20session%20-%202026-01-11T06%3A45%3A49.771Z
opencode session  |  github run

Wrap ws_manager.broadcast() calls in try/except within exception
handlers to prevent masking the original error if broadcast fails.
Follows pattern used in websocket_broadcasts.py.

Without this guard, if broadcast fails during error handling:
- Original error context is lost
- Background task crashes silently
- No logging of what went wrong
@github-actions
Copy link
Contributor

Review Summary: LGTM with remaining E2E test improvements

✅ Fixed Issues (from prior reviews)

  1. Critical: Unguarded broadcasts (tasks.py:107-130) - ✅ FIXED

    • Broadcasts now wrapped in try/except blocks following existing pattern
    • Original errors won't be masked by broadcast failures
  2. Minor: Unused variable (test_config.py:159) - ✅ FIXED

🔄 Remaining E2E Test Issues

These were flagged by prior reviews but still remain:

1. Silent failure in error UI check (test_task_approval.spec.ts:186-193)

Issue: matches hidden elements, and masks failures.

Suggested fix:

2. Race condition in response handler (test_task_approval.spec.ts:226-280)

Issue: Errors in event handlers don't propagate to tests, creating race conditions.

Suggested fix: Use the awaited directly with synchronous assertions.

3. Hardcoded login instead of loginUser() helper (test_task_approval.spec.ts:396+)

Issue: Violates E2E test coding guidelines - should use from test-utils.

Suggested fix: Use in beforeEach and get token via localStorage.

Summary

Core functionality: ✅ LGTM

  • Background task pattern is correctly implemented
  • Non-blocking API response design is solid
  • Exception handling is now properly guarded

E2E tests: 🔄 Minor improvements needed
The E2E tests need refinement to match coding standards and avoid flaky patterns. These are non-blocking but should be addressed for long-term test reliability.

timeout=300
)

logger.info(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lead_agent.start_multi_agent_execution() may return None. Then summary.get(...) crashes and incorrectly reports a failure. Consider guarding with summary = summary or {} before logging.

Suggested change
logger.info(
summary = summary or {}
logger.info(

🚀 Want me to fix this? Reply ex: "fix it for me".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
codeframe/ui/routers/tasks.py (2)

126-136: Consider sanitizing error messages before broadcasting to clients.

Exposing str(e) directly in WebSocket broadcasts could potentially leak sensitive information such as internal paths, configuration details, or stack trace fragments depending on the exception type. Consider using a generic error message or sanitizing the error.

♻️ Suggested approach
         try:
             await ws_manager.broadcast(
                 {
                     "type": "development_failed",
                     "project_id": project_id,
-                    "error": str(e),
+                    "error": "Multi-agent execution failed. Check server logs for details.",
                     "timestamp": datetime.now(UTC).isoformat().replace("+00:00", "Z")
                 },
                 project_id=project_id
             )

345-359: Consider using dependency injection for the API key.

Fetching ANTHROPIC_API_KEY directly from os.environ works but reduces testability. Consider injecting this as a dependency (similar to get_db) to simplify mocking in tests and align with FastAPI's dependency injection pattern.

♻️ Example dependency approach
# In dependencies.py
def get_anthropic_api_key() -> Optional[str]:
    return os.environ.get("ANTHROPIC_API_KEY")

# In approve_tasks
async def approve_tasks(
    ...
    api_key: Optional[str] = Depends(get_anthropic_api_key),
) -> TaskApprovalResponse:
    ...
    if api_key:
        background_tasks.add_task(...)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1632dd and e9365af.

📒 Files selected for processing (1)
  • codeframe/ui/routers/tasks.py
🧰 Additional context used
📓 Path-based instructions (1)
codeframe/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

codeframe/**/*.py: Use Python 3.11+ for backend development with FastAPI, AsyncAnthropic, SQLite with async support (aiosqlite), and tiktoken for token counting
Use token counting via tiktoken library for token budget management with ~50,000 token limit per conversation
Use asyncio patterns with AsyncAnthropic for async/await in Python backend for concurrent operations
Implement quality gates with multi-stage pre-completion checks (tests → type → coverage → review) and Git + SQLite + context snapshots for project state rollback
Use tiered memory system (HOT/WARM/COLD) with importance scoring using hybrid exponential decay algorithm for context management with 30-50% token reduction
Implement session lifecycle management with auto-save/restore using file-based storage at .codeframe/session_state.json

Files:

  • codeframe/ui/routers/tasks.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4

Applied to files:

  • codeframe/ui/routers/tasks.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: Backend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: opencode-review
  • GitHub Check: Macroscope - Correctness Check
🔇 Additional comments (5)
codeframe/ui/routers/tasks.py (5)

11-28: LGTM!

The imports are well-organized and appropriate for the new functionality. Using UTC from datetime ensures timezone-aware timestamps, and the LeadAgent import aligns with the multi-agent orchestration architecture. Based on learnings, the Lead Agent pattern is the correct approach for coordinating Worker Agents.


102-118: Good defensive coding with guarded broadcast calls.

The try/except wrapper around the WebSocket broadcast in exception handlers correctly prevents broadcast failures from masking the original error. This ensures the original timeout/error is logged even if the WebSocket connection is unavailable.


241-248: LGTM on the endpoint signature and response handling.

The BackgroundTasks integration follows FastAPI best practices for non-blocking background execution. The immediate response with TaskApprovalResponse provides good UX while the multi-agent execution runs asynchronously. The docstring is properly updated to reflect the new behavior.

Also applies to: 366-372


342-360: No changes required—the Database instance is safe for background task usage.

The Database class is intentionally designed as an app-level singleton with proper concurrency safeguards. The connection is created with check_same_thread=False, enabling safe cross-thread usage; access is protected by _sync_lock in all repositories; and WAL mode with busy timeouts handles concurrent read/write scenarios. The instance persists for the entire application lifetime and never closes during request boundaries, making it safe to pass to background tasks.


90-94: Timeout handling is already properly implemented — no action needed.

The start_multi_agent_execution method wraps its execution with asyncio.timeout(timeout) (lines 1627-1634 of codeframe/agents/lead_agent.py), which enforces the 300-second timeout using the Python 3.11+ standard. The timeout triggers an asyncio.TimeoutError that is caught, logged, and re-raised, allowing the caller in tasks.py to handle it appropriately. This implementation is correct and complete.

1. Fix silent failure in error UI check
   - Replace .catch(() => {}) with proper expect().toBeVisible()
   - Use .or() for multiple error indicators

2. Fix race condition in response handler
   - Remove page.on('response') which swallows errors
   - Use Promise.all([waitForResponse, click]) pattern
   - Parse response synchronously after await

3. Use loginUser() helper instead of hardcoded login
   - Add beforeEach with loginUser() per E2E coding guidelines
   - Get token from localStorage after login
@github-actions
Copy link
Contributor

Code Review: Task Approval → Multi-Agent Execution

Overall Assessment

LGTM - All critical and minor issues from prior reviews have been addressed. The core functionality is solid and thoroughly tested.

✅ Fixed Issues (from prior reviews)

  1. Critical: Unguarded broadcasts (tasks.py:107-130) - ✅ FIXED in e9365af

    • Broadcasts now wrapped in try/except blocks
    • Original errors wont be masked by broadcast failures
  2. Minor: Unused variable (test_config.py:159) - ✅ FIXED in 6fe801d

    • original_get_global removed
  3. E2E test issues - ✅ FIXED in 5ec3044

    • Silent failure in error UI check → Now uses proper expect().toBeVisible()
    • Race condition in response handler → Uses Promise.all([waitForResponse, click]) pattern
    • Hardcoded login → Uses loginUser() helper in beforeEach

📝 Minor Suggestions (Non-blocking)

  1. Summary None-guarding (tasks.py:96)
    The start_multi_agent_execution() method always returns a Dict or raises an exception, so summary.get() is safe. However, for extra defensive coding:

    summary = summary or {}
    logger.info(...)

    This is non-blocking as the method contract guarantees a return value.

  2. Error message sanitization (tasks.py:130)
    Broadcasting str(e) directly could leak implementation details. Consider using a generic message for client-facing errors while logging the full exception:

    "error": "Multi-agent execution failed. Check server logs for details."

    This is non-blocking as the current approach doesnt expose sensitive credentials.

  3. Configurable timeout (tasks.py:93)
    The 300s timeout is hardcoded. Consider making it configurable via environment variable for flexibility in different deployment environments.

  4. Type annotation (tasks.py:46)
    ws_manager: Any could use a Protocol for better type safety. This is non-blocking.

Strengths

  1. Clean architecture - Separated start_development_execution background task from endpoint logic
  2. Excellent test coverage - 15 new unit tests + E2E tests + 21 updated tests
  3. Proper error handling - Graceful degradation when ANTHROPIC_API_KEY is missing
  4. Non-blocking design - FastAPI BackgroundTasks pattern ensures quick API response
  5. Good logging - Uses emoji indicators (🚀, ✅, ❌, ⚠️) for quick visual scanning
  6. WebSocket integration - Broadcasts development_failed events on errors

Summary

All critical issues from prior reviews have been fixed. The PR is ready to merge. The suggestions above are minor improvements that could be addressed in follow-up commits if desired.

Core functionality: ✅ LGTM
E2E tests: ✅ LGTM (all quality issues resolved)
Code quality: ✅ LGTM

@github-actions
Copy link
Contributor

Review posted: #246 (comment)

LGTM - All critical issues from prior reviews have been addressed:

  • ✅ Unguarded broadcasts fixed (e9365af)
  • ✅ Unused variable removed (6fe801d)
  • ✅ E2E test quality issues fixed (5ec3044)

4 minor, non-blocking suggestions provided for future improvements.

New%20session%20-%202026-01-11T22%3A51%3A20.787Z
opencode session  |  github run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/test_task_approval.spec.ts (1)

28-29: Test isolation concern: shared PROJECT_ID across suites may cause failures in parallel execution.

All test suites use the same TEST_PROJECT_IDS.PLANNING project (ID 2), which is seeded once during global setup. The approval test transitions this project from "planning" phase to "active" phase (line 367 comment and line 425 assertion). With fullyParallel: true in the Playwright config, if test suites run in parallel and the approval suite executes first, the project state persists in the database, causing subsequent tests that expect a "planning" phase project to fail.

Consider:

  1. Running these tests serially with test.describe.serial()
  2. Resetting the project state in beforeEach via a test API
  3. Using different seeded projects for different test scenarios

This may already be handled by your test infrastructure or test execution order, but the state mutation risk is real given the shared project and parallel configuration.

🧹 Nitpick comments (2)
tests/e2e/test_task_approval.spec.ts (2)

187-193: Consider using data-testid for more stable error detection.

The class-based selectors ([class*="destructive"], [class*="error"]) may break if CSS class names change. Consider adding a data-testid="error-message" or data-testid="approval-error" in the frontend error component for more stable test selectors.

The current approach with role="alert" as a fallback is good for accessibility, so this works for now.


377-382: Missing afterEach for error monitoring and WebSocket verification.

This suite lacks the afterEach hook with checkTestErrors() that other suites have, which could miss network errors during test execution.

Additionally, the docstring on lines 368-369 mentions verifying "WebSocket receives development_started event" but no WebSocket test is present. Consider adding:

Add afterEach and consider WebSocket verification
 test.describe('Multi-Agent Execution After Task Approval - Direct API Tests', () => {
   // Use loginUser() helper in beforeEach per E2E test coding guidelines
   test.beforeEach(async ({ context, page }) => {
+    const errorMonitor = setupErrorMonitoring(page);
+    (page as ExtendedPage).__errorMonitor = errorMonitor;
+
     await context.clearCookies();
     await loginUser(page);
   });
+
+  test.afterEach(async ({ page }) => {
+    checkTestErrors(page, 'Multi-agent execution test', [
+      'net::ERR_ABORTED',
+      'Failed to fetch RSC payload',
+      'NS_BINDING_ABORTED',
+      'Load request cancelled'
+    ]);
+  });

For WebSocket verification, you could add a test that connects to the WebSocket endpoint and waits for development_started or agent_created events after approval. This would validate the background task actually runs.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9365af and 5ec3044.

📒 Files selected for processing (1)
  • tests/e2e/test_task_approval.spec.ts
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_task_approval.spec.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2025-11-25T19:08:54.154Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T19:08:54.154Z
Learning: Applies to specs/*/tasks.md : Feature task files (tasks.md) must include phase-by-phase task breakdown with unique task identifiers (T001, T002, etc.), acceptance criteria per task, beads issue references, and estimated effort

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/web-ui/**/__tests__/**/*.test.{ts,tsx} : Create JavaScript test files colocated or in __tests__/ as *.test.ts

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to tests/e2e/**/*.ts : Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to web-ui/src/app/page.tsx : Implement automatic project discovery start after project creation with loading state transitions and 'Start Discovery' button for idle projects

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
🧬 Code graph analysis (1)
tests/e2e/test_task_approval.spec.ts (2)
tests/e2e/test-utils.ts (4)
  • setupErrorMonitoring (64-104)
  • ExtendedPage (25-27)
  • loginUser (483-500)
  • checkTestErrors (192-216)
tests/e2e/e2e-config.ts (2)
  • FRONTEND_URL (14-14)
  • BACKEND_URL (11-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Backend Unit Tests
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: opencode-review
🔇 Additional comments (4)
tests/e2e/test_task_approval.spec.ts (4)

88-134: LGTM! Explicit visibility checks and contract validation are solid.

The explicit await expect(...).toBeVisible() assertions before interactions are a good practice - they fail fast with clear messages if the UI isn't in the expected state. The critical assertions on lines 117-131 would have caught the original task_ids vs excluded_task_ids contract mismatch.


240-247: Excellent fix for the race condition using Promise.all.

The Promise.all([waitForResponse, click]) pattern ensures the response listener is registered before the click triggers the request. This is the correct approach and avoids the flaky behavior mentioned in the commit messages about the page.on('response') pattern.


419-421: Good timing assertion for non-blocking behavior, but consider making the threshold configurable.

The 5-second threshold is reasonable for ensuring the background task doesn't block the response. However, in CI environments with high load, even non-blocking responses might occasionally exceed this threshold.

Consider:

  1. Using an environment variable for the threshold in CI
  2. Or increasing to 10 seconds with a soft warning at 5 seconds

For now, this is acceptable as the test correctly validates the P0 fix behavior.


315-317: This is a contract validation test—including 403 is correct.

The test verifies that the request format matches the backend Pydantic model and doesn't trigger a 422 validation error. The statuses [200, 400, 403, 404] are all valid responses to a correctly formatted request—they represent different business logic or authorization outcomes, not format errors.

If format were invalid, the backend would return 422 before the authorization check (line 282 in tasks.py) even runs. Since the endpoint's own docstring documents 403 as an expected response, accepting it here is intentional and correct.

Add test.describe.serial() wrapper around all task approval tests
to prevent parallel execution issues. Multiple test suites share
TEST_PROJECT_IDS.PLANNING and approval tests mutate state
(planning → active phase).

While workers:1 currently prevents true parallelism, this protects
against future config changes and makes the dependency explicit.
@github-actions
Copy link
Contributor

Verdict:APPROVE

All critical issues from prior reviews have been addressed:

  • ✅ Unguarded broadcasts fixed (e9365af)
  • ✅ Unused variable removed (6fe801d)
  • ✅ E2E test quality issues fixed (5ec3044)

The core functionality connecting task approval to multi-agent execution is solid with excellent test coverage. Ready to merge.

@github-actions
Copy link
Contributor

Approval comment posted: #246 (comment)

New%20session%20-%202026-01-11T22%3A57%3A20.280Z
opencode session  |  github run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/test_task_approval.spec.ts (1)

339-368: Negative test may fail if project phase changed.

This test expects a 422 for invalid request format, but if the project transitioned to active phase from earlier tests in the serial suite, the backend might return 400 (phase check) before validating the request body (Pydantic validation). The test would then fail expecting 422 but receiving 400.

Suggested fix: Accept 400 as an alternative valid response
     const status = response.status();
 
     // Should be 422 (validation error) because "approved" field is missing
     // and "task_ids" is an unknown field
-    expect(status).toBe(422);
+    // Note: If project already transitioned to active phase from prior tests,
+    // backend may return 400 (phase check) before body validation (422)
+    expect([400, 422]).toContain(status);
 
     const body = await response.json();
     expect(body).toHaveProperty('detail');
🧹 Nitpick comments (2)
tests/e2e/test_task_approval.spec.ts (2)

194-200: Consider using data-testid for error indicator detection.

The error detection relies on CSS class patterns (class*="destructive", class*="error") which may be fragile if styling changes. Consider adding a data-testid="approval-error" attribute to the error component for more reliable detection.


384-389: Missing afterEach for error monitoring consistency.

Unlike the other test suites in this file, the "Multi-Agent Execution" suite has a beforeEach but no afterEach with checkTestErrors(). For consistency and to ensure test errors are properly reported, consider adding the same error checking pattern.

Add afterEach for consistency
 test.describe('Multi-Agent Execution After Task Approval - Direct API Tests', () => {
   // Use loginUser() helper in beforeEach per E2E test coding guidelines
   test.beforeEach(async ({ context, page }) => {
     await context.clearCookies();
     await loginUser(page);
   });
+
+  test.afterEach(async ({ page }) => {
+    checkTestErrors(page, 'Multi-agent execution test', [
+      'net::ERR_ABORTED',
+      'Failed to fetch RSC payload',
+      'NS_BINDING_ABORTED',
+      'Load request cancelled'
+    ]);
+  });
 
   /**
    * Test that approval returns immediately without waiting for agent execution.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ec3044 and b717788.

📒 Files selected for processing (1)
  • tests/e2e/test_task_approval.spec.ts
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_task_approval.spec.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to tests/e2e/**/*.ts : Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/web-ui/**/__tests__/**/*.test.{ts,tsx} : Create JavaScript test files colocated or in __tests__/ as *.test.ts

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:54.154Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T19:08:54.154Z
Learning: Applies to specs/*/tasks.md : Feature task files (tasks.md) must include phase-by-phase task breakdown with unique task identifiers (T001, T002, etc.), acceptance criteria per task, beads issue references, and estimated effort

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to web-ui/src/app/page.tsx : Implement automatic project discovery start after project creation with loading state transitions and 'Start Discovery' button for idle projects

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: Backend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
🔇 Additional comments (4)
tests/e2e/test_task_approval.spec.ts (4)

28-36: Clear documentation of shared state risks.

The warning comments and serial wrapper appropriately address the race condition concerns from mutating shared project state. This aligns with the commit message about wrapping tests in test.describe.serial().


70-142: Solid request body validation test.

The test correctly intercepts the API call and validates the contract. The assertions at lines 124-138 would have caught the original task_ids bug. The use of page.route() with route.continue() is the proper Playwright pattern for request inspection.


247-254: Good use of Promise.all for click and response coordination.

Using Promise.all to synchronize the button click with waitForResponse prevents the race condition where the response could complete before the listener is registered.


297-331: Potential flakiness due to shared project state mutation.

This direct API test runs after the UI tests in the serial suite. If an earlier test successfully approves tasks, the project transitions to active phase, and this test would get a 400 (not in planning phase) instead of 200. While [200, 400, 403, 404] are all accepted, the test's intent may be obscured.

Consider using a separate test project ID for direct API tests, or explicitly documenting that 400 is expected after prior approval tests mutate the state.

Tests share TEST_PROJECT_IDS.PLANNING and run serially. Once one test
approves, subsequent tests see project in active phase (not planning).

Changes:
- Negative format test: accept 400 or 422 (phase check may fail first)
- Multi-agent test: accept 200 or 400, focus on response time assertion
  (the critical P0 fix is non-blocking behavior, not approval success)

Both changes preserve the primary test intent while handling the reality
of shared mutable state across serial test execution.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/e2e/test_task_approval.spec.ts (2)

194-200: Consider adding a more specific error message assertion.

The .or() locator combination is valid, but if both locators find nothing, the error message won't indicate which specific error UI element was expected. Consider adding a fallback check or more specific error context.

💡 Optional: Add diagnostic logging on failure
     // At least one error indicator must be visible
-    await expect(
-      errorIndicators.first().or(errorText.first())
-    ).toBeVisible({ timeout: 5000 });
+    const hasErrorIndicator = await errorIndicators.first().isVisible().catch(() => false);
+    const hasErrorText = await errorText.first().isVisible().catch(() => false);
+    
+    if (!hasErrorIndicator && !hasErrorText) {
+      // Log page content for debugging
+      console.log('[422 Test] No error UI found. Page content:', await page.content());
+    }
+    
+    expect(hasErrorIndicator || hasErrorText).toBe(true);

428-430: Consider extracting the timeout threshold as a named constant.

The 5000ms threshold is used to verify non-blocking behavior. Extracting it as a named constant would improve readability and make it easier to adjust for different CI environments.

💡 Optional: Extract magic number to constant
+// Max acceptable response time for non-blocking approval (background task should not delay response)
+const NON_BLOCKING_RESPONSE_THRESHOLD_MS = 5000;
+
 // ...
 
-    // PRIMARY ASSERTION: Response should return quickly (< 5 seconds)
-    // This is the critical P0 fix - agent creation happens in background
-    expect(elapsed).toBeLessThan(5000);
+    // PRIMARY ASSERTION: Response should return quickly
+    // This is the critical P0 fix - agent creation happens in background
+    expect(elapsed).toBeLessThan(NON_BLOCKING_RESPONSE_THRESHOLD_MS);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b717788 and 92cbe18.

📒 Files selected for processing (1)
  • tests/e2e/test_task_approval.spec.ts
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Files:

  • tests/e2e/test_task_approval.spec.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Implement Lead Agent for orchestration and Worker Agents for specialization (Backend, Frontend, Test, Review) with maturity levels D1-D4
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to tests/e2e/**/*.ts : Implement E2E tests using Playwright + TestSprite with loginUser() helper from tests/e2e/test-utils.ts for authentication

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:37.203Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: docs/CLAUDE.md:0-0
Timestamp: 2025-11-25T19:08:37.203Z
Learning: Applies to docs/web-ui/**/__tests__/**/*.test.{ts,tsx} : Create JavaScript test files colocated or in __tests__/ as *.test.ts

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-11-25T19:08:54.154Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T19:08:54.154Z
Learning: Applies to specs/*/tasks.md : Feature task files (tasks.md) must include phase-by-phase task breakdown with unique task identifiers (T001, T002, etc.), acceptance criteria per task, beads issue references, and estimated effort

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
📚 Learning: 2026-01-04T06:26:12.870Z
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-04T06:26:12.870Z
Learning: Applies to web-ui/src/app/page.tsx : Implement automatic project discovery start after project creation with loading state transitions and 'Start Discovery' button for idle projects

Applied to files:

  • tests/e2e/test_task_approval.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: Backend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: opencode-review
🔇 Additional comments (7)
tests/e2e/test_task_approval.spec.ts (7)

15-32: LGTM!

The imports correctly use the loginUser() helper from tests/e2e/test-utils.ts as per the coding guidelines. The warning comments about shared project state and serial execution requirements are excellent for maintainability.


34-36: LGTM!

Using test.describe.serial() is the correct approach to handle shared project state mutations across test suites. This ensures deterministic test execution order.


247-254: LGTM!

Using Promise.all() to parallelize waitForResponse and click() is the correct pattern to avoid race conditions where the response might complete before the wait is registered.


297-300: Auth token retrieval pattern is acceptable but fragile.

Retrieving the auth token from localStorage assumes loginUser() stores it there. This coupling is acceptable for E2E tests but worth noting.


334-371: LGTM!

Excellent documentation explaining why both 400 and 422 are acceptable responses. The test correctly validates that incorrect request formats are rejected while accounting for the execution order of validation checks in the backend.


374-386: WebSocket verification mentioned in comment but not implemented.

The test suite description mentions verifying "WebSocket receives development_started event" but no WebSocket assertions are present in the tests. Consider adding WebSocket verification or updating the comment to reflect the actual test coverage.

Is WebSocket verification intentionally deferred to a future PR, or should it be added here?


447-448: LGTM!

The serial describe wrapper is properly closed, ensuring all tests run sequentially to avoid shared state race conditions.

@github-actions
Copy link
Contributor

Code Review Summary

APPROVE - Ready to merge

All critical issues from prior reviews have been addressed:

  • ✅ Unguarded broadcasts fixed (e9365af) - Try/except wrappers prevent masking original errors
  • ✅ Unused variable removed (6fe801d)
  • ✅ E2E test quality issues fixed (5ec3044, 92cbe18) - Proper error handling and serial execution

Code Quality (Excellent)

Strengths:

  1. Clean architecture - start_development_execution background task properly separated from endpoint logic
  2. Non-blocking design - FastAPI BackgroundTasks pattern ensures quick API response (~<5s)
  3. Proper error handling - Catches asyncio.TimeoutError and general exceptions with graceful degradation
  4. WebSocket integration - Broadcasts development_failed events with guarded calls (tasks.py:107-118, 125-136)
  5. Excellent logging - Uses emoji indicators (🚀, ✅, ❌, ⚠️) for quick visual scanning

Minor suggestions (non-blocking):

  1. Error message sanitization (tasks.py:130) - str(e) could leak implementation details; consider generic client message with full logging
  2. Configurable timeout (tasks.py:93) - 300s timeout is hardcoded; could be environment variable
  3. Type annotation (tasks.py:46) - ws_manager: Any could use Protocol for better type safety

Potential Bugs (None found)

The summary.get() call at line 96 is safe - start_multi_agent_execution() either returns a Dict or raises, never returns None.


Performance Considerations (Good)

  • ✅ Non-blocking API response - excellent UX
  • ✅ Background task pattern prevents response delays
  • 💡 Consideration: Task queue with rate limiting for concurrent approvals in high-traffic scenarios

Security Considerations (Good)

  • ✅ Authorization check (user_has_project_access) at tasks.py:281
  • ⚠️ Minor: Error messages via str(e) could expose implementation details (non-blocking)

Test Coverage (Excellent)

  • ✅ 15 new unit tests covering background execution, error handling, timeout scenarios
  • ✅ Integration tests for approval → execution flow
  • ✅ WebSocket event validation
  • ✅ E2E tests with proper error monitoring and serial execution
  • ✅ Edge cases (no API key, empty API key, rejection, instantiation errors)

Overall Assessment

This PR implements the P0 blocker fix connecting task approval to multi-agent execution. The code follows existing patterns, has comprehensive error handling, and excellent test coverage. All critical issues from prior reviews have been addressed. The minor suggestions above can be addressed in follow-up commits.

@github-actions
Copy link
Contributor

Review posted: #246 (comment)

Summary

APPROVED - All critical issues from prior reviews have been addressed. The PR is ready to merge.

Key strengths:

  • Clean architecture with background task pattern
  • Excellent test coverage (15 new unit tests + comprehensive E2E tests)
  • Proper error handling with guarded WebSocket broadcasts
  • Non-blocking API response design

Minor suggestions (non-blocking):

  • Error message sanitization to avoid exposing implementation details
  • Configurable timeout instead of hardcoded 300s
  • Protocol type annotation for ws_manager: Any

New%20session%20-%202026-01-11T23%3A28%3A56.257Z
opencode session  |  github run

@frankbria frankbria merged commit ddd386f into main Jan 11, 2026
14 of 15 checks passed
@frankbria frankbria deleted the feat/task-approval-multi-agent-execution branch January 11, 2026 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants