@frankbria frankbria commented Jan 13, 2026

Summary

Implements an intelligent task classification system that determines which quality gates should apply to each task based on its characteristics. This prevents false failures when non-code tasks (design, documentation) go through code-specific quality gates.

Problem Solved: When complete_task() was called for tasks like "Design database schema for events", all quality gates ran unconditionally, creating impossible-to-resolve blockers because:

  • Coverage gate runs pytest --cov, which fails if pytest-cov isn't installed
  • Test gate expects executable code to test
  • The blocker asks users to "fix these issues" when there's nothing to fix for design tasks

Solution: Task classification based on keyword analysis of task title/description, with selective gate execution.
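As a rough illustration, keyword-based classification with a conservative code-implementation default could be sketched as follows. The keyword sets and function signature here are simplified stand-ins, not the actual TaskClassifier API:

```python
import re
from enum import Enum

class TaskCategory(str, Enum):
    CODE_IMPLEMENTATION = "code_implementation"
    DESIGN = "design"
    DOCUMENTATION = "documentation"

# Hypothetical keyword sets; the real classifier uses larger frozensets.
DESIGN_KEYWORDS = {"design", "architecture", "schema"}
DOC_KEYWORDS = {"document", "readme", "docs"}

def classify(title: str, description: str = "") -> TaskCategory:
    """Classify a task by scanning its title and description for category keywords."""
    text = f"{title} {description}".lower()

    def has_any(words):
        # Word-boundary matching avoids false hits on embedded substrings.
        return any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)

    if has_any(DESIGN_KEYWORDS):
        return TaskCategory.DESIGN
    if has_any(DOC_KEYWORDS):
        return TaskCategory.DOCUMENTATION
    # Conservative default: uncertain tasks get the full code gate set.
    return TaskCategory.CODE_IMPLEMENTATION

print(classify("Design database schema for events").value)  # design
```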

Changes

New Modules

  • codeframe/lib/task_classifier.py - Classifies tasks into categories using keyword analysis
  • codeframe/lib/quality_gate_rules.py - Defines which gates apply to each category
  • codeframe/config/quality_gates_config.py - Environment-based configuration

Modified Files

  • codeframe/lib/quality_gates.py - Updated run_all_gates() for selective execution
  • codeframe/core/models.py - Added skipped_gates and task_category to QualityGateResult
  • codeframe/agents/worker_agent.py - Updated completion messages with gate status

Task Categories & Applicable Gates

| Task Category | Tests | Coverage | Type Check | Linting | Code Review | Skip Detection |
|---|---|---|---|---|---|---|
| CODE_IMPLEMENTATION | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DESIGN | — | — | — | — | ✅ | — |
| DOCUMENTATION | — | — | — | ✅ | — | — |
| CONFIGURATION | — | — | ✅ | ✅ | — | — |
| TESTING | ✅ | ✅ | — | — | — | ✅ |
| REFACTORING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| MIXED | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
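Assuming gate names matching the QualityGateType values discussed in the review, the applicability matrix can be expressed as a plain mapping with a conservative fallback for unknown categories (a sketch, not the module's actual structure):

```python
from enum import Enum

class Gate(str, Enum):
    TESTS = "tests"
    COVERAGE = "coverage"
    TYPE_CHECK = "type_check"
    LINTING = "linting"
    CODE_REVIEW = "code_review"
    SKIP_DETECTION = "skip_detection"

ALL_GATES = list(Gate)

# Matrix from the table above; MIXED conservatively runs everything.
APPLICABLE_GATES = {
    "code_implementation": ALL_GATES,
    "design": [Gate.CODE_REVIEW],
    "documentation": [Gate.LINTING],
    "configuration": [Gate.LINTING, Gate.TYPE_CHECK],
    "testing": [Gate.TESTS, Gate.COVERAGE, Gate.SKIP_DETECTION],
    "refactoring": ALL_GATES,
    "mixed": ALL_GATES,
}

def get_applicable_gates(category: str) -> list:
    # Unknown categories fall back to the conservative all-gates rule set.
    return APPLICABLE_GATES.get(category, ALL_GATES)

def get_skipped_gates(category: str) -> list:
    applicable = set(get_applicable_gates(category))
    return [g for g in ALL_GATES if g not in applicable]
```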

Configuration

Environment variables for toggling behavior:

  • QUALITY_GATES_ENABLE_TASK_CLASSIFICATION=true (default) - Enable/disable classification
  • QUALITY_GATES_STRICT_MODE=false (default) - Run all gates regardless of task type
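The toggles above could be parsed roughly as follows. The helper names mirror _parse_bool_env and should_use_task_classification from the PR, but the accepted value sets here are assumptions:

```python
import os

TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}

def parse_bool_env(name: str, default: bool) -> bool:
    """Parse a boolean env var; unset or unrecognized values fall back to the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return default

def should_use_task_classification() -> bool:
    # Strict mode overrides classification: all gates run regardless of category.
    if parse_bool_env("QUALITY_GATES_STRICT_MODE", False):
        return False
    return parse_bool_env("QUALITY_GATES_ENABLE_TASK_CLASSIFICATION", True)
```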

Test plan

  • Run uv run pytest tests/lib/test_task_classifier.py - 39 tests passing
  • Run uv run pytest tests/lib/test_quality_gate_rules.py - 19 tests passing
  • Run uv run pytest tests/lib/test_quality_gates.py - 54 tests passing
  • Run uv run pytest tests/config/test_quality_gates_config.py - 17 tests passing
  • Run uv run ruff check - All checks passed
  • Run uv run mypy - No issues found

Summary by CodeRabbit

  • New Features

    • Task classification to categorize tasks (code, design, docs, testing, refactor, config, mixed)
    • Per-category gate rules and configurable quality-gates settings
  • Improvements

    • Quality gates run selectively by category; skipped gates are tracked and reported
    • Results, completion and blocker messages include task category and skipped-gates context
    • Category-specific guidance in blocker messages; improved logging and coverage handling
  • Tests

    • Extensive unit and integration tests for classifier, gate rules and config behavior


… execution

Add intelligent task classification system to determine which quality gates
apply based on task characteristics. This prevents false failures when
non-code tasks (design, documentation) go through code-specific gates.

Key Changes:
- Add TaskClassifier module with keyword-based task categorization
- Add QualityGateRules module defining gate applicability per category
- Add QualityGatesConfig for environment-based configuration
- Update QualityGates.run_all_gates() for selective execution
- Update QualityGateResult model with skipped_gates and task_category
- Improve error handling for missing tools (pytest-cov)
- Update WorkerAgent completion messages with gate status

Task Categories and Applicable Gates:
- CODE_IMPLEMENTATION: All 6 gates
- DESIGN: Only code_review
- DOCUMENTATION: Only linting
- CONFIGURATION: linting, type_check
- TESTING: tests, coverage, skip_detection
- REFACTORING: All 6 gates
- MIXED: All 6 gates (conservative)

Tests: 129 new/updated tests passing

coderabbitai bot commented Jan 13, 2026

Walkthrough

Implements task classification, per-category gate rules, and configurable gate selection; quality gate orchestration now records skipped gates and task category in results, and WorkerAgent.complete_task returns and uses task_category and skipped_gates in success and blocked/failure payloads.

Changes

Cohort / File(s) Summary
Task classification & rules
codeframe/lib/task_classifier.py, codeframe/lib/quality_gate_rules.py, codeframe/config/quality_gates_config.py
Add TaskClassifier and TaskCategory enum; per-category gate mappings and skip reasons; env-driven QualityGatesConfig with caching, strict-mode toggle, and custom per-category gate resolution. (Review: regex/keyword sets, mapping correctness, env parsing and cache reset.)
Quality gate orchestration
codeframe/lib/quality_gates.py
run_all_gates now classifies tasks and runs per-category applicable gates; per-gate runners accept optional task_category; skipped gates collected and logged; blockers include category guidance; results include skipped_gates and task_category. (Review: flow changes, skipped gate handling, blocker messages, DB/human-approval guards.)
Models & agent
codeframe/core/models.py, codeframe/agents/worker_agent.py
QualityGateResult gains skipped_gates, task_category, and gates_skipped_count; WorkerAgent.complete_task message/payload updated to embed dynamic gates_msg and include task_category and skipped_gates on success and blocked/failure paths. (Review: public return contract change and downstream consumers.)
Tests
tests/config/test_quality_gates_config.py, tests/lib/test_quality_gate_rules.py, tests/lib/test_quality_gates.py, tests/lib/test_task_classifier.py
Add/expand unit and integration tests for config parsing/caching, rules mapping and reasons, classifier accuracy/priorities, and gate orchestration including skipped_gates reporting. (Review: test coverage, AsyncMock/patch usage, and integration assumptions.)

Sequence Diagram(s)

sequenceDiagram
    participant Client as Incoming Task
    participant Classifier as TaskClassifier
    participant Config as QualityGatesConfig
    participant Rules as QualityGateRules
    participant Orchestrator as QualityGates
    participant Agent as WorkerAgent

    Client->>Classifier: classify_task(task)
    Classifier-->>Orchestrator: task_category
    Orchestrator->>Config: get_quality_gates_config()
    Orchestrator->>Rules: get_applicable_gates(task_category)
    Rules-->>Orchestrator: applicable_gates (and skip_reasons)
    Orchestrator->>Orchestrator: run applicable gates, collect failures & skipped_gates
    Orchestrator-->>Agent: QualityGateResult (status, failures, skipped_gates, task_category)
    Agent-->>Client: complete_task response (message including gates_msg, task_category, skipped_gates, status)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through titles, sniffed descriptions wide,
I nudged the gates that matter, let the others hide.
Skips tucked in my pouch, categories on my sleeve,
Focused checks, no busywork — that's how I weave.
Hooray for sharper gates, the rabbit takes a bow.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately and concisely summarizes the main change, implementing task classification for selective quality gate execution.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, above the required 80.00% threshold.


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dabbf12 and b80f03e.

📒 Files selected for processing (2)
  • codeframe/config/quality_gates_config.py
  • tests/config/test_quality_gates_config.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • codeframe/config/quality_gates_config.py
  • tests/config/test_quality_gates_config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: Backend Unit Tests
  • GitHub Check: opencode-review
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: Macroscope - Correctness Check



macroscopeapp bot commented Jan 13, 2026

Implement classification-driven selective execution of quality gates in QualityGates.run_all_gates and propagate task_category and skipped_gates through WorkerAgent responses

Introduce task classification to drive gate applicability, add config-controlled rules, record skipped gates and category in QualityGateResult, and update gate runners and WorkerAgent messaging to include task_category and skipped_gates. Core orchestration changes are in quality_gates.py with supporting config in quality_gates_config.py and classification in task_classifier.py.

📍Where to Start

Start with QualityGates.run_all_gates in quality_gates.py, then review QualityGatesConfig in quality_gates_config.py and TaskClassifier in task_classifier.py.


Macroscope summarized b80f03e.

@frankbria frankbria linked an issue Jan 13, 2026 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @codeframe/agents/worker_agent.py:
- Around line 1400-1408: Replace the hardcoded "6" with a dynamic total gate
count: compute total_gates = len(QualityGateType) (or an equivalent runtime
lookup of the enum/mapping) and use total_gates - skipped_count when building
gates_msg; reference quality_result.gates_skipped_count,
quality_result.task_category and QualityGateType so the message remains correct
if the enum changes.
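A minimal sketch of that suggested fix, using a stand-in QualityGateType enum and a hypothetical build_gates_msg helper (the real message format in worker_agent.py may differ):

```python
from enum import Enum

class QualityGateType(str, Enum):  # stand-in for codeframe.core.models.QualityGateType
    TESTS = "tests"
    COVERAGE = "coverage"
    TYPE_CHECK = "type_check"
    LINTING = "linting"
    CODE_REVIEW = "code_review"
    SKIP_DETECTION = "skip_detection"

def build_gates_msg(skipped_count: int, task_category: str) -> str:
    # Derive the total from the enum so the message stays correct if gates are added.
    total_gates = len(QualityGateType)
    ran = total_gates - skipped_count
    if skipped_count:
        return f"{ran}/{total_gates} quality gates ran ({skipped_count} skipped for {task_category} task)"
    return f"All {total_gates} quality gates ran"
```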

In @codeframe/lib/quality_gates.py:
- Around line 1145-1160: The gate methods currently call _create_quality_blocker
without the task_category, so blockers lack category context; modify each gate
method signature (run_tests_gate, run_type_check_gate, run_coverage_gate,
run_linting_gate, run_skip_detection_gate) to accept an optional task_category
parameter and update run_all_gates to pass the classified task_category into
each call, then pass that task_category through to the _create_quality_blocker
invocations inside those methods; alternatively, if category is not needed,
remove the task_category parameter from _create_quality_blocker and any
references to it in those gate methods and run_all_gates—prefer the first
approach to preserve category context.
🧹 Nitpick comments (5)
codeframe/config/quality_gates_config.py (1)

53-78: Consider logging invalid gate names.

The get_custom_gates_for_category method silently skips invalid gate names. While this prevents failures, it could hide configuration errors from operators.

Optional: Add logging for invalid gate names
+import logging
+
+logger = logging.getLogger(__name__)
+
 # Convert string names to QualityGateType enum values
 gates = []
 for name in gate_names:
     try:
         gates.append(QualityGateType(name))
     except ValueError:
         # Invalid gate name, skip it
-        pass
+        logger.warning(f"Invalid gate name '{name}' in custom_category_rules for '{category}', skipping")

 return gates if gates else None
tests/lib/test_quality_gates.py (2)

1247-1267: Consider using unique task numbers to avoid potential database conflicts.

The _create_task helper always uses "1.1.1" as the task number. While this may work if tests run in isolation with separate databases, it could cause issues if tests are ever run in sequence or if the task_number has a unique constraint.

♻️ Suggested improvement
+    _task_counter = 0
+
     def _create_task(self, db, project_id, title, description=""):
         """Helper to create test tasks with specific titles."""
+        TestQualityGatesTaskClassification._task_counter += 1
+        task_number = f"1.1.{TestQualityGatesTaskClassification._task_counter}"
         cursor = db.conn.cursor()
         cursor.execute(
             """
             INSERT INTO tasks (project_id, task_number, title, description, status)
             VALUES (?, ?, ?, ?, ?)
             """,
-            (project_id, "1.1.1", title, description, "in_progress"),
+            (project_id, task_number, title, description, "in_progress"),
         )
         db.conn.commit()
         task_id = cursor.lastrowid

         return Task(
             id=task_id,
             project_id=project_id,
-            task_number="1.1.1",
+            task_number=task_number,
             title=title,
             description=description,
             status=TaskStatus.IN_PROGRESS,
         )

1358-1369: Strengthen the assertion for skipped gates verification.

The assertion hasattr(result, 'skipped_gates') or 'skipped' in str(result) is weak and could pass even if the implementation is incorrect. Since QualityGateResult now has explicit skipped_gates field per the PR changes, assert directly on it.

♻️ Proposed fix
     @pytest.mark.asyncio
     async def test_result_includes_skipped_gates_info(self, db, project_id, quality_gates):
         """Result should include information about skipped gates."""
         task = self._create_task(db, project_id, "Design microservices architecture")

         with patch.object(quality_gates, "run_review_gate", new_callable=AsyncMock) as mock_review:
             mock_review.return_value = QualityGateResult(task_id=task.id, status="passed", failures=[], execution_time_seconds=0.1)

             result = await quality_gates.run_all_gates(task)

-            # Check that skipped_gates is populated
-            assert hasattr(result, 'skipped_gates') or 'skipped' in str(result)
+            # Check that skipped_gates is populated with non-code gates
+            assert hasattr(result, 'skipped_gates')
+            assert len(result.skipped_gates) > 0
+            # Design tasks should skip tests, coverage, type_check, linting, skip_detection
+            assert 'tests' in result.skipped_gates
+            assert 'coverage' in result.skipped_gates
codeframe/lib/quality_gates.py (2)

661-683: Consider defensive handling for classifier exceptions.

If TaskClassifier.classify_task() raises an exception, the entire run_all_gates method will fail. Consider wrapping the classification in a try-except to fall back to running all gates if classification fails.

♻️ Suggested defensive handling
     # Classify the task and determine applicable gates
     classifier = TaskClassifier()
     rules = QualityGateRules()
-    category = classifier.classify_task(task)
+    try:
+        category = classifier.classify_task(task)
+    except Exception as e:
+        logger.warning(
+            f"Task classification failed for task {task.id}: {e}. "
+            "Falling back to running all gates."
+        )
+        category = TaskCategory.MIXED  # Conservative fallback

980-993: Good handling of missing pytest-cov, but case sensitivity may miss some errors.

The check uses .lower() for "no module named 'pytest_cov'" but not for "unrecognized arguments: --cov". Consider making both checks case-insensitive for robustness.

♻️ Suggested improvement
             # Check if pytest-cov is not installed
-            if "unrecognized arguments: --cov" in output or "no module named 'pytest_cov'" in output.lower():
+            output_lower = output.lower()
+            if "unrecognized arguments: --cov" in output_lower or "no module named 'pytest_cov'" in output_lower:
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75d248c and 0940396.

📒 Files selected for processing (10)
  • codeframe/agents/worker_agent.py
  • codeframe/config/quality_gates_config.py
  • codeframe/core/models.py
  • codeframe/lib/quality_gate_rules.py
  • codeframe/lib/quality_gates.py
  • codeframe/lib/task_classifier.py
  • tests/config/test_quality_gates_config.py
  • tests/lib/test_quality_gate_rules.py
  • tests/lib/test_quality_gates.py
  • tests/lib/test_task_classifier.py
🧰 Additional context used
📓 Path-based instructions (1)
codeframe/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

codeframe/**/*.py: Use Python 3.11+ for backend development
Use FastAPI framework for backend API development
Use SQLite with async support (aiosqlite) for database operations
Use tiktoken for token counting
Run linting with ruff check .
Import and use domain-specific repositories from codeframe/persistence/repositories/ for data access instead of direct database calls

Files:

  • codeframe/core/models.py
  • codeframe/lib/task_classifier.py
  • codeframe/agents/worker_agent.py
  • codeframe/lib/quality_gate_rules.py
  • codeframe/config/quality_gates_config.py
  • codeframe/lib/quality_gates.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: frankbria/codeframe PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T19:08:54.154Z
Learning: Applies to specs/*/tasks.md : Feature task files (tasks.md) must include phase-by-phase task breakdown with unique task identifiers (T001, T002, etc.), acceptance criteria per task, beads issue references, and estimated effort
🧬 Code graph analysis (7)
codeframe/lib/task_classifier.py (2)
codeframe/core/models.py (2)
  • Task (257-305)
  • title (242-243)
tests/lib/test_quality_gates.py (3)
  • task (56-79)
  • task (467-488)
  • task (901-924)
codeframe/agents/worker_agent.py (1)
codeframe/core/models.py (1)
  • gates_skipped_count (938-940)
tests/lib/test_task_classifier.py (1)
codeframe/lib/task_classifier.py (3)
  • TaskClassifier (94-185)
  • TaskCategory (31-43)
  • classify_task (122-185)
codeframe/lib/quality_gate_rules.py (2)
codeframe/core/models.py (1)
  • QualityGateType (158-166)
codeframe/lib/task_classifier.py (1)
  • TaskCategory (31-43)
codeframe/config/quality_gates_config.py (1)
codeframe/core/models.py (1)
  • QualityGateType (158-166)
tests/lib/test_quality_gates.py (3)
codeframe/core/models.py (8)
  • project_id (234-235)
  • title (242-243)
  • Task (257-305)
  • id (230-231)
  • status (246-247)
  • TaskStatus (10-18)
  • QualityGateResult (906-940)
  • passed (928-930)
codeframe/persistence/database.py (2)
  • Database (51-718)
  • initialize (110-130)
codeframe/lib/quality_gates.py (2)
  • QualityGates (86-1437)
  • run_all_gates (615-774)
codeframe/lib/quality_gates.py (5)
codeframe/config/quality_gates_config.py (3)
  • get_quality_gates_config (103-125)
  • should_use_task_classification (43-51)
  • get_custom_gates_for_category (53-78)
codeframe/enforcement/skip_pattern_detector.py (1)
  • SkipPatternDetector (38-457)
codeframe/lib/task_classifier.py (2)
  • TaskClassifier (94-185)
  • classify_task (122-185)
codeframe/lib/quality_gate_rules.py (4)
  • QualityGateRules (102-157)
  • get_applicable_gates (115-124)
  • all_gates (111-113)
  • get_skip_reason (139-157)
codeframe/core/models.py (4)
  • id (230-231)
  • QualityGateType (158-166)
  • Task (257-305)
  • title (242-243)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: opencode-review
🔇 Additional comments (35)
codeframe/core/models.py (1)

906-941: LGTM!

The new skipped_gates and task_category fields are well-designed with appropriate defaults using Field(default_factory=list) for the mutable list, avoiding the common pitfall of mutable default arguments. The gates_skipped_count property provides a clean computed accessor.

codeframe/lib/task_classifier.py (3)

1-44: LGTM!

The module structure is clean with a comprehensive docstring explaining the purpose and usage. The TaskCategory enum inherits from str for proper JSON serialization, and the seven categories cover the expected task types well.


46-121: LGTM!

The keyword definitions use frozenset for immutability, and the _compile_pattern method correctly sorts keywords by length descending to ensure longer keywords are matched first in the regex alternation. The word boundary matching (\b) prevents false positives from embedded keywords.


122-185: LGTM!

The classification logic follows a clear priority order with good documentation. The distinction between "strong" and "weak" code keywords prevents generic verbs like "create" from triggering MIXED classification when combined with non-code categories. The conservative default to CODE_IMPLEMENTATION ensures all quality gates run when classification is uncertain.

codeframe/agents/worker_agent.py (2)

1414-1423: LGTM!

The success return payload correctly includes the new task_category and skipped_gates fields with appropriate fallbacks using hasattr() for backward compatibility.


1444-1457: LGTM!

The failure path appropriately includes task category context in the message and return payload, providing useful debugging information when quality gates fail.

tests/config/test_quality_gates_config.py (3)

1-72: LGTM!

Comprehensive test coverage for the QualityGatesConfig dataclass including default values, task classification behavior with various flag combinations, and custom category rules with validation of invalid gate names.


74-99: LGTM!

The _parse_bool_env tests cover true/false values, case variations, defaults when unset, and defaults when invalid. Good edge case coverage.


101-153: LGTM!

The singleton behavior tests properly use setup_method/teardown_method to reset state between tests. The patch.dict usage correctly isolates environment variable changes.

codeframe/lib/quality_gate_rules.py (3)

1-70: LGTM!

The gate applicability matrix is well-documented and the mappings are comprehensive. The conservative approach of applying all gates for MIXED and unknown categories ensures no quality checks are accidentally skipped.


72-99: LGTM!

Skip reasons are human-readable and provide clear context for why specific gates don't apply to certain task categories. This will be helpful for debugging and user feedback.


102-157: LGTM!

The QualityGateRules class provides a clean API with get_applicable_gates, should_skip_gate, and get_skip_reason. The fallback to TaskCategory.MIXED for unknown categories and the default skip reason generation handle edge cases gracefully.

tests/lib/test_task_classifier.py (3)

1-53: LGTM!

Good test structure with a reusable _make_task helper and pytest fixture for the classifier. The TestTaskCategory class validates enum completeness.


54-197: LGTM!

Thorough coverage of DESIGN, DOCUMENTATION, CONFIGURATION, and CODE_IMPLEMENTATION categories with tests for various keywords. The test for classification by description (line 88-94) ensures both title and description are considered.


198-280: LGTM!

Excellent edge case coverage including:

  • MIXED classification when design + implement keywords present
  • Empty description defaulting to CODE_IMPLEMENTATION
  • Case-insensitive matching
  • Word boundary enforcement (preventing false matches)
  • Priority/precedence rules for TESTING and REFACTORING
codeframe/config/quality_gates_config.py (2)

1-52: LGTM!

The module docstring clearly documents environment variables. The QualityGatesConfig dataclass has sensible defaults and the should_use_task_classification() method correctly implements the logic that strict mode overrides task classification.


81-134: LGTM!

The _parse_bool_env helper handles common boolean string representations. The singleton pattern with get_quality_gates_config() and reset_config() is appropriate for configuration that should be loaded once and cached, while still supporting testing scenarios.

tests/lib/test_quality_gates.py (5)

900-924: LGTM! Fixture updated to align with task classification tests.

The fixture now explicitly creates a code implementation task with appropriate title ("Implement feature") and description ("Build authentication module") that will trigger the CODE_IMPLEMENTATION classification, ensuring all gates run for integration tests.


1274-1301: LGTM! Design task gate selection test is well-structured.

The test properly verifies that design tasks only run the code_review gate while skipping tests, coverage, type_check, linting, and skip_detection gates. The mocking approach with patch.object and AsyncMock is appropriate for this integration-style test.


1303-1329: LGTM! Code implementation gate coverage test is comprehensive.

The test correctly verifies that code implementation tasks run all six gates (tests, coverage, type_check, linting, code_review, skip_detection), which aligns with the applicability matrix in the PR objectives.


1331-1356: LGTM! Documentation task test correctly validates linting-only execution.

The test verifies that documentation tasks only run the linting gate, consistent with the quality gate applicability matrix.


1371-1396: LGTM! Configuration task test correctly validates linting and type_check execution.

The test verifies that configuration tasks run both linting and type_check gates while skipping tests, coverage, code_review, and skip_detection.

tests/lib/test_quality_gate_rules.py (6)

1-21: Excellent documentation with applicability matrix.

The docstring clearly documents the Quality Gate Applicability Matrix, which serves as both documentation and a reference for understanding the expected behavior. This makes the tests self-documenting and maintainable.


35-46: LGTM! CODE_IMPLEMENTATION gate coverage is comprehensive.

The test correctly verifies that code implementation tasks have all 6 gates applied and explicitly checks each gate type.


51-66: LGTM! DESIGN gate tests are thorough.

The tests correctly verify that design tasks only have the code_review gate and explicitly check that tests and coverage are excluded.


105-117: LGTM! TESTING category gate tests are correct.

The tests verify that testing tasks get tests, coverage, and skip_detection gates (3 total), while excluding code_review. This makes sense as testing tasks focus on test quality rather than code review.


167-170: LGTM! Comprehensive CODE_IMPLEMENTATION skip verification.

The test iterates through all QualityGateType values to ensure none are skipped for code implementation tasks, providing complete coverage of the expected behavior.


192-198: LGTM! all_gates property test validates enumeration completeness.

The test ensures that the all_gates property returns all 6 gate types and that every QualityGateType enum value is included.

codeframe/lib/quality_gates.py (7)

55-58: LGTM! New imports for task classification system.

The imports for get_quality_gates_config, TaskClassifier, and QualityGateRules are correctly placed and support the new task classification feature.


685-688: LGTM! Informative logging for task classification.

The log message includes task ID, category, and applicable gates which will be valuable for debugging and understanding gate execution decisions.


701-753: LGTM! Consistent pattern for gate execution with skip tracking.

Each gate follows a consistent pattern: check if applicable, execute if yes, log and track skip reason if no. The skip reasons from rules.get_skip_reason() provide useful context for debugging.


760-772: LGTM! Result construction includes classification context.

The QualityGateResult now includes skipped_gates and task_category, providing complete context about what was executed and why. The final log message includes the skip count for visibility.


825-866: LGTM! Proper separation of test execution and coverage.

The change to run pytest without --cov flags is appropriate since coverage is now handled separately by _run_coverage(). This prevents test failures when pytest-cov is not installed, improving the robustness of the quality gates.


1145-1176: LGTM! Enhanced blocker messages with task category context.

The addition of task_category parameter and the inclusion of category guidance in blocker messages helps users understand which gates were applied and provides actionable advice if the task was misclassified.


1216-1254: LGTM! Helpful category-specific guidance messages.

The _get_category_guidance method provides clear, actionable guidance for each task category. The messages help users understand the gate behavior and suggest corrective actions (e.g., updating task titles if code was written for a documentation task).

@github-actions

PR Review: Task Classification for Quality Gates

Overall, this is a well-implemented feature with excellent test coverage (129 total tests). The task classification system effectively solves the problem of false gate failures for non-code tasks. The code is clean, well-documented, and follows project conventions.

✅ Strengths

  1. Excellent test coverage - 129 tests across 4 test files, covering classification logic, gate rules, configuration, and integration
  2. Clean architecture - Separation of concerns across task_classifier.py, quality_gate_rules.py, and quality_gates_config.py
  3. Good documentation - Comprehensive docstrings explaining usage and design decisions
  4. Performance optimizations - Pre-compiled regex patterns, frozenset for immutability
  5. Type safety - Good use of Python type hints throughout

🔴 High Priority Issues

1. Silent Invalid Configuration Handling

File: codeframe/config/quality_gates_config.py:69-78

The get_custom_gates_for_category() method silently skips invalid gate names without logging warnings:

for name in gate_names:
    try:
        gates.append(QualityGateType(name))
    except ValueError:
        # Invalid gate name, skip it  <-- No logging!
        pass

Impact: If a user has a typo in custom_category_rules (e.g., "test" instead of "tests"), the configuration will silently apply an unexpected subset of gates without any indication of the error.

Recommendation: Add logging for invalid gate names:

except ValueError as e:
    logger.warning(f"Invalid gate name '{name}' for category '{category}': {e}")
    pass

2. Coverage Gate Passes When pytest-cov Not Installed

File: codeframe/lib/quality_gates.py:982-993

When pytest-cov is not installed, the coverage gate passes with 100%:

if "unrecognized arguments: --cov" in output or "no module named 'pytest_cov'" in output.lower():
    return {
        "returncode": 0,  # <-- Passes the gate!
        "output": "pytest-cov not installed, skipping coverage check",
        "coverage_pct": 100.0,
        "skipped": True,
    }

Impact: Users may think they have 100% coverage when they don't even have the coverage plugin installed. This could lead to undetected quality issues.

Recommendation: Consider:

  • Option A: Fail the gate when coverage tool is not available (stricter)
  • Option B: Add a warning-level log entry indicating coverage wasn't actually measured
  • Option C: Return a distinct status (e.g., "skipped") instead of "passed"
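Option C could look roughly like this, with a hypothetical handle_coverage_output helper standing in for the module's actual code (real percentage parsing omitted for brevity):

```python
def handle_coverage_output(returncode: int, output: str) -> dict:
    """Interpret coverage run output; report a distinct 'skipped' status
    instead of a fake 100% pass when pytest-cov is missing (Option C)."""
    output_lower = output.lower()
    if ("unrecognized arguments: --cov" in output_lower
            or "no module named 'pytest_cov'" in output_lower):
        return {
            "status": "skipped",
            "output": "pytest-cov not installed; coverage was not measured",
            "coverage_pct": None,  # None rather than 100.0: nothing was measured
        }
    return {
        "status": "passed" if returncode == 0 else "failed",
        "output": output,
        "coverage_pct": None,  # real implementation would parse the percentage here
    }
```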

3. Database Update Lacks Error Handling

File: codeframe/lib/quality_gates.py:694-699

Direct database cursor usage without proper error handling:

cursor = self.db.conn.cursor()  # type: ignore[union-attr]
cursor.execute(
    "UPDATE tasks SET requires_human_approval = 1 WHERE id = ?",
    (task.id,),
)
self.db.conn.commit()  # type: ignore[union-attr]

Impact: If the UPDATE fails (e.g., constraint violation, connection issue), the exception will propagate, but there's no logging or transaction rollback. This could leave the database in an inconsistent state.

Recommendation: Wrap in try-except or use repository methods:

try:
    cursor = self.db.conn.cursor()
    cursor.execute(
        "UPDATE tasks SET requires_human_approval = 1 WHERE id = ?",
        (task.id,),
    )
    self.db.conn.commit()
except Exception as e:
    self.db.conn.rollback()
    logger.error(f"Failed to mark task {task.id} for human approval: {e}")

🟡 Medium Priority Issues

4. MIXED Category Logic Could Be More Sophisticated

File: codeframe/lib/task_classifier.py:165-168

The MIXED category only triggers when strong code + strong non-code keywords are present:

has_strong_non_code = has_design or has_doc
if has_strong_non_code and has_code_strong:
    return TaskCategory.MIXED

Impact: Tasks like "Update configuration for test environment" (config + testing) won't be classified as MIXED. Weak code keywords (create, add) don't trigger MIXED.

Recommendation: Consider a scoring system where multiple category signals produce MIXED, not just strong code + strong non-code.
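A minimal sketch of that scoring approach, using hypothetical keyword sets (the real sets live in `task_classifier.py`): any task matching two or more categories becomes MIXED, regardless of keyword "strength":

```python
import re

# Hypothetical keyword sets for illustration only.
CATEGORY_KEYWORDS = {
    "design": frozenset({"design", "architecture", "schema", "diagram"}),
    "documentation": frozenset({"document", "readme", "docs"}),
    "configuration": frozenset({"configure", "config", "configuration", "environment"}),
    "testing": frozenset({"test", "pytest", "coverage"}),
    "code": frozenset({"implement", "refactor", "fix", "endpoint"}),
}

def score_categories(text: str) -> dict[str, int]:
    """Count keyword hits per category over whole words."""
    words = set(re.findall(r"[a-z_]+", text.lower()))
    return {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}

def classify(text: str) -> str:
    scores = score_categories(text)
    hits = [cat for cat, n in scores.items() if n > 0]
    if len(hits) >= 2:  # multiple category signals -> MIXED
        return "mixed"
    return hits[0] if hits else "code"  # default to code when nothing matches
```

Under this scheme "Update configuration for test environment" scores in both configuration and testing, so it classifies as mixed rather than falling through to a single category.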

5. Design Task Review May Not Have Code to Review

File: codeframe/lib/quality_gate_rules.py:39-41

DESIGN tasks only have CODE_REVIEW gate:

TaskCategory.DESIGN: [
    QualityGateType.CODE_REVIEW,
],

Impact: If a design task produces code artifacts (e.g., database schema files), the review gate will run on them. But if the task only produces documentation/diagrams, there's nothing to review, and the gate may still run unnecessarily.

Recommendation: Consider if DESIGN tasks should have a dedicated design review gate, or clarify what the code review gate is supposed to review for design tasks.

🔵 Low Priority / Suggestions

6. Redundant .value Access on QualityGateType

File: codeframe/lib/quality_gates.py:708, 717, 726, 735, 744, 753

QualityGateType is defined as str, Enum, so enum members are already strings. Adding .value is redundant (but not incorrect):

skipped_gates.append(QualityGateType.LINTING.value)  # LINTING is already a string

Recommendation: For consistency, either use .value everywhere or nowhere. Given QualityGateType is str, Enum, direct access without .value is cleaner.
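To illustrate (with a simplified stand-in for the project's enum): members of a `str, Enum` subclass already are strings, so comparisons and membership tests work without `.value`:

```python
from enum import Enum

class QualityGateType(str, Enum):  # simplified stand-in, not the project class
    LINTING = "linting"
    TESTS = "tests"

# Each member IS a string, so equality and membership need no .value:
assert QualityGateType.LINTING == "linting"
assert isinstance(QualityGateType.LINTING, str)

skipped = [QualityGateType.LINTING]
assert "linting" in skipped

# .value remains the unambiguous choice when serializing, since str() or
# f-strings of a str-Enum member can render the member name
# ("QualityGateType.LINTING") rather than the value on some Python versions.
assert QualityGateType.LINTING.value == "linting"
```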

7. Consider Adding Classification Metrics

No tracking of classification accuracy or distribution. Adding metrics would help:

  • Monitor which categories are most common
  • Identify misclassifications
  • Tune keyword sets over time
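A lightweight in-process counter would be enough to start; `ClassificationMetrics` below is a hypothetical sketch, not an existing module:

```python
from collections import Counter

class ClassificationMetrics:
    """Hypothetical counter tracking how often each category is assigned."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def record(self, category: str) -> None:
        self._counts[category] += 1

    def distribution(self) -> dict[str, float]:
        """Fraction of classifications per category (empty dict if none yet)."""
        total = sum(self._counts.values())
        return {c: n / total for c, n in self._counts.items()} if total else {}

metrics = ClassificationMetrics()
for cat in ["code_implementation", "design", "code_implementation"]:
    metrics.record(cat)
```

Periodically logging `distribution()` would surface skewed or surprising category mixes worth investigating.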

✅ Test Coverage Assessment

Coverage: Excellent (129 tests)

  • test_task_classifier.py: 39 tests - comprehensive keyword and priority testing
  • test_quality_gate_rules.py: 19 tests - complete category-to-gate mapping tests
  • test_quality_gates_config.py: 17 tests - good configuration parsing tests
  • test_quality_gates.py: 54 tests (with selective gate tests added)

Missing Tests:

  1. Invalid custom_category_rules edge cases (e.g., non-dict values)
  2. Concurrent access to singleton config (thread safety)
  3. Edge cases: task with empty title AND description
  4. Integration test: full workflow with strict_mode enabled

✅ Performance Assessment

Good practices observed:

  • Pre-compiled regex patterns in TaskClassifier.__init__
  • Frozenset for immutability of keyword sets
  • Appropriate subprocess timeouts (60-300s)

Minor optimizations possible:

  • Cache TaskClassifier instance instead of creating new one in run_all_gates (if called repeatedly)
  • Consider using asyncio.gather() for parallel gate execution (careful with dependencies)
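The caching suggestion can be as simple as an `lru_cache`-backed accessor; `get_task_classifier` and the toy `TaskClassifier` below are illustrative names, not the project's actual API:

```python
from functools import lru_cache

class TaskClassifier:
    """Toy stand-in; the real class pre-compiles regexes in __init__."""

    def classify(self, title: str) -> str:
        return "design" if "design" in title.lower() else "code_implementation"

@lru_cache(maxsize=1)
def get_task_classifier() -> TaskClassifier:
    """Build the (relatively expensive) classifier once and reuse it
    across repeated run_all_gates() calls."""
    return TaskClassifier()
```

Since the classifier is stateless after construction, sharing one instance this way is safe and avoids recompiling the regex patterns per call.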

✅ Security Assessment

No major security concerns. The implementation follows good practices:

  • No command injection risk (subprocess calls with list args, not strings)
  • Input sanitization for LLM prompts (already present in worker_agent.py)
  • No sensitive data in logs

Minor suggestions:

  • The task classifier processes task titles/descriptions from database without additional sanitization. While regex word boundaries provide some protection, consider what happens with very long or unusual input strings.

📝 Summary

This is a solid implementation that effectively solves the stated problem. The test coverage is excellent, and the code quality is high. The main issues are:

  1. Silent configuration failures - Add logging for invalid gate names
  2. Coverage gate passes when tool unavailable - Consider stricter handling
  3. Database error handling - Add try-except for cursor operations

These issues should be addressed before merging. The medium/low priority issues can be deferred to follow-up PRs.

Recommendation: Address the 3 high-priority issues and merge. The other points can be tracked in a follow-up issue.

@github-actions
Copy link
Contributor

Additional Review Comments

After reviewing the PR and examining existing reviews from @github-actions, @macroscopeapp, and @coderabbitai, I've identified several additional issues not yet covered:


🟡 Medium Priority Issues

8. custom_category_rules Cannot Be Configured via Environment Variables

File: codeframe/config/quality_gates_config.py:118-123

The custom_category_rules field exists in the dataclass and is tested, but there is no environment variable to configure it:

# In get_quality_gates_config()
if _config is None:
    _config = QualityGatesConfig(
        enable_task_classification=_parse_bool_env(
            "QUALITY_GATES_ENABLE_TASK_CLASSIFICATION", True
        ),
        strict_mode=_parse_bool_env("QUALITY_GATES_STRICT_MODE", False),
        # custom_category_rules is never set from env vars!
    )

Impact: Users cannot actually use the custom category rules feature without modifying code. The tests pass values directly to the dataclass constructor, but in production this will always be None.

Recommendation: Add environment variable support for JSON-based custom rules:

# Add JSON parsing and env var support
import json

custom_category_rules_raw = os.environ.get("QUALITY_GATES_CUSTOM_CATEGORY_RULES")
custom_category_rules = None
if custom_category_rules_raw:
    try:
        custom_category_rules = json.loads(custom_category_rules_raw)
    except json.JSONDecodeError as e:
        logger.warning(f"Failed to parse QUALITY_GATES_CUSTOM_CATEGORY_RULES: {e}")

_config = QualityGatesConfig(
    enable_task_classification=_parse_bool_env(
        "QUALITY_GATES_ENABLE_TASK_CLASSIFICATION", True
    ),
    strict_mode=_parse_bool_env("QUALITY_GATES_STRICT_MODE", False),
    custom_category_rules=custom_category_rules,  # Add this
)

9. No Validation/Warning When Custom Rules Are Ignored

File: codeframe/lib/quality_gates.py:671-676

When custom rules result in an empty gate list or all invalid gate names, code silently falls back to default rules without any indication:

# In run_all_gates()
if config.should_use_task_classification():
    # Check for custom rules first
    custom_gates = config.get_custom_gates_for_category(category.value)
    if custom_gates is not None:
        applicable_gates = custom_gates  # User expects custom rules to apply
    else:
        applicable_gates = rules.get_applicable_gates(category)
        # No warning that custom rules were ignored!

Impact: If a user configures a custom rule but all gate names are invalid (e.g., typos like "test" instead of "tests"), the code silently falls back to default rules. The user won't know their configuration was ignored.

Recommendation: Add logging when custom rules are ignored despite being configured:

if config.should_use_task_classification():
    custom_gates = config.get_custom_gates_for_category(category.value)
    if custom_gates is not None:
        applicable_gates = custom_gates
    else:
        applicable_gates = rules.get_applicable_gates(category)
        # Warn if custom rules were configured but ignored
        if config.custom_category_rules and category.value in config.custom_category_rules:
            logger.warning(
                f"Custom category rules for '{category.value}' were ignored. "
                f"This happens if all gate names are invalid. "
                f"Valid gates: {[g.value for g in QualityGateType]}"
            )

🔵 Low Priority / Suggestions

10. Document How to Use custom_category_rules

Since the feature exists but cannot be configured via environment variables (issue #8), documentation would help:

Example JSON format (if issue #8 is implemented):

export QUALITY_GATES_CUSTOM_CATEGORY_RULES='{"design": ["code_review", "tests"], "documentation": ["linting", "type_check"]}'

11. Consider Thread Safety for Singleton Config

File: codeframe/config/quality_gates_config.py:100-125

The singleton pattern with global _config variable is not thread-safe:

# Singleton config instance
_config: Optional[QualityGatesConfig] = None

def get_quality_gates_config() -> QualityGatesConfig:
    global _config
    
    if _config is None:  # Race condition: multiple threads could pass this check
        _config = QualityGatesConfig(...)
    return _config

Impact: In a multi-threaded environment (e.g., if multiple background tasks run quality gates simultaneously), there is a theoretical race condition where multiple QualityGatesConfig objects could be created. The last one wins, but this is unlikely to cause actual issues since both would have the same values (from environment variables).

Recommendation: For production use with concurrent quality gate execution, consider adding a lock or using threading.local(). For now, this is likely not critical since quality gates run sequentially per task.


📝 Summary of Additional Issues

Priority Issue Status
🟡 Medium custom_category_rules cannot be configured via environment Should address
🟡 Medium No validation/warning when custom rules produce empty gate list Should address
🔵 Low Document how to use custom_category_rules Nice to have
🔵 Low Consider thread safety for singleton config Can defer

These issues complement the excellent findings in the existing reviews. The PR has solid test coverage and good architecture, but the above points should be addressed before or shortly after merging.

@github-actions
Copy link
Contributor

I've completed my review of PR #261 and posted a comprehensive comment with findings not covered in previous reviews.

Summary of My Review

The PR effectively implements task classification for selective quality gate execution. The existing reviews from @github-actions, @macroscopeapp, and @coderabbitai covered most issues. I identified 2 additional medium-priority issues:

Medium Priority Issues

  1. custom_category_rules cannot be configured via environment variables (quality_gates_config.py:118-123)

    • Feature exists but there's no QUALITY_GATES_CUSTOM_CATEGORY_RULES env var
    • Users cannot actually use custom rules without modifying code
    • Recommendation: Add JSON-based environment variable support
  2. No validation/warning when custom rules are ignored (quality_gates.py:671-676)

    • If all gate names in custom rules are invalid, code silently falls back to defaults
    • User won't know their configuration was ignored
    • Recommendation: Add warning when custom rules exist but produce empty list

Low Priority Suggestions

  1. Document custom_category_rules usage - Feature needs documentation
  2. Consider thread safety for singleton config - Theoretical race condition in multi-threaded environment

The PR has excellent test coverage (129 tests), clean architecture, and good documentation. Address the issues above along with the high-priority issues from the existing review before merging.


@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 13, 2026

Tip

For best results, initiate chat on the files or code changes.

@github-actions, thank you for this thorough additional analysis! These are excellent findings that complement the existing reviews. Let me provide feedback on each:

Issue 8: Missing Environment Variable Support ✅ Valid

This is a genuine gap in the implementation. The feature is implemented and tested but cannot be used in production. Your JSON-based environment variable suggestion is the right approach:

Recommendations:

  1. Add the JSON parsing logic you suggested
  2. Consider adding schema validation for the parsed JSON structure
  3. Add error handling for malformed JSON with informative logging
  4. Update tests to verify environment variable parsing works correctly

Alternative consideration: If JSON in environment variables feels clunky, you could also support a file-based configuration:

config_file = os.environ.get("QUALITY_GATES_CONFIG_FILE")
if config_file and os.path.exists(config_file):
    with open(config_file) as f:
        custom_category_rules = json.load(f).get("custom_category_rules")

Issue 9: Silent Fallback Without Warning ✅ Valid

This is an important user experience issue. Your logging suggestion is spot-on. I'd slightly refine it to also log when valid custom gates are being used:

if custom_gates is not None:
    applicable_gates = custom_gates
    logger.info(
        f"Using custom gate rules for category '{category.value}': "
        f"{[g.value for g in custom_gates]}"
    )
else:
    applicable_gates = rules.get_applicable_gates(category)
    if config.custom_category_rules and category.value in config.custom_category_rules:
        logger.warning(
            f"Custom rules for '{category.value}' were ignored due to invalid gate names. "
            f"Configured: {config.custom_category_rules[category.value]}. "
            f"Valid gates: {[g.value for g in QualityGateType]}"
        )

Issue 10: Documentation Gap ✅ Valid

Agreed. Once Issue 8 is implemented, documentation should include:

  • Environment variable format and examples
  • Valid gate type names (tests, coverage, type_check, linting, code_review, skip_detection)
  • Behavior when invalid gate names are provided
  • Examples for common use cases

Issue 11: Thread Safety ⚠️ Low Risk but Worth Noting

Your assessment is correct. For the current sequential execution model, this is not an issue. However:

If future enhancements add concurrency, the fix is straightforward using threading.Lock:

import threading

_config: Optional[QualityGatesConfig] = None
_config_lock = threading.Lock()

def get_quality_gates_config() -> QualityGatesConfig:
    global _config
    if _config is None:
        with _config_lock:
            if _config is None:  # Double-check pattern
                _config = QualityGatesConfig(...)
    return _config

Recommended Priority

Before merge:

  • ✅ Issue 8 (blocks the feature from being usable)
  • ✅ Issue 9 (prevents silent config failures)

Shortly after merge:

  • 📝 Issue 10 (documentation)

Future consideration:

  • 🔮 Issue 11 (only if concurrent execution is added)

Great catch on the custom_category_rules configuration gap—that's the kind of issue that would be frustrating to discover in production!

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 13, 2026


@github-actions, thank you for the comprehensive summary! This is a great consolidation of the key findings.

Quick Recap for @frankbria

Your PR is in excellent shape with good test coverage and architecture. Here's what needs attention before merge:

Must address:

  • Issue 8: Add environment variable support for custom_category_rules (currently not configurable in production)
  • Issue 9: Add warning logs when custom rules are silently ignored due to invalid gate names

Nice to have:

  • Issue 10: Document the custom rules feature once Issue 8 is implemented
  • Issue 11: Thread safety consideration (only if you plan to add concurrent execution in the future)

The implementation is solid—these are polish items to make the feature production-ready and user-friendly. Once Issues 8 and 9 are addressed, this will be ready to merge! 🚀

- Replace hardcoded gate count with len(QualityGateType)
- Pass task_category through gate methods to _create_quality_blocker
- Add logging for invalid gate names in custom config rules
- Add warning when coverage check skipped (pytest-cov not installed)
- Add error handling for database update in run_all_gates
- Normalize task title/description to handle None values
- Return copy of gate rules list to prevent mutation
@github-actions
Copy link
Contributor

Additional Review Findings

I've analyzed this PR thoroughly, reviewing the code changes and considering the excellent feedback from previous reviews (@github-actions, @macroscopeapp, @coderabbitai).


✅ Issues Already Addressed in Latest Code

Looking at the current codebase, several high-priority issues mentioned in earlier reviews have already been fixed:

  1. Silent invalid configuration handling

    • FIXED: Lines 80-83 now log a warning for each invalid gate name.
  2. Database update lacks error handling

    • FIXED: Lines 701-712 now wrap the update in a try-except with rollback and error logging.
  3. Hardcoded "6" for gate count

    • FIXED: Line 1402 now calculates the total dynamically via len(QualityGateType).
  4. Inputs not normalized in task_classifier

    • FIXED: Lines 141-143 now handle None title/description values properly.
  5. Returning lists directly from get_applicable_gates

    • FIXED: Line 125 now returns a copy to prevent mutation of the rules.

🔴 Remaining High Priority Issue

Issue 1: custom_category_rules Cannot Be Configured via Environment Variables

File:

The feature exists and is tested, but there's no environment variable to configure it in production. Users cannot use custom gate rules without modifying code.

Current code:

Recommendation: Add JSON-based environment variable support:

Why this matters:

  • Feature is partially implemented (code exists, tests pass)
  • But users cannot actually use it in production
  • Could lead to confusion when users try to configure custom rules

🟡 Medium Priority Issues

Issue 2: No Warning When Custom Rules Produce Empty Gate List

File:

If a user configures custom rules but all gate names are invalid (e.g., typos), the code silently falls back to default rules. Users won't know their configuration was ignored.

Recommendation: Add warning when custom rules exist but produce empty list:


Issue 3: Coverage Gate May Mislead When pytest-cov Not Installed

File:

When pytest-cov is not installed, the gate passes with returncode 0 and coverage_pct of 100.0. This could mislead users into thinking they have perfect coverage when no measurement actually occurred.

Current behavior:

Consideration: This is intentional (don't block tasks when tool unavailable), but there are alternatives:

  • Option A (Stricter): Fail the gate when coverage tool is unavailable
  • Option B (Current): Pass but log warning that coverage wasn't actually measured
  • Option C: Return distinct status (e.g., "unavailable") instead of "passed"

The current behavior is reasonable for a development environment, but consider documenting this limitation clearly.


🔵 Low Priority / Suggestions

Issue 4: Document custom_category_rules Usage

Once Issue #1 is implemented, add documentation:

  • Environment variable format and JSON structure
  • Valid gate type names: tests, coverage, type_check, linting, code_review, skip_detection
  • Example configurations for common use cases
  • Behavior when invalid gate names are provided

Issue 5: Consider Thread Safety for Singleton Config

File:

The singleton pattern is not thread-safe. In concurrent execution (if implemented later), multiple objects could be created.

Impact: Low risk for current sequential execution. Only relevant if concurrent gate execution is added.

Recommendation: If concurrency is added in the future, use threading.Lock with a double-checked initialization.


📊 Overall Assessment

Code Quality: ⭐⭐⭐⭐⭐ Excellent

  • Clean architecture with clear separation of concerns
  • Comprehensive test coverage (129 tests)
  • Good use of Python type hints and docstrings
  • Performance optimizations (pre-compiled regex, frozenset immutability)
  • Several high-priority issues from earlier reviews have already been addressed

Test Coverage: ⭐⭐⭐⭐⭐ Excellent

  • 129 tests across 4 test files
  • Good edge case coverage
  • Integration tests validate end-to-end behavior

Security: ⭐⭐⭐⭐⭐ No concerns

  • No command injection risks
  • Input sanitization via regex word boundaries
  • No sensitive data in logs

🎯 Recommended Action Plan

Before merge:

  1. Add environment variable support for custom_category_rules (Issue #1)
  2. Add a warning when custom rules are silently ignored (Issue #2)

After merge:

  1. Document custom_category_rules usage (Issue #4)
  2. Revisit thread safety if concurrent execution is added (Issue #5)


✨ Summary

This is a well-implemented feature that effectively solves the problem of false gate failures for non-code tasks. The test coverage is excellent, and the code quality is high.

Key observation: Several high-priority issues mentioned in earlier reviews have already been fixed in the current codebase. The main remaining issue is the configuration gap for custom_category_rules (Issue #1), which prevents users from actually using this feature in production.

Once Issues #1 and #2 are addressed, this PR will be ready to merge! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 13, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify Issue #1: Check if custom_category_rules can be configured via environment variables

echo "=== Checking quality_gates_config.py for environment variable support ==="
cat codeframe/config/quality_gates_config.py | grep -A 20 "def get_quality_gates_config"

echo ""
echo "=== Looking for any JSON parsing of custom_category_rules ==="
rg -n "custom_category_rules" codeframe/config/quality_gates_config.py -A 5 -B 5

echo ""
echo "=== Checking environment variable names used ==="
rg -n "QUALITY_GATES_" codeframe/config/quality_gates_config.py

Length of output: 3460


🏁 Script executed:

#!/bin/bash
# Check Issue #2: Warning when custom rules produce empty gate list

echo "=== Checking for warning logs in quality_gates.py when custom gates are empty ==="
rg -n "custom_gates" codeframe/lib/quality_gates.py -A 10 -B 5

echo ""
echo "=== Checking get_custom_gates_for_category implementation ==="
cat codeframe/config/quality_gates_config.py | grep -A 30 "def get_custom_gates_for_category"

Length of output: 2146


🏁 Script executed:

#!/bin/bash
# Check Issue #3: Coverage gate behavior when pytest-cov not installed

echo "=== Checking _run_coverage implementation ==="
rg -n "_run_coverage" codeframe/lib/quality_gates.py -A 40 | head -50

echo ""
echo "=== Looking for pytest-cov detection ==="
rg -n "pytest-cov" codeframe/lib/quality_gates.py -A 5 -B 5

Length of output: 4039


@frankbria, I've verified the findings from the comprehensive review and can confirm the analysis is accurate. Here's my assessment based on the current code:

✅ Verified Critical Issue

Issue #1: custom_category_rules Cannot Be Configured

Confirmed - the field exists in the dataclass but is never populated from environment variables:

# Line 122-130 in quality_gates_config.py
_config = QualityGatesConfig(
    enable_task_classification=_parse_bool_env("QUALITY_GATES_ENABLE_TASK_CLASSIFICATION", True),
    strict_mode=_parse_bool_env("QUALITY_GATES_STRICT_MODE", False),
    # ❌ custom_category_rules is missing here
)

Recommended fix (add to get_quality_gates_config()):

def _parse_custom_rules_env() -> Optional[Dict[str, List[str]]]:
    """Parse custom category rules from environment variable."""
    rules_json = os.environ.get("QUALITY_GATES_CUSTOM_CATEGORY_RULES")
    if not rules_json:
        return None
    
    try:
        rules = json.loads(rules_json)
        if not isinstance(rules, dict):
            logger.error("QUALITY_GATES_CUSTOM_CATEGORY_RULES must be a JSON object")
            return None
        return rules
    except json.JSONDecodeError as e:
        logger.error(f"Failed to parse QUALITY_GATES_CUSTOM_CATEGORY_RULES: {e}")
        return None

# Then in get_quality_gates_config():
_config = QualityGatesConfig(
    enable_task_classification=_parse_bool_env("QUALITY_GATES_ENABLE_TASK_CLASSIFICATION", True),
    strict_mode=_parse_bool_env("QUALITY_GATES_STRICT_MODE", False),
    custom_category_rules=_parse_custom_rules_env(),  # ✅ Add this
)

⚠️ Partially Addressed Issue

Issue #2: Silent Fallback When All Custom Gates Invalid

The code does warn for each invalid gate name (lines 80-83), but when all gates are invalid, get_custom_gates_for_category() returns None and silently falls back to default rules. Users won't realize their entire configuration was ignored.

Current behavior at the call site (line 679-683):

custom_gates = config.get_custom_gates_for_category(category.value)
if custom_gates is not None:
    applicable_gates = custom_gates
else:
    applicable_gates = rules.get_applicable_gates(category)
    # ⚠️ No warning here that we fell back due to invalid custom config

Enhancement suggestion - add logging when custom rules exist but were ignored:

custom_gates = config.get_custom_gates_for_category(category.value)
if custom_gates is not None:
    applicable_gates = custom_gates
    logger.info(f"Using custom gates for '{category.value}': {[g.value for g in custom_gates]}")
else:
    applicable_gates = rules.get_applicable_gates(category)
    # Warn if custom rules existed but all were invalid
    if config.custom_category_rules and category.value in config.custom_category_rules:
        logger.warning(
            f"All custom gates for '{category.value}' were invalid. "
            f"Falling back to default gates: {[g.value for g in applicable_gates]}"
        )

✅ Working As Designed (With Caveats)

Issue #3: Coverage Returns 100% When Tool Missing

Verified - the code correctly detects missing pytest-cov (lines 995-1006) and does log a warning (lines 307-311):

if coverage_result.get("skipped"):
    logger.warning(
        f"Coverage check skipped for task {task.id}: {coverage_result.get('skip_reason', 'unknown reason')}. "
        "Coverage gate passed by default but actual coverage was not measured."
    )

The behavior is intentional and documented, but the 100.0% value could be misleading. This is a design trade-off between:

  • ✅ Not blocking tasks when the tool is unavailable (current behavior)
  • ❌ Blocking tasks until the tool is installed (stricter, but could frustrate users)

My recommendation: Keep current behavior but consider a distinct sentinel value or status field in the result to distinguish "measured at 100%" from "tool unavailable".


📋 Action Plan

Before merge (required):

  1. Issue #1: Add QUALITY_GATES_CUSTOM_CATEGORY_RULES environment variable support with JSON parsing
  2. Issue #2: Add warning log when falling back from invalid custom rules

After merge (nice-to-have):
3. 📝 Documentation: Add examples and valid gate names to docstrings/README
4. 🔮 Thread safety: Only needed if concurrent execution is added later

The PR is architecturally solid - these are polish items to make the feature production-ready! 🚀

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @codeframe/config/quality_gates_config.py:
- Around line 110-132: get_quality_gates_config currently does not populate the
QualityGatesConfig.custom_category_rules from env, so add loading of
QUALITY_GATES_CUSTOM_RULES inside get_quality_gates_config when _config is None:
read os.environ.get("QUALITY_GATES_CUSTOM_RULES"), if present parse with
json.loads into the expected structure (fallback to [] or {} matching
QualityGatesConfig default), catch JSONDecodeError and log/error then use the
default, and pass the parsed value into the QualityGatesConfig constructor along
with enable_task_classification and strict_mode so _config includes
custom_category_rules.
🧹 Nitpick comments (5)
codeframe/lib/quality_gate_rules.py (2)

94-99: Missing skip reasons for TESTING category.

The TESTING category skips TYPE_CHECK, LINTING, and CODE_REVIEW gates (per _GATE_RULES), but _SKIP_REASONS only defines reasons for these three gates. The generic fallback message (line 158) will be used, but explicit reasons would provide better context to users.

📝 Suggested addition to _SKIP_REASONS
     TaskCategory.TESTING: {
         QualityGateType.TYPE_CHECK: "Test code may use dynamic patterns that fail type checks",
         QualityGateType.LINTING: "Test code may use patterns that trigger linting warnings",
         QualityGateType.CODE_REVIEW: "Test code is validated through execution rather than review",
     },
+    TaskCategory.CODE_IMPLEMENTATION: {},
+    TaskCategory.REFACTORING: {},
+    TaskCategory.MIXED: {},
 }

128-139: Minor inefficiency: get_applicable_gates called redundantly.

should_skip_gate calls get_applicable_gates, which creates a new list each time. For frequent calls, this could be optimized by checking directly against _GATE_RULES. However, given the current usage pattern, this is acceptable.

♻️ Optional optimization
     def should_skip_gate(self, category: TaskCategory, gate: QualityGateType) -> bool:
-        applicable_gates = self.get_applicable_gates(category)
-        return gate not in applicable_gates
+        gates = _GATE_RULES.get(category, _GATE_RULES[TaskCategory.MIXED])
+        return gate not in gates
codeframe/agents/worker_agent.py (1)

1400-1411: Consider simplifying with getattr instead of repeated hasattr checks.

The hasattr checks suggest uncertainty about QualityGateResult's attributes. If QualityGateResult always has skipped_gates and task_category (as the AI summary indicates), these checks are unnecessary. If backward compatibility is needed, getattr is cleaner.

♻️ Suggested simplification
                 # Build completion message with task classification info
                 from codeframe.core.models import QualityGateType
                 total_gates = len(QualityGateType)
-                skipped_count = quality_result.gates_skipped_count if hasattr(quality_result, 'gates_skipped_count') else 0
-                task_category = quality_result.task_category if hasattr(quality_result, 'task_category') else None
+                skipped_count = getattr(quality_result, 'gates_skipped_count', 0)
+                task_category = getattr(quality_result, 'task_category', None)
codeframe/config/quality_gates_config.py (1)

106-132: Thread safety note for future concurrent execution.

The singleton pattern using a module-level _config variable is safe for sequential execution. If concurrent task processing is added, consider adding threading.Lock protection around the initialization check.

As noted in PR comments (Issue 11), this is low priority unless concurrency is introduced.

codeframe/lib/quality_gates.py (1)

677-690: Consider logging when custom rules are applied.

When custom_gates is not None (custom rules are used), there's no log message indicating custom rules override defaults. This would help with debugging configuration issues.

📝 Suggested logging addition
         if config.should_use_task_classification():
             # Check for custom rules first
             custom_gates = config.get_custom_gates_for_category(category.value)
             if custom_gates is not None:
                 applicable_gates = custom_gates
+                logger.info(
+                    f"Using custom gate rules for category '{category.value}': "
+                    f"{[g.value for g in applicable_gates]}"
+                )
             else:
                 applicable_gates = rules.get_applicable_gates(category)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77bacbd and dabbf12.

📒 Files selected for processing (5)
  • codeframe/agents/worker_agent.py
  • codeframe/config/quality_gates_config.py
  • codeframe/lib/quality_gate_rules.py
  • codeframe/lib/quality_gates.py
  • codeframe/lib/task_classifier.py
🧰 Additional context used
📓 Path-based instructions (1)
codeframe/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

codeframe/**/*.py: Use Python 3.11+ for backend development
Use FastAPI framework for backend API development
Use SQLite with async support (aiosqlite) for database operations
Use tiktoken for token counting
Run linting with ruff check .
Import and use domain-specific repositories from codeframe/persistence/repositories/ for data access instead of direct database calls

Files:

  • codeframe/lib/quality_gate_rules.py
  • codeframe/agents/worker_agent.py
  • codeframe/lib/task_classifier.py
  • codeframe/lib/quality_gates.py
  • codeframe/config/quality_gates_config.py
🧠 Learnings (1)
📚 Learning: 2025-12-17T19:21:40.014Z
Learnt from: frankbria
Repo: frankbria/codeframe PR: 128
File: tests/agents/test_bottleneck_detection.py:486-500
Timestamp: 2025-12-17T19:21:40.014Z
Learning: In tests/agents/test_bottleneck_detection.py, test_skip_agents_below_threshold should be async and mock _get_agent_workload to return workload below AGENT_OVERLOAD_THRESHOLD (5) while providing non-empty tasks list to prevent early return in detect_bottlenecks().

Applied to files:

  • codeframe/agents/worker_agent.py
🧬 Code graph analysis (4)
codeframe/agents/worker_agent.py (2)
codeframe/core/models.py (2)
  • QualityGateType (158-166)
  • gates_skipped_count (938-940)
web-ui/src/types/qualityGates.ts (1)
  • QualityGateType (12-12)
codeframe/lib/task_classifier.py (2)
codeframe/core/models.py (2)
  • Task (257-305)
  • title (242-243)
tests/lib/test_quality_gates.py (3)
  • task (56-79)
  • task (467-488)
  • task (901-924)
codeframe/lib/quality_gates.py (5)
codeframe/config/quality_gates_config.py (3)
  • get_quality_gates_config (110-132)
  • should_use_task_classification (46-54)
  • get_custom_gates_for_category (56-85)
codeframe/lib/task_classifier.py (2)
  • TaskClassifier (94-188)
  • classify_task (122-188)
codeframe/lib/quality_gate_rules.py (4)
  • QualityGateRules (102-159)
  • get_applicable_gates (115-126)
  • all_gates (111-113)
  • get_skip_reason (141-159)
codeframe/core/models.py (5)
  • Task (257-305)
  • QualityGateResult (906-940)
  • id (230-231)
  • QualityGateType (158-166)
  • status (246-247)
codeframe/agents/worker_agent.py (1)
  • _create_quality_blocker (1462-1527)
codeframe/config/quality_gates_config.py (1)
codeframe/core/models.py (1)
  • QualityGateType (158-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Frontend Unit Tests
  • GitHub Check: Backend Unit Tests
  • GitHub Check: E2E Smoke Tests (Chromium)
  • GitHub Check: opencode-review
🔇 Additional comments (17)
codeframe/lib/quality_gate_rules.py (1)

110-126: LGTM! Defensive coding pattern properly implemented.

The all_gates property returns a new list, and get_applicable_gates returns a copy via list(gates). This prevents external mutation of internal state.

codeframe/lib/task_classifier.py (4)

31-43: LGTM! Good use of str mixin for JSON serialization.

Inheriting from str and Enum ensures TaskCategory.DESIGN.value returns "design" and the enum serializes cleanly to JSON.
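As a quick illustration of why the mixin matters (the class and members below are a minimal stand-in, not the actual codeframe definitions):

```python
import json
from enum import Enum

class TaskCategory(str, Enum):
    # Minimal stand-in for the enum under review.
    DESIGN = "design"
    CODE_IMPLEMENTATION = "code_implementation"

# Because each member IS a str, json.dumps needs no custom encoder:
payload = json.dumps({"category": TaskCategory.DESIGN})
```

Without the `str` mixin, `json.dumps` would raise `TypeError: Object of type TaskCategory is not JSON serializable`.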


115-120: LGTM! Efficient regex pattern compilation.

Sorting keywords by length descending ensures longer keywords match first (e.g., "documentation" before "doc"), and using re.escape prevents regex injection from keyword values.
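A compact sketch of that compilation strategy (the keyword set here is illustrative, not the actual classifier's vocabulary):

```python
import re

KEYWORDS = frozenset({"doc", "documentation", "design", "api spec"})

# Longest-first ordering puts longer alternatives ahead of their prefixes,
# and re.escape neutralizes any regex metacharacters inside keywords.
_pattern = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def find_keywords(text: str) -> list[str]:
    return [m.lower() for m in _pattern.findall(text)]
```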


167-171: Verify: CONFIGURATION excluded from MIXED detection.

has_strong_non_code only includes has_design or has_doc, excluding has_config. This means a task with both "implement" and "configure" would classify as CODE_IMPLEMENTATION, not MIXED.

Is this intentional? Configuration + code tasks might warrant MIXED classification to ensure all gates run.


140-143: LGTM! Defensive None handling.

Using or "" pattern correctly handles None values for task.title and task.description, preventing TypeError when concatenating or calling .lower().

codeframe/agents/worker_agent.py (2)

1417-1426: LGTM! Success payload properly includes classification context.

The return payload correctly includes task_category and skipped_gates, enabling downstream consumers to understand which gates were applied.


1447-1460: LGTM! Failure path includes consistent classification context.

The blocked/failure return includes task_category and skipped_gates, maintaining consistency with the success path.

codeframe/config/quality_gates_config.py (2)

73-85: LGTM! Invalid gate names handled gracefully with logging (Issue 9).

The warning log at line 80-83 correctly notifies users when invalid gate names are provided in custom_category_rules, showing both the invalid name and valid options.


135-141: LGTM! Test-friendly cache reset.

reset_config() properly clears the cached configuration, enabling clean test isolation.

codeframe/lib/quality_gates.py (8)

55-58: LGTM! Clean imports for new classification modules.

Imports for configuration, classifier, and rules modules are properly organized.


701-712: LGTM! Robust error handling for database updates.

Wrapping the requires_human_approval flag update in try/except prevents quality gate execution from failing due to transient database errors.


306-311: LGTM! Helpful warning when coverage is skipped.

The warning clearly indicates the task ID, reason for skipping, and that the gate passed by default. This addresses observability concerns.


993-1006: LGTM! Graceful handling of missing pytest-cov.

The implementation correctly detects missing pytest-cov by checking output strings and returns a skip-like result with coverage_pct=100.0 to pass by default, with clear logging.


838-879: LGTM! Separation of test execution and coverage collection.

Running pytest without --cov flags ensures tests can run even without pytest-cov installed. Coverage is collected separately by _run_coverage().


1158-1188: LGTM! Blocker messages enhanced with category context.

The updated _create_quality_blocker method now includes task category and guidance notes, providing users with better context about which gates were applied and why.


1229-1267: LGTM! Category-specific guidance is helpful.

The _get_category_guidance method provides actionable guidance for each task category, helping users understand when to update task titles or split tasks.


778-779: LGTM! QualityGateResult correctly populated with classification data.

The result includes both skipped_gates and task_category, enabling downstream consumers to understand the gate execution context.

- Add QUALITY_GATES_CUSTOM_RULES env var for custom category-to-gates mapping
- Add _parse_json_env function with validation and error handling
- Add warning when all custom gate names are invalid (fallback to defaults)
- Add comprehensive tests for JSON parsing edge cases

Usage example:
QUALITY_GATES_CUSTOM_RULES='{"design": ["code_review"], "documentation": ["linting"]}'
@github-actions
Contributor

PR Review: Task Classification for Quality Gates

I've reviewed this PR and analyzed the code changes, considering the excellent feedback from previous reviews (@github-actions, @macroscopeapp, @coderabbitai).


✅ Issues Already Addressed in Latest Commits

Based on the git history, several high-priority issues have been fixed:

  1. custom_category_rules cannot be configured via environment variables ✅ FIXED

    • Commit b80f03e added _parse_json_env() function (lines 121-175)
    • Environment variable QUALITY_GATES_CUSTOM_RULES is now properly loaded (line 203)
  2. No warning when custom rules produce empty gate list ✅ FIXED

    • Lines 94-98 in quality_gates_config.py warn when all gate names are invalid
    • Users are notified that fallback to default rules occurred
  3. Silent invalid configuration handling ✅ FIXED

    • Lines 86-91 log warnings for each invalid gate name with valid options
  4. Database update lacks error handling ✅ FIXED

    • Lines 701-712 in quality_gates.py wrap database operations in try-except
    • Proper error logging without failing the gate
  5. Hardcoded gate count ✅ FIXED

    • Line 1402 in worker_agent.py uses total_gates = len(QualityGateType)

🟡 Minor Improvements Suggested

1. Add Info Log When Custom Rules Are Applied

File: codeframe/lib/quality_gates.py:680-683

When custom rules are used, there's no log indicating this configuration override. Adding an info log would improve debugging:

if custom_gates is not None:
    applicable_gates = custom_gates
    logger.info(
        f"Using custom gate rules for category '{category.value}': "
        f"{[g.value for g in applicable_gates]}"
    )

Rationale: While invalid custom rules trigger warnings (lines 94-98), valid custom rules are silently applied. Logging this would help verify configuration is working as expected.


2. Coverage Gate Behavior Clarification

File: codeframe/lib/quality_gates.py:1003

When pytest-cov is not installed, the gate passes with 100% coverage:

"coverage_pct": 100.0,  # Pass if tool not available

Current Behavior: Intentional design to not block tasks when tools are unavailable
Concern: Could mislead users into thinking they have perfect coverage

Suggestion: The current implementation is reasonable for development environments, and the warning at lines 996-999 provides context. However, consider:

  • Adding documentation explaining this behavior
  • Optionally returning a distinct status (e.g., "unavailable") instead of "skipped" to differentiate from intentionally skipped gates
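One way the distinct-status idea could look, sketched with illustrative field names and detection strings rather than the actual codeframe models:

```python
import re

def parse_total_pct(output: str) -> float:
    # Pull the percentage off a "TOTAL ... 87%" summary line.
    m = re.search(r"TOTAL\s+.*?(\d+)%", output)
    return float(m.group(1)) if m else 0.0

def coverage_result(output: str) -> dict:
    # Illustrative detection strings; real pytest output may vary.
    tool_missing = (
        "unrecognized arguments: --cov" in output
        or "No module named pytest_cov" in output
    )
    if tool_missing:
        return {
            "status": "unavailable",  # distinct from "skipped" and "passed"
            "coverage_pct": None,     # avoid reporting a misleading 100.0
            "detail": "pytest-cov not installed; coverage not measured",
        }
    return {"status": "measured", "coverage_pct": parse_total_pct(output)}
```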

3. Thread Safety Consideration (Low Priority)

File: codeframe/config/quality_gates_config.py:178-179

The singleton pattern is not thread-safe:

_config: Optional[QualityGatesConfig] = None

def get_quality_gates_config() -> QualityGatesConfig:
    global _config
    if _config is None:  # Potential race condition
        _config = QualityGatesConfig(...)
    return _config

Impact: Low risk for current sequential execution model. Only relevant if concurrent gate execution is added in the future.

Future-proofing: If concurrency is added, use threading.Lock():

import threading

_config_lock = threading.Lock()

def get_quality_gates_config() -> QualityGatesConfig:
    global _config
    if _config is None:
        with _config_lock:
            if _config is None:  # Double-check pattern
                _config = QualityGatesConfig(...)
    return _config

✅ Code Quality Assessment

Architecture: ⭐⭐⭐⭐⭐ Excellent

  • Clean separation of concerns between TaskClassifier, QualityGateRules, and QualityGatesConfig
  • Well-structured with clear responsibilities

Error Handling: ⭐⭐⭐⭐⭐ Excellent

  • Proper try-except for database operations
  • Comprehensive validation of JSON configuration
  • Graceful handling of missing pytest-cov

Logging: ⭐⭐⭐⭐⭐ Excellent

  • Informative logs for task classification (lines 692-695)
  • Warnings for invalid configuration
  • Good visibility into gate execution decisions

Performance: ⭐⭐⭐⭐⭐ Excellent

  • Pre-compiled regex patterns in TaskClassifier.__init__
  • Frozenset for immutability of keyword sets
  • Appropriate subprocess timeouts (60-300s)

Type Safety: ⭐⭐⭐⭐⭐ Excellent

  • Comprehensive type hints throughout
  • Proper use of Python 3.11+ features

📊 Test Coverage

Based on PR description: 129 tests across 4 test files

  • test_task_classifier.py: 39 tests
  • test_quality_gate_rules.py: 19 tests
  • test_quality_gates_config.py: 17 tests
  • test_quality_gates.py: 54 tests

Coverage: Excellent ✅

The tests cover:

  • Category classification accuracy with various keywords
  • Edge cases (empty descriptions, case insensitivity, word boundaries)
  • Configuration parsing and validation
  • Invalid gate name handling
  • Integration tests for selective gate execution

📝 Summary

This is a well-implemented feature that effectively solves the stated problem of false gate failures for non-code tasks. The code quality is high, test coverage is excellent, and most issues from previous reviews have been addressed.

Key Strengths:

  • Clean architecture with clear separation of concerns
  • Excellent test coverage (129 tests)
  • Good error handling and logging
  • Performance optimizations (pre-compiled patterns, immutability)
  • Backward compatibility (hasattr checks for new fields)

Action Items:

  • 🟡 Optional: Add info log when custom rules are applied (improves debugging)
  • 📝 After merge: Document QUALITY_GATES_CUSTOM_RULES usage in project docs
  • 🔮 Future: Add thread safety if concurrent gate execution is planned

Recommendation: ✅ Merge Ready

The suggested improvements are minor enhancements and don't block merging. The implementation is production-ready.


Thanks for the excellent work on this feature! 🚀

@github-actions
Contributor

I've completed my review of PR #261 and posted a comprehensive comment. Here's my summary:

Review Summary

This is a well-implemented feature with excellent code quality and test coverage (129 tests). Most issues identified in previous reviews have been addressed in recent commits.

✅ Issues Already Fixed

  1. Environment variable support for custom_category_rules - Commit b80f03e added _parse_json_env() function
  2. Warning when all custom gates invalid - Added logging in quality_gates_config.py:94-98
  3. Database error handling - Try-except wrapper added in quality_gates.py:701-712
  4. Hardcoded gate count - Now uses len(QualityGateType) dynamically
  5. Invalid config handling - Logs warnings for invalid gate names

🟡 Minor Suggestions

  1. Add info log when custom rules applied - Would improve debugging visibility
  2. Coverage gate clarification - Current behavior (pass with 100% when tool unavailable) is reasonable but could be documented
  3. Thread safety - Low priority, only needed if concurrent execution is added

Overall Assessment

  • Architecture: ⭐⭐⭐⭐⭐ Excellent - Clean separation of concerns
  • Test Coverage: ⭐⭐⭐⭐⭐ Excellent - 129 tests across 4 files
  • Error Handling: ⭐⭐⭐⭐⭐ Excellent - Proper validation and graceful degradation
  • Performance: ⭐⭐⭐⭐⭐ Excellent - Pre-compiled patterns, immutability optimizations

Recommendation: ✅ Merge Ready

The suggested improvements are optional and don't block merging. The implementation is production-ready and effectively solves the problem of false gate failures for non-code tasks.

Review posted at: #261 (comment)


@coderabbitai
Contributor

coderabbitai bot commented Jan 13, 2026


@github-actions, thank you for this exceptionally thorough and well-structured review! It's great to see the comprehensive analysis acknowledging the recent fixes and improvements.

Response to Minor Improvement Suggestions

1. ✅ Info Log for Custom Rules Application

Excellent suggestion! Adding an info log when custom rules are successfully applied would indeed improve debugging and configuration verification. This creates a nice symmetry:

  • Invalid custom rules → warning log (already implemented, lines 94-98)
  • Valid custom rules → info log (your suggestion)
  • No custom rules → default behavior (silent)

Your proposed implementation at quality_gates.py:680-683 is spot-on and would give operators clear visibility into which configuration is active.


2. 📝 Coverage Gate Behavior Documentation

You've correctly identified the intentional design choice here. The current behavior is:

  • Pragmatic: Doesn't block development when tooling is temporarily unavailable
  • Transparent: Warning log explains the situation (lines 996-999)
  • ⚠️ Potentially misleading: The 100.0% value could be misinterpreted

Your documentation suggestion is the right approach. The implementation is sound for development workflows; it just needs clear documentation explaining:

  • Why 100.0% is returned (pass-by-default when tool unavailable)
  • The distinction between measured coverage and tool-unavailable status
  • Recommended setup for production environments (ensure pytest-cov is installed)

3. 🔮 Thread Safety Future-Proofing

Accurate assessment. The current singleton pattern (lines 178-179) is sufficient for sequential execution, and the double-checked locking pattern you've outlined is the standard solution if concurrency is added later.

Given the current architecture doesn't require concurrent gate execution, this can safely remain in the "future consideration" category unless parallelization is planned.


Final Thoughts

Your assessment is spot-on: this is production-ready code with excellent architecture, comprehensive test coverage (129 tests!), and good error handling. The three suggestions are polish items that enhance observability and documentation rather than fix functional issues.

Priority recommendations:

  • 🟢 Before merge: Optional info logging for custom rules (improves UX)
  • 📝 After merge: Document coverage gate behavior
  • 🔮 Future: Thread safety if concurrent execution added

The PR successfully solves the core problem of preventing false gate failures for non-code tasks while maintaining backward compatibility and configurability. Great work by @frankbria! 🚀

@frankbria frankbria merged commit abe02fe into main Jan 13, 2026
13 checks passed
@frankbria frankbria deleted the feature/task-classification-quality-gates branch January 13, 2026 15:10


Development

Successfully merging this pull request may close these issues.

[P0] Quality gates fail on every task
