feat: docs/prds/self-improving-feedback.md#101
### DiffGuard AI Analysis: AI Review Summary

🏆 **Overall Score: 82/100**

This PR implements a comprehensive self-improving feedback loop with outcome recording, pattern detection, and prompt augmentation. The implementation is well-structured with strong test coverage, but has some performance considerations and opportunities for improved observability.

**✅ Key Strengths**

**🐛 Bugs Found**
| Bug Name | Affected Files | Description | Confidence |
|---|---|---|---|
| None found | — | — | — |
**📋 Issues Found**

| Issue Type | Issue Name | Affected Components | Description | Impact/Severity |
|---|---|---|---|---|
| Maintainability | Silent Error Swallowing | packages/cli/src/commands/*.ts, packages/core/src/feedback/prompt-augmenter.ts | Errors caught in feedback recording are silently ignored, making it difficult to diagnose why feedback might not be working. | Medium |
| Performance | Multiple Streak Queries | packages/core/src/feedback/pattern-analyzer.ts | countRecentStreaks and countSuccessStreak both query outcomes independently; could be optimized with a single query or in-memory caching. | Low |
| Testing | Edge Case Coverage | packages/core/src/__tests__/feedback/pattern-analyzer.test.ts | Missing tests for edge cases like maxActiveAugmentations: 0, pattern expiration, and confidence threshold boundaries. | Low |
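One way to address the silent-error-swallowing finding is to wrap the recording call in a helper that logs the failure before moving on. This is a hedged sketch: `recordOutcomeSafely` and the logger shape are illustrative assumptions, not the PR's actual API.

```typescript
// Hypothetical names: the PR's actual recorder and logger APIs may differ.
type Logger = { warn: (msg: string) => void };

function recordOutcomeSafely(record: () => void, logger: Logger): void {
  try {
    record();
  } catch (err) {
    // Surface the failure instead of swallowing it, so operators can
    // diagnose why feedback recording stopped working.
    const message = err instanceof Error ? err.message : String(err);
    logger.warn(`feedback outcome recording failed: ${message}`);
  }
}
```

The job still completes even if recording fails, but the cause is now visible in logs.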
**🔚 Conclusion**

This is a solid, well-architected feature addition with proper separation of concerns and comprehensive test coverage. The main follow-ups are adding observability for silent failures and considering query optimization for pattern analysis at scale. No blocking bugs found; merge-ready with minor improvements recommended.

*Analyzed using z-ai/glm-5*

---
### DiffGuard AI Analysis: AI Review Summary

🏆 **Overall Score: 88/100**

This PR implements a comprehensive self-improving feedback loop system with outcome persistence, pattern detection, and prompt augmentation. The implementation is thorough and well-structured with strong test coverage, but has a few minor consistency and maintainability concerns.

**✅ Key Strengths**

**📋 Issues Found**
| Issue Type | Issue Name | Affected Components | Description | Impact/Severity |
|---|---|---|---|---|
| Maintainability | Duplicate Feedback Function | commands/review.ts, commands/run.ts | applyProjectFeedbackPromptEnv is duplicated across two files; should be centralized. | Medium |
| Testing | Config Form Coverage Gap | pages/Settings.tsx | Feedback config fields in the JobsTab UI are not covered by existing Settings test fixtures. | Low |
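The duplicated `applyProjectFeedbackPromptEnv` could be centralized in a shared module that both `run.ts` and `review.ts` import. The sketch below is an assumption about the helper's shape: the config fields, env key, and module path are illustrative, not the PR's actual code.

```typescript
// Sketch of a shared module (e.g. commands/shared/feedback-env.ts) that
// run.ts and review.ts would both import instead of redefining the helper.
// The config shape and env key below are illustrative assumptions.
interface ProjectConfig {
  feedback?: { promptPrefix?: string };
}

function applyProjectFeedbackPromptEnv(
  config: ProjectConfig,
  env: Record<string, string>,
): void {
  const prefix = config.feedback?.promptPrefix;
  if (prefix) {
    env.FEEDBACK_PROMPT_PREFIX = prefix;
  }
}
```

With one definition, any future change to the env-propagation logic only needs to land in one place.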
**🔚 Conclusion**

This is a strong, well-designed PR with comprehensive cross-cutting implementation and appropriate safeguards. The feedback loop architecture is sound and the secret redaction is thorough. Addressing the duplicated function is a straightforward refactor that would improve maintainability. Ready for merge with minor cleanup.

*Analyzed using z-ai/glm-5*

---
### DiffGuard AI Analysis: AI Review Summary

🏆 **Overall Score: 82/100**

This PR implements a comprehensive self-improving feedback loop system with outcome recording, pattern detection, and prompt augmentation. The implementation is thorough with good test coverage, but has a few correctness and performance concerns to address.

**✅ Key Strengths**

**🐛 Bugs Found**
| Bug Name | Affected Files | Description | Confidence |
|---|---|---|---|
| Missing Defaults for Partial Feedback Config | packages/cli/src/commands/shared/feedback.ts:36-42 | If config.feedback is defined but has missing properties, getFeedbackAnalysisOptions uses `config.feedback ?? {...}`, which only provides defaults when feedback is entirely undefined, not for partial configs. | Medium 🟡 |
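The flagged `??` pattern can be reproduced in isolation. The option names below are assumptions for illustration, not the PR's actual schema; the point is the difference between nullish coalescing and spreading defaults.

```typescript
// Option names are assumptions; the point is the `??` vs spread distinction.
interface FeedbackConfig {
  enabled?: boolean;
  maxActiveAugmentations?: number;
  confidenceThreshold?: number;
}

const FEEDBACK_DEFAULTS: Required<FeedbackConfig> = {
  enabled: true,
  maxActiveAugmentations: 3,
  confidenceThreshold: 0.7,
};

// Buggy: `??` falls back only when feedback is entirely undefined, so a
// partial config such as { enabled: false } loses every other default.
function getOptionsBuggy(feedback?: FeedbackConfig): FeedbackConfig {
  return feedback ?? FEEDBACK_DEFAULTS;
}

// Fixed: spread the defaults first so partial configs keep them.
function getOptionsFixed(feedback?: FeedbackConfig): Required<FeedbackConfig> {
  return { ...FEEDBACK_DEFAULTS, ...feedback };
}
```

The spread form keeps explicit user values while filling in anything the partial config omits.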
**📋 Issues Found**

| Issue Type | Issue Name | Affected Components | Description | Impact/Severity |
|---|---|---|---|---|
| Performance | Multiple Sequential Queries on Outcome | packages/core/src/feedback/pattern-analyzer.ts | analyzeFeedbackOutcome executes multiple sequential queries (listPatterns, queryOutcomes for streak counting, listActiveAugmentations), which could increase latency at scale. | Medium |
| Testing | Test Double-Resets Database | packages/cli/src/__tests__/commands/run.test.ts:498-500 | The test manually resets database state before the test body, but afterEach already does this; could indicate test isolation concerns. | Low |
| Maintainability | Duplicated Outcome Recording Logic | Multiple command files | While recordJobOutcome exists, some commands (review, run) use inline repository calls instead of the shared helper. | Low |
| Security | Potential Timing Attack on Signature Comparison | packages/core/src/feedback/outcome-parser.ts | Failure signatures are compared directly for pattern matching; if signatures ever include sensitive data (unlikely given the redaction), timing attacks could theoretically apply. | Low |
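If the signature comparison ever needs hardening, Node's `crypto.timingSafeEqual` provides a constant-time check. This is a generic sketch, not the PR's outcome-parser code; `signaturesMatch` is a hypothetical name, and hashing first is one way to satisfy the equal-length-buffer requirement.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Generic sketch, not the PR's outcome-parser code: hash both signatures
// so the buffers have equal length, then compare in constant time.
function signaturesMatch(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}
```

For non-sensitive pattern keys this is likely unnecessary overhead, which matches the review's Low severity.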
**🔚 Conclusion**

This is a strong implementation of a complex feature with good architecture and comprehensive test coverage. The main concern is the potential bug with partial feedback config handling, which should be verified. The performance concerns are manageable at current scale but worth tracking. Recommend addressing the config defaults issue before merge.

*Analyzed using z-ai/glm-5*

---
### DiffGuard AI Analysis: AI Review Summary

🏆 **Overall Score: 82/100**

This PR implements a comprehensive self-improving feedback loop system with outcome recording, pattern detection, and prompt augmentation. The implementation is thorough with good test coverage, but has some code duplication and architectural concerns that should be addressed.

**✅ Key Strengths**

**🐛 Bugs Found**
| Bug Name | Affected Files | Description | Confidence |
|---|---|---|---|
| CLI Overrides Ignored in Feedback Prompt | packages/cli/src/commands/run.ts, packages/cli/src/commands/review.ts | applyProjectFeedbackPromptEnv reloads config via loadConfig(projectDir), discarding any CLI overrides applied earlier in the command handler. This could cause feedback prompts to use stale configuration values. | Medium 🟡 |
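The stale-config bug comes from the helper reloading from disk; the usual fix is to merge CLI overrides once and pass the merged config down. The shapes and names below are assumptions modeled on the review's description, not the PR's actual internals.

```typescript
// Shapes and names are assumptions; the point is that the helper receives
// the already-merged config instead of reloading it from disk.
interface FeedbackSettings { enabled?: boolean }
interface Config { feedback?: FeedbackSettings }

// The caller merges CLI overrides once, right after loading config, and
// every downstream helper takes the merged result as a parameter.
function mergeCliOverrides(loaded: Config, cli: Partial<Config>): Config {
  return {
    ...loaded,
    ...cli,
    feedback: { ...loaded.feedback, ...cli.feedback },
  };
}

function feedbackEnabled(merged: Config): boolean {
  return merged.feedback?.enabled ?? false;
}
```

Because nothing downstream re-reads the file, a `--feedback` style CLI flag can no longer be silently dropped.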
**📋 Issues Found**

| Issue Type | Issue Name | Affected Components | Description | Impact/Severity |
|---|---|---|---|---|
| Maintainability | Duplicated Prompt Env Function | packages/cli/src/commands/run.ts, packages/cli/src/commands/review.ts | Identical applyProjectFeedbackPromptEnv function defined in two places. | Medium |
| Performance | Repeated Database Queries | packages/server/src/routes/feedback.routes.ts | buildWindowSummary queries outcomes multiple times per window; could be consolidated into fewer queries. | Low |
| Maintainability | Hardcoded Magic Numbers | packages/core/src/feedback/pattern-analyzer.ts | Constants like RECENT_WINDOW_MS and STALE_WINDOW_MS are hardcoded without configuration options. | Low |
**🔚 Conclusion**

This is a strong implementation of a complex feedback loop feature. The code is well-structured and thoroughly tested. The main concerns are code duplication (especially the applyProjectFeedbackPromptEnv function) and the config reload issue, which could lead to subtle bugs with CLI overrides. These should be addressed before merge, but the overall architecture is solid.

*Analyzed using z-ai/glm-5*

---
### DiffGuard AI Analysis: AI Review Summary

🏆 **Overall Score: 82/100**

This PR implements a comprehensive self-improving feedback loop system with structured outcome storage, pattern detection, and prompt augmentation. The implementation is thorough and well-architected, with a few areas for improvement around code duplication and performance considerations.

**✅ Key Strengths**

**🐛 Bugs Found**
| Bug Name | Affected Files | Description | Confidence |
|---|---|---|---|
| None found | N/A | No concrete bugs identified. The implementation handles edge cases well with proper try-catch wrapping and secret redaction. | N/A |
**📋 Issues Found**

| Issue Type | Issue Name | Affected Components | Description | Impact/Severity |
|---|---|---|---|---|
| Maintainability | Duplicate Feedback Prompt Function | run.ts, review.ts | applyProjectFeedbackPromptEnv is copy-pasted between both command files instead of being shared. | Medium |
| Performance | Synchronous Pattern Analysis | pattern-analyzer.ts, feedback.ts | analyzeFeedbackOutcome runs synchronously, querying recent outcomes and upserting patterns on every job completion. | Low |
| Maintainability | Undocumented Threshold Constants | pattern-analyzer.ts | Magic numbers like RECENT_WINDOW_MS, STALE_WINDOW_MS, and MAX_PATTERN_TEXT_LENGTH lack justification comments. | Low |
| Testing | Missing Edge Case Tests | feedback/*.test.ts | No tests for error scenarios like database locked, circular metadata, or malformed script results. | Low |
**🔚 Conclusion**

This is a strong implementation of a feedback loop system with excellent security practices and clean architecture. The main concern is code duplication that should be refactored before merge. Performance considerations around synchronous pattern analysis are minor but worth tracking for high-frequency deployments. No critical bugs found.

*Analyzed using z-ai/glm-5*

---
### DiffGuard AI Analysis: AI Review Summary

🏆 **Overall Score: 85/100**

This PR introduces a comprehensive self-improving feedback loop system that records job outcomes, detects repeated failure patterns, and generates prompt augmentations to improve future runs. The implementation is thorough with solid test coverage and an appropriate database schema, though it has some maintainability and optimization opportunities.

**✅ Key Strengths**

**📋 Issues Found**
| Issue Type | Issue Name | Affected Components | Description | Impact/Severity |
|---|---|---|---|---|
| Maintainability | Duplicated Feedback Env Function | run.ts, review.ts | The same applyProjectFeedbackPromptEnv() implementation exists in both files with identical logic. | Medium |
| Performance | Sequential Job Type Queries | feedback.routes.ts → buildWindowSummary() | Runs querySummary() for each valid job type separately, creating 8+ sequential DB calls per window. | Medium |
| Maintainability | Duplicated Secret Patterns | outcome-parser.ts, session-outcome.repository.ts | The same SECRET_TEXT_PATTERNS regex array is defined in two locations, risking inconsistent updates. | Low |
| Testing | Outcome Parser Edge Cases | outcome-parser.test.ts | Missing tests for complex ANSI stripping, nested metadata redaction, and path extraction edge cases. | Low |
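The sequential-query finding above suggests an easy batching fix: issue all per-job-type summary queries at once. The names below are assumptions modeled on the review's description of `buildWindowSummary`, not the PR's real signatures.

```typescript
// Names are assumptions modeled on the review's description. The fix is
// to issue all per-job-type queries at once rather than awaiting each.
interface Summary { jobType: string; total: number }

async function buildWindowSummary(
  jobTypes: readonly string[],
  querySummary: (jobType: string) => Promise<Summary>,
): Promise<Summary[]> {
  // Latency is now bounded by the slowest query, not the sum of all 8+.
  return Promise.all(jobTypes.map((t) => querySummary(t)));
}
```

`Promise.all` preserves input order, so the caller can still index results by job type; a single aggregated SQL query would reduce round trips further if the store supports it.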
**🔚 Conclusion**

This is a strong, well-structured implementation of a sophisticated feedback loop feature. The core logic is sound, with appropriate safeguards for secret handling and error isolation. The identified issues are maintainability and optimization concerns that should be addressed for long-term maintenance but do not block immediate merge. Consolidating the duplicated function and secret patterns would improve code health significantly.

*Analyzed using z-ai/glm-5*

---
### Night Watch QA Report

- Generated by Night Watch QA agent
- UI tests: 1 passing, 0 failing
- API tests: 3 passing, 0 failing
- Artifacts: screenshots, videos

Co-Authored-By: Claude <noreply@anthropic.com>

**Changes Classification**

**Test Results**

**UI Tests (Playwright)**

**Screenshots**

**Video Recording**

Video artifact committed to

**API Tests**

**Additional Verification**

*Night Watch QA Agent*

Closes #97

Night Watch manages this draft PR automatically so progress is preserved across retries and timeouts.

Status labels: