-
-
Notifications
You must be signed in to change notification settings - Fork 26
feat(qe-product-factors-assessor): SFDIPOT agent with enhanced quality rules #178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Implements a new QE agent that applies James Bach's HTSM Product Factors (SFDIPOT: Structure, Function, Data, Interfaces, Platform, Operations, Time) to analyze epics and generate comprehensive test ideas. Key features: - SFDIPOT-based product factor analysis - Test idea generation with priority (P0-P3) and automation fitness - Clarifying question generation to surface unknown risks - Multiple output formats: HTML, JSON, Markdown, Gherkin - Learning system to persist patterns across assessments - Code intelligence integration for codebase-aware analysis New files: - .claude/agents/qe-product-factors-assessor.md - Agent definition - src/agents/qe-product-factors-assessor/ - Core implementation - tests/agents/qe-product-factors-assessor.test.ts - Unit tests Also includes: - Updated agent registry and spawn handlers - Learning patterns from Epic 4, 5, 6 assessments - Code intelligence configuration - Documentation updates 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Contributor
Author
|
@proffesor-for-testing hi, please don't merge yet. There is some critical work to be done which I will continue after my vacation. Known issues:
|
…hanced test generation Implements 3-phase GOAP plan to improve test idea quality through domain-specific patterns: Phase 1 - Domain Pattern Registry: - Created DomainPatternRegistry with 6 domains (stripe-subscription, gdpr-compliance, pci-dss, hipaa, oauth-oidc, webhook-integration) - Added confidence-based domain detection from requirements text - Extended ProjectContext with DetectedDomain[] for domain tracking Phase 2 - Quality Calibration: - Integrated domain-specific BS patterns in brutal-honesty-analyzer - Added pre-generation validation for domain coverage - Implemented calibrateDomainQuality() with domain-specific scoring adjustments Phase 3 - Domain-Specific Output: - Added domain test template injection in test-idea-generator - Created injectMissingDomainCoverage() for automatic gap filling - Enhanced question-generator with domain-specific clarifying questions Also includes: - Epic 4 Community & Engagement assessment (249 test ideas, 68/100 quality score) - Multiple product factors assessments for various epics - GOAP integration plan documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
## Summary - Reduced TypeScript `any` types from 568 to 538 (63% overall reduction from 1,470) - Fixed all 146 TypeScript compilation errors - Updated version to v2.7.1 across all 5 required files - Added ErrorUtils utility for consistent error handling - Updated GOAP plan with Phase 2 completion status ## Type Safety Improvements - Added index signatures for SerializableValue compatibility - Fixed memory retrieval type casts across 15+ agent files - Added proper task payload typing in agent performTask methods - Created shared error handling utilities ## Files Updated - package.json, package-lock.json (version bump) - README.md (version badge) - CHANGELOG.md (release notes) - src/mcp/server-instructions.ts (SERVER_VERSION) - src/core/memory/HNSWVectorMemory.ts (version info) - docs/plans/goap-issue-149-code-quality.md (progress tracking) - 100+ source and test files (type safety fixes) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents chore(release): v2.7.1 - Type Safety Improvements
Fixes identified from brutal honesty self-review: - Fix test idea count verification (reported count now matches actual) - Add documented scoring rubric with 5-category methodology - Add AC-by-AC testability analysis with per-AC scores - Integrate penetrating questions (85% rate, was 0%) - Pass full RequirementsQualityScore to HTML formatter Changes: - brutal-honesty-analyzer.ts: Add SCORING_RUBRIC, AC analysis methods - html-formatter.ts: Add renderScoringRubric, renderACAnalysis - question-generator.ts: Add getQuestionForSubcategory public API - index.ts: Capture full quality data, use QuestionGenerator Note: Test ideas generation needs significant improvement - currently generates only 20 test ideas. Future work should expand coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Major improvement to test idea coverage: - Added 165+ test templates covering all 37 SFDIPOT subcategories - Connected getGenericIdeasForSubcategory to TestIdeaGenerator templates - Added SBTM exploratory testing tours (FedEx, Garbage Collector, Bad Neighborhood, Landmark, Intellectual, Obsessive-Compulsive, Saboteur) Results on Epic 2 assessment: - Test ideas: 20 → 204 (+920%) - Coverage: 71.4% → 100% - All 7 SFDIPOT categories now covered (STRUCTURE and PLATFORM were 0) Templates include comprehensive coverage for: - Structure: Code, Hardware, NonPhysical, Dependencies, Documentation - Function: Application, Calculation, ErrorHandling, Security, StateTransition, Startup, Shutdown - Data: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence, Types, Selection - Interfaces: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging - Platform: Browser, OperatingSystem, Hardware, ExternalSoftware, InternalComponents - Operations: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users, Environment - Time: Timing, Concurrency, Scheduling, Timeout, Sequencing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
… with SFDIPOT checklist Implements LLM-powered test idea generation using the complete SFDIPOT framework as a structured checklist: SFDIPOT Checklist (40 subcategories covered in prompt): - STRUCTURE: Code, Hardware, NonPhysical, Dependencies, Documentation - FUNCTION: Application, Calculation, ErrorHandling, Security, StateTransition, Startup, Shutdown - DATA: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence, Types, Selection - INTERFACES: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging - PLATFORM: Browser, OperatingSystem, Hardware, ExternalSoftware, InternalComponents - OPERATIONS: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users, Environment - TIME: Timing, Concurrency, Scheduling, Timeout, Sequencing New methods: - buildSFDIPOTChecklistPrompt(): Creates comprehensive LLM prompt - generateTestIdeasWithLLM(): Main entry point for LLM generation - parseLLMTestIdeasResponse(): Parses JSON response to TestIdea[] - mapCategoryString(): Maps category strings to HTSMCategory enum - mapPriorityString(): Maps priority strings to Priority enum - mapAutomationFitnessString(): Maps automation strings to enum Usage: Set `useLLM: true` in AssessmentInput and configure `llmConfig` with enabled LLM provider to use LLM-based generation. Falls back to template-based generation if LLM is unavailable. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds test script for LLM-based test idea generation: - Configures RuvLLM provider for local inference - Tests SFDIPOT checklist prompt integration - Validates graceful fallback to template generation Test results show: - LLM integration code path works correctly - Graceful fallback when LLM not available (204 test ideas from templates) - All 7 SFDIPOT categories covered - 100% coverage score achieved 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
GOAP Phase 3 - Task Orchestration Integration: - Add TaskWorkflowGoals with 4 goal definitions - Add 17 orchestration-specific GOAP actions - Create GOAPTaskOrchestration integration class - Modify TaskOrchestrateHandler with GOAP planning + template fallback - Add AgentRegistry integration for fleet state - Add 31 integration tests for task orchestration CRITICAL INCIDENT - Data Loss (2025-12-29): - memory.db accidentally deleted during test debugging - 2 months of learning data permanently lost - Added backup system to prevent recurrence: - scripts/backup-memory.js with backup/restore - npm run backup, backup:list, backup:restore - Accelerated learning config for rebuild Also includes: - C4 architecture diagrams - GOAP integration plans and documentation - Quality gate GOAP integration - Plan executor with dry-run mode - Database schema updates for GOAP tables 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
### Added - Database Migration System (src/persistence/migrations/) - MigrationRunner class for versioned schema migrations - 7 migrations covering all core tables - Version tracking in schema_migrations table - Helper functions: tableExists, columnExists, safeAddColumn, safeCreateIndex - CLI Migrate Command (src/cli/commands/migrate/) - aqe migrate status - Show migration status - aqe migrate run - Run pending migrations - aqe migrate rollback - Rollback last migration - aqe migrate reset - Reset all migrations ### Changed - Database initialization now runs migrations automatically during aqe init - Schema consistency ensured across all installations ### Fixed - Schema evolution issues from backup restoration - Dream learning cycle now works with migrated data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents feat(release): v2.7.2 - GOAP Phase 3 Task Orchestration + Migration System
## Phase 5: Plan Learning - PlanLearning: EMA-based action success rate tracking - PlanSimilarity: Plan signature matching for reuse (<100ms) - Q-Learning integration for GOAP action selection - Database persistence for learning history ## Phase 6: Live Agent Execution - Real agent spawning via AgentRegistry (not just dry-run) - Output parsing for real-time world state updates - Plan signature storage after successful execution - Learning feedback loop integration ## New Files - src/planning/PlanLearning.ts - src/planning/PlanSimilarity.ts - tests/integration/goap-live-execution.test.ts (17 tests) - tests/integration/goap-phase5-real-integration.test.ts (15 tests) - tests/integration/goap-plan-learning.test.ts (31 tests) ## Test Coverage - 84 GOAP-related tests passing - Output parsing methods verified - Live vs dry-run code paths tested 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents feat(goap): v2.7.3 - Phase 5 & 6 Plan Learning & Live Agent Execution
The captured_experiences table had a schema mismatch between Migration 003 and ExperienceCapture.initializeSchema(): - Migration 003 created: id, agent_id, action, context, outcome, reward, captured_at - ExperienceCapture expects: id, agent_id, agent_type, task_type, execution, context, outcome, embedding, created_at This caused "no such column: agent_type" errors during `aqe init`. Migration 008 safely adds missing columns with defaults and migrates existing captured_at data to created_at. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Standardize logging across all agents by replacing console.log/error/warn with the centralized Logger utility for consistent log formatting and control. Changes: - Add protected logger to BaseAgent for inheritance by all agents - Remove duplicate local Logger interfaces and ConsoleLogger classes from 8 agents - Migrate 33 files from console.* to Logger: - 19 main QE agents (TestExecutor, FlakyTestHunter, ApiContractValidator, etc.) - 7 n8n workflow agents (N8nBaseAgent, N8nSecurityAuditor, etc.) - 4 utility/adapter files (AgentPool, AgentLLMAdapter, CoordinatorAdapter) - 3 GOAP planning files (Math.random → SecureRandom migration) Results: - Console calls in src/agents/: 90 → 25 (72% reduction) - Remaining calls are in interface defaults, example scripts, and string literals - Build passes with no TypeScript errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Updates version across all required files: - package.json - package-lock.json - README.md - CHANGELOG.md - src/mcp/server-instructions.ts - src/core/memory/HNSWVectorMemory.ts Changes in v2.7.4: - fix(db): Migration 008 fixes captured_experiences schema mismatch - refactor(agents): Standardized logging with Logger utility (90→25 console calls) - security: Migrated Math.random() to SecureRandom in GOAP planning 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents Release v2.7.4: Database Schema Fix & Logger Standardization
Bumps the npm_and_yarn group with 1 update in the / directory: [qs](https://github.com/ljharb/qs). Updates `qs` from 6.14.0 to 6.14.1 - [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md) - [Commits](ljharb/qs@v6.14.0...v6.14.1) --- updated-dependencies: - dependency-name: qs dependency-version: 6.14.1 dependency-type: indirect dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <[email protected]>
Implements browser-compatible QE agents using @ruvector/edge WASM: - Add BrowserHNSWAdapter with real @ruvector/edge WasmHnswIndex - Add IndexedDBStorage for browser-side vector persistence - Add BrowserAgent for WASM-compatible agent lifecycle - Add Chrome DevTools panel for agent monitoring - Add build:edge script producing 183.4KB gzipped bundle Verified metrics (not estimated): - Bundle size: 183.4 KB gzipped (target: <400KB) ✓ - Tests: 173/177 passing (4 expected failures in Node.js) - Build: TypeScript compilation succeeds Analysis documents: - GOAP evaluation of @ruvector/edge integration - Visionary analysis with 12 novel use cases - Implementation plan for 5 phases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Phase 1 completion for @ruvector/edge VS Code Extension MVP: P1-006: AQE Pattern Integration - Add AQEPatternBridge service for bidirectional pattern sync - Pattern conversion between CodePattern and QEPattern formats - Offline-first with sync queue support - Export bridge and types from services index P1-007: Security Review and Hardening - Comprehensive SECURITY.md documentation - CSP configuration for WebViews (nonce-based scripts) - Input validation on all WASM calls - Sensitive pattern filtering before storage - Checksum validation on stored entries - Storage limits with LRU eviction - 40 security tests covering all threat vectors P1-008: End-to-End Testing - Full E2E test suite with mock VS Code API - Tests for extension activation lifecycle - Code analysis and test suggestion flows - Coverage visualization WebView tests - Offline storage and sync queue tests - Pattern matching and learning tests Code Quality: - Replace TODO placeholders with @template markers in test generators - Test template generators use @template: for user-fillable sections All 171 tests passing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
- CodeAnalyzer.ts: fix getLanguage return type to literal union - AQEPatternBridge.ts: add type assertions for pattern type mappings - storage/index.ts: import classes directly for factory function usage - TestGenerationQuickPick.ts: add type assertions for selectedItems Phase 1 VS Code extension now builds with zero TypeScript errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…rning Add RuVector-based learning system that captures patterns from development: - .ruvector/intelligence.json: stores learned patterns, error fixes, file sequences - scripts/workers/qe-self-learning.js: background learning worker - scripts/workers/sync-intelligence.js: intelligence synchronization - scripts/migrate-memory-to-ruvector.js: migration from legacy memory - scripts/migrate-swarm-to-ruvector.js: migration from swarm patterns - scripts/suggest-agents-from-intelligence.js: agent routing suggestions Updated .gitignore to track .ruvector/ for shared team learning. Updated .claude/settings.json with RuVector hooks integration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/dependabot/npm_and_yarn/npm_and_yarn-2b901f0e0d chore(deps): bump qs from 6.14.0 to 6.14.1 in the npm_and_yarn group across 1 directory
- Integrate 8 previously unregistered MCP handlers: - Chaos engineering: chaos_inject_latency, chaos_inject_failure, chaos_resilience_test - Integration testing: integration_dependency_check, integration_test_orchestrate - Token-optimized: test_execute_filtered, performance_test_filtered, quality_assess_filtered - Create NewDomainToolsHandler.ts to route new domain tools Dead code removal (~22k lines): - Delete 4 overlapping filtered handlers (coverage, flaky, security, contract) - Remove unused directories: src/alerting/, src/reporting/, src/transport/ - Remove 4 unregistered CLI command dirs: fleet/, test/, quality/, monitor/ (42 files) - Archive 10 one-time verification scripts to scripts/archive/ - Remove 2 CLI backup files (index-spec.ts, index-working.ts) Test organization: - Move HNSWPatternStore.test.ts from src/ to tests/unit/memory/ - Delete OutputFormatter duplicate test from src/ - Consolidate duplicate RuVector and learning-persistence tests - Rename RuVector.SelfLearning.test.ts for consistent naming Dependencies: - Remove unused packages: jose, @types/chrome, gpt-tokenizer Build config: - Exclude src/edge/ subdirectories from main tsconfig (separate build) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…001, P2-002) Phase 2 P2P Foundation implementation with parallel development: **P2-001: Ed25519 Cryptographic Identity** (src/edge/p2p/crypto/) - Identity.ts: Agent identity generation with Ed25519 keypairs - KeyManager.ts: Secure key storage, rotation, and revocation - Signer.ts: Message signing/verification with batch support - BIP39-style seed phrase recovery - AES-GCM encryption for private key storage **P2-002: WebRTC Connection Manager** (src/edge/p2p/webrtc/) - PeerConnectionManager.ts: Multi-peer WebRTC connections - SignalingClient.ts: WebSocket-based signaling with heartbeat - ICEManager.ts: STUN/TURN configuration and NAT detection - ConnectionPool.ts: Connection pooling with eviction policies - Automatic reconnection with exponential backoff **Tests:** - crypto.test.ts: 45 tests for identity, signing, key management - webrtc.test.ts: 65 tests for connections, signaling, pooling Total: 6,250 lines of production code + 73,000+ bytes of tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implements remaining Phase 2 components for @ruvector/edge: P2-003 Agent-to-Agent Communication Protocol: - AgentChannel: Secure bidirectional channels between agents - MessageEncoder: Binary message encoding with compression - MessageRouter: Multi-hop routing with dead letter queue - ProtocolHandler: Protocol negotiation and handshake P2-004 Pattern Sharing Protocol: - PatternSerializer: Pattern serialization with anonymization - PatternBroadcaster: Gossip-based pattern broadcasting - PatternIndex: Vector similarity search for patterns - PatternSyncManager: Delta-based sync with vector clocks P2-005 Federated Learning Infrastructure: - GradientAggregator: FedAvg, FedProx, Krum, trimmed mean - FederatedRound: Training round lifecycle management - FederatedCoordinator: Multi-round training coordination - ModelManager: Weight management and checkpointing P2-006 CRDT-Based Conflict Resolution: - GCounter, LWWRegister, ORSet implementations - PatternCRDT: Composite CRDT for shared patterns - VectorClock: Causality tracking - CRDTStore: Centralized CRDT management with GC P2-007 Two-Machine Coordination (tests): - CoordinationManager integration tests - Peer authentication and health monitoring P2-008 NAT Traversal and TURN Fallback: - NATDetector: STUN-based NAT type detection - TURNManager: TURN server credential management - HolePuncher: UDP hole punching with port prediction - ConnectivityTester: Connection quality assessment Test Coverage: - 460 tests passing across all 8 P2P modules - Fixed vitest->jest imports and promise handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive integration test suite for P2P Foundation: - crypto-webrtc.integration.test.ts (28 tests) - protocol-sharing.integration.test.ts (34 tests) - coordination-crdt.integration.test.ts (33 tests) - full-stack.integration.test.ts (48 tests) Update GOAP checklist marking Phase 2 complete: - All 24 Phase 2 tasks completed - 605+ total tests (462 unit + 143 integration) - All modules verified working 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixes identified from brutal honesty self-review: - Fix test idea count verification (reported count now matches actual) - Add documented scoring rubric with 5-category methodology - Add AC-by-AC testability analysis with per-AC scores - Integrate penetrating questions (85% rate, was 0%) - Pass full RequirementsQualityScore to HTML formatter Changes: - brutal-honesty-analyzer.ts: Add SCORING_RUBRIC, AC analysis methods - html-formatter.ts: Add renderScoringRubric, renderACAnalysis - question-generator.ts: Add getQuestionForSubcategory public API - index.ts: Capture full quality data, use QuestionGenerator Note: Test ideas generation needs significant improvement - currently generates only 20 test ideas. Future work should expand coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Major improvement to test idea coverage: - Added 165+ test templates covering all 37 SFDIPOT subcategories - Connected getGenericIdeasForSubcategory to TestIdeaGenerator templates - Added SBTM exploratory testing tours (FedEx, Garbage Collector, Bad Neighborhood, Landmark, Intellectual, Obsessive-Compulsive, Saboteur) Results on Epic 2 assessment: - Test ideas: 20 → 204 (+920%) - Coverage: 71.4% → 100% - All 7 SFDIPOT categories now covered (STRUCTURE and PLATFORM were 0) Templates include comprehensive coverage for: - Structure: Code, Hardware, NonPhysical, Dependencies, Documentation - Function: Application, Calculation, ErrorHandling, Security, StateTransition, Startup, Shutdown - Data: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence, Types, Selection - Interfaces: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging - Platform: Browser, OperatingSystem, Hardware, ExternalSoftware, InternalComponents - Operations: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users, Environment - Time: Timing, Concurrency, Scheduling, Timeout, Sequencing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
… with SFDIPOT checklist Implements LLM-powered test idea generation using the complete SFDIPOT framework as a structured checklist: SFDIPOT Checklist (40 subcategories covered in prompt): - STRUCTURE: Code, Hardware, NonPhysical, Dependencies, Documentation - FUNCTION: Application, Calculation, ErrorHandling, Security, StateTransition, Startup, Shutdown - DATA: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence, Types, Selection - INTERFACES: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging - PLATFORM: Browser, OperatingSystem, Hardware, ExternalSoftware, InternalComponents - OPERATIONS: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users, Environment - TIME: Timing, Concurrency, Scheduling, Timeout, Sequencing New methods: - buildSFDIPOTChecklistPrompt(): Creates comprehensive LLM prompt - generateTestIdeasWithLLM(): Main entry point for LLM generation - parseLLMTestIdeasResponse(): Parses JSON response to TestIdea[] - mapCategoryString(): Maps category strings to HTSMCategory enum - mapPriorityString(): Maps priority strings to Priority enum - mapAutomationFitnessString(): Maps automation strings to enum Usage: Set `useLLM: true` in AssessmentInput and configure `llmConfig` with enabled LLM provider to use LLM-based generation. Falls back to template-based generation if LLM is unavailable. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds test script for LLM-based test idea generation: - Configures RuvLLM provider for local inference - Tests SFDIPOT checklist prompt integration - Validates graceful fallback to template generation Test results show: - LLM integration code path works correctly - Graceful fallback when LLM not available (204 test ideas from templates) - All 7 SFDIPOT categories covered - 100% coverage score achieved 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
… and AY-E001 - Generate Epic 3 AI Personalization Search assessment (143 test ideas) - Generate AY-E001 Celebrity Collections assessment (127 test ideas) - Fix agent template to include complete info sections (no truncation) - Add DO NOT TRUNCATE rule to agent compliance section - Update learning config for RuVector GNN/LoRA/EWC++ settings Co-Authored-By: Claude Opus 4.5 <[email protected]>
…th quality rules
Comprehensive improvements to deliver higher quality SFDIPOT assessments:
## Priority Distribution Rules
- Added strict distribution targets: P0 (8-12%), P1 (20-30%), P2 (35-45%), P3 (20-30%)
- Added priority inflation check with mandatory review if P1 > 35%
- Added calibration questions for each priority level
## Test Idea Quality Rules
- Added banned patterns list ("Verify X works correctly" etc.)
- Added transformation process: boundaries, off-by-one, state combinations
- Added failure modes, race conditions, external dependencies
## Automation Fitness Reality Check
- Added target percentages: unit (30-40%), e2e (≤50%), human-exploration (≥10%)
- Added reality check for e2e-heavy recommendations
- Fixed over-optimism issue (was 4% human exploration, now requires ≥10%)
## Domain Context Requirements
- Added mandatory domain detection before test generation
- Added risk pattern identification (social media, celebrity content, e-commerce)
- Added domain-specific edge case extraction
## Edge Cases Checklist
- Added comprehensive checklist: race conditions, contract expiry, content takedown
- Added notification SLAs, time-based expiry, external API dependencies
- Added state management and session persistence scenarios
## Quality Gates
- Added mandatory checks before finalizing reports
- 5-phase process with explicit validation steps
- Self-review requirement for priority distribution
Addresses all findings from brutal-honesty-review skill analysis of AY-E001 assessment.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
… assessment Added AY-E002 (Live Shopping Experience Unification) SFDIPOT assessment to validate the enhanced agent quality rules. Results demonstrate significant improvement: - Priority distribution: P0=10.3%, P1=28.7%, P2=41.4%, P3=19.5% (all within targets) - Template patterns eliminated: 0 occurrences of "Verify X works correctly" - Human exploration: 13.8% (above 10% minimum) - E2E tests: 20.7% (well under 50% max) - Domain-specific edge cases included (WebSocket drops, inventory races, DST) Brutal honesty review score improved from 2.75/10 to 8.25/10. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add explicit human exploration counting in Phase 4 - Add <human_exploration_templates> with domain-specific tests - Strengthen Quality Gates to be BLOCKING with explicit failure action - Add Universal tests (5) and Domain-Specific tests (6 domains) - Fix P1 target from 35% to 30% per brutal-honesty findings Addresses brutal-honesty-review finding that NORD assessment had only 7.5% human exploration (below 10% minimum). Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ecklist Major enhancement to test generation using LLM-level intelligent thinking: ## New Sections Added ### <sfdipot_subcategory_checklist> (300+ lines) - 28 subcategories across 7 SFDIPOT categories - Each subcategory has: - Applicability Check question - Automated test triggers with examples - Human exploration triggers with examples - Agent MUST evaluate each subcategory for applicability - Tests generated ONLY for applicable subcategories ### <human_judgment_detector> - 5-step reasoning process for human test detection: 1. Subjective language identification 2. Expertise requirement detection 3. Perception-based judgment recognition 4. Discovery opportunity identification 5. Test generation with explicit reasoning - Includes "Why Human Essential" column requirement ## Workflow Updates ### Phase 2: Test Idea Generation - Now STRICTLY follows subcategory checklist - Iterates ALL 28 subcategories per requirement - Generates tests from applicable triggers only ### Phase 4: Automation Fitness - Uses intelligent human detection, not templates - Applies <human_judgment_detector> to every requirement - Falls back to checklist review if <10% Addresses requirement for LLM-level thinking instead of template-based generation. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Step-by-step documentation of how qe-product-factors-assessor delivers output when invoked via Task tool. Covers all 8 phases from invocation to result return. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add PRE-OUTPUT HARD STOP section with 3-step validation - Add ABSOLUTE BAN on "Verify" patterns with transformation examples - Add STEP 1: verify_count must equal 0 before output - Add STEP 2: P1 ≤30%, P3 ≥20% enforcement loops - Add STEP 3: Human ≥10% auto-add loop - Update Gates 2,4,5,7 to HARD STOP blockers - Add Gate 10 for human exploration row structure - Add explicit INVALID/VALID examples for human test format Co-Authored-By: Claude Opus 4.5 <[email protected]>
…n hook - Add scripts/validate-sfdipot-assessment.ts - validates assessment HTML output - Add scripts/hooks/validate-sfdipot-on-write.sh - PostToolUse hook - Register hook in .claude/settings.json for Write operations - Validates: Gate 7 (no Verify), Gate 2 (P1≤30%), Gate 4 (P3≥20%), Gate 5 (Human≥10%) - Stores validation results in memory.db for learning Co-Authored-By: Claude Opus 4.5 <[email protected]>
Option C implementation: separate concerns into focused agents - Generator: qe-product-factors-assessor (coverage) - Rewriter: qe-test-idea-rewriter (action verbs) - Validator: validate-sfdipot-assessment.ts (quality gates) Results: - V12 (single agent, no hand-holding): 28 Verify patterns (FAIL) - V13 (V12 + rewriter): 0 Verify patterns (ALL GATES PASS) Pipeline reduces attention dilution by giving each agent a single responsibility with narrow focus. Co-Authored-By: Claude Opus 4.5 <[email protected]>
… loop Adds explicit transformation loop that runs BEFORE saving HTML: - STEP 1: Scan for "Verify X" patterns - STEP 2: Transform each using pattern table - STEP 3: Re-scan until verify_count = 0 - STEP 4: Only save when clean Results without hand-holding: - V12 (before): 28 Verify patterns - V14 (after): 0 Verify patterns ✓ Gate 7 now passes without Task prompt hand-holding. Co-Authored-By: Claude Opus 4.5 <[email protected]>
…onal, not gates Priority distribution should be domain-specific, determined by SMEs with business context - not arbitrary percentage targets that cause meaningless priority shuffling. Changes: - Validator: Priority gates are now soft (informational) vs hard (blocking) - Agent: Removed mandatory percentage rebalancing loops - Agent: Priority guidelines are context-driven, not percentage-driven - E5 Assessment: Added SME Review Warning box for priority validation Hard gates that remain: - Gate 5: Human >= 10% - Gate 7: NO "Verify X" patterns - Gate 8: 28 SFDIPOT subcategories - Gate 9: Feature coverage - Gate 10: Human test format Co-Authored-By: Claude Opus 4.5 <[email protected]>
…late - Add docs/templates/sfdipot-reference-template.html as permanent reference - Template contains all required sections without client-specific data: - 7 Exploratory Testing Charters with correct naming - 7 Test Data sections with correct naming - 5 items in "How to use this report?" section - No Human Exploration in Automation Fitness summary - Update agent to use new template path instead of client-specific file - Ensures agent works for all users without dependency on private files Co-Authored-By: Claude Opus 4.5 <[email protected]>
…eference template The reference template was missing the Mutation Testing Strategy section, causing generated reports to omit this important section. Added: - Recommended Mutation Targets (business logic, boundaries, error handling, states) - Kill Rate Targets table (95% for critical paths, 85% for API, 70% for UI) - Mutation Operators to Apply (arithmetic, relational, logical, return values) Template now matches E001 structure with all required sections.
…ing section Regenerated PMI-E002 SFDIPOT assessment to verify template fix. Report now includes: - Mutation Testing Strategy section ✓ - 168 test ideas across 7 SFDIPOT categories - 7 exploration charters - 7 test data strategies - 0 'Human Exploration' in automation chart
…egory content
Problem: Test Data and Test Ideas content was randomly appearing in wrong
sections across SFDIPOT categories.
Solution:
- Added Gate 14: Strict Section Order validation
- Added explicit 4-subsection structure with numbered comments
- Added FAILURE CONDITIONS for order violations
- Added CSS table-layout: fixed for consistent column widths
- Added Step 6b validation to check section ordering
Each category section MUST now have content in this order:
1. Test Ideas table (filterable-table with tbody)
2. Test Data Strategy (📊 Recommended Test Data for {CATEGORY})
3. Exploration Charter (🔍 Recommended Exploratory Testing Charter)
4. Clarifying Questions
Verified E002 report now has:
- 7 Test Data sections ✓
- 7+ Charter sections ✓
- Correct section ordering in all categories ✓
Epic: Evidence, Science & Regulatory Communication User Stories: US01, US02, US03 Test Ideas: 98 total across 7 SFDIPOT categories Validated: - 7 Test Data sections ✓ - 7 Exploration Charters ✓ - 0 Human in automation chart ✓
…ructure Overwrote client-specific content with generic SFDIPOT template to ensure agent reads correct structure regardless of which file path it uses. This serves as a fallback fix since agent sometimes ignores the primary template at docs/templates/sfdipot-reference-template.html
Epic: Social Proof & User-Generated Content Integration (Next.co.uk) Domain: E-commerce Retail Fashion Test Ideas: 35 (5 per SFDIPOT category) Validated: - 7 Test Data sections ✓ - 7 Exploration Charters ✓ - 0 Human in automation chart ✓ Key risk areas: Content moderation, third-party APIs, GDPR compliance
- Replace artificially constrained v1 (35 tests) with proper risk-driven v2 - Risk-driven distribution: Function(28) > Interfaces(26) > Data(24) > Structure(22) > Platform(18) > Operations(16) > Time(13) - Includes copyright detection, abuse prevention, rate limiting tests - 20 Human SME tests (13.6%) for subjective quality assessments - All 7 SFDIPOT categories with Test Data and Exploratory Charters Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove fixed "exactly 2 questions per subcategory" constraint - Add variable question count guidance (1 gap = 1 question, 3 gaps = 3 questions) - Remove rigid "minimum 21 rows" PCO requirement - now proportional to AC/NFR count - Remove "at least 3 testable elements per SFDIPOT category" constraint - Remove fixed "3-5 test ideas per AC" - now complexity-based - Remove "add at least 3 from templates" for human exploration padding - Fix Gate 15/21 column count inconsistency (both now 4 columns) - Add Gate 22 for enforcing variable question counts - Update PCO description to use "Product Factor(s)" (plural) These changes ensure realistic output that reflects actual requirement content rather than pattern-matching to template-driven quotas. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Owner
|
This agent (qe-product-factors-assessor) is now part of the v3 alpha release. It will be made available with the v3 version soon, once we finish fine-tuning and testing. The implementation includes:
Thank you for the contribution! |
proffesor-for-testing
added a commit
that referenced
this pull request
Jan 18, 2026
Implements comprehensive product factors analysis using James Bach's HTSM framework v6.3 for test strategy generation. New Features: - qe-product-factors-assessor: Full SFDIPOT analysis (7 categories, 37 subcategories) - qe-test-idea-rewriter: Transform "Verify X" patterns to action-verb format - sfdipot-product-factors skill: Skill definition for SFDIPOT assessment - test-idea-rewriting skill: Skill for test idea quality improvement Product Factors Assessment Capabilities: - Test idea generation with P0-P3 priority levels - Automation fitness recommendations (Unit/Integration/E2E/Human) - Brutal Honesty validation (Bach/Ramsay/Linus modes) - Domain pattern detection (ecommerce, healthcare, finance) - Multiple output formats: HTML, JSON, Markdown, Gherkin - Clarifying question generation for coverage gaps Code Changes: - Removed deprecated time-crystal module (consolidated into mincut) - Removed compatibility layer (v2-v3 migration complete) - Enhanced unified-memory with improved persistence - Added kuramoto-cpg oscillator for coordination - Fixed various TypeScript compilation issues Based on original implementation by @fndlalit (PR #178) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Adds a new QE agent that applies James Bach's HTSM Product Factors (SFDIPOT) framework to analyze epics and generate comprehensive test ideas for Product Coverage Sessions.
Enhanced with quality rules based on brutal-honesty-review feedback to produce production-grade assessments.
Key Features
Quality Rules Added (Brutal Honesty Feedback)
Before/After Quality Comparison
Test Idea Quality Example
Before:
After:
Files Added/Modified
.claude/agents/qe-product-factors-assessor.md- Agent definition with quality rulessrc/agents/qe-product-factors-assessor/- Core implementation.agentic-qe/product-factors-assessments/- Sample assessments (AY-E001, AY-E002)Test plan
🤖 Generated with Claude Code