Skip to content

Conversation

@fndlalit
Copy link
Contributor

@fndlalit fndlalit commented Dec 28, 2025

Summary

Adds a new QE agent that applies James Bach's HTSM Product Factors (SFDIPOT) framework to analyze epics and generate comprehensive test ideas for Product Coverage Sessions.

Enhanced with quality rules based on brutal-honesty-review feedback to produce production-grade assessments.

Key Features

  • SFDIPOT Analysis: Structure, Function, Data, Interfaces, Platform, Operations, Time
  • Test Idea Generation: Prioritized (P0-P3) with automation fitness recommendations
  • Clarifying Questions: Surfaces unknown risks and missing requirements
  • Multiple Formats: HTML (interactive dashboard), JSON, Markdown, Gherkin
  • Learning System: Persists patterns across assessments for continuous improvement

Quality Rules Added (Brutal Honesty Feedback)

Rule Purpose
Priority Distribution P0: 8-12%, P1: 20-30%, P2: 35-45%, P3: 20-30%
Test Idea Quality No "Verify X works correctly" patterns; require boundaries, failure modes
Automation Fitness Minimum 10% human-exploration, max 50% E2E
Domain Context Mandatory risk pattern identification before test generation
Edge Cases Checklist Race conditions, external API failures, time-based expiry

Before/After Quality Comparison

Metric Before After Target
P1 Priority 57.5% ❌ 28.7% ✅ 20-30%
Human Exploration 4% ❌ 13.8% ✅ ≥10%
E2E Tests 68% ❌ 20.7% ✅ ≤50%
Template Patterns Many ❌ 0 ✅ None
Overall Score 2.75/10 8.25/10 -

Test Idea Quality Example

Before:

"Verify celebrity collection navigation works correctly"

After:

"200 users click 'Add to Bag' on same product within 1 second during live event; verify inventory correctly decremented without oversell"

Files Added/Modified

  • .claude/agents/qe-product-factors-assessor.md - Agent definition with quality rules
  • src/agents/qe-product-factors-assessor/ - Core implementation
  • .agentic-qe/product-factors-assessments/ - Sample assessments (AY-E001, AY-E002)

Test plan

  • Generated AY-E001 assessment (identified quality issues)
  • Applied brutal-honesty-review skill to analyze output
  • Updated agent definition with quality rules
  • Generated AY-E002 assessment with enhanced agent
  • Verified all metrics within targets via brutal-honesty-review

🤖 Generated with Claude Code

Implements a new QE agent that applies James Bach's HTSM Product Factors
(SFDIPOT: Structure, Function, Data, Interfaces, Platform, Operations, Time)
to analyze epics and generate comprehensive test ideas.

Key features:
- SFDIPOT-based product factor analysis
- Test idea generation with priority (P0-P3) and automation fitness
- Clarifying question generation to surface unknown risks
- Multiple output formats: HTML, JSON, Markdown, Gherkin
- Learning system to persist patterns across assessments
- Code intelligence integration for codebase-aware analysis

New files:
- .claude/agents/qe-product-factors-assessor.md - Agent definition
- src/agents/qe-product-factors-assessor/ - Core implementation
- tests/agents/qe-product-factors-assessor.test.ts - Unit tests

Also includes:
- Updated agent registry and spawn handlers
- Learning patterns from Epic 4, 5, 6 assessments
- Code intelligence configuration
- Documentation updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@fndlalit
Copy link
Contributor Author

@proffesor-for-testing hi, please don't merge yet. There is some critical work to be done which I will continue after my vacation. Known issues:

  1. Agent randomly generates its own HTML despite having clear HTML reference to follow/copy
  2. Test case generation needs improvement. Brutal-honest-review skill understands context better.
  3. Clarifying questions can be improved
  4. Other miscellaneous updates

Lalit and others added 28 commits December 28, 2025 22:31
…hanced test generation

Implements 3-phase GOAP plan to improve test idea quality through domain-specific patterns:

Phase 1 - Domain Pattern Registry:
- Created DomainPatternRegistry with 6 domains (stripe-subscription, gdpr-compliance,
  pci-dss, hipaa, oauth-oidc, webhook-integration)
- Added confidence-based domain detection from requirements text
- Extended ProjectContext with DetectedDomain[] for domain tracking

Phase 2 - Quality Calibration:
- Integrated domain-specific BS patterns in brutal-honesty-analyzer
- Added pre-generation validation for domain coverage
- Implemented calibrateDomainQuality() with domain-specific scoring adjustments

Phase 3 - Domain-Specific Output:
- Added domain test template injection in test-idea-generator
- Created injectMissingDomainCoverage() for automatic gap filling
- Enhanced question-generator with domain-specific clarifying questions

Also includes:
- Epic 4 Community & Engagement assessment (249 test ideas, 68/100 quality score)
- Multiple product factors assessments for various epics
- GOAP integration plan documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
## Summary
- Reduced TypeScript `any` types from 568 to 538 (63% overall reduction from 1,470)
- Fixed all 146 TypeScript compilation errors
- Updated version to v2.7.1 across all 5 required files
- Added ErrorUtils utility for consistent error handling
- Updated GOAP plan with Phase 2 completion status

## Type Safety Improvements
- Added index signatures for SerializableValue compatibility
- Fixed memory retrieval type casts across 15+ agent files
- Added proper task payload typing in agent performTask methods
- Created shared error handling utilities

## Files Updated
- package.json, package-lock.json (version bump)
- README.md (version badge)
- CHANGELOG.md (release notes)
- src/mcp/server-instructions.ts (SERVER_VERSION)
- src/core/memory/HNSWVectorMemory.ts (version info)
- docs/plans/goap-issue-149-code-quality.md (progress tracking)
- 100+ source and test files (type safety fixes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents

chore(release): v2.7.1 - Type Safety Improvements
Fixes identified from brutal honesty self-review:
- Fix test idea count verification (reported count now matches actual)
- Add documented scoring rubric with 5-category methodology
- Add AC-by-AC testability analysis with per-AC scores
- Integrate penetrating questions (85% rate, was 0%)
- Pass full RequirementsQualityScore to HTML formatter

Changes:
- brutal-honesty-analyzer.ts: Add SCORING_RUBRIC, AC analysis methods
- html-formatter.ts: Add renderScoringRubric, renderACAnalysis
- question-generator.ts: Add getQuestionForSubcategory public API
- index.ts: Capture full quality data, use QuestionGenerator

Note: Test ideas generation needs significant improvement - currently
generates only 20 test ideas. Future work should expand coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Major improvement to test idea coverage:
- Added 165+ test templates covering all 37 SFDIPOT subcategories
- Connected getGenericIdeasForSubcategory to TestIdeaGenerator templates
- Added SBTM exploratory testing tours (FedEx, Garbage Collector, Bad
  Neighborhood, Landmark, Intellectual, Obsessive-Compulsive, Saboteur)

Results on Epic 2 assessment:
- Test ideas: 20 → 204 (+920%)
- Coverage: 71.4% → 100%
- All 7 SFDIPOT categories now covered (STRUCTURE and PLATFORM were 0)

Templates include comprehensive coverage for:
- Structure: Code, Hardware, NonPhysical, Dependencies, Documentation
- Function: Application, Calculation, ErrorHandling, Security,
  StateTransition, Startup, Shutdown
- Data: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence,
  Types, Selection
- Interfaces: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging
- Platform: Browser, OperatingSystem, Hardware, ExternalSoftware,
  InternalComponents
- Operations: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users,
  Environment
- Time: Timing, Concurrency, Scheduling, Timeout, Sequencing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
… with SFDIPOT checklist

Implements LLM-powered test idea generation using the complete SFDIPOT
framework as a structured checklist:

SFDIPOT Checklist (40 subcategories covered in prompt):
- STRUCTURE: Code, Hardware, NonPhysical, Dependencies, Documentation
- FUNCTION: Application, Calculation, ErrorHandling, Security,
  StateTransition, Startup, Shutdown
- DATA: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence,
  Types, Selection
- INTERFACES: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging
- PLATFORM: Browser, OperatingSystem, Hardware, ExternalSoftware,
  InternalComponents
- OPERATIONS: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users,
  Environment
- TIME: Timing, Concurrency, Scheduling, Timeout, Sequencing

New methods:
- buildSFDIPOTChecklistPrompt(): Creates comprehensive LLM prompt
- generateTestIdeasWithLLM(): Main entry point for LLM generation
- parseLLMTestIdeasResponse(): Parses JSON response to TestIdea[]
- mapCategoryString(): Maps category strings to HTSMCategory enum
- mapPriorityString(): Maps priority strings to Priority enum
- mapAutomationFitnessString(): Maps automation strings to enum

Usage:
Set `useLLM: true` in AssessmentInput and configure `llmConfig` with
enabled LLM provider to use LLM-based generation. Falls back to
template-based generation if LLM is unavailable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds test script for LLM-based test idea generation:
- Configures RuvLLM provider for local inference
- Tests SFDIPOT checklist prompt integration
- Validates graceful fallback to template generation

Test results show:
- LLM integration code path works correctly
- Graceful fallback when LLM not available (204 test ideas from templates)
- All 7 SFDIPOT categories covered
- 100% coverage score achieved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
GOAP Phase 3 - Task Orchestration Integration:
- Add TaskWorkflowGoals with 4 goal definitions
- Add 17 orchestration-specific GOAP actions
- Create GOAPTaskOrchestration integration class
- Modify TaskOrchestrateHandler with GOAP planning + template fallback
- Add AgentRegistry integration for fleet state
- Add 31 integration tests for task orchestration

CRITICAL INCIDENT - Data Loss (2025-12-29):
- memory.db accidentally deleted during test debugging
- 2 months of learning data permanently lost
- Added backup system to prevent recurrence:
  - scripts/backup-memory.js with backup/restore
  - npm run backup, backup:list, backup:restore
  - Accelerated learning config for rebuild

Also includes:
- C4 architecture diagrams
- GOAP integration plans and documentation
- Quality gate GOAP integration
- Plan executor with dry-run mode
- Database schema updates for GOAP tables

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
### Added
- Database Migration System (src/persistence/migrations/)
  - MigrationRunner class for versioned schema migrations
  - 7 migrations covering all core tables
  - Version tracking in schema_migrations table
  - Helper functions: tableExists, columnExists, safeAddColumn, safeCreateIndex

- CLI Migrate Command (src/cli/commands/migrate/)
  - aqe migrate status - Show migration status
  - aqe migrate run - Run pending migrations
  - aqe migrate rollback - Rollback last migration
  - aqe migrate reset - Reset all migrations

### Changed
- Database initialization now runs migrations automatically during aqe init
- Schema consistency ensured across all installations

### Fixed
- Schema evolution issues from backup restoration
- Dream learning cycle now works with migrated data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents

feat(release): v2.7.2 - GOAP Phase 3 Task Orchestration + Migration System
## Phase 5: Plan Learning
- PlanLearning: EMA-based action success rate tracking
- PlanSimilarity: Plan signature matching for reuse (<100ms)
- Q-Learning integration for GOAP action selection
- Database persistence for learning history

## Phase 6: Live Agent Execution
- Real agent spawning via AgentRegistry (not just dry-run)
- Output parsing for real-time world state updates
- Plan signature storage after successful execution
- Learning feedback loop integration

## New Files
- src/planning/PlanLearning.ts
- src/planning/PlanSimilarity.ts
- tests/integration/goap-live-execution.test.ts (17 tests)
- tests/integration/goap-phase5-real-integration.test.ts (15 tests)
- tests/integration/goap-plan-learning.test.ts (31 tests)

## Test Coverage
- 84 GOAP-related tests passing
- Output parsing methods verified
- Live vs dry-run code paths tested

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents

feat(goap): v2.7.3 - Phase 5 & 6 Plan Learning & Live Agent Execution
The captured_experiences table had a schema mismatch between Migration 003
and ExperienceCapture.initializeSchema():

- Migration 003 created: id, agent_id, action, context, outcome, reward, captured_at
- ExperienceCapture expects: id, agent_id, agent_type, task_type, execution, context, outcome, embedding, created_at

This caused "no such column: agent_type" errors during `aqe init`.

Migration 008 safely adds missing columns with defaults and migrates
existing captured_at data to created_at.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Standardize logging across all agents by replacing console.log/error/warn
with the centralized Logger utility for consistent log formatting and control.

Changes:
- Add protected logger to BaseAgent for inheritance by all agents
- Remove duplicate local Logger interfaces and ConsoleLogger classes from 8 agents
- Migrate 33 files from console.* to Logger:
  - 19 main QE agents (TestExecutor, FlakyTestHunter, ApiContractValidator, etc.)
  - 7 n8n workflow agents (N8nBaseAgent, N8nSecurityAuditor, etc.)
  - 4 utility/adapter files (AgentPool, AgentLLMAdapter, CoordinatorAdapter)
  - 3 GOAP planning files (Math.random → SecureRandom migration)

Results:
- Console calls in src/agents/: 90 → 25 (72% reduction)
- Remaining calls are in interface defaults, example scripts, and string literals
- Build passes with no TypeScript errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Updates version across all required files:
- package.json
- package-lock.json
- README.md
- CHANGELOG.md
- src/mcp/server-instructions.ts
- src/core/memory/HNSWVectorMemory.ts

Changes in v2.7.4:
- fix(db): Migration 008 fixes captured_experiences schema mismatch
- refactor(agents): Standardized logging with Logger utility (90→25 console calls)
- security: Migrated Math.random() to SecureRandom in GOAP planning

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/working-with-agents

Release v2.7.4: Database Schema Fix & Logger Standardization
Bumps the npm_and_yarn group with 1 update in the / directory: [qs](https://github.com/ljharb/qs).


Updates `qs` from 6.14.0 to 6.14.1
- [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md)
- [Commits](ljharb/qs@v6.14.0...v6.14.1)

---
updated-dependencies:
- dependency-name: qs
  dependency-version: 6.14.1
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <[email protected]>
Implements browser-compatible QE agents using @ruvector/edge WASM:

- Add BrowserHNSWAdapter with real @ruvector/edge WasmHnswIndex
- Add IndexedDBStorage for browser-side vector persistence
- Add BrowserAgent for WASM-compatible agent lifecycle
- Add Chrome DevTools panel for agent monitoring
- Add build:edge script producing 183.4KB gzipped bundle

Verified metrics (not estimated):
- Bundle size: 183.4 KB gzipped (target: <400KB) ✓
- Tests: 173/177 passing (4 expected failures in Node.js)
- Build: TypeScript compilation succeeds

Analysis documents:
- GOAP evaluation of @ruvector/edge integration
- Visionary analysis with 12 novel use cases
- Implementation plan for 5 phases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Phase 1 completion for @ruvector/edge VS Code Extension MVP:

P1-006: AQE Pattern Integration
- Add AQEPatternBridge service for bidirectional pattern sync
- Pattern conversion between CodePattern and QEPattern formats
- Offline-first with sync queue support
- Export bridge and types from services index

P1-007: Security Review and Hardening
- Comprehensive SECURITY.md documentation
- CSP configuration for WebViews (nonce-based scripts)
- Input validation on all WASM calls
- Sensitive pattern filtering before storage
- Checksum validation on stored entries
- Storage limits with LRU eviction
- 40 security tests covering all threat vectors

P1-008: End-to-End Testing
- Full E2E test suite with mock VS Code API
- Tests for extension activation lifecycle
- Code analysis and test suggestion flows
- Coverage visualization WebView tests
- Offline storage and sync queue tests
- Pattern matching and learning tests

Code Quality:
- Replace TODO placeholders with @template markers in test generators
- Test template generators use @template: for user-fillable sections

All 171 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- CodeAnalyzer.ts: fix getLanguage return type to literal union
- AQEPatternBridge.ts: add type assertions for pattern type mappings
- storage/index.ts: import classes directly for factory function usage
- TestGenerationQuickPick.ts: add type assertions for selectedItems

Phase 1 VS Code extension now builds with zero TypeScript errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…rning

Add RuVector-based learning system that captures patterns from development:
- .ruvector/intelligence.json: stores learned patterns, error fixes, file sequences
- scripts/workers/qe-self-learning.js: background learning worker
- scripts/workers/sync-intelligence.js: intelligence synchronization
- scripts/migrate-memory-to-ruvector.js: migration from legacy memory
- scripts/migrate-swarm-to-ruvector.js: migration from swarm patterns
- scripts/suggest-agents-from-intelligence.js: agent routing suggestions

Updated .gitignore to track .ruvector/ for shared team learning.
Updated .claude/settings.json with RuVector hooks integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ng/dependabot/npm_and_yarn/npm_and_yarn-2b901f0e0d

chore(deps): bump qs from 6.14.0 to 6.14.1 in the npm_and_yarn group across 1 directory
- Integrate 8 previously unregistered MCP handlers:
  - Chaos engineering: chaos_inject_latency, chaos_inject_failure, chaos_resilience_test
  - Integration testing: integration_dependency_check, integration_test_orchestrate
  - Token-optimized: test_execute_filtered, performance_test_filtered, quality_assess_filtered
- Create NewDomainToolsHandler.ts to route new domain tools

Dead code removal (~22k lines):
- Delete 4 overlapping filtered handlers (coverage, flaky, security, contract)
- Remove unused directories: src/alerting/, src/reporting/, src/transport/
- Remove 4 unregistered CLI command dirs: fleet/, test/, quality/, monitor/ (42 files)
- Archive 10 one-time verification scripts to scripts/archive/
- Remove 2 CLI backup files (index-spec.ts, index-working.ts)

Test organization:
- Move HNSWPatternStore.test.ts from src/ to tests/unit/memory/
- Delete OutputFormatter duplicate test from src/
- Consolidate duplicate RuVector and learning-persistence tests
- Rename RuVector.SelfLearning.test.ts for consistent naming

Dependencies:
- Remove unused packages: jose, @types/chrome, gpt-tokenizer

Build config:
- Exclude src/edge/ subdirectories from main tsconfig (separate build)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…001, P2-002)

Phase 2 P2P Foundation implementation with parallel development:

**P2-001: Ed25519 Cryptographic Identity** (src/edge/p2p/crypto/)
- Identity.ts: Agent identity generation with Ed25519 keypairs
- KeyManager.ts: Secure key storage, rotation, and revocation
- Signer.ts: Message signing/verification with batch support
- BIP39-style seed phrase recovery
- AES-GCM encryption for private key storage

**P2-002: WebRTC Connection Manager** (src/edge/p2p/webrtc/)
- PeerConnectionManager.ts: Multi-peer WebRTC connections
- SignalingClient.ts: WebSocket-based signaling with heartbeat
- ICEManager.ts: STUN/TURN configuration and NAT detection
- ConnectionPool.ts: Connection pooling with eviction policies
- Automatic reconnection with exponential backoff

**Tests:**
- crypto.test.ts: 45 tests for identity, signing, key management
- webrtc.test.ts: 65 tests for connections, signaling, pooling

Total: 6,250 lines of production code + 73,000+ bytes of tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implements remaining Phase 2 components for @ruvector/edge:

P2-003 Agent-to-Agent Communication Protocol:
- AgentChannel: Secure bidirectional channels between agents
- MessageEncoder: Binary message encoding with compression
- MessageRouter: Multi-hop routing with dead letter queue
- ProtocolHandler: Protocol negotiation and handshake

P2-004 Pattern Sharing Protocol:
- PatternSerializer: Pattern serialization with anonymization
- PatternBroadcaster: Gossip-based pattern broadcasting
- PatternIndex: Vector similarity search for patterns
- PatternSyncManager: Delta-based sync with vector clocks

P2-005 Federated Learning Infrastructure:
- GradientAggregator: FedAvg, FedProx, Krum, trimmed mean
- FederatedRound: Training round lifecycle management
- FederatedCoordinator: Multi-round training coordination
- ModelManager: Weight management and checkpointing

P2-006 CRDT-Based Conflict Resolution:
- GCounter, LWWRegister, ORSet implementations
- PatternCRDT: Composite CRDT for shared patterns
- VectorClock: Causality tracking
- CRDTStore: Centralized CRDT management with GC

P2-007 Two-Machine Coordination (tests):
- CoordinationManager integration tests
- Peer authentication and health monitoring

P2-008 NAT Traversal and TURN Fallback:
- NATDetector: STUN-based NAT type detection
- TURNManager: TURN server credential management
- HolePuncher: UDP hole punching with port prediction
- ConnectivityTester: Connection quality assessment

Test Coverage:
- 460 tests passing across all 8 P2P modules
- Fixed vitest->jest imports and promise handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive integration test suite for P2P Foundation:
- crypto-webrtc.integration.test.ts (28 tests)
- protocol-sharing.integration.test.ts (34 tests)
- coordination-crdt.integration.test.ts (33 tests)
- full-stack.integration.test.ts (48 tests)

Update GOAP checklist marking Phase 2 complete:
- All 24 Phase 2 tasks completed
- 605+ total tests (462 unit + 143 integration)
- All modules verified working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Lalit and others added 8 commits January 12, 2026 12:36
Fixes identified from brutal honesty self-review:
- Fix test idea count verification (reported count now matches actual)
- Add documented scoring rubric with 5-category methodology
- Add AC-by-AC testability analysis with per-AC scores
- Integrate penetrating questions (85% rate, was 0%)
- Pass full RequirementsQualityScore to HTML formatter

Changes:
- brutal-honesty-analyzer.ts: Add SCORING_RUBRIC, AC analysis methods
- html-formatter.ts: Add renderScoringRubric, renderACAnalysis
- question-generator.ts: Add getQuestionForSubcategory public API
- index.ts: Capture full quality data, use QuestionGenerator

Note: Test ideas generation needs significant improvement - currently
generates only 20 test ideas. Future work should expand coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Major improvement to test idea coverage:
- Added 165+ test templates covering all 37 SFDIPOT subcategories
- Connected getGenericIdeasForSubcategory to TestIdeaGenerator templates
- Added SBTM exploratory testing tours (FedEx, Garbage Collector, Bad
  Neighborhood, Landmark, Intellectual, Obsessive-Compulsive, Saboteur)

Results on Epic 2 assessment:
- Test ideas: 20 → 204 (+920%)
- Coverage: 71.4% → 100%
- All 7 SFDIPOT categories now covered (STRUCTURE and PLATFORM were 0)

Templates include comprehensive coverage for:
- Structure: Code, Hardware, NonPhysical, Dependencies, Documentation
- Function: Application, Calculation, ErrorHandling, Security,
  StateTransition, Startup, Shutdown
- Data: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence,
  Types, Selection
- Interfaces: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging
- Platform: Browser, OperatingSystem, Hardware, ExternalSoftware,
  InternalComponents
- Operations: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users,
  Environment
- Time: Timing, Concurrency, Scheduling, Timeout, Sequencing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
… with SFDIPOT checklist

Implements LLM-powered test idea generation using the complete SFDIPOT
framework as a structured checklist:

SFDIPOT Checklist (40 subcategories covered in prompt):
- STRUCTURE: Code, Hardware, NonPhysical, Dependencies, Documentation
- FUNCTION: Application, Calculation, ErrorHandling, Security,
  StateTransition, Startup, Shutdown
- DATA: InputOutput, Lifecycle, Cardinality, Boundaries, Persistence,
  Types, Selection
- INTERFACES: UserInterface, ApiSdk, SystemInterface, ImportExport, Messaging
- PLATFORM: Browser, OperatingSystem, Hardware, ExternalSoftware,
  InternalComponents
- OPERATIONS: CommonUse, UncommonUse, ExtremeUse, DisfavoredUse, Users,
  Environment
- TIME: Timing, Concurrency, Scheduling, Timeout, Sequencing

New methods:
- buildSFDIPOTChecklistPrompt(): Creates comprehensive LLM prompt
- generateTestIdeasWithLLM(): Main entry point for LLM generation
- parseLLMTestIdeasResponse(): Parses JSON response to TestIdea[]
- mapCategoryString(): Maps category strings to HTSMCategory enum
- mapPriorityString(): Maps priority strings to Priority enum
- mapAutomationFitnessString(): Maps automation strings to enum

Usage:
Set `useLLM: true` in AssessmentInput and configure `llmConfig` with
enabled LLM provider to use LLM-based generation. Falls back to
template-based generation if LLM is unavailable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds test script for LLM-based test idea generation:
- Configures RuvLLM provider for local inference
- Tests SFDIPOT checklist prompt integration
- Validates graceful fallback to template generation

Test results show:
- LLM integration code path works correctly
- Graceful fallback when LLM not available (204 test ideas from templates)
- All 7 SFDIPOT categories covered
- 100% coverage score achieved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
… and AY-E001

- Generate Epic 3 AI Personalization Search assessment (143 test ideas)
- Generate AY-E001 Celebrity Collections assessment (127 test ideas)
- Fix agent template to include complete info sections (no truncation)
- Add DO NOT TRUNCATE rule to agent compliance section
- Update learning config for RuVector GNN/LoRA/EWC++ settings

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…th quality rules

Comprehensive improvements to deliver higher quality SFDIPOT assessments:

## Priority Distribution Rules
- Added strict distribution targets: P0 (8-12%), P1 (20-30%), P2 (35-45%), P3 (20-30%)
- Added priority inflation check with mandatory review if P1 > 35%
- Added calibration questions for each priority level

## Test Idea Quality Rules
- Added banned patterns list ("Verify X works correctly" etc.)
- Added transformation process: boundaries, off-by-one, state combinations
- Added failure modes, race conditions, external dependencies

## Automation Fitness Reality Check
- Added target percentages: unit (30-40%), e2e (≤50%), human-exploration (≥10%)
- Added reality check for e2e-heavy recommendations
- Fixed over-optimism issue (was 4% human exploration, now requires ≥10%)

## Domain Context Requirements
- Added mandatory domain detection before test generation
- Added risk pattern identification (social media, celebrity content, e-commerce)
- Added domain-specific edge case extraction

## Edge Cases Checklist
- Added comprehensive checklist: race conditions, contract expiry, content takedown
- Added notification SLAs, time-based expiry, external API dependencies
- Added state management and session persistence scenarios

## Quality Gates
- Added mandatory checks before finalizing reports
- 5-phase process with explicit validation steps
- Self-review requirement for priority distribution

Addresses all findings from brutal-honesty-review skill analysis of AY-E001 assessment.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
… assessment

Added AY-E002 (Live Shopping Experience Unification) SFDIPOT assessment
to validate the enhanced agent quality rules.

Results demonstrate significant improvement:
- Priority distribution: P0=10.3%, P1=28.7%, P2=41.4%, P3=19.5% (all within targets)
- Template patterns eliminated: 0 occurrences of "Verify X works correctly"
- Human exploration: 13.8% (above 10% minimum)
- E2E tests: 20.7% (well under 50% max)
- Domain-specific edge cases included (WebSocket drops, inventory races, DST)

Brutal honesty review score improved from 2.75/10 to 8.25/10.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@fndlalit fndlalit changed the title feat(agents): add qe-product-factors-assessor agent for SFDIPOT analysis feat(qe-product-factors-assessor): SFDIPOT agent with enhanced quality rules Jan 12, 2026
Lalit and others added 18 commits January 12, 2026 20:11
- Add explicit human exploration counting in Phase 4
- Add <human_exploration_templates> with domain-specific tests
- Strengthen Quality Gates to be BLOCKING with explicit failure action
- Add Universal tests (5) and Domain-Specific tests (6 domains)
- Fix P1 target from 35% to 30% per brutal-honesty findings

Addresses brutal-honesty-review finding that NORD assessment had only
7.5% human exploration (below 10% minimum).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ecklist

Major enhancement to test generation using LLM-level intelligent thinking:

## New Sections Added

### <sfdipot_subcategory_checklist> (300+ lines)
- 28 subcategories across 7 SFDIPOT categories
- Each subcategory has:
  - Applicability Check question
  - Automated test triggers with examples
  - Human exploration triggers with examples
- Agent MUST evaluate each subcategory for applicability
- Tests generated ONLY for applicable subcategories

### <human_judgment_detector>
- 5-step reasoning process for human test detection:
  1. Subjective language identification
  2. Expertise requirement detection
  3. Perception-based judgment recognition
  4. Discovery opportunity identification
  5. Test generation with explicit reasoning
- Includes "Why Human Essential" column requirement

## Workflow Updates

### Phase 2: Test Idea Generation
- Now STRICTLY follows subcategory checklist
- Iterates ALL 28 subcategories per requirement
- Generates tests from applicable triggers only

### Phase 4: Automation Fitness
- Uses intelligent human detection, not templates
- Applies <human_judgment_detector> to every requirement
- Falls back to checklist review if <10%

Addresses requirement for LLM-level thinking instead of template-based generation.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Step-by-step documentation of how qe-product-factors-assessor
delivers output when invoked via Task tool.

Covers all 8 phases from invocation to result return.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add PRE-OUTPUT HARD STOP section with 3-step validation
- Add ABSOLUTE BAN on "Verify" patterns with transformation examples
- Add STEP 1: verify_count must equal 0 before output
- Add STEP 2: P1 ≤30%, P3 ≥20% enforcement loops
- Add STEP 3: Human ≥10% auto-add loop
- Update Gates 2,4,5,7 to HARD STOP blockers
- Add Gate 10 for human exploration row structure
- Add explicit INVALID/VALID examples for human test format

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…n hook

- Add scripts/validate-sfdipot-assessment.ts - validates assessment HTML output
- Add scripts/hooks/validate-sfdipot-on-write.sh - PostToolUse hook
- Register hook in .claude/settings.json for Write operations
- Validates: Gate 7 (no Verify), Gate 2 (P1≤30%), Gate 4 (P3≥20%), Gate 5 (Human≥10%)
- Stores validation results in memory.db for learning

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Option C implementation: separate concerns into focused agents
- Generator: qe-product-factors-assessor (coverage)
- Rewriter: qe-test-idea-rewriter (action verbs)
- Validator: validate-sfdipot-assessment.ts (quality gates)

Results:
- V12 (single agent, no hand-holding): 28 Verify patterns (FAIL)
- V13 (V12 + rewriter): 0 Verify patterns (ALL GATES PASS)

Pipeline reduces attention dilution by giving each agent
a single responsibility with narrow focus.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
… loop

Adds explicit transformation loop that runs BEFORE saving HTML:
- STEP 1: Scan for "Verify X" patterns
- STEP 2: Transform each using pattern table
- STEP 3: Re-scan until verify_count = 0
- STEP 4: Only save when clean

Results without hand-holding:
- V12 (before): 28 Verify patterns
- V14 (after): 0 Verify patterns ✓

Gate 7 now passes without Task prompt hand-holding.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…onal, not gates

Priority distribution should be domain-specific, determined by SMEs with
business context - not arbitrary percentage targets that cause meaningless
priority shuffling.

Changes:
- Validator: Priority gates are now soft (informational) vs hard (blocking)
- Agent: Removed mandatory percentage rebalancing loops
- Agent: Priority guidelines are context-driven, not percentage-driven
- E5 Assessment: Added SME Review Warning box for priority validation

Hard gates that remain:
- Gate 5: Human >= 10%
- Gate 7: NO "Verify X" patterns
- Gate 8: 28 SFDIPOT subcategories
- Gate 9: Feature coverage
- Gate 10: Human test format

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…late

- Add docs/templates/sfdipot-reference-template.html as permanent reference
- Template contains all required sections without client-specific data:
  - 7 Exploratory Testing Charters with correct naming
  - 7 Test Data sections with correct naming
  - 5 items in "How to use this report?" section
  - No Human Exploration in Automation Fitness summary
- Update agent to use new template path instead of client-specific file
- Ensures agent works for all users without dependency on private files

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…eference template

The reference template was missing the Mutation Testing Strategy section,
causing generated reports to omit this important section.

Added:
- Recommended Mutation Targets (business logic, boundaries, error handling, states)
- Kill Rate Targets table (95% for critical paths, 85% for API, 70% for UI)
- Mutation Operators to Apply (arithmetic, relational, logical, return values)

Template now matches E001 structure with all required sections.
…ing section

Regenerated PMI-E002 SFDIPOT assessment to verify template fix.

Report now includes:
- Mutation Testing Strategy section ✓
- 168 test ideas across 7 SFDIPOT categories
- 7 exploration charters
- 7 test data strategies
- 0 'Human Exploration' in automation chart
…egory content

Problem: Test Data and Test Ideas content was randomly appearing in wrong
sections across SFDIPOT categories.

Solution:
- Added Gate 14: Strict Section Order validation
- Added explicit 4-subsection structure with numbered comments
- Added FAILURE CONDITIONS for order violations
- Added CSS table-layout: fixed for consistent column widths
- Added Step 6b validation to check section ordering

Each category section MUST now have content in this order:
1. Test Ideas table (filterable-table with tbody)
2. Test Data Strategy (📊 Recommended Test Data for {CATEGORY})
3. Exploration Charter (🔍 Recommended Exploratory Testing Charter)
4. Clarifying Questions

Verified E002 report now has:
- 7 Test Data sections ✓
- 7+ Charter sections ✓
- Correct section ordering in all categories ✓
Epic: Evidence, Science & Regulatory Communication
User Stories: US01, US02, US03
Test Ideas: 98 total across 7 SFDIPOT categories

Validated:
- 7 Test Data sections ✓
- 7 Exploration Charters ✓
- 0 Human in automation chart ✓
…ructure

Overwrote client-specific content with generic SFDIPOT template to ensure
agent reads correct structure regardless of which file path it uses.

This serves as a fallback fix since agent sometimes ignores the primary
template at docs/templates/sfdipot-reference-template.html
Epic: Social Proof & User-Generated Content Integration (Next.co.uk)
Domain: E-commerce Retail Fashion
Test Ideas: 35 (5 per SFDIPOT category)

Validated:
- 7 Test Data sections ✓
- 7 Exploration Charters ✓
- 0 Human in automation chart ✓

Key risk areas: Content moderation, third-party APIs, GDPR compliance
- Replace artificially constrained v1 (35 tests) with proper risk-driven v2
- Risk-driven distribution: Function(28) > Interfaces(26) > Data(24) > Structure(22) > Platform(18) > Operations(16) > Time(13)
- Includes copyright detection, abuse prevention, rate limiting tests
- 20 Human SME tests (13.6%) for subjective quality assessments
- All 7 SFDIPOT categories with Test Data and Exploratory Charters

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove fixed "exactly 2 questions per subcategory" constraint
- Add variable question count guidance (1 gap = 1 question, 3 gaps = 3 questions)
- Remove rigid "minimum 21 rows" PCO requirement - now proportional to AC/NFR count
- Remove "at least 3 testable elements per SFDIPOT category" constraint
- Remove fixed "3-5 test ideas per AC" - now complexity-based
- Remove "add at least 3 from templates" for human exploration padding
- Fix Gate 15/21 column count inconsistency (both now 4 columns)
- Add Gate 22 for enforcing variable question counts
- Update PCO description to use "Product Factor(s)" (plural)

These changes ensure realistic output that reflects actual requirement content
rather than pattern-matching to template-driven quotas.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@proffesor-for-testing
Copy link
Owner

This agent (qe-product-factors-assessor) is now part of the v3 alpha release. It will be made available with the v3 version soon, once we finish fine-tuning and testing.

The implementation includes:

  • Full SFDIPOT analysis (7 categories, 37 subcategories)
  • Test idea generation with P0-P3 priority levels
  • Automation fitness recommendations
  • Brutal Honesty validation (Bach/Ramsay/Linus modes)
  • Domain pattern detection
  • Multiple output formats (HTML, JSON, Markdown, Gherkin)

Thank you for the contribution!

proffesor-for-testing added a commit that referenced this pull request Jan 18, 2026
Implements comprehensive product factors analysis using James Bach's HTSM
framework v6.3 for test strategy generation.

New Features:
- qe-product-factors-assessor: Full SFDIPOT analysis (7 categories, 37 subcategories)
- qe-test-idea-rewriter: Transform "Verify X" patterns to action-verb format
- sfdipot-product-factors skill: Skill definition for SFDIPOT assessment
- test-idea-rewriting skill: Skill for test idea quality improvement

Product Factors Assessment Capabilities:
- Test idea generation with P0-P3 priority levels
- Automation fitness recommendations (Unit/Integration/E2E/Human)
- Brutal Honesty validation (Bach/Ramsay/Linus modes)
- Domain pattern detection (ecommerce, healthcare, finance)
- Multiple output formats: HTML, JSON, Markdown, Gherkin
- Clarifying question generation for coverage gaps

Code Changes:
- Removed deprecated time-crystal module (consolidated into mincut)
- Removed compatibility layer (v2-v3 migration complete)
- Enhanced unified-memory with improved persistence
- Added kuramoto-cpg oscillator for coordination
- Fixed various TypeScript compilation issues

Based on original implementation by @fndlalit (PR #178)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants