The project should enable:
- **Evaluations** assessing the consuming application's alignment with the HPE Design System & the system's ability to meet consuming application needs.
- The ability to scope the evaluation to the entire project or a particular feature area.
- The evaluations should have a scoring mechanism so that we are able to monitor and measure improvement progress across:
- **Consumer Implementation metrics:**
1. Component Coverage
2. Component Usage
3. App Structure
4. Token Compliance
5. Responsive Layouts
6. Accessibility
7. Type Safety & Interfaces
8. Dev Confidence
- **Design System Enablement metrics:**
9. System Discoverability
10. Developer Experience
11. Agent Experience
- Generate prioritized recommendations based on the evaluations. Recommendations should be classified either as improvements to the consumer application's implementation or as suggested improvements to the delivery of the design system.
- Provide suggested remedies for top-priority recommendations.
- The ability to apply and implement remedies for top-priority recommendations.
To support distribution as a package, the structure follows a Modular Agent pattern:
- **`orchestrator`:** The manager acting as the master controller. Responsible for managing state, handing off tasks between specialist agents, and maintaining the continuous improvement loop.
- **`auditor`:** The evaluator consuming the application's source code and scoring it against **Evaluation Metrics**.
- Input: Application's source code
- Task: Score the project based on evaluation metrics (Consumer Implementation and Design System Enablement)
- Logic: Use a weighted formula to calculate separate Consumer Alignment and System Enablement scores, then a combined rollup score (see the scoring sketch after this list)
- Output: Structured JSON scorecard with raw scores, status (Pass/Warning/Fail per metric), and evidence citations
- **`strategist`:** The prioritizer taking the audit findings and prioritizing by impact and effort.
- Input: The Auditor's scorecard report
- Task: Categorize issues into "Consumer Implementation" (e.g., "You used a hex code") vs. "System Improvement" (e.g., "The system lacks a pattern for this specific dashboard view"), then rank by impact and effort using the Impact/Effort matrix
- Output: Game Plan with top 3 Consumer recommendations and P1/P2/P3 System Delivery Suggestions
- **`engineer`:** The remediator implementing top priorities.
- Input: Top priority recommendations from the Strategist
- Task: Executes the "fix" by modifying the user's local files to replace legacy code with design system tokens, components, and patterns
- Output: Code diffs for user approval before writing to disk
- **`reporter`:** The telemetry collector silently observing the ecosystem. Responsible for taking granular data from the Auditor and Strategist and compressing into high-level insights the HPE Design System can use to make roadmap and funding decisions.
- Input: Data from Auditor (scorecard metrics, evidence) and Strategist (system gaps, impact/effort rankings)
- Task: Aggregate adoption trends, friction points, and ROI signals across the 50+ consuming teams
- Output: Monthly/quarterly telemetry dashboard and system roadmap recommendations
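
Below is a minimal sketch of how the Auditor's weighted rollup could work. The metric names come from the evaluation list above; the weights and the even Consumer/System blend are illustrative assumptions, not the authoritative formula.

```ts
// Sketch only: weights and the 50/50 combined blend are assumptions.
type MetricScore = { name: string; score: number; weight: number }; // score in 0.0–1.0

function weightedScore(metrics: MetricScore[]): number {
  const totalWeight = metrics.reduce((sum, m) => sum + m.weight, 0);
  const weighted = metrics.reduce((sum, m) => sum + m.score * m.weight, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

const consumerScore = weightedScore([
  { name: "Component Coverage", score: 0.8, weight: 2 },
  { name: "Token Compliance", score: 0.4, weight: 3 },
  // ...remaining Consumer Implementation metrics
]);

const systemScore = weightedScore([
  { name: "System Discoverability", score: 0.6, weight: 1 },
  // ...remaining Design System Enablement metrics
]);

const combinedScore = 0.5 * consumerScore + 0.5 * systemScore;
```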

## Implementation Strategy: The Continuous Improvement Loop

This is the blueprint for how the CLI package executes. It visualizes the "handshakes" between agents.

#### 1. Initiation Phase
- **User/CI:** Executes `hpe-ds-ai audit --fix`.
- **Orchestrator:** Loads `.hpedsrc` config (see Configuration section below) and fetches the latest `knowledge/` (Tokens, Components, Patterns).
- **Orchestrator → Auditor:** Sends the file path, scope, detected framework, and DS knowledge. "*Analyze this*."

#### 2. Analysis Phase
- **Auditor:** Performs AST parsing, regex scans, and static analysis.
- **Auditor → Orchestrator:** Returns a structured JSON report (The Evaluation Metric Scorecard) with Consumer Alignment Score, System Enablement Score, Combined Alignment Score, and classified findings.
- **Orchestrator → Strategist:** Sends the Scorecard. "*What should we do first?*"

#### 3. Strategy Phase
- **Strategist:** Runs the Impact/Effort matrix on Consumer findings and assigns P1/P2/P3 severity to System Delivery Suggestions (a prioritization sketch follows this phase).
- **Strategist → Orchestrator:** Returns the "Game Plan" (batched Consumer remediation tasks + prioritized System gaps).
- **Orchestrator → User:** Displays the Scorecard (both Consumer and System scores), improvement delta (if re-audit), and top 3 proposed Consumer fixes.
- **User:** Input `[Y]` to approve the top 3 critical Consumer fixes.
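
A minimal sketch of the Impact/Effort ranking, assuming a simple impact-to-effort ratio heuristic; the actual matrix and tie-breaking rules live in the strategist instructions.

```ts
// Sketch only: the ratio heuristic and 1–5 scales are assumptions.
type Finding = { id: string; impact: number; effort: number }; // 1 (low) to 5 (high)

function topConsumerFixes(findings: Finding[], topN = 3): Finding[] {
  return [...findings]
    .sort((a, b) => b.impact / b.effort - a.impact / a.effort) // high impact, low effort first
    .slice(0, topN);
}
```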

#### 4. Remediation Phase
- **Orchestrator → Engineer:** Sends the specific files, approved tasks, and context. "*Fix these 3 items.*"
- **Engineer:** Generates code diffs, ensuring A11y and Token compliance per HPEDS standards.
- **Engineer → User:** Displays the `diff` for review with rationale.
- **User:** Input `[Y]` to write changes to disk.

#### 5. Verification Phase
- **Orchestrator → Auditor:** Sends the newly modified code. "*Verify the improvement.*"
- **Auditor:** Re-scores the evaluation metrics across both Consumer and System dimensions.
- **Orchestrator → User:** Displays the improvement delta (e.g., "Consumer Score: 0.45 → 0.72; System Enablement: 0.60 → 0.65").

#### 6. External Reporting
- **Orchestrator:** If P1 "System Gap" findings were identified, it generates a **System Delivery Ticket** (see System Delivery Suggestion severity rules in auditor instructions).
- **Orchestrator:** Sends telemetry to your organization's central dashboard (adoption rate, metric trends, friction points).


## Configuration

### `.hpedsrc` file discovery and setup

The Orchestrator looks for `.hpedsrc` in the following order (first match wins):
1. Root of the repository (`./`)
2. Root of the monorepo workspace (if applicable)
3. User's home directory (`~/.hpedsrc`) as a fallback for global defaults

If no `.hpedsrc` is found, the Orchestrator prompts interactively for `framework` and `scope`, then caches the response in the repo root.

**Recommendation:** Commit `.hpedsrc` to version control so all team members use consistent audit settings.
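
A minimal sketch of this first-match-wins resolution, assuming Node.js and a hypothetical `findHpedsrc` helper; the actual Orchestrator implementation may differ.

```ts
// Sketch only: helper name and monorepo handling are assumptions.
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function findHpedsrc(repoRoot: string, workspaceRoot?: string): string | null {
  const candidates = [
    path.join(repoRoot, ".hpedsrc"),                           // 1. repo root
    workspaceRoot ? path.join(workspaceRoot, ".hpedsrc") : "", // 2. monorepo workspace root
    path.join(os.homedir(), ".hpedsrc"),                       // 3. global fallback
  ].filter(Boolean);
  return candidates.find((p) => fs.existsSync(p)) ?? null;
}
```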

### `.hpedsrc` file
The `.hpedsrc` file is a JSON or YAML configuration file in the root of the consuming application that tells the Orchestrator how to run audits. The Orchestrator loads this on initiation.

**Required fields:**
- `framework`: The application's UI framework (e.g., `"react"`, `"vue"`, `"angular"`). Used by Auditor and Engineer to select framework-specific skills.
- `scope`: The default audit scope (e.g., `"src/"`, or specific directory like `"src/pages/dashboard/"`).

**Optional fields:**
- `feedback_collection`: Whether to collect team feedback signals via CLI prompt (default: `true`).
- `auto_apply_fixes`: If `true`, Engineer automatically applies non-critical fixes; if `false`, all fixes require user approval (default: `false`).
- `telemetry_endpoint`: URL for sending Reporter telemetry (defaults to the HPE Design System telemetry service).

**Example:**
```json
{
  "framework": "react",
  "scope": "src/",
  "feedback_collection": true,
  "auto_apply_fixes": false
}
```

## Framework Support

The Auditor and Engineer support multiple frameworks via modular skills. Framework detection:
1. Check `.hpedsrc` `framework` field (highest priority).
2. Auto-detect from `package.json` dependencies if not specified.
3. Prompt user if detection fails.

**Supported frameworks:** React (primary, 80% of users), Vue, Angular, and others via pluggable skill modules.
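
A minimal sketch of step 2's auto-detection from `package.json`, assuming illustrative dependency heuristics:

```ts
// Sketch only: the package-name heuristics are assumptions.
import * as fs from "node:fs";

function detectFramework(pkgJsonPath: string): string | null {
  const pkg = JSON.parse(fs.readFileSync(pkgJsonPath, "utf8"));
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  if (deps["next"] || deps["react"]) return "react"; // Next.js audited as React
  if (deps["nuxt"] || deps["vue"]) return "vue";     // Nuxt audited as Vue
  if (deps["@angular/core"]) return "angular";
  return null; // fall through to the interactive prompt (step 3)
}
```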

### Framework Coverage & Roadmap

| Framework | Support Status | Auditor | Engineer | Notes |
| --- | --- | --- | --- | --- |
| React | ✅ Stable | Full scoring | Full remediation | Primary platform; most patterns/components built for React |
| Vue | ✅ Beta | Full scoring* | Full remediation* | *Component binding syntax differs; Engineer generates Vue 3 composition API |
| Angular | ✅ Beta | Full scoring* | Full remediation* | *TypeScript-first; DX metrics heavily weighted toward Angular idioms |
| Svelte | 🔄 Roadmap | Partial (M1 2026) | Planned (M2 2026) | In design phase; awaiting customer adoption signals |
| Next.js | ✅ Included | As React | As React | File-route conventions auto-detected |
| Nuxt | ✅ Included | As Vue | As Vue | File-route conventions auto-detected |

**Unsupported framework fallback:** If a framework is not listed, the Orchestrator falls back to React skill set with a warning: "Framework not natively supported; using React conventions. Some metrics may be inaccurate."

**Framework-specific skill loading:** The Orchestrator detects framework from `.hpedsrc` or `package.json` and loads the corresponding `auditor/skills-{framework}.md` and `engineer/skills-{framework}.md` modules.
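
Using the naming convention above, skill resolution could look like this (a sketch; the loading mechanism itself is not specified here):

```ts
// Sketch only: returns the skill module paths named in the convention above.
function skillModules(framework: string): string[] {
  return [
    `auditor/skills-${framework}.md`,
    `engineer/skills-${framework}.md`,
  ];
}
```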

### Agent-to-Agent data flow

| From | To | Data Packet |
| --- | --- | --- |
| Auditor | Strategist | `consumer_score`, `system_score`, `combined_score`, `raw_findings[]` with evidence citations |
| Strategist | Engineer | `consumer_remediation_tasks[]` + `priority_level`, `system_delivery_suggestions[]` + `p1_p2_p3_severity` |
| Engineer | Auditor | `modified_code_snippets[]` + `file_paths` |
| Auditor | Reporter | `scorecard_snapshot` (metrics, scores, timestamp) |
| Orchestrator | External API | `telemetry_payload` (adoption metrics, improvement trends) + `system_delivery_ticket` (P1 gaps only) |
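
A minimal sketch of the Auditor → Strategist packet as a TypeScript shape, assuming field names inferred from the table; the canonical schema lives with the agent instructions.

```ts
// Sketch only: field names beyond the table are assumptions.
interface AuditorToStrategist {
  consumer_score: number; // 0.0–1.0
  system_score: number;   // 0.0–1.0
  combined_score: number; // weighted rollup
  raw_findings: Array<{
    metric: string;
    status: "Pass" | "Warning" | "Fail";
    evidence: { file_path: string; line_range: [number, number] };
  }>;
}
```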

## System Delivery Ticket Workflow

When the Auditor identifies a **P1 System Gap** (critical gap in HPEDS capabilities required by consuming teams), the Orchestrator automatically creates a **System Delivery Ticket** in the HPEDS roadmap system.

### Ticket creation trigger
A System Delivery Suggestion is escalated to P1 (and triggers a ticket) when both of the following hold (see the predicate sketch below):
- Multiple teams (2+) report the same gap independently, OR a single team flags it as blocking critical feature delivery
- No existing or planned HPEDS offering addresses the gap
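
A minimal sketch of that escalation predicate, assuming hypothetical field names; the authoritative rules live in the auditor instructions.

```ts
// Sketch only: field names are assumptions.
interface GapReport {
  reportingTeams: string[];
  flaggedAsBlocking: boolean;
  coveredByExistingOrPlannedOffering: boolean;
}

function escalatesToP1(gap: GapReport): boolean {
  const demandSignal = gap.reportingTeams.length >= 2 || gap.flaggedAsBlocking;
  return demandSignal && !gap.coveredByExistingOrPlannedOffering;
}
```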

### Ticket destination
- **Default:** GitHub Issues in the HPEDS repository (`/hpe-design-system/issues`)
- **Override:** Specify `system_delivery_ticket_endpoint` in `.hpedsrc` to route to external tracking (Jira, Azure DevOps, etc.)

### Ticket structure
```json
{
  "title": "System Delivery Gap: [Gap name] (P1)",
  "body": "Reported by [N] team(s): [Team A], [Team B]\n\nGap description: [evidence from audits]\n\nConsumer impact: [how many teams affected]\n\nProposed solution: [Strategist recommendation]",
  "labels": ["system-gap", "p1", "design-token"],
  "assignee": "@hpeds-team"
}
```

The third label is the gap category: one of `design-token`, `component`, or `pattern`.

### Ticket lifecycle
1. **Created** by Orchestrator (read-only; consumers cannot edit)
2. **Triaged** by HPEDS team (estimated effort, impact, roadmap milestone assigned)
3. **Planned** in HPEDS sprint (linked to design + engineering tasks)
4. **Shipped** in HPEDS release (consumers notified via changelog)
5. **Verified** by Reporter (monitors if gap remediation improved downstream scores)

## Privacy & Data Collection

### What telemetry is collected
The Reporter collects:
- **Scorecard metrics:** Individual metric scores (0.0–1.0), status (Pass/Warning/Fail), and timestamps
- **Evidence citations:** File paths, line ranges, and component references (NO source code contents)
- **Team feedback:** Aggregated Likert responses + feedback comments (anonymized by default)
- **Improvement deltas:** Score changes over time (e.g., 0.45 → 0.75)
- **System gaps:** P1/P2/P3 counts and categories
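
A minimal sketch of a `telemetry_payload` shape reflecting the lists above; the field names are assumptions:

```ts
// Sketch only: evidence carries file locations, never source contents.
interface TelemetryPayload {
  timestamp: string; // ISO 8601
  scores: Record<string, { value: number; status: "Pass" | "Warning" | "Fail" }>;
  evidence: Array<{ file_path: string; line_range: [number, number] }>;
  feedback: { likert_avg: number; responses: number } | null; // anonymized aggregate
  deltas: Array<{ metric: string; before: number; after: number }>;
  system_gaps: { p1: number; p2: number; p3: number };
}
```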

### What is NOT collected
- Source code or implementation details
- Company/team identifiers (unless a team explicitly opts in for trend analysis)
- Personal developer names or commit history
- Proprietary business logic or secrets

### Opt-out and privacy controls
Teams can disable telemetry collection by setting the following in `.hpedsrc`:
```json
{
  "telemetry_enabled": false
}
```

When disabled:
- Reporter does not POST to `telemetry_endpoint`
- Local audit scores are still computed and displayed
- System Delivery Tickets for P1 gaps are still created (essential for HPEDS roadmap)

### Data retention
Telemetry is retained for a rolling 12-month window, then anonymized and aggregated into quarterly trends. Teams can request deletion of their telemetry data by contacting the HPEDS team.

### Responsible use commitment
The HPE Design System team uses telemetry solely to:
- Prioritize HPEDS roadmap (gaps, patterns, component needs)
- Measure system adoption and ROI
- Identify and resolve friction points in developer workflow

Telemetry is not used for individual developer performance metrics or organizational surveillance.

### Enterprise benefits
- **Asynchronous audits:** Teams can run the Auditor in their PRs without ever running the Engineer (passive monitoring mode). Useful for visibility without mandatory remediation.
- **Modular skills:** If we decide to support a new framework (e.g., Svelte), we only need to update the `skills.md` for the **Auditor** and **Engineer**. The Orchestrator, Strategist, and Reporter logic remains exactly the same.
- **Traceability:** Every code change made by the Engineer is linked back to a specific metric violation found by the Auditor, with evidence citations (file path, line range, matched knowledge artifact).
- **System-level feedback loop:** Reporter collects adoption and friction signals from 50+ teams, enabling the HPE Design System team to prioritize gaps (P1 tickets) and deprecations.

## Troubleshooting

### "Knowledge sync failed" error
**Symptom:** Auditor reports "Could not fetch latest knowledge/components."

**Cause:** Orchestrator cannot reach the knowledge repository (network issue or endpoint stale).

**Resolution:**
1. Check network connectivity: `curl https://knowledge-endpoint.hpe-ds.io/health`
2. Clear local knowledge cache: `rm -rf ~/.hpe-ds-cache/`
3. Re-run audit: `hpe-ds-ai audit` (cache will rebuild)
4. If issue persists, file an issue on HPEDS GitHub with your `.hpedsrc` and network logs.

### Score regression after fix
**Symptom:** Engineer applied a fix (e.g., replaced hex with token), but re-audit shows lower score.

**Cause:** Fix may have introduced new violations (e.g., incorrect token usage, broken layout) or affected other metrics.

**Resolution:**
1. Review the Verification Phase report; Auditor explains which metrics regressed and why
2. Engineer can propose alternative fixes: `hpe-ds-ai audit --suggest-alternatives`
3. User can undo the last fix and try a different approach: `hpe-ds-ai undo`
4. Engage with HPEDS support if the fix is correct but Auditor is mis-scoring

### Low initial audit score (< 0.50)
**Symptom:** First audit shows very low Consumer Alignment Score; team is discouraged.

**Expectation:** Low scores on an initial audit are normal for legacy codebases; HPEDS adoption is incremental by design.

**Guidance:**
1. Scores are **not** a reflection of code quality; they measure DS alignment only
2. Focus on top 3 Consumer recommendations (highest impact/effort ratio)
3. Expect a 10–20 point (i.e., 0.10–0.20) score improvement per quarter with consistent remediation
4. Review the "System Discoverability" and "Developer Experience" scores; high DX = faster remediation

### Framework not detected
**Symptom:** "Could not detect framework. Please specify in `.hpedsrc`."

**Cause:** `package.json` is ambiguous (e.g., repo has both React and Vue dependencies) or is missing.

**Resolution:**
1. Explicitly set `framework` in `.hpedsrc`: `{"framework": "react"}`
2. For monorepos with mixed frameworks, create `.hpedsrc` in the specific workspace folder and re-run audit with `--scope` flag

### Stale knowledge artifacts
**Symptom:** Auditor flags a component as "missing" but it exists in HPEDS v3.5.

**Cause:** Local knowledge cache is stale (not synced with latest HPEDS release).

**Resolution:**
1. Clear cache: `hpe-ds-ai cache clear`
2. Verify `.hpedsrc` specifies the correct HPEDS version: `{"hpeds_version": "3.5.0"}`
3. Re-run audit to download latest knowledge
4. If issue persists, file an issue with the exact component name and version

### "Insufficient evidence for metric X"
**Symptom:** Metric is marked N/A; cannot improve score for this area.

**Cause:** Audit scope does not include observable evidence (e.g., no tests for Dev Confidence, no interactive UI for Accessibility).

**Resolution:**
1. Expand audit scope: `hpe-ds-ai audit --scope src/` (include more files)
2. For Page/Feature scope, ensure the audited feature includes the relevant code areas (pages, components, tests)
3. Review "Observability rubric" in auditor instructions for minimum evidence per metric

### Team feedback not collected
**Symptom:** Developer Experience score shows "(code evidence only)" despite `feedback_collection: true`.

**Cause:** Not enough feedback responses were collected (a minimum of 2 responses is required for a valid signal).

**Resolution:**
1. Ensure CLI prompt is shown: run `hpe-ds-ai audit --feedback` to force feedback collection
2. Ensure at least 2 team members answer feedback questions (1-5 Likert scale)
3. Feedback is best collected in your latest audit window (< 1 sprint old)