IgniteUI · Copilot · Mar 8, 2026
diff --git a/.github/workflows/skill-eval.yml b/.github/workflows/skill-eval.yml
@@ -0,0 +1,88 @@
+name: Skill Eval
+
+on:
+  pull_request:
+    paths:
+      - 'skills/**'
+      - 'evals/**'
+
+jobs:
+  eval:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+
+      - name: Install eval dependencies
+        working-directory: evals
+        run: npm install
+
+      - name: Run skill evals
+        working-directory: evals
+        run: npx skill-eval _ --suite=all --trials=5
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
+
+      - name: Upload results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: skill-eval-results
+          path: evals/results/
+          retention-days: 30
+
+      - name: Post summary comment
+        if: always() && github.event_name == 'pull_request'
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const fs = require('fs');
+            const path = require('path');
+
+            const resultsDir = 'evals/results';
+            let summary = '## 📊 Skill Eval Results\n\n';
+
+            try {
+              const files = fs.readdirSync(resultsDir).filter(f => f.endsWith('.json') && f !== 'baseline.json');
+              if (files.length === 0) {
+                summary += '> ⚠️ No eval results found. The eval run may have failed.\n';
+              } else {
+                summary += '| Task | Pass Rate | pass@5 | Status |\n';
+                summary += '|---|---|---|---|\n';
+
+                for (const file of files) {
+                  try {
+                    const data = JSON.parse(fs.readFileSync(path.join(resultsDir, file), 'utf8'));
+                    const taskName = data.task || file.replace('.json', '');
+                    const passRate = data.passRate != null ? `${(data.passRate * 100).toFixed(0)}%` : 'N/A';
+                    const passAtK = data.passAtK != null ? `${(data.passAtK * 100).toFixed(0)}%` : 'N/A';
+                    const status = data.passAtK >= 0.8 ? '✅' : data.passAtK >= 0.6 ? '⚠️' : '❌';
+                    summary += `| ${taskName} | ${passRate} | ${passAtK} | ${status} |\n`;
+                  } catch (e) {
+                    summary += `| ${file} | Error | Error | ❌ |\n`;
+                  }
+                }
+
+                summary += '\n### Thresholds\n';
+                summary += '- ✅ `pass@5 ≥ 80%` — merge gate passed\n';
+                summary += '- ⚠️ `pass@5 ≥ 60%` — needs investigation\n';
+                summary += '- ❌ `pass@5 < 60%` — blocks merge for affected skill\n';
+              }
+            } catch (e) {
+              summary += `> ⚠️ Could not read results: ${e.message}\n`;
+            }
+
+            await github.rest.issues.createComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: context.issue.number,
+              body: summary,
+            });
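
The status logic in the summary script can be pulled into a small pure function so the thresholds are testable outside the workflow. The sketch below is illustrative only: `classify`, `formatRow`, and the two constants are hypothetical names, not part of skill-eval or of this PR. It reuses the 0.8 and 0.6 cut-offs from the script and, as an assumption about the desired behavior, reports a missing score as `N/A` rather than letting the comparison chain fall through to ❌.

```typescript
// Hypothetical helpers mirroring the thresholds used in the summary script.
type Status = '✅' | '⚠️' | '❌' | 'N/A';

const MERGE_GATE = 0.8;  // pass@5 at or above this value passes the merge gate
const INVESTIGATE = 0.6; // pass@5 at or above this value is flagged for investigation

function classify(passAtK: number | null | undefined): Status {
  // Assumption: a missing score should be shown as N/A, not as a failure.
  if (passAtK == null || Number.isNaN(passAtK)) return 'N/A';
  if (passAtK >= MERGE_GATE) return '✅';
  if (passAtK >= INVESTIGATE) return '⚠️';
  return '❌';
}

function formatRow(task: string, passRate: number | null, passAtK: number | null): string {
  // Render a fraction as a whole-number percentage, or N/A when absent.
  const pct = (v: number | null): string =>
    v == null ? 'N/A' : `${(v * 100).toFixed(0)}%`;
  return `| ${task} | ${pct(passRate)} | ${pct(passAtK)} | ${classify(passAtK)} |`;
}
```

Keeping the cut-offs in named constants would also make it easier to keep the workflow comment and the README threshold table in sync.
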
diff --git a/.gitignore b/.gitignore
@@ -56,3 +56,8 @@ extras/docs/themes/sassdoc/sassdoc/*
 
 # Localization sources
 i18nRepo
+
+# Eval artifacts (keep baseline results)
+evals/node_modules
+evals/results/*.json
+!evals/results/baseline.json
diff --git a/evals/README.md b/evals/README.md
@@ -0,0 +1,155 @@
+# Ignite UI for Angular — Skill Evals
+
+Automated evaluation suite for the Ignite UI for Angular agent skills. Uses the
+[skill-eval](https://github.com/mgechev/skill-eval) framework to measure skill
+quality, detect regressions, and gate merges.
+
+## Overview
+
+The suite tests three skills:
+
+| Skill | Task ID | What it tests |
+|---|---|---|
+| `igniteui-angular-grids` | `grid-basic-setup` | Flat grid with sorting and pagination on employee data |
+| `igniteui-angular-components` | `component-combo-reactive-form` | Multi-select combo bound to a reactive form control |
+| `igniteui-angular-theming` | `theming-palette-generation` | Custom branded palette with `palette()` and `theme()` |
+
+Each task includes:
+
+- **`instruction.md`** — the prompt given to the agent
+- **`tests/test.sh`** — deterministic grader (file checks, compilation, lint)
+- **`prompts/quality.md`** — LLM rubric grader (intent routing, API usage)
+- **`solution/solve.sh`** — reference solution for baseline validation
+- **`environment/Dockerfile`** — isolated environment for agent execution
+- **`skills/`** — symlinked or copied skill files under test
+
+## Prerequisites
+
+- Node.js 20+
+- Docker (for isolated agent execution)
+- An API key for the agent provider (Gemini or Anthropic)
+
+## Running Evals Locally
+
+### Install dependencies
+
+```bash
+cd evals
+npm install
+```
+
+### Run a single task
+
+```bash
+# Gemini (default)
+GEMINI_API_KEY=your-key npm run eval -- grid-basic-setup
+
+# Claude
+ANTHROPIC_API_KEY=your-key npm run eval -- grid-basic-setup --agent=claude
+```
+
+### Run all tasks
+
+```bash
+GEMINI_API_KEY=your-key npm run eval:all
+```
+
+### Options
+
+```bash
+# Adjust trials (default: 5)
+npm run eval -- grid-basic-setup --trials=10
+
+# Run locally without Docker
+npm run eval -- grid-basic-setup --provider=local
+
+# Validate graders against the reference solution
+npm run eval -- grid-basic-setup --validate --provider=local
+
+# Run multiple trials in parallel
+npm run eval -- grid-basic-setup --parallel=3
+```
+
+### Preview results
+
+```bash
+# CLI report
+npm run preview
+
+# Web UI at http://localhost:3847
+npm run preview:browser
+```
+
+## Adding a New Task
+
+1. Create a directory under `evals/tasks/<task-id>/` with the standard structure:
+
+   ```
+   tasks/<task-id>/
+   ├── task.toml               # Config: graders, timeouts, resource limits
+   ├── instruction.md          # Agent prompt
+   ├── environment/Dockerfile  # Container setup
+   ├── tests/test.sh           # Deterministic grader
+   ├── prompts/quality.md      # LLM rubric grader
+   ├── solution/solve.sh       # Reference solution
+   └── skills/                 # Skill files under test
+       └── <skill-name>/SKILL.md
+   ```
+
+2. Write a clear, unambiguous `instruction.md` that tells the agent exactly what
+   to build.
+
+3. Write `tests/test.sh` to check **outcomes** (files exist, project compiles,
+   correct selectors are present) rather than specific steps.
+
+4. Write `prompts/quality.md` with rubric dimensions that sum to 1.0.
+
+5. Write `solution/solve.sh` — a shell script that proves the task is solvable
+   and validates that the graders work correctly.
+
+6. Validate graders before submitting:
+
+   ```bash
+   npm run eval -- <task-id> --validate --provider=local
+   ```
+
+## Pass / Fail Thresholds
+
+Following [Anthropic's recommendations](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents):
+
+| Condition | Outcome | Notes |
+|---|---|---|
+| `pass@5 ≥ 80%` | **Merge gate** | At least 1 success in 5 trials required |
+| `pass^5 ≥ 60%` | **Tracked** | Flags flaky skills for investigation |
+| `pass@5 < 60%` | **Blocks merge** | On PRs touching the relevant skill |
+
+## CI Integration
+
+The GitHub Actions workflow at `.github/workflows/skill-eval.yml` runs
+automatically on PRs that modify `skills/**` or `evals/**`. It:
+
+1. Checks out the repo
+2. Installs eval dependencies
+3. Runs all tasks with 5 trials
+4. Uploads results as an artifact
+5. Posts a summary comment on the PR
+
+## Grading Strategy
+
+**Deterministic grader (60% weight)** — checks:
+- Project builds without errors
+- Correct Ignite UI selector is present in the generated template
+- Required imports exist
+- No use of forbidden alternatives
+
+**LLM rubric grader (40% weight)** — evaluates:
+- Correct intent routing
+- Idiomatic API usage
+- Absence of hallucinated APIs
+- Following the skill's guidance
+
+## Results
+
+Baseline results are stored in `evals/results/baseline.json` and used for
+regression comparison on PRs. The CI workflow uploads per-run results as
+GitHub Actions artifacts.
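
The README's threshold table relies on two different metrics, `pass@5` and `pass^5`, which are easy to confuse. As a rough illustration, assuming independent trials with a single per-trial success probability `p`, `pass@k` is the chance that at least one of `k` trials succeeds and `pass^k` is the chance that all `k` succeed. How skill-eval itself estimates these values is not shown in this PR, so the functions below (`passAtK`, `passPowK`) are hypothetical and only sketch the definitions.

```typescript
// Sketch under an independence assumption; not skill-eval's actual implementation.

function assertProbability(p: number): void {
  if (!(p >= 0 && p <= 1)) throw new RangeError(`p must be in [0, 1], got ${p}`);
}

// Probability that at least one of k independent trials succeeds.
function passAtK(p: number, k: number): number {
  assertProbability(p);
  return 1 - Math.pow(1 - p, k);
}

// Probability that all k independent trials succeed.
function passPowK(p: number, k: number): number {
  assertProbability(p);
  return Math.pow(p, k);
}

// Example: a task that passes 3 of 5 trials has an observed rate of 0.6.
const observedRate = 3 / 5;
console.log(passAtK(observedRate, 5).toFixed(3));  // prints "0.990"
console.log(passPowK(observedRate, 5).toFixed(3)); // prints "0.078"
```

The example shows why the two metrics are tracked separately: a task with a 60% per-trial pass rate clears an 80% `pass@5` gate comfortably, while its `pass^5` stays far below 60%, which is the kind of flakiness the tracked threshold is meant to surface.
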
diff --git a/evals/package.json b/evals/package.json
@@ -0,0 +1,21 @@
+{
+  "name": "igniteui-angular-skill-evals",
+  "version": "1.0.0",
+  "description": "Evaluation suite for Ignite UI for Angular agent skills",
+  "private": true,
+  "scripts": {
+    "eval": "npx skill-eval",
+    "eval:grid": "npx skill-eval grid-basic-setup",
+    "eval:combo": "npx skill-eval component-combo-reactive-form",
+    "eval:theming": "npx skill-eval theming-palette-generation",
+    "eval:all": "npx skill-eval _ --suite=all",
+    "preview": "npx skill-eval preview",
+    "preview:browser": "npx skill-eval preview browser"
+  },
+  "dependencies": {
+    "skill-eval": "^1.0.0"
+  },
+  "engines": {
+    "node": ">=20.0.0"
+  }
+}
diff --git a/evals/results/baseline.json b/evals/results/baseline.json
@@ -0,0 +1,36 @@
+{
+  "generated_at": "2026-03-08T07:00:00.000Z",
+  "framework_version": "1.0.0",
+  "description": "Initial baseline results for skill evals. Actual scores will be populated after the first full eval run with an API key.",
+  "thresholds": {
+    "pass_at_5_merge_gate": 0.8,
+    "pass_at_5_block": 0.6,
+    "pass_pow_5_tracked": 0.6
+  },
+  "tasks": {
+    "grid-basic-setup": {
+      "skill": "igniteui-angular-grids",
+      "trials": 5,
+      "pass_rate": null,
+      "pass_at_5": null,
+      "pass_pow_5": null,
+      "status": "pending_first_run"
+    },
+    "component-combo-reactive-form": {
+      "skill": "igniteui-angular-components",
+      "trials": 5,
+      "pass_rate": null,
+      "pass_at_5": null,
+      "pass_pow_5": null,
+      "status": "pending_first_run"
+    },
+    "theming-palette-generation": {
+      "skill": "igniteui-angular-theming",
+      "trials": 5,
+      "pass_rate": null,
+      "pass_at_5": null,
+      "pass_pow_5": null,
+      "status": "pending_first_run"
+    }
+  }
+}
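
The README says the baseline is used for regression comparison on PRs, but the comparison itself is not part of this diff. One possible shape for it is sketched below. The `Baseline` type follows the fields in `baseline.json`; the `current` map (task ID to `pass@5`), the `findRegressions` function, and the `tolerance` parameter are all hypothetical and introduced only for illustration.

```typescript
// Types modeled on the fields present in evals/results/baseline.json.
interface BaselineTask {
  skill: string;
  trials: number;
  pass_rate: number | null;
  pass_at_5: number | null;
  pass_pow_5: number | null;
  status: string;
}

interface Baseline {
  tasks: Record<string, BaselineTask>;
}

// Hypothetical comparison: returns the IDs of tasks whose current pass@5
// dropped below the baseline by more than `tolerance`.
function findRegressions(
  baseline: Baseline,
  current: Record<string, number>, // assumed shape: task ID -> pass@5 for this run
  tolerance = 0.05,
): string[] {
  const regressed: string[] = [];
  for (const [taskId, task] of Object.entries(baseline.tasks)) {
    const before = task.pass_at_5;
    const after = current[taskId];
    // Tasks without a recorded baseline (e.g. "pending_first_run") or without
    // a current result cannot be compared, so they are skipped.
    if (before == null || after == null) continue;
    if (before - after > tolerance) regressed.push(taskId);
  }
  return regressed;
}
```

With the baseline as committed here, every `pass_at_5` is `null`, so a comparison like this would report no regressions until the first full run populates the scores.
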
diff --git a/evals/tasks/component-combo-reactive-form/environment/Dockerfile b/evals/tasks/component-combo-reactive-form/environment/Dockerfile
@@ -0,0 +1,17 @@
+FROM node:20-slim
+
+WORKDIR /workspace
+
+RUN npm install -g @angular/cli@latest
+
+RUN ng new eval-app --skip-git --skip-install --style=scss --ssr=false && \
+    cd eval-app && \
+    npm install && \
+    npm install igniteui-angular
+
+WORKDIR /workspace/eval-app
+
+COPY . .
+
+RUN mkdir -p logs/verifier
+CMD ["bash"]
diff --git a/evals/tasks/component-combo-reactive-form/instruction.md b/evals/tasks/component-combo-reactive-form/instruction.md
@@ -0,0 +1,40 @@
+# Task: Add a Multi-Select Combo in a Reactive Form
+
+You are working in an Angular 20+ project that already has `igniteui-angular` installed and a theme applied.
+
+## Requirements
+
+Create a `UserSettingsComponent` with a reactive form that includes a multi-select combo for choosing notification channels.
+
+1. **Component location**: `src/app/user-settings/user-settings.component.ts` (with its template)
+
+2. **Form structure**: Create a reactive form (`FormGroup`) with a `notificationChannels` control
+
+3. **Data source**: Use the following list of notification channels:
+
+   ```typescript
+   channels = [
+     { id: 1, name: 'Email', icon: 'email' },
+     { id: 2, name: 'SMS', icon: 'sms' },
+     { id: 3, name: 'Push Notification', icon: 'notifications' },
+     { id: 4, name: 'Slack', icon: 'chat' },
+     { id: 5, name: 'Microsoft Teams', icon: 'groups' },
+   ];
+   ```
+
+4. **Combo configuration**:
+   - Use the Ignite UI for Angular Combo component for multi-selection
+   - Bind it to the `notificationChannels` form control
+   - Display the `name` field in the dropdown
+   - Use the `id` field as the value key
+
+5. **Form validation**: The `notificationChannels` control must be required (at least one channel must be selected)
+
+6. **Submit button**: Add a submit button that is disabled when the form is invalid
+
+## Constraints
+
+- Use the Ignite UI `igx-combo` component — do NOT use a native `<select multiple>`, `igx-select`, or Angular Material `mat-select`.
+- Import from the correct `igniteui-angular` entry point.
+- The component must be standalone and use `ChangeDetectionStrategy.OnPush`.
+- Use reactive forms (`FormGroup` / `FormControl`), not template-driven forms.