Commit fad1b27

Updated Evals results for all skills.
1 parent f3c3d3b commit fad1b27

2 files changed

Lines changed: 14 additions & 18 deletions

README.md

Lines changed: 5 additions & 7 deletions
@@ -1,7 +1,7 @@
 # Claude Skills for Governance, Risk & Compliance (GRC)
 Expert-level compliance guidance for ISO 27001, SOC 2, FedRAMP, GDPR, HIPAA, NIST CSF, PCI DSS, TSA Cybersecurity, ISO 42001 AI Management System, ISO 27701 Privacy Information Management, DORA Digital Operational Resilience, and India's Digital Personal Data Protection Act (DPDPA) — powered by Claude Skills.
 
-Benchmarked across 60 test cases (5 per framework) using the eval framework — each graded against 5 verifiable assertions by independent agents. Skills scored **92%** vs a baseline of **84%** across 300 total assertions.
+Benchmarked across 60 test cases (5 per framework) using the eval framework — each graded against 5 verifiable assertions by independent agents. Skills scored **94%** vs a baseline of **83%** across 300 total assertions.
 
 [![Release: v0.3.0](https://img.shields.io/badge/Release-v0.3.0-brightgreen.svg)](../../releases/tag/v0.3.0)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
@@ -441,9 +441,9 @@ These skills were benchmarked using the [Claude Skill Creator](https://claude.ai
 
 | Configuration | Pass Rate | Assertions Passed |
 |---------------|-----------|-------------------|
-| **With GRC Skills installed** | **92%** | **276 / 300** |
-| Without skills (baseline Claude) | 84% | 252 / 300 |
-| **Delta** | **+8 points** | **+24 assertions** |
+| **With GRC Skills installed** | **94%** | **282 / 300** |
+| Without skills (baseline Claude) | 83% | 250 / 300 |
+| **Delta** | **+11 points** | **+32 assertions** |
 
 ### Per-Skill Results
 
@@ -458,14 +458,12 @@ These skills were benchmarked using the [Claude Skill Creator](https://claude.ai
 | PCI DSS | 5 | **92%** | 88% | +4% | SAQ type selection; Req 3 stored data (v4.0); Breach obligations; Penetration testing; Tokenization scope |
 | TSA Cybersecurity | 5 | **100%** | 96% | +4% | Pipeline directive requirements; CIRP elements; OT/IT segmentation; Airport applicability; TSA vs CIRCIA |
 | ISO 42001 | 5 | **92%** | 80% | +12% | AIMS applicability; Key requirements; AI-specific risks; Third-party LLM management; AI ethics controls |
-| ISO 27701 | 5 | **76%** | 84% | -8% | Extension to ISO 27001; GDPR mapping; Processor controls; PIA methodology; Certification as GDPR evidence |
+| ISO 27701 | 5 | **100%** | 80% | +20% | Extension to ISO 27001; GDPR mapping; Processor controls; PIA methodology; Certification as GDPR evidence |
 | DORA | 5 | **88%** | 72% | +16% | Five pillars; ICT incident reporting timelines; TLPT requirements; Third-party contracts; DORA vs EBA |
 | DPDPA | 5 | **96%** | 80% | +16% | Applicability to foreign entities; Consent vs GDPR; Children's data (18-year threshold); Cross-border transfers; SDF obligations |
 
 Skills add the most measurable value on highly framework-specific tasks: clause-level precision for ISO 27001, CC criteria mapping for SOC 2, exact FedRAMP POA&M timeframes and document names, GDPR article citations, HIPAA regulatory section references, CSF 2.0 subcategory IDs, PCI DSS v4.0.1 requirement numbers, TSA Security Directive citations, ISO 42001 AIMS clause references, DORA Article citations and exact incident reporting timelines (4h/72h/1 month), and DPDPA-specific terminology (Data Fiduciary, 8 legitimate uses, blacklist transfers).
 
-The ISO 27701 skill shows a slight negative delta in keyword-matching grading because baseline Claude already has substantial GDPR/privacy knowledge; qualitative review of the outputs confirms the skill still provides more structured, citation-precise responses.
-
 📊 **[View the full eval results →](grc-skills-eval-results.html)**
 
 ---
index.html

Lines changed: 9 additions & 11 deletions
@@ -713,16 +713,16 @@ <h2>Skill Evaluation</h2>
 
 <div class="stat-grid">
 <div class="stat-card green">
-<div class="value">92%</div>
-<div class="label">With GRC Skills installed<br /><small>276 / 300 assertions passed</small></div>
+<div class="value">94%</div>
+<div class="label">With GRC Skills installed<br /><small>282 / 300 assertions passed</small></div>
 </div>
 <div class="stat-card">
-<div class="value">84%</div>
-<div class="label">Baseline Claude (no skills)<br /><small>252 / 300 assertions passed</small></div>
+<div class="value">83%</div>
+<div class="label">Baseline Claude (no skills)<br /><small>250 / 300 assertions passed</small></div>
 </div>
 <div class="stat-card delta">
-<div class="value">+8</div>
-<div class="label">Point improvement<br /><small>+24 additional assertions passed</small></div>
+<div class="value">+11</div>
+<div class="label">Point improvement<br /><small>+32 additional assertions passed</small></div>
 </div>
 </div>
 
@@ -740,7 +740,7 @@ <h3>Per-Skill Results</h3>
 <tr><td>💳 PCI DSS</td><td>5</td><td><strong>92%</strong></td><td>88%</td><td>+4%</td><td>SAQ type selection; Req 3 stored data (v4.0); Breach obligations; Penetration testing; Tokenization scope</td></tr>
 <tr><td>🚨 TSA Cybersecurity</td><td>5</td><td><strong>100%</strong></td><td>96%</td><td>+4%</td><td>Pipeline directive requirements; CIRP elements; OT/IT segmentation; Airport applicability; TSA vs CIRCIA</td></tr>
 <tr><td>🤖 ISO 42001</td><td>5</td><td><strong>92%</strong></td><td>80%</td><td>+12%</td><td>AIMS applicability; Key requirements; AI-specific risks; Third-party LLM management; AI ethics controls</td></tr>
-<tr><td>🔏 ISO 27701</td><td>5</td><td><strong>76%</strong></td><td>84%</td><td>-8%</td><td>Extension to ISO 27001; GDPR mapping; Processor controls; PIA methodology; Certification as GDPR evidence</td></tr>
+<tr><td>🔏 ISO 27701</td><td>5</td><td><strong>100%</strong></td><td>80%</td><td>+20%</td><td>Extension to ISO 27001; GDPR mapping; Processor controls; PIA methodology; Certification as GDPR evidence</td></tr>
 <tr><td>🏦 DORA</td><td>5</td><td><strong>88%</strong></td><td>72%</td><td>+16%</td><td>Five pillars; ICT incident reporting timelines; TLPT requirements; Third-party contracts; DORA vs EBA</td></tr>
 <tr><td>🇮🇳 DPDPA</td><td>5</td><td><strong>96%</strong></td><td>80%</td><td>+16%</td><td>Applicability to foreign entities; Consent vs GDPR; Children's data (18-year threshold); Cross-border transfers; SDF obligations</td></tr>
 </tbody>
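Each skill is scored on 5 test cases with 5 assertions each, so every per-skill percentage should move in steps of 4 points (one assertion out of 25). The sketch below is illustrative only: `to_count` is a hypothetical helper, not part of the repository, and it assumes the per-skill rates are exact fractions of 25. It converts the ISO 27701 row before and after this commit back into assertion counts.

```python
# Hypothetical helper (not from the repo): map per-skill pass rates to counts.
# Assumption: 5 test cases x 5 assertions = 25 assertions per skill, so each
# assertion is worth exactly 4 percentage points.
ASSERTIONS_PER_SKILL = 5 * 5


def to_count(pct: int) -> int:
    """Convert a whole-number percentage into an assertion count out of 25."""
    count, remainder = divmod(pct * ASSERTIONS_PER_SKILL, 100)
    if remainder:
        step = 100 // ASSERTIONS_PER_SKILL
        raise ValueError(f"{pct}% is not a multiple of {step}%")
    return count


# (with skills %, baseline %) for the ISO 27701 row, before and after the commit
iso27701 = {"before": (76, 84), "after": (100, 80)}

for label, (skill_pct, base_pct) in iso27701.items():
    s, b = to_count(skill_pct), to_count(base_pct)
    print(f"{label}: {s}/25 vs {b}/25, "
          f"delta {s - b:+d} assertions ({skill_pct - base_pct:+d} points)")
```

Under that assumption, the old row corresponds to 19/25 vs 21/25 (-2 assertions) and the new row to 25/25 vs 20/25 (+5 assertions).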
@@ -749,8 +749,6 @@ <h3>Per-Skill Results</h3>
 
 <p>Skills add the most measurable value on highly framework-specific tasks: clause-level precision for ISO 27001, CC criteria mapping for SOC 2, exact FedRAMP document names and POA&amp;M timeframes, GDPR article citations, HIPAA regulatory section references, CSF 2.0 subcategory IDs, PCI DSS v4.0.1 requirement numbers, TSA Security Directive citations, ISO 42001 AIMS clause references, DORA Article numbers and exact incident reporting timelines (4h/72h/1 month), and DPDPA-specific terminology and section references.</p>
 
-<p><em>Note: ISO 27701 shows a slight negative delta in keyword-matching grading because baseline Claude already has substantial GDPR/privacy knowledge. Qualitative review confirms the skill still produces more structured, citation-precise responses.</em></p>
-
 <a href="grc-skills-eval-results.html" class="eval-link-btn">📊 View the full eval results →</a>
 
 </section>
@@ -899,7 +897,7 @@ <h4>🌐 GitHub Pages — Multi-Tab Site</h4>
 <li><span class="release-badge badge-new">New</span> Interactive Customer Feedback tab with Formspree-powered contact form (Customer Name, Company, Feedback Title, Feedback Body) — submissions delivered to <a href="mailto:hemant.naik@gmail.com">hemant.naik@gmail.com</a></li>
 <li><span class="release-badge badge-new">New</span> Integrated Formspree Ajax library (<code>@formspree/ajax</code>) via CDN for inline field validation and no-reload submissions</li>
 <li><span class="release-badge badge-new">New</span> Release Notes section (this section) added to the Resources tab</li>
-<li><span class="release-badge badge-improve">Improved</span> Evaluation tab now shows stat cards (92% / 84% / +8pts) and per-skill results table for all 12 skills</li>
+<li><span class="release-badge badge-improve">Improved</span> Evaluation tab now shows stat cards (94% / 83% / +11pts) and per-skill results table for all 12 skills</li>
 </ul>
 <h4>🐛 Bug Fixes — Skill Installability</h4>
 <ul>
@@ -930,7 +928,7 @@ <h4>🆕 New Skills (4)</h4>
 <h4>📊 Skill Evaluation</h4>
 <ul>
 <li><span class="release-badge badge-improve">Improved</span> Expanded eval suite to <strong>12 skills / 60 test cases</strong> (5 per framework), each graded against 5 verifiable assertions by independent grader agents — 300 total assertions</li>
-<li><span class="release-badge badge-improve">Improved</span> Skills scored <strong>92%</strong> vs baseline of 84% (+8 point improvement, +24 additional assertions passed)</li>
+<li><span class="release-badge badge-improve">Improved</span> Skills scored <strong>94%</strong> vs baseline of 83% (+11 point improvement, +32 additional assertions passed)</li>
 <li><span class="release-badge badge-improve">Improved</span> Evaluation tab updated with full 60-case results for all 12 skills including DORA and DPDPA</li>
 </ul>
 <h4>🐛 Bug Fixes</h4>
