---
name: audit-agents-skills
description: Audit quality of agents, skills, and commands in a Claude Code project
argument-hint: [path] [--fix] [--verbose]
---

# Audit Agents/Skills/Commands Quality

A comprehensive quality audit for Claude Code agents, skills, and commands. Scores each file against weighted criteria and assigns a production-readiness grade.

## Arguments

- `[path]` - Directory to audit (default: the current project's `.claude/`)
- `--fix` - Generate fix suggestions for failing criteria
- `--verbose` - Show details for all criteria (not just failures)

## Usage

```
/audit-agents-skills              # Audit current project
/audit-agents-skills --fix        # Audit + fix suggestions
/audit-agents-skills ~/other-repo # Audit another project
/audit-agents-skills --verbose    # Full details for all criteria
```

## Phase 1: Discovery

**Objective**: Locate and classify all agents, skills, and commands.

### Steps

1. **Scan directories**:

   ```
   .claude/agents/
   .claude/skills/
   .claude/commands/
   examples/agents/      (if exists)
   examples/skills/      (if exists)
   examples/commands/    (if exists)
   ```

2. **Classify files** (see the sketch after this list):

   - **Agent**: file in `agents/` with YAML frontmatter containing a `tools:` field
   - **Skill**: file in `skills/`, OR named `SKILL.md`, OR frontmatter with an `allowed-tools:` field
   - **Command**: file in `commands/` with frontmatter containing `name:` and `description:`

3. **Display summary**:

   ```
   Found: X agents, Y skills, Z commands
   ```
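A minimal classification sketch, assuming Python with PyYAML available; the helper names and scan root are illustrative, and the rules mirror step 2 above:

```python
import re
from pathlib import Path

import yaml  # PyYAML (assumed available)

def read_frontmatter(path: Path) -> dict:
    """Parse the YAML frontmatter block at the top of a Markdown file."""
    text = path.read_text(encoding="utf-8")
    match = re.search(r"^---\n(.*?)\n---", text, re.DOTALL)
    return (yaml.safe_load(match.group(1)) or {}) if match else {}

def classify(path: Path) -> str | None:
    """Apply the classification rules from step 2."""
    fm = read_frontmatter(path)
    parts = path.parts
    if "agents" in parts and "tools" in fm:
        return "agent"
    if "skills" in parts or path.name == "SKILL.md" or "allowed-tools" in fm:
        return "skill"
    if "commands" in parts and "name" in fm and "description" in fm:
        return "command"
    return None

# Scan and summarize
counts = {"agent": 0, "skill": 0, "command": 0}
for md in Path(".claude").rglob("*.md"):
    kind = classify(md)
    if kind:
        counts[kind] += 1
print(f"Found: {counts['agent']} agents, {counts['skill']} skills, {counts['command']} commands")
```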

## Phase 2: Audit Individual Files

Each file type is scored on weighted criteria. Maximum scores:

- Agents: 32 points
- Skills: 32 points
- Commands: 20 points

### Agents (32 points max)

#### Identity (weight: 3x) - 12 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Clear `name` field | 3 | Frontmatter YAML has a `name:` field that is descriptive (not generic like "agent1") |
| `description` with triggers | 3 | Description contains "when", "use", or "trigger" keywords indicating activation context |
| `model` specified | 3 | Frontmatter has a `model:` field (sonnet/haiku/opus) |
| Tools restricted appropriately | 3 | Tools list doesn't include Bash unless justified, or includes an explanation for risky tools |

Rationale: Identity determines discoverability and activation. If users can't locate or invoke the agent, downstream quality is irrelevant.

#### Prompt Quality (weight: 2x) - 8 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Role defined | 2 | Contains a "You are" or "Your role" statement defining the agent persona |
| Output format specified | 2 | Has a section titled "Output", "Format", or "Deliverables" specifying the expected structure |
| Scope/limits defined | 2 | Has a section defining scope, triggers, or when NOT to use the agent |
| Anti-hallucination measures | 2 | Contains keywords: "verify", "cite", "source", "evidence", or warnings against hallucination |

Rationale: Prompt quality determines reliability and accuracy of agent responses.

#### Validation (weight: 1x) - 4 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| 3+ usage examples | 1 | Has an "Examples", "Usage", or "Scenarios" section with at least 3 distinct examples |
| Edge cases documented | 1 | Mentions "edge case", "error", "failure", or "limitation" scenarios |
| Integration documented | 1 | References other agents, skills, or tools it works with |
| Error handling described | 1 | Mentions "fallback", "recovery", "error handling", or failure modes |

Rationale: Validation ensures robustness through comprehensive testing scenarios.

#### Design (weight: 2x) - 8 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Single responsibility | 2 | File size <5000 tokens AND description is focused (not "general purpose" or multiple verbs) |
| No duplication | 2 | Description doesn't overlap significantly with other agents (>50% keyword-similarity check) |
| Composable (skills references) | 2 | References skills or other agents it can invoke, showing modularity |
| Reasonable token budget | 2 | File size <8000 tokens (avoids context bloat) |

Rationale: Design patterns determine maintainability and scalability of agent architecture.


### Skills (32 points max)

#### Structure (weight: 3x) - 12 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Valid SKILL.md or frontmatter | 3 | File named `SKILL.md` OR has YAML frontmatter with a `name:` field |
| `name` valid | 3 | Name is lowercase, 1-64 chars, matches pattern `[a-z0-9-]+` (no spaces/special chars) |
| `description` non-empty | 3 | Description field exists and is >20 characters |
| `allowed-tools` specified | 3 | Frontmatter has an `allowed-tools:` field listing tool permissions |

Rationale: Structure compliance ensures spec compatibility with Claude Code runtime.
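A minimal check for the `name` rules above (the regex and length bound come straight from the table; the function name is illustrative):

```python
import re

def skill_name_valid(name: str) -> bool:
    """Lowercase, 1-64 chars, only [a-z0-9-], per the Structure criteria."""
    return bool(re.fullmatch(r"[a-z0-9-]{1,64}", name))

assert skill_name_valid("debugging-specialist")
assert not skill_name_valid("Debug Specialist")  # uppercase and space fail
```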

#### Content (weight: 2x) - 8 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Methodology/workflow described | 2 | Has a section titled "Methodology", "Workflow", or "Process", or numbered steps |
| Output format specified | 2 | Has a section specifying the deliverable format (Markdown, JSON, report structure) |
| Examples provided | 2 | Has an "Examples", "Usage", or "Scenarios" section with concrete instances |
| Checklists included | 2 | Contains Markdown checkbox syntax (`- [ ]` or `- [x]`) for actionable items |

Rationale: Content richness determines usability and learning curve.
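Checkbox detection can be a one-line regex; a sketch, assuming the raw Markdown body is already loaded into `content`:

```python
import re

# Matches "- [ ]" and "- [x]" list items anywhere in the body
has_checklist = bool(re.search(r"^\s*- \[( |x)\]", content, re.MULTILINE))
```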

#### Technical (weight: 1x) - 4 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Scripts have error handling | 1 | If bundled scripts exist, they contain `set -e`, `trap`, or `\|\|` error checks |
| No hardcoded paths | 1 | No absolute paths like `/Users/`, `/home/`, `C:\` in code or instructions |
| No secrets | 1 | No keywords: "password", "secret", "token", "api_key", "credentials" in plaintext |
| Dependencies documented | 1 | If external tools are required, has a "Requirements", "Dependencies", or "Prerequisites" section |

Rationale: Technical hygiene prevents portability issues and security risks.
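A sketch of the portability and secrets scans, using the keyword lists above (the patterns are illustrative, not exhaustive):

```python
import re

HARDCODED_PATHS = re.compile(r"/Users/|/home/|C:\\")
SECRET_KEYWORDS = re.compile(r"password|secret|token|api_key|credentials", re.IGNORECASE)

def technical_flags(content: str) -> dict:
    """Return which technical-hygiene checks fail for a file body."""
    return {
        "hardcoded_paths": bool(HARDCODED_PATHS.search(content)),
        "possible_secrets": bool(SECRET_KEYWORDS.search(content)),
    }
```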

#### Design (weight: 2x) - 8 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Single responsibility | 2 | Description is focused on one domain (not "general" or multi-purpose) |
| Clear triggers | 2 | Has a section defining "When to use", "Triggers", or "Activation criteria" |
| No overlap with other skills | 2 | Description doesn't duplicate >50% of keywords from other skills in the project |
| Portable | 2 | No Claude Code-specific extensions that break portability (check for custom APIs) |

Rationale: Design determines findability and maintainability across projects.


### Commands (20 points max)

#### Structure (weight: 3x) - 12 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Valid frontmatter | 3 | Has YAML frontmatter with both `name:` and `description:` fields |
| `argument-hint` if it takes args | 3 | If the `$ARGUMENTS` variable is used in the body, frontmatter has an `argument-hint:` field |
| Step-by-step workflow | 3 | Body contains numbered sections (1., 2., 3.) or a clear phase structure |
| Usage examples | 3 | Has a section titled "Usage" or "Examples", or shows invocation patterns |

Rationale: Structure determines usability and learnability for command users.
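The `argument-hint` criterion is a consistency check between body and frontmatter; a sketch, reusing the `read_frontmatter` helper sketched in Phase 1:

```python
def argument_hint_ok(body: str, frontmatter: dict) -> bool:
    """Pass if $ARGUMENTS is unused, or used and hinted in frontmatter."""
    uses_args = "$ARGUMENTS" in body
    return not uses_args or "argument-hint" in frontmatter
```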

#### Quality (weight: 2x) - 8 points

| Criterion | Points | Detection |
|-----------|--------|-----------|
| Error handling | 2 | Mentions "error", "failure", "fallback", or conditional paths for failures |
| Output format defined | 2 | Specifies what the command outputs (report, file, summary) and its structure |
| Validation gates | 2 | Contains checkpoints, verification steps, or "before proceeding" checks |
| Arguments parsed properly | 2 | If it takes args, shows how to parse/validate `$ARGUMENTS` (default values, validation) |

Rationale: Quality determines reliability and production readiness.


## Phase 3: Scoring

### Individual File Score

```
Score = (Points Obtained / Max Points) × 100
```

Example: an agent scoring 26/32 points → 81%.

### Grade Assignment

| Grade | Score Range | Status |
|-------|-------------|--------|
| A | 90-100% | Production-ready ✅ |
| B | 80-89% | Good (production threshold) ⚠️ |
| C | 70-79% | Needs improvement 🔧 |
| D | 60-69% | Significant gaps ⚠️ |
| F | <60% | Critical issues ❌ |

**Production Threshold**: 80% (Grade B or higher)
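A sketch of the score-to-grade mapping, transcribing the table above:

```python
def grade(score: float) -> str:
    """Map a 0-100 score to a letter grade per the table above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"  # production threshold
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```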

### Overall Project Score

The overall score is the average across all audited files, which is the same as weighting each type's average score by its file count:

```
Overall = (Avg Agent Score × Agent Count + Avg Skill Score × Skill Count + Avg Command Score × Command Count) / Total Files
```
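Equivalently, in code (a sketch; `scores` is assumed to map each audited file to its percentage score):

```python
def overall_score(scores: dict[str, float]) -> float:
    """File-count-weighted average: the sum of all file scores over total files."""
    return sum(scores.values()) / len(scores) if scores else 0.0
```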

## Phase 4: Report Generation

### Report Structure

```markdown
# Audit: Agents/Skills/Commands

**Project**: {path}
**Date**: {date}
**Overall Score**: {score}% ({grade})
**Files Audited**: {total} ({n} agents, {n} skills, {n} commands)
**Production Ready**: {count} files ({percentage}%)

---

## Summary

| Type | Files | Avg Score | Grade | Production Ready |
|------|-------|-----------|-------|------------------|
| Agents | X | Y% | Z | N/X (%) |
| Skills | X | Y% | Z | N/X (%) |
| Commands | X | Y% | Z | N/X (%) |

---

## Individual Scores

| File | Type | Score | Grade | Top Issues |
|------|------|-------|-------|------------|
| agent-name.md | Agent | 85% | B | Missing anti-hallucination measures, no edge cases |
| skill-name/ | Skill | 72% | C | Hardcoded paths, no error handling |
| command.md | Command | 95% | A | None |

---

## Top Issues (Across All Files)

1. **Missing error handling** (8 files affected)
   - Impact: Runtime failures unhandled
   - Fix: Add error handling sections, fallback strategies

2. **Hardcoded paths** (5 files affected)
   - Impact: Portability broken across systems
   - Fix: Use relative paths or environment variables

3. **No usage examples** (4 files affected)
   - Impact: Poor learnability, unclear invocation
   - Fix: Add "Examples" section with 3+ scenarios

---

## Detailed Breakdown

<details>
<summary>agent-name.md (Agent, 85%, Grade B)</summary>

### Scores by Category

| Category | Points | Max | Pass |
|----------|--------|-----|------|
| Identity | 12 | 12 | ✅ |
| Prompt Quality | 6 | 8 | ⚠️ |
| Validation | 2 | 4 | ❌ |
| Design | 6 | 8 | ⚠️ |

### Failed Criteria

- **Anti-hallucination measures** (2 pts): No keywords found for source verification
- **Edge cases documented** (1 pt): No mention of failure scenarios
- **Integration documented** (1 pt): No references to other agents/skills

### Recommendations

1. Add "Source Verification" section requiring citation of claims
2. Document edge cases: API failures, timeout scenarios, invalid input
3. List compatible skills/agents for composition patterns

</details>

---

## Recommendations (Prioritized)

### High Priority (Critical for production)

1. **Add error handling to 8 files**
   - Files: [list]
   - Action: Add error handling sections, define fallback behaviors

2. **Remove hardcoded paths from 5 files**
   - Files: [list]
   - Action: Replace with `$HOME`, relative paths, or env vars

### Medium Priority (Improves quality)

3. **Add usage examples to 4 files**
   - Files: [list]
   - Action: Create "Examples" section with 3+ scenarios

4. **Define output formats in 3 files**
   - Files: [list]
   - Action: Specify deliverable structure (Markdown/JSON/report)

### Low Priority (Polish)

5. **Add integration docs to 2 files**
   - Files: [list]
   - Action: List compatible agents/skills for composition

---

## Next Steps

1. Review failures: Focus on Grade D/F files first
2. Run with `--fix` for automated suggestions
3. Re-audit after improvements to track progress
4. Aim for 80%+ (Grade B) across all files for production readiness
```

## Phase 5: Fix Mode (Optional)

**Trigger**: `--fix` flag

For each failing criterion, generate a specific fix suggestion:

### Example Fix Suggestions

**File**: `agent-name.md`
**Issue**: Missing anti-hallucination measures (2 pts lost)

**Suggested Fix**: Add this section after the "Methodology" section:

```markdown
## Source Verification

- Always cite sources for factual claims
- Use phrases like "According to [source]..." or "Based on [documentation]..."
- If uncertain, explicitly state "I don't have verified information on..."
- Never invent statistics, version numbers, or API details
```

**File**: `skill-debugging/scripts/analyze.sh`
**Issue**: No error handling (1 pt lost)

**Suggested Fix**: Add to the top of the script:

```bash
set -e  # Exit on error
trap 'echo "Error on line $LINENO"' ERR
```

```bash
# Replace risky commands:
curl https://api.example.com        # ❌ No error check
curl https://api.example.com || {   # ✅ Error handled
    echo "API call failed"
    exit 1
}
```

## Verbose Mode (Optional)

**Trigger**: `--verbose` flag

By default, the report shows only failed criteria. Verbose mode shows all criteria with pass/fail status:

```markdown
### All Criteria (Verbose)

| Criterion | Status | Points | Notes |
|-----------|--------|--------|-------|
| Clear name | ✅ Pass | 3/3 | Name is "debugging-specialist" (descriptive) |
| Description with triggers | ✅ Pass | 3/3 | Contains "Use when debugging..." |
| Model specified | ❌ Fail | 0/3 | No `model:` field in frontmatter |
| Tools restricted | ⚠️ Partial | 2/3 | Includes Bash but no justification |
| ... | ... | ... | ... |
```

## Industry Context

**Source**: LangChain Agent Report 2026 (verified)

Key Statistics:

- 29.5% of organizations deploy agents without systematic evaluation
- 18% cite "agent bugs" as their top challenge
- Only 12% use automated quality checks

**Implication**: This audit addresses a real industry gap. Most teams deploy agents and skills without validation, leading to production issues. The 80% threshold (Grade B) aligns with industry best practice for production readiness.

**Comparison**: Manual checklists (like the Guide's Agent Validation Checklist on line 4921) are comprehensive but error-prone. Automated scoring reduces human error and provides quantitative metrics for tracking improvements over time.


## Related

- Agent Validation Checklist (guide line 4921): manual 16-criteria checklist
- Skill Validation (guide line 5491): spec-compliance documentation
- Examples: `examples/agents/`, `examples/skills/`, `examples/commands/`
- Advanced audit: use the `audit-agents-skills` skill (see `examples/skills/`) for comparative analysis against templates

## Implementation Notes

### Detection Patterns

**Frontmatter parsing**:

```python
import re
import yaml

yaml_match = re.search(r'^---\n(.*?)\n---', content, re.DOTALL)
frontmatter = yaml.safe_load(yaml_match.group(1)) if yaml_match else {}
```

**Keyword detection** (case-insensitive):

```python
has_trigger = any(word in description.lower() for word in ['when', 'use', 'trigger'])
```

**Token counting** (approximate):

```python
tokens = len(content.split()) * 1.3  # Rough estimate: 1 token ≈ 0.75 words
```

### Overlap Detection

Compare descriptions using Jaccard similarity:

```python
def jaccard_similarity(desc1, desc2):
    words1 = set(desc1.lower().split())
    words2 = set(desc2.lower().split())
    intersection = words1 & words2
    union = words1 | words2
    return len(intersection) / len(union) if union else 0

# Flag if similarity > 0.5 (50% keyword overlap)
```

### Grade Color Coding (Terminal Output)

```python
COLORS = {
    'A': '\033[92m',  # Green
    'B': '\033[93m',  # Yellow
    'C': '\033[93m',  # Yellow
    'D': '\033[91m',  # Red
    'F': '\033[91m',  # Red
}
RESET = '\033[0m'  # Restore the default color after printing
```
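A usage sketch (assuming the `grade` helper sketched in Phase 3):

```python
score = 85.0
letter = grade(score)  # "B"
print(f"{COLORS[letter]}{letter} ({score:.0f}%){RESET}")
```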

Command ready for use: `/audit-agents-skills`