feat(skills): Add agent-performance skill for metrics tracking #1627

rysweet · 2025-11-25T07:31:49Z

Summary

Adds new agent-performance skill for tracking agent effectiveness
Tracks invocation counts, success rates, completion times
Identifies underutilized agents (30+ available, which are used?)
Leverages existing workflow_tracker.py infrastructure

Part of Issue #1611 (Enhancement 2)

Files Added

.claude/skills/agent-performance/SKILL.md - Main skill definition
.claude/skills/agent-performance/README.md - Documentation

How to Use

Triggers: 'agent performance', 'agent metrics', 'which agents', 'underutilized'

🤖 Generated with Claude Code

Add new skill for tracking agent utilization and effectiveness: - Invocation counts per agent - Success/failure rates - Average completion times - Identifies underutilized agents - Leverages existing workflow_tracker.py Part of Issue #1611 Enhancement 2 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

rysweet · 2025-11-25T19:30:22Z

Code Review for agent-performance Skill

Reviewer: Claude Code Reviewer Agent
Date: 2025-11-25

Philosophy Compliance: 8/10

Strengths:

Follows "Ruthless Simplicity" - leverages existing workflow_tracker.py infrastructure instead of creating new logging systems
Adheres to "Zero-BS Implementation" - no placeholder code, working aggregation logic documented
Respects "Modular Design" - self-contained skill with clear boundaries (SKILL.md + README.md)
Uses file-based storage (YAML/JSONL) instead of external dependencies

Minor Concerns:

The skill describes a report generation process but relies entirely on Claude following instructions rather than executable code. This is appropriate for a Claude Code skill but means correctness depends on LLM execution fidelity.

Strengths

Leverages Existing Infrastructure: Correctly identifies and uses workflow_tracker.py with its log_agent_invocation() function, avoiding duplicate logging systems.
Actionable Metrics Focus: The report format includes practical recommendations ("Consider using database agent for schema work") rather than just raw numbers.
Complete Agent Inventory: Documents 32 available agents (6 core + 26 specialized), which matches the actual agent count in .claude/agents/amplihack/.

Issues Found

MINOR - Agent Count Discrepancy:

SKILL.md claims "26 specialized agents" but I count 31 files in specialized/ directory
Location: .claude/skills/agent-performance/SKILL.md line 155-165
Impact: Low - documentation accuracy
Suggestion: Update the inventory count or explicitly state that the list is representative, not exhaustive

MINOR - Missing Error Handling Documentation:

The skill doesn't document what happens if workflow_execution.jsonl doesn't exist or is empty
Impact: Low - edge case handling
Suggestion: Add a note that reports gracefully degrade with "no data available" for new installations

OBSERVATION - No Automated Aggregation Script:

The skill documents how to aggregate manually but doesn't include a Python script for automated report generation
This is consistent with skill design (skills are instructions, not executables)
Consider: Future enhancement could add a scenario tool in .claude/scenarios/agent-metrics/

Recommendations

Verify Agent Count: Update line 155-165 to reflect actual agent counts after review
Add Empty State Handling: Document behavior when no log data exists
Consider Companion Tool: If automated report generation is valuable, create a scenario tool that implements the aggregation logic

Philosophy Scores

Criterion	Score	Notes
Ruthless Simplicity	9/10	Leverages existing infra, no new dependencies
Modular Design	9/10	Clean skill boundary, clear public interface
Zero-BS Implementation	8/10	Working logic, minor doc inconsistencies
Clarity	8/10	Good examples, minor inventory discrepancy

Overall: 8.5/10

Verdict

[x] Comment - Minor suggestions only

The skill follows amplihack philosophy well. The identified issues are documentation accuracy items that don't affect functionality. The core design of leveraging workflow_tracker.py and providing actionable metrics is sound.

Recommendation: Merge as-is, or optionally address the minor agent count discrepancy before merging.

Generated by Claude Code Reviewer Agent

Improvements: - Fix specialized agent count: 26 -> 25 (accurate count) - Add dynamic count note for future maintainability - Add "Interpreting Metrics" section with benchmarks for: - Success rate guidelines (95%+, 85-94%, 70-84%, <70%) - Invocation volume interpretation - Duration benchmarks - Add "Empty State Handling" with example output - Add "Limitations" section documenting 6 constraints - Update README.md with summary of new sections 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

rysweet · 2025-11-25T20:22:06Z

Quality Improvement Update

Previous Score: 8.5/10
New Score: 9/10

Improvements Made

Fixed Agent Count Accuracy
- Changed from incorrect "26 specialized agents" to accurate "25 specialized agents"
- Added note about dynamic count with command to verify current count
Added Interpreting Metrics Section
- Success rate guidelines table (95-100%, 85-94%, 70-84%, <70% with assessments and recommended actions)
- Invocation volume interpretation (high/medium/low/very low thresholds)
- Duration benchmarks (fast/normal/complex/inefficient ranges)
Added Empty State Handling
- Complete example YAML output when no logs exist
- Clear "getting_started" guidance
- "next_steps" for enabling tracking
Added Limitations Section
- 6 documented constraints:
  - Dependency on workflow_tracker
  - On-demand (not real-time) reports
  - Historical data only
  - No automatic anomaly detection
  - Single-project scope
  - No correlation with code quality
Updated README.md
- Added summary of interpreting results
- Added empty state handling description
- Added limitations summary with reference to SKILL.md

Quality Assessment

These improvements bring the skill to 9/10 by:

Fixing factual accuracy (agent count)
Adding actionable guidance (metric interpretation)
Documenting edge cases (empty state)
Being transparent about constraints (limitations)

🤖 Generated with Claude Code

rysweet mentioned this pull request Nov 25, 2025

🚀 10 Enhancement Ideas for amplihack - Analysis from Trace Files and PRs #1611

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(skills): Add agent-performance skill for metrics tracking #1627

feat(skills): Add agent-performance skill for metrics tracking #1627

Uh oh!

rysweet commented Nov 25, 2025

Uh oh!

rysweet commented Nov 25, 2025

Uh oh!

rysweet commented Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(skills): Add agent-performance skill for metrics tracking #1627

Are you sure you want to change the base?

feat(skills): Add agent-performance skill for metrics tracking #1627

Uh oh!

Conversation

rysweet commented Nov 25, 2025

Summary

Part of Issue #1611 (Enhancement 2)

Files Added

How to Use

Uh oh!

rysweet commented Nov 25, 2025

Code Review for agent-performance Skill

Philosophy Compliance: 8/10

Strengths

Issues Found

Recommendations

Philosophy Scores

Verdict

Uh oh!

rysweet commented Nov 25, 2025

Quality Improvement Update

Improvements Made

Quality Assessment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants