Last updated: 2026-03-29
- What: Contributor guide covering setup, PR conventions, agent creation workflow, and skill authoring.
- Why: Essential for external contributors to understand how to add agents or improve existing skills.
- Effort: S
- Status: Done
- What: Replace GitHub template with real vulnerability reporting process.
- Why: Current file has placeholder version numbers and no actual policy.
- Effort: S
- Status: Done
- What: Retroactive changelog from git history, then maintain going forward.
- Why: Contributors and users need to understand what changed between versions.
- Effort: S
- Status: Done
- What: Issue templates (bug report, feature request, new agent proposal), PR template, and funding config.
- Why: Standardizes contributions and makes it easy for new contributors to participate.
- Effort: S
- Status: Done
- What: GitHub Actions workflow that lints (Trunk), validates plugin structure, and runs tests.
- Why: PRs need automated quality gates before merge. Currently all checks are manual.
- Effort: M
- Status: Todo
- What: Skill or hook that verifies dependencies (Python version, uv/pip) on install.
- Why: Prevents wasted time when users hit missing tool errors on first run.
- Effort: S
- Status: Todo
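
A minimal sketch of what such a prerequisite check could look like, assuming it is written in Python. The function name `check_prerequisites` and the minimum version `(3, 10)` are hypothetical; the roadmap does not state a required Python version.

```python
import shutil
import sys

# Hypothetical minimum; the roadmap does not state a required Python version.
MIN_PYTHON = (3, 10)


def check_prerequisites(min_python=MIN_PYTHON, version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # Either installer is acceptable: uv or pip (pip3 is a common alias).
    if not any(which(tool) for tool in ("uv", "pip", "pip3")):
        problems.append("Neither uv nor pip found on PATH")
    return problems


# Report every problem at once so the user can fix them in one pass.
for issue in check_prerequisites():
    print(f"missing prerequisite: {issue}")
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.
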
- What: After install, optionally verify agent loading and print a summary.
- Why: Turns "hope it works" into "confirmed working" on first install.
- Effort: S
- Depends on: Prerequisite checker
- What: Framework for testing skills with sample inputs and expected behavior patterns.
- Why: As the skill count grows (64+), manual testing doesn't scale. Automated regression testing is needed.
- Effort: M
- What: Tests that verify each agent loads, responds to basic prompts, and uses correct tools.
- Why: Ensures refactors don't silently break agent behavior.
- Effort: M
- What: Lint-like tool that scores skill definitions for completeness (frontmatter, workflow steps, anti-patterns, output format).
- Why: Maintains consistent quality as community contributes new skills.
- Effort: M
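
One possible shape for the completeness score, as a rough sketch. It assumes skills are Markdown files with YAML-style frontmatter delimited by `---`; the section names in `REQUIRED_SECTIONS` and the function `score_skill` are hypothetical and may not match the real skill format.

```python
import re

# Hypothetical section names; the real skill format may use different headings.
REQUIRED_SECTIONS = ("Workflow", "Anti-patterns", "Output format")


def score_skill(text):
    """Return (score between 0 and 1, list of missing items) for one skill file."""
    checks = {
        # Frontmatter must open the file and be closed by a second '---' line.
        "frontmatter": re.match(r"\A---\n.*?\n---\n", text, re.DOTALL) is not None
    }
    for section in REQUIRED_SECTIONS:
        # Match a Markdown heading of any level whose title is the section name.
        pattern = rf"^#+[ \t]*{re.escape(section)}[ \t]*$"
        checks[section] = (
            re.search(pattern, text, re.IGNORECASE | re.MULTILINE) is not None
        )
    missing = [name for name, ok in checks.items() if not ok]
    return (len(checks) - len(missing)) / len(checks), missing
```

Returning the list of missing items alongside the score lets a CI job print actionable feedback instead of a bare number.
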
- What: Recorded examples (asciicast or markdown) showing each agent handling a real task.
- Why: Contributors need to see what "good" looks like. Users need to see what's possible.
- Effort: L
- What: Rich metadata (screenshots, categories, compatibility tags) for marketplace listing.
- Why: Better discovery as the Claude Code plugin ecosystem grows.
- Effort: S
- What: Shared context mechanism so agents can pass findings to each other without Apex relaying everything.
- Why: System takeover and multi-agent workflows lose context in handoffs today.
- Effort: L
- What: Interactive scaffolding (like `create-react-app` but for agents) that walks through naming, domain, skills.
- Why: Lowers the barrier for community contributions beyond the current manual template copy.
- Effort: M
- What: Declarative way to define multi-agent workflows (e.g., "run Warden audit, then Relay hardening, then Vigil instrumentation").
- Why: Currently only Apex can orchestrate. Users should be able to define custom pipelines.
- Effort: L
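
To make the idea concrete, here is a hedged sketch of the Warden, Relay, Vigil example expressed as data. `Step`, `PIPELINE`, `run_pipeline`, and the `run_step` callable are all hypothetical names for illustration; how agents are actually invoked is not defined here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    agent: str  # agent name, e.g. "Warden"
    task: str   # what that agent is asked to do


# The example from the roadmap item, expressed as data rather than code.
PIPELINE = [
    Step("Warden", "audit"),
    Step("Relay", "hardening"),
    Step("Vigil", "instrumentation"),
]


def run_pipeline(steps, run_step):
    """Run steps in order, passing each step a copy of the findings so far."""
    findings = []
    for step in steps:
        # run_step is a stand-in for whatever actually invokes an agent.
        findings.append(run_step(step, list(findings)))
    return findings
```

Keeping the pipeline as plain data means it could later be loaded from a config file without changing the runner.
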
- Documentation site (Astro/Starlight) with per-agent pages
- Agent performance benchmarks (token usage, task completion rates)
- Specialized sub-agents within teams (e.g., Forge could have Terraform vs Pulumi specialists)
- Localization of agent prompts for non-English teams
- VS Code extension for agent selection UI
- Initial release with 13 agents and 64 skills
- Plugin-based architecture
- Marketplace registration
- Removed legacy pip CLI
- Bundled all agents in root plugin for single install
- Flattened skills to root for discoverability
- Removed legacy Cloud Run agent from Forge
- Refined tagline and added "Why This Exists" section