Version: 0.1.0-draft Date: 2026-03-06
WorkCenter is the operations layer for Claude Code agent teams. It provides lifecycle management, state persistence, real-time observability, and operational controls for multi-agent software development workflows running on self-hosted Kubernetes infrastructure.
WorkCenter does not replace Claude Code. It wraps the existing agent team primitives — `claude --print`, `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`, and the `~/.claude/` state directory — with the tooling needed to run them reliably at scale on bare-metal clusters.
This section documents what Claude Code provides natively (as of v2.1.x). WorkCenter builds on top of these primitives. Understanding the boundary is critical.
| Primitive | Mechanism | Notes |
|---|---|---|
| Spawn agent | `claude --print` with `--output-format stream-json` | Headless. Env var `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` enables team tools. |
| Team creation | `TeamCreate` tool (internal) | Creates `~/.claude/teams/{name}/config.json`. |
| Team deletion | `TeamDelete` tool (internal) | Removes team state. |
| Agent join | `Agent` tool with `team_name` and `name` params | Registers member in `config.json` `members[]`. |
| Agent shutdown | `SendMessage` with `type: shutdown_request` | Recipient responds with `shutdown_response` (approve/reject). |
| Nudge (message) | `SendMessage` with `type: message` | Delivered to `~/.claude/teams/{team}/inboxes/{agent}.json`. |
| Broadcast | `SendMessage` with `type: broadcast` | Sends to all team members. Expensive (N deliveries). |
| Primitive | Tool | Notes |
|---|---|---|
| Create task | `TaskCreate` | Fields: `subject`, `description`, `owner`, `blocks`, `blockedBy`. |
| Update task | `TaskUpdate` | Status: `pending` -> `in_progress` -> `completed` (or `deleted`). Supports `activeForm`, `metadata`, dependency edges. |
| List tasks | `TaskList` | Returns summary: `id`, `subject`, `status`, `owner`, `blockedBy`. |
| Get task | `TaskGet` | Returns full task including `description`, `blocks`, `blockedBy`. |
Tasks are stored at `~/.claude/tasks/{team_or_session_id}/{id}.json`.
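The `blocks`/`blockedBy` edges imply a simple scheduling rule: a task is startable when it is `pending` and every task in `blockedBy` is `completed`. A minimal sketch of that rule — the `Task` struct here is a pared-down illustration, not the full schema:

```go
// Task is a pared-down illustration of the task schema: only the fields
// needed to decide readiness.
type Task struct {
	ID        string
	Status    string   // pending | in_progress | completed | deleted
	BlockedBy []string // IDs of tasks that must complete first
}

// ready returns the tasks that can be started now: status "pending" and
// every dependency in BlockedBy already "completed".
func ready(tasks []Task) []Task {
	done := make(map[string]bool, len(tasks))
	for _, t := range tasks {
		done[t.ID] = t.Status == "completed"
	}
	var out []Task
	for _, t := range tasks {
		if t.Status != "pending" {
			continue
		}
		ok := true
		for _, dep := range t.BlockedBy {
			if !done[dep] {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, t)
		}
	}
	return out
}
```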
Team config — `~/.claude/teams/{name}/config.json`:

```json
{
  "name": "string",
  "description": "string",
  "createdAt": 1772804253786,
  "leadAgentId": "team-lead@{name}",
  "leadSessionId": "uuid",
  "members": [
    {
      "agentId": "{name}@{team}",
      "name": "string",
      "agentType": "team-lead | general-purpose",
      "model": "claude-opus-4-6",
      "prompt": "string (full system prompt)",
      "color": "string",
      "planModeRequired": false,
      "joinedAt": 1772804304872,
      "tmuxPaneId": "string",
      "cwd": "/absolute/path",
      "subscriptions": [],
      "backendType": "in-process"
    }
  ]
}
```

Task — `~/.claude/tasks/{team}/{id}.json`:
```json
{
  "id": "string",
  "subject": "string",
  "description": "string",
  "activeForm": "string (present-continuous for spinner)",
  "status": "pending | in_progress | completed | deleted",
  "blocks": ["task_id", ...],
  "blockedBy": ["task_id", ...],
  "owner": "agent_name",
  "metadata": {}
}
```

Inbox — `~/.claude/teams/{team}/inboxes/{agent}.json`:
```json
[
  {
    "from": "agent_name",
    "text": "string (may contain JSON-encoded structured messages)",
    "timestamp": "ISO8601",
    "color": "string",
    "read": false
  }
]
```

What Claude Code does not provide:

- No persistent daemon — each `claude --print` invocation is ephemeral
- No process supervision or restart-on-failure
- No centralized event stream — state changes are file mutations
- No web UI or HTTP API
- No multi-node coordination — the state directory is local to one filesystem
- No access controls — any process with filesystem access can read/write state
- No cost tracking or audit logging
- No retention or cleanup policies
These gaps define WorkCenter's scope.
```
┌─────────────────────────────────────────────────────────────┐
│                        Dashboard UI                         │
│                 (embedded static frontend)                  │
└────────────────┬──────────────────────┬─────────────────────┘
                 │ HTTP/WS              │ REST
┌────────────────▼──────────────────────▼─────────────────────┐
│                      Dashboard Server                       │
│                   (Go binary, port 8080)                    │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌────────────┐  │
│  │ REST API │  │  WS Hub  │  │   Auth    │  │   Static   │  │
│  │ Handler  │  │          │  │ Middleware│  │   Embed    │  │
│  └────┬─────┘  └────┬─────┘  └───────────┘  └────────────┘  │
└───────┼─────────────┼───────────────────────────────────────┘
        │             │
┌───────▼─────────────▼───────────────────────────────────────┐
│                        Orchestrator                         │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐ │
│  │    Agent     │  │    State     │  │     Event Bus      │ │
│  │  Lifecycle   │  │    Layer     │  │   (fsnotify ->     │ │
│  │   Manager    │  │              │  │   typed events)    │ │
│  └──────┬───────┘  └──────┬───────┘  └────────┬───────────┘ │
└─────────┼─────────────────┼───────────────────┼─────────────┘
          │                 │                   │
          │ exec            │ read/write        │ watch
          ▼                 ▼                   ▼
   ┌──────────┐      ┌─────────────┐     ┌─────────────┐
   │  claude  │      │ ~/.claude/  │     │ ~/.claude/  │
   │  --print │      │  teams/     │     │  tasks/     │
   │  process │      │  tasks/     │     │  teams/     │
   └──────────┘      └─────────────┘     └─────────────┘
                            │
                     ┌──────▼──────┐
                     │  iSCSI PVC  │
                     │ (persistent │
                     │  storage)   │
                     └─────────────┘
```
Purpose: Manage Claude Code agent process lifecycle. The orchestrator is the core daemon that spawns, monitors, and terminates `claude --print` processes.
Runs as: Kubernetes Deployment (single replica, leader election for HA).
Responsibilities:

- Spawn: Execute `claude --print --output-format stream-json` with `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` set. Pass team name, agent name, system prompt, model, and working directory as arguments.
- Monitor: Track process health via stdout stream parsing. Detect crashes, hangs (no output within a configurable timeout), and unexpected exits.
- Kill: Send SIGTERM to the agent process. If it has not exited within the grace period (default 10s), send SIGKILL.
- Restart: Re-spawn agents that crash, with exponential backoff (1s, 2s, 4s, ..., max 60s). The restart policy is configurable per agent (always, on-failure, never).
- Nudge: Write messages to agent inbox files, or invoke `SendMessage` via a new `claude --print` session targeting the team.
Process table (in-memory, reconstructed from state on startup):

```go
type AgentProcess struct {
	TeamName     string
	AgentName    string
	PID          int
	Cmd          *exec.Cmd
	StartedAt    time.Time
	RestartCount int
	Status       AgentStatus // running, stopped, crashed, starting
	StdoutPipe   io.ReadCloser
}
```

Key design decision: One orchestrator manages one `~/.claude/` state directory. Multi-node deployments require a shared filesystem (iSCSI/NFS), not distributed consensus.
Purpose: Read and write Claude Code's native state files. This is a library, not a service.
Design principle: WorkCenter does not define its own state format. It
reads and writes the same JSON files that Claude Code uses natively. This
means a running WorkCenter instance and a manual claude session can
coexist on the same state directory.
Operations:
| Operation | File | Method |
|---|---|---|
| List teams | `~/.claude/teams/*/config.json` | Glob + parse |
| Get team | `~/.claude/teams/{name}/config.json` | Read + parse |
| Create team | `~/.claude/teams/{name}/config.json` | Write (atomic via rename) |
| List tasks | `~/.claude/tasks/{team}/*.json` | Glob + parse |
| Get task | `~/.claude/tasks/{team}/{id}.json` | Read + parse |
| Update task | `~/.claude/tasks/{team}/{id}.json` | Read-modify-write (file lock) |
| Read inbox | `~/.claude/teams/{team}/inboxes/{agent}.json` | Read + parse |
| Write inbox | `~/.claude/teams/{team}/inboxes/{agent}.json` | Append (file lock) |
Concurrency: File-level locking via flock(2). The state layer must
handle concurrent access from both the orchestrator and claude processes.
State directory: Configurable via WORKCENTER_STATE_DIR env var.
Default: ~/.claude/.
```go
type StateLayer struct {
	BaseDir string // e.g., /home/workcenter/.claude
}

func (s *StateLayer) ListTeams() ([]TeamConfig, error)
func (s *StateLayer) GetTeam(name string) (*TeamConfig, error)
func (s *StateLayer) ListTasks(team string) ([]Task, error)
func (s *StateLayer) GetTask(team, id string) (*Task, error)
func (s *StateLayer) UpdateTask(team, id string, fn func(*Task)) error
func (s *StateLayer) ReadInbox(team, agent string) ([]InboxMessage, error)
func (s *StateLayer) WriteInbox(team, agent string, msg InboxMessage) error
```

Purpose: Convert filesystem mutations into typed, ordered events. Decouple state changes from consumers (dashboard, CLI, webhooks).
Implementation: fsnotify watches the ~/.claude/ directory tree.
On each file write/create/delete, the event bus:
- Identifies the file type (team config, task, inbox) from the path.
- Reads the new file contents.
- Diffs against cached previous state.
- Emits a typed event.
Event types:
```go
type EventType string

const (
	EventTeamCreated EventType = "team_created"
	EventTeamUpdated EventType = "team_updated"
	EventTeamDeleted EventType = "team_deleted"
	EventTaskCreated EventType = "task_created"
	EventTaskUpdated EventType = "task_updated"
	EventTaskDeleted EventType = "task_deleted"
	EventAgentJoined EventType = "agent_joined"
	EventAgentLeft   EventType = "agent_left"
	EventAgentStatus EventType = "agent_status"
	EventMessage     EventType = "message"
)

type Event struct {
	Type      EventType       `json:"type"`
	Team      string          `json:"team"`
	Agent     string          `json:"agent,omitempty"`
	TaskID    string          `json:"taskId,omitempty"`
	Timestamp time.Time       `json:"timestamp"`
	Payload   json.RawMessage `json:"payload"`
}
```

Delivery: Fan-out to registered subscribers. Each WebSocket connection and each CLI `--follow` session is a subscriber. Events are not persisted by the bus — they are derived from state file changes, which are already persistent.
Debouncing: File writes may trigger multiple fsnotify events. The bus debounces with a 50ms window per file path before emitting.
Purpose: Web UI for operational visibility and control. Single Go binary that serves both the REST API and embedded static frontend.
Architecture: The dashboard server is a Go HTTP server. The frontend is
a static SPA (HTML + JS + CSS) embedded in the binary via embed.FS. No
Node.js build step in production. No framework heavier than vanilla JS or
a minimal reactive library (e.g., Preact).
Port: 8080 (configurable via WORKCENTER_PORT).
Pages:
| Route | View | Description |
|---|---|---|
| `/` | Team list | All teams with member count, task progress |
| `/teams/{name}` | Team detail | Agent status cards, task board, message log |
| `/teams/{name}/tasks` | Task graph | DAG visualization of task dependencies |
| `/teams/{name}/messages` | Message log | Chronological message stream |
Real-time updates: WebSocket connection at /ws receives events from
the event bus. The frontend applies events to local state without polling.
Controls (via REST API, rendered as buttons/forms in UI):
- Spawn agent into team
- Kill agent
- Send nudge message to agent
- Create/update/delete task
- Create/delete team
Purpose: Terminal-native interface to the same REST API the dashboard
uses. For operators who prefer kubectl-style workflows.
Binary: workcenter (same module, different cmd/ entrypoint).
Commands:
```
workcenter team list
workcenter team get <name>
workcenter team create <name> [--description <desc>]
workcenter team delete <name>

workcenter agent spawn <team> <name> [--model <model>] [--prompt <prompt>]
workcenter agent list <team>
workcenter agent kill <team> <name>
workcenter agent nudge <team> <name> <message>

workcenter task list <team>
workcenter task get <team> <id>
workcenter task create <team> --subject <s> [--description <d>] [--owner <o>]
workcenter task update <team> <id> [--status <s>] [--owner <o>]

workcenter logs <team> [--follow]
workcenter status
```
Output formats: --output text (default, human-readable tables),
--output json, --output yaml.
Connection: --server <url> (default: http://localhost:8080).
Purpose: Go library for building custom orchestration workflows. Exposes WorkCenter primitives as importable functions.
Package: github.com/workcenter/workcenter/pkg/sdk
Core types:
```go
// Client connects to a WorkCenter server.
type Client struct { /* ... */ }

func NewClient(serverURL string, opts ...Option) *Client

func (c *Client) SpawnAgent(ctx context.Context, req SpawnRequest) (*Agent, error)
func (c *Client) KillAgent(ctx context.Context, team, name string) error
func (c *Client) NudgeAgent(ctx context.Context, team, name, msg string) error
func (c *Client) ListTeams(ctx context.Context) ([]Team, error)
func (c *Client) ListTasks(ctx context.Context, team string) ([]Task, error)
func (c *Client) Subscribe(ctx context.Context) (<-chan Event, error)
```

Use cases:
- Custom CI/CD pipelines that spawn review teams
- Scheduled batch workflows (nightly refactoring runs)
- Integration with external systems (Slack bot, GitHub webhook)
Base path: /api/v1
| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| `GET` | `/health` | Health check | — | `{"status":"ok"}` |
| `GET` | `/teams` | List teams | — | `[TeamSummary]` |
| `POST` | `/teams` | Create team | `{name, description}` | `TeamConfig` |
| `GET` | `/teams/{name}` | Get team | — | `TeamConfig` |
| `DELETE` | `/teams/{name}` | Delete team | — | 204 |
| `GET` | `/teams/{name}/agents` | List agents | — | `[AgentStatus]` |
| `POST` | `/teams/{name}/agents` | Spawn agent | `{name, model, prompt, cwd}` | `AgentStatus` |
| `DELETE` | `/teams/{name}/agents/{agent}` | Kill agent | — | 204 |
| `POST` | `/teams/{name}/agents/{agent}/nudge` | Send message | `{message}` | 202 |
| `GET` | `/teams/{name}/tasks` | List tasks | — | `[Task]` |
| `POST` | `/teams/{name}/tasks` | Create task | `{subject, description, owner, blocks, blockedBy}` | `Task` |
| `GET` | `/teams/{name}/tasks/{id}` | Get task | — | `Task` |
| `PATCH` | `/teams/{name}/tasks/{id}` | Update task | `{status, owner, ...}` | `Task` |
| `DELETE` | `/teams/{name}/tasks/{id}` | Delete task | — | 204 |
| `GET` | `/teams/{name}/messages` | Get messages | `?agent=<name>` | `[InboxMessage]` |
All endpoints return JSON. Errors use standard HTTP status codes with
{"error": "message"} body.
Endpoint: `/ws`

Connection: The client connects and optionally sends a subscribe message:

```json
{"type": "subscribe", "teams": ["workcenter"]}
```

If no subscribe message is sent, the client receives events for all teams.
Server -> Client messages (Event objects as defined in Section 2.3):

```json
{
  "type": "task_updated",
  "team": "workcenter",
  "taskId": "1",
  "timestamp": "2026-03-06T13:38:35.786Z",
  "payload": {
    "id": "1",
    "subject": "Write ARCHITECTURE.md",
    "status": "in_progress",
    "owner": "architect"
  }
}
```

```json
{
  "type": "agent_status",
  "team": "workcenter",
  "agent": "architect",
  "timestamp": "2026-03-06T13:38:35.786Z",
  "payload": {
    "name": "architect",
    "status": "running",
    "pid": 12345,
    "startedAt": "2026-03-06T13:38:00.000Z",
    "restartCount": 0
  }
}
```

Client -> Server messages:
| Type | Fields | Description |
|---|---|---|
| `subscribe` | `teams: []string` | Filter events to specific teams |
| `ping` | — | Keepalive (server responds with `pong`) |
Keepalive: Server sends ping every 30s. Client must respond with
pong within 10s or connection is closed.
- Architecture: ARM64 (Raspberry Pi 4/5, CM3588+, similar SBCs)
- Kubernetes: K3s or standard kubeadm clusters
- Storage: iSCSI StorageClass for persistent state (ReadWriteOnce)
- Registry: Self-hosted Forgejo container registry (insecure HTTP)
- GitOps: ArgoCD for declarative deployment
- External access: Cloudflare Tunnel (no ingress controller required)
```yaml
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: workcenter
---
# PVC for ~/.claude/ state
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workcenter-state
  namespace: workcenter
spec:
  storageClassName: iscsi
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
---
# Dashboard + Orchestrator Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workcenter
  namespace: workcenter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: workcenter
  template:
    metadata:
      labels:
        app: workcenter
    spec:
      containers:
        - name: workcenter
          image: 192.168.8.197:30080/tim/workcenter:latest
          ports:
            - containerPort: 8080
          env:
            - name: WORKCENTER_STATE_DIR
              value: /state
            - name: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS
              value: "1"
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: workcenter-secrets
                  key: anthropic-api-key
          volumeMounts:
            - name: state
              mountPath: /state
      volumes:
        - name: state
          persistentVolumeClaim:
            claimName: workcenter-state
---
# Service (NodePort)
apiVersion: v1
kind: Service
metadata:
  name: workcenter
  namespace: workcenter
spec:
  type: NodePort
  selector:
    app: workcenter
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30800
```

Multi-stage build. The final image is distroless (no shell, no package manager).
The claude CLI must be installed in the image for the orchestrator to
spawn agents.
```dockerfile
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o /out/workcenter ./cmd/workcenter
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o /out/dashboard ./cmd/dashboard

FROM node:22-slim AS claude
RUN npm install -g @anthropic-ai/claude-code

FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/dashboard /dashboard
COPY --from=build /out/workcenter /workcenter
COPY --from=claude /usr/local/lib/node_modules /usr/local/lib/node_modules
COPY --from=claude /usr/local/bin/node /usr/local/bin/node
COPY --from=claude /usr/local/bin/claude /usr/local/bin/claude
ENV PATH="/usr/local/bin:$PATH"
EXPOSE 8080
ENTRYPOINT ["/dashboard"]
```

Note: The `claude` CLI requires Node.js. The distroless base is augmented with `node` and the `claude` package from a separate build stage. This adds ~100MB to the image size but avoids requiring a full OS layer.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: workcenter
  namespace: argocd
spec:
  project: default
  source:
    repoURL: http://192.168.8.197:30080/tim/workcenter.git
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: workcenter
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

The MVP assumes a single trusted operator with full access. No authentication or authorization on the REST API or WebSocket.
API key management: The Anthropic API key is stored as a Kubernetes Secret and injected via environment variable. It is never exposed through the REST API or dashboard.
Network boundary: The dashboard listens on a ClusterIP or NodePort. External access (if desired) goes through Cloudflare Tunnel with Cloudflare Access for authentication.
| Feature | Implementation |
|---|---|
| Authentication | OIDC/SSO via reverse proxy (e.g., oauth2-proxy) |
| Authorization | RBAC: roles per team (admin, operator, viewer) |
| Audit log | Append-only log of all API mutations with actor, timestamp, diff |
| Secret rotation | API key rotation without agent restart |
| Network policy | K8s NetworkPolicy restricting pod-to-pod traffic |
| Component | Deliverable |
|---|---|
| Orchestrator | Spawn, kill, restart agents. Process monitoring. |
| State Layer | Read/write Claude Code native state files. |
| Event Bus | fsnotify watcher with typed events. |
| Dashboard | Web UI with team list, agent status, task board, message log. Spawn/kill/nudge controls. |
| CLI | workcenter command with team/agent/task subcommands. |
| K8s manifests | Deployment, Service, PVC, Namespace. |
| Container image | Multi-stage Dockerfile for ARM64. |
| Feature | Rationale |
|---|---|
| RBAC | Single-tenant MVP needs no access control. |
| Audit logging | Adds write amplification; unnecessary for solo operators. |
| SSO/OIDC | Cloudflare Access covers single-user external auth. |
| Multi-cluster federation | Requires distributed state; out of scope for v1. |
| Cost tracking | Requires API response parsing for token counts; deferred. |
| Retention policies | Manual cleanup is acceptable for MVP. |
| Agent SDK | Library release after API stabilizes. |
```
User (Dashboard/CLI)
  │
  ▼
REST API: POST /api/v1/teams/{name}/agents
  │
  ▼
Orchestrator.SpawnAgent()
  ├── Write team config (add member to config.json)
  ├── exec: claude --print --output-format stream-json \
  │         --system-prompt "..." --model claude-opus-4-6
  │   (env: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1)
  ├── Start goroutine: read stdout stream, detect crashes
  └── Update process table
  │
  ▼
Event Bus detects config.json change
  │
  ▼
WebSocket: {"type": "agent_joined", "team": "...", "agent": "..."}
  │
  ▼
Dashboard UI updates agent card
```
```
Claude agent writes ~/.claude/tasks/{team}/{id}.json
  │
  ▼
Event Bus (fsnotify) detects file write
  ├── Read new file contents
  ├── Diff against cached state
  └── Emit: {"type": "task_updated", ...}
  │
  ▼
WebSocket subscribers receive event
  │
  ▼
Dashboard UI updates task board
```
```
User (Dashboard/CLI)
  │
  ▼
REST API: POST /api/v1/teams/{name}/agents/{agent}/nudge
  │
  ▼
State Layer: append message to
  ~/.claude/teams/{team}/inboxes/{agent}.json
  │
  ▼
Claude agent's idle notification picks up new inbox message
  │
  ▼
Agent processes the nudge and resumes work
```
| # | Question | Impact | Default Assumption |
|---|---|---|---|
| 1 | Should the orchestrator and dashboard be one binary or two? | Deployment complexity vs. separation of concerns | One binary, two `cmd/` entrypoints; can run as a single process with a subcommand |
| 2 | How to handle `claude` CLI versioning inside the container? | Breaking changes in state format | Pin to a specific npm version in the Dockerfile |
| 3 | Is iSCSI ReadWriteOnce sufficient, or do we need ReadWriteMany? | HA/multi-replica | RWO (single-replica orchestrator) for MVP |
| 4 | Should the event bus persist events for replay? | Late-joining dashboard clients miss events | No — clients do a full state read on connect, then apply events |
| 5 | How to authenticate the `claude` CLI inside the container? | API key injection | `ANTHROPIC_API_KEY` env var from a K8s Secret |
| 6 | Should WorkCenter manage `CLAUDE.md` and project settings? | Reproducibility | Deferred — pass via `--system-prompt` for now |
| Term | Definition |
|---|---|
| Agent | A running `claude --print` process that is a member of a team. |
| Team | A named group of agents with a shared state directory, task list, and message inboxes. |
| Task | A unit of work with subject, description, status, owner, and dependency edges. |
| Nudge | A message sent to an agent's inbox to prompt action. |
| State directory | The `~/.claude/` filesystem tree containing team configs, tasks, and inboxes. |
| Event | A typed notification derived from a state file mutation. |
| Orchestrator | The WorkCenter daemon that manages agent process lifecycle. |
| Dashboard | The web UI and HTTP server for operational control. |