|
| 1 | +# Workflow Automation Platform Technical Specification |
| 2 | + |
| 3 | +## 1. Background and Goals |
| 4 | + |
| 5 | +The current proof of concept orchestrates AI-assisted workflows through ad-hoc `just` |
| 6 | +recipes (`agents.just`) and a Bash helper script (`scripts/agent-workspace.sh`) that |
| 7 | +manages per-run Jujutsu workspaces outside the repository tree. |
| 8 | +To deliver a production-grade product we must transform this tooling into a cohesive, |
| 9 | +extensible Rust platform that supports: |
| 10 | + |
| 11 | +- User-authored workflows with first-class configuration, validation, and reuse. |
| 12 | +- Publishing, discovering, and pulling shared workflows from a central registry. |
| 13 | +- Safe parallel execution of multiple workflows using isolated workspaces. |
| 14 | +- Operational observability, access control, and lifecycle management expected from a |
| 15 | + real product. |
| 16 | + |
| 17 | +## 2. Product Requirements |
| 18 | + |
| 19 | +### 2.1 Functional |
| 20 | + |
| 21 | +1. **Workflow authoring** – Users can define workflows locally using a declarative file |
| 22 | + format (`workflow.toml`), including parameterized steps, dependencies, conditional |
| 23 | + execution, and reusable actions. |
| 24 | +2. **Execution** – Users can run workflows locally via CLI, passing parameters and |
| 25 | + environment overrides. Runs use isolated workspaces derived from a Jujutsu change or |
| 26 | + repository head. |
| 27 | +3. **Parallelization** – Users can queue or execute multiple workflows concurrently with |
| 28 | + per-run resource isolation and status tracking. |
| 29 | +4. **Publishing** – Users can publish workflows (including metadata, version, license, |
| 30 | + and documentation) to a central registry service after authentication. |
| 31 | +5. **Discovery & Fetching** – Users can search the registry, view metadata, download a |
| 32 | + workflow into their local cache, and upgrade to new versions. |
| 33 | +6. **Introspection** – CLI and APIs expose run status, logs, workspace locations, and |
| 34 | + artifacts. Users can attach interactive shells to running workspaces. |
| 35 | +7. **Lifecycle management** – Users can clean up workspaces, cancel runs, and configure |
| 36 | + retention policies for cached tooling and artifacts. |
| 37 | + |
| 38 | +### 2.2 Non-Functional |
| 39 | + |
| 40 | +- **Language** – Rust 2021 edition across all crates. |
| 41 | +- **Portability** – Linux and macOS support (Windows optional, planned). |
| 42 | +- **Security** – Signed workflow bundles, authenticated registry access, sandboxed step |
| 43 | + execution with configurable allow-lists. |
| 44 | +- **Scalability** – Scheduler supports dozens of concurrent workflows on a single node; |
| 45 | + registry handles thousands of workflows and metadata queries. |
| 46 | +- **Observability** – Structured logging, OpenTelemetry tracing, and metrics export. |
| 47 | +- **Extensibility** – Pluggable actions and custom step types without recompiling the |
| 48 | + core runtime. |
| 49 | + |
| 50 | +## 3. System Architecture Overview |
| 51 | + |
| 52 | +The platform is organized into cooperating Rust crates/services: |
| 53 | + |
| 54 | +- `workflow-core` – Parsing, validation, and planning for workflow definitions. |
| 55 | +- `workspace-manager` – Rust port of workspace provisioning, replacing |
| 56 | + `agent-workspace.sh` while preserving the features documented in the workspace design |
| 57 | + doc. |
| 58 | +- `workflow-executor` – Asynchronous engine that schedules workflow graphs, creates |
| 59 | + workspaces, executes steps, and collects results. |
| 60 | +- `workflow-cli` – End-user interface built with `clap`, orchestrating local commands. |
| 61 | +- `workflow-registry-server` – Central service exposing REST/JSON APIs for workflow |
| 62 | + publish, fetch, and discovery. |
| 63 | +- `workflow-registry-client` – Shared library used by CLI/executor to interact with the |
| 64 | + registry, manage auth tokens, and cache bundles. |
| 65 | +- `workflowd` (optional) – Long-running daemon that the CLI can delegate to for |
| 66 | + background execution and concurrency control on shared machines. |
| 67 | + |
| 68 | +Core data flow: |
| 69 | + |
| 70 | +1. User invokes `workflow run <workflow-id>` (local or fetched). CLI loads definition via |
| 71 | + `workflow-core`, resolves dependencies, and submits run to `workflowd` or local |
| 72 | + executor. |
| 73 | +2. Executor requests workspace allocation from `workspace-manager`, which creates or |
| 74 | + reuses a workspace, copies tooling, and returns metadata. |
| 75 | +3. Executor schedules steps according to DAG dependencies, running actions (shell, |
| 76 | + built-in Rust code, or plugin) inside the workspace with environment variables |
| 77 | + matching the current proof of concept. |
| 78 | +4. Run state, logs, and artifacts stream to local storage; optional upload to registry or |
| 79 | + external artifact store. |
| 80 | +5. CLI polls or receives events to display progress. Upon completion, metadata is |
| 81 | + persisted; optional cleanup occurs per policy. |
| 82 | + |
| 83 | +## 4. Component Specifications |
| 84 | + |
| 85 | +### 4.1 Workflow Definition Format (`workflow.toml`) |
| 86 | + |
| 87 | +- **Schema** |
| 88 | + - `id` (string, semantic version optional) and `name`. |
| 89 | + - `description`, `tags`, `license`, `homepage`. |
| 90 | + - `parameters` (name, type, default, required, validation regex). |
| 91 | + - `artifacts` declarations (path patterns, retention policy). |
| 92 | + - `steps`: map from step id → struct with `uses` (action reference), `inputs`, |
| 93 | + `env`, `run` (command), `needs` (dependencies), `when` (expression), and |
| 94 | + `workspace` (inherit, ephemeral, or custom path). |
| 95 | + - `actions`: reusable step templates referencing built-in adapters or external |
| 96 | + binaries. |
| 97 | + - `requirements`: toolchain prerequisites (e.g., nix profile, docker image, python |
| 98 | + packages) for validation before run. |
| 99 | +- **Parser** – Implemented with `serde` + `toml`. Provide JSON schema export for editor |
| 100 | + tooling. |
| 101 | +- **Validation** – Ensure DAG acyclicity, parameter resolution, and compatibility with |
| 102 | + workspace policies. Emit actionable diagnostics. |
| 103 | +- **Extensibility** – Support plugin-defined parameter types and validators via dynamic |
| 104 | + registration. |
| 105 | + |
| 106 | +### 4.2 `workflow-core` Crate |
| 107 | + |
| 108 | +- Modules: |
| 109 | + - `model` – Rust structs representing workflows, steps, actions, parameters. |
| 110 | + - `parser` – Functions to load from file/URL, merge overrides, and surface line/column |
| 111 | + errors. |
| 112 | + - `validator` – Graph validation, parameter type checking, requirement resolution. |
| 113 | + - `planner` – Convert workflow definitions + runtime parameters into executable DAG |
| 114 | + plans with resolved command strings and environment. |
| 115 | +- Exposes stable API consumed by CLI, executor, and registry server. |
| 116 | +- Provides `serde` serialization for storing compiled plans in the registry. |
| 117 | +- Includes unit tests covering parsing edge cases, invalid graphs, and parameter |
| 118 | + substitution. |
| 119 | + |
| 120 | +### 4.3 `workspace-manager` Crate |
| 121 | + |
| 122 | +- Reimplements responsibilities of `agent-workspace.sh` in Rust: |
| 123 | + - Manage workspace root discovery, hashed repo namespace, and metadata persistence in |
| 124 | + `.agent-tools/.agent-workflow.json`. |
| 125 | + - Provide APIs: `ensure_workspace(id, base_change, direnv_policy)`, |
| 126 | + `update_status(status, workflow, command)`, `cleanup(id)`, `sync_tools(id)`. |
| 127 | + - Copy automation bundles (`agents.just`, `scripts/`, `rules/` by default) into |
| 128 | + `.agent-tools/`, hashing contents to avoid redundant copies. |
| 129 | + - Execute commands via `direnv exec` when available; fall back to plain execution if |
| 130 | + disabled. |
| 131 | + - Emit structured events for workspace lifecycle (created, reused, direnv allowed, |
| 132 | + tooling hash, cleanup). |
| 133 | +- Implementation details: |
| 134 | + - Use `tokio::process` for subprocess management. |
| 135 | + - Use `serde_json` for metadata file compatibility with current schema. |
| 136 | + - Provide CLI subcommands reused by `workflow-cli` for manual inspection. |
| 137 | + |
| 138 | +### 4.4 `workflow-executor` Crate |
| 139 | + |
| 140 | +- Built atop `tokio` runtime with cooperative scheduling. |
| 141 | +- Responsibilities: |
| 142 | + - Accept execution plans from `workflow-core`. |
| 143 | + - Allocate workspaces per workflow run or per-step when `workspace = "ephemeral"`. |
| 144 | + - Manage concurrency using per-run DAG scheduler; configurable max parallel steps. |
| 145 | + - Execute actions: |
| 146 | + - **Shell command**: spawn process with inherited/captured stdio; enforce timeouts |
| 147 | + and environment. |
| 148 | + - **Built-in adapters**: native Rust functions implementing actions like `jj diff` |
| 149 | + summarization. |
| 150 | + - **Plugins**: load dynamic libraries (`cdylib`) conforming to trait `ActionPlugin`. |
| 151 | + - Collect logs, exit codes, produced artifacts; stream to observers. |
| 152 | + - Handle cancellation, retries, backoff, and failure propagation (fail-fast or |
| 153 | + continue modes per step). |
| 154 | +- Provides event stream (`RunEvent`) consumed by CLI/daemon for status updates. |
| 155 | +- Maintains run metadata store (SQLite via `sqlx` or `rusqlite`) capturing history and |
| 156 | + enabling queries. |
| 157 | + |
| 158 | +### 4.5 `workflow-cli` Crate |
| 159 | + |
| 160 | +Commands (subset): |
| 161 | + |
| 162 | +- `workflow init` – Scaffold new `workflow.toml` with templates. |
| 163 | +- `workflow validate [file]` – Run parser and validator. |
| 164 | +- `workflow run <id|path> [--param key=value] [--workspace-id ...] [--parallel N]` – |
| 165 | + Execute workflows, optionally delegating to daemon. |
| 166 | +- `workflow status [run-id]` – Show active runs, including workspace paths and metadata. |
| 167 | +- `workflow logs <run-id>` – Stream logs and step outputs. |
| 168 | +- `workflow workspace <list|show|shell|clean>` – User-facing wrappers around |
| 169 | + `workspace-manager` operations. |
| 170 | +- `workflow publish <path> [--registry]` – Package and upload to registry. |
| 171 | +- `workflow fetch <workflow-ref>` – Download to local cache. |
| 172 | +- `workflow registry login` – Acquire/store auth token securely (OS keyring). |
| 173 | + |
| 174 | +Implementation notes: |
| 175 | + |
| 176 | +- Built with `clap` derive, asynchronous commands using `tokio`. |
| 177 | +- CLI communicates with daemon via Unix domain socket/gRPC (tonic) when running in |
| 178 | + background mode; falls back to in-process executor. |
| 179 | +- Provides colored terminal UI (indicatif) for progress bars and summary tables. |
| 180 | + |
| 181 | +### 4.6 `workflow-registry-server` |
| 182 | + |
| 183 | +- Rust service built with `axum` + `tower`. |
| 184 | +- Stores workflow bundles (TOML + optional assets) and metadata in PostgreSQL or SQLite. |
| 185 | +- REST API endpoints: |
| 186 | + - `POST /v1/workflows` – Publish new version (requires auth, accepts signed tarball). |
| 187 | + - `GET /v1/workflows` – Search by tag, owner, text. |
| 188 | + - `GET /v1/workflows/{id}` – Fetch metadata and available versions. |
| 189 | + - `GET /v1/workflows/{id}/{version}/download` – Stream bundle. |
| 190 | + - `PUT /v1/workflows/{id}/{version}/deprecate` – Mark version as deprecated. |
| 191 | + - `GET /v1/tags` – Enumerate tags/categories. |
| 192 | +- Authentication via OAuth2 access tokens or PAT; integrate with identity provider. |
| 193 | +- Supports content-addressed storage (CAS) for deduplicated bundles (S3-compatible |
| 194 | + backend optional). |
| 195 | +- Provides audit logs and signed metadata (Ed25519). Server verifies bundle signature |
| 196 | + and publishes signature chain for clients. |
| 197 | + |
| 198 | +### 4.7 `workflow-registry-client` |
| 199 | + |
| 200 | +- Shared crate handling: |
| 201 | + - Auth token storage and refresh. |
| 202 | + - HTTP client (reqwest) with retry/backoff, TLS pinning optional. |
| 203 | + - Local cache of downloaded bundles under `$XDG_CACHE_HOME/workflows/<id>/<version>`. |
| 204 | + - Signature verification before unpacking. |
| 205 | + - Integration with CLI/executor to auto-update cached workflows. |
| 206 | + |
| 207 | +### 4.8 `workflowd` Daemon (Optional but recommended) |
| 208 | + |
| 209 | +- Runs locally as background service; manages queue of workflow runs and enforces |
| 210 | + concurrency limits. |
| 211 | +- Exposes control API over gRPC/Unix socket: submit run, stream events, cancel, list |
| 212 | + runs, attach shell (spawn using `workspace-manager`). |
| 213 | +- Persists state in local SQLite to survive restarts. |
| 214 | +- Implements cooperative scheduling across workflows, respecting per-user and global |
| 215 | + limits. |
| 216 | + |
| 217 | +### 4.9 Observability & Telemetry |
| 218 | + |
| 219 | +- Unified logging via `tracing` crate with JSON output option. |
| 220 | +- Emit OpenTelemetry spans for major operations (parsing, workspace allocation, step |
| 221 | + execution) with context propagation from CLI to daemon to registry calls. |
| 222 | +- Metrics (Prometheus exporter) for run success rates, queue depth, workspace lifecycle. |
| 223 | +- Artifact metadata includes checksums and retention metadata for cleaning policies. |
| 224 | + |
| 225 | +### 4.10 Packaging and Distribution |
| 226 | + |
| 227 | +- Provide `cargo` workspace with crates listed above; enable `--features daemon` etc. |
| 228 | +- Offer standalone binaries via `cargo dist` or `nix` flake integration. |
| 229 | +- Provide container image for registry server and optional `workflowd`. |
| 230 | +- Ensure integration with existing `just` recipes for compatibility during migration. |
| 231 | + |
| 232 | +## 5. Parallel Execution & Scheduling |
| 233 | + |
| 234 | +- Scheduler maintains run queue prioritized by submission time and priority class. |
| 235 | +- Per-run concurrency derived from workflow definition; defaults to sequential. |
| 236 | +- Implement resource leasing to avoid oversubscribing CPU/memory; allow configuration |
| 237 | + via CLI/daemon. |
| 238 | +- Guarantee workspace uniqueness per run; share read-only caches (e.g., tool bundles) |
| 239 | + to minimize duplication. |
| 240 | +- Provide cancellation tokens; steps respond promptly to interrupts. |
| 241 | + |
| 242 | +## 6. Security & Permissions |
| 243 | + |
| 244 | +- Workflow bundles signed with user-specific keys; registry verifies signatures. |
| 245 | +- CLI validates signatures and optionally enforces allow-list for publishers. |
| 246 | +- Sandboxed execution options: |
| 247 | + - Support running steps inside container runtimes (e.g., `nix develop`, `podman`). |
| 248 | + - Provide file access policies per workspace (readonly host repo except workspace |
| 249 | + copy). |
| 250 | +- Secrets management: CLI loads env secrets from OS keyring or `.env` with opt-in. |
| 251 | +- Auditing: persist run metadata (who ran what, when, with which workflow version). |
| 252 | + |
| 253 | +## 7. Testing and Quality Strategy |
| 254 | + |
| 255 | +- Unit tests in each crate; property tests for parser and planner. |
| 256 | +- Integration tests using temporary repositories and mocked registry server. |
| 257 | +- End-to-end tests executed via `cargo nextest` that run sample workflows through CLI |
| 258 | + and executor using fixture registry data. |
| 259 | +- Provide smoke-test command `workflow self-test` to validate installation. |
| 260 | + |
| 261 | +## 8. Migration from Proof of Concept |
| 262 | + |
| 263 | +1. Implement `workspace-manager` crate mirroring Bash functionality, validated against |
| 264 | + scenarios in `agent-workspace.sh` (run, status, shell, clean, sync-tools). |
| 265 | +2. Port representative workflows from `agents.just` into `workflow.toml` definitions to |
| 266 | + ensure feature parity (workspace-aware steps, iterative loops, interactive shells). |
| 267 | +3. Wrap existing `just` recipes to call new CLI for backward compatibility during |
| 268 | + transition period. |
| 269 | +4. Deprecate Bash script once Rust manager is stable; mark `agents.just` workflows as |
| 270 | + legacy and document migration path. |
| 271 | + |
| 272 | +## 9. Open Questions |
| 273 | + |
| 274 | +- Should the workflow format support embedded Python/Rust scripts, or require external |
| 275 | + files? |
| 276 | +- How do we support long-running interactive steps (e.g., human-in-the-loop) within the |
| 277 | + DAG while preserving resumability? |
| 278 | +- What identity provider(s) should the registry integrate with, and do we need |
| 279 | + fine-grained ACLs per workflow? |
| 280 | +- Do we require distributed execution (multi-machine) in the initial release, or is |
| 281 | + single-host parallelism sufficient? |
| 282 | +- How should artifact storage integrate with external systems (S3, OCI registries)? |
| 283 | + |
0 commit comments