
Commit fb9c334

RunConfig as architectural boundary proposal
This change adds a design doc describing an architectural boundary at the `RunConfig` level. The design doc aims to:
* list the current responsibilities of `RunConfig` and related code (e.g. `Runner`)
* propose a new split of responsibilities and a clear Go API for it
* clarify which responsibilities should move into the respective user interfaces, e.g. CLI or Operator
1 parent d9498fe commit fb9c334

1 file changed

+81
-0
lines changed

@@ -0,0 +1,81 @@
# `RunConfig` as Architectural Boundary

## Problem Statement

As of this writing, ToolHive is released as three user-facing systems, each with its own interface:

* **ToolHive CLI** `thv`
* **ToolHive Operator**, which is used in both Kubernetes and OpenShift
* **ToolHive UI**, which internally uses APIs exposed by `thv serve`

These interfaces differ in their execution environment and in the "quality of life" features each one might implement. One obvious difference is how configuration is accessed: the CLI must both accept CLI options and read files from the file system, while the Kubernetes Operator requires all parameters to be fully specified in the main CRD or in "linked" ones (e.g. `MCPToolConfig`). Another is config reload semantics, which must be implemented in an ad-hoc fashion for the CLI, while Kubernetes handles reloads as part of the life cycle of resources. Yet another is the location of runtime configuration and state: the Kubernetes Operator relies on both being stored "inside the cluster", while the CLI and UI must both rely on the file system, even though the configuration might be semantically equivalent. A final use case is exporting a `RunConfig` so that the same workload can be "moved" to a different place or shipped as configuration. A similar use case is the one implemented by the `thv restart` command, which fetches the serialized version of a run config to restart the workload when necessary.

These differences warrant the introduction of an architectural boundary at the point where workloads are executed, which is currently specified via what we call a `RunConfig`.

## Goals

* make `RunConfig` the entry point for workload execution
* stabilize `RunConfig` so that new user or application interfaces can be built upon it
* clearly specify the responsibilities of code above and below this new boundary

## Responsibilities of a RunConfig

A `RunConfig` struct contains either the information necessary for an OCI-compatible runner to run a workload or, alternatively, the remote URL at which an already running MCP server is reachable. In both scenarios, ToolHive aims to "wrap" the workload in a proxy to handle auth and gather telemetry.

Finally, `RunConfig`s are currently serialized as JSON and used by the `thv restart` command.

## Current Interface

Conceptually, a `RunConfig` contains three things:
* details on how to run or reach the Workload
* configuration for the Proxy itself
* metadata, such as the name, that pertains to both components and allows ToolHive to refer to them as one

**Metadata details** amount to
* name
* group
* schema version
* debug settings, common to both proxy and workload

**Workload details** for local Workloads amount to
* OCI-compatible container config (image, its command arguments, container name, etc.)
* desired workload name
* host and port to expose
* environment variables to set (literal or file-based)
* secrets to set
* volumes to mount
* container labels
* Kubernetes pod template patch (Kubernetes specific)
* network isolation flag

When the Workload is remote, the details are instead
* remote URL
* auth configuration

**Proxy details** amount to
* host and port to expose
* workload transport type
* permission profile (literal or file-based)
* OIDC configuration parameters
* authorization config (literal or file-based)
* audit config (literal or file-based)
* proxy headers trust flag
* proxy mode (i.e. transport to expose)
* CA bundle
* JWKS token file
* tools config
* IgnoreConfig (?)
* middleware configuration settings

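
The three-part view above can be sketched as a Go type. This is only an illustration of the conceptual grouping, not the actual ToolHive struct: every type name, field name, and the `Validate` rule below is an assumption, and only a handful of the listed details are modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// Metadata identifies the workload and its proxy as a single unit.
// (Hypothetical type; field names are assumptions.)
type Metadata struct {
	Name          string
	Group         string
	SchemaVersion string
	Debug         bool // shared by proxy and workload
}

// LocalWorkload describes a workload started by an OCI-compatible runner.
type LocalWorkload struct {
	Image          string
	CmdArgs        []string
	EnvVars        map[string]string
	Volumes        []string
	Labels         map[string]string
	IsolateNetwork bool
}

// RemoteWorkload describes an MCP server that is already running elsewhere.
type RemoteWorkload struct {
	URL string
}

// ProxyConfig configures the proxy that wraps either kind of workload.
type ProxyConfig struct {
	Host      string
	Port      int
	Transport string // transport spoken by the workload
	ProxyMode string // transport exposed to clients
}

// RunConfig groups the three parts. Exactly one of Local or Remote is set.
type RunConfig struct {
	Metadata Metadata
	Local    *LocalWorkload
	Remote   *RemoteWorkload
	Proxy    ProxyConfig
}

// Validate performs basic structural checks only.
func (c RunConfig) Validate() error {
	if c.Metadata.Name == "" {
		return errors.New("run config: name is required")
	}
	if (c.Local == nil) == (c.Remote == nil) {
		return errors.New("run config: set exactly one of Local or Remote")
	}
	return nil
}

func main() {
	cfg := RunConfig{
		Metadata: Metadata{Name: "fetch", SchemaVersion: "v1"},
		Remote:   &RemoteWorkload{URL: "https://mcp.example.com"},
		Proxy:    ProxyConfig{Host: "127.0.0.1", Port: 8080},
	}
	fmt.Println("valid:", cfg.Validate() == nil)
}
```

Modeling the local and remote cases as two optional parts is one possible way to make the "either/or" nature of the workload details explicit; a flat struct with a discriminating field would work as well.
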
## Responsibility Split

We propose the following split in responsibilities.

The **RunConfig** data structure and its routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers and return bytes, similarly to how `encoding/json` works.
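
A minimal sketch of such a "bytes in, bytes out" surface might look as follows. The function names `ReadJSON`, `Parse`, and `Bytes`, the reduced `RunConfig` fields, and the JSON key names are all assumptions made for illustration; the point is only that nothing in the package opens files or knows where the bytes are stored.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// RunConfig is a reduced, hypothetical stand-in for the real struct.
type RunConfig struct {
	SchemaVersion string `json:"schema_version"`
	Name          string `json:"name"`
	Image         string `json:"image,omitempty"`
	RemoteURL     string `json:"remote_url,omitempty"`
}

// Validate performs basic checks that need no access to the outside world.
func (c *RunConfig) Validate() error {
	if c.Name == "" {
		return errors.New("run config: name is required")
	}
	if (c.Image == "") == (c.RemoteURL == "") {
		return errors.New("run config: set exactly one of image or remote_url")
	}
	return nil
}

// ReadJSON decodes and validates a RunConfig from any reader.
// The caller decides whether that reader is a file, a ConfigMap value, etc.
func ReadJSON(r io.Reader) (*RunConfig, error) {
	var c RunConfig
	if err := json.NewDecoder(r).Decode(&c); err != nil {
		return nil, fmt.Errorf("decode run config: %w", err)
	}
	if err := c.Validate(); err != nil {
		return nil, err
	}
	return &c, nil
}

// Parse is the byte-slice counterpart of ReadJSON.
func Parse(data []byte) (*RunConfig, error) {
	return ReadJSON(bytes.NewReader(data))
}

// Bytes serializes the config; storing the result is the caller's job.
func (c *RunConfig) Bytes() ([]byte, error) {
	return json.Marshal(c)
}

func main() {
	cfg, err := Parse([]byte(`{"schema_version":"v1","name":"fetch","image":"example/fetch:latest"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := cfg.Bytes()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

With this shape, a CLI could persist the returned bytes to a state file while an Operator could keep them in a cluster resource, without the package being aware of either.
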

**CLI** and **Operator** will be responsible for mapping their respective representations of configuration parameters to the representation allowed by `RunConfig`. Specifically, no file-based representation of configuration parameters is allowed in the `RunConfig` struct.
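
As an illustration of this mapping on the CLI side, the sketch below resolves a file-based environment definition into literal values before the `RunConfig` is built. The `cliOptions` type, the `fromCLI` and `parseEnvFile` helpers, the `KEY=VALUE` file format, and the rule that literals override file entries are all assumptions, not existing ToolHive behavior.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// RunConfig holds literal values only (hypothetical, reduced struct).
type RunConfig struct {
	Name    string
	EnvVars map[string]string
}

// cliOptions is a hypothetical CLI-side representation that may point at files.
type cliOptions struct {
	Name    string
	Env     map[string]string // literal KEY=VALUE options
	EnvFile string            // path to a KEY=VALUE file, resolved by the CLI
}

// parseEnvFile parses KEY=VALUE lines, skipping blank lines and # comments.
func parseEnvFile(data []byte) (map[string]string, error) {
	env := make(map[string]string)
	sc := bufio.NewScanner(bytes.NewReader(data))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("env file line %d: expected KEY=VALUE", n)
		}
		env[key] = value
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return env, nil
}

// fromCLI resolves every file reference and returns a RunConfig of literals.
// readFile is injected so that file access clearly lives in the CLI layer;
// a real command would likely pass os.ReadFile.
func fromCLI(opts cliOptions, readFile func(string) ([]byte, error)) (RunConfig, error) {
	env := make(map[string]string)
	if opts.EnvFile != "" {
		data, err := readFile(opts.EnvFile)
		if err != nil {
			return RunConfig{}, fmt.Errorf("read env file: %w", err)
		}
		fileEnv, err := parseEnvFile(data)
		if err != nil {
			return RunConfig{}, err
		}
		for k, v := range fileEnv {
			env[k] = v
		}
	}
	// Assumption: literal options take precedence over file entries.
	for k, v := range opts.Env {
		env[k] = v
	}
	return RunConfig{Name: opts.Name, EnvVars: env}, nil
}

func main() {
	// In-memory stand-in for the file system, to keep the sketch self-contained.
	fake := func(string) ([]byte, error) { return []byte("TOKEN=abc\nMODE=dev\n"), nil }
	opts := cliOptions{Name: "fetch", Env: map[string]string{"MODE": "prod"}, EnvFile: "workload.env"}
	cfg, err := fromCLI(opts, fake)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Name, cfg.EnvVars["TOKEN"], cfg.EnvVars["MODE"])
}
```

An Operator would perform the equivalent step by reading the "linked" resources it references and copying their contents into the same literal fields.
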

Consequences of changes to configuration parameters must be managed outside the `RunConfig` code. For example, configuration reload for the CLI must be managed within CLI commands, and not within the `Runner` or `RunConfig`. That said, `Runner`s can implement behaviors specific to their execution environment, but they must not rely on references to the "outside world" being stored in the `RunConfig`.
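
One way to picture this is a reload routine that lives entirely in CLI code and only hands complete `RunConfig` values to the runner. The `Runner` interface shape, the `reload` helper, and the stop-then-run strategy below are assumptions chosen to keep the sketch short; they do not describe the current `Runner` API.

```go
package main

import (
	"fmt"
	"reflect"
)

// RunConfig is a reduced, hypothetical stand-in holding literal values only.
type RunConfig struct {
	Name    string
	Image   string
	EnvVars map[string]string
}

// Runner is a hypothetical interface: it executes what it is given and
// never re-reads configuration from files or other external sources.
type Runner interface {
	Run(cfg RunConfig) error
	Stop(name string) error
}

// reload is CLI-side logic. The CLI re-resolves its own inputs into next,
// compares it with current, and restarts the workload only on a change.
func reload(r Runner, current, next RunConfig) (bool, error) {
	if reflect.DeepEqual(current, next) {
		return false, nil
	}
	if err := r.Stop(current.Name); err != nil {
		return false, fmt.Errorf("stop %s: %w", current.Name, err)
	}
	if err := r.Run(next); err != nil {
		return false, fmt.Errorf("run %s: %w", next.Name, err)
	}
	return true, nil
}

// logRunner records calls instead of starting containers.
type logRunner struct{ calls []string }

func (l *logRunner) Run(cfg RunConfig) error {
	l.calls = append(l.calls, "run "+cfg.Name+" "+cfg.Image)
	return nil
}

func (l *logRunner) Stop(name string) error {
	l.calls = append(l.calls, "stop "+name)
	return nil
}

func main() {
	r := &logRunner{}
	current := RunConfig{Name: "fetch", Image: "example/fetch:1"}
	next := RunConfig{Name: "fetch", Image: "example/fetch:2"}
	changed, err := reload(r, current, next)
	fmt.Println(changed, err, r.calls)
}
```

In a Kubernetes deployment the same decision would be taken by the reconciliation loop rather than by a helper like `reload`, which is why it is kept out of the shared code.
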

Types exposed by the `RunConfig` package should not be used for externally facing formats such as an HTTP API, except in trivial cases. If the formats need to diverge, those types won't be modified (to avoid breaking changes), and the CLI/Operator must expose its own type that is then mapped to the `RunConfig` equivalent.
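
The mapping described here could be sketched as a separate request type owned by the API layer. The `createWorkloadRequest` type, its JSON keys, and the convention that a missing port leaves `ProxyPort` at zero are hypothetical and only serve to show the two types evolving independently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// RunConfig is a reduced, hypothetical stand-in for the stable internal type.
type RunConfig struct {
	Name      string
	Image     string
	ProxyPort int
}

// createWorkloadRequest is a hypothetical externally facing type.
// It can gain or rename fields without touching RunConfig.
type createWorkloadRequest struct {
	Name  string `json:"name"`
	Image string `json:"image"`
	Port  *int   `json:"port,omitempty"`
}

// toRunConfig maps the API representation onto the internal one.
func (r createWorkloadRequest) toRunConfig() (RunConfig, error) {
	if r.Name == "" || r.Image == "" {
		return RunConfig{}, errors.New("name and image are required")
	}
	cfg := RunConfig{Name: r.Name, Image: r.Image}
	if r.Port != nil {
		if *r.Port < 1 || *r.Port > 65535 {
			return RunConfig{}, fmt.Errorf("port %d out of range", *r.Port)
		}
		cfg.ProxyPort = *r.Port
	}
	// Assumption: a zero ProxyPort means "not chosen by the caller".
	return cfg, nil
}

// decodeRequest parses a request body and returns the mapped RunConfig.
func decodeRequest(body []byte) (RunConfig, error) {
	var req createWorkloadRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return RunConfig{}, fmt.Errorf("decode request: %w", err)
	}
	return req.toRunConfig()
}

func main() {
	cfg, err := decodeRequest([]byte(`{"name":"fetch","image":"example/fetch:latest","port":8080}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```
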
