# WG Device Management Charter

This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].

## Scope

Enable simple and efficient configuration, sharing, and allocation of
accelerators and other specialized devices. This working group focuses on the
APIs, abstractions, and feature designs needed to configure, target, and share
the necessary hardware for both batch and serving (inference) workloads.

### In scope

- Enable efficient utilization of specialized hardware devices. This includes
  sharing one or more resources effectively (many workloads sharing a pool of
  devices), as well as sharing individual devices effectively (several workloads
  dividing up a single device).
- Enable workload authors to specify “just enough” detail about their workload
  requirements to ensure it runs optimally, without having to understand exactly
  how the infrastructure team has provisioned the cluster.
- Enable the scheduler to choose the correct place to run a workload the vast
  majority of the time (rejections should be extremely rare).
- Enable cluster autoscalers and other node auto-provisioning components to
  predict whether creating additional resources will satisfy workload needs,
  before provisioning those resources.
- Enable the shift from “pods run on nodes” to “workloads consume capacity”.
  This allows Kubernetes to provision sets of pods on top of sets of nodes and
  specialized hardware, while taking into account the relationships between
  those infrastructure components.
- Enable in-node devices as well as network-accessible devices.
- Minimize workload disruption due to hardware failures.
- Address fragmentation of accelerators due to fractional use.
- Additional problems that may be identified and deemed in scope as we gather
  use cases and requirements from WG Serving, WG Batch, and other stakeholders.
- Address all of the above with a simple API that is a natural extension of the
  existing Kubernetes APIs and avoids or minimizes any transition effort.

### Out of scope

- Higher-level workload controller APIs (for example, the equivalent of
  Deployment, StatefulSet, or DaemonSet) for specific types of workloads.
- General resource management requirements not related to devices.

## Deliverables

The WG will coordinate the delivery of KEPs and their implementations by the
participating SIGs. Interim artifacts will include documents capturing use
cases, requirements, and designs; however, all of those will eventually result
in KEPs and code owned by SIGs.

Specifically, we expect to need:

| 55 | + |
| 56 | +- APIs for publishing resource capacity of in-node and network-accessible |
| 57 | + devices, as well as sample code to ease creation of drivers to populate this |
| 58 | + information. |
| 59 | +- APIs for specifying workload resource requirements with respect to devices. |
| 60 | +- APIs, algorithms, and implementations for allocating access to and resources on devices, as well as |
| 61 | + persisting the results of those allocations. |
| 62 | +- APIs, algorithms, and implementations for allowing adminstrators to control |
| 63 | + and govern access to devices. |
| 64 | + |
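To make the workload-facing deliverables more concrete, the sketch below shows
what a device request and its consumption by a pod might look like, loosely
modeled on the Dynamic Resource Allocation (DRA) alpha API. The group/version,
field layout, and the `example.com-gpu` class name are illustrative assumptions,
not committed designs of this working group:

```yaml
# Hypothetical sketch only: loosely modeled on the Dynamic Resource
# Allocation (DRA) alpha API. The group/version, field layout, and the
# "example.com-gpu" device class name are illustrative assumptions.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: example.com-gpu  # class published by a device driver
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  containers:
  - name: worker
    image: registry.example/worker:latest
    resources:
      claims:
      - name: gpu-claim        # container consumes the claim below
  resourceClaims:
  - name: gpu-claim
    resourceClaimName: single-gpu  # references the ResourceClaim above
```

The key property this shape illustrates is the separation of concerns listed
above: the workload author names only a device class, while drivers publish
capacity and administrators govern which classes are available.
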
## Stakeholders

- SIG Architecture
- SIG Autoscaling
- SIG Network
- SIG Node
- SIG Scheduling

Additionally, a broad set of end users, device vendors, cloud providers,
Kubernetes distribution providers, and ecosystem projects (particularly
autoscaling-related projects) have expressed interest in this effort. There are
five primary groups of stakeholders, from each of which we expect multiple
participants:

- Device vendors that manufacture accelerators and other specialized hardware
  which they would like to make available to Kubernetes users.
- Kubernetes distribution and managed offering providers that would like to make
  specialized hardware available to their users.
- Kubernetes ecosystem projects that help manage workloads utilizing these
  accelerators (e.g., Karpenter, Kueue, Volcano).
- End user workload authors that will create workloads that take advantage of
  the specialized hardware.
- Cluster administrators that operate and govern clusters containing the
  specialized hardware.

## Roles and Organization Management

This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts in to updates and modifications to [wg-governance].

## Exit Criteria

The working group will disband when the KEPs resulting from these discussions
have reached a terminal state.

[wg-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md