Commit 3a35d27

Merge pull request #6299 from alculquicondor/wg-batch
Add WG Batch with charter
2 parents ab46b8c + 0cf4239 commit 3a35d27

File tree

10 files changed

+201
-0
lines changed

OWNERS_ALIASES

Lines changed: 6 additions & 0 deletions
@@ -111,6 +111,12 @@ aliases:
111111
wg-api-expression-leads:
112112
- apelisse
113113
- kwiesmueller
114+
wg-batch-leads:
115+
- Huang-Wei
116+
- ahg-g
117+
- endocrimes
118+
- soltysh
119+
- swatisehgal
114120
wg-data-protection-leads:
115121
- xing-yang
116122
- yuxiangqian

liaisons.md

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ members will assume one of the departing members groups.
5656
| [SIG Usability](sig-usability/README.md) | Davanum Srinivas (**[@dims](https://github.com/dims)**) |
5757
| [SIG Windows](sig-windows/README.md) | Jordan Liggitt (**[@liggitt](https://github.com/liggitt)**) |
5858
| [WG API Expression](wg-api-expression/README.md) | Jordan Liggitt (**[@liggitt](https://github.com/liggitt)**) |
59+
| [WG Batch](wg-batch/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
5960
| [WG Data Protection](wg-data-protection/README.md) | Christoph Blecker (**[@cblecker](https://github.com/cblecker)**) |
6061
| [WG IoT Edge](wg-iot-edge/README.md) | Christoph Blecker (**[@cblecker](https://github.com/cblecker)**) |
6162
| [WG Multitenancy](wg-multitenancy/README.md) | Jordan Liggitt (**[@liggitt](https://github.com/liggitt)**) |

sig-apps/README.md

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
4949
## Working Groups
5050

5151
The following [working groups][working-group-definition] are sponsored by sig-apps:
52+
* [WG Batch](/wg-batch)
5253
* [WG Data Protection](/wg-data-protection)
5354

5455

sig-autoscaling/README.md

Lines changed: 6 additions & 0 deletions
@@ -39,6 +39,12 @@ The Chairs of the SIG run operations and processes governing the SIG.
3939
- [@kubernetes/sig-autoscaling-test-failures](https://github.com/orgs/kubernetes/teams/sig-autoscaling-test-failures) - Test Failures and Triage
4040
- Steering Committee Liaison: Tim Pepper (**[@tpepper](https://github.com/tpepper)**)
4141

42+
## Working Groups
43+
44+
The following [working groups][working-group-definition] are sponsored by sig-autoscaling:
45+
* [WG Batch](/wg-batch)
46+
47+
4248
## Subprojects
4349

4450
The following [subprojects][subproject-definition] are owned by sig-autoscaling:

sig-list.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
6363
| Name | Label | Stakeholder SIGs |Organizers | Contact | Meetings |
6464
|------|-------|------------------|-----------|---------|----------|
6565
|[API Expression](wg-api-expression/README.md)|[api-expression](https://github.com/kubernetes/kubernetes/labels/wg%2Fapi-expression)|* API Machinery<br>* Architecture<br>|* [Antoine Pelisse](https://github.com/apelisse), Google<br>* [Kevin Wiesmueller](https://github.com/kwiesmueller), //SEIBERT/MEDIA GmbH<br>|* [Slack](https://kubernetes.slack.com/messages/wg-api-expression)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-api-expression)|* Regular WG Meeting: [Tuesdays at 9:30 PT (Pacific Time) (biweekly)](https://zoom.us/j/94238112084)<br>
66+
|[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Wei Huang](https://github.com/Huang-Wei), Apple<br>* [Abdullah Gharaibeh](https://github.com/ahg-g), Google<br>* [Danielle Lancashire](https://github.com/endocrimes), VMware<br>* [Maciej Szulik](https://github.com/soltysh), Red Hat<br>* [Swati Sehgal](https://github.com/swatisehgal), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](TBD)|* Regular Meeting: [TBDs at TBD UTC (biweekly)](TBD)<br>
6667
|[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
6768
|[IoT Edge](wg-iot-edge/README.md)|[iot-edge](https://github.com/kubernetes/kubernetes/labels/wg%2Fiot-edge)|* Multicluster<br>* Network<br>|* [Steve Wong](https://github.com/cantbewong), VMware<br>* [Cindy Xing](https://github.com/cindyxing), Microsoft<br>* [Dejan Bosanac](https://github.com/dejanb), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-iot-edge)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-iot-edge)|* APAC WG Meeting: [Wednesdays at 5:00 UTC (every four weeks)](https://zoom.us/j/91251176046?pwd=cmdqclovM3R3eDB1VlpuL1ZGU1hnZz09)<br>* Regular WG Meeting (Pacific Time): [Wednesdays at 09:00 PT (every four weeks)](https://zoom.us/j/92778512626?pwd=MXhlemwvYnhkQmkxeXllQ0Z5VGs4Zz09)<br>
6869
|[Multitenancy](wg-multitenancy/README.md)|[multitenancy](https://github.com/kubernetes/kubernetes/labels/wg%2Fmultitenancy)|* API Machinery<br>* Auth<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Sanjeev Rampal](https://github.com/srampal), Cisco<br>* [Tasha Drew](https://github.com/tashimi), VMware<br>|* [Slack](https://kubernetes.slack.com/messages/wg-multitenancy)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-multitenancy)|* Regular WG Meeting: [Tuesdays at 11:00 PT (Pacific Time) (biweekly)](https://zoom.us/my/k8s.sig.auth)<br>

sig-node/README.md

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
4444
## Working Groups
4545

4646
The following [working groups][working-group-definition] are sponsored by sig-node:
47+
* [WG Batch](/wg-batch)
4748
* [WG Multitenancy](/wg-multitenancy)
4849
* [WG Policy](/wg-policy)
4950
* [WG Structured Logging](/wg-structured-logging)

sig-scheduling/README.md

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
5656
## Working Groups
5757

5858
The following [working groups][working-group-definition] are sponsored by sig-scheduling:
59+
* [WG Batch](/wg-batch)
5960
* [WG Multitenancy](/wg-multitenancy)
6061
* [WG Policy](/wg-policy)
6162
* [WG Structured Logging](/wg-structured-logging)

sigs.yaml

Lines changed: 46 additions & 0 deletions
@@ -2873,6 +2873,52 @@ workinggroups:
28732873
liaison:
28742874
github: liggitt
28752875
name: Jordan Liggitt
2876+
- dir: wg-batch
2877+
name: Batch
2878+
mission_statement: >
2879+
Discuss and enhance the support for Batch (e.g. HPC, AI/ML, data analytics, CI)
2880+
workloads in core Kubernetes. We want to unify the way users deploy batch workloads
2881+
to improve portability and to simplify supportability for Kubernetes providers.
2882+
2883+
charter_link: charter.md
2884+
stakeholder_sigs:
2885+
- Apps
2886+
- Autoscaling
2887+
- Node
2888+
- Scheduling
2889+
label: batch
2890+
leadership:
2891+
chairs:
2892+
- github: Huang-Wei
2893+
name: Wei Huang
2894+
company: Apple
2895+
- github: ahg-g
2896+
name: Abdullah Gharaibeh
2897+
company: Google
2898+
- github: endocrimes
2899+
name: Danielle Lancashire
2900+
company: VMware
2901+
- github: soltysh
2902+
name: Maciej Szulik
2903+
company: Red Hat
2904+
- github: swatisehgal
2905+
name: Swati Sehgal
2906+
company: Intel
2907+
meetings:
2908+
- description: Regular Meeting
2909+
day: TBD
2910+
time: TBD
2911+
tz: UTC
2912+
frequency: biweekly
2913+
url: TBD
2914+
archive_url: TBD
2915+
recordings_url: TBD
2916+
contact:
2917+
slack: wg-batch
2918+
mailing_list: TBD
2919+
liaison:
2920+
github: mrbobbytables
2921+
name: Bob Killen
28762922
- dir: wg-data-protection
28772923
name: Data Protection
28782924
mission_statement: >
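
The `leadership.chairs` handles in this hunk are meant to line up with the `wg-batch-leads` alias added to OWNERS_ALIASES earlier in the commit. A minimal sketch of that cross-check follows, assuming both files have already been loaded into plain dictionaries. The data is copied by hand from the hunks, and the helper name `alias_mismatches` is hypothetical; it is not part of the repository's tooling.

```python
# Data copied by hand from the sigs.yaml and OWNERS_ALIASES hunks in this
# commit. A real check would parse both files with a YAML library instead.
wg_batch = {
    "dir": "wg-batch",
    "leadership": {
        "chairs": [
            {"github": "Huang-Wei", "name": "Wei Huang", "company": "Apple"},
            {"github": "ahg-g", "name": "Abdullah Gharaibeh", "company": "Google"},
            {"github": "endocrimes", "name": "Danielle Lancashire", "company": "VMware"},
            {"github": "soltysh", "name": "Maciej Szulik", "company": "Red Hat"},
            {"github": "swatisehgal", "name": "Swati Sehgal", "company": "Intel"},
        ]
    },
}

owners_aliases = {
    "wg-batch-leads": ["Huang-Wei", "ahg-g", "endocrimes", "soltysh", "swatisehgal"],
}


def alias_mismatches(group, aliases):
    """Return the handles that appear in only one of the two sources, sorted.

    Assumes the alias is named "<dir>-leads", which matches the entries
    visible in this commit.
    """
    alias_name = f"{group['dir']}-leads"
    chairs = {chair["github"] for chair in group["leadership"]["chairs"]}
    listed = set(aliases.get(alias_name, []))
    return sorted(chairs ^ listed)


# An empty list means the two files agree.
print(alias_mismatches(wg_batch, owners_aliases))
```

An empty result indicates the chairs and the alias agree; any handle returned is present in one file but not the other.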

wg-batch/README.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
<!---
2+
This is an autogenerated file!
3+
4+
Please do not edit this file directly, but instead make changes to the
5+
sigs.yaml file in the project root.
6+
7+
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
8+
--->
9+
# Batch Working Group
10+
11+
Discuss and enhance the support for Batch (e.g. HPC, AI/ML, data analytics, CI) workloads in core Kubernetes. We want to unify the way users deploy batch workloads to improve portability and to simplify supportability for Kubernetes providers.
12+
13+
The [charter](charter.md) defines the scope and governance of the Batch Working Group.
14+
15+
## Stakeholder SIGs
16+
* [SIG Apps](/sig-apps)
17+
* [SIG Autoscaling](/sig-autoscaling)
18+
* [SIG Node](/sig-node)
19+
* [SIG Scheduling](/sig-scheduling)
20+
21+
## Meetings
22+
*Joining the [mailing list](TBD) for the group will typically add invites for the following meetings to your calendar.*
23+
* Regular Meeting: [TBDs at TBD UTC](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=TBD&tz=UTC).
24+
* [Meeting notes and Agenda](TBD).
25+
* [Meeting recordings](TBD).
26+
27+
## Organizers
28+
29+
* Wei Huang (**[@Huang-Wei](https://github.com/Huang-Wei)**), Apple
30+
* Abdullah Gharaibeh (**[@ahg-g](https://github.com/ahg-g)**), Google
31+
* Danielle Lancashire (**[@endocrimes](https://github.com/endocrimes)**), VMware
32+
* Maciej Szulik (**[@soltysh](https://github.com/soltysh)**), Red Hat
33+
* Swati Sehgal (**[@swatisehgal](https://github.com/swatisehgal)**), Intel
34+
35+
## Contact
36+
- Slack: [#wg-batch](https://kubernetes.slack.com/messages/wg-batch)
37+
- [Mailing list](TBD)
38+
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fbatch)
39+
- Steering Committee Liaison: Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**)
40+
<!-- BEGIN CUSTOM CONTENT -->
41+
42+
<!-- END CUSTOM CONTENT -->
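
The README's header comment says the file is generated from sigs.yaml. As an illustration only, the sketch below shows how the `## Organizers` section could be derived from the `leadership.chairs` entries. It is not the repository's generator (see generator/README.md for that); the function name `render_organizers` and the hand-copied input are assumptions.

```python
# Two of the chairs from the sigs.yaml hunk, copied by hand for illustration.
chairs = [
    {"github": "Huang-Wei", "name": "Wei Huang", "company": "Apple"},
    {"github": "ahg-g", "name": "Abdullah Gharaibeh", "company": "Google"},
]


def render_organizers(chairs):
    """Render chair entries in the bullet format used by the README above."""
    lines = ["## Organizers", ""]
    for chair in chairs:
        handle = chair["github"]
        lines.append(
            f"* {chair['name']} (**[@{handle}](https://github.com/{handle})**), "
            f"{chair['company']}"
        )
    return "\n".join(lines)


print(render_organizers(chairs))
```

Because the README is derived output, changes to the organizers belong in sigs.yaml, as the header comment instructs.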

wg-batch/charter.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
# WG Batch Charter
2+
3+
This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
4+
the Roles and Organization Management outlined in [wg-governance].
5+
6+
[Kubernetes Charter README]: /committee-steering/governance/README.md
7+
8+
## Scope
9+
10+
Discuss and enhance the support for Batch (e.g. HPC, AI/ML, data analytics, CI)
11+
workloads in core Kubernetes. We want to unify the way users deploy batch
12+
workloads to improve portability and to simplify supportability for Kubernetes
13+
providers.
14+
15+
### In scope
16+
17+
- To reduce fragmentation in the k8s batch ecosystem: congregate leads and users from
18+
different external and internal projects and user groups (CNCF TAGs, k8s sub-projects
19+
focused on batch-related features such as topology-aware scheduling) in the batch ecosystem to
20+
gather requirements, validate designs and encourage reuse of core Kubernetes APIs.
21+
- The following recommendations for enhancements:
22+
- Additions to the batch API group, currently including Job and CronJob resources
23+
that benefit batch use cases such as HPC, AI/ML, data analytics and CI.
24+
- Primitives for job-level queueing, not limited to the k8s Job resource. Long-term,
25+
this could include multi-cluster support.
26+
- Primitives to control and maximize utilization of resources in fixed-size clusters
27+
(on-prem) and elastic clusters (cloud).
28+
- Runtime and scheduling support for specialized hardware (GPUs, NUMA, RDMA, etc.)
29+
30+
### Out of scope
31+
32+
- Addition of new API kinds that serve a specialized type of workload. The focus
33+
should be on general APIs that specialized controllers can build on top of.
34+
- Uses of the batch APIs as support for serving workloads (e.g. backups,
35+
upgrades, migrations). These can be served by existing SIGs.
36+
- Proposals that duplicate the functionality of core kubernetes components
37+
(job-controller, kube-scheduler, cluster-autoscaler).
38+
- Job workflows or pipelines. Mature third party frameworks serve these
39+
use cases with the current kubernetes primitives. But additional primitives
40+
to support these frameworks could be in scope.
41+
42+
## Stakeholders
43+
44+
Stakeholders in this working group span multiple SIGs that own parts of the
45+
code in core kubernetes components and addons.
46+
47+
- Apps
48+
- Autoscaling
49+
- Node
50+
- Scheduling
51+
52+
## Deliverables
53+
54+
The list of deliverables includes the following high-level features:
55+
56+
- To SIG Apps:
57+
- Updated Job API that fulfills the needs of a wider range of batch applications.
58+
- A performant job controller that can scale to thousands of pods per minute.
59+
- To SIG Scheduling and SIG Autoscaling:
60+
- A set of APIs to support job queueing, a framework to support different
61+
queueing policies and a ready-to-use implementation as a subproject.
62+
- Scheduling plugin(s) to support different batch needs.
63+
- To SIG Autoscaling:
64+
- Capabilities for job-level provisioning.
65+
- To SIG Node:
66+
- Runtime support for specialized hardware.
67+
68+
## Roles and Organization Management
69+
70+
This wg adheres to the Roles and Organization Management outlined in [wg-governance]
71+
and opts in to updates and modifications to [wg-governance].
72+
73+
[wg-governance]: /committee-steering/governance/wg-governance.md
74+
75+
Additionally, the wg commits to:
76+
77+
- maintain a solid communication line between the Kubernetes groups and the wider CNCF community;
78+
- submit a proposal to the KubeCon/CloudNativeCon maintainers track; if not selected, a video update will be recorded and listed below.
79+
80+
## Timelines and Disbanding
81+
82+
As a first mandate, the wg will define a roadmap in the first quarter
83+
of operation. We envision three timelines for the exit criteria. The focus will
84+
be on early exit, but a determination on whether or not to go beyond
85+
that is left until we reach that milestone.
86+
87+
1. Early exit: define "recommendations" for the deliverables mentioned above; those
88+
recommendations would be left to the respective sigs to implement. The WG could
89+
start implementing those recommendations in the context of the owning sig to generate
90+
some momentum.
91+
2. Milestone 2, Late exit: The WG continues the implementation of the recommendations until they reach GA,
92+
and then disbands.
93+
3. Convert to SIG: The WG observes a constant influx of requirements for the artifacts and there
94+
is the risk that the SIGs don't have enough capacity to maintain them.
95+
Then, the wg will propose the graduation into a SIG, taking ownership of the
96+
APIs, controllers and scheduling plugins.
