From 026b839b7e98bea9aaa8ee681bb1893c4d9b8f50 Mon Sep 17 00:00:00 2001 From: juliusvonkohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Thu, 6 Mar 2025 15:13:31 +0100 Subject: [PATCH 01/17] Helm KEP from @varodrig @chasecadet @juliusvonkohout Signed-off-by: juliusvonkohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-support/README.MD | 454 +++++++++++++++++++++++++++ 1 file changed, 454 insertions(+) create mode 100644 proposals/831-helm-support/README.MD diff --git a/proposals/831-helm-support/README.MD b/proposals/831-helm-support/README.MD new file mode 100644 index 000000000..03284329d --- /dev/null +++ b/proposals/831-helm-support/README.MD @@ -0,0 +1,454 @@ +# 649-Kubeflow-Helm-Support: Support Helm as an Alternative for Kustomize + + +The demand for a Helm chart for a basic Kubeflow installation has increased. Given the KSC's stance in issue 821 on neutral deployment language and user-defined production readiness, this is an opportune time to introduce a Helm chart. Supporting Helm will enhance ease of adoption and simplify deployments while maintaining the flexibility of community-maintained manifests. There have already been community efforts as well as the Kubeflow-Helm-Chart Slack Channel. + + +## Summary + + +Kubeflow manifests provide a fast way to deploy a minimal Kubeflow platform, with best-effort community support. For guaranteed assistance, users can opt for third-party distributions, consultants, or self-managed expertise. This approach extends to Helm chart support. Contributions and bug reports are encouraged, but no support will be guaranteed. The goal is to build a similar folder structure as Argo for Kubeflow Helm charts. + + +## Motivation + + +Currently, because Kubeflow/manifests are based on Kustomize, many potential users and companies that require Helm charts due to company processes/policies have to rely on third-party distributions. 
While these options are valuable, they require engagement with adjacent projects and communities. + + +As a project, we must ensure that our Helm chart provides a quick and accessible way for users to deploy a complete Kubeflow platform and individual components, enabling them to manage their environments or adopt a vendor solution. + + +Simplifying Kubeflow deployment lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common patterns. If support becomes burdensome, teams can hire expertise or use a distribution—both of which drive demand for Kubeflow skills. + + +By making deployment easy, we attract more end users and foster collaboration with broader communities like PyTorch, improving our implementations in service of their users. + +### About Helm + +- Helm is a graduated project in the CNCF and is maintained by the Helm community. +- Helm is supported by and built with a community of over 400 developers. +- Helm helps you manage Kubernetes applications, rollbacks, updates, dependencies, and releases, and the most complex Kubernetes applications. [More about Helm](https://helm.sh/) + +## Value to the Community + +- Streamline the deployment, upgrade, and rollback of the kubeflow installation process +- Provide another way to install Kubeflow to give the community more options, promoting flexibility and choice + +## Goals + + +✅ A fully functional Kubeflow Helm chart for the targeted release. This will install Kubeflow as a platform and as an individual component. +✅ Published Helm chart documentation with straightforward and uncomplicated configuration options. +✅ A step-by-step tutorial simplifying Kubeflow deployment for users. 
+✅ Contribution to the Kubeflow community effort for Helm-based installation as part of the official Kubeflow repository. +✅ Where possible, rely on upstream Helm charts (i.e., KServe/Istio). +✅ Consolidate community Helm efforts and prevent duplicate efforts. + + +## Non-Goals +* Deep integration with hyperscalers such as AWS managed databases. Nevertheless for example basic Dex/oauth2-proxy configuration for authentication integration with popular Kubernetes platforms such as EKS is a goal, because it is needed for M2M authentication within Kubeflow. +* Infrastructure provisioning. Users can opt into OpenTofu or Crossplane to template the Helm chart with infrastructure. Still, we are focused on the Helm chart where the values would be configured (if a component requires knowledge of external systems). +* Support separate abstracted operators for components. Helm will handle upgrades. +* Provide guaranteed community support. +* Define production for any particular set of users. + + +## Proposal +This proposal introduces official Helm chart support for deploying Kubeflow. The goal is to provide a modular, community-maintained method for installing and managing Kubeflow, making it more accessible for users who prefer Helm over Kustomize-based manifests. + + +## Desired Outcome + + +The Helm chart will allow users to: + + +✅ Deploy Kubeflow with a single Helm command, reducing installation complexity. +✅ Select specific components to install (e.g., Training Operator, Katib, Pipelines) without requiring the entire Kubeflow stack. +✅ Configure installations via Helm values, enabling customization for different environments (e.g., resource allocation, authentication settings, storage options). +✅ Upgrade and rollback Kubeflow deployments safely using Helm’s built-in version control. +✅ Integrate with GitOps workflows (e.g., ArgoCD, FluxCD) for automated deployments. 
+✅ Maintain a Helm chart structure similar to Argo’s, ensuring a familiar experience for Kubernetes users. + + +## Measuring Success + + +### Adoption Metrics + + +* Number of Helm chart downloads from the official Kubeflow repository. +* Community contributions to Helm chart improvements. +* Ease of Use and Community Engagement. +* Successful deployments reported by users via GitHub issues, Slack, and forums. +* Documentation feedback and tutorial completion rates. +* Metrics to compare downloads from kustomize vs Helm charts. + + +### Modularity and Customization + + +* Verified Helm installations of the platform and individual components. +* Flexibility demonstrated in community-reported use cases (e.g., deploying only the Training Operator). + + +### Stability and Maintainability + + +* Helm-based deployments function consistently across Kind, Minikube, AKS, EKS, GKE, Rancher and OpenShift. +* Contributions and maintenance of Helm charts remain sustainable within the Kubeflow community. + + +### User Stories (Optional) + + +#### Alex Conquers Kubeflow + + +**Background** + + +Alex is an ML engineer working at a mid-sized AI startup. The team wants to experiment with Kubeflow Pipelines and Katib for hyperparameter tuning but doesn’t need the full Kubeflow stack. Currently, deploying Kubeflow using Kustomize manifests feels cumbersome and requires significant manual effort and maintenance. + + +**Scenario:** + + +Alex needs a fast and repeatable way to deploy only the necessary Kubeflow components while keeping the installation manageable and configurable. + + +**Steps & Experience:** + + +**Discovering Helm Support for Kubeflow** +Alex reads the updated Kubeflow documentation and finds that Helm is now an official installation method. The documentation provides a simple command to install only the necessary components. + + +**Deploying Kubeflow with Helm** +Alex runs a command to install only Kubeflow Pipelines and Katib. 
The Helm chart automatically handles dependencies and namespace creation, reducing manual steps. Within minutes, the required services are running in the Kubernetes cluster. + + +**Customizing the Deployment** +Alex configures resource limits and storage settings by modifying the Helm values file. + + +**Scaling and Managing the Deployment** +Later, the team decides to add the Training Operator. Instead of redeploying everything, Alex simply enables it. Helm seamlessly applies the changes, avoiding disruption to the existing setup. + + +**Rolling Back** +A misconfiguration in values.yaml causes an issue. Instead of debugging manually, Alex rolls back to the previous working state. + + +**Outcome & Value:** + + +✅ Fast, modular deployment – No need to install unnecessary components. +✅ Easy configuration – Fine-tune installations using Helm values. +✅ Smooth upgrades and rollbacks – No more breaking changes due to manual YAML edits. +✅ Better DevOps integration – Fits naturally into the team’s GitOps workflow with tools like ArgoCD. + +**Managing updates** +Alex can easily install new updates to include new release versions and update dependencies. + +##### Alex's Outcomes: Easily Deploy Kubeflow Using Helm +Alex could deploy only the necessary Kubeflow components using Helm, avoiding the complexity of managing Kustomize-based manifests. Alex installed Kubeflow Pipelines and Katib by running a single command, making the deployment process fast, modular, and repeatable. + + +**Customize Deployments with Helm Values.** +Alex configured the Kubeflow deployment using Helm values, fine-tuning resource limits and storage settings without modifying raw YAML files. By adjusting values.yaml, Alex was able to: + + +* Enable Pipelines and Katib while keeping other components disabled. +* Set up a custom storage backend for Kubeflow Pipelines. +* Adjust CPU and memory limits for Katib experiments. 
+ + +These changes were seamlessly applied with a Helm upgrade, making the system highly customizable and adaptable. + + +**Use Helm’s Standardized Package Management Features** +Alex leveraged Helm’s built-in lifecycle management to ensure a smooth deployment experience: + + +* When a misconfiguration caused an issue, Alex instantly rolled back to a stable deployment using Helm’s versioning feature. +* Helm automatically handled dependencies, ensuring Pipelines and Katib were installed correctly without manual intervention. +* As new versions of Kubeflow components were released, Alex could upgrade seamlessly without reinstalling everything. + + +**Deploy Individual Kubeflow Components** +Since Alex’s team only needed Kubeflow Pipelines and Katib, they didn’t have to deploy the entire Kubeflow stack. Instead, Helm allowed them to deploy only the necessary components, keeping the cluster lightweight and resource-efficient. + + +**Drive Adoption** +As someone new to Kubeflow, Alex benefited from clear documentation and a step-by-step guide for deploying components with Helm. Instead of spending hours understanding manifests and dependencies, Alex got Kubeflow running in minutes. The modular Helm-based approach made it easy for the team to evaluate Kubeflow without committing to a complex setup. + + +**Contribute to and Extend the Helm Chart.** +Alex’s organization saw value in Helm-based deployment and wanted to contribute improvements back to the community. Following a structured approach similar to Argo’s Helm charts, the team could extend the charts to support their infrastructure needs while sharing their updates with the wider Kubeflow community. + + +Thanks to Helm, Kubeflow deployment became effortless, modular, and scalable—allowing Alex’s team to focus on building ML workflows instead of dealing with infrastructure complexity. 
Alex will get feedback from his ML team using Kubeflow and motivate them to contribute improvements and feature requests to enhance the Kubeflow ecosystem. + + +### Notes/Constraints/Caveats +Alex may choose to use vanilla manifests or go with a vendor. The goal is not to be a distribution but still a part of Kubeflow/manifests. As the appetite for community support grows, the scope may expand, but for now, this is just a simple way to get Kubeflow running using a well-known deployment pattern and provide examples of how to use it. + + +## Risks and Mitigations + + +1. Fragmentation of Deployment Methods + + +**Risk:** Introducing Helm charts as an official deployment method alongside Kustomize may create fragmentation within the Kubeflow ecosystem, leading to confusion between Helm-based, Kustomize-based, and third-party deployment tools. + + +**Mitigation:** +* Clearly position Helm as an alternative to Kustomize, rather than a replacement. +* Maintain alignment with existing manifests, ensuring Helm charts remain consistent with official Kubeflow components. +* Provide comprehensive documentation comparing Helm, Kustomize, and third-party solutions. + + +2. Maintenance Burden and Long-Term Support + + +**Risk:** Maintaining a Helm chart requires ongoing updates as Kubeflow components evolve, which could become a burden if not adequately resourced. + + +**Mitigation:** +* Adopt a community-driven maintenance model, similar to Argo’s Helm charts. +* Establish clear ownership within the Kubeflow community and define a process for versioning and deprecating charts. +* Regularly sync Helm charts with upstream manifests to prevent drift. + + +3. Security Considerations + + +**Risk:** Misconfigured Helm deployments could introduce security vulnerabilities, such as exposed services, weak authentication, or misconfigured role-based access control (RBAC). 
+ + +**Mitigation:** +* Follow Kubernetes security best practices, ensuring charts include secure default configurations. +* Conduct security reviews as part of Kubeflow’s release cycle. +* Provide Helm values presets for secure and production-ready configurations. + + +## Design Details + + +### Helm Chart Structure + + +The repository will contain a root Helm chart (kubeflow) that acts as an umbrella for subcharts: + + +``` +kubeflow/manifests/experimental/helm +│── charts/ +│ │── training-operator/ +│ │── katib/ +│ │── pipelines/ +│ │── istio/ +│ │── profiles/ +│ │── common/ +│ │── kserve/ +│── templates/ +│── values.yaml +│── Chart.yaml +│── README.md +``` + + +* The root kubeflow chart will manage dependencies and shared configurations. +* Subcharts for each component (training-operator, katib, pipelines, etc.) allow independent deployments. + + +### Example Helm Chart Configuration (values.yaml) + + +```yaml +# Global settings +global: + namespace: kubeflow + istio: + enabled: true + + +# Enable/Disable specific components +pipelines: + enabled: true + mysql: + persistence: + storageClass: gp2 + size: 10Gi + + +katib: + enabled: false + + +training-operator: + enabled: true + resources: + limits: + cpu: "2" + memory: "4Gi" +``` + + +### Installation + + +To install Kubeflow Pipelines and the Training Operator only: + + +``` +helm install kubeflow ./kubeflow-helm-chart --set pipelines.enabled=true --set training-operator.enabled=true +``` + + +To enable Katib after the initial installation: + + +``` +helm upgrade kubeflow ./kubeflow-helm-chart --set katib.enabled=true +``` + + +To roll back a deployment: + + +``` +helm rollback kubeflow 1 +``` + + +### Security and Default Configurations + + +To ensure secure and production-ready deployments, the Helm chart will include: + + +* Minimal privileges using Role-Based Access Control (RBAC) and Pod Security Standards restricted. +* Network policies to restrict component communication where necessary. 
+* Secure default values, with optional overrides for users needing customization. + + +Example RBAC template (templates/rbac.yaml): + + +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: kubeflow-pipelines-role +rules: +- apiGroups: ["kubeflow.org"] + resources: ["pipelines"] + verbs: ["get", "list", "watch"] +``` + + +### Implementation Plan + + +* Create a subdirectory (kubeflow/manifests/experimental/helm) to host the Helm charts. +* Define the Helm chart structure, ensuring compatibility, synchronizability, and a single source of truth with the Kustomize manifests. +* Develop subcharts for each major Kubeflow component (Pipelines, Platform, Notebooks, Dashboard, Katib, Training Operator, Istio, etc.) and make them deployable simultaneously to replicate the kustomize manifests. +* Write Helm values documentation, including examples for different environments (Dex vs oauth-proxy and authentication). +* Test the Helm charts as we test the Kustomize manifests. +* Engage the Kubeflow community for feedback and contributions. + + +### Test Plan +1. Unit Testing for Helm Templates +Each Helm template will be tested using Helm unit testing frameworks such as: +* helm-unittest – Validates Helm templates using YAML-based test cases. +* helm-template – Ensures templates render correctly without errors. +* This shall happen in the same GHA used for Kustomize since the output should stay the same. + + +2. Linting and Static Analysis +* Helm Linting (helm lint) – Ensures best practices in chart structure and values. +* YAML Schema Validation – Ensures manifests follow correct Kubernetes API specifications. + Kubeval & Kubeconform – Validates Kubernetes resources before applying them. +* We already have that repository-wide and can just add helm linting. + + +3. End-to-End Testing with CI/CD Pipelines +* Tests will validate component functionality post-deployment (e.g., Pipelines UI loads, Katib runs experiments). 
+* We shall reuse/extend the Kustomize tests; if the output is the same, we can test Helm and Kustomize simultaneously in the same GHA. + + +4. Community Testing and User Feedback +* Early adopters will be encouraged to test pre-release Helm charts and provide feedback via GitHub issues and the Kubeflow Slack #kubeflow-helm-chart channel. +* A beta phase will allow broader testing before an official Helm release. + + +[ X] I/we understand the components' owners may require updates to existing tests to make this code solid before committing the changes necessary to implement this enhancement. + + +#### Prerequisite Testing Updates + + +Since this is a replication of the Kustomize manifests, most of the testing infrastructure is already in place. + + +#### E2E Tests +The end-to-end tests will be very similar to and based on the ones we have for the Kustomize manifests. + + +#### Integration Tests + + +The integration tests will be very similar to and based on the ones we have for the Kustomize manifests. + + +### Graduation Criteria + + +Reach feature parity with the Kustomize manifests. + + +## Implementation History + + + + + + + +## Drawbacks +### Potential Drawbacks include: +* Users may expect Helm charts to be fully "production ready" and engage the community for out-of-scope support/contributions. +* Helm chart complexity may become burdensome to manage and strain community resources. +* Helm may have unforeseen limitations. + + +## Alternatives +### Glasskube +[Glasskube](https://github.com/glasskube/glasskube) was initially explored as a potential way to improve our deployment. That community [has made an effort](https://glasskube.dev/blog/kubeflow-setup-guide/), but we've yet to see more traction. Their implementation is not as widely adopted, and we may struggle to find contributors. 
Should the Glasskube community build a Kubeflow distribution/installation method, we'd gladly support them in this effort, but we have not seen a push for Glasskube like we've seen for Helm. + + +### KPT +The [GCP Distribution](https://googlecloudplatform.github.io/kubeflow-gke-docs/docs/) uses KPT, but KPT is not as easily integrated with upstream communities that've standardized on Helm. We'd need to consider upstream vanilla manifests and use KPT. + + +### Crossplane +[The Crossplane project](https://www.crossplane.io/) could be used to template manifests with a higher-level manifest, but that project is more suited for templating and infrastructure management. We've yet to see any community traction for a Crossplane-powered Kubeflow distribution; therefore, resourcing may be difficult and could lead to a longer lead time versus using Helm. + + + + + From d33759698fbd28669e8929e201e724513c2a96ad Mon Sep 17 00:00:00 2001 From: juliusvonkohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Tue, 11 Mar 2025 19:05:31 +0100 Subject: [PATCH 02/17] Adress Andreys comments. Signed-off-by: juliusvonkohout <45896133+juliusvonkohout@users.noreply.github.com> --- .../README.MD | 40 +++++++++---------- 1 file changed, 18 insertions(+), 22 deletions(-) rename proposals/{831-helm-support => 831-helm-manifests}/README.MD (85%) diff --git a/proposals/831-helm-support/README.MD b/proposals/831-helm-manifests/README.MD similarity index 85% rename from proposals/831-helm-support/README.MD rename to proposals/831-helm-manifests/README.MD index 03284329d..f06b68930 100644 --- a/proposals/831-helm-support/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -1,17 +1,12 @@ -# 649-Kubeflow-Helm-Support: Support Helm as an Alternative for Kustomize - - -The demand for a Helm chart for a basic Kubeflow installation has increased. 
Given the KSC's stance in issue 821 on neutral deployment language and user-defined production readiness, this is an opportune time to introduce a Helm chart. Supporting Helm will enhance ease of adoption and simplify deployments while maintaining the flexibility of community-maintained manifests. There have already been community efforts as well as the Kubeflow-Helm-Chart Slack Channel. - +# 831-Helm-Manifests: Installing Kubeflow with Helm ## Summary - Kubeflow manifests provide a fast way to deploy a minimal Kubeflow platform, with best-effort community support. For guaranteed assistance, users can opt for third-party distributions, consultants, or self-managed expertise. This approach extends to Helm chart support. Contributions and bug reports are encouraged, but no support will be guaranteed. The goal is to build a similar folder structure as Argo for Kubeflow Helm charts. - ## Motivation +The demand for a Helm chart for a basic Kubeflow installation has increased. Given the KSC's stance in issue 821 on neutral deployment language and user-defined production readiness, this is an opportune time to introduce a Helm chart. Supporting Helm will enhance ease of adoption and simplify deployments while maintaining the flexibility of community-maintained manifests. There have already been community efforts as well as the Kubeflow-Helm-Chart Slack Channel. Currently, because Kubeflow/manifests are based on Kustomize, many potential users and companies that require Helm charts due to company processes/policies have to rely on third-party distributions. While these options are valuable, they require engagement with adjacent projects and communities. 
@@ -19,7 +14,7 @@ Currently, because Kubeflow/manifests are based on Kustomize, many potential use As a project, we must ensure that our Helm chart provides a quick and accessible way for users to deploy a complete Kubeflow platform and individual components, enabling them to manage their environments or adopt a vendor solution. -Simplifying Kubeflow deployment lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common patterns. If support becomes burdensome, teams can hire expertise or use a distribution—both of which drive demand for Kubeflow skills. +Simplifying the Kubeflow deployment and customization lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common patterns. If support becomes burdensome, teams can hire expertise or use a distribution—both of which drive demand for Kubeflow skills. By making deployment easy, we attract more end users and foster collaboration with broader communities like PyTorch, improving our implementations in service of their users. @@ -38,12 +33,12 @@ By making deployment easy, we attract more end users and foster collaboration wi ## Goals -✅ A fully functional Kubeflow Helm chart for the targeted release. This will install Kubeflow as a platform and as an individual component. -✅ Published Helm chart documentation with straightforward and uncomplicated configuration options. -✅ A step-by-step tutorial simplifying Kubeflow deployment for users. +✅ A fully functional Kubeflow Helm chart for the targeted release. 
Users can install Kubeflow and subsets as a multi-tenant platform and also the individual projects as single-tenant without platform-level support. +✅ The Kustomize manifests remain the single source of truth for the Helm manifests, and both shall use the same CI/CD to make sure that they satisfy the same requirements and that the difference is negligible. Therefore the Kustomize and Helm manifests must sit next to each other in the same repository. After the initial PoC, the Helm manifests should be upstreamed to the respective working groups and be maintained there and synchronized like the Kustomize manifests. +✅ Published Helm chart documentation with straightforward and uncomplicated configuration options. ✅ Contribution to the Kubeflow community effort for Helm-based installation as part of the official Kubeflow repository. -✅ Where possible, rely on upstream Helm charts (i.e., KServe/Istio). -✅ Consolidate community Helm efforts and prevent duplicate efforts. +✅ Where possible, rely on upstream Helm charts (e.g., Istio) instead of translating our own Kustomize manifests again. +✅ Consolidate community Helm efforts and prevent duplicate efforts. ## Non-Goals @@ -65,10 +60,9 @@ The Helm chart will allow users to: ✅ Deploy Kubeflow with a single Helm command, reducing installation complexity. -✅ Select specific components to install (e.g., Training Operator, Katib, Pipelines) without requiring the entire Kubeflow stack. -✅ Configure installations via Helm values, enabling customization for different environments (e.g., resource allocation, authentication settings, storage options). +✅ Select a subset of components to install (e.g., Training Operator, Katib) without requiring, for example, Pipelines, similar to the kubeflow/manifests/example/kustomization.yaml. +✅ Configure installations via Helm values, enabling customization for different environments (e.g., resource allocation, authentication settings, storage options). 
✅ Upgrade and rollback Kubeflow deployments safely using Helm’s built-in version control. -✅ Integrate with GitOps workflows (e.g., ArgoCD, FluxCD) for automated deployments. ✅ Maintain a Helm chart structure similar to Argo’s, ensuring a familiar experience for Kubernetes users. @@ -109,32 +103,34 @@ The Helm chart will allow users to: **Background** -Alex is an ML engineer working at a mid-sized AI startup. The team wants to experiment with Kubeflow Pipelines and Katib for hyperparameter tuning but doesn’t need the full Kubeflow stack. Currently, deploying Kubeflow using Kustomize manifests feels cumbersome and requires significant manual effort and maintenance. +Alex is an ML engineer working at a mid-sized AI startup. The team wants to experiment with Notebooks / Workspaces and Trainer for hyperparameter tuning but does not need Kubeflow Pipelines. Currently, deploying Kubeflow using Kustomize manifests feels cumbersome because he is accustomed to Helm manifests, and it requires significant extra effort. +He also has an edge device where he only wants to run KServe for inference. + **Scenario:** -Alex needs a fast and repeatable way to deploy only the necessary Kubeflow components while keeping the installation manageable and configurable. +Alex needs a fast and repeatable way to deploy only the necessary Kubeflow components depending on the environment, while keeping the different installations manageable and configurable. **Steps & Experience:** **Discovering Helm Support for Kubeflow** -Alex reads the updated Kubeflow documentation and finds that Helm is now an official installation method. The documentation provides a simple command to install only the necessary components. +Alex reads the updated Kubeflow documentation and finds that Helm manifests are now an official installation method next to Kustomize manifests. The documentation provides a simple way to template different values for different installations. 
**Deploying Kubeflow with Helm** -Alex runs a command to install only Kubeflow Pipelines and Katib. The Helm chart automatically handles dependencies and namespace creation, reducing manual steps. Within minutes, the required services are running in the Kubernetes cluster. +Alex runs a command to install only Notebooks / Workspaces and Trainer. The Helm chart automatically handles dependencies (Istio, Dashboard, etc.), reducing manual steps. Within minutes, the required services are running in the Kubernetes cluster. He then modifies the values to install only KServe on his edge device. **Customizing the Deployment** -Alex configures resource limits and storage settings by modifying the Helm values file. +Over time, more and more users (10-100) use the larger cluster. Alex configures resource limits and storage settings by modifying the Helm values file to accommodate more users on his larger Kubeflow cluster and decreases the resources on his edge device. **Scaling and Managing the Deployment** -Later, the team decides to add the Training Operator. Instead of redeploying everything, Alex simply enables it. Helm seamlessly applies the changes, avoiding disruption to the existing setup. +Later, the team decides to add Kubeflow Pipelines in multi-tenancy platform mode for its multiple users. Instead of redeploying everything, Alex simply enables it. Helm seamlessly applies the changes, avoiding disruption to the existing setup. **Rolling Back** From 87e32dc2d20cf8912bd299a7464a30fe4b54f189 Mon Sep 17 00:00:00 2001 From: juliusvonkohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Tue, 11 Mar 2025 19:17:41 +0100 Subject: [PATCH 03/17] Adress Andreys comments. 
Signed-off-by: juliusvonkohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-manifests/README.MD | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD index f06b68930..d91d49454 100644 --- a/proposals/831-helm-manifests/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -2,19 +2,19 @@ ## Summary -Kubeflow manifests provide a fast way to deploy a minimal Kubeflow platform, with best-effort community support. For guaranteed assistance, users can opt for third-party distributions, consultants, or self-managed expertise. This approach extends to Helm chart support. Contributions and bug reports are encouraged, but no support will be guaranteed. The goal is to build a similar folder structure as Argo for Kubeflow Helm charts. +Kubeflow manifests provide a fast way to deploy a minimal Kubeflow platform, with best-effort community support. For guaranteed assistance, users can opt for [commercial support](https://www.kubeflow.org/docs/started/support/). This approach extends to Helm support. Contributions and bug reports are encouraged, but no support will be guaranteed. The goal is to build a folder structure similar to the one we have for the current Kustomize manifests, so that we can rely on the existing CI/CD and have a single source of truth. We also want to incorporate best practices from Argo for the Kubeflow Helm manifests. ## Motivation The demand for a Helm chart for a basic Kubeflow installation has increased. Given the KSC's stance in issue 821 on neutral deployment language and user-defined production readiness, this is an opportune time to introduce a Helm chart. Supporting Helm will enhance ease of adoption and simplify deployments while maintaining the flexibility of community-maintained manifests. There have already been community efforts as well as the Kubeflow-Helm-Chart Slack Channel. 
-Currently, because Kubeflow/manifests are based on Kustomize, many potential users and companies that require Helm charts due to company processes/policies have to rely on third-party distributions. While these options are valuable, they require engagement with adjacent projects and communities. +Currently, because Kubeflow/manifests only contains Kustomize but no Helm manifests, many potential users and companies that require Helm manifests due to company processes/policies have to rely on third-party options. While these options are valuable, they require engagement with adjacent projects and communities and often common improvents done in one fork do not reach and benefit the whole community. As a project, we must ensure that our Helm chart provides a quick and accessible way for users to deploy a complete Kubeflow platform and individual components, enabling them to manage their environments or adopt a vendor solution. -Simplifying the Kubeflow deployment and customization lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common patterns. If support becomes burdensome, teams can hire expertise or use a distribution—both of which drive demand for Kubeflow skills. +Simplifying the Kubeflow deployment and customization lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common maintenance patterns. If support becomes burdensome, teams can obtain [commercial support](https://www.kubeflow.org/docs/started/support/). 
By making deployment easy, we attract more end users and foster collaboration with broader communities like PyTorch, improving our implementations in service of their users. @@ -185,11 +185,11 @@ As someone new to Kubeflow, Alex benefited from clear documentation and a step-b Alex’s organization saw value in Helm-based deployment and wanted to contribute improvements back to the community. Following a structured approach similar to Argo’s Helm charts, the team could extend the charts to support their infrastructure needs while sharing their updates with the wider Kubeflow community. -Thanks to Helm, Kubeflow deployment became effortless, modular, and scalable—allowing Alex’s team to focus on building ML workflows instead of dealing with infrastructure complexity. Alex will get feedback from his ML team using Kubeflow and motivate them to contribute improvements and feature requests to enhance the Kubeflow ecosystem. +Thanks to Helm, Kubeflow deployment became effortless, modular, and scalable—allowing Alex’s team to focus on building ML workflows instead of dealing with platform complexity. Alex will get feedback from his ML team using Kubeflow and motivate them to contribute improvements and feature requests to enhance the Kubeflow ecosystem. ### Notes/Constraints/Caveats -Alex may choose to use vanilla manifests or go with a vendor. The goal is not to be a distribution but still a part of Kubeflow/manifests. As the appetite for community support grows, the scope may expand, but for now, this is just a simple way to get Kubeflow running using a well-known deployment pattern and provide examples of how to use it. +Alex may choose to use the Kustomize manifests or go with a [vendor](https://www.kubeflow.org/docs/started/support/). The goal is to be a part of the community maintained Kubeflow/manifests instead of deriving / deviating from it and becoming a distribution. 
As the appetite for community support grows, the scope may expand, but for now, this is just a simple way to get Kubeflow running using a well-known deployment pattern and provide examples of how to use it. ## Risks and Mitigations @@ -434,7 +434,7 @@ Major milestones might include: ## Alternatives ### Glasskube -[Glasskube](https://github.com/glasskube/glasskube) was initially explored as a potential way to improve our deployment. That community [has made an effort](https://glasskube.dev/blog/kubeflow-setup-guide/), but we've yet to see more traction. Their implementation is not as widely adopted, and we may struggle finding contributors. Should the Glasskube community build a Kubeflow distribution/installation method, we'd gladly support them in this effort, but we have not seen a push for Glasskube like we've seen for Helm. +[Glasskube](https://github.com/glasskube/glasskube) was initially explored as a potential way to improve our deployment. That community [has made an effort](https://glasskube.dev/blog/kubeflow-setup-guide/), but we've yet to see more traction. Their implementation is not as widely adopted, and we may struggle finding contributors. Should the Glasskube community provide a Kubeflow installation method, we'd gladly support them in this effort, but we have not seen a push for Glasskube like we've seen for Helm. 
### KPT From 664609f18f498e9855930690bc327e97bd958ae5 Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Mon, 17 Mar 2025 13:57:16 +0100 Subject: [PATCH 04/17] Update charter.md Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- wg-manifests/charter.md | 92 +++++++++++++---------------------------- 1 file changed, 28 insertions(+), 64 deletions(-) diff --git a/wg-manifests/charter.md b/wg-manifests/charter.md index afbbe1f8d..87d365ab5 100644 --- a/wg-manifests/charter.md +++ b/wg-manifests/charter.md @@ -1,70 +1,34 @@ -# WG Manifests Charter +# WG Manifests/Platform Charter This charter adheres to the conventions, roles and organization management outlined in [wg-governance]. +This charter describes the working mode of the last 5 years as of March 2025: + ## Scope -- Provide a catalog (centralized repository) of Kubeflow application manifests. -- Provide a catalog of third-party apps for common services. - -### In scope - -#### Code, Binaries and Services - -- Maintain tooling to automate copying manifests from upstream app repos. -- Maintain a catalog that will allow users to install Kubeflow apps and - common services easily on Kubernetes, either on the cloud or on-prem, without - depending on external cloud services or closed source solutions. Those - manifests are deployed using `kubectl` and `kustomize` and include: - 1. A common set of manifests for the current official Kubeflow applications: - - Training Operators - - Kubeflow Pipelines (KFP) - - Notebooks - - KFServing - - Katib - - Central Dashboard - - Profile Controller - - PodDefaults Controller - 1. Manifests for a set of specific common services: - - Istio - - KNative - - Dex - - Cert-Manager +- Enable users to install, extend and maintain Kubeflow as a platform for multiple users +- This includes dependencies, security efforts and examplary integration with popular tools and frameworks. 
+- Synchronize the manifests (Helm, Kustomize) between working groups +- We try to be compatible with the popular Kubernetes clusters +- We do not support a specific deployment tool (e.g., ArgoCD, Flux) +- The default installation shall not contain deep integration with external cloud services or closed source solutions +- We provide hints and experimental examples how a user could integrate non-default external authentication (e.g. companies Identity Provider) and popular services on his own +- There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installatio: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ... +- There is the evolving and not exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Noteboks, Kserve, Spark, ... #### Cross-cutting and Externally Facing Processes ##### With Application Owners -- Aid applications owners in creating kustomize manifests for their application, - inside the app repo, if those don't exist already. -- Communicate with application owners to agree upon the version they want to be - included in the next Kubeflow release. +- Aid the application owner in creating manifests (Helm, Kustomize) for his application +- Aid the application owner regarding security best practices +- Communicate with the application owner regarding releases and versioning ##### With Distribution Owners -- Coordinate with distribution owners, to make sure they are in-sync about the - release schedule and have time to test and bring their distributions - up-to-date. - -### Out of scope - -This WG is NOT going to: -- Maintain deployment-specific tools like `kfctl`. -- Maintain distribution-specific manifests. -- Decide which applications to include in Kubeflow. -- Decide which variant of an application to include (e.g., KFP Standalone vs - KFP with Istio). -- Create and maintain one or more Kubeflow distributions. 
-- Support configurations with environment-specific requirements, like special - hardware, different versions of third-party apps (e.g., Istio, KNative, etc.) - or custom OIDC providers. -- Support and promote a specific deployment tool (e.g., `kfctl`). Opinionated - deployment tools can extend the base kustomizations to create manifests that - support their methods. - - For example, people invested in `kfctl` can create overlays that enable - the use of `kfctl`'s parameter substitution, which expects a specific - folder structure (`params.env`). +- Coordinate with "distribution owners" and users to take part in the testing of Kubeflow releases. + ## Roles and Organization Management @@ -76,13 +40,13 @@ The positions of the Chairs and TLs are granted to the organizations and compani Kubeflow's [governance model](https://github.com/kubeflow/community/blob/master/wgs/wg-governance.md) includes a plethora of different leadership roles. This section aims to provide a clear description of what these roles mean for -this repo, as well as set expectations from people with these roles and requirements +this repository, as well as set expectations from people with these roles and requirements for people to be promoted in a role. A Working Group lead is considered someone that has either the role of **Subproject Owner**, **Tech Lead** or **Chair**. These roles were defined by trying -to provide different responsibility levels for repo owners. For the Manifests WG -we'd like to start by treating *approvers* in the root [OWNERS](https://github.com/kubeflow/manifests/blob/master/OWNERS), +to provide different responsibility levels for repository owners. For the Manifests WG +we would like to start by treating *approvers* in the root [OWNERS](https://github.com/kubeflow/manifests/blob/master/OWNERS), as Subproject Owners, Tech Leads and Chairs. This is done to ensure we have a simple enough model to start that people can understand and get used to. 
So for the Manifests WG we only have Manifests WG Leads, which are the root approvers. @@ -93,16 +57,16 @@ a reviewer and an approver in the root OWNERS file (Manifests WG Lead). ### Manifests WG Lead Requirements The requirements for someone to be a Lead come from the processes and work required -to be done in this repo. The main goal with having multiple Leads is to ensure +to be done in this repository. The main goal with having multiple Leads is to ensure that in case there's an absence of one of the Leads the rest will be able to ensure -the established processes and the health of the repo will be preserved. +the established processes and the health of the repository will be preserved. With the above the main pillars of work and responsibilities that we've seen for -this repo throughout the years are the following: -1. Being involved with the release team, since the [release process](https://github.com/kubeflow/community/tree/master/releases) is tightly intertwined with the manifests repo -2. Testing methodologies (GitHub Actions, E2E testing with AWS resources etc) -3. Processes regarding the [contrib/addon](https://github.com/kubeflow/manifests/blob/master/contrib) components -4. [Common manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained by Manifests WG (Istio, Knative, Cert Manager etc) +this repository throughout the years are the following: +1. Being involved with the release team, since the [release process](https://github.com/kubeflow/community/tree/master/releases) is tightly intertwined with the manifests/platform repository +2. Testing methodologies (GitHub Actions) +3. Processes regarding the [experimental](https://github.com/kubeflow/manifests/blob/master/experimental) components +4. [Platform manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained irectly by Manifests WG (Istio, Knative, Cert Manager etc.) 5. 
Community and health of the project Root approvers, or Manifests WG Leads, are expected to have expertise and be able @@ -120,7 +84,7 @@ role by helping with reviews throughout the project. The goal of the requirements is to quantify the main pillars that we documented above. The high level reasoning is that approvers should have lead efforts and -have expertise in the different processes and artefacts maintained in this repo +have expertise in the different processes and artefacts maintained in this repository as well as be invested in the community of the WG. * Need to be a root reviewer From 2ce12a179f81b4feae868696205405264e6499c7 Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Mon, 17 Mar 2025 14:14:21 +0100 Subject: [PATCH 05/17] Update charter.md Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- wg-manifests/charter.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/wg-manifests/charter.md b/wg-manifests/charter.md index 87d365ab5..9d572d15e 100644 --- a/wg-manifests/charter.md +++ b/wg-manifests/charter.md @@ -3,7 +3,7 @@ This charter adheres to the conventions, roles and organization management outlined in [wg-governance]. -This charter describes the working mode of the last 5 years as of March 2025: +This charter describes the working mode / reality / status quo of the last 5 years as of March 2025. ## Scope @@ -26,9 +26,9 @@ This charter describes the working mode of the last 5 years as of March 2025: - Communicate with the application owner regarding releases and versioning ##### With Distribution Owners - -- Coordinate with "distribution owners" and users to take part in the testing of Kubeflow releases. - +- Distributions are strongly opinionated derivatives of Kubeflow platform / manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ... 
+- A distribution can be created by an arbitrary amount of users / companies in private or in public, see the definition above +- Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases. ## Roles and Organization Management From a51ad4a27cb14eae899a25c4ec7ae5ef042045d2 Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Mon, 17 Mar 2025 14:25:05 +0100 Subject: [PATCH 06/17] Update charter.md Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- wg-manifests/charter.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/wg-manifests/charter.md b/wg-manifests/charter.md index 9d572d15e..4ce6292b9 100644 --- a/wg-manifests/charter.md +++ b/wg-manifests/charter.md @@ -10,22 +10,22 @@ This charter describes the working mode / reality / status quo of the last 5 ye - Enable users to install, extend and maintain Kubeflow as a platform for multiple users - This includes dependencies, security efforts and examplary integration with popular tools and frameworks. - Synchronize the manifests (Helm, Kustomize) between working groups -- We try to be compatible with the popular Kubernetes clusters +- We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...) - We do not support a specific deployment tool (e.g., ArgoCD, Flux) - The default installation shall not contain deep integration with external cloud services or closed source solutions - We provide hints and experimental examples how a user could integrate non-default external authentication (e.g. companies Identity Provider) and popular services on his own - There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installatio: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ... 
- There is the evolving and not exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Noteboks, Kserve, Spark, ... -#### Cross-cutting and Externally Facing Processes +## Communication Tasks -##### With Application Owners +### With Application Owners - Aid the application owner in creating manifests (Helm, Kustomize) for his application - Aid the application owner regarding security best practices - Communicate with the application owner regarding releases and versioning -##### With Distribution Owners +### With Distribution Owners - Distributions are strongly opinionated derivatives of Kubeflow platform / manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ... - A distribution can be created by an arbitrary amount of users / companies in private or in public, see the definition above - Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases. From a01236372f0ef8ef018d1c90cab25a1d2fe650df Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Tue, 18 Mar 2025 10:17:24 +0100 Subject: [PATCH 07/17] Update charter.md Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- wg-manifests/charter.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/wg-manifests/charter.md b/wg-manifests/charter.md index 4ce6292b9..1bef0e323 100644 --- a/wg-manifests/charter.md +++ b/wg-manifests/charter.md @@ -3,20 +3,20 @@ This charter adheres to the conventions, roles and organization management outlined in [wg-governance]. -This charter describes the working mode / reality / status quo of the last 5 years as of March 2025. +This platform/manifests charter describes the working mode / reality / status quo of the last 5 years as of March 2025. +It tries to balance community and commercial interests. 
## Scope - Enable users to install, extend and maintain Kubeflow as a platform for multiple users -- This includes dependencies, security efforts and examplary integration with popular tools and frameworks. +- This includes dependencies, security efforts and exemplary integration with popular tools and frameworks. - Synchronize the manifests (Helm, Kustomize) between working groups - We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...) -- We do not support a specific deployment tool (e.g., ArgoCD, Flux) -- The default installation shall not contain deep integration with external cloud services or closed source solutions +- **We do not support a specific deployment tool (e.g., ArgoCD, Flux)** +- The default installation shall not contain deep integration with external cloud services or closed source solutions, instead we aim for Kubernetes-native solutions and light authentication and authorization integration with external IDPs - We provide hints and experimental examples how a user could integrate non-default external authentication (e.g. companies Identity Provider) and popular services on his own -- There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installatio: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ... +- There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installation: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ... - There is the evolving and not exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Noteboks, Kserve, Spark, ... - ## Communication Tasks ### With Application Owners @@ -27,7 +27,7 @@ This charter describes the working mode / reality / status quo of the last 5 ye ### With Distribution Owners - Distributions are strongly opinionated derivatives of Kubeflow platform / manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ... 
-- A distribution can be created by an arbitrary amount of users / companies in private or in public, see the definition above +- A distribution can be created by an arbitrary amount of users / companies in private or in public by deriving from Kubeflow platform/manifests, see the definition above - Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases. ## Roles and Organization Management @@ -66,10 +66,10 @@ this repository throughout the years are the following: 1. Being involved with the release team, since the [release process](https://github.com/kubeflow/community/tree/master/releases) is tightly intertwined with the manifests/platform repository 2. Testing methodologies (GitHub Actions) 3. Processes regarding the [experimental](https://github.com/kubeflow/manifests/blob/master/experimental) components -4. [Platform manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained irectly by Manifests WG (Istio, Knative, Cert Manager etc.) +4. [Platform manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained irectly by Manifests/Platform WG (Istio, Knative, Cert Manager etc.) 5. Community and health of the project -Root approvers, or Manifests WG Leads, are expected to have expertise and be able +Root approvers, or Manifests/Platform WG Leads, are expected to have expertise and be able to drive all the above areas. Root reviewers on the other hand are expected to have knowledge in all the above and have as a goal to grow into the approvers role by helping with reviews throughout the project. 
From 11f2ef42d727ad6739bc8682dcd9bbbf310e80be Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Tue, 18 Mar 2025 11:59:11 +0100 Subject: [PATCH 08/17] Update charter.md Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- wg-manifests/charter.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/wg-manifests/charter.md b/wg-manifests/charter.md index 1bef0e323..30d9f45b1 100644 --- a/wg-manifests/charter.md +++ b/wg-manifests/charter.md @@ -1,22 +1,20 @@ -# WG Manifests/Platform Charter - -This charter adheres to the conventions, roles and organization management -outlined in [wg-governance]. +# WG Platform/Manifests Charter This platform/manifests charter describes the working mode / reality / status quo of the last 5 years as of March 2025. -It tries to balance community and commercial interests. +It tries to be as lean as possible and balance community and commercial interests. ## Scope -- Enable users to install, extend and maintain Kubeflow as a platform for multiple users +- Enable users / distributions to install, extend and maintain Kubeflow as a multi-tenant platform for multiple users - This includes dependencies, security efforts and exemplary integration with popular tools and frameworks. - Synchronize the manifests (Helm, Kustomize) between working groups - We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...) - **We do not support a specific deployment tool (e.g., ArgoCD, Flux)** - The default installation shall not contain deep integration with external cloud services or closed source solutions, instead we aim for Kubernetes-native solutions and light authentication and authorization integration with external IDPs -- We provide hints and experimental examples how a user could integrate non-default external authentication (e.g. 
a company's Identity Provider) and popular non-default services on their own - There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installation: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ... - There is the evolving and not exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Noteboks, Kserve, Spark, ... + ## Communication Tasks ### With Application Owners @@ -25,8 +23,8 @@ It tries to balance community and commercial interests. - Aid the application owner regarding security best practices - Communicate with the application owner regarding releases and versioning -### With Distribution Owners -- Distributions are strongly opinionated derivatives of Kubeflow platform / manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ... +### With Users / Distribution Owners +- Distributions are strongly opinionated derivatives of Kubeflow platform/manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ... - A distribution can be created by an arbitrary amount of users / companies in private or in public by deriving from Kubeflow platform/manifests, see the definition above - Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases. @@ -54,7 +52,7 @@ the Manifests WG we only have Manifests WG Leads, which are the root approvers. The following sections will aim to define the requirements for someone to become a reviewer and an approver in the root OWNERS file (Manifests WG Lead). -### Manifests WG Lead Requirements +### Platform/Manifests WG Lead Requirements The requirements for someone to be a Lead come from the processes and work required to be done in this repository.
The main goal with having multiple Leads is to ensure From c82dc088701179be639d33ee5b60a7254f3f13bf Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Fri, 18 Jul 2025 12:44:57 +0200 Subject: [PATCH 09/17] Update proposals/831-helm-manifests/README.MD Co-authored-by: Andrey Velichkevich Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-manifests/README.MD | 1 - 1 file changed, 1 deletion(-) diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD index d91d49454..bd8976ce8 100644 --- a/proposals/831-helm-manifests/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -11,7 +11,6 @@ The demand for a Helm chart for a basic Kubeflow installation has increased. Giv Currently, because Kubeflow/manifests only contains Kustomize but no Helm manifests, many potential users and companies that require Helm manifests due to company processes/policies have to rely on third-party options. While these options are valuable, they require engagement with adjacent projects and communities and often common improvents done in one fork do not reach and benefit the whole community. -As a project, we must ensure that our Helm chart provides a quick and accessible way for users to deploy a complete Kubeflow platform and individual components, enabling them to manage their environments or adopt a vendor solution. Simplifying the Kubeflow deployment and customization lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common maintenance patterns. If support becomes burdensome, teams can obtain [commercial support](https://www.kubeflow.org/docs/started/support/). 
From 2724ad649780e45f801da5e4f42c79f1b92eb476 Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Fri, 18 Jul 2025 12:48:30 +0200 Subject: [PATCH 10/17] Update proposals/831-helm-manifests/README.MD Co-authored-by: Andrey Velichkevich Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-manifests/README.MD | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD index bd8976ce8..22bb51387 100644 --- a/proposals/831-helm-manifests/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -13,7 +13,7 @@ Currently, because Kubeflow/manifests only contains Kustomize but no Helm manife -Simplifying the Kubeflow deployment and customization lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common maintenance patterns. If support becomes burdensome, teams can obtain [commercial support](https://www.kubeflow.org/docs/started/support/). +Simplifying the deployment and customization of Kubeflow lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Kubeflow Trainer or Katib without reinventing common maintenance patterns. If support becomes burdensome, teams can obtain [commercial support](https://www.kubeflow.org/docs/started/support/). By making deployment easy, we attract more end users and foster collaboration with broader communities like PyTorch, improving our implementations in service of their users. 
From ecdeb7e6f23ddf6f7ce3efb62373cee7394611c9 Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Fri, 18 Jul 2025 12:50:36 +0200 Subject: [PATCH 11/17] Update README.MD Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-manifests/README.MD | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD index 22bb51387..438de20eb 100644 --- a/proposals/831-helm-manifests/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -2,7 +2,7 @@ ## Summary -Kubeflow manifests provide a fast way to deploy a minimal Kubeflow platform, with best-effort community support. For guaranteed assistance, users can opt for [commercial support](https://www.kubeflow.org/docs/started/support/). This approach extends to Helm support. Contributions and bug reports are encouraged, but no support will be guaranteed. The goal is to build a similar folder structure as we have for the current Kustomize manifests to rely on the existing CI/CD and have a signle source of truth. We also want to incorporate best-practices from Argo for Kubeflow Helm manifests. +Kubeflow manifests provide a fast way to deploy Kubeflow projects, with best-effort community support. For guaranteed assistance, users can opt for [commercial support](https://www.kubeflow.org/docs/started/support/). This approach extends to Helm support. Contributions and bug reports are encouraged, but no support will be guaranteed. The goal is to build a folder structure similar to the one we have for the current Kustomize manifests to rely on the existing CI/CD and have a single source of truth. We also want to incorporate best practices from Argo for Kubeflow Helm manifests.
## Motivation From b5e020ef165c254928de57f8a9c7bb8828f5aa53 Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Fri, 18 Jul 2025 12:51:33 +0200 Subject: [PATCH 12/17] Update proposals/831-helm-manifests/README.MD Co-authored-by: Andrey Velichkevich Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-manifests/README.MD | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD index 438de20eb..210fd9ec2 100644 --- a/proposals/831-helm-manifests/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -27,7 +27,7 @@ By making deployment easy, we attract more end users and foster collaboration wi ## Value to the Community - Streamline the deployment, upgrade, and rollback of the kubeflow installation process -- Provide another way to install Kubeflow to give the community more options, promoting flexibility and choice +- Provide another way to install Kubeflow projects to give the community more options, promoting flexibility and choice ## Goals From 6f69f90b337df5f09424c53789c52ba6bb207bac Mon Sep 17 00:00:00 2001 From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Date: Fri, 18 Jul 2025 12:53:33 +0200 Subject: [PATCH 13/17] Update proposals/831-helm-manifests/README.MD Co-authored-by: Andrey Velichkevich Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --- proposals/831-helm-manifests/README.MD | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD index 210fd9ec2..1c4152638 100644 --- a/proposals/831-helm-manifests/README.MD +++ b/proposals/831-helm-manifests/README.MD @@ -32,8 +32,9 @@ By making deployment easy, we attract more end users and foster collaboration wi ## Goals -✅ A fully functional Kubeflow Helm chart 
for the targeted release. Users can install Kubeflow and subsets as a multi-tenant platform and also the individual projects as single-tenant without platform-level support.
-✅ The kustomize Manifests are still the single source of truth for the Helm manifests and they shall use the same CI/CD to make sure that they satisyfy the same requirements and the difference is negligible. Therefore they must be next to each other in the same repository for Kstomize and Helm manifests. After the initial PoC the Helm manifests should be upstreamed to the respective working groups and be maintained there and synchronized like the kustomize manifests.
+✅ Users can install Kubeflow projects in standalone and multi-tenant mode.
+✅ The kustomize Manifests are still the single source of truth for the Helm manifests and they shall use the same CI/CD to make sure that they satisfy the same requirements and the difference is negligible. Therefore they must be next to each other in the same repository for Kustomize and Helm manifests. After the initial PoC the Helm manifests should be upstreamed to the respective working groups and be maintained there and synchronized like the kustomize manifests.
+✅ Provide Helm Charts for Individual Kubeflow Tools
 ✅ Published Helm chart documentation with straightforward and uncomplicated configuration options.
 ✅ Contribution to the Kubeflow community effort for Helm-based installation as part of the official Kubeflow repository.
 ✅ Where possible, rely on upstream Helm charts (i.e., Istio) instead of translating our own Kustomize manifests again.
From 3fbafcaab3fefe468cb122745de35a42cebbb9a8 Mon Sep 17 00:00:00 2001
From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Date: Fri, 18 Jul 2025 12:55:06 +0200
Subject: [PATCH 14/17] Update proposals/831-helm-manifests/README.MD

Co-authored-by: Andrey Velichkevich
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
---
 proposals/831-helm-manifests/README.MD | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD
index 1c4152638..7c21b3253 100644
--- a/proposals/831-helm-manifests/README.MD
+++ b/proposals/831-helm-manifests/README.MD
@@ -50,7 +50,7 @@ By making deployment easy, we attract more end users and foster collaboration wi
 ## Proposal

-This proposal introduces official Helm chart support for deploying Kubeflow. The goal is to provide a modular, community-maintained method for installing and managing Kubeflow, making it more accessible for users who prefer Helm over Kustomize-based manifests.
+This proposal introduces multiple official Helm Charts to deploy Kubeflow projects. The goal is to provide a modular, community-maintained method for installing and managing Kubeflow, making it more accessible for users who prefer Helm over Kustomize-based manifests.

 ## Desired Outcome

From 77bd29977be88725527c3d8b14cc36f11a36ca85 Mon Sep 17 00:00:00 2001
From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Date: Fri, 18 Jul 2025 12:57:29 +0200
Subject: [PATCH 15/17] Update proposals/831-helm-manifests/README.MD

Co-authored-by: Andrey Velichkevich
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
---
 proposals/831-helm-manifests/README.MD | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD
index 7c21b3253..419130d1a 100644
--- a/proposals/831-helm-manifests/README.MD
+++ b/proposals/831-helm-manifests/README.MD
@@ -72,7 +72,7 @@ The Helm chart will allow users to:

 ### Adoption Metrics

-* Number of Helm chart downloads from the official Kubeflow repository.
+* Number of Helm chart downloads from the official Kubeflow repositories.
 * Community contributions to Helm chart improvements.
 * Ease of Use and Community Engagement.
 * Successful deployments reported by users via GitHub issues, Slack, and forums.

From f93bbc68015083b2041f08049c38b492d0a17810 Mon Sep 17 00:00:00 2001
From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Date: Fri, 18 Jul 2025 12:58:59 +0200
Subject: [PATCH 16/17] Update proposals/831-helm-manifests/README.MD

Co-authored-by: Andrey Velichkevich
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
---
 proposals/831-helm-manifests/README.MD | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/831-helm-manifests/README.MD b/proposals/831-helm-manifests/README.MD
index 419130d1a..77cefe549 100644
--- a/proposals/831-helm-manifests/README.MD
+++ b/proposals/831-helm-manifests/README.MD
@@ -83,7 +83,7 @@ The Helm chart will allow users to:

 ### Modularity and Customization

-* Verified Helm installations of the platform and individual components.
+* Verified Helm installations of the Kubeflow projects.
 * Flexibility demonstrated in community-reported use cases (e.g., deploying only the Training Operator).

From 54ae533976ee55a68fea1ae420c104a35cf2d252 Mon Sep 17 00:00:00 2001
From: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Date: Fri, 18 Jul 2025 13:13:30 +0200
Subject: [PATCH 17/17] Update charter.md revert to continue in the manifests charter PR.

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
---
 wg-manifests/charter.md | 114 ++++++++++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/wg-manifests/charter.md b/wg-manifests/charter.md
index 30d9f45b1..afbbe1f8d 100644
--- a/wg-manifests/charter.md
+++ b/wg-manifests/charter.md
@@ -1,32 +1,70 @@
-# WG Platform/Manifests Charter
+# WG Manifests Charter

-This platform/manifests charter describes the working mode / reality / status quo of the last 5 years as of March 2025.
-It tries to be as lean as possible and balance community and commercial interests.
+This charter adheres to the conventions, roles and organization management
+outlined in [wg-governance].

 ## Scope

-- Enable users / distributions to install, extend and maintain Kubeflow as a multi-tenant platform for multiple users
-- This includes dependencies, security efforts and exemplary integration with popular tools and frameworks.
-- Synchronize the manifests (Helm, Kustomize) between working groups
-- We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...)
-- **We do not support a specific deployment tool (e.g., ArgoCD, Flux)**
-- The default installation shall not contain deep integration with external cloud services or closed source solutions, instead we aim for Kubernetes-native solutions and light authentication and authorization integration with external IDPs
-- We provide hints and experimental examples how a user / distribution could integrate non-default external authentication (e.g. companies Identity Provider) and popular non-default services on his own
-- There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installation: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ...
-- There is the evolving and not exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Noteboks, Kserve, Spark, ...
-
-## Communication Tasks
-
-### With Application Owners
-
-- Aid the application owner in creating manifests (Helm, Kustomize) for his application
-- Aid the application owner regarding security best practices
-- Communicate with the application owner regarding releases and versioning
-
-### With Users / Distribution Owners
-- Distributions are strongly opinionated derivatives of Kubeflow platform/manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ...
-- A distribution can be created by an arbitrary amount of users / companies in private or in public by deriving from Kubeflow platform/manifests, see the definition above
-- Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases.
+- Provide a catalog (centralized repository) of Kubeflow application manifests.
+- Provide a catalog of third-party apps for common services.
+
+### In scope
+
+#### Code, Binaries and Services
+
+- Maintain tooling to automate copying manifests from upstream app repos.
+- Maintain a catalog that will allow users to install Kubeflow apps and
+  common services easily on Kubernetes, either on the cloud or on-prem, without
+  depending on external cloud services or closed source solutions. Those
+  manifests are deployed using `kubectl` and `kustomize` and include:
+  1. A common set of manifests for the current official Kubeflow applications:
+     - Training Operators
+     - Kubeflow Pipelines (KFP)
+     - Notebooks
+     - KFServing
+     - Katib
+     - Central Dashboard
+     - Profile Controller
+     - PodDefaults Controller
+  1. Manifests for a set of specific common services:
+     - Istio
+     - KNative
+     - Dex
+     - Cert-Manager
+
+#### Cross-cutting and Externally Facing Processes
+
+##### With Application Owners
+
+- Aid applications owners in creating kustomize manifests for their application,
+  inside the app repo, if those don't exist already.
+- Communicate with application owners to agree upon the version they want to be
+  included in the next Kubeflow release.
+
+##### With Distribution Owners
+
+- Coordinate with distribution owners, to make sure they are in-sync about the
+  release schedule and have time to test and bring their distributions
+  up-to-date.
+
+### Out of scope
+
+This WG is NOT going to:
+- Maintain deployment-specific tools like `kfctl`.
+- Maintain distribution-specific manifests.
+- Decide which applications to include in Kubeflow.
+- Decide which variant of an application to include (e.g., KFP Standalone vs
+  KFP with Istio).
+- Create and maintain one or more Kubeflow distributions.
+- Support configurations with environment-specific requirements, like special
+  hardware, different versions of third-party apps (e.g., Istio, KNative, etc.)
+  or custom OIDC providers.
+- Support and promote a specific deployment tool (e.g., `kfctl`). Opinionated
+  deployment tools can extend the base kustomizations to create manifests that
+  support their methods.
+  - For example, people invested in `kfctl` can create overlays that enable
+    the use of `kfctl`'s parameter substitution, which expects a specific
+    folder structure (`params.env`).

 ## Roles and Organization Management

@@ -38,13 +76,13 @@ The positions of the Chairs and TLs are granted to the organizations and compani
 Kubeflow's [governance model](https://github.com/kubeflow/community/blob/master/wgs/wg-governance.md) includes a plethora of different leadership roles. This section aims to provide a clear description of what these roles mean for
-this repository, as well as set expectations from people with these roles and requirements
+this repo, as well as set expectations from people with these roles and requirements
 for people to be promoted in a role.

 A Working Group lead is considered someone that has either the role of **Subproject Owner**, **Tech Lead** or **Chair**. These roles were defined by trying
-to provide different responsibility levels for repository owners. For the Manifests WG
-we would like to start by treating *approvers* in the root [OWNERS](https://github.com/kubeflow/manifests/blob/master/OWNERS),
+to provide different responsibility levels for repo owners. For the Manifests WG
+we'd like to start by treating *approvers* in the root [OWNERS](https://github.com/kubeflow/manifests/blob/master/OWNERS),
 as Subproject Owners, Tech Leads and Chairs. This is done to ensure we have a simple enough model to start that people can understand and get used to. So for the Manifests WG we only have Manifests WG Leads, which are the root approvers.
@@ -52,22 +90,22 @@ the Manifests WG we only have Manifests WG Leads, which are the root approvers.

 The following sections will aim to define the requirements for someone to become a reviewer and an approver in the root OWNERS file (Manifests WG Lead).

-### Platform/Manifests WG Lead Requirements
+### Manifests WG Lead Requirements

 The requirements for someone to be a Lead come from the processes and work required
-to be done in this repository. The main goal with having multiple Leads is to ensure
+to be done in this repo. The main goal with having multiple Leads is to ensure
 that in case there's an absence of one of the Leads the rest will be able to ensure
-the established processes and the health of the repository will be preserved.
+the established processes and the health of the repo will be preserved.

 With the above the main pillars of work and responsibilities that we've seen for
-this repository throughout the years are the following:
-1. Being involved with the release team, since the [release process](https://github.com/kubeflow/community/tree/master/releases) is tightly intertwined with the manifests/platform repository
-2. Testing methodologies (GitHub Actions)
-3. Processes regarding the [experimental](https://github.com/kubeflow/manifests/blob/master/experimental) components
-4. [Platform manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained irectly by Manifests/Platform WG (Istio, Knative, Cert Manager etc.)
+this repo throughout the years are the following:
+1. Being involved with the release team, since the [release process](https://github.com/kubeflow/community/tree/master/releases) is tightly intertwined with the manifests repo
+2. Testing methodologies (GitHub Actions, E2E testing with AWS resources etc)
+3. Processes regarding the [contrib/addon](https://github.com/kubeflow/manifests/blob/master/contrib) components
+4. [Common manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained by Manifests WG (Istio, Knative, Cert Manager etc)
 5.
Community and health of the project

-Root approvers, or Manifests/Platform WG Leads, are expected to have expertise and be able
+Root approvers, or Manifests WG Leads, are expected to have expertise and be able
 to drive all the above areas. Root reviewers on the other hand are expected to have knowledge in all the above and have as a goal to grow into the approvers role by helping with reviews throughout the project.
@@ -82,7 +120,7 @@ role by helping with reviews throughout the project.
 The goal of the requirements is to quantify the main pillars that we documented above. The high level reasoning is that approvers should have lead efforts and
-have expertise in the different processes and artefacts maintained in this repository
+have expertise in the different processes and artefacts maintained in this repo
 as well as be invested in the community of the WG.

 * Need to be a root reviewer