Draft
38 commits
4737d65 feat: add kubeflow-ambient solution (NohaIhab, Feb 12, 2026)
acdfe1a update tox.ini (NohaIhab, Feb 12, 2026)
08dc643 configure cilium for canonical k8s only (NohaIhab, Feb 12, 2026)
11597da exclude istio-pilot and istio-gateway from new module (NohaIhab, Feb 12, 2026)
2170246 skip: fix env (NohaIhab, Feb 12, 2026)
f5be8ef use juju_application for service mesh charms (NohaIhab, Feb 12, 2026)
b1079dd skip: remove feast related vars (NohaIhab, Feb 12, 2026)
9e97dc4 skip: use edge risk for testing (NohaIhab, Feb 12, 2026)
82cb605 set knative channel and branch to 1.16 (NohaIhab, Feb 12, 2026)
3d92472 skip: fix platform config (NohaIhab, Feb 12, 2026)
7122539 set kserve mode to rawdeployment (NohaIhab, Feb 13, 2026)
72b0fcc update istio service name to ambient one (NohaIhab, Feb 13, 2026)
fa691d1 skip bundle correctness check (NohaIhab, Feb 13, 2026)
9509631 skip bundle correctness for microk8s (NohaIhab, Feb 13, 2026)
372205e add missing integrations and configs (NohaIhab, Feb 13, 2026)
1d4e250 remove model-on-mesh config (NohaIhab, Feb 13, 2026)
b51ee2f add trainer-beacon relation to integrations (NohaIhab, Feb 16, 2026)
6aace9d include trainer tests with ambient (NohaIhab, Feb 19, 2026)
9848c08 Merge branch 'track/1.10' into kf-8410-feat-add-kubeflow-ambient-solu… (NohaIhab, Feb 19, 2026)
4b2c659 remove deprecated knative configs (NohaIhab, Feb 19, 2026)
3711ec3 remove knative charms from kubeflow-ambient solution (NohaIhab, Feb 19, 2026)
707b930 deploy trainer v2 (NohaIhab, Feb 19, 2026)
fe181f1 tmp: skip training-integration for ambient (NohaIhab, Feb 20, 2026)
209c64f tmp: run only ambient tests (NohaIhab, Feb 20, 2026)
c14bb3c test: change UATs branch to test (NohaIhab, Feb 24, 2026)
e3d2221 include training-operator UATs (NohaIhab, Feb 24, 2026)
392fef2 debug: add tmate session (NohaIhab, Feb 25, 2026)
9cac909 tmp: skip tests not ambient (NohaIhab, Feb 25, 2026)
1bb0bde skip: run all tests (NohaIhab, Feb 25, 2026)
cc5c27c remove tmate sessions (NohaIhab, Feb 25, 2026)
896a770 update kubeflow-profiles config (NohaIhab, Mar 6, 2026)
2902686 undo branch pin (NohaIhab, Mar 6, 2026)
5139993 skip running spark (NohaIhab, Mar 6, 2026)
f95b63e run uats from main for ambient (NohaIhab, Mar 6, 2026)
c4d21c3 remove --bundle arg (NohaIhab, Mar 6, 2026)
2cf3ea3 pin to latest/edge and add README.md (NohaIhab, Mar 6, 2026)
c16b989 use uats branch for testing (NohaIhab, Mar 6, 2026)
7ca9d5f do not pass --risk for ambient (NohaIhab, Mar 6, 2026)
34 changes: 27 additions & 7 deletions .github/workflows/deploy-to-k8s.yaml
@@ -36,13 +36,18 @@ jobs:
 uats:
 env: feast-remote
 extra-args: ""
-- module: kubeflow-spark
-tf-vars-file: examples/tfvars.json
-risk: edge # TODO: Run Spark tests with Kubeflow from `stable` risk once all changes for Spark integration are released to stable
-uats_branch: main # TODO: Run Spark tests against ${{ inputs.uats_branch }} once UATs backported to track/1.x
+# - module: kubeflow-spark
+# tf-vars-file: examples/tfvars.json
+# risk: edge # TODO: Run Spark tests with Kubeflow from `stable` risk once all changes for Spark integration are released to stable
+# uats_branch: main # TODO: Run Spark tests against ${{ inputs.uats_branch }} once UATs backported to track/1.x
+# uats:
+# env: spark-remote
+# extra-args: --test-image ghcr.io/canonical/charmed-spark-jupyterlab:3.5-22.04_edge@sha256:72a6e89985e35e0920fb40c063b3287425760ebf823b129a87143d5ec0e99af7 --bundle ""
+- module: kubeflow-ambient
+uats_branch: fix-increase-katib-experiment-timeout
 uats:
-env: spark-remote
-extra-args: --test-image ghcr.io/canonical/charmed-spark-jupyterlab:3.5-22.04_edge@sha256:72a6e89985e35e0920fb40c063b3287425760ebf823b129a87143d5ec0e99af7 --bundle ""
+env: kubeflow-remote
+extra-args: --include-kubeflow-trainer-tests
 
 steps:
 - name: Checkout repository
@@ -103,6 +108,14 @@ jobs:
 sudo concierge prepare --trace
 cd ..
 
+- name: Configure Cilium for Canonical K8s
+if: matrix.bundle.module == 'kubeflow-ambient'
+run: |
+# Configure Cilium for Canonical K8s to work with Charmed Istio (Ambient mode)
+# See https://canonical-service-mesh-documentation.readthedocs-hosted.com/en/latest/how-to/use-charmed-istio-with-canonical-kubernetes/
+kubectl -n kube-system patch configmap cilium-config --type merge --patch '{"data":{"bpf-lb-sock-hostns-only":"true"}}'
+kubectl -n kube-system rollout restart daemonset cilium
+
 - name: Setup terraform
 run: |
 sudo snap install terraform --channel latest/stable --classic
@@ -119,7 +132,14 @@ jobs:
 TF_VARS_ARG="--tf-vars-file ${{ matrix.bundle.tf-vars-file }}"
 fi
 
-tox -c ./modules/${{ matrix.bundle.module }} -e test_deployment -- -vv -s --risk ${{ env.RISK }} $TF_VARS_ARG
+ISTIO_PLATFORM_ARG=""
+RISK_ARG="--risk ${{ env.RISK }}"
+if [[ "${{ matrix.bundle.module }}" == "kubeflow-ambient" ]]; then
+ISTIO_PLATFORM_ARG='--istio-k8s-platform'
+RISK_ARG=""
+fi
+
+tox -c ./modules/${{ matrix.bundle.module }} -e test_deployment -- -vv -s $RISK_ARG $TF_VARS_ARG $ISTIO_PLATFORM_ARG
 
 - name: Run UATs
 run: |
29 changes: 21 additions & 8 deletions .github/workflows/deploy-to-microk8s.yaml
@@ -37,14 +37,19 @@ jobs:
 uats:
 env: feast-remote
 extra-args: ""
-- module: kubeflow-spark
-risk: edge # TODO: Run Spark tests with Kubeflow from `stable` risk once all changes for Spark integration are released to stable
-uats_branch: main # TODO: Run Spark tests against ${{ inputs.uats_branch }} once UATs backported to track/1.x
-juju-agent-version: 3.6.9 # TODO: remove pin once Spark can work on recent Juju agent versions
-tf-vars-file: examples/tfvars.json
+# - module: kubeflow-spark
+# risk: edge # TODO: Run Spark tests with Kubeflow from `stable` risk once all changes for Spark integration are released to stable
+# uats_branch: main # TODO: Run Spark tests against ${{ inputs.uats_branch }} once UATs backported to track/1.x
+# juju-agent-version: 3.6.9 # TODO: remove pin once Spark can work on recent Juju agent versions
+# tf-vars-file: examples/tfvars.json
+# uats:
+# env: spark-remote
+# extra-args: --test-image ghcr.io/canonical/charmed-spark-jupyterlab:3.5-22.04_edge@sha256:72a6e89985e35e0920fb40c063b3287425760ebf823b129a87143d5ec0e99af7 --bundle ""
+- module: kubeflow-ambient
+uats_branch: fix-increase-katib-experiment-timeout
 uats:
-env: spark-remote
-extra-args: --test-image ghcr.io/canonical/charmed-spark-jupyterlab:3.5-22.04_edge@sha256:72a6e89985e35e0920fb40c063b3287425760ebf823b129a87143d5ec0e99af7 --bundle ""
+env: kubeflow-remote
+extra-args: --include-kubeflow-trainer-tests
 pod-security-standards:
 - policy: privileged
 istio-cni-bin-dir: "" # meaning Istio CNI disabled
@@ -158,7 +163,15 @@ jobs:
 TF_VARS_ARG="--tf-vars-file ${{ matrix.bundle.tf-vars-file }}"
 fi
 
-tox -c ./modules/${{ matrix.bundle.module }} -e test_deployment -- -vv -s --risk ${{ env.RISK }} --istio-cni-bin-dir ${{ matrix.pod-security-standards.istio-cni-bin-dir }} --istio-cni-conf-dir ${{ matrix.pod-security-standards.istio-cni-conf-dir }} --pss ${{ matrix.pod-security-standards.policy }} $TF_VARS_ARG
+CNI_ARGS=""
+RISK_ARG="--risk ${{ env.RISK }}"
+if [[ "${{ matrix.bundle.module }}" != "kubeflow-ambient" ]]; then
+CNI_ARGS="--istio-cni-bin-dir ${{ matrix.pod-security-standards.istio-cni-bin-dir }} --istio-cni-conf-dir ${{ matrix.pod-security-standards.istio-cni-conf-dir }}"
+else
+RISK_ARG=""
+fi
+
+tox -c ./modules/${{ matrix.bundle.module }} -e test_deployment -- -vv -s $RISK_ARG $CNI_ARGS --pss ${{ matrix.pod-security-standards.policy }} $TF_VARS_ARG
 
 - name: Run UATs
 run: |
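Both workflow diffs above choose the `test_deployment` arguments per module through inline string variables. A minimal Bash sketch of the same selection follows, gathered into one function so the ambient-specific rules sit in one place. The function name `build_test_args`, its positional parameters, and the global array `TEST_ARGS` are hypothetical and not part of this PR; the sketch also assumes, as the partially shown `TF_VARS_ARG` block suggests, that `--tf-vars-file` is only added when the matrix entry sets `tf-vars-file`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restates the per-module argument selection that
# deploy-to-k8s.yaml and deploy-to-microk8s.yaml implement inline.
# Usage: build_test_args PLATFORM MODULE RISK TF_VARS_FILE PSS CNI_BIN_DIR CNI_CONF_DIR
# The result is stored in the global array TEST_ARGS.
build_test_args() {
  local platform="$1" module="$2" risk="$3" tf_vars_file="$4"
  local pss="$5" cni_bin_dir="$6" cni_conf_dir="$7"
  TEST_ARGS=(-vv -s)

  # Both workflows clear RISK_ARG for kubeflow-ambient
  if [[ "$module" != "kubeflow-ambient" ]]; then
    TEST_ARGS+=(--risk "$risk")
  fi

  if [[ "$platform" == "microk8s" ]]; then
    # Istio CNI directories are passed only for non-ambient modules
    if [[ "$module" != "kubeflow-ambient" ]]; then
      TEST_ARGS+=(--istio-cni-bin-dir "$cni_bin_dir" --istio-cni-conf-dir "$cni_conf_dir")
    fi
    # The pod security standard is passed for every module on MicroK8s
    TEST_ARGS+=(--pss "$pss")
  fi

  if [[ -n "$tf_vars_file" ]]; then
    TEST_ARGS+=(--tf-vars-file "$tf_vars_file")
  fi

  # On Canonical K8s, the ambient module additionally gets --istio-k8s-platform
  if [[ "$platform" == "k8s" && "$module" == "kubeflow-ambient" ]]; then
    TEST_ARGS+=(--istio-k8s-platform)
  fi
}
```

A step could then run `tox -c ./modules/<module> -e test_deployment -- "${TEST_ARGS[@]}"`. Quoting the array expansion keeps each value a single argument. Unlike the unquoted `$CNI_ARGS` in the workflow, an empty CNI directory is passed as an explicit empty argument instead of disappearing, so confirm the test's option parser accepts that before adopting this approach.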
86 changes: 86 additions & 0 deletions modules/kubeflow-ambient/README.md
@@ -0,0 +1,86 @@
# Charmed Kubeflow Ambient Terraform solution

> **⚠️ WARNING: EXPERIMENTAL**
>
> This solution is experimental and currently deploys charms from the `latest/edge` channel. It is **NOT recommended for production use**.

This is a Terraform module that deploys Charmed Kubeflow in ambient mode using the [Terraform Juju provider](https://github.com/juju/terraform-provider-juju/). For more information, refer to the provider [documentation](https://registry.terraform.io/providers/juju/juju/latest/docs).

This solution provides an alternative deployment of Charmed Kubeflow that uses Charmed Istio in ambient (sidecar-less) mode as its service mesh. It includes all core Kubeflow components along with KServe for model serving.

## API

### Inputs
The solution module offers the following configurable inputs:

| Name | Type | Description | Required |
| - | - | - | - |
| `<charm_name>_revision`| number | For each charm of the solution, the revision of the charm to deploy | False |
| `argo_controller_bucket`| string | The name of the bucket to be used by Argo controller in the object store | False |
| `create_model` | bool | Allows skipping Juju model creation in order to re-use a model created in a higher-level module. If the re-used model is created by Terraform, make sure the current module depends on that resource using the `depends_on` meta-argument. | False |
| `cos_configuration`| bool | Boolean value that enables COS configuration | False |
| `dex_connectors`| string | dex-auth connectors in YAML format | False |
| `dex_static_username`| string | dex-auth static username | False |
| `dex_static_password`| string | dex-auth static password | False |
| `existing_opentelemetry_collector_name`| string | Name of an existing opentelemetry-collector-k8s deployment | False |
| `opentelemetry_collector_k8s_size`| string | OpenTelemetry collector storage size | False |
| `http_proxy`| string | Value of the http_proxy environment variable | False |
| `https_proxy`| string | Value of the https_proxy environment variable | False |
| `istio_k8s_platform`| string | Platform for istio-k8s | False |
| `jupyter_ui_config`| map(string) | Map of config values passed to jupyter-ui | False |
| `katib_db_size`| string | Katib database storage size | False |
| `kfp_api_object_store_bucket_name`| string | The name of the bucket to be used by KFP API in the object store | False |
| `kfp_db_size`| string | KFP database storage size | False |
| `kubeflow_dashboard_registration_flow`| string | Whether to enable the registration flow on sign-in for kubeflow-dashboard | False |
| `kubeflow_profiles_security_policy`| string | Security policy for pod security standards enforced in user workloads. Only `privileged` and `baseline` are supported | False |
| `kubeflow_trainer_v2`| bool | Boolean value that enables deployment of Kubeflow Trainer V2 (experimental) | False |
| `minio_access_key`| string | MinIO access key | False |
| `minio_gateway_storage_service`| string | Gateway storage service configuration for MinIO when in 'gateway' mode | False |
| `minio_mode`| string | MinIO mode, either 'server' or 'gateway' | False |
| `minio_secret_key`| string | MinIO secret key | False |
| `minio_size`| string | MinIO storage size | False |
| `minio_storage_service_endpoint`| string | MinIO storage service endpoint, required if minio_mode is 'gateway' | False |
| `mlmd_size`| string | MLMD database storage size | False |
| `no_proxy`| string | Value of the no_proxy environment variable | False |
| `oidc_gatekeeper_ca_bundle`| string | Custom CA to be trusted by OIDC gatekeeper | False |
| `public_url`| string | Public URL of Kubeflow for auth/OIDC | False |
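Several inputs can be supplied together through a Terraform variables file instead of repeated `-var` flags. The Bash sketch below writes such a file only after checking two constraints stated in the table: `kubeflow_profiles_security_policy` accepts only `privileged` or `baseline`, and `minio_storage_service_endpoint` is required when `minio_mode` is `gateway`. The function name `write_kubeflow_tfvars`, the output file name, and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a few inputs from the table above and write
# them to a Terraform variables file (HCL "name = value" syntax).
# Usage: write_kubeflow_tfvars OUT_FILE SECURITY_POLICY MINIO_MODE [MINIO_ENDPOINT]
write_kubeflow_tfvars() {
  local out="$1" policy="$2" minio_mode="$3" minio_endpoint="${4:-}"

  # Only privileged and baseline are supported for user workloads
  case "$policy" in
    privileged|baseline) ;;
    *) echo "unsupported kubeflow_profiles_security_policy: $policy" >&2; return 1 ;;
  esac

  # minio_mode is either server or gateway; gateway needs an endpoint
  case "$minio_mode" in
    server) ;;
    gateway)
      if [[ -z "$minio_endpoint" ]]; then
        echo "minio_storage_service_endpoint is required when minio_mode is gateway" >&2
        return 1
      fi
      ;;
    *) echo "unsupported minio_mode: $minio_mode" >&2; return 1 ;;
  esac

  # Write the file only after all checks have passed
  {
    printf 'kubeflow_profiles_security_policy = "%s"\n' "$policy"
    printf 'minio_mode = "%s"\n' "$minio_mode"
    if [[ -n "$minio_endpoint" ]]; then
      printf 'minio_storage_service_endpoint = "%s"\n' "$minio_endpoint"
    fi
  } > "$out"
}
```

For example, `write_kubeflow_tfvars kubeflow.tfvars baseline server && terraform apply -var-file=kubeflow.tfvars`. Values containing double quotes or backslashes would need escaping, which this sketch does not do.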

### Outputs
Once applied, the solution module exports the following outputs:

| Name | Description |
| - | - |
| `dashboard_links_provider`| Map containing the `app_name` and `provides` endpoints of the kubeflow-dashboard charm |
| `kserve_controller`| Map containing the `app_name`, `provides` and `requires` endpoints of the kserve-controller charm |
| `opentelemetry_collector_k8s`| Map containing the `app_name`, `provides` and `requires` endpoints of the opentelemetry-collector-k8s charm used |
| `model`| Model name that Charmed Kubeflow is deployed on |

## Usage

This solution module is intended to be used either on its own or as part of a higher-level module.

### Model
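This solution always deploys Charmed Kubeflow to a model named `kubeflow`, since Charmed Kubeflow cannot be deployed in a model with a different name. Unless `create_model` is set to `false`, the module creates this model itself.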

### COS configuration

#### Enable COS configuration
The `cos_configuration` input makes the solution integrate Charmed Kubeflow with COS. It does so by deploying an `opentelemetry-collector-k8s` charm and adding all the required relations.
```shell
terraform apply -var cos_configuration=true
```

#### Use an existing opentelemetry-collector-k8s
If an opentelemetry-collector-k8s application already exists in the `kubeflow` model, it can be used instead of deploying a new one. To do so, set the `existing_opentelemetry_collector_name` input to that application's name. By default, its value is `null`.
```shell
terraform apply -var cos_configuration=true -var existing_opentelemetry_collector_name="dummy-opentelemetry-collector"
```
> :warning: Setting this input without `cos_configuration` will not have any effect.
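When `terraform apply` is scripted, the rule in the warning above can be checked before running it. A minimal Bash sketch: the hypothetical function `cos_var_flags` collects the `-var` flags used in the examples above into an array and prints a warning on stderr when an existing collector name is given without enabling `cos_configuration`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the COS-related -var flags for `terraform apply`.
# Usage: cos_var_flags ENABLE_COS [EXISTING_COLLECTOR_NAME]
# The result is stored in the global array COS_FLAGS.
cos_var_flags() {
  local enable_cos="$1" existing_collector="${2:-}"
  COS_FLAGS=()

  if [[ "$enable_cos" == "true" ]]; then
    COS_FLAGS+=(-var "cos_configuration=true")
    # Re-use an existing collector only when COS configuration is enabled
    if [[ -n "$existing_collector" ]]; then
      COS_FLAGS+=(-var "existing_opentelemetry_collector_name=$existing_collector")
    fi
  elif [[ -n "$existing_collector" ]]; then
    # Mirrors the warning above: the name alone has no effect
    echo "warning: existing_opentelemetry_collector_name is ignored without cos_configuration=true" >&2
  fi
}
```

For example, `cos_var_flags true dummy-opentelemetry-collector && terraform apply "${COS_FLAGS[@]}"` produces the same flags as the second command above. Keeping the flags in an array preserves each `-var` value as a single argument.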

### Kubeflow Trainer V2

#### Enable Kubeflow Trainer V2 (Experimental)
The `kubeflow_trainer_v2` input makes the solution deploy the Kubeflow Trainer V2 charm and all the required resources.
```shell
terraform apply -var kubeflow_trainer_v2=true
```