Description
/kind feature
Enable builds & releases for IBM Power (ppc64le architecture). This proposal was presented with these slides at the 2022-10-25 Kubeflow community call with positive community feedback. We also created this design documentation: https://docs.google.com/document/d/1nGUvLonahoLogfWCHsoUOZl-s77YtPEiCjWBVlZjJHo/edit?usp=sharing
Why you need this feature:
- Widen scope of possible on-premises deployments (vanilla Kubernetes & OpenShift on Power)
- More general independence regarding processor architecture (x86, ppc64le, arm, …)
- Unified container builds
Describe the solution you'd like:
- Upstreaming changes that allow building Dockerfiles for multiple architectures (starting with x86 & ppc64le)
- Upstreaming CI integration for multi-arch builds (starting with x86 & ppc64le)
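As a rough illustration of what a multi-arch build step could look like, the sketch below assembles a `docker buildx build` invocation for the two initial target platforms. It assumes Docker Buildx is the build tool; the actual CI workflows in the PRs linked below may be set up differently. The helper name `buildx_command` and the image reference are hypothetical.

```python
import shlex

# Platforms targeted first by this proposal (x86 is "linux/amd64" in Docker terms).
PLATFORMS = ["linux/amd64", "linux/ppc64le"]


def buildx_command(image, context=".", platforms=PLATFORMS, push=False):
    """Return the argv list for a multi-platform `docker buildx build` call.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "--tag", image,
    ]
    if push:
        # Multi-platform results are usually pushed to a registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # "example.org/demo/component:dev" is a placeholder image reference.
    print(shlex.join(buildx_command("example.org/demo/component:dev")))
```

Keeping the platform list in one place means a further architecture (for example arm) could be added later without touching the rest of the build step.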
We currently plan to divide our efforts into multiple phases:
1. low-hanging "easy" integrations where no or only minor code changes are needed; excluding KFP; Kubeflow 1.7 release scope (✅ done),
2. same as 1. but now including additional KServe components for model serving; Kubeflow 1.8 release scope,
3. same as 1. but now including KFP; Kubeflow 1.9 release scope,
4. more complex integrations where external dependencies on Python wheels exist.
Below is a detailed overview of each required integration, including links to associated PRs if those already exist.
Phase 1 Integrations (Kubeflow 1.7 scope)
- Poddefaults (Admission) Webhook: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding support for linux-ppc64le in CI for admission-webhook multi-arch docker image kubeflow#6803
  🚀 https://hub.docker.com/r/kubeflownotebookswg/poddefaults-webhook/tags
- Central Dashboard: Updating central-board Dockerfile for multi-arch support kubeflow#6861, Adding support for linux/ppc64le in CI for centraldashboard multi-arc… kubeflow#6923
  🚀 https://hub.docker.com/r/kubeflownotebookswg/centraldashboard/tags
- Jupyter Web App: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding support for linux/ppc64le in CI for jupyter-web-app multi-arch… kubeflow#6800
  🚀 https://hub.docker.com/r/kubeflownotebookswg/jupyter-web-app/tags
- KServe: Agent: removed arch dependency for multiarc support kserve/kserve#2476, Adding support for linux/ppc64le arch in github action for kserve-agent kserve/kserve#2549
  🚀 https://hub.docker.com/r/kserve/agent/tags
- KServe: Controller: removed arch dependency for multiarc support kserve/kserve#2476, Adding support for linux/ppc64le in github action for kserve-controller kserve/kserve#2550
  🚀 https://hub.docker.com/r/kserve/kserve-controller/tags
- KServe: Models Web App: updated base images for multiple arch support kserve/models-web-app#45, Adding support for linux-ppc64le in CI for models-web-app kserve/models-web-app#55
  🚀 https://hub.docker.com/r/kserve/models-web-app/tags
- KServe: QPExt: Adding multi-arch support for linux-ppc64le for qpext kserve/kserve#2604
  🚀 https://hub.docker.com/r/kserve/qpext/tags
- KServe: Router: Adding multi-arch support for linux-ppc64le for router kserve/kserve#2605
  🚀 https://hub.docker.com/r/kserve/router/tags
- MPI Operator: Adding mpi-operator workflow to release multi-arch docker image mpi-operator#489
  🚀 https://hub.docker.com/r/mpioperator/mpi-operator/tags
- Notebook Controller: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding multi-arch support for linux-ppc64le in CI for notebook-controller kubeflow#6771
  🚀 https://hub.docker.com/r/kubeflownotebookswg/notebook-controller/tags
- Profiles + KFAM: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding support for linux-ppc64le in CI for kfam multi-arch docker image kubeflow#6785, Adding support for linux-ppc64le in CI for profile-controller kubeflow#6809
  🚀 https://hub.docker.com/r/kubeflownotebookswg/profile-controller/tags
  🚀 https://hub.docker.com/r/kubeflownotebookswg/kfam/tags
- Tensorboard Controller: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding support for linux/ppc64le in CI for tensorboard-controller multi-arch docker images. kubeflow#6805
  🚀 https://hub.docker.com/r/kubeflownotebookswg/notebook-controller/tags
- Tensorboard Web App: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding support for linux/ppc64le in CI for tensorboard-web-app multi-arch docker images. kubeflow#6810
  🚀 https://hub.docker.com/r/kubeflownotebookswg/tensorboards-web-app/tags
- Training Operator: Removed GOARCH dependency for multiarch support trainer#1674, Adding support for linux/ppc64le in github actions for training-operator trainer#1692
  🚀 https://hub.docker.com/r/kubeflow/training-operator/tags
- Volumes Web App: updated compatible base images & removed arch dependencies in different components for multiple arch support kubeflow#6650, Adding support for linux-ppc64le in CI to release multi-arch docker image volumes-web-app kubeflow#6811
  🚀 https://hub.docker.com/r/kubeflownotebookswg/volumes-web-app/tags
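
One way to check that a published tag from the list above really carries a ppc64le variant is to look at the platforms in its manifest list (OCI image index). The sketch below is only an illustration: it assumes the JSON document has already been fetched, for instance with `docker manifest inspect <image>`, and the helper name `platforms_in_index` and the sample data are made up.

```python
import json


def platforms_in_index(index_json):
    """Return the set of "os/arch" strings found in a manifest list / OCI image index."""
    index = json.loads(index_json)
    found = set()
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        os_name = platform.get("os")
        arch = platform.get("architecture")
        if os_name and arch:
            found.add(f"{os_name}/{arch}")
    return found


if __name__ == "__main__":
    # Trimmed, made-up sample; real documents also carry digests, sizes and media types.
    sample = json.dumps({
        "schemaVersion": 2,
        "manifests": [
            {"platform": {"architecture": "amd64", "os": "linux"}},
            {"platform": {"architecture": "ppc64le", "os": "linux"}},
        ],
    })
    print("linux/ppc64le" in platforms_in_index(sample))
```

A single-architecture image has no `manifests` array, so the helper returns an empty set for it rather than failing.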
Phase 2 Integrations (Kubeflow 1.9 scope)
- KServe: PMML Server
- KServe: AIX
- KServe: Alibi
- KServe: Art
- Triton Inference Server (external): feat: Added power support for python backend on ubuntu. triton-inference-server/server#8329
- Seldon: ML Server (external)
- PyTorch: TorchServe (external)
Phase 3 Integrations (Kubeflow 1.10 scope)
Note: KFP is currently blocked by kubeflow/pipelines#8660 / GoogleCloudPlatform/oss-test-infra#1972
- KFP: Application-CRD-Controller
- KFP: Argoexec
- KFP: Cache-Server
- KFP: Frontend: feat: Updated dockerfile to support Power pipelines#12125
- KFP: Metadata Envoy
- KFP: Persistence Agent
- KFP: Scheduled Workflow
- KFP: Workflow Controller
- KFP: Viewer-CRD-Controller
- KServe: LGB Server: blocked by buildx and ppc64le wheel pyca/cryptography#7723
- KServe: Paddle Server: blocked by buildx and ppc64le wheel pyca/cryptography#7723
- KServe: SKLearn Server: blocked by buildx and ppc64le wheel pyca/cryptography#7723
- KServe: XGB Server: blocked by buildx and ppc64le wheel pyca/cryptography#7723
- Katib: controller, db-manager, ui
- Katib: file-metrics-collector
- Katib: tfevent-metrics-collector
- Katib: suggestion-hyperopt: updated dockerfiles for grpc builds for powerPC compilation~ katib#2262, Updated dockerfiles for grpc installation for ppc64le katib#2290
- Katib: suggestion-chocolate
- Katib: suggestion-hyperband: updated dockerfiles for grpc builds for powerPC compilation~ katib#2262, Updated dockerfiles for grpc installation for ppc64le katib#2290
- Katib: suggestion-skopt: updated dockerfiles for grpc builds for powerPC compilation~ katib#2262, Updated dockerfiles for grpc installation for ppc64le katib#2290
- Katib: suggestion-goptuna
- Katib: suggestion-optuna: updated dockerfiles for grpc builds for powerPC compilation~ katib#2262, Updated dockerfiles for grpc installation for ppc64le katib#2290
- Katib: suggestion-enas
- Katib: suggestion-darts: updated dockerfiles for grpc builds for powerPC compilation~ katib#2262, Updated dockerfiles for grpc installation for ppc64le katib#2290
- Katib: suggestion-pbt: updated dockerfiles for grpc builds for powerPC compilation~ katib#2262, Updated dockerfiles for grpc installation for ppc64le katib#2290
- Katib: earlystopping-medianstop: Updated dockerfiles for grpc installation for ppc64le katib#2290
Phase 4 Integrations (Post Kubeflow 1.11 scope)
- KFP: Api Server
- KFP: Metadata Writer
- KFP: Visualization Server
- ml-metadata (KFP wheel dep.): Adding multi-arch support for linux/ppc64le in ml-metadata google/ml-metadata#171, [ppc64le] Added GCC-11 and Power support google/ml-metadata#218
- KServe: Storage Initializer: blocked by buildx and ppc64le wheel pyca/cryptography#7723
- OIDC Auth (external): Enable oidc-authservice repository CI for power(ppc64le) architecture. arrikto/oidc-authservice#104; on-hold as potentially irrelevant as of Kubeflow v1.8 (Move away from AuthService manifests#2469)