@ntny
Contributor

@ntny ntny commented Sep 22, 2025

Description of your changes:

POC for #12023

Changes:

  • I modified the Argo compiler in the API server: it now generates a workflow spec that uses the driver plugin instead of a driver container. The driver now runs as a server inside the agent.
  • I built modified images for the API server (to compile the new Argo workflow spec) and added the KFP driver server image (hosted by the executor plugin).
  • Added the necessary service accounts/tokens and additional rules, as required by the documentation.
  • Built the images from this branch and pushed them to docker.io (ntny/kfp-driver:central-driver-poc and ntny/kfp-api-server:central-driver-poc).
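
To make the first change concrete: with Argo's executor plugin mechanism, the agent POSTs each plugin template to a sidecar at `/api/v1/template.execute` and expects a JSON reply carrying a `node` result. The sketch below only illustrates that request/response shape. It is not the driver code from this PR; the `driver` plugin key, the `run_driver` helper, and the output parameter name are hypothetical, and the bearer-token check follows Argo's documented example plugin.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_driver(spec):
    """Hypothetical stand-in for the real driver logic (not from this PR)."""
    return {"parameters": [{"name": "condition", "value": "true"}]}


def handle_execute(args):
    """Build the reply for one template.execute request body."""
    plugin = args.get("template", {}).get("plugin", {})
    if "driver" not in plugin:
        # Empty reply: this plugin does not handle the template
        # (same convention as Argo's example plugin).
        return {}
    try:
        outputs = run_driver(plugin["driver"])
    except Exception as err:
        return {"node": {"phase": "Failed", "message": str(err)}}
    return {"node": {"phase": "Succeeded",
                     "message": "driver finished",
                     "outputs": outputs}}


def make_handler(token):
    """Create a request handler class bound to the expected bearer token."""
    class DriverPlugin(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.headers.get("Authorization") != "Bearer " + token:
                self.send_response(403)
                self.end_headers()
                return
            if self.path != "/api/v1/template.execute":
                self.send_response(404)
                self.end_headers()
                return
            length = int(self.headers.get("Content-Length", 0))
            args = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(handle_execute(args)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return DriverPlugin


def serve(port, token):
    """Block forever, serving plugin requests on the given port."""
    HTTPServer(("", port), make_handler(token)).serve_forever()
```

Calling `serve(port, token)` with the token the agent mounts into the sidecar would start the server. Keeping `handle_execute` as a pure function makes the reply logic easy to unit-test without a cluster.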

How to launch:

I built the images on an Apple M-series CPU (arm64). If you are on the same architecture, you can reuse the images from Docker Hub (ntny/kfp-driver:central-driver-poc and ntny/kfp-api-server:central-driver-poc); they are already referenced in the manifests in this branch.
If your architecture is different, build Dockerfile and Dockerfile.driver yourself from this branch and replace the images with yours here and here before following the rest of the instructions.

I use the platform-agnostic environment inside minikube (single-user).

  1. Go to the root of the project and run:
kubectl apply -k ./manifests/kustomize/cluster-scoped-resources
  2. Wait about 30 seconds, then run:
kubectl apply -k ./manifests/kustomize/env/platform-agnostic
  3. After all pods are running, add the following rule to the argo-role (required for the executor agent pod):
kubectl -n kubeflow patch role argo-role --type='json' -p='[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["argoproj.io"],
      "resources": ["workflowtasksets/status"],
      "verbs": ["get","list","watch","update","patch","delete","create"]
    }
  }
]'
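
The same patch can be generated rather than hand-quoted, which avoids shell-escaping mistakes in the inline JSON. This is a minimal sketch, assuming `kubectl` is on the PATH and pointed at the right cluster. The helper names are illustrative, and nothing is executed unless `apply_patch()` is called.

```python
import json
import subprocess


def build_role_patch():
    """JSON Patch (RFC 6902) that appends one rule to the Role's rules list."""
    return [{
        "op": "add",
        "path": "/rules/-",
        "value": {
            "apiGroups": ["argoproj.io"],
            "resources": ["workflowtasksets/status"],
            "verbs": ["get", "list", "watch", "update",
                      "patch", "delete", "create"],
        },
    }]


def patch_command(namespace="kubeflow", role="argo-role"):
    """Return the kubectl argv equivalent to the command above."""
    return [
        "kubectl", "-n", namespace, "patch", "role", role,
        "--type=json", "-p", json.dumps(build_role_patch()),
    ]


def apply_patch():
    """Run kubectl; raises CalledProcessError if the patch is rejected."""
    subprocess.run(patch_command(), check=True)
```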

Forward the UI port as usual:

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

I tested this POC with the preinstalled "[Tutorial] Data passing in Python components" pipeline. No drivers are created; the agent is used instead, and it is removed after the pipeline finishes.
[Screenshot, 2025-09-25 12:25:03]

Please note: this is just a POC and not a production-ready solution.

@google-oss-prow

Hi @ntny. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

🚫 This command cannot be processed. Only organization members or owners can use the commands.

@ntny force-pushed the central-driver-poc branch 3 times, most recently from 3388fc7 to 87883fa (September 22, 2025 19:12)
@ntny
Contributor Author

ntny commented Sep 22, 2025

/hold

@ntny force-pushed the central-driver-poc branch 5 times, most recently from de3b9d2 to 5c0ae07 (September 24, 2025 23:22)
@droctothorpe
Collaborator

This is EPIC, @ntny! Can't wait to try it out.

@ntny
Contributor Author

ntny commented Sep 27, 2025

/unhold

arpechenin added 3 commits September 27, 2025 17:40
- Modify Argo compiler: generate a plugin template instead of a container
- driver as a http server

Signed-off-by: arpechenin <[email protected]>
- add feature to regenerate all specs

Signed-off-by: arpechenin <[email protected]>
@ntny force-pushed the central-driver-poc branch from 4808c79 to c88c791 (September 30, 2025 16:23)
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ntny
Contributor Author

ntny commented Sep 30, 2025

Hi @HumairAK @droctothorpe, would you mind giving this a try?
It should be pretty straightforward to run the cluster with only the agent and no driver by following the instructions above.

@ntny
Contributor Author

ntny commented Sep 30, 2025

Hi! @nsingla I made intentional changes to the compiler, and manually updating all specs in test/compiled-workflow would be very time-consuming.
I’ve already used the following code on my side to regenerate specs directly from the test using a special flag (similar to snapshot tests) and then review the diff manually.
Do you have any concerns about this approach, given your experience with test code and test practices?

@nsingla
Contributor

nsingla commented Sep 30, 2025

> Hi! @nsingla I made intentional changes to the compiler, and manually updating all specs in test/compiled-workflow would be very time-consuming. I’ve already used the following code on my side to regenerate specs directly from the test using a special flag (similar to snapshot tests) and then review the diff manually. Do you have any concerns about this approach, given your experience with test code and test practices?

You don't need to update them manually; you can run the compiler tests locally with this flag:
ginkgo -v -- -updateCompiledFiles=true
This should update the workflows.
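
For context, the update-flag pattern discussed here works like a snapshot test: compile, compare against a checked-in golden file, and overwrite the file only when an explicit flag is set. The sketch below shows the idea in Python with hypothetical names (`check_golden`, the `update` argument). It is not the ginkgo suite in this repository.

```python
from pathlib import Path


def check_golden(golden_path, actual, update=False):
    """Compare `actual` with the golden file; rewrite the file when update=True.

    Returns True when the file matches (or was just rewritten), False otherwise.
    """
    path = Path(golden_path)
    if update:
        # Regenerate the snapshot; the resulting diff is reviewed in version control.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual)
        return True
    return path.exists() and path.read_text() == actual
```

With this shape, a normal run fails on any unexpected compiler output, while a run with the update flag rewrites every golden file so the changes can be reviewed as a single diff.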

@ntny force-pushed the central-driver-poc branch from 4fb9fd9 to 4bee799 (September 30, 2025 18:26)
@ntny force-pushed the central-driver-poc branch from 4bee799 to 1ee2602 (September 30, 2025 18:28)
@zazulam
Collaborator

zazulam commented Oct 1, 2025

/ok-to-test

@droctothorpe
Collaborator

Hey, @ntny. Unfortunately, I won't have bandwidth to validate it in the next two weeks, but I just wanted to let you know that it's on my radar and I will get to it as soon as I can. Maybe someone else will get to it before me. VERY excited about this. Kudos!

@ntny
Contributor Author

ntny commented Oct 8, 2025

> Hey, @ntny. Unfortunately, I won't have bandwidth to validate it in the next two weeks but just wanted to let you know that it's on my radar and I will get to it as soon as I can. Maybe someone else will get to it before me. VERY excited about this. Kudos!

Hi, thanks! Sure, absolutely no rush!
