@youngbupark

  • This is the initial draft for supporting a rollout plugin in the ManifestWorkReplicaSet Work Controller.
  • Note: rollback will be added when we propose the MWRS automatic rollback enhancement.

@openshift-ci

openshift-ci bot commented Oct 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: youngbupark
Once this PR has been reviewed and has the lgtm label, please assign qiujian16 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The following service defines the contract between the Work Controller and the plugin. Each call must be idempotent, stateless, and time-bounded (≤30 s) to ensure consistent controller reconciliation. The plugin server must implement the following APIs. Helpers for implementing the server and clients will be provided in the [ocm/sdk-go](https://github.com/open-cluster-management-io/sdk-go) repository.

```proto
// RolloutPluginService is the service for the rollout plugin.
```
Author

Here is the initial commit of gRPC server proto - open-cluster-management-io/sdk-go#154

Note: The implementation can change as we develop.

observedGeneration: 1
reason: PluginInitialized
status: "True"
type: PluginLoaded
Author

@youngbupark Oct 28, 2025

I wonder whether it is a good idea to expose the plugin status. Simple logging in the work controller might be enough. I would like to get feedback on this.

Member

This status might not be needed, but we would need to surface the message on the MWRS API when a gRPC call fails.

@youngbupark changed the title from "KEP: ManifestWorkReplicaSet Rollout Plugin" to "ManifestWorkReplicaSet Rollout Plugin" on Oct 28, 2025
@youngbupark marked this pull request as ready for review on October 29, 2025 02:00
@openshift-ci bot requested review from deads2k and qiujian16 on October 29, 2025 02:00

// RolloutPluginService is the service for the rollout plugin.
service RolloutPluginService {
// Initialize initializes the plugin.
rpc Initialize(InitializeRequest) returns (InitializeResponse);
Member

I think we will need some clarification on error handling. What happens when a specific call fails? How would an MWRS consumer know about the failure and debug it?

workConfiguration:
workDriver: kube
# Plugin configuration
plugin:
Member

The plugin is disabled if `plugin` is not set, right?

// If the validation has completed successfully, the plugin should return an OK result.
// If the validation is still in progress, the plugin should return an INPROGRESS result.
// If the validation has failed, the plugin should return a FAILED result.
rpc ValidateRollout(RolloutPluginRequest) returns (ValidateResponse);
Member

When will this be called in the MWRS reconciler? I think a flow showing when these APIs are called in the MWRS controller would be helpful.
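
Until such a flow is documented, one possible reading of the three `ValidateRollout` results is sketched below. The mapping is an assumption, not something the proposal states: it reuses the Succeeded, Failed, and Progressing rollout status names from sdk-go and treats any unknown result as still progressing, in line with the "conservative fallback" row of the status table. `ValidateResult` and `nextRolloutStatus` are hypothetical names.

```go
package main

import "fmt"

// ValidateResult models the three results named in the proto comments.
// The Go type and constant names are assumptions for this sketch.
type ValidateResult int

const (
	ResultUnspecified ValidateResult = iota
	ResultOK
	ResultInProgress
	ResultFailed
)

// nextRolloutStatus is a hypothetical mapping from a validation result to a
// cluster rollout status name.
func nextRolloutStatus(r ValidateResult) string {
	switch r {
	case ResultOK:
		return "Succeeded"
	case ResultFailed:
		return "Failed"
	default:
		// INPROGRESS and any unknown value: keep waiting instead of guessing.
		return "Progressing"
	}
}

func main() {
	for _, r := range []ValidateResult{ResultOK, ResultInProgress, ResultFailed, ResultUnspecified} {
		fmt.Printf("%d -> %s\n", r, nextRolloutStatus(r))
	}
}
```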


// BeginRollout is called before the manifestwork resource is applied.
// It is used to prepare the rollout.
rpc BeginRollout(RolloutPluginRequest) returns (google.protobuf.Empty);
Member

Does this mean that any spec change on the ManifestWork will trigger this? What if the placement changes but the ManifestWork spec in the MWRS does not?

| False | False or not set | SUCCEEDED | Succeeded | Work has been successfully applied |
| Unknown/Not set | Any | N/A | Progressing | Conservative fallback: treat as still progressing |

The following state machine shows the expected transitions between rollout statuses:
Member

Is this only enabled when the rollout strategy in the MWRS is set?

# Plugin configuration
plugin:
# the image name of plugin sidecar
image: quay.io/open-cluster-management/my-rollout-plugin:v0.0.1
Member

I am not sure that setting only the image is enough. If the plugin also needs to connect to a cloud service or a mesh, it will also need customized args or secrets.


## Alternatives

- Implement the plugin server as a standalone service: running the plugin server securely as a standalone service requires enabling TLS connections. Using a sidecar avoids this complex network and security configuration.
Member

On the other hand, the plugin can be easily customized.

- rolloutStatus: The current [cluster rollout status](https://github.com/open-cluster-management-io/sdk-go/blob/main/pkg/apis/cluster/v1alpha1/rollout.go#L23-L39) (e.g., ToApply, Progressing, Succeeded, Failed, TimeOut, Skip).
- manifestRevisionName: The name of the manifest revision applied to the cluster.

### Configure custom plugin for work controller
Member

@haoqing0110 Oct 30, 2025

Once a plugin is enabled, what is the behavior of a normal MWRS rollout that does not need to call any plugin?

Work->>PluginServer: (NEW) ProgressingRollout()
PluginServer-->>Work: OK
loop clusterToRollout clusters
Work->>PluginServer: (NEW) BeforeRollout()

Typo? Should this be BeginRollout()?

4 participants