✨ Extend interface with runner #62


Open

mjudeikis wants to merge 3 commits into main

Conversation

mjudeikis
Contributor

When creating vendor-agnostic operators, you start your provider alongside the application. Because the main interface does not support Run, this means you need to wire every supported provider deep into the code just to start it.

The majority of providers will be runnable, so I think it makes sense to make Run a permanent member of the interface. But SetupWithManager - no. And it can be done much earlier in the lifecycle of the program.

Now, if you pass mcmanager.Provider around in the code, you can't start it, as the method is not exposed on the interface.
So you end up with:

type providers struct {
	providerX
	providerY
	providerZ
}

vs just having a single interface to back any implementation.
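
For illustration, a minimal sketch of the shape this change is after (method set abridged and signatures assumed - the exact diff may differ):

// With Run on the interface, any implementation can be started through
// mcmanager.Provider alone, with no concrete provider types in the wiring.
type Provider interface {
	// Get returns the cluster with the given name.
	Get(ctx context.Context, clusterName string) (cluster.Cluster, error)
	// Run starts the provider and blocks until the context is done
	// (the hypothetical addition).
	Run(ctx context.Context) error
}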

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 13, 2025
@k8s-ci-robot k8s-ci-robot requested review from embik and sttts August 13, 2025 16:28
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 13, 2025
@mjudeikis mjudeikis changed the title Extend interface with runner ✨ Extend interface with runner Aug 13, 2025
@FourFifthsCode
Contributor

I was also thinking extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@mjudeikis
Contributor Author

mjudeikis commented Aug 14, 2025

I was also thinking extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@embik and I have been discussing this async. My take was that one might want to control the lifecycle of those themselves. I personally prefer these things left flexible.

And the result is still the same: the manager internally operates on the mcmanager.Provider interface, so the Run method needs to be exposed. So now you get to the question of "who is responsible for starting it, the author or the manager?". And when not sure - in my head - leave it manual.

Like:

func (p *scopedManager) Start(ctx context.Context) error {
	if p.Manager.GetProvider() != nil {
		if err := p.Manager.GetProvider().Run(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

means you still need the method exposed on the interface.

@FourFifthsCode
Contributor

So now you get to the question of "who is responsible for starting it, the author or the manager?". And when not sure - in my head - leave it manual.

Yeah, the "who is responsible" question is definitely tricky. On the other side of things, it can be easy to forget to start the provider (which happened to me at one point 😅).

What would you think of renaming Run() on the provider to something like StartupHook() and having it return a func? Maybe then we get the best of both worlds: a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil && prov.StartupHook != nil {
		if err := prov.StartupHook(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

@mjudeikis
Contributor Author

So now you get to the question of "who is responsible for starting it, the author or the manager?". And when not sure - in my head - leave it manual.

Yeah, the "who is responsible" question is definitely tricky. On the other side of things, it can be easy to forget to start the provider (which happened to me at one point 😅).

What would you think of renaming Run() on the provider to something like StartupHook() and having it return a func? Maybe then we get the best of both worlds: a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil && prov.StartupHook != nil {
		if err := prov.StartupHook(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

I think I would be OK with that. In my current POC I'm kind of building towards the idea that we would need SetupWithManager in the interface too :/ Still shuffling code around. Let me sit on this for a few more days as I'm iterating on it as I go.

@ntnn
Contributor

ntnn commented Aug 14, 2025

IMHO it would make some sense for the manager to start the provider.
The provider provides clusters for the manager and its reconcilers, so it would make sense for the manager to start the provider when the manager is ready to accept/engage the clusters.

So at the moment it will always end up in this setup:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{})
provider.SetupWithManager(mgr) // Or something else to set the manager
go mgr.Start(ctx)
go provider.Run(ctx)
// wait

Which could instead be:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{}) // manager sets itself on the provider
mgr.Start(ctx) // manager starts the provider
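
A hedged sketch of what that could look like inside the manager's Start (field names such as provider and errChan are assumptions, not the actual mcmanager code):

func (m *mcManager) Start(ctx context.Context) error {
	// Start the provider in the background right before the manager runs,
	// so clusters are only engaged once the manager can accept them.
	if m.provider != nil {
		go func() {
			if err := m.provider.Run(ctx); err != nil {
				m.errChan <- fmt.Errorf("provider failed: %w", err)
			}
		}()
	}
	return m.Manager.Start(ctx)
}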

When I read the interfaces for the first time I found it odd that the provider interface had no Run method and that the provider has to be started manually.

However I think that was by design, as it might be too early to decide the lifecycle of both the provider and the manager, since that could stifle design choices.

If the lifecycle is less flexible, some designs could become impossible.
E.g. the multi provider would allow adding and removing providers at runtime: #56
While this would still be possible (or even easier) with this change, future design decisions based on the premise that the manager manages the provider lifecycle could rule out some provider designs.

Then again - the lifecycles are already linked somewhat, so it would make sense to include that in the design.

StartWithManager

If the lifecycle is linked like this, I'd prefer Provider.StartWithManager(context.Context, mcmanager.Manager) over separate .SetupWithManager(mcmanager.Manager) and .Start(context.Context) methods.
The method starting the provider would have to check that the manager is not nil anyhow - regardless of where it is coming from:

func (p *Provider) SetupWithManager(mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	return nil
}

func (p *Provider) Start(ctx context.Context) error {
	if p.mgr == nil {
		return ErrMgrNil
	}
	// ...
}

or:

func (p *Provider) StartWithManager(ctx context.Context, mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	// ...
}

And providers that would also work without a mgr wouldn't care either way.
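
For comparison, a caller-side sketch of the combined method (provider name assumed):

prov := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), prov, mcmanager.Options{})
go prov.StartWithManager(ctx, mgr) // one call: set the manager and start providing
// (or the manager makes this call itself when it starts, as discussed above)
return mgr.Start(ctx)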

@FourFifthsCode
Contributor

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?
And then the manager calls that func before it starts?
If so, that sounds good to me!

@ntnn
Contributor

ntnn commented Aug 15, 2025

Exactly. Looking at some sample code, having SetupWithManager and Run as separate steps doesn't feel right, and overall I much prefer the story of "the manager is started and then starts the provider when it is ready" over the developer needing knowledge of the inner workings of the provider they are using to know when it should be started.

@corentone

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@ntnn
Contributor

ntnn commented Aug 17, 2025

I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)

2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

We don't stand in the middle - the status quo is 2: The provider is entirely independent from the manager.
So the user has to manage the lifecycle of both and needs to know in which order to start them.

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manager? When should the manager be set on the provider? Before or after starting the manager?

E.g. in the Engage method the runnables are engaging the cluster:

func (m *mcManager) Engage(ctx context.Context, name string, cl cluster.Cluster) error {
	ctx, cancel := context.WithCancel(ctx) //nolint:govet // cancel is called in the error case only.
	for _, r := range m.mcRunnables {
		if err := r.Engage(ctx, name, cl); err != nil {
			cancel()
			return fmt.Errorf("failed to engage cluster %q: %w", name, err)
		}
	}
	return nil //nolint:govet // cancel is called in the error case only.
}

Say a provider only has the SetManager method with no further documentation aside from that it sets the manager:

prov := myprov.New(myprov.Opts{})
mgr, _ := mcmanager.New(cfg, prov, mcmanager.Options{})
prov.SetManager(mgr)
mgr.Add(myrunnable1)
mgr.Add(myrunnable2)
return mgr.Start(ctx)

In reality, .SetManager already starts the provider working - so the runnables added later will miss the first clusters the provider provides.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

Correct - and to prevent the provider starting to provide clusters too early the manager should pass itself when running the provider.

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

type Provider interface {
	Start(context.Context, Aware)
	Get(...)
	IndexField(...)
}
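
A hedged sketch of a provider implementing that shape (with an error return added; the cluster bookkeeping is assumed):

func (p *myProvider) Start(ctx context.Context, aware multicluster.Aware) error {
	// Engage every cluster known so far; clusters discovered later would be
	// engaged from the provider's watch loop (not shown).
	p.lock.RLock()
	clusters := make(map[string]cluster.Cluster, len(p.clusters))
	for name, cl := range p.clusters {
		clusters[name] = cl
	}
	p.lock.RUnlock()

	for name, cl := range clusters {
		if err := aware.Engage(ctx, name, cl); err != nil {
			return fmt.Errorf("failed to engage cluster %q: %w", name, err)
		}
	}

	<-ctx.Done()
	return nil
}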

Providers that act like a controller (e.g. cluster-api or gardener) are already expecting to be passed a manager anyhow because they need their manager to target whatever cluster they pull the data from, not necessarily the cluster they are running in.
And most providers will only need a target to engage clusters with.

@mjudeikis
Contributor Author

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

Yeah, this might be an option.

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?

It sounds reasonable. Need to test this in the code.

So I think I'm leaning towards:

provider.StartWithManager 

Does anybody have anything against this?

@ntnn
Contributor

ntnn commented Aug 18, 2025

provider.StartWithManager 

Does anybody have anything against this?

If the method takes a manager I like that.
If the method takes an Aware instead of a manager, it should be called StartWithAware or similar, otherwise the name wouldn't be accurate.
I'd also be fine with just calling it Start or Run if it accepts an Aware. That is pretty much what the existing providers look like, except that they expect the full manager.

@mjudeikis
Contributor Author

I think the only issue could be import circularity in Go. Need to shuffle code and see, as I don't have (and I hope I never will need to have) a mental model of all the packages :)

@corentone

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manager? When should the manager be set on the provider? Before or after starting the manager?

We're on the same page! I actually have been pushing towards making it a controller in some PRs because it quickly became clear we needed a way to make setup and management of a provider simple.

I'm not too scared of the circular dependency. I think the provider interface could be put into a separate package alongside helper methods and both the provider implementations and manager would import it.
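
Roughly, a layout that avoids the cycle (a sketch of the idea - Provider and Aware already live together in the multicluster package today):

multicluster/  - holds the Provider and Aware interfaces, imports no manager code
mcmanager/     - imports multicluster and operates on the interfaces
providers/...  - import multicluster; take an Aware (or a full manager) as a parameter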

Aware is not too clear to me, that's recent, right? What are we aware of?

I do like StartWithManager, but I'd recommend keeping it the same as controllers: kubebuilder documents SetupWithManager, which would group a New and Start together.

@ntnn
Contributor

ntnn commented Aug 18, 2025

Aware is the interface that has the Engage method:

// Aware is an interface that can be implemented by components that
// can engage and disengage when clusters are added or removed at runtime.
type Aware interface {
	// Engage gets called when the component should start operations for the given Cluster.
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
	// Cluster is removed or an error occurs.
	//
	// Implementers should return an error if they cannot start operations for the given Cluster,
	// and should ensure this operation is re-entrant and non-blocking.
	//
	// (ASCII art in the original source omitted here.)
	Engage(context.Context, string, cluster.Cluster) error
}

Since the Provider interface is in the same package (and even file :D) I think the least invasive change would be to expect something that satisfies the Aware interface to be passed to a Start method on the provider.

@mjudeikis
Contributor Author

I updated the code. Moved Run to Start, but I'm wondering if I should revert. I like Start more, but it's about the delta.

Having Aware as an argument works for most cases (it does not work for the kubeconfig provider, for example), but it's easy to override.

It does not solve the startup ordering problem, but I'm not sure how I feel about that either. I like explicitness.

@ntnn @embik @corentone @FourFifthsCode, what's your take on the current iteration?

Comment on lines 52 to 54
p.aware = aware

if err := p.aware.Engage(ctx, p.name, p.cl); err != nil {
Contributor


Suggested change:
-	p.aware = aware
-
-	if err := p.aware.Engage(ctx, p.name, p.cl); err != nil {
+	if err := aware.Engage(ctx, p.name, p.cl); err != nil {

Just a nit

Comment on lines 158 to 163
return fmt.Errorf("manager is nil")
}
p.mgr = mgr
p.mcmanager = mgr

// Get the local manager from the multicluster manager
localMgr := mgr.GetLocalManager()
@ntnn
Contributor

ntnn commented Aug 18, 2025

Not specifically for this change but it comes up now with the manager - wouldn't it be better to give kubeconfig a config, and the provider spins up a manager by itself, rather than insisting on the same cluster?

flowchart LR

subgraph computeCluster[Compute Cluster]
	kubeconfigMCP[MCR-based operator with Kubeconfig provider]
end
kubeconfigSecrets -.-> kubeconfigMCP

subgraph secretsCluster[Cluster with Secrets]
	kubeconfigSecrets[Kubeconfigs]
end

kubeconfigMCP --> target1
kubeconfigSecrets -.-> target1
kubeconfigMCP --> target2
kubeconfigSecrets -.-> target2

subgraph target1[Target Cluster 1]
end

subgraph target2[Target Cluster 2]
end

Then the setup would resolve itself:

cfg := ctrl.GetConfigOrDie()

kconfigProvider := kubeconfig.New(kubeconfig.Options{
	Config: cfg, // this could be the same config as for mcmanager or a config for another cluster to read the secrets from
})
mcmgr, _ := mcmanager.New(cfg, kconfigProvider, mcmanager.Options{})
return mcmgr.Start(ctx)

@FourFifthsCode What do you think?

Contributor


That would open things up a bit! We'd just need to try and make it really clear where users are getting their secrets from, and hopefully make it straightforward to configure.

Contributor


It does almost feel cleaner to have the provider start the manager, but then that might make bootstrapping more complicated on the manager side.

Contributor


Yeah, it would - and it would block e.g. having multiple providers in one manager.

Contributor Author


Let's do this in a follow-up. While this sounds right, it somehow breaks the pattern we've been using. I think I want to see it in a separate PR so we can discuss.

Comment on lines +131 to +133
defer p.lock.Unlock()

p.aware = aware
Contributor


Suggested change:
-	defer p.lock.Unlock()
-
-	p.aware = aware
+	p.aware = aware
+	p.lock.Unlock()

Otherwise the provider deadlocks, I think? Because the reconcile also takes the lock.

Contributor Author


But there should not be any reconciliation until started. However, it's better to be safe than sorry.

Contributor


Correct, but .Start blocks until the context is done and holds the lock that whole time, so the Reconcile method will never execute because it cannot acquire the lock.
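
A minimal sketch of the deadlock being described (types and fields assumed from the diff context):

func (p *Provider) Start(ctx context.Context, aware multicluster.Aware) error {
	p.lock.Lock()
	defer p.lock.Unlock() // the lock is held until Start returns...

	p.aware = aware
	<-ctx.Done() // ...but Start only returns on shutdown
	return nil
}

func (p *Provider) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	p.lock.Lock() // blocks forever while Start holds the lock
	defer p.lock.Unlock()
	// ...
	return reconcile.Result{}, nil
}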

@FourFifthsCode
Contributor

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to clients, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency: providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

@mjudeikis
Contributor Author

I have a hard time following everyone's position to be honest.
I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.
In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to clients, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency: providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

I will ask differently: would what's currently in the code solve the issues/challenges you've seen on your end, or are you missing something?
