
Conversation

mjudeikis
Contributor

When creating vendor-agnostic operators, you have to establish your provider alongside the application. This means you need to wire every supported provider into the depths of the code, since the main interface does not expose a way to Run them.

The majority of providers will be runnable, so I think it makes sense to make Run a permanent member of the interface. But SetupWithManager - no. That can be done much earlier in the lifecycle of the program.

Now, if your code takes an mcmanager.Provider, you can't start it, as the run method is not part of the interface.
So you end up with:

type providers struct {
  providerX
  providerY
  providerZ
}

vs just having a single interface to back any implementation.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 13, 2025
@k8s-ci-robot k8s-ci-robot requested review from embik and sttts August 13, 2025 16:28
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 13, 2025
@mjudeikis mjudeikis changed the title Extend interface with runner ✨ Extend interface with runner Aug 13, 2025
@FourFifthsCode
Contributor

I was also thinking that extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@mjudeikis
Contributor Author

mjudeikis commented Aug 14, 2025

I was also thinking that extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@embik and I have been discussing this async. My take was that one might want to control the lifecycle of those themselves. I personally prefer leaving these things flexible.

And the result is still the same, as the manager internally operates on the mcmanager.Provider interface, so the Run method needs to be exposed. So now you get to the question of "who is responsible for starting it - the author of the manager?". And when in doubt - in my head - leave it manual.

Like:

func (p *scopedManager) Start(ctx context.Context) error {
	if p.Manager.GetProvider() != nil {
		if err := p.Manager.GetProvider().Run(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

means you still need the method exposed on the interface.

@mjudeikis mjudeikis force-pushed the mjudeikis.runner branch 2 times, most recently from 650df41 to 93f2c78 Compare August 14, 2025 06:19
@FourFifthsCode
Contributor

So now you get to the question of "who is responsible for starting it - the author of the manager?". And when in doubt - in my head - leave it manual.

Yeah, the "who is responsible" is definitely tricky. On the other side of things it can be easy to forget to start the provider (which happened to me at one point 😅 )

What would you think of renaming Run() on the provider to something like StartupHook() and have it return a func? Maybe then we get the best of both worlds, a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil {
		if hook := prov.StartupHook(); hook != nil {
			if err := hook(ctx); err != nil {
				return fmt.Errorf("failed to run provider: %w", err)
			}
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

@mjudeikis
Contributor Author

So now you get to the question of "who is responsible for starting it - the author of the manager?". And when in doubt - in my head - leave it manual.

Yeah, the "who is responsible" is definitely tricky. On the other side of things it can be easy to forget to start the provider (which happened to me at one point 😅 )

What would you think of renaming Run() on the provider to something like StartupHook() and have it return a func? Maybe then we get the best of both worlds, a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil {
		if hook := prov.StartupHook(); hook != nil {
			if err := hook(ctx); err != nil {
				return fmt.Errorf("failed to run provider: %w", err)
			}
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

I think I would be ok with that. In my current POC I'm kinda building towards the idea that we would need SetupWithManager in the interface too :/ Still shuffling code around. Let me sit on this for a few more days as I'm iterating on this as I go.

@ntnn
Contributor

ntnn commented Aug 14, 2025

IMHO it would make some sense for the manager to start the provider.
The provider provides clusters for the manager and its reconcilers, so it would make sense for the manager to start the provider when the manager is ready to accept/engage the clusters.

So at the moment it will always end in this setup:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{})
provider.SetupWithManager(mgr) // Or something else to set the manager
go mgr.Start(ctx)
go provider.Run(ctx)
// wait

Which could instead be:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{}) // manager sets itself on the provider
mgr.Start(ctx) // manager starts the provider

When I read the interfaces for the first time I found it odd that the provider interface had no Run method and that the provider has to be started manually.

However I think that was by design as it might be too early to decide the lifecycle of both the provider and manager, as that could stifle design choices.

If the lifecycle is less flexible, that could result in some designs not being possible.
E.g. the multi provider would allow adding and removing providers at runtime: #56
While this would still be possible (or even easier) with this change, future design decisions based on the premise that the manager manages the provider lifecycle could make some provider designs impossible.

Then again - the lifecycles are already linked somewhat, so it would make sense to include that in the design.

StartWithManager

If the lifecycle is linked like this, I'd prefer Provider.StartWithManager(context.Context, mcmanager.Manager) instead of .SetupWithManager(mcmanager.Manager) and .Start(context.Context).
The method starting the provider would have to check for the manager not being nil anyhow - regardless of where it is coming from:

func (p *Provider) SetupWithManager(mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	return nil
}

func (p *Provider) Start(ctx context.Context) error {
	if p.mgr == nil {
		return ErrMgrNil
	}
	// ...
}

or:

func (p *Provider) StartWithManager(ctx context.Context, mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	// ...
}

And providers that would also work without a mgr wouldn't care either way.

@FourFifthsCode
Contributor

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?
And then the manager calls that func before it starts?
If so, that sounds good to me!

@ntnn
Contributor

ntnn commented Aug 15, 2025

Exactly. Looking at some sample code, having SetupWithManager and Run separately doesn't feel right, and overall I much prefer the story of "the manager is started and then starts the provider when it is ready" over the developer needing knowledge of the inner workings of the provider they are using to know when the provider should be started.

@corentone

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@ntnn
Contributor

ntnn commented Aug 17, 2025

I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)

2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

We don't stand in the middle - the status quo is 2: The provider is entirely independent from the manager.
So the user has to manage the lifecycle of both and needs to know in which order to start them.

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manager? When should the manager be set on the provider? Before or after starting the manager?

E.g. in the Engage method the runnables are engaging the cluster:

func (m *mcManager) Engage(ctx context.Context, name string, cl cluster.Cluster) error {
	ctx, cancel := context.WithCancel(ctx) //nolint:govet // cancel is called in the error case only.
	for _, r := range m.mcRunnables {
		if err := r.Engage(ctx, name, cl); err != nil {
			cancel()
			return fmt.Errorf("failed to engage cluster %q: %w", name, err)
		}
	}
	return nil //nolint:govet // cancel is called in the error case only.
}

Say a provider only has the SetManager method with no further documentation aside from that it sets the manager:

prov := myprov.New(myprov.Opts{})
mgr, err := mcmanager.New(cfg, prov, mcmanager.Opts{})
prov.SetManager(mgr)
mgr.Add(myrunnable1)
mgr.Add(myrunnable2)
return mgr.Start(ctx)

In reality the .SetManager already starts working - so the runnables added later will miss the first clusters the provider provides.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

Correct - and to prevent the provider from starting to provide clusters too early, the manager should pass itself when running the provider.

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

type Provider interface {
	Start(context.Context, Aware) error
	Get(...)
	IndexField(...)
}

Providers that act like a controller (e.g. cluster-api or gardener) are already expecting to be passed a manager anyhow because they need their manager to target whatever cluster they pull the data from, not necessarily the cluster they are running in.
And most providers will only need a target to engage clusters with.

@mjudeikis
Contributor Author

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

Yeah, this might be an option.

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?

It sounds reasonable. Need to test this in the code.

So I think I'm leaning towards:

provider.StartWithManager 

Does anybody have anything against this?

@ntnn
Contributor

ntnn commented Aug 18, 2025

provider.StartWithManager 

Does anybody have anything against this?

If the method takes a manager I like that.
If the method takes an Aware instead of a manager, it should be called StartWithAware or similar, otherwise it wouldn't be accurate.
I'd also be fine with just calling it Start or Run if it accepts an Aware. That is pretty much what the existing providers look like, except that they expect the full manager.

@mjudeikis
Contributor Author

I think the only issue could be import circularity in Go. Need to shuffle code and see, as I don't have (and I hope I never will need to have) a mental model of all the packages :)

@corentone

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manger? When should the manager be set on the provider? Before or after starting the manager?

We're on the same page! I actually have been pushing towards making it a controller in some PRs because it quickly became clear that we needed a way to make setup and management of a provider simple.

I'm not too scared of the circular dependency. I think the provider interface could be put into a separate package alongside helper methods and both the provider implementations and manager would import it.

Aware is not too clear to me, that's recent, right? What are we aware of?

I do like StartWithManager, but I'd recommend keeping it the same as controllers: kubebuilder documents SetupWithManager, which would group a New and Start together.

@ntnn
Contributor

ntnn commented Aug 18, 2025

Aware is the interface that has the Engage method:

// Aware is an interface that can be implemented by components that
// can engage and disengage when clusters are added or removed at runtime.
type Aware interface {
	// Engage gets called when the component should start operations for the given Cluster.
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
	// Cluster is removed or an error occurs.
	//
	// Implementers should return an error if they cannot start operations for the given Cluster,
	// and should ensure this operation is re-entrant and non-blocking.
	//
	// (ASCII-art sailing ship from the original docstring omitted here.)
	Engage(context.Context, string, cluster.Cluster) error
}

Since the Provider interface is in the same package (and even file :D) I think the least invasive change would be to expect something that satisfies the Aware interface to be passed to a Start method on the provider.

@mjudeikis
Contributor Author

I updated the code. Moved Run to Start, but I'm wondering if I should revert. I like Start more, but it's about keeping the delta small.

Having Aware as an argument works for most cases (though not all; for example, it does not work for the kubeconfig provider), but it's easy to work around.

It does not solve the startup ordering problem, but I'm not sure how I feel about that either. I like explicitness.

@ntnn @embik @corentone @FourFifthsCode, what's your take on the current iteration?

@FourFifthsCode
Contributor

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to client, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency. Providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

@mjudeikis
Contributor Author

I have a hard time following everyone's position to be honest.
I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.
In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to client, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency. Providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

I will ask differently: does what's currently in the code solve the issues/challenges you've seen on your end, or are you missing something?

@FourFifthsCode
Contributor

I will ask differently: does what's currently in the code solve the issues/challenges you've seen on your end, or are you missing something?

So if we went with the manager starting the provider, we could remove the need for manually starting the provider in the example's main.go:

g.Go(func() error {
	return ignoreCanceled(provider.Start(ctx, mgr))
})

The tricky bit though I just realized is that we are pulling in the external controller runtime manager and calling its Start() method, which means we don't have direct access to call our provider's Start method inside of it:
https://github.com/kubernetes-sigs/multicluster-runtime/blob/main/pkg/manager/manager.go#L128

One thing we could do is maybe make the provider satisfy the single controller runtime Runnable interface and then add it during the New() func in the multi-cluster manager:
https://github.com/kubernetes-sigs/multicluster-runtime/blob/main/pkg/manager/manager.go#L137

func New(config *rest.Config, provider multicluster.Provider, opts manager.Options) (Manager, error) {
	mgr, err := manager.New(config, opts)
	if err != nil {
		return nil, err
	}
	// Add this - provider would satisfy the Runnable interface.
	if err := mgr.Add(provider); err != nil {
		return nil, err
	}

	return WithMultiCluster(mgr, provider)
}

@ntnn
Contributor

ntnn commented Aug 18, 2025

The tricky bit though I just realized is that we are pulling in the external controller runtime manager and calling its Start() method, which means we don't have direct access to call our provider's Start method inside of it

Well, we can override the .Start method and call the underlying manager's .Start when needed.

One thing we could do is maybe make the provider satisfy the single controller runtime Runnable interface and then add it during the New() func in the multi-cluster manager:

I like the idea of making the providers Runnable and letting the underlying manager take care of this - however then we'd be back at square one with the ouroboros that the manager needs a provider and the provider needs a manager, and we'd again need something like SetupWithManager.

Provider.Start(context.Context, mcmanager.Manager) still feels like the best approach to me.

But we can let the controller-runtime manager take care of running the provider with a RunnableFunc:

func (m *mcManager) Start(ctx context.Context) error {
	if err := m.Add(manager.RunnableFunc(func(ctx context.Context) error {
		return m.provider.Start(ctx, m)
	})); err != nil {
		return err
	}
	return m.Manager.Start(ctx)
}

This would also have the advantage that the providers are the last thing to start, after the rest of the manager components have started.

And should mcr later move to multiple providers per manager that can easily be expanded.

@@ -254,6 +254,12 @@ func (p *scopedManager) Add(r manager.Runnable) error {

// Start starts the manager.
func (p *scopedManager) Start(ctx context.Context) error {
err := p.Add(manager.RunnableFunc(func(ctx context.Context) error {
Contributor Author


TBH I don't think I like this :) I get that this makes it easier as it always starts the provider, but I like my control :/


The manager shouldn't add the runnable. It's the provider setup that should add the runnable.
The provider should satisfy the runnable interface (see below) and just be added during init of the provider.

IMO flow should be:

p := providerX.New(/* aware interface */ mgr)
mgr.Add(p)
mgr.Start()

or, better:

p := providerX.NewInManager(/*runner */ mgr, /*aware*/ mgr) // name of func would be kept aligned with existing
mgr.Start()

or, even better if we want to keep Aware and Runner together:

type AwareMgr interface {
  Aware
  Add(Runnable) error
}

p := providerX.NewInManager(/*runner + aware */ mgr) // name of func would be kept aligned with existing
mgr.Start()

Is there a reason for the manager to be aware of the providers?


https://github.com/kubernetes-sigs/controller-runtime/blob/28871a12e07e3bfa7dd96af49776b5b863ce24e7/pkg/manager/manager.go#L298-L303

// Runnable allows a component to be started.
// It's very important that Start blocks until
// it's done running.
type Runnable interface {
	// Start starts running the component.  The component will stop running
	// when the context is closed. Start blocks until the context is closed or
	// an error occurs.
	Start(context.Context) error
}

Contributor


The manager shouldn't add the runnable. It's the provider setup that should add the runnable. The provider should satisfy the runnable interface (see below) and just be added during init of the provider.

IMO flow should be:

p := providerX.New(/* aware interface */ mgr)
mgr.Add(p)
mgr.Start()

I like this a lot! Feels cleaner! And this still gives the user control but is also easy to set up.

Is there a reason for the manager to be aware of the providers?

Yeah, ideally not!

Member


Just leaving a comment here, I would say that given the loops we are encountering otherwise (discussed later), I think treating the provider as a "special" singleton runnable that gets started in the manager code makes a lot of sense. I quite like @ntnn's proposal above.

@mjudeikis mjudeikis force-pushed the mjudeikis.runner branch 2 times, most recently from 8c8b82a to dfbd740 Compare August 19, 2025 08:59
@mjudeikis
Copy link
Contributor Author

So I took a step back. Let's not blow this up to the moon. Solve the simple problem for now.

Let's keep it simple. This adds a Start method to the interface with Aware as an argument, which allows providers to Engage clusters. Smallest change possible for now.

Everything else like:

  • StartupHooks
  • AutoStart
  • Some other magic

Let's put all of that on the backlog for now and not over-engineer. I'm tempted not to overload this for now; it solves a simple problem, so let it sink in.

In the end we can agree this is advanced controller-manager usage, and I expect people to know their code. Until AI takes over, they will have to :)

For anything else, if you feel your issue is not addressed - please create an issue and let's move the discussion there, and let's do more "data-driven development" vs "assumption-driven development" :)


@corentone corentone left a comment


it feels to me that Start should not take the Aware, and it's the responsibility of the provider to take the Aware as a parameter during its setup.

The manager doesn't need to know about the provider, I think, so we should be able to keep it this way.
The provider can use the manager for the following:
1/ (MUST) for the aware interface, so it can pass new clusters
2/ (OPT, but likely common) for the GetClient(), so it can read local objects
3/ (OPT, but likely common) for the Add(Runnable), so it can be started and managed as a runnable and remove burden on the user

We can figure out whether we want full composability (3 interfaces), medium (2 interfaces: mgr vs aware) or mandated abstraction (one interface with all).

I'm thinking either full composability or medium composability.


// Start runs the provider. Implementation of this method should block.
// If you need to pass in manager, it is recommended to implement SetupWithManager(mgr mcmanager.Manager) error method on individual providers.
// It is not part of the provider interface because it is not required for all providers.
Start(context.Context, Aware) error


I think we're mixing two concepts here.

  1. The manager that is used for running things (the ACTUAL manager that deals with runnables) and to provide a kubeconfig (though those two could also be split).
  2. The Aware part that deals with cluster management and registration.

The Start part is ONLY related to runnables. The Aware part is a parameter that the provider should hold, provided during setup?

Contributor Author


So what do you suggest here? I don't see the mixing here, but I've been soaking in this for way too long :)

Contributor


I don't see the mixing either.
The docstring specifies that if the full manager is needed, the provider should implement a different method. And I don't think that there will be many providers that actually need a reference to the full manager.

It only takes an Aware because it should use that Aware to engage clusters.

I'd instead argue that this will encourage cleaner design, because provider developers will ask for a kubeconfig in their setup (either as parameters to their New or in their Options...) - which may be the local cluster config but could also be something else - rather than expecting the manager.

@@ -126,14 +125,14 @@ func (p *Provider) Get(_ context.Context, clusterName string) (cluster.Cluster,
return nil, multicluster.ErrClusterNotFound
}

// Run starts the provider and blocks.
func (p *Provider) Run(ctx context.Context, mgr mcmanager.Manager) error {


should we mark it deprecated instead?

Contributor Author


Do we really need to?
First, it's replaced with a very similar method with the same functionality.
Second, Go is good at catching these at compile time, so people should just make the change.

It's not like we're deprecating functionality and leaving people to find a different way of doing things.

@ntnn
Contributor

ntnn commented Aug 19, 2025

it feels to me that Start should not take the Aware, and it's the responsibility of the provider to take the Aware as a parameter during its setup.

That comes back to the problem that providers can start providing clusters before they should.

The intent in the .Start taking the Aware is that this is explicitly the object that should be used to Engage clusters.

That we could solve the lifecycle problem and, at the same time, the problem of a provider being started too early was an offhand idea, but I agree with @mjudeikis that we should continue discussing that in another ticket.

The manager doesn't require to know about the provider I think, so we should be able to keep it this way.

The manager has to know about the provider - otherwise the manager cannot get clusters or index fields.

@mjudeikis
Contributor Author

@corentone I just reverted the change for now. I feel there is scope creep happening :) Let's keep it simple for now:

  1. There is a way to start the provider via the interface
  2. People do the starting :)

@mjudeikis
Contributor Author

@corentone

The manager doesn't require to know about the provider I think, so we should be able to keep it this way.

I think this is a false statement to start with. The manager needs to know about the provider. It needs to pass the cluster lifecycle (Engage) into the provider. It's by design. So I don't think your suggestion quite works. Or I don't understand what you are suggesting.

If you feel strongly about this, can you propose a change? Not pseudocode, as most of the pseudocode in this PR conversation does not compile, does not work, or is wrong :/

@corentone

@corentone

The manager doesn't require to know about the provider I think, so we should be able to keep it this way.

I think this is a false statement to start with. The manager needs to know about the provider. It needs to pass the cluster lifecycle (Engage) into the provider. It's by design. So I don't think your suggestion quite works. Or I don't understand what you are suggesting.

I think I see where the confusion is on my side. The Manager currently has a GetProvider that I didn't realize was there. Hence the manager needs a pointer to the provider and the provider needs a pointer to the manager to call Engage.

I found out why we had the GetProvider in the manager, which is to decide whether we are single- or multicluster. All usages I found were to check whether the provider was nil or not; the provider itself was never used.
Is the expectation that controllers or other manager consumers could call GetProvider and leverage it for something? I'm not sure what they would do with it, but I could be wrong. Would you have an opinion/examples on that part?

I think otherwise my suggestions only hold if the manager is not aware of the providers at all and just gets called with Engage() when other clusters are there.

If you feel strongly about this, can you propose a change? Not pseudocode, as most of the pseudocode in this PR conversation does not compile, does not work, or is wrong :/

This is a valid ask and I'll only be able to build it out next week. I'm okay being ignored if a majority agree on a design.

If we keep the manager aware of the provider(s), we should clarify:

  • what is the manager expected to be able to do with the provider, and what it requires from it (it would likely be a Start and a SetAware method -- which were combined into the proposed Start, I think?)
  • Make sure we avoid circular dependencies.
  • decide if the manager should be able to admit multiple providers or if we want to use the provider that contains multiple provider pattern.

Couple questions before I hibernate until I can code it :)

  • Are we agreeing that we want the Manager to run the providers as runnables (to facilitate with runtime?). Or do we want to make it optional?
  • Could the provider be considered a main-cluster controller (and hence log and use the client provided by the manager)? (That would mean that the provider has a pointer to the manager.)
  • Is the Engage() method protected on manager startup? Or is it safe to call before startup? Are we safe on that side?

Sorry for being high level, hope that helps and not add confusion!

@FourFifthsCode
Contributor

@corentone (full comment quoted above)

I think this gets to the core of the issue. When working on the kubeconfig provider, there was a good bit of boilerplate that could be common across providers, and a few gotchas in regard to startup and circular dependencies, so I think it's worthwhile to think about the interfaces; some tweaks could improve the experience, as could some shared code.

@mjudeikis sorry for blowing this up when you're wanting to make a simple change, I just thought this would be a good opportunity to address this since it deals with some of the heart of the problem space. I'm good with a follow-up issue to continue the discussion as well. That could also address some of the larger scope of the interfaces themselves; however, it does seem like that could impact the changes in this PR in the long run.

@corentone some examples on possible new designs of manager/provider interfaces and responsibility I think would be very interesting!

@mjudeikis
Contributor Author

mjudeikis commented Aug 19, 2025

  • what is the manager expected to be able to do with the provider and what it requires from it. (It would likely be a Start and a SetAware method? -- which was combined by the proposed Start I think?

So the Provider itself now only does Get, IndexField, and (with this PR) Start. And Start takes the Aware so one can Engage clusters, if that's one of the patterns used (like the Kind and Namespace providers, or the out-of-tree kcp provider, do).

  • Make sure we avoid circular dependencies.

Not an issue with current version.

  • decide if the manager should be able to admit multiple providers or if we want to use the provider that contains multiple provider patterns.

I don't think we can answer this question yet. @ntnn is experimenting with the latter one here #54 and the first one here #56

Which I think is out of scope for this PR/issue, and we should move this discussion to those 2 PRs above ^

  • Are we agreeing that we want the Manager to run the providers as runnables (to facilitate with runtime?). Or do we want to make it optional?

**** I think this is a core question here ****
I would take a pragmatic view - the manager does not run the provider(s) for now. Why? It's easier to add that later than to remove it. Once we really see that "starting providers is an issue", we can think about it. Once people start relying on the manager auto-starting things, we can't take it away, as that would introduce silent failures; whereas we can make a user-initiated start a no-op and have the manager pick up the lifecycle. I feel we don't have enough adoption and signal for this beyond "it's nice to have". Essentially, one of these is a "one-way door" while the other is a "two-way door", and I picked the latter until we know we won't want to go back :)

  • Could the provider be considered a main-cluster controller (and hence logging and using the client provided by the manager) ? (That would mean that the provider has a pointer to the manager as a manager.

I don't see a reason why one could not build a singleton provider for this. I don't quite see the reason for it - but why not? :D
It would be the same as having provider = nil today.

  • Is the Engage() method protected on manager startup? Or is it safe to call before startup? Are we safe on that side?

Engage itself is safe (it has a lock inside). The problem is the higher-level tracking, which is the provider author's responsibility (most of the change requests on this PR were from @ntnn about this :D)


@mjudeikis
Contributor Author

I'm good with a follow up issue to continue discussion as well. And that could also address some of the larger scope of the interfaces themselves, however, it does seem like that could impact the changes in this PR in the long run.

Let's do this. Once this merges, I will send my AI workers to summarise this and start a new discussion/issue. Maybe even a Google doc, as it feels like a big pie to eat in one go :)

@k8s-ci-robot
Contributor

@ntnn: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis, ntnn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@embik
Member

embik commented Sep 1, 2025

Leaving a comment here to sum up my thoughts:

The current code shows, IMHO, that the manager should start the provider as a runnable; lifecycling the goroutines involved is a bit of a hassle right now and doesn't produce clean code. I liked @ntnn's suggestion half-way through this discussion; I think it's the right approach, because manager start is the only place where we know that the manager is up and running, and it has access to the Engage method that our provider needs.

This is a good change for 0.22.0, perhaps we could cut 0.22.0-beta.0 with this feature (provider lifecycle being part of the manager) in place.

With that being said, this has been a long discussion and this PR is a good building block, so I will approve once I get the all-clear from @corentone.
