
Conversation

mjudeikis
Contributor

When creating vendor-agnostic operators, you have to establish your provider alongside the application. This means you need to wire every supported provider into the depths of the code, since the main interface does not expose a way to Run them.

The majority of providers will be runnable, so I think it makes sense to make Run a permanent member of the interface. But SetupWithManager - no. That can be done much earlier in the lifecycle of the program.

Now, if your code takes an mcmanager.Provider, you can't start it, as the run method is not part of the interface.
So you end up with:

type providers struct {
  providerX
  providerY
  providerZ
}

vs just having a single interface to back any implementation.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 13, 2025
@k8s-ci-robot k8s-ci-robot requested review from embik and sttts August 13, 2025 16:28
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 13, 2025
@mjudeikis mjudeikis changed the title Extend interface with runner ✨ Extend interface with runner Aug 13, 2025
@FourFifthsCode
Contributor

I was also thinking that extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@mjudeikis
Contributor Author

mjudeikis commented Aug 14, 2025

I was also thinking that extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@embik and I have been discussing this async. My take was that one might want to control the lifecycle of those themselves. I personally prefer leaving these things flexible.

And the result is still the same, as the manager internally operates on the mcmanager.Provider interface, so the Run method needs to be exposed. So now you get to the question of "who is responsible for starting it - the author of the manager?". And when in doubt - in my head - leave it manual.

Like:

func (p *scopedManager) Start(ctx context.Context) error {
	if p.Manager.GetProvider() != nil {
		if err := p.Manager.GetProvider().Run(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

means you still need the method exposed on the interface.

@mjudeikis mjudeikis force-pushed the mjudeikis.runner branch 2 times, most recently from 650df41 to 93f2c78 Compare August 14, 2025 06:19
@FourFifthsCode
Contributor

So now you get to the question of "who is responsible for starting it - the author of the manager?". And when in doubt - in my head - leave it manual.

Yeah, the "who is responsible" is definitely tricky. On the other side of things it can be easy to forget to start the provider (which happened to me at one point 😅 )

What would you think of renaming Run() on the provider to something like StartupHook() and have it return a func? Maybe then we get the best of both worlds, a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil {
		if hook := prov.StartupHook(); hook != nil {
			if err := hook(ctx); err != nil {
				return fmt.Errorf("failed to run provider: %w", err)
			}
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

@mjudeikis
Contributor Author

So now you get to the question of "who is responsible for starting it - the author of the manager?". And when in doubt - in my head - leave it manual.

Yeah, the "who is responsible" is definitely tricky. On the other side of things it can be easy to forget to start the provider (which happened to me at one point 😅 )

What would you think of renaming Run() on the provider to something like StartupHook() and have it return a func? Maybe then we get the best of both worlds, a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil {
		if hook := prov.StartupHook(); hook != nil {
			if err := hook(ctx); err != nil {
				return fmt.Errorf("failed to run provider: %w", err)
			}
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

I think I would be ok with that. In my current POC I'm kinda building towards the idea that we would need SetupWithManager in the interface too :/ Still shuffling code around. Let me sit on this for a few more days as I'm iterating on this as I go.

@ntnn
Contributor

ntnn commented Aug 14, 2025

IMHO it would make some sense for the manager to start the provider.
The provider provides clusters for the manager and its reconcilers, so it would make sense for the manager to start the provider when the manager is ready to accept/engage the clusters.

So at the moment it will always end in this setup:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{})
provider.SetupWithManager(mgr) // Or something else to set the manager
go mgr.Start(ctx)
go provider.Run(ctx)
// wait

Which could instead be:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{}) // manager sets itself on the provider
mgr.Start(ctx) // manager starts the provider

When I read the interfaces for the first time I found it odd that the provider interface had no Run method and that the provider has to be started manually.

However I think that was by design as it might be too early to decide the lifecycle of both the provider and manager, as that could stifle design choices.

If the lifecycle is less flexible, that could result in some designs not being possible.
E.g. the multi provider would allow adding and removing providers at runtime: #56
While this would still be possible (or even easier) with this change, future design decisions based on the premise that the manager manages the provider lifecycle could make some provider designs impossible.

Then again - the lifecycles are already linked somewhat, so it would make sense to include that in the design.

StartWithManager

If the lifecycle is linked like this, I'd prefer Provider.StartWithManager(context.Context, mcmanager.Manager) instead of .SetupWithManager(mcmanager.Manager) and .Start(context.Context).
The method starting the provider would have to check for the manager not being nil anyhow - regardless of where it is coming from:

func (p *Provider) SetupWithManager(mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	return nil
}

func (p *Provider) Start(ctx context.Context) error {
	if p.mgr == nil {
		return ErrMgrNil
	}
	// ...
}

or:

func (p *Provider) StartWithManager(ctx context.Context, mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	// ...
}

And providers that would also work without a mgr wouldn't care either way.

@FourFifthsCode
Contributor

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?
And then the manager calls that func before it starts?
If so, that sounds good to me!

@ntnn
Contributor

ntnn commented Aug 15, 2025

Exactly. Looking at some sample code, having SetupWithManager and Run separately doesn't feel right, and overall I much prefer the story of "the manager is started and then starts the provider when it is ready" over the developer needing knowledge of the inner workings of the provider they are using to know when the provider should be started.

@corentone

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@ntnn
Contributor

ntnn commented Aug 17, 2025

I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)

2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

We don't stand in the middle - the status quo is 2: The provider is entirely independent from the manager.
So the user has to manage the lifecycle of both and needs to know in which order to start them.

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manager? When should the manager be set on the provider? Before or after starting the manager?

E.g. in the Engage method the runnables are engaging the cluster:

func (m *mcManager) Engage(ctx context.Context, name string, cl cluster.Cluster) error {
	ctx, cancel := context.WithCancel(ctx) //nolint:govet // cancel is called in the error case only.
	for _, r := range m.mcRunnables {
		if err := r.Engage(ctx, name, cl); err != nil {
			cancel()
			return fmt.Errorf("failed to engage cluster %q: %w", name, err)
		}
	}
	return nil //nolint:govet // cancel is called in the error case only.
}

Say a provider only has the SetManager method with no further documentation aside from that it sets the manager:

prov := myprov.New(myprov.Opts{})
mgr, err := mcmanager.New(cfg, prov, mcmanager.Opts{})
prov.SetManager(mgr)
mgr.Add(myrunnable1)
mgr.Add(myrunnable2)
return mgr.Start(ctx)

In reality the .SetManager already starts working - so the runnables added later will miss the first clusters the provider provides.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

Correct - and to prevent the provider from starting to provide clusters too early, the manager should pass itself when running the provider.

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

type Provider interface {
	Start(context.Context, Aware) error
	Get(...)
	IndexField(...)
}

Providers that act like a controller (e.g. cluster-api or gardener) are already expecting to be passed a manager anyhow because they need their manager to target whatever cluster they pull the data from, not necessarily the cluster they are running in.
And most providers will only need a target to engage clusters with.

@mjudeikis
Contributor Author

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

Yeah, this might be an option.

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?

It sounds reasonable. Need to test this in the code.

So I think I'm leaning towards:

provider.StartWithManager 

Does anybody have anything against this?

@ntnn
Contributor

ntnn commented Aug 18, 2025

provider.StartWithManager 

Does anybody have anything against this?

If the method takes a manager I like that.
If the method takes an Aware instead of a manager, it should be called StartWithAware or similar, otherwise it wouldn't be accurate.
I'd also be fine with just calling it Start or Run if it accepts an Aware. That is pretty much what the existing providers look like, except that they expect the full manager.

@mjudeikis
Contributor Author

I think the only issue could be import circularity in Go. Need to shuffle code and see, as I don't have (and I hope I never will need to have) a mental model of all the packages :)

@corentone

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manger? When should the manager be set on the provider? Before or after starting the manager?

We're on the same page! I actually have been pushing towards making it a controller in some PRs because it quickly became clear that we needed a way to make setup and management of a provider simple.

I'm not too scared of the circular dependency. I think the provider interface could be put into a separate package alongside helper methods and both the provider implementations and manager would import it.

Aware is not too clear to me, that's recent, right? What are we aware of?

I do like StartWithManager, but I'd recommend keeping it the same as controllers: kubebuilder documents SetupWithManager, which would group a New and Start together.

@ntnn
Contributor

ntnn commented Aug 18, 2025

Aware is the interface that has the Engage method:

// Aware is an interface that can be implemented by components that
// can engage and disengage when clusters are added or removed at runtime.
type Aware interface {
	// Engage gets called when the component should start operations for the given Cluster.
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
	// Cluster is removed or an error occurs.
	//
	// Implementers should return an error if they cannot start operations for the given Cluster,
	// and should ensure this operation is re-entrant and non-blocking.
	//
	// (ASCII-art sailing ship from the original docstring omitted here.)
	Engage(context.Context, string, cluster.Cluster) error
}

Since the Provider interface is in the same package (and even file :D) I think the least invasive change would be to expect something that satisfies the Aware interface to be passed to a Start method on the provider.

@mjudeikis
Contributor Author

I updated the code. Moved Run to Start, but I'm wondering if I should revert. I like Start more, but it's about keeping the delta small.

Having Aware as an argument works for most cases (though not all; for example, it does not work for the kubeconfig provider), but it's easy to work around.

It does not solve the startup ordering problem, but I'm not sure how I feel about that either. I like explicitness.

@ntnn @embik @corentone @FourFifthsCode, what's your take on the current iteration?

@FourFifthsCode
Contributor

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to client, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency. Providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

@mjudeikis
Contributor Author

I have a hard time following everyone's position to be honest.
I can see two cases (which I think map to what you're suggesting, but we stand a bit in a middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces to a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.
In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to client, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency. Providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

I will ask differently: does what's currently in the code solve the issues/challenges you've seen on your end, or are you missing something?

@FourFifthsCode
Contributor

I will ask differently: does what's currently in the code solve the issues/challenges you've seen on your end, or are you missing something?

So if we went with the manager starting the provider, we could remove the need for manually starting the provider in the example's main.go:

g.Go(func() error {
	return ignoreCanceled(provider.Start(ctx, mgr))
})

The tricky bit though I just realized is that we are pulling in the external controller runtime manager and calling its Start() method, which means we don't have direct access to call our provider's Start method inside of it:
https://github.com/kubernetes-sigs/multicluster-runtime/blob/main/pkg/manager/manager.go#L128

One thing we could do is maybe make the provider satisfy the single controller runtime Runnable interface and then add it during the New() func in the multi-cluster manager:
https://github.com/kubernetes-sigs/multicluster-runtime/blob/main/pkg/manager/manager.go#L137

func New(config *rest.Config, provider multicluster.Provider, opts manager.Options) (Manager, error) {
	mgr, err := manager.New(config, opts)
	if err != nil {
		return nil, err
	}
	// Add this - provider would satisfy the Runnable interface.
	if err := mgr.Add(provider); err != nil {
		return nil, err
	}

	return WithMultiCluster(mgr, provider)
}

@ntnn
Contributor

ntnn commented Aug 18, 2025

The tricky bit though I just realized is that we are pulling in the external controller runtime manager and calling its Start() method, which means we don't have direct access to call our provider's Start method inside of it

Well, we can override the .Start method and call the underlying manager's .Start when needed.

One thing we could do is maybe make the provider satisfy the single controller runtime Runnable interface and then add it during the New() func in the multi-cluster manager:

I like the idea of making the providers Runnable and letting the underlying manager take care of this - however then we'd be back at square one with the ouroboros that the manager needs a provider and the provider needs a manager, and we'd again need something like SetupWithManager.

Provider.Start(context.Context, mcmanager.Manager) still feels like the best approach to me.

But we can let the controller-runtime manager take care of running the provider with a RunnableFunc:

func (m *mcManager) Start(ctx context.Context) error {
	if err := m.Add(manager.RunnableFunc(func(ctx context.Context) error {
		return m.provider.Start(ctx, m)
	})); err != nil {
		return err
	}
	return m.Manager.Start(ctx)
}

This would also have the advantage that the providers are the last thing to start, after the rest of the manager components have started.

And should mcr later move to multiple providers per manager that can easily be expanded.

@@ -254,6 +254,12 @@ func (p *scopedManager) Add(r manager.Runnable) error {

// Start starts the manager.
func (p *scopedManager) Start(ctx context.Context) error {
err := p.Add(manager.RunnableFunc(func(ctx context.Context) error {
Contributor Author


TBH I don't think I like this :) I get that this makes it easier as it always starts the provider, but I like my control :/


The manager shouldn't add the runnable. It's the provider setup that should add the runnable.
The provider should satisfy the runnable interface (see below) and just be added during init of the provider.

IMO flow should be:

p := providerX.New(/* aware interface */ mgr)
mgr.Add(p)
mgr.Start()

or, better:

p := providerX.NewInManager(/*runner */ mgr, /*aware*/ mgr) // name of func would be kept aligned with existing
mgr.Start()

or, even better if we want to keep Aware and Runner together:

type AwareMgr interface {
  Aware
  Add(Runnable) error
}

p := providerX.NewInManager(/*runner + aware */ mgr) // name of func would be kept aligned with existing
mgr.Start()

Is there a reason for the manager to be aware of the providers?


https://github.com/kubernetes-sigs/controller-runtime/blob/28871a12e07e3bfa7dd96af49776b5b863ce24e7/pkg/manager/manager.go#L298-L303

// Runnable allows a component to be started.
// It's very important that Start blocks until
// it's done running.
type Runnable interface {
	// Start starts running the component.  The component will stop running
	// when the context is closed. Start blocks until the context is closed or
	// an error occurs.
	Start(context.Context) error
}

Contributor


The manager shouldn't add the runnable. It's the provider setup that should add the runnable. The provider should satisfy the runnable interface (see below) and just be added during init of the provider.

IMO flow should be:

p := providerX.New(/* aware interface */ mgr)
mgr.Add(p)
mgr.Start()

I like this a lot! Feels cleaner! And this still gives the user control but is also easy to set up.

Is there a reason for the manager to be aware of the providers?

Yeah, ideally not!

Member


Just leaving a comment here, I would say that given the loops we are encountering otherwise (discussed later), I think treating the provider as a "special" singleton runnable that gets started in the manager code makes a lot of sense. I quite like @ntnn's proposal above.

@mjudeikis mjudeikis force-pushed the mjudeikis.runner branch 2 times, most recently from 8c8b82a to dfbd740 Compare August 19, 2025 08:59
@mjudeikis
Copy link
Contributor Author

So I took a step back. Let's not blow this up to the moon. Solve the simple problem for now.

Let's keep it simple. This adds a Start method to the interface with Aware as an argument, which allows providers to Engage clusters. Smallest change possible for now.

Everything else like:

  • StartupHooks
  • AutoStart
  • Some other magic

Let's put all of that on the backlog for now and not over-engineer. I'm tempted not to overload this for now; it solves a simple problem, so let it sink in.

In the end we can agree this is advanced controller-manager usage, and I expect people to know their code. Until AI takes over, they will have to :)

For anything else, if you feel your issue is not addressed - please create an issue and let's move the discussion there, and let's do more "data-driven development" vs "assumption-driven development" :)


@corentone corentone left a comment


it feels to me that Start should not take the Aware, and it's the responsibility of the provider to take the Aware as a parameter during its setup.

The manager doesn't need to know about the provider, I think, so we should be able to keep it this way.
The provider can use the manager for the following:
1/ (MUST) for the aware interface, so it can pass new clusters
2/ (OPT, but likely common) for the GetClient(), so it can read local objects
3/ (OPT, but likely common) for the Add(Runnable), so it can be started and managed as a runnable and remove burden on the user

We can figure out whether we want full composability (3 interfaces), medium (2 interfaces: mgr vs aware) or mandated abstraction (one interface with all).

I'm thinking either full composability or medium composability.


// Start runs the provider. Implementation of this method should block.
// If you need to pass in manager, it is recommended to implement SetupWithManager(mgr mcmanager.Manager) error method on individual providers.
// It is not part of the provider interface because it is not required for all providers.
Start(context.Context, Aware) error


I think we're mixing two concepts here.

  1. The manager that is used for running things (the ACTUAL manager that deals with runnables) and to provide a kubeconfig (though those two could also be split).
  2. The Aware part that deals with cluster management and registration.

The Start part is ONLY related to runnables. The Aware part is a parameter that the provider should hold, provided during setup?

Contributor Author


So what do you suggest here? I don't see the mixing here, but I've been soaking in this for way too long :)

Contributor


I don't see the mixing either.
The docstring specifies that if the full manager is needed, the provider should implement a different method. And I don't think that there will be many providers that actually need a reference to the full manager.

It only takes an Aware because it should use that Aware to engage clusters.

I'd instead argue that this will encourage cleaner design, because provider developers will ask for a kubeconfig in their setup (either as parameters to their New or in their Options...) - which may be the local cluster config but could also be something else - rather than expecting the manager.

@@ -126,14 +125,14 @@ func (p *Provider) Get(_ context.Context, clusterName string) (cluster.Cluster,
return nil, multicluster.ErrClusterNotFound
}

// Run starts the provider and blocks.
func (p *Provider) Run(ctx context.Context, mgr mcmanager.Manager) error {


should we mark it deprecated instead?

Contributor Author


Do we really need to?
First, it's replaced with a very similar method with the same functionality.
Second, Go is good at catching these at compile time, so people should just make the change.

It's not like we're deprecating functionality and leaving people to find a different way of doing things.

@ntnn
Contributor

ntnn commented Aug 19, 2025

it feels to me that Start should not take the Aware, and it's the responsibility of the provider to take the Aware as a parameter during its setup.

That comes back to the problem that providers can start providing clusters before they should.

The intent in the .Start taking the Aware is that this is explicitly the object that should be used to Engage clusters.

That we could solve the lifecycle problem and, at the same time, the problem of a provider being started too early was an offhand idea, but I agree with @mjudeikis that we should continue discussing that in another ticket.

The manager doesn't require to know about the provider I think, so we should be able to keep it this way.

The manager has to know about the provider - otherwise the manager cannot get clusters or index fields.

@mjudeikis
Contributor Author

@corentone I just reverted the change for now. I feel there is scope creep happening :) Let's keep it simple for now:

  1. There is a way to start the provider via the interface
  2. People do the starting :)

@mjudeikis
Contributor Author

@corentone

The manager doesn't require to know about the provider I think, so we should be able to keep it this way.

I think this is a false statement to start with. The manager needs to know about the provider. It needs to pass the cluster lifecycle (Engage) into the provider. It's by design. So I don't think your suggestion quite works. Or I don't understand what you are suggesting.

If you feel strongly about this, can you propose a change? Not pseudocode, as most of the pseudocode in this PR conversation does not compile, does not work, or is wrong :/

@corentone

@corentone

The manager doesn't require to know about the provider I think, so we should be able to keep it this way.

I think this is a false statement to start with. The manager needs to know about the provider. It needs to pass the cluster lifecycle (Engage) into the provider. It's by design. So I don't think your suggestion quite works. Or I don't understand what you are suggesting.

I think I see where the confusion is on my side. The Manager currently has a GetProvider that I didn't realize was there. Hence the manager needs a pointer to the provider and the provider needs a pointer to the manager to call Engage.

I found out why we had the GetProvider in the manager, which is to decide whether we are single- or multicluster. All usages I found were to check whether the provider was nil or not; the provider itself was never used.
Is the expectation that controllers or other manager consumers could call GetProvider and leverage it for something? I'm not sure what they would do with it, but I could be wrong. Would you have an opinion/examples on that part?

I think otherwise my suggestions only hold if the manager is not aware of the providers at all and just gets called with Engage() when other clusters are there.

If you feel strongly about this, can you propose a change? Not pseudocode, as most of the pseudocode in this PR conversation does not compile, does not work, or is wrong :/

This is a valid ask and I'll only be able to build it out next week. I'm okay being ignored if a majority agree on a design.

If we keep the manager aware of the provider(s), we should clarify:

  • what is the manager expected to be able to do with the provider, and what it requires from it (it would likely be a Start and a SetAware method -- which were combined into the proposed Start, I think?)
  • Make sure we avoid circular dependencies.
  • decide if the manager should be able to admit multiple providers or if we want to use the provider that contains multiple provider pattern.

Couple questions before I hibernate until I can code it :)

  • Are we agreeing that we want the Manager to run the providers as runnables (to facilitate with runtime?). Or do we want to make it optional?
  • Could the provider be considered a main-cluster controller (and hence log and use the client provided by the manager)? (That would mean that the provider has a pointer to the manager.)
  • Is the Engage() method protected on manager startup? Or is it safe to call before startup? Are we safe on that side?

Sorry for being high level, hope that helps and not add confusion!

@FourFifthsCode
Contributor

@corentone (full comment quoted above)

I think this gets to the core of the issue. When working on the kubeconfig provider, there was a good bit of boilerplate that could be common across providers, and a few gotchas in regard to startup and circular dependencies, so I think it's worthwhile to think about the interfaces; some tweaks could improve the experience, as could some shared code.

@mjudeikis sorry for blowing this up when you're wanting to make a simple change, I just thought this would be a good opportunity to address this since it deals with some of the heart of the problem space. I'm good with a follow-up issue to continue the discussion as well. That could also address some of the larger scope of the interfaces themselves; however, it does seem like that could impact the changes in this PR in the long run.

@corentone some examples on possible new designs of manager/provider interfaces and responsibility I think would be very interesting!

@mjudeikis
Contributor Author

mjudeikis commented Aug 19, 2025

  • what is the manager expected to be able to do with the provider and what it requires from it. (It would likely be a Start and a SetAware method? -- which was combined by the proposed Start I think?

So the Provider itself now only does Get, IndexField, and (with this PR) Start. And Start takes the Aware so one can Engage clusters, if that's one of the patterns used (like the Kind and Namespace providers, or the out-of-tree kcp provider, do).

  • Make sure we avoid circular dependencies.

Not an issue with current version.

  • decide if the manager should be able to admit multiple providers or if we want to use the provider that contains multiple provider patterns.

I don't think we can answer this question yet. @ntnn is experimenting with the latter one here #54 and the first one here #56

Which I think is out of scope for this PR/issue, and we should move this discussion to those 2 PRs above ^

  • Are we agreeing that we want the Manager to run the providers as runnables (to facilitate with runtime?). Or do we want to make it optional?

**** I think this is a core question here ****
I would take a pragmatic view - the manager does not run the provider(s) for now. Why? It's easier to add that later than to remove it. Once we really see that "starting providers is an issue", we can think about it. Once people start relying on the manager auto-starting things, we can't take it away, as that would introduce silent failures; whereas we can make a user-initiated start a no-op and have the manager pick up the lifecycle. I feel we don't have enough adoption and signal for this beyond "it's nice to have". Essentially, one of these is a "one-way door" while the other is a "two-way door", and I picked the latter until we know we won't want to go back :)

  • Could the provider be considered a main-cluster controller (and hence logging and using the client provided by the manager) ? (That would mean that the provider has a pointer to the manager as a manager.

I don't see a reason why one could not build a singleton provider for this. I don't quite see the reason for it - but why not? :D
It would be the same as having provider = nil today.

  • Is the Engage() method protected on manager startup? Or is it safe to call before startup? Are we safe on that side?

Engage itself is safe (it has a lock inside). The problem is the higher-level tracking, which is the provider author's responsibility (most of the change requests on this PR were from @ntnn about this :D)


@mjudeikis
Contributor Author

I'm good with a follow up issue to continue discussion as well. And that could also address some of the larger scope of the interfaces themselves, however, it does seem like that could impact the changes in this PR in the long run.

Let's do this. Once this merges, I will send my AI workers to summarise this and start a new discussion/issue. Maybe even a Google doc, as it feels like a big pie to eat in one go :)

@k8s-ci-robot
Contributor

@ntnn: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis, ntnn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@embik
Member

embik commented Sep 1, 2025

Leaving a comment here to sum up my thoughts:

The current code shows, IMHO, that the manager should start the provider as a runnable; lifecycling the goroutines involved is a bit of a hassle right now and doesn't produce clean code. I liked @ntnn's suggestion half-way through this discussion; I think it's the right approach, because manager start is the only place where we know that the manager is up and running, and it has access to the Engage method that our provider needs.

This is a good change for 0.22.0, perhaps we could cut 0.22.0-beta.0 with this feature (provider lifecycle being part of the manager) in place.

With that being said, this has been a long discussion and this PR is a good building block, so I will approve once I get the all-clear from @corentone.
