✨ Extend interface with runner #62


Open

mjudeikis wants to merge 3 commits into main

Conversation

mjudeikis
Contributor

When creating vendor-agnostic operators, you start your provider alongside the application. Because the main interface does not support Run, this means you need to wire every supported provider deep into the code just to start it.

The majority of providers will be runnable, so I think it makes sense to make Run a permanent member of the interface. But SetupWithManager - no. And it can be done much earlier in the lifecycle of the program.

Now, if you pass mcmanager.Provider around in the code, you can't start it, as the method is not exposed on the interface.
So you end up with:

type providers struct {
	providerX
	providerY
	providerZ
}

vs just having a single interface to back any implementation.
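
For illustration, a minimal sketch of the shape this change is after (method set abridged and signatures assumed - the exact diff may differ):

// With Run on the interface, any implementation can be started through
// mcmanager.Provider alone, with no concrete provider types in the wiring.
type Provider interface {
	// Get returns the cluster with the given name.
	Get(ctx context.Context, clusterName string) (cluster.Cluster, error)
	// Run starts the provider and blocks until the context is done
	// (the hypothetical addition).
	Run(ctx context.Context) error
}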

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 13, 2025
@k8s-ci-robot k8s-ci-robot requested review from embik and sttts August 13, 2025 16:28
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 13, 2025
@mjudeikis mjudeikis changed the title Extend interface with runner ✨ Extend interface with runner Aug 13, 2025
@FourFifthsCode
Contributor

I was also thinking extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@mjudeikis
Contributor Author

mjudeikis commented Aug 14, 2025

I was also thinking extending the provider interface for something like this makes sense. I wonder if it would be helpful to have the multicluster manager's existing start function also call the provider's Run func before it sets everything up, and then maybe the SetupWithMgr logic could fit somewhere in there too.

@embik and I have been discussing this async. My take was that one might want to control the lifecycle of those themselves. I personally prefer these things left flexible.

And the result is still the same: the manager internally operates on the mcmanager.Provider interface, so the Run method needs to be exposed. So now you get to the question of "who is responsible for starting it, the author or the manager?". And when not sure - in my head - leave it manual.

Like:

func (p *scopedManager) Start(ctx context.Context) error {
	if p.Manager.GetProvider() != nil {
		if err := p.Manager.GetProvider().Run(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

means you still need the method exposed on the interface.

@FourFifthsCode
Contributor

So now you get to the question of "who is responsible for starting it, the author or the manager?". And when not sure - in my head - leave it manual.

Yeah, the "who is responsible" question is definitely tricky. On the other side of things, it can be easy to forget to start the provider (which happened to me at one point 😅).

What would you think of renaming Run() on the provider to something like StartupHook() and having it return a func? Maybe then we get the best of both worlds: a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil && prov.StartupHook != nil {
		if err := prov.StartupHook(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

@mjudeikis
Contributor Author

So now you get to the question of "who is responsible for starting it, the author or the manager?". And when not sure - in my head - leave it manual.

Yeah, the "who is responsible" question is definitely tricky. On the other side of things, it can be easy to forget to start the provider (which happened to me at one point 😅).

What would you think of renaming Run() on the provider to something like StartupHook() and having it return a func? Maybe then we get the best of both worlds: a managed startup and clearer responsibility.

func (p *scopedManager) Start(ctx context.Context) error {
	prov := p.Manager.GetProvider()
	if prov != nil && prov.StartupHook != nil {
		if err := prov.StartupHook(ctx); err != nil {
			return fmt.Errorf("failed to run provider: %w", err)
		}
	}

	return p.Manager.GetLocalManager().Start(ctx)
}

I think I would be OK with that. In my current POC I'm kind of building towards the idea that we would need SetupWithManager in the interface too :/ Still shuffling code around. Let me sit on this for a few more days as I'm iterating on it as I go.

@ntnn
Contributor

ntnn commented Aug 14, 2025

IMHO it would make some sense for the manager to start the provider.
The provider provides clusters for the manager and its reconcilers, so it would make sense for the manager to start the provider when the manager is ready to accept/engage the clusters.

So at the moment it will always end up in this setup:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{})
provider.SetupWithManager(mgr) // Or something else to set the manager
go mgr.Start(ctx)
go provider.Run(ctx)
// wait

Which could instead be:

provider := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), provider, mcmanager.Options{}) // manager sets itself on the provider
mgr.Start(ctx) // manager starts the provider
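
A hedged sketch of what that could look like inside the manager's Start (field names such as provider and errChan are assumptions, not the actual mcmanager code):

func (m *mcManager) Start(ctx context.Context) error {
	// Start the provider in the background right before the manager runs,
	// so clusters are only engaged once the manager can accept them.
	if m.provider != nil {
		go func() {
			if err := m.provider.Run(ctx); err != nil {
				m.errChan <- fmt.Errorf("provider failed: %w", err)
			}
		}()
	}
	return m.Manager.Start(ctx)
}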

When I read the interfaces for the first time I found it odd that the provider interface had no Run method and that the provider has to be started manually.

However I think that was by design, as it might be too early to decide the lifecycle of both the provider and the manager, since that could stifle design choices.

If the lifecycle is less flexible, some designs could become impossible.
E.g. the multi provider would allow adding and removing providers at runtime: #56
While this would still be possible (or even easier) with this change, future design decisions based on the premise that the manager manages the provider lifecycle could rule out some provider designs.

Then again - the lifecycles are already linked somewhat, so it would make sense to include that in the design.

StartWithManager

If the lifecycle is linked like this, I'd prefer Provider.StartWithManager(context.Context, mcmanager.Manager) over separate .SetupWithManager(mcmanager.Manager) and .Start(context.Context) methods.
The method starting the provider would have to check that the manager is not nil anyhow - regardless of where it is coming from:

func (p *Provider) SetupWithManager(mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	return nil
}

func (p *Provider) Start(ctx context.Context) error {
	if p.mgr == nil {
		return ErrMgrNil
	}
	// ...
}

or:

func (p *Provider) StartWithManager(ctx context.Context, mgr mcmanager.Manager) error {
	if mgr == nil {
		return ErrMgrNil
	}
	p.mgr = mgr
	// ...
}

And providers that would also work without a mgr wouldn't care either way.
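
For comparison, a caller-side sketch of the combined method (provider name assumed):

prov := myprovider.New()
mgr, _ := mcmanager.New(ctrl.GetConfigOrDie(), prov, mcmanager.Options{})
go prov.StartWithManager(ctx, mgr) // one call: set the manager and start providing
// (or the manager makes this call itself when it starts, as discussed above)
return mgr.Start(ctx)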

@FourFifthsCode
Contributor

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?
And then the manager calls that func before it starts?
If so, that sounds good to me!

@ntnn
Contributor

ntnn commented Aug 15, 2025

Exactly. Looking at some sample code, having SetupWithManager and Run as separate steps doesn't feel right, and overall I much prefer the story of "the manager is started and then starts the provider when it is ready" over the developer needing knowledge of the inner workings of the provider they are using to know when it should be started.

@corentone

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@ntnn
Contributor

ntnn commented Aug 17, 2025

I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)

2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

We don't stand in the middle - the status quo is 2: The provider is entirely independent from the manager.
So the user has to manage the lifecycle of both and needs to know in which order to start them.

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manager? When should the manager be set on the provider? Before or after starting the manager?

E.g. in the Engage method the runnables are engaging the cluster:

func (m *mcManager) Engage(ctx context.Context, name string, cl cluster.Cluster) error {
	ctx, cancel := context.WithCancel(ctx) //nolint:govet // cancel is called in the error case only.
	for _, r := range m.mcRunnables {
		if err := r.Engage(ctx, name, cl); err != nil {
			cancel()
			return fmt.Errorf("failed to engage cluster %q: %w", name, err)
		}
	}
	return nil //nolint:govet // cancel is called in the error case only.
}

Say a provider only has the SetManager method with no further documentation aside from that it sets the manager:

prov := myprov.New(myprov.Opts{})
mgr, _ := mcmanager.New(cfg, prov, mcmanager.Options{})
prov.SetManager(mgr)
mgr.Add(myrunnable1)
mgr.Add(myrunnable2)
return mgr.Start(ctx)

In reality, .SetManager already starts the provider working - so the runnables added later will miss the first clusters the provider provides.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

Correct - and to prevent the provider starting to provide clusters too early the manager should pass itself when running the provider.

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

type Provider interface {
	Start(context.Context, Aware)
	Get(...)
	IndexField(...)
}
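
A hedged sketch of a provider implementing that shape (with an error return added; the cluster bookkeeping is assumed):

func (p *myProvider) Start(ctx context.Context, aware multicluster.Aware) error {
	// Engage every cluster known so far; clusters discovered later would be
	// engaged from the provider's watch loop (not shown).
	p.lock.RLock()
	clusters := make(map[string]cluster.Cluster, len(p.clusters))
	for name, cl := range p.clusters {
		clusters[name] = cl
	}
	p.lock.RUnlock()

	for name, cl := range clusters {
		if err := aware.Engage(ctx, name, cl); err != nil {
			return fmt.Errorf("failed to engage cluster %q: %w", name, err)
		}
	}

	<-ctx.Done()
	return nil
}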

Providers that act like a controller (e.g. cluster-api or gardener) are already expecting to be passed a manager anyhow because they need their manager to target whatever cluster they pull the data from, not necessarily the cluster they are running in.
And most providers will only need a target to engage clusters with.

@mjudeikis
Contributor Author

When @mjudeikis and I were talking he also suggested that the Run/Start/... method on the provider interface might not exist due to circular dependencies, as the mcmanager.Manager already references the multicluster.Provider and multicluster.Aware.
That could be circumvented by the provider expecting multicluster.Aware as a parameter, as Provider and Aware are both in the multicluster package:

Yeah, this might be an option.

So if I understand correctly, you are suggesting using provider.StartWithManager instead of provider.Run?

It sounds reasonable. Need to test this in the code.

So I think I'm leaning towards:

provider.StartWithManager 

Does anybody have anything against this?

@ntnn
Contributor

ntnn commented Aug 18, 2025

provider.StartWithManager 

Does anybody have anything against this?

If the method takes a manager I like that.
If the method takes an Aware instead of a manager, it should be called StartWithAware or similar, otherwise the name wouldn't be accurate.
I'd also be fine with just calling it Start or Run if it accepts an Aware. That is pretty much what the existing providers look like, except that they expect the full manager.

@mjudeikis
Contributor Author

I think the only issue could be import circularity in Go. Need to shuffle code and see, as I don't have (and I hope I never will need to have) a mental model of all the packages :)

@corentone

That has some problems, e.g. the user needs to know when the provider expects to be run. Before the manager? After the manager? When should the manager be set on the provider? Before or after starting the manager?

We're on the same page! I actually have been pushing towards making it a controller in some PRs because it quickly became clear we needed a way to make setup and management of a provider simple.

I'm not too scared of the circular dependency. I think the provider interface could be put into a separate package alongside helper methods and both the provider implementations and manager would import it.
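
Roughly, a layout that avoids the cycle (a sketch of the idea - Provider and Aware already live together in the multicluster package today):

multicluster/  - holds the Provider and Aware interfaces, imports no manager code
mcmanager/     - imports multicluster and operates on the interfaces
providers/...  - import multicluster; take an Aware (or a full manager) as a parameter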

Aware is not too clear to me, that's recent, right? What are we aware of?

I do like StartWithManager, but I'd recommend keeping it the same as controllers: kubebuilder documents SetupWithManager, which would group a New and Start together.

@ntnn
Contributor

ntnn commented Aug 18, 2025

Aware is the interface that has the Engage method:

// Aware is an interface that can be implemented by components that
// can engage and disengage when clusters are added or removed at runtime.
type Aware interface {
	// Engage gets called when the component should start operations for the given Cluster.
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
	// Cluster is removed or an error occurs.
	//
	// Implementers should return an error if they cannot start operations for the given Cluster,
	// and should ensure this operation is re-entrant and non-blocking.
	//
	// (ASCII art in the original source omitted here.)
	Engage(context.Context, string, cluster.Cluster) error
}

Since the Provider interface is in the same package (and even file :D) I think the least invasive change would be to expect something that satisfies the Aware interface to be passed to a Start method on the provider.

@mjudeikis
Contributor Author

I updated the code. Moved Run to Start, but I'm wondering if I should revert. I like Start more, but it's about the delta.

Having Aware as an argument works for most cases (it does not work for the kubeconfig provider, for example), but it's easy to override.

It does not solve the startup ordering problem, but I'm not sure how I feel about that either. I like explicitness.

@ntnn @embik @corentone @FourFifthsCode, what's your take on the current iteration?

Comment on lines 52 to 54
p.aware = aware

if err := p.aware.Engage(ctx, p.name, p.cl); err != nil {
Contributor


Suggested change:
-	p.aware = aware
-
-	if err := p.aware.Engage(ctx, p.name, p.cl); err != nil {
+	if err := aware.Engage(ctx, p.name, p.cl); err != nil {

Just a nit

Comment on lines 158 to 163
return fmt.Errorf("manager is nil")
}
p.mgr = mgr
p.mcmanager = mgr

// Get the local manager from the multicluster manager
localMgr := mgr.GetLocalManager()
@ntnn
Contributor

ntnn commented Aug 18, 2025

Not specifically for this change but it comes up now with the manager - wouldn't it be better to give kubeconfig a config, and the provider spins up a manager by itself, rather than insisting on the same cluster?

flowchart LR

subgraph computeCluster[Compute Cluster]
	kubeconfigMCP[MCR-based operator with Kubeconfig provider]
end
kubeconfigSecrets -.-> kubeconfigMCP

subgraph secretsCluster[Cluster with Secrets]
	kubeconfigSecrets[Kubeconfigs]
end

kubeconfigMCP --> target1
kubeconfigSecrets -.-> target1
kubeconfigMCP --> target2
kubeconfigSecrets -.-> target2

subgraph target1[Target Cluster 1]
end

subgraph target2[Target Cluster 2]
end

Then the setup would resolve itself:

cfg := ctrl.GetConfigOrDie()

kconfigProvider := kubeconfig.New(kubeconfig.Options{
	Config: cfg, // this could be the same config as for mcmanager or a config for another cluster to read the secrets from
})
mcmgr, _ := mcmanager.New(cfg, kconfigProvider, mcmanager.Options{})
return mcmgr.Start(ctx)

@FourFifthsCode What do you think?

Contributor


That would open things up a bit! We'd just need to try and make it really clear where users are getting their secrets from, and hopefully make it straightforward to configure.

Contributor


It does almost feel cleaner to have the provider start the manager, but then that might make bootstrapping more complicated on the manager side.

Contributor


Yeah, it would - and it would block e.g. having multiple providers in one manager.

Contributor Author


Let's do this in a follow-up. While this sounds right, it somehow breaks the pattern we've been using. I think I want to see it in a separate PR so we can discuss.

Comment on lines +131 to +133
defer p.lock.Unlock()

p.aware = aware
Contributor


Suggested change:
-	defer p.lock.Unlock()
-
-	p.aware = aware
+	p.aware = aware
+	p.lock.Unlock()

Otherwise the provider deadlocks, I think? Because the reconcile also takes the lock.

Contributor Author


But there should not be any reconciliation until started. However, it's better to be safe than sorry.

Contributor


Correct, but .Start blocks until the context is done and holds the lock that whole time, so the Reconcile method will never execute because it cannot acquire the lock.
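
A minimal sketch of the deadlock being described (types and fields assumed from the diff context):

func (p *Provider) Start(ctx context.Context, aware multicluster.Aware) error {
	p.lock.Lock()
	defer p.lock.Unlock() // the lock is held until Start returns...

	p.aware = aware
	<-ctx.Done() // ...but Start only returns on shutdown
	return nil
}

func (p *Provider) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	p.lock.Lock() // blocks forever while Start holds the lock
	defer p.lock.Unlock()
	// ...
	return reconcile.Result{}, nil
}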

@FourFifthsCode
Contributor

I have a hard time following everyone's position to be honest.

I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.

In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to clients, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency: providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

@mjudeikis
Contributor Author

I have a hard time following everyone's position to be honest.
I can see two cases (which I think map to what you're suggesting, but we stand a bit in the middle today?)

  1. The provider is a controller (recommended?) and therefore should have similar methods/interfaces as a controller. It would be started by the manager and be a runnable. Being a controller, it can benefit from leveraging things the manager offers (clients, logging, etc.)
  2. The provider is independent from the manager runtime-wise, and only calls Engage on the manager (should there be more calls?).

I think 1 makes it easier for lifecycle and requires less provider-specific management. 2 gives a lot of control and independence from the rest of the controllers (could the controller bog down the manager and therefore block the provider from doing its job -- maybe there could be a small chance of deadlock?), but I'm not sure the added complexity is worth it.
In conclusion, if we go for 1, the only "external" requirement is for the provider to be a runnable for the manager to manage.

@corentone I think you laid out the positions clearly!

  • Option 1 has the benefits of moving toward a cleaner startup and access to clients, logging, recorders, etc.
  • Option 2 has the benefit of flexibility in use cases where a controller might not make sense.

I tend to lean toward option 1 myself because I view managers as a framework that helps get things bootstrapped, and if I really need something custom I can just use informers directly. The Runnable interface is designed around this.

The one thing that is tricky is the circular dependency: providers with controllers need access to the manager, but managers need access to the clusters a provider manages 😅

Also not all providers need a controller.

I will ask differently: would what's currently in the code solve the issues/challenges you've seen on your end, or are you missing something?
