Commit b8df7d4

OCPBUGS-36378: capi: start controllers after WaitGroup is created
Some providers, like Azure, require two controllers to run. If a controller failed to spawn (e.g. cluster-api-provider-azureaso), we were not stopping the controllers that were already running (e.g. cluster-api, cluster-api-provider-azure). This resulted in leaked processes even though the Installer reported it had stopped the capi system:

```
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to run cluster api system: failed to run controller "azureaso infrastructure provider": failed to start controller "azureaso infrastructure provider": timeout waiting for process cluster-api-provider-azureaso to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)
INFO Shutting down local Cluster API control plane...
INFO Local Cluster API system has completed operations
```

By just changing the order of operations to run the controllers *after* the WaitGroup is created, we are able to properly shut down all running controllers and the local control plane in case of error:

```
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to run cluster api system: failed to run controller "aws infrastructure provider": failed to extract provider "aws infrastructure provider": fake error
INFO Shutting down local Cluster API control plane...
INFO Stopped controller: Cluster API
INFO Local Cluster API system has completed operations
```
1 parent 2e34347 commit b8df7d4

1 file changed: +7 -7 lines changed
pkg/clusterapi/system.go

Lines changed: 7 additions & 7 deletions
@@ -316,13 +316,6 @@ func (c *system) Run(ctx context.Context) error {
 	// We only show controller logs if the log level is DEBUG or above
 	c.logWriter = logrus.StandardLogger().WriterLevel(logrus.DebugLevel)
 
-	// Run the controllers.
-	for _, ct := range controllers {
-		if err := c.runController(ctx, ct); err != nil {
-			return fmt.Errorf("failed to run controller %q: %w", ct.Name, err)
-		}
-	}
-
 	// We create a wait group to wait for the controllers to stop,
 	// this waitgroup is a global, and is used by the Teardown function
 	// which is expected to be called when the program exits.
@@ -347,6 +340,13 @@ func (c *system) Run(ctx context.Context) error {
 		}
 	}()
 
+	// Run the controllers.
+	for _, ct := range controllers {
+		if err := c.runController(ctx, ct); err != nil {
+			return fmt.Errorf("failed to run controller %q: %w", ct.Name, err)
+		}
+	}
+
 	return nil
 }
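The reordering described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the Installer's code: the `controller` and `system` types, the `FailStart` and `Stopped` fields, the `Teardown` body, and the `runAndTeardown` helper are all hypothetical simplifications, and stopping a controller is modelled by recording its name. The only point it tries to show is that, once the WaitGroup and the cleanup goroutine exist before any controller is started, an early return from `Run` still leaves `Teardown` with something to cancel and wait on.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// controller is a hypothetical stand-in for a provider process.
type controller struct {
	Name      string
	FailStart bool // simulates a provider that never becomes ready
	started   bool
}

// system is a simplified model; it is NOT the Installer's real type.
type system struct {
	wg      *sync.WaitGroup
	cancel  context.CancelFunc
	Stopped []string // names of controllers the cleanup goroutine stopped
}

// runController pretends to spawn a controller process.
func (s *system) runController(_ context.Context, ct *controller) error {
	if ct.FailStart {
		return fmt.Errorf("failed to start controller %q: timeout", ct.Name)
	}
	ct.started = true
	return nil
}

// Run sets up the WaitGroup and the cleanup goroutine first, and only
// then starts the controllers, mirroring the order after this commit.
func (s *system) Run(ctx context.Context, controllers []*controller) error {
	ctx, s.cancel = context.WithCancel(ctx)

	s.wg = &sync.WaitGroup{}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		// Stop every started controller once the context is cancelled.
		<-ctx.Done()
		for _, ct := range controllers {
			if ct.started {
				s.Stopped = append(s.Stopped, ct.Name)
			}
		}
	}()

	for _, ct := range controllers {
		if err := s.runController(ctx, ct); err != nil {
			// The cleanup goroutine already exists, so Teardown can
			// still stop the controllers started before this failure.
			return fmt.Errorf("failed to run controller %q: %w", ct.Name, err)
		}
	}
	return nil
}

// Teardown cancels the context and waits for the cleanup goroutine.
// If Run had returned before creating the WaitGroup (the old order),
// there would be nothing to cancel or wait on here (assumed behavior).
func (s *system) Teardown() {
	if s.cancel == nil || s.wg == nil {
		return
	}
	s.cancel()
	s.wg.Wait()
}

// runAndTeardown is a hypothetical helper that runs the system, tears it
// down, and reports which controllers were stopped.
func runAndTeardown(controllers []*controller) ([]string, error) {
	s := &system{}
	err := s.Run(context.Background(), controllers)
	s.Teardown()
	return s.Stopped, err
}
```

Calling `runAndTeardown` with two healthy controllers followed by one whose `FailStart` is true returns an error from `Run`, and the two controllers that did start are still recorded as stopped. Moving the start loop above the WaitGroup setup in this sketch would leave `s.wg` nil on the error path, which is the leak the commit message describes.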
