
otelcol: make Shutdown block until Run completes cleanup#14596

Open
RoryGlenn wants to merge 1 commit into open-telemetry:main from RoryGlenn:fix/4947-shutdown-blocks-until-run-complete

@RoryGlenn

Description

Make Collector.Shutdown block until Run has finished all cleanup, following the http.Server pattern.
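For readers unfamiliar with the http.Server behavior referred to here, the following is a minimal sketch of one part of it: Shutdown is safe to call on a server that was never started, and a later ListenAndServe then refuses to run and reports http.ErrServerClosed. The helper name shutdownThenServe and the address are illustrative assumptions, not part of this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// shutdownThenServe (hypothetical helper) shuts down a server that was never
// started and then tries to start it. It reports whether Shutdown succeeded
// and whether ListenAndServe refused to run with http.ErrServerClosed.
func shutdownThenServe() (shutdownOK, serveClosed bool) {
	srv := &http.Server{Addr: "127.0.0.1:0"}

	// No listeners or connections exist yet, so there is nothing to wait for.
	shutdownErr := srv.Shutdown(context.Background())

	// A server that has been shut down may not be reused.
	serveErr := srv.ListenAndServe()

	return shutdownErr == nil, errors.Is(serveErr, http.ErrServerClosed)
}

func main() {
	shutdownOK, serveClosed := shutdownThenServe()
	fmt.Println(shutdownOK, serveClosed)
}
```

With a running server, the documented contract is that ListenAndServe returns ErrServerClosed as soon as Shutdown is called, and the caller is expected to wait for Shutdown itself to return before exiting.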

Link to tracking Issue: #4947

Changes

The three required behaviors from the issue are now implemented:

  1. Shutdown is safe to call at any point: already handled by sync.Once on the shutdown channel (unchanged)
  2. If Shutdown is called before Run: the sync.WaitGroup counter is 0, so Wait() returns immediately
  3. If Shutdown is called while Run is active: Shutdown blocks until Run completes all cleanup and the state reaches StateClosed

Implementation

Added a sync.WaitGroup (runWG) to the Collector struct:

  • Run() calls runWG.Add(1) at entry and defer runWG.Done() to signal completion
  • Shutdown() calls runWG.Wait() after closing the shutdown channel, blocking until Run finishes
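The two bullets above can be sketched roughly as follows. This is a simplified stand-in, not the otelcol code: the sketchCollector type, its state constants, and the running channel (used only so the example can wait until Run has registered) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type state int

const (
	stateStarting state = iota
	stateRunning
	stateClosed
)

// sketchCollector is a hypothetical, stripped-down stand-in for the Collector.
type sketchCollector struct {
	mu           sync.Mutex
	st           state
	shutdownChan chan struct{}
	shutdownOnce sync.Once
	runWG        sync.WaitGroup
	running      chan struct{} // closed once Run has registered and is active
}

func newSketchCollector() *sketchCollector {
	return &sketchCollector{
		shutdownChan: make(chan struct{}),
		running:      make(chan struct{}),
	}
}

func (c *sketchCollector) setState(s state) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.st = s
}

func (c *sketchCollector) State() state {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.st
}

// Run registers with the WaitGroup at entry and signals completion on exit.
// Like the real Run, this sketch expects to be called at most once.
func (c *sketchCollector) Run() {
	c.runWG.Add(1)
	defer c.runWG.Done() // runs after the state has been set to stateClosed

	c.setState(stateRunning)
	close(c.running)

	<-c.shutdownChan // wait for a shutdown request
	// Cleanup of services would happen here.
	c.setState(stateClosed)
}

// Shutdown requests shutdown once, then blocks until Run has finished.
func (c *sketchCollector) Shutdown() {
	c.shutdownOnce.Do(func() { close(c.shutdownChan) })
	c.runWG.Wait() // returns immediately if Run never registered
}

func main() {
	c := newSketchCollector()
	go c.Run()
	<-c.running // make sure Run has registered before shutting down

	c.Shutdown()
	fmt.Println(c.State() == stateClosed) // prints "true"
}
```

Because Done is deferred, it fires only after the final state change, so a caller that returns from Shutdown observes stateClosed. The sketch waits on running before calling Shutdown, which sidesteps the question of what happens when Shutdown runs before Run has reached Add(1).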

Testing

  • TestCollectorShutdownBlocksUntilRunComplete — verifies Shutdown() blocks and state is StateClosed when it returns
  • TestCollectorShutdownBeforeRun — verifies Shutdown() returns immediately when called before Run
  • TestCollectorShutdownCalledTwiceBlocks — verifies calling Shutdown() twice works correctly
  • Fixed TestCollectorStateAfterConfigChange — calls Shutdown() in a goroutine to avoid deadlock with the new blocking behavior
  • All existing tests continue to pass

Assisted-by: Claude Opus 4.6

Make Collector.Shutdown block until Run has finished all cleanup,
following the http.Server pattern described in issue open-telemetry#4947.

The three required behaviors are now implemented:
1. Shutdown is safe to call at any point (already handled by sync.Once)
2. If Shutdown is called before Run, it returns immediately
3. If Shutdown is called while Run is active, it blocks until Run
   has confirmed shutdown and the state reaches StateClosed

Implementation uses a sync.WaitGroup incremented at Run entry and
decremented via defer when Run exits. Shutdown waits on the WaitGroup
after closing the shutdown channel.

Fixes open-telemetry#4947

Assisted-by: Claude Opus 4.6
@RoryGlenn RoryGlenn requested a review from a team as a code owner February 16, 2026 00:00
@RoryGlenn RoryGlenn requested a review from dmitryax February 16, 2026 00:00

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a908109aa2


// Consecutive calls to Run are not allowed, Run shouldn't be called once a collector is shut down.
// Sets up the control logic for config reloading and shutdown.
func (col *Collector) Run(ctx context.Context) error {
	col.runWG.Add(1)


P2: Register Run with wait group before callers can Wait

Shutdown can miss an in-flight startup when Run is launched asynchronously (for example go col.Run(ctx) followed immediately by col.Shutdown()), because runWG.Add(1) is executed inside Run after scheduling. In that interleaving, Shutdown observes a zero counter and returns before Run finishes startup/shutdown cleanup, so the new “Shutdown blocks until Run completes” contract is not guaranteed for callers that start Run in a goroutine.
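One possible way to close this window, sketched below, is to decide the Run/Shutdown ordering under a mutex so that whichever call arrives first wins: if Run registers first, Shutdown waits for it; if Shutdown arrives first, a later Run refuses to start. This is an illustrative assumption and not the change proposed in this PR; guardedCollector, errShutdown, errAlreadyStarted, and the field names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errShutdown       = errors.New("collector already shut down")
	errAlreadyStarted = errors.New("collector already started")
)

// guardedCollector is a hypothetical stand-in that orders Run and Shutdown
// with a mutex instead of relying on when WaitGroup.Add happens to execute.
type guardedCollector struct {
	mu       sync.Mutex
	started  bool
	stopped  bool
	shutdown chan struct{} // closed by the first Shutdown call
	done     chan struct{} // closed when Run has finished cleanup
}

func newGuardedCollector() *guardedCollector {
	return &guardedCollector{
		shutdown: make(chan struct{}),
		done:     make(chan struct{}),
	}
}

func (c *guardedCollector) Run() error {
	c.mu.Lock()
	if c.stopped {
		c.mu.Unlock()
		return errShutdown // Shutdown won the race: do not start at all
	}
	if c.started {
		c.mu.Unlock()
		return errAlreadyStarted
	}
	c.started = true
	c.mu.Unlock()

	defer close(c.done)
	<-c.shutdown
	// Cleanup of services would happen here.
	return nil
}

func (c *guardedCollector) Shutdown() {
	c.mu.Lock()
	first := !c.stopped
	c.stopped = true
	started := c.started
	c.mu.Unlock()

	if first {
		close(c.shutdown)
	}
	if started {
		<-c.done // Run registered before us, so wait for it to finish
	}
}

func main() {
	// Shutdown first: a later Run refuses to start instead of racing.
	early := newGuardedCollector()
	early.Shutdown()
	fmt.Println(early.Run()) // prints "collector already shut down"

	// Run in a goroutine followed immediately by Shutdown: either order is safe.
	c := newGuardedCollector()
	errc := make(chan error, 1)
	go func() { errc <- c.Run() }()
	c.Shutdown()
	fmt.Println(<-errc) // nil or errShutdown, depending on scheduling
}
```

The trade-off is a behavior change: a Run that starts after Shutdown returns an error rather than starting up and tearing down immediately. That appears consistent with the existing doc comment that Run should not be called once a collector is shut down, but whether it suits the Collector is for the maintainers to judge.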

