@skartikey (Contributor) commented Jan 5, 2026

Summary

  • Implements the startup-error-behavior framework (TSD-006) for handling Docker daemon
    unavailability
  • Uses Ping to check Docker connectivity during Start()
  • Returns StartupError with Retry flag based on whether the error is a connection failure
  • Allows users to configure behavior via startup_error_behavior option:
    • error (default): Fail startup if Docker is unavailable
    • retry: Keep retrying connection on each gather cycle
    • ignore: Remove plugin from processing if connection fails
    • probe: Probe the plugin's availability and remove it from processing if the probe fails

This gives users control over how Telegraf handles Docker daemon unavailability at startup, rather than silently continuing with deferred initialization.

Related issues

resolves #18089

@telegraf-tiger (bot) added the labels area/docker, fix (pr to fix corresponding bug), and plugin/input (1. Request for new input plugins 2. Issues/PRs that are related to input plugins) Jan 5, 2026
@skartikey self-assigned this Jan 5, 2026
@skartikey changed the title from "fix(inputs.docker): Allow Telegraf to start when Docker daemon is una…" to "fix(inputs.docker): Allow Telegraf to start when Docker daemon is unavailable" Jan 5, 2026
…vailable

This fixes a regression introduced in v1.36.3 where Telegraf would fail to start if the Docker/Podman socket was unavailable. The Start() method now logs a warning instead of returning a fatal error, and the client connection is retried lazily on each Gather() cycle.
@skartikey force-pushed the inputs_docker_socket_missing branch from 7c2a47a to ec111c6 on January 5, 2026 19:17
@skartikey assigned srebhan and mstrandboge and unassigned skartikey Jan 8, 2026
@srebhan (Member) left a comment

Thanks @skartikey for the PR! However, I think this should be implemented using the startup-error-behavior framework, i.e. by returning a StartupError with a flag denoting that the error is retryable and letting the user decide what to do. See this example on how to implement it.

Please only mark the error as retryable if it actually is retryable, e.g. by issuing a Ping and checking the result with IsErrConnectionFailed!

…on failures

Implement the startup-error-behavior framework (TSD-006) to handle
Docker daemon unavailability during startup.

This allows users to configure retry behavior via the
startup_error_behavior option (error, retry, ignore, probe) instead of
silently logging warnings and deferring connection to the first Gather.
@skartikey force-pushed the inputs_docker_socket_missing branch from f24d56a to 065bc90 on January 22, 2026 21:46
@skartikey (Contributor, Author) commented

@srebhan Implemented the startup-error-behavior framework (TSD-006) to handle Docker daemon unavailability. Please take a look.

@srebhan (Member) left a comment

Thanks @skartikey! Some more comments...

@skartikey force-pushed the inputs_docker_socket_missing branch from 402ca62 to dffca5d on January 23, 2026 19:16
@srebhan (Member) left a comment

Thanks @skartikey! Some more minor comments...


Development

Successfully merging this pull request may close these issues.

Telegraf 1.36.3-1 fails to start when Docker/podman socket is missing (regression from 1.36.2-1)

3 participants