[operate] running signoz-otel-collector before signoz-clickhouse fails to properly init #10235

@KieranP

Bug description

We use docker compose to run signoz. I just rebooted our signoz server (after applying security updates), and when it came back up, the signoz-otel-collector service loaded before signoz-clickhouse. This meant that the collector, on init, failed to connect to clickhouse. Despite this, the collector service remained in a running/ready state in docker, but when our hosts sent metrics to it, they got:

Feb 08 19:57:05 otelcol-contrib[612]: 2026-02-08T19:57:05.093Z info internal/retry_sender.go:133 Exporting failed. Will retry the request after interval. {"resource": {"service.instance.id": "bb80bc8d-875b-47a6-8b3d-002c24f6024a", "service.name": "otelcol-contrib", "service.version": "0.140.1"}, "otelcol.component.id": "otlp", "otelcol.component.kind": "exporter", "otelcol.signal": "logs", "error": "rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp REDACTED:4317: connect: connection refused"", "interval": "36.837733592s"}

Looking at https://github.com/SigNoz/signoz/blob/main/deploy/docker/docker-compose.yaml, the collector service has a dependency on "signoz", which in turn depends on "clickhouse", so in theory it should be waiting on clickhouse, but it seems like clickhouse container is reporting healthy before it actually is, because the collector bootup had this in the logs:

{"level":"info","ts":"2026-02-08T19:54:17.622Z","caller":"service@v0.128.0/service.go:282","msg":"Everything is ready. Begin running and processing data.","resource":{"service.instance.id":"77ac44d3-3144-4621-ae48-097fed6561c0","service.name":"/signoz-otel-collector","service.version":"dev"}}
{"level":"info","timestamp":"2026-02-08T19:54:17.816Z","caller":"signozcol/collector.go:120","msg":"Collector service is running"}
{"level":"error","timestamp":"2026-02-08T19:54:17.816Z","caller":"opamp/server_client.go:281","msg":"failed to apply config","component":"opamp-server-client","error":"failed to reload config: /var/tmp/collector-config.yaml: collector failed to restart: failed to build pipelines: failed to create "clickhouselogsexporter" exporter for data type "logs": cannot configure clickhouse logs exporter: dial tcp 172.18.0.4:9000: connect: connection refused","stacktrace":"github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:281\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onMessageFuncHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:160\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/wsreceiver.go:94"}

Note that it reports "Collector service is running", but the very next thing it tries to do is connect to clickhouse and fails with connection refused, indicating clickhouse was not actually ready to receive connections.

For now, I will run docker restart signoz-otel-collector some time after clickhouse container reports ready, and that seems to allow collector to boot up properly.
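
The manual workaround above could be scripted roughly as follows. This is a minimal sketch, not part of SigNoz: `wait_for_port` and `restart_collector_when_ready` are hypothetical helpers, and the default host `localhost` assumes ClickHouse's native port 9000 (the port in the error log above) is reachable from wherever the script runs, which may not hold if the compose file does not publish that port. A successful TCP connection is only an approximation of "ready".

```python
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Opening and immediately closing a connection is enough to show the port accepts clients.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def restart_collector_when_ready(host: str = "localhost", port: int = 9000) -> None:
    """Assumed workflow: wait for ClickHouse, then restart the collector container."""
    if not wait_for_port(host, port):
        raise SystemExit("ClickHouse did not accept connections in time")
    # Container name taken from the restart command in this report.
    subprocess.run(["docker", "restart", "signoz-otel-collector"], check=True)
```

Calling `restart_collector_when_ready()` after a reboot would replace the "some time after" guesswork with an explicit wait.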

Expected behavior

  1. signoz-otel-collector should wait for signoz-clickhouse to fully initialize (this would require changing the clickhouse health check requirements)
  2. collector should not report healthy to docker unless it can connect to clickhouse (if it did this, docker would auto restart it and things would have come back up automatically, but as it is, collector reports healthy even though the clickhouse connection fails)
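
To illustrate the second point, here is a minimal sketch of the logic such a health check might use. It is an assumption-laden illustration, not the collector's actual health check: `clickhouse_reachable` and `health_exit_code` are hypothetical names, the default host `clickhouse` assumes the compose service name resolves on the container network, and the collector image may not ship Python at all. The exit codes follow the Docker health check convention (0 for healthy, 1 for unhealthy). What happens to an unhealthy container is left to whatever supervises it.

```python
import socket


def clickhouse_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Single probe: True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_exit_code(host: str = "clickhouse", port: int = 9000) -> int:
    """Map the probe result to a health check exit code: 0 healthy, 1 unhealthy."""
    return 0 if clickhouse_reachable(host, port) else 1
```

A wrapper script would end with `raise SystemExit(health_exit_code())`. A TCP connect is a weak signal, since the port can open before the server is able to answer queries, so a real check would more likely run a trivial query.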

Version information

  • Signoz version: 0.110.1

Labels

operate (Issues with operating SigNoz), question (Questions about using the SigNoz)
