feat: healthchecks for sentry components #3859

Open: 14 commits to be merged into base: master

mzglinski
Contributor

Resolves #3853

A few services are still missing a healthcheck. Here is why:

  • relay - does not have any binaries in the distroless image
  • taskbroker, taskworker, taskscheduler, uptime-checker - I was not able to find any healthcheck-related options; they probably do not support that at the moment
  • snuba-replacer - same as above, no support
  • all types of cron and cleanup services - I do not know how to test these, and a healthcheck does not seem necessary for them.

Proof:

admin@ip-172-31-23-18:~$ docker ps --format "{{.ID}}  {{.Status}}  {{.Names}}"
495e2eab6069  Up 13 minutes (healthy)  sentry-self-hosted-post-process-forwarder-errors-1
69d517b4b935  Up 13 minutes (healthy)  sentry-self-hosted-attachments-consumer-1
bd5b91f08f51  Up 13 minutes (healthy)  sentry-self-hosted-monitors-clock-tasks-1
2d61e4c79ad6  Up 13 minutes (healthy)  sentry-self-hosted-ingest-replay-recordings-1
b46278c643a0  Up 13 minutes (healthy)  sentry-self-hosted-transactions-consumer-1
4237ef96f4ee  Up 13 minutes (healthy)  sentry-self-hosted-ingest-monitors-1
5b2c5ead49f4  Up 13 minutes (healthy)  sentry-self-hosted-ingest-occurrences-1
6cc8c31d2cb4  Up 13 minutes (healthy)  sentry-self-hosted-subscription-consumer-generic-metrics-1
b33990f43127  Up 13 minutes (healthy)  sentry-self-hosted-subscription-consumer-metrics-1
c5411608bbfa  Up 13 minutes (healthy)  sentry-self-hosted-ingest-feedback-events-1
2bacdce7feea  Up 13 minutes (healthy)  sentry-self-hosted-process-segments-1
dad0562aed81  Up 13 minutes (healthy)  sentry-self-hosted-subscription-consumer-transactions-1
e465753d8531  Up 13 minutes (healthy)  sentry-self-hosted-billing-metrics-consumer-1
3c1609b9491b  Up 13 minutes (healthy)  sentry-self-hosted-worker-1
90e191aec337  Up 13 minutes (healthy)  sentry-self-hosted-post-process-forwarder-issue-platform-1
4bdca9ba81e5  Up 13 minutes  sentry-self-hosted-taskscheduler-1
7c61c4245c22  Up 13 minutes (healthy)  sentry-self-hosted-metrics-consumer-1
509fe6e9cf47  Up 13 minutes (healthy)  sentry-self-hosted-subscription-consumer-events-1
4badc0ba0013  Up 13 minutes (healthy)  sentry-self-hosted-post-process-forwarder-transactions-1
40cb4922b2ce  Up 13 minutes (healthy)  sentry-self-hosted-ingest-profiles-1
1e288f0580ed  Up 13 minutes (healthy)  sentry-self-hosted-events-consumer-1
3b02beb5c1bf  Up 13 minutes (healthy)  sentry-self-hosted-uptime-results-1
0429ec2b6d6b  Up 11 minutes (healthy)  sentry-self-hosted-process-spans-1
818348adc7e4  Up 13 minutes (healthy)  sentry-self-hosted-monitors-clock-tick-1
5d2e479fd790  Up 13 minutes (healthy)  sentry-self-hosted-generic-metrics-consumer-1
63d15a68e992  Up 13 minutes (healthy)  sentry-self-hosted-subscription-consumer-eap-items-1
cfe7b3e90715  Up 13 minutes (healthy)  sentry-self-hosted-nginx-1
3d12bc44a4cf  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-generic-metrics-gauges-1
8f18d3eb1dea  Up 13 minutes (healthy)  sentry-self-hosted-snuba-generic-metrics-counters-consumer-1
e25bdfc59c68  Up 13 minutes (healthy)  sentry-self-hosted-snuba-group-attributes-consumer-1
d687d51bb984  Up 13 minutes (healthy)  sentry-self-hosted-symbolicator-1
ed58b5e5a949  Up 13 minutes (healthy)  sentry-self-hosted-snuba-replays-consumer-1
f3009c3409e9  Up 13 minutes (healthy)  sentry-self-hosted-snuba-generic-metrics-gauges-consumer-1
b35077d51a53  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-generic-metrics-counters-1
f381885f4909  Up 13 minutes (healthy)  sentry-self-hosted-snuba-eap-items-consumer-1
13f197f7ad9b  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-eap-items-1
44b2c073080c  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-events-1
6e1d241cd750  Up 13 minutes (healthy)  sentry-self-hosted-snuba-issue-occurrence-consumer-1
9d533f85f7e6  Up 13 minutes (healthy)  sentry-self-hosted-snuba-metrics-consumer-1
7b2c5890ab52  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-metrics-1
a75df827d790  Up 13 minutes (healthy)  sentry-self-hosted-snuba-transactions-consumer-1
04a692f2e0e4  Up 13 minutes (healthy)  sentry-self-hosted-snuba-spans-consumer-1
a7ee681b0ec6  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-generic-metrics-distributions-1
8ea3f42fc61a  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-transactions-1
7159e100353d  Up 13 minutes (healthy)  sentry-self-hosted-snuba-generic-metrics-distributions-consumer-1
8a60784e484e  Up 13 minutes (healthy)  sentry-self-hosted-vroom-1
7d6749d9210e  Up 13 minutes (healthy)  sentry-self-hosted-snuba-profiling-profiles-consumer-1
6e124890bff4  Up 13 minutes (healthy)  sentry-self-hosted-snuba-errors-consumer-1
d42bd59930cb  Up 13 minutes (healthy)  sentry-self-hosted-snuba-uptime-results-consumer-1
e8897683652a  Up 13 minutes (healthy)  sentry-self-hosted-snuba-profiling-profile-chunks-consumer-1
d38526433807  Up 13 minutes (healthy)  sentry-self-hosted-snuba-generic-metrics-sets-consumer-1
d29c78859f75  Up 13 minutes (healthy)  sentry-self-hosted-snuba-profiling-functions-consumer-1
8a6f10601099  Up 13 minutes (healthy)  sentry-self-hosted-snuba-subscription-consumer-generic-metrics-sets-1
99df45d4c785  Up 13 minutes (healthy)  sentry-self-hosted-snuba-outcomes-billing-consumer-1
e8f5e49744aa  Up 13 minutes (healthy)  sentry-self-hosted-snuba-outcomes-consumer-1
37bd4481e39b  Up About an hour  sentry-self-hosted-taskworker-1
2bd1cd98caa3  Up About an hour  sentry-self-hosted-cron-1
74828ab35749  Up About an hour  sentry-self-hosted-sentry-cleanup-1
4627619e030e  Up About an hour  sentry-self-hosted-snuba-replacer-1
0af9b0ea20ee  Up About an hour (healthy)  sentry-self-hosted-snuba-api-1
2d1c315deeed  Up 2 hours  sentry-self-hosted-relay-1
fb070e5032dd  Up 2 hours (healthy)  sentry-self-hosted-web-1
9fa54d834209  Up 2 hours  sentry-self-hosted-vroom-cleanup-1
92100b33c5b6  Up 2 hours  sentry-self-hosted-uptime-checker-1
5dbd19845444  Up 2 hours  sentry-self-hosted-taskbroker-1
f7d6b65c1b9d  Up 2 hours  sentry-self-hosted-symbolicator-cleanup-1
a8d84f1efbb9  Up 2 hours (healthy)  sentry-self-hosted-memcached-1
64e2ed88fbf6  Up 2 hours  sentry-self-hosted-smtp-1
e365daed53a4  Up 2 hours (healthy)  sentry-self-hosted-postgres-1
93be136e5ab3  Up 2 hours (healthy)  sentry-self-hosted-kafka-1
08db3e9efeb4  Up 2 hours (healthy)  sentry-self-hosted-clickhouse-1
8037893531ae  Up 2 hours (healthy)  sentry-self-hosted-redis-1
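
To pull the containers without a health state out of a listing like the one above, a small parser is enough. This is a minimal sketch, assuming the exact `docker ps --format "{{.ID}}  {{.Status}}  {{.Names}}"` layout shown here (fields separated by two spaces); the function name `containers_without_healthcheck` is hypothetical and not part of this PR.

```python
import re

# Status suffixes Docker prints for containers that define a healthcheck.
HEALTH_MARKERS = ("(healthy)", "(unhealthy)", "(health: starting)")


def containers_without_healthcheck(ps_output: str) -> list[str]:
    """Return the names of containers whose status shows no health state."""
    missing = []
    for line in ps_output.splitlines():
        # Fields are separated by two spaces; the status itself only
        # contains single spaces, so splitting on 2+ spaces is safe here.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 3:
            continue  # skip blank lines or anything in another layout
        _container_id, status, name = fields
        if not any(marker in status for marker in HEALTH_MARKERS):
            missing.append(name)
    return missing


# Three lines copied from the listing above.
sample = """\
4bdca9ba81e5  Up 13 minutes  sentry-self-hosted-taskscheduler-1
7c61c4245c22  Up 13 minutes (healthy)  sentry-self-hosted-metrics-consumer-1
2d1c315deeed  Up 2 hours  sentry-self-hosted-relay-1
"""
print(containers_without_healthcheck(sample))
# prints ['sentry-self-hosted-taskscheduler-1', 'sentry-self-hosted-relay-1']
```

Run against the full listing, this should report only the services named in the description (relay, the task services, uptime-checker, snuba-replacer, cron and cleanups) plus any other container that defines no healthcheck, such as smtp.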

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

.env Outdated
Comment on lines 22 to 23
HEATLHCHECK_START_PERIOD=10s
HEALTHCHECK_FILE_INTERVAL=60s

Potential bug: A typo in the `HEATLHCHECK_START_PERIOD` variable name can cause the healthcheck grace period to be ignored, potentially leading to premature service restarts.
  • Description: The environment variable is spelled HEATLHCHECK_START_PERIOD (the letters 'L' and 'T' are transposed) in the .env file and in its reference in docker-compose.yml. Wherever the spelling differs between the definition and a reference, Docker Compose substitutes an empty string for the healthcheck start_period, so the value most likely falls back to 0s instead of the intended 10s. As a result, services including postgres, redis, kafka, and web would not get the intended grace period during startup and could be marked as unhealthy and restarted unnecessarily, which hurts system reliability.
  • Suggested fix: Rename the variable from HEATLHCHECK_START_PERIOD to HEALTHCHECK_START_PERIOD in the .env file and update the corresponding reference in docker-compose.yml.
    severity: 0.65, confidence: 0.98
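
A mismatch like this can be caught mechanically by comparing the names that docker-compose.yml references with the names that .env defines. The following is a minimal sketch, not part of this PR: the function names are hypothetical, the compose snippet is an invented illustration rather than the real file, and variables supplied by the shell environment or given a `${VAR:-default}` fallback would be reported here even though they are harmless.

```python
import re

# Matches ${NAME...} but not the escaped form $${NAME}, which Compose
# treats as a literal dollar sign.
VAR_REF = re.compile(r"(?<!\$)\$\{([A-Za-z_][A-Za-z0-9_]*)")


def defined_names(env_text: str) -> set[str]:
    """Collect variable names from KEY=value lines of a .env file."""
    names = set()
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.add(line.split("=", 1)[0].strip())
    return names


def undefined_references(compose_text: str, env_text: str) -> list[str]:
    """Return referenced variable names that the .env text never defines."""
    referenced = set(VAR_REF.findall(compose_text))
    return sorted(referenced - defined_names(env_text))


# The two .env lines from the review context above.
env_text = "HEATLHCHECK_START_PERIOD=10s\nHEALTHCHECK_FILE_INTERVAL=60s\n"
# Hypothetical compose fragment that uses the correctly spelled name.
compose_text = (
    'start_period: "${HEALTHCHECK_START_PERIOD}"\n'
    'interval: "${HEALTHCHECK_FILE_INTERVAL}"\n'
)
print(undefined_references(compose_text, env_text))
# prints ['HEALTHCHECK_START_PERIOD']
```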

@aldy505
Collaborator

aldy505 commented Aug 8, 2025

relay - does not have any binaries in the distroless image

@Dav1dde do you have any workarounds for this, or perhaps, how does the health check work on SaaS?

taskbroker, taskworker, taskscheduler, uptime-checker - I was not able to find any healthcheck-related options; they probably do not support that at the moment

I'll ask around for uptime-checker. For taskbroker, we might leave it there.

@mzglinski
Contributor Author

@aldy505 A Relay healthcheck is possible on something like Kubernetes, where the orchestrator can make an HTTP request to the running container without using any binaries inside the image. Unfortunately, Docker Compose does not have such functionality.

@aldy505
Collaborator

aldy505 commented Aug 8, 2025

@aldy505 A Relay healthcheck is possible on something like Kubernetes, where the orchestrator can make an HTTP request to the running container without using any binaries inside the image. Unfortunately, Docker Compose does not have such functionality.

@mzglinski I think it's possible to add wget or curl into Relay's container. Let us wait for David.

@Dav1dde
Member

Dav1dde commented Aug 8, 2025

@Dav1dde do you have any workarounds for this

Can you do the health check through a separate container which has curl?

or perhaps, how does the health check work on SaaS?

Kubernetes supports HTTP probes.

I think it's possible to add wget or curl into Relay's container.

That won't be possible; the image is completely empty except for Relay and a tiny bit of support libraries.

@mzglinski
Contributor Author

@Dav1dde do you have any workarounds for this

Can you do the health check through a separate container which has curl?

Not really. Compose does not support multiple containers per service, and the healthcheck is specific to the container/service. The only option I can see would involve a Dockerfile that builds on top of the Relay image and adds curl/bash to the local image, similar to the solution used for the Sentry image.

@Dav1dde
Member

Dav1dde commented Aug 8, 2025

That's not great. Immediate options that come to mind:

  1. Implement something in the Relay binary itself that can be invoked for health checks
  2. Build a -debug variant of the Relay image(s).
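
For illustration only, the logic behind option 1 can be sketched as a probe that needs no external tools such as curl or wget: request a local health endpoint and turn the result into an exit code that a Compose healthcheck can consume. Relay itself is written in Rust, so this Python version only shows the idea and is not what getsentry/relay will ship; the function name `probe` and the URL in the usage comment (port and path) are assumptions that would need checking against Relay's documentation.

```python
import http.client
import urllib.request

# Ignore proxy settings from the environment: a container healthcheck
# should talk to the local process directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 2.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, otherwise 1."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and urllib's
        # URLError/HTTPError; ValueError covers malformed URLs.
        return 1


# Hypothetical usage inside an entry point (URL is an assumption):
#   sys.exit(probe("http://127.0.0.1:3000/api/relay/healthcheck/ready/"))
```

Exit code 0 means healthy and 1 means unhealthy, which matches what Docker expects from a healthcheck command.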

@aldy505
Collaborator

aldy505 commented Aug 8, 2025

  1. Implement something in the Relay binary itself that can be invoked for health checks

Let's do this one.

@aldy505
Collaborator

aldy505 commented Aug 9, 2025

Now we wait for getsentry/relay#5044.

Successfully merging this pull request may close these issues.

Healthchecks for sentry components
3 participants