
Conversation

@sanderegg
Member

What do these changes do?

It was discovered that we are misusing the Docker HEALTHCHECK in our services' Dockerfiles.

Setting --start-period=1s is useless since --start-interval defaults to 5s, so the start period has already elapsed by the time the first check runs.
Therefore this PR makes the Dockerfiles uniform in that regard and sets the following values:

all services:

--interval=10s # how often the container's health is checked at runtime
--timeout=5s # maximum duration of a single check before it counts as failed
--retries=5 # number of consecutive failed checks before the container is considered unhealthy
--start-period=20s # NOTE: as soon as the healthcheck returns a 0 exit code, the container is considered started by the Docker engine
--start-interval=1s # interval between checks during the start period; keep it small for a fast startup (default is 5s)
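
To illustrate why the pairing of --start-period and --start-interval matters, here is a small Python sketch of the probe schedule. It is a simplified model and not the Docker engine's actual scheduler: it assumes the container never reports healthy, that each probe returns instantly, and that the wait before the next probe is the start interval while inside the start period and the regular interval afterwards. The name `probe_times` and the 10s interval paired with the old 1s start period are assumptions made for illustration.

```python
def probe_times(start_period, start_interval, interval, horizon):
    """Return probe times in seconds after container start.

    Simplified model (an assumption, not the engine's real scheduler):
    the wait before the next probe is `start_interval` while still inside
    the start period and `interval` once the start period has elapsed.
    """
    times = []
    now = 0
    while True:
        wait = start_interval if now < start_period else interval
        now += wait
        if now > horizon:
            return times
        times.append(now)


# Old setting: the 1s start period is over before the first probe at 5s.
print(probe_times(start_period=1, start_interval=5, interval=10, horizon=30))
# -> [5, 15, 25]

# New setting: one probe per second for the first 20s, then every 10s.
print(probe_times(start_period=20, start_interval=1, interval=10, horizon=40))
```

Under this model the old values give a slow service only a handful of chances to report healthy in the first half minute, while the new values probe every second during the first 20s.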

special cases:

dynamic-sidecar

--start-period=180s @GitHK not sure why...
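
As a rough sketch of how the uniform values and the dynamic-sidecar exception fit together, the snippet below renders a Dockerfile HEALTHCHECK instruction from a small settings object. The `HealthcheckOptions` class and its `render` method are hypothetical helpers, not part of this PR; the flag values come from the description above and the command is the one quoted in the Copilot review.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HealthcheckOptions:
    """Values from the PR description; the class itself is hypothetical."""

    interval: str = "10s"
    timeout: str = "5s"
    retries: int = 5
    start_period: str = "20s"
    start_interval: str = "1s"

    def render(self, command: str) -> str:
        """Render a multi-line Dockerfile HEALTHCHECK instruction."""
        parts = [
            "HEALTHCHECK",
            f"--interval={self.interval}",
            f"--timeout={self.timeout}",
            f"--retries={self.retries}",
            f"--start-period={self.start_period}",
            f"--start-interval={self.start_interval}",
            f"CMD {command}",
        ]
        # A trailing backslash continues the instruction on the next line.
        return " \\\n    ".join(parts)


DEFAULTS = HealthcheckOptions()
# dynamic-sidecar keeps the defaults except for its longer start period.
DYNAMIC_SIDECAR = replace(DEFAULTS, start_period="180s")

COMMAND = "python3 docker/healthcheck.py http://localhost:8000/"
print(DEFAULTS.render(COMMAND))
print(DYNAMIC_SIDECAR.start_period)
```

Keeping the special case as a single overridden field makes it easy to see that only --start-period differs for dynamic-sidecar.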

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the t:maintenance Some planned maintenance work label Apr 22, 2025
@sanderegg sanderegg added this to the Pauwel Kwak milestone Apr 22, 2025
@sanderegg sanderegg self-assigned this Apr 22, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes the misconfiguration of Docker healthchecks by updating the start period and adding a start interval to allow for faster initial checks while maintaining overall stability. Key changes include:

  • Updating --start-period from 1s to 20s and adding --start-interval=1s in multiple Docker healthcheck configurations.
  • Changing the endpoint for the healthcheck command from "http://localhost:8000/" to "http://localhost:8080/v0/" in most services.
  • Notably, while nearly all services update the endpoint, one instance remains unchanged.

Reviewed Changes

Copilot reviewed 22 out of 33 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| services/director/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/director-v2/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/datcore-adapter/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/clusters-keeper/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/catalog/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/autoscaling/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/api-server/docker/healthcheck.py | Updated healthcheck parameters and endpoint |
| services/agent/docker/healthcheck.py | Updated healthcheck parameters; endpoint remains unchanged compared to others |
| scripts/docker/healthcheck_curl_host.py | Updated healthcheck parameters and endpoint |
Files not reviewed (11)
  • services/agent/Dockerfile: Language not supported
  • services/api-server/Dockerfile: Language not supported
  • services/autoscaling/Dockerfile: Language not supported
  • services/catalog/Dockerfile: Language not supported
  • services/clusters-keeper/Dockerfile: Language not supported
  • services/dask-sidecar/Dockerfile: Language not supported
  • services/datcore-adapter/Dockerfile: Language not supported
  • services/director-v2/Dockerfile: Language not supported
  • services/director/Dockerfile: Language not supported
  • services/docker-api-proxy/Dockerfile: Language not supported
  • services/dynamic-scheduler/Dockerfile: Language not supported
Comments suppressed due to low confidence (1)

services/agent/docker/healthcheck.py:12

  • The CMD endpoint in this file is not updated to match the uniform endpoint 'http://localhost:8080/v0/' used in the other services. Please update it for consistency.
CMD python3 docker/healthcheck.py http://localhost:8000/
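
For context on what such a CMD invokes, here is a minimal sketch of a healthcheck script in Python. It is not the repository's actual docker/healthcheck.py: the function names, the 5s default timeout, and the rule that only an HTTP 200 counts as healthy are assumptions. The one fixed convention is Docker's: exit code 0 means healthy and 1 means unhealthy.

```python
from http.client import HTTPException
from urllib.request import urlopen


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True when `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, HTTPException):
        # OSError covers URLError, HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs.
        return False


def main(argv: list) -> int:
    """Map the probe result to the exit code Docker expects (0 or 1)."""
    if len(argv) != 2:
        return 1
    return 0 if is_healthy(argv[1]) else 1
```

A real script would end with `sys.exit(main(sys.argv))` so that the CMD line above receives the exit code.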

@codecov

codecov bot commented Apr 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.74%. Comparing base (e0f204c) to head (d637e76).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7557      +/-   ##
==========================================
+ Coverage   87.60%   88.74%   +1.14%     
==========================================
  Files        1752     1207     -545     
  Lines       67864    51449   -16415     
  Branches     1121      170     -951     
==========================================
- Hits        59453    45660   -13793     
+ Misses       8103     5731    -2372     
+ Partials      308       58     -250     
| Flag | Coverage Δ |
| --- | --- |
| integrationtests | 65.05% <ø> (-0.07%) ⬇️ |
| unittests | 88.74% <ø> (+1.94%) ⬆️ |

| Components | Coverage Δ |
| --- | --- |
| api | ∅ <ø> (∅) |
| pkg_aws_library | ∅ <ø> (∅) |
| pkg_dask_task_models_library | ∅ <ø> (∅) |
| pkg_models_library | ∅ <ø> (∅) |
| pkg_notifications_library | ∅ <ø> (∅) |
| pkg_postgres_database | ∅ <ø> (∅) |
| pkg_service_integration | ∅ <ø> (∅) |
| pkg_service_library | ∅ <ø> (∅) |
| pkg_settings_library | ∅ <ø> (∅) |
| pkg_simcore_sdk | 77.24% <ø> (-8.16%) ⬇️ |
| agent | 96.46% <ø> (ø) |
| api_server | 91.23% <ø> (ø) |
| autoscaling | 96.08% <ø> (ø) |
| catalog | 92.46% <ø> (ø) |
| clusters_keeper | 99.24% <ø> (ø) |
| dask_sidecar | 91.29% <ø> (ø) |
| datcore_adapter | 98.12% <ø> (ø) |
| director | 76.87% <ø> (+0.09%) ⬆️ |
| director_v2 | 91.30% <ø> (ø) |
| dynamic_scheduler | 97.40% <ø> (ø) |
| dynamic_sidecar | 90.11% <ø> (ø) |
| efs_guardian | 89.79% <ø> (ø) |
| invitations | 93.28% <ø> (ø) |
| payments | 92.66% <ø> (ø) |
| resource_usage_tracker | 89.23% <ø> (+0.10%) ⬆️ |
| storage | 87.51% <ø> (-0.08%) ⬇️ |
| webclient | ∅ <ø> (∅) |
| webserver | 85.97% <ø> (-0.02%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

Thanks!

@sanderegg sanderegg force-pushed the maintenance/uniformize-healthchecks branch from 9c71e7c to d637e76 Compare April 22, 2025 14:37
@sanderegg sanderegg merged commit e6f13b1 into ITISFoundation:master Apr 22, 2025
3 of 4 checks passed
@sanderegg sanderegg deleted the maintenance/uniformize-healthchecks branch April 22, 2025 14:38
@sonarqubecloud

Please retry analysis of this Pull-Request directly on SonarQube Cloud

@matusdrobuliak66 matusdrobuliak66 mentioned this pull request May 8, 2025
34 tasks

Labels

t:maintenance Some planned maintenance work


4 participants