[docker_deamon] docker healthcheck as service checks.#2859
Conversation
dfb3fd3 to
905c881
Compare
hkaj
left a comment
There was a problem hiding this comment.
2 minor comments about the config, otherwise LGTM
conf.d/docker_daemon.yaml.example
Outdated
| # Report docker container healthcheck events as service checks | ||
| # Note: enabling this option modifies the way in which we inspect the containers and causes | ||
| # some overhead - if you run a high volume of containers we may timeout. Please only | ||
| # enable if absolutely necessary. |
There was a problem hiding this comment.
I think you can remove the last sentence. The slow part of inspecting a container is the lock acquisition over said container, and this is also done in docker ps, so we may timeout the check but it could also happen without the inspect part.
Same for the custom_cgroups option actually, the last sentence should be dropped.
conf.d/docker_daemon.yaml.example
Outdated
| # Report docker container healthcheck events as service checks | ||
| # Note: enabling this option modifies the way in which we inspect the containers and causes | ||
| # some overhead - if you run a high volume of containers we may timeout. Please only | ||
| # enable if absolutely necessary. |
There was a problem hiding this comment.
There should also be a mention that this option is available starting from Docker 1.12 and enabling it with an older version will result in an UNKNOWN state for the service check. It should be obvious but I'd rather be extra explicit here.
32d4345 to
99bedbe
Compare
|
Actually I wonder if we shouldn't ask for a list of container name instead and only enable service checks for those listed. This could get spammy, and health checks in dynamic clusters where containers cease to exist and appear very often this may not have a lot of value. What do you think? |
checks.d/docker_daemon.py
Outdated
|
|
||
| # Report docker healthcheck SC's where available | ||
| if health_service_checks: | ||
| health_scs_containers = instance.get('health_service_check_whitelist', []) |
|
@hkaj so I tested this out... my only lingering question would be. Should we make the whitelist by image name (current behavior) or container name or both? Taking into account tags or not? Should we support wildcards? Let me know your thoughts - as you can see right now it's a straight match. |
5ce326c to
95b886c
Compare
hkaj
left a comment
There was a problem hiding this comment.
@truthbk ideally I think we should support the same behavior as for include/exclude containers, so regex support and on all container tags.
But that'll require more work. Maybe we can ship a first iteration with regex support but only on the image? And then transition to all-tags support.
If we go this way I'd suggest using another name for the yaml field, something like healthcheck_image_whitelist. This way when we improve it to support all tags we can use the new, more generic parameter healthcheck_whitelist and slowly deprecate the old one.
Of course you can also go hard and support all tags from the start if you feel like it 😄
checks.d/docker_daemon.py
Outdated
| custom_cgroups = _is_affirmative(instance.get('custom_cgroups', False)) | ||
|
|
||
| # submit healtcheck service checks? | ||
| health_service_checks = _is_affirmative(instance.get('health_service_checks', False)) |
There was a problem hiding this comment.
I think we don't need this parameter anymore. If the list is there, enable health checks, otherwise disable them. So you can look for the white list param directly, maybe check if it isn't empty.
hkaj
left a comment
There was a problem hiding this comment.
LGTM once conflicts are fixed and tests pass
[docker_daemon] set proper tagging and SC name for healthcheck. [docker_daemon] put healtcheck sc behind feature flag - it requires inspect. [ci] fixing docker_daemon CI. [ci] fixing old setuptools setup_env issues. Bumping to newest version. [docker_daemon] adding test for container inspeciton logic. [docker_daemon] amending config yaml.
[docker_daemon] whitelist health service checks yaml update. [docker_daemon] whitelist is a set for O(1) average complexity.
3567ef2 to
8b1cc24
Compare
What does this PR do?
This PR allows us to send service checks according to the healthcheck reported by docker for containers. This is a new feature introduced with Docker
1.12, and not all containers will implement it. Currently we will send anUNKNOWNstatus in that case. Maybe we should only submit the service check when available.Because grabbing the health status requires inspecting the containers, and that's a potentially expensive call, for now this feature will be hid behind a feature flag
health_service_checks.Also added a simple test. More work could be done on this front.
Motivation
Provides yet another point of view regarding a container's health, as intended by the container's maintainers.
I have also noticed a couple CI/tooling issues - we can sneak them in here. Or address them in a separate PR which probably makes more sense. One of the issues affects contributors as there's a problem with the old
setuptoolsandrake setup_env.