Hey, I'm kinda confused. I have a Docker container that declares a bunch of stage2 services. When one of those services fails, it just gets restarted indefinitely instead of the container shutting down, and I don't understand how to prevent this behavior and terminate the container properly.

I managed to stumble on these two issues: #599 and #601, which seem to mention a similar behavior to what I'm experiencing, with the solution being to set S6_CMD_WAIT_FOR_SERVICES_MAXTIME.

The thing is though, I don't want to enforce a strict time limit for my services to become ready. My container works in kind of a dynamic way, and the services running in it are user-defined. In most cases it takes quite a while for them to become ready (dozens of seconds), and I don't have any reasonable number that I could use for that value; the wait time for readiness should be 0 (infinite), or something that differs per service.

When I do set this env var to something non-zero, the container does seem to terminate within that specified time, but until then, the failing service just keeps restarting in an infinite loop. That's not good enough for me, since that number would need to be at least 100s to be on the safe side (maybe even more), as the container can run a lot of services with various dependencies, and they can take different amounts of time to become ready. But I don't want to have to wait 100s for my container to exit if I already know that one of the services exited; the moment that happens, I want to enter the termination phase and stop all of the other services, regardless of their readiness (they should still be stopped gracefully, though).

So, to put it simply: I don't want services not becoming ready within some time to be considered a problem (or, if it should be a problem, it should be decided per service through some per-service setting).

Other things I tried

Other than using the halt command, I wasn't able to terminate the container from the inside with any of the other methods I attempted, and I'm left with no idea how to go about doing so.
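For concreteness, this is the kind of knob being discussed - setting the readiness timeout at container run time. The image name below is a placeholder, and if I read the s6-overlay v3 docs correctly, the value is in milliseconds, with 0 meaning no limit:

```shell
# Hypothetical invocation: give services up to 100 s (100000 ms) to become
# ready before the container gives up; 0 would mean "wait forever".
docker run -e S6_CMD_WAIT_FOR_SERVICES_MAXTIME=100000 my-s6-image
```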
I don't know if this is a bug, or just a misunderstanding of the behavior on my side. If it's a bug, maybe move this from discussions to an issue; if not, what's going on? And what should I do to get the container to actually terminate the moment any of the services fails, without needlessly waiting? Should I use something different than the halt command here to achieve this?
The thing to understand is that s6, the process supervisor, was made to handle processes dying - which can happen for a variety of reasons, such as failing to acquire resources, the network being inaccessible, etc. - and that's why it restarts daemons by default. It considers these errors to be temporary, and in order to tell it that no, a process death is a serious issue and the whole container should shut down, you have to do some extra work.

On top of that, since in most cases services can depend on one another, s6-overlay uses the s6-rc service manager, which was made to ensure that a service is only started once all its dependencies are ready. It is only useful at container boot time, but it is important - and here as well, by default a process death is not seen as a permanent failure; permanent failure only happens when the service hasn't managed to become ready after a certain time, despite being restarted if it dies. That is why you see all these configuration knobs with timeouts.

Your use case is a little peculiar because you are saying "in normal circumstances my daemons never die; any death should trigger a full container shutdown", and also "I want infinite timeouts". It is possible to do that, it just requires some setup.

First, if you want infinite timeouts, that's fine. Delete S6_CMD_WAIT_FOR_SERVICES_MAXTIME or any other variable that sets timeouts, and s6-overlay won't time you out - it's just that you need to be careful to avoid cases where your container stalls forever.

Second, the way to perform an action when a supervised process dies is to do the action you want in the service's finish script:

```
exec /run/s6/basedir/bin/halt
```

...except, not quite. That will work in cruise mode, when all the services have been launched and are just being supervised by s6.
That will not work if a process dies while the services are still being brought up - because in that case, the s6-rc service manager is still running, and what you want instead is to exit 125, which tells s6 not to restart the service and tells s6-rc that the transition failed.

And the way to distinguish between both cases is to check whether s6-rc is currently running, which you can do by calling s6-rc list:

```
if s6-rc list ; then
  exec /run/s6/basedir/bin/halt
else
  exit 125
fi
```

Add this to all your services, and there you go. I agree that it's not intuitive at all, and that you need pretty esoteric knowledge of s6 to make it work, but it's the first time someone has your exact set of requirements. 😅
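To see how the two branches behave, here is a stand-alone POSIX shell sketch of that finish-script decision, with a stub function standing in for `s6-rc list` (the stub and the STARTUP_DONE flag are made up for the demo; they are not s6 interfaces):

```shell
#!/bin/sh
# Simulation of the finish-script logic above, runnable outside a
# container. s6_rc_list is a stub for `s6-rc list`: it succeeds only
# once startup is "done", mimicking s6-rc no longer holding its lock.
s6_rc_list() {
  [ "$STARTUP_DONE" = "yes" ]
}

# decide prints the action a real finish script would take.
decide() {
  if s6_rc_list ; then
    echo "halt"       # cruise mode: bring the whole container down
  else
    echo "exit 125"   # still starting: mark the service permanently failed
  fi
}

STARTUP_DONE=no
decide                # prints: exit 125
STARTUP_DONE=yes
decide                # prints: halt
```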
It is indeed a flaw in the larger design, but it's the best I could come up with to offer parallel start while keeping compatibility with s6-overlay v2 that didn't inconvenience too many people. When I get the time, I'll try and find something that allows for immediate shutdown in a cleaner and easier way.
Yeah, I understand.
The thing is, while the startup is being controlled by s6-rc, there isn't much you can do; the solution I suggested tells s6-rc that the transition failed, but s6-rc is still trying to bring up the other services in parallel and will only exit when it has a result (either success or failure) from everything; there's no way to tell it to abort. It's usually not a problem because most people operate with reasonable timeouts.
If you want to quit immediately as soon as one service fails, you have to kill s6-rc, then call halt, rather than telling s6-rc that the service failed permanently. s6-rc isn't supervised, so there is no clean way of finding its pid, but in this case…
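A finish script implementing that "kill s6-rc, then halt" approach might look like the sketch below - the use of pkill to match s6-rc by process name is my own assumption (the pragmatic, if inelegant, option for an unsupervised process), not something the reply specifies:

```shell
#!/bin/sh
# Hypothetical finish-script variant: abort the startup transition
# immediately by killing the s6-rc process itself, then halt.
# pkill -x matches the exact process name; since s6-rc is not
# supervised, there is no pidfile or supervision dir to consult,
# so matching by name is about the only option.
pkill -x s6-rc
exec /run/s6/basedir/bin/halt
```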