Hello!
Under what environment/condition can the reference tutorial (explained at https://docs.chaostoolkit.org/reference/tutorial/, with its code being hosted here) work as advertised?
I am running the chaostoolkit tutorial experiment using Python 3.7.3 on Ubuntu 18.04.2.
In the documentation, the expected failure is supposed to occur at the we-can-request-sunset stage on the second run of the steady state hypothesis. However, the restart actions unexpectedly fail and therefore we stumble upon a CRITICAL alert that does not follow the scenario described in the documentation:
[2019-08-12 13:49:55 INFO] Action: restart-astre-service-to-pick-up-certificate
[2019-08-12 13:49:55 INFO] Action: restart-sunset-service-to-pick-up-certificate
[2019-08-12 13:49:55 INFO] Pausing after activity for 1s...
[2019-08-12 13:49:56 INFO] Steady state hypothesis: Application responds
[2019-08-12 13:49:56 INFO] Probe: the-astre-service-must-be-running
[2019-08-12 13:49:56 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment
The relevant section in the experiment.json file is this one:
{
    "type": "action",
    "name": "restart-astre-service-to-pick-up-certificate",
    "provider": {
        "type": "process",
        "path": "pkill",
        "arguments": "--echo -HUP -F astre.pid"
    }
},
My terminals show something like this for the astre and sunset servers when they are restarted:
[12/Aug/2019:14:09:57] ENGINE SIGHUP caught but not daemonized. Exiting.
[12/Aug/2019:14:09:57] ENGINE Bus STOPPING
[12/Aug/2019:14:09:57] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('127.0.0.1', 8444)) shut down
[12/Aug/2019:14:09:57] ENGINE Bus STOPPED
[12/Aug/2019:14:09:57] ENGINE Bus EXITING
[12/Aug/2019:14:09:57] ENGINE PID file removed: 'astre.pid'.
[12/Aug/2019:14:09:57] ENGINE Bus EXITED
What happens is that the servers are killed instead of being restarted.
As a matter of fact, after reading up a bit on signals (see https://unix.stackexchange.com/questions/15601/program-behavior-when-kill-hup-is-recieved in particular), it seems that Python is behaving as expected. SIGHUP was originally intended to tell interactive programs that they had lost access to their terminal. Daemons, however, being non-interactive, have by convention taken this signal to mean that they should simply reload their configuration.
From all I can gather, what's happening here is exactly what is to be expected: the servers report that they are not daemonized, so the Python process is stopped.
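To illustrate the two behaviours, here is a minimal sketch (my own, not taken from the tutorial code, and not involving CherryPy at all). It starts two throwaway Python child processes and sends each a SIGHUP: one leaves the signal at its default disposition, the other installs a handler the way a reloading daemon would. The names `DEFAULT_CHILD`, `HANDLER_CHILD` and `send_hup` are made up for this example, and it assumes a POSIX system with Python 3.7+.

```python
import signal
import subprocess
import sys

# Hypothetical child 1: SIGHUP keeps its default disposition (terminate).
DEFAULT_CHILD = """
import signal, time
signal.signal(signal.SIGHUP, signal.SIG_DFL)  # make the default explicit
print('ready', flush=True)
time.sleep(30)
"""

# Hypothetical child 2: installs a handler, as a reloading daemon would.
HANDLER_CHILD = """
import signal, time
received = []
signal.signal(signal.SIGHUP, lambda signum, frame: received.append(signum))
print('ready', flush=True)
while not received:
    time.sleep(0.05)
print('reloaded', flush=True)
"""


def send_hup(child_source):
    """Start a child, send it SIGHUP, return (returncode, remaining stdout)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child has printed 'ready'
    proc.send_signal(signal.SIGHUP)
    out, _ = proc.communicate(timeout=10)
    return proc.returncode, out.strip()


if __name__ == "__main__":
    # A negative return code means the child was killed by that signal number.
    print("default disposition:", send_hup(DEFAULT_CHILD))
    print("with handler:       ", send_hup(HANDLER_CHILD))
```

The first child is terminated by the signal, while the second survives it and exits on its own terms. So whether `pkill -HUP` amounts to a restart depends entirely on what the target process chooses to do with SIGHUP.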
How/where has this worked as described in the tutorial?