
SIGHUP caught but not daemonized. #3

@jfsmith-at-coveo

Hello!

Under what environment or conditions does the reference tutorial (documented at https://docs.chaostoolkit.org/reference/tutorial/, with its code hosted here) work as advertised?

I am running the chaostoolkit tutorial experiment using Python 3.7.3 on Ubuntu 18.04.2.

According to the documentation, the expected failure should occur at the we-can-request-sunset stage, when the steady state hypothesis is checked for the second time. Instead, the restart actions unexpectedly take the services down, and the run ends with a CRITICAL message that does not match the scenario described in the documentation:

    [2019-08-12 13:49:55 INFO] Action: restart-astre-service-to-pick-up-certificate
    [2019-08-12 13:49:55 INFO] Action: restart-sunset-service-to-pick-up-certificate
    [2019-08-12 13:49:55 INFO] Pausing after activity for 1s...
    [2019-08-12 13:49:56 INFO] Steady state hypothesis: Application responds
    [2019-08-12 13:49:56 INFO] Probe: the-astre-service-must-be-running
    [2019-08-12 13:49:56 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment

The relevant section of the experiment.json file is this one:

        {
            "type": "action",
            "name": "restart-astre-service-to-pick-up-certificate",
            "provider": {
                "type": "process",
                "path": "pkill",
                "arguments": "--echo -HUP -F astre.pid"
            }
        },
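
As far as I can tell, this action boils down to reading the PID stored in astre.pid and sending that process SIGHUP. A rough Python equivalent, for anyone who wants to reproduce the behaviour outside chaostoolkit (the helper name hup_from_pidfile is my own, and it ignores pkill's --echo output and pattern matching):

```python
import os
import signal


def hup_from_pidfile(pidfile: str) -> int:
    """Send SIGHUP to the PID stored in `pidfile`, roughly like `pkill -HUP -F pidfile`.

    Hypothetical helper for illustration only; returns the PID that was signalled.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)  # same signal the experiment's action sends
    return pid
```

Calling hup_from_pidfile("astre.pid") while the astre server runs in the foreground should produce the same "SIGHUP caught but not daemonized. Exiting." message shown below.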

When the restart actions run, the terminals of the astre and sunset servers show something like this:

    [12/Aug/2019:14:09:57] ENGINE SIGHUP caught but not daemonized. Exiting.
    [12/Aug/2019:14:09:57] ENGINE Bus STOPPING
    [12/Aug/2019:14:09:57] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('127.0.0.1', 8444)) shut down
    [12/Aug/2019:14:09:57] ENGINE Bus STOPPED
    [12/Aug/2019:14:09:57] ENGINE Bus EXITING
    [12/Aug/2019:14:09:57] ENGINE PID file removed: 'astre.pid'.
    [12/Aug/2019:14:09:57] ENGINE Bus EXITED

In other words, the servers exit instead of restarting.
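
I don't know exactly how the tutorial's the-astre-service-must-be-running probe is implemented, but any liveness check keyed on astre.pid would fail in the same way, since the log shows the PID file being removed on exit. A minimal sketch of such a check (service_is_running is a hypothetical name, not the tutorial's code):

```python
import os


def service_is_running(pidfile: str) -> bool:
    """Hypothetical liveness check: is the process named in `pidfile` still alive?"""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No PID file (e.g. removed on exit, as in the log above) or unreadable contents.
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```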

As a matter of fact, after reading up a bit on signals (see https://unix.stackexchange.com/questions/15601/program-behavior-when-kill-hup-is-recieved in particular), it seems that Python (or rather CherryPy, judging by the ENGINE log lines) is behaving as expected. SIGHUP traditionally tells an interactive program that its controlling terminal has hung up, so the usual response is to exit. BUT... daemons have no controlling terminal, so by convention they have taken this signal to simply mean "reload your configuration". The log above says exactly that: the server was not daemonized, so it exited.
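
To illustrate the daemon convention, here is a minimal sketch using only the standard signal module (POSIX only; Config and on_sighup are hypothetical names) of a process that treats SIGHUP as "reload" instead of exiting:

```python
import signal


class Config:
    """Toy stand-in for a service's configuration (hypothetical)."""

    def __init__(self) -> None:
        self.reloads = 0

    def reload(self) -> None:
        # A real daemon would re-read its config files or certificates here.
        self.reloads += 1


config = Config()


def on_sighup(signum, frame):
    # Daemon convention: SIGHUP means "reload", not "terminate".
    config.reload()


# Must be called from the main thread.
signal.signal(signal.SIGHUP, on_sighup)
```

With a handler like this installed, sending SIGHUP leaves the process running. Without one, the default action for SIGHUP is to terminate the process, and CherryPy, judging by the ENGINE log above, installs its own handler that exits when the server is not daemonized.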

From all I can gather, what's happening here is exactly what should happen: the Python process, running in the foreground, simply exits.

So how, and where, has this ever worked as described in the tutorial?
