Skip to content

Conversation

benh-debian
Copy link
Contributor

The signal handler added for SIGINT in commit bdc1bdb is unsafe, as it calls (indirectly) free() and exit() which are not signal-safe functions.

This reverts that change and switches to a poll()-based event loop that handles signals without using a signal handler.

It also adds handling of SIGTERM, which is the signal that systemd will send to stop the service.

This reverts commit bdc1bdb.  Neither
free() nor exit() are safe to call in a signal handler.

Signed-off-by: Ben Hutchings <[email protected]>
Currently we use a blocking netlink socket, which is not compatible
with handling signals gracefully.

To handle SIGINT properly:

- Set the netlink socket to be non-blocking
- Block SIGINT and create a signalfd() to receive it
- Poll the netlink socket and signal fd in a loop

Signed-off-by: Ben Hutchings <[email protected]>
systemd's default stop signal and the kill command's default
signal are SIGTERM.  Catch this as well as SIGINT.

Signed-off-by: Ben Hutchings <[email protected]>
@chucklever
Copy link
Member

SIGINT can occur here only when tlshd is run at the command line, never when it is run as a daemon, as it normally is deployed. I can't find any immediate search results that explain how the parent exiting from a signal handler is in any way risky.

I'm not convinced the risk is high enough to include this PR in a 1.2 minor release. But the current signal handler is throw-away code.

@benh-debian
Copy link
Contributor Author

SIGINT can occur here only when tlshd is run at the command line, never when it is run as a daemon, as it normally is deployed.

Right. I added SIGTERM handling to provide a clean shutdown in the normal case.

I can't find any immediate search results that explain how the parent exiting from a signal handler is in any way risky.

Any Linux/Unix programming reference should explain this.

I'm not convinced the risk is high enough to include this PR in a 1.2 minor release. But the current signal handler is throw-away code.

OK.

@chucklever
Copy link
Member

SIGINT can occur here only when tlshd is run at the command line, never when it is run as a daemon, as it normally is deployed.

Right. I added SIGTERM handling to provide a clean shutdown in the normal case.

I understand that is what the PR does. But it doesn't tell me why SIGTERM cleanup is necessary. The only purpose for the SIGINT handler was to clean up valgrind results so I can more easily catch memory leaks. There's no hard requirement that I can see for explicit clean up in other cases, since all processes, including the parent, are exiting.

My fundamental qualms are about switching to using a polling event loop. I need to think about that. Would atexit() be a better approach for proper cleanup?

I can't find any immediate search results that explain how the parent exiting from a signal handler is in any way risky.

Searching again, I see claims that I shouldn't exit from a signal handler, but the suggestion is to use _exit(2) instead of exit(3). "man 7 signal" is referenced in some search results, but a quick browse doesn't find anything on point there either.

I don't feel that I understand the risk yet. Even so, I will pull this into a 1.3 dev branch for more study.

@chucklever
Copy link
Member

Manually pushed to ktls-utils-1.3-dev.

@chucklever chucklever closed this Jul 18, 2025
@chucklever
Copy link
Member

The combined signal and netlink polling mechanism seems straightforward, and passes my testing. I'm re-opening this PR in order to pull the commits into the main branch. Sorry for the noise, I'm still getting used to GitHub's workflows.

@chucklever chucklever reopened this Jul 21, 2025
@chucklever chucklever merged commit b8a754f into oracle:main Jul 21, 2025
8 of 9 checks passed
@chucklever
Copy link
Member

Hi Ben - would you be interested in adding a handler for SIGUSR1 (or SIGHUP) to cause tlshd to reload it's configuration?

@benh-debian
Copy link
Contributor Author

Hi Ben - would you be interested in adding a handler for SIGUSR1 (or SIGHUP) to cause tlshd to reload it's configuration?

That would be good to have. I'm not sure if you're asking me to implement it or offering to do so...?

In any case, SIGHUP is the usual reload signal, and what systemd will send by default.

@chucklever
Copy link
Member

Hi Ben - would you be interested in adding a handler for SIGUSR1 (or SIGHUP) to cause tlshd to reload it's configuration?

That would be good to have. I'm not sure if you're asking me to implement it or offering to do so...?

In any case, SIGHUP is the usual reload signal, and what systemd will send by default.

I'm asking if you'd like to implement and contribute a SIGHUP handler for tlshd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants