This replaces the current `SIGUSR2` (#2716) with the new feature.
(Not supported on Windows).
* Restart the new process with zero downtime
The primary motivation is to allow Fluentd to be updated
without data loss in plugins such as `in_udp`.
Specification:
* 2 ways to trigger this feature (non-Windows):
* Signal: `SIGUSR2` to the supervisor.
* Sending `SIGUSR2` to the workers triggers the traditional
GracefulReload.
* (Leave the traditional way, just in case)
* RPC: `/api/processes.zeroDowntimeRestart`
* Leave `/api/config.gracefulReload` for the traditional feature.
* This starts the new supervisor and workers with zero downtime
for some plugins.
* Input plugins with `zero_downtime_restart` supported work in
parallel.
* Supported input plugins:
* `in_tcp`
* `in_udp`
* `in_syslog`
* The old processes stop after 10s.
* The new supervisor works in `source-only` mode (#4661)
until the old processes stop.
* After the old processes stop, the data handled by the new
processes are loaded and processed.
* If needed, you can configure `source_only_buffer` (see #4661).
* Windows: Not affected at all. Remains the traditional
GracefulReload.
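
A minimal sketch of the two triggers above, written as small Ruby helpers. The helper names and the endpoint address `127.0.0.1:24444` are assumptions for illustration only. The address must match whatever `rpc_endpoint` is configured in `<system>`, and this sketch assumes the RPC is called with a plain GET.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds the URL of the RPC trigger.
# The endpoint value is an assumed example, not a guaranteed default.
def zero_downtime_restart_uri(endpoint = "127.0.0.1:24444")
  URI("http://#{endpoint}/api/processes.zeroDowntimeRestart")
end

# Trigger via RPC: send a GET request to the supervisor's RPC endpoint.
def trigger_by_rpc(endpoint = "127.0.0.1:24444")
  Net::HTTP.get_response(zero_downtime_restart_uri(endpoint))
end

# Trigger via signal: send SIGUSR2 to the supervisor PID (non-Windows only).
# Sending it to a worker PID instead triggers the traditional GracefulReload.
def trigger_by_signal(supervisor_pid)
  Process.kill("USR2", supervisor_pid)
end
```

Neither helper runs on load. Call `trigger_by_signal(pid)` or `trigger_by_rpc` against a running supervisor.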
Mechanism:
1. The supervisor receives SIGUSR2.
2. Spawn a new supervisor.
3. Take over shared sockets.
4. Launch new workers, and stop old processes in parallel.
* Launch new workers with source-only mode
* Limited to input plugins whose `zero_downtime_restart_ready?` is enabled
* Send SIGTERM to the old supervisor after a 10s delay from step 3.
5. The old supervisor stops and sends SIGWINCH to the new one.
6. The new workers run fully.
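
The order of the steps above can be sketched as a toy in-memory model. This is not the Fluentd or ServerEngine implementation: every class and method name here is hypothetical, no real signals or sockets are used, and the 10s delay is only noted in a log line.

```ruby
# Toy model of the handover order; all names are hypothetical.
class Supervisor
  attr_reader :name, :mode, :running

  def initialize(name, mode)
    @name = name
    @mode = mode          # :full or :source_only
    @running = true
  end

  def stop
    @running = false
  end

  # Stand-in for the new supervisor receiving SIGWINCH.
  def leave_source_only
    @mode = :full
  end
end

def zero_downtime_restart(old, log)
  log << "#{old.name}: received SIGUSR2"                  # step 1
  new_sup = Supervisor.new("new", :source_only)           # step 2
  log << "#{new_sup.name}: spawned"
  log << "#{new_sup.name}: took over shared sockets"      # step 3
  log << "#{new_sup.name}: workers launched (source-only)" # step 4
  log << "#{old.name}: SIGTERM sent after the delay"
  old.stop                                                # step 5
  new_sup.leave_source_only
  log << "#{new_sup.name}: received SIGWINCH, runs fully" # step 6
  new_sup
end

log = []
old = Supervisor.new("old", :full)
new_sup = zero_downtime_restart(old, log)
puts log
```

The model only makes the ordering explicit: the new supervisor stays in source-only mode until the old one has stopped.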
Note: this needs the following features:
* #4661
* treasure-data/serverengine#146
Conditions under which `zero_downtime_restart_ready?` can be enabled:
* Must be able to work in parallel with another Fluentd instance.
* Notes:
* The sockets provided by server helper are shared with the
new Fluentd instance.
* Input plugins managing a position such as `in_tail` should
not enable its `zero_downtime_restart_ready?`.
* Such input plugins do not cause data loss on restart, so
there is no need to enable this in the first place.
* `in_http` and `in_forward` could also be supported.
They are not supported this time simply because we have not
yet had time to consider them.
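
As an illustration of the conditions above, here is a minimal sketch of how a plugin might opt in. `InputBase` is a stand-in defined only for this example, not Fluentd's real base class, and both plugin classes are hypothetical. Only the method name `zero_downtime_restart_ready?` comes from this change.

```ruby
# Stand-in for an input plugin base class (NOT Fluentd's real class).
class InputBase
  # Assumed default: the plugin does not take part in the restart.
  def zero_downtime_restart_ready?
    false
  end
end

# Hypothetical server-based input. It can run in parallel with another
# instance because its socket is shared, so it opts in.
class MyUdpLikeInput < InputBase
  def zero_downtime_restart_ready?
    true
  end
end

# Hypothetical position-tracking input (similar to in_tail).
# It keeps the default and stays disabled.
class MyTailLikeInput < InputBase
end

inputs = [MyUdpLikeInput.new, MyTailLikeInput.new]

# Only plugins that opt in would run in the new source-only workers.
ready = inputs.select(&:zero_downtime_restart_ready?)
puts ready.map { |input| input.class.name }.inspect
```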
The appropriateness of replacing the traditional SIGUSR2:
* The traditional SIGUSR2 feature has some limitations and issues.
* Limitations:
1. A change to system_config is ignored because applying it
requires restarting (kill/spawn) the process.
2. Plugins must not use class variables, because they cause problems when restarting.
* Issues:
* #2259
* #3469
* #3549
* This new feature allows restarts without downtime and without
such limitations.
* Although supported plugins are limited, that is not a
problem for many plugins.
(The problem is with server-based input plugins where the
stop results in data loss).
* This new feature has a big advantage that it can also be used
to update Fluentd.
* In the future, fluent-package will use this feature to allow
update with zero downtime by default.
* If needed, we can still use the traditional feature by RPC or
directly sending `SIGUSR2` to the workers.
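
One possible reading of the class-variable limitation listed above is sketched below. It assumes the traditional reload recreates plugin instances inside the same Ruby process, which limitation 1 suggests. `CountingPlugin` is a hypothetical class, not a real plugin.

```ruby
# Hypothetical plugin that keeps state in a class variable.
class CountingPlugin
  @@started = 0   # shared by every instance in this Ruby process

  def start
    @@started += 1
  end

  def self.started
    @@started
  end
end

# First "run" of the plugin.
CountingPlugin.new.start

# An in-process reload builds a fresh instance, but the class itself
# (and therefore @@started) survives, so old state leaks into the new run.
CountingPlugin.new.start

puts CountingPlugin.started
```

Spawning a new process, as the new feature does, starts from a fresh class state, so this kind of leak does not arise.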
Co-authored-by: Shizuo Fujita <[email protected]>
Co-authored-by: Kentaro Hayashi <[email protected]>
Signed-off-by: Daijiro Fukuda <[email protected]>