Merged
6 changes: 6 additions & 0 deletions context/kubernetes-integration.md
@@ -37,3 +37,9 @@ spec:
periodSeconds: 5
failureThreshold: 12
```

## Graceful Shutdown

Controllers handle `SIGTERM` the same way as `SIGINT`: both trigger a graceful shutdown. This matters when Kubernetes terminates pods during rolling updates, scale-downs, or pod evictions.

**Note**: Kubernetes sends `SIGTERM` to containers when terminating pods. Handling it gracefully gives your application time to clean up resources and finish in-flight requests before Kubernetes sends `SIGKILL` once the termination grace period (`terminationGracePeriodSeconds`, 30 seconds by default) expires.
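
The behaviour described in this section can be illustrated without the gem. The following is a minimal stdlib-only sketch, not the controller's implementation: it forks a child that maps both `SIGINT` and `SIGTERM` to `Interrupt` (as the trap handlers in this change do), signals it, and returns what the child printed. The helper name `signal_child` and the printed messages are assumptions made for illustration, and the sketch requires a platform that supports `fork`.

```ruby
# Fork a child that treats SIGTERM like SIGINT, signal it, and return its output.
# `signal_child` is a hypothetical helper used only in this sketch.
def signal_child(signal)
	reader, writer = IO.pipe
	
	pid = fork do
		reader.close
		
		# Raise on the current thread, mirroring the trap handlers in this change:
		Signal.trap(:INT) {Thread.current.raise(Interrupt)}
		Signal.trap(:TERM) {Thread.current.raise(Interrupt)}
		
		begin
			writer.puts "Ready..."
			sleep # Stand-in for serving requests.
		rescue Interrupt
			writer.puts "Finishing in-flight work..."
		ensure
			writer.puts "Exiting..."
			writer.close
		end
		
		# Skip any at_exit handlers inherited from the parent process:
		exit!(0)
	end
	
	writer.close
	reader.gets # Wait until the child has installed its handlers.
	Process.kill(signal, pid)
	
	output = reader.read
	Process.wait(pid)
	output
end
```

Calling `signal_child(:TERM)` and `signal_child(:INT)` should yield the same cleanup output, because both signals reach the same `rescue Interrupt` and `ensure` blocks.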
6 changes: 6 additions & 0 deletions context/systemd-integration.md
@@ -20,3 +20,9 @@ WantedBy=multi-user.target
```

Ensure `Type=notify` is set, so that the service can notify systemd when it is ready.

## Graceful Shutdown

Controllers handle `SIGTERM` the same way as `SIGINT`: both trigger a graceful shutdown. This matters when systemd stops the service.

**Note**: systemd sends `SIGTERM` to services when stopping them. Handling it gracefully gives your application time to clean up resources and finish in-flight requests before systemd escalates to `SIGKILL` once the stop timeout (`TimeoutStopSec` in the service file) expires.
21 changes: 3 additions & 18 deletions fixtures/async/container/controllers/graceful.rb
@@ -12,34 +12,19 @@ class Graceful < Async::Container::Controller
def setup(container)
container.run(name: "graceful", count: 1, restart: true) do |instance|
instance.ready!

# This is to avoid race conditions in the controller in test conditions.
sleep 0.001

clock = Async::Clock.start

original_action = Signal.trap(:INT) do
# We ignore the int, but in practical applications you would want to start a graceful shutdown.
$stdout.puts "Graceful shutdown...", clock.total

Signal.trap(:INT, original_action)
end

$stdout.puts "Ready...", clock.total
$stdout.puts "Ready..."

sleep
ensure
$stdout.puts "Exiting...", clock.total
$stdout.puts "Exiting..."
end
end
end

controller = Graceful.new(graceful_stop: 0.01)
controller = Graceful.new

begin
controller.run
rescue Async::Container::Terminate
$stdout.puts "Terminated..."
rescue Interrupt
$stdout.puts "Interrupted..."
end
6 changes: 6 additions & 0 deletions guides/kubernetes-integration/readme.md
@@ -37,3 +37,9 @@ spec:
periodSeconds: 5
failureThreshold: 12
```

## Graceful Shutdown

Controllers handle `SIGTERM` the same way as `SIGINT`: both trigger a graceful shutdown. This matters when Kubernetes terminates pods during rolling updates, scale-downs, or pod evictions.

**Note**: Kubernetes sends `SIGTERM` to containers when terminating pods. Handling it gracefully gives your application time to clean up resources and finish in-flight requests before Kubernetes sends `SIGKILL` once the termination grace period (`terminationGracePeriodSeconds`, 30 seconds by default) expires.
6 changes: 6 additions & 0 deletions guides/systemd-integration/readme.md
@@ -20,3 +20,9 @@ WantedBy=multi-user.target
```

Ensure `Type=notify` is set, so that the service can notify systemd when it is ready.

## Graceful Shutdown

Controllers handle `SIGTERM` the same way as `SIGINT`: both trigger a graceful shutdown. This matters when systemd stops the service.

**Note**: systemd sends `SIGTERM` to services when stopping them. Handling it gracefully gives your application time to clean up resources and finish in-flight requests before systemd escalates to `SIGKILL` once the stop timeout (`TimeoutStopSec` in the service file) expires.
17 changes: 10 additions & 7 deletions lib/async/container/controller.rb
@@ -134,9 +134,6 @@ def restart
if container.failed?
@notify&.error!("Container failed to start!")

Console.info(self, "Stopping failed container...")
container.stop(false)

raise SetupError, container
end

@@ -151,9 +148,14 @@ def restart
end

@notify&.ready!(size: @container.size)
rescue => error
raise
ensure
# If we are leaving this function with an exception, try to kill the container:
container&.stop(false)
# If we are leaving this function with an exception, kill the container:
if container
Console.warn(self, "Stopping failed container...", exception: error)
container.stop(false)
end
end

# Reload the existing container. Children instances will be reloaded using `SIGHUP`.
@@ -222,9 +224,10 @@ def run
::Thread.current.raise(Interrupt)
end

# SIGTERM behaves the same as SIGINT by default.
terminate_action = Signal.trap(:TERM) do
# $stderr.puts "Received TERM signal, terminating...", caller
::Thread.current.raise(Terminate)
# $stderr.puts "Received TERM signal, interrupting...", caller
::Thread.current.raise(Interrupt) # Same as SIGINT

Why not change the rescue Terminate block instead of changing the raised exception?

samuel-williams-shopify (Contributor, Author) commented on Feb 3, 2026:

It may be possible, but Async itself specifically handles Interrupt differently from Terminate.

Interrupt triggers a graceful stop of the Async scheduler: https://github.com/socketry/async/blob/400bc2e1fedcaa72ae1fa86d5c8d7e1962c1f889/lib/async/scheduler.rb#L534-L543

I think we want to preserve this behaviour, and if we raised Terminate it would cause an ungraceful exit.

Later on, I think we can revisit this if necessary.
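
To make the distinction in this reply concrete, here is a toy model of a loop that handles `Interrupt` specially and lets any other signal exception propagate. It is only a sketch of the general pattern, not Async's scheduler code: `SketchTerminate` is a hypothetical stand-in for `Async::Container::Terminate`, and `run_toy_loop` is an invented name.

```ruby
# Hypothetical stand-in for Async::Container::Terminate, defined here only so
# the sketch is self-contained.
class SketchTerminate < SignalException
	def initialize
		super("SIGTERM")
	end
end

# Toy loop: treats Interrupt as a request to stop gracefully, and lets any other
# signal exception propagate to the caller.
def run_toy_loop(error_class, log)
	begin
		log << :running
		raise error_class # Simulate the exception raised by a signal trap.
	rescue Interrupt
		log << :graceful_stop
	ensure
		log << :loop_closed
	end
	
	:returned
end
```

In this model, raising `Interrupt` records `:graceful_stop` and the method returns normally, while raising `SketchTerminate` skips the graceful branch and unwinds out of the method. Mapping `SIGTERM` to `Interrupt` is what routes it through the graceful branch.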

end

hangup_action = Signal.trap(:HUP) do
2 changes: 1 addition & 1 deletion lib/async/container/forked.rb
@@ -102,7 +102,7 @@ def self.fork(**options)
::Process.fork do
# We use `Thread.current.raise(...)` so that exceptions are filtered through `Thread.handle_interrupt` correctly.
Signal.trap(:INT){::Thread.current.raise(Interrupt)}
Signal.trap(:TERM){::Thread.current.raise(Terminate)}
Signal.trap(:TERM){::Thread.current.raise(Interrupt)} # Same as SIGINT.
Signal.trap(:HUP){::Thread.current.raise(Restart)}

# This could be a configuration option:
2 changes: 1 addition & 1 deletion lib/async/container/generic.rb
@@ -126,7 +126,7 @@ def wait_until_ready
self.sleep

if self.status?(:ready)
Console.logger.debug(self) do |buffer|
Console.debug(self) do |buffer|
buffer.puts "All ready:"
@state.each do |child, state|
buffer.puts "\t#{child.inspect}: #{state}"
70 changes: 33 additions & 37 deletions lib/async/container/group.rb
@@ -10,11 +10,20 @@

module Async
module Container
# The default timeout for interrupting processes, before escalating to terminating.
INTERRUPT_TIMEOUT = ENV.fetch("ASYNC_CONTAINER_INTERRUPT_TIMEOUT", 10).to_f

# The default timeout for terminating processes, before escalating to killing.
TERMINATE_TIMEOUT = ENV.fetch("ASYNC_CONTAINER_TERMINATE_TIMEOUT", 10).to_f
GRACEFUL_TIMEOUT = ENV.fetch("ASYNC_CONTAINER_GRACEFUL_TIMEOUT", "true").then do |value|
case value
when "true"
true # Default timeout for graceful termination.
when "false"
false # Immediately kill the processes.
else
value.to_f
end
end

# The default timeout for graceful termination.
DEFAULT_GRACEFUL_TIMEOUT = 10.0

# Manages a group of running processes.
class Group
@@ -155,50 +164,37 @@ def kill
# Stop all child processes with a multi-phase shutdown sequence.
#
# A graceful shutdown performs the following sequence:
# 1. Send SIGINT and wait up to `interrupt_timeout` seconds
# 2. Send SIGTERM and wait up to `terminate_timeout` seconds
# 3. Send SIGKILL and wait indefinitely for process cleanup
# 1. Send SIGINT and wait up to `graceful` seconds if specified.
# 2. Send SIGKILL and wait indefinitely for process cleanup.
#
# If `graceful` is false, skips the SIGINT phase and goes directly to SIGTERM → SIGKILL.
# If `graceful` is true, default to `DEFAULT_GRACEFUL_TIMEOUT` (10 seconds).
# If `graceful` is false, skip the SIGINT phase and go directly to SIGKILL.
#
# @parameter graceful [Boolean] Whether to send SIGINT first or skip directly to SIGTERM.
# @parameter interrupt_timeout [Numeric | Nil] Time to wait after SIGINT before escalating to SIGTERM.
# @parameter terminate_timeout [Numeric | Nil] Time to wait after SIGTERM before escalating to SIGKILL.
def stop(graceful = true, interrupt_timeout: INTERRUPT_TIMEOUT, terminate_timeout: TERMINATE_TIMEOUT)
case graceful
when true
# Use defaults.
when false
interrupt_timeout = nil
when Numeric
interrupt_timeout = graceful
terminate_timeout = graceful
end

Console.debug(self, "Stopping all processes...", interrupt_timeout: interrupt_timeout, terminate_timeout: terminate_timeout)
# @parameter graceful [Boolean | Numeric] Whether to send SIGINT first or skip directly to SIGKILL.
def stop(graceful = GRACEFUL_TIMEOUT)
Console.info(self, "Stopping all processes...", graceful: graceful)

# If a timeout is specified, interrupt the children first:
if interrupt_timeout
clock = Async::Clock.start

# Interrupt the children:
if graceful
# Send SIGINT to the children:
self.interrupt

# Wait for the children to exit:
self.wait_for_exit(clock, interrupt_timeout)
end

if terminate_timeout and self.any?
clock = Async::Clock.start
if graceful == true
graceful = DEFAULT_GRACEFUL_TIMEOUT
end

# If the children are still running, terminate them:
self.terminate
clock = Clock.start

# Wait for the children to exit:
self.wait_for_exit(clock, terminate_timeout)
self.wait_for_exit(clock, graceful)
end

ensure
# Do our best to clean up the children:
if any?
if graceful
Console.warn(self, "Killing processes after graceful shutdown failed...", size: self.size, clock: clock)
end

self.kill
self.wait
end
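
The simplified two-phase sequence (SIGINT, bounded wait, then SIGKILL) can be sketched for a single child process using only the standard library. This is an illustration of the sequence described in the comments above, not the `Group#stop` implementation: `stop_process`, its return values, and the polling interval are assumptions, and the 10-second default simply mirrors `DEFAULT_GRACEFUL_TIMEOUT`.

```ruby
# Stop a single child process, optionally giving it time to exit after SIGINT.
# `stop_process` is a hypothetical helper; `graceful` may be true, false, or a
# number of seconds, mirroring the parameter documented above.
def stop_process(pid, graceful = true)
	if graceful
		# `true` means "use the default timeout" (10 seconds in this sketch):
		timeout = graceful == true ? 10.0 : graceful
		
		Process.kill(:INT, pid)
		deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
		
		while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
			# With WNOHANG, wait returns nil while the child is still running:
			return :graceful if Process.wait(pid, Process::WNOHANG)
			
			sleep 0.01
		end
	end
	
	# Either graceful shutdown was not requested, or it timed out:
	Process.kill(:KILL, pid)
	Process.wait(pid)
	
	:killed
end
```

For example, `stop_process(pid, 0.5)` would wait up to half a second after SIGINT before killing the child, while `stop_process(pid, false)` would kill it immediately.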
2 changes: 1 addition & 1 deletion lib/async/container/notify/pipe.rb
@@ -69,7 +69,7 @@ def send(**message)
@io.flush
end

private
private

def environment_for(arguments)
# Insert or duplicate the environment hash which is the first argument:
9 changes: 5 additions & 4 deletions readme.md
@@ -26,6 +26,11 @@ Please see the [project documentation](https://socketry.github.io/async-containe

Please see the [project releases](https://socketry.github.io/async-container/releases/index) for all releases.

### Unreleased

- `SIGTERM` now triggers a graceful shutdown, the same as `SIGINT`, for better compatibility with Kubernetes and systemd.
- `ASYNC_CONTAINER_INTERRUPT_TIMEOUT` and `ASYNC_CONTAINER_TERMINATE_TIMEOUT` are removed and replaced by `ASYNC_CONTAINER_GRACEFUL_TIMEOUT`.

### v0.29.0

- Introduce `Client#healthy!` for sending health check messages.
@@ -65,10 +70,6 @@ Please see the [project releases](https://socketry.github.io/async-container/rel

- [Production Reliability Improvements](https://socketry.github.io/async-container/releases/index#production-reliability-improvements)

### v0.25.0

- Introduce `async:container:notify:log:ready?` task for detecting process readiness.

## Contributing

We welcome contributions to this project.
5 changes: 5 additions & 0 deletions releases.md
@@ -1,5 +1,10 @@
# Releases

## Unreleased

- `SIGTERM` now triggers a graceful shutdown, the same as `SIGINT`, for better compatibility with Kubernetes and systemd.
- `ASYNC_CONTAINER_INTERRUPT_TIMEOUT` and `ASYNC_CONTAINER_TERMINATE_TIMEOUT` are removed and replaced by `ASYNC_CONTAINER_GRACEFUL_TIMEOUT`.
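
As a rough guide to how the new variable is interpreted, the sketch below mirrors the parsing shown in the `lib/async/container/group.rb` hunk of this change. The method names `parse_graceful_timeout` and `graceful_timeout` are hypothetical and exist only for this illustration; the gem itself assigns the result to a constant.

```ruby
# Interpret a raw ASYNC_CONTAINER_GRACEFUL_TIMEOUT value.
# Hypothetical helper mirroring the `case` expression in this change.
def parse_graceful_timeout(value)
	case value
	when "true"
		true # Use the default graceful timeout.
	when "false"
		false # Skip the graceful phase and kill immediately.
	else
		value.to_f # Any other string is read as a number of seconds.
	end
end

# Look up the variable in `env` (ENV by default), falling back to "true".
def graceful_timeout(env = ENV)
	parse_graceful_timeout(env.fetch("ASYNC_CONTAINER_GRACEFUL_TIMEOUT", "true"))
end
```

Note that `String#to_f` returns `0.0` for text it cannot parse, so under this logic a value such as `"soon"` would become a zero-second timeout rather than raising an error.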

## v0.29.0

- Introduce `Client#healthy!` for sending health check messages.
15 changes: 5 additions & 10 deletions test/async/container/controller.rb
@@ -120,19 +120,12 @@ def after(error = nil)
super
end

it "has graceful shutdown" do
it "triggers graceful shutdown" do
expect(input.gets).to be == "Ready...\n"
start_time = input.gets.to_f

Process.kill(:INT, @pid)

expect(input.gets).to be == "Graceful shutdown...\n"
graceful_shutdown_time = input.gets.to_f

expect(input.gets).to be == "Exiting...\n"
exit_time = input.gets.to_f

expect(exit_time - graceful_shutdown_time).to be >= 0.01
end
end

@@ -164,7 +157,8 @@ def after(error = nil)

Process.kill(:INT, @pid)

expect(input.gets).to be == "Exiting...\n"
# It was killed:
expect(input.gets).to be_nil
end
end

@@ -213,7 +207,8 @@ def after(error = nil)

Process.kill(:TERM, pid)

expect(input.read).to be == "T"
# SIGTERM now behaves like SIGINT (graceful)
expect(input.read).to be == "I"
end
end
