Add native OpenTelemetry support for deploy logs #1803
Conversation
Pull request overview
This PR introduces opt-in OpenTelemetry (OTLP/HTTP) export for Kamal deploy logs and deploy lifecycle events, configured via a new otel.endpoint section in deploy.yml.
Changes:
- Add `Kamal::OtelShipper` (OTLP/HTTP JSON exporter) and `Kamal::TeeIo` (stdout/stderr tee) to stream deploy output.
- Add `Kamal::Configuration::Otel` plus documentation stubs to parse/validate the `otel:` config and derive resource attributes from service/destination.
- Emit `deploy.start`/`deploy.complete`/`deploy.failed` lifecycle events from `Kamal::Cli::Main` and wire shipper setup into `Kamal::Commander`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| test/configuration/otel_test.rb | Adds unit coverage for OTEL config enablement and derived attributes. |
| lib/kamal/tee_io.rb | Implements an IO wrapper that duplicates output to the OTEL shipper. |
| lib/kamal/otel_shipper.rb | Adds an OTLP/HTTP JSON exporter with buffering + periodic flushing. |
| lib/kamal/configuration/otel.rb | Introduces the otel config object (enabled via endpoint). |
| lib/kamal/configuration/docs/otel.yml | Adds documentation for the new otel: config section. |
| lib/kamal/configuration/docs/configuration.yml | Links the top-level config docs to the new OTEL section. |
| lib/kamal/configuration.rb | Instantiates config.otel as part of configuration loading. |
| lib/kamal/commander.rb | Wires shipper initialization into the commander's SSHKit setup and exposes OTEL helpers. |
| lib/kamal/cli/main.rb | Emits deploy lifecycle events and ensures shipper shutdown after commands. |
Pull request overview
This PR introduces native OpenTelemetry (OTLP/HTTP) log/event shipping for Kamal deploy workflows, configured via a new otel: section in config/deploy.yml, and implemented by teeing deploy output into a background batch shipper.
Changes:
- Add `Kamal::Configuration::Otel` plus docs and tests for parsing/enabling `otel.endpoint`.
- Add `Kamal::OtelShipper` (OTLP/HTTP JSON) and `Kamal::TeeIo` to stream CLI output to the shipper.
- Emit deploy lifecycle events (`deploy.start|complete|failed`) from `deploy`, `redeploy`, and `rollback`, and wire shipper setup into `Kamal::Commander`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/configuration/otel_test.rb | Adds unit tests for otel config enablement and derived attributes. |
| lib/kamal/tee_io.rb | Introduces IO wrapper that duplicates output into the OTel shipper. |
| lib/kamal/otel_shipper.rb | Implements background batching + OTLP/HTTP log export. |
| lib/kamal/configuration/otel.rb | Adds otel configuration object (endpoint + derived attrs). |
| lib/kamal/configuration/docs/otel.yml | Adds new documentation section for otel: config. |
| lib/kamal/configuration/docs/configuration.yml | Links top-level config docs to the new otel: section. |
| lib/kamal/configuration.rb | Instantiates @otel configuration and exposes it. |
| lib/kamal/commander.rb | Initializes the OTel shipper and swaps $stdout/$stderr; adds event/shutdown helpers. |
| lib/kamal/cli/main.rb | Emits deploy lifecycle events and ensures shipper shutdown for deploy commands. |
Comments suppressed due to low confidence (1)
lib/kamal/cli/main.rb:108
- In `rollback`, `deploy.start` is emitted before `KAMAL.config.version` is set to the rollback target. Since the shipper is initialized lazily on the first `otel_event`, the resource attribute `service.version` will reflect the pre-rollback version (e.g., the git SHA) rather than the rollback version. If you want rollback events/logs attributed to the rollback version, set the version before the first `otel_event`, or include the effective version as a log/event attribute instead of (or in addition to) a resource attribute.
```ruby
KAMAL.otel_event("deploy.start", command: "rollback", rollback_version: version, hosts: KAMAL.hosts.join(","))
runtime = print_runtime do
  with_lock do
    invoke_options = deploy_options
    KAMAL.config.version = version
    old_version = nil
```
I think we could implement this in a way that also allows us to send output to files or other destinations.
We also probably want to log any command that might change the infrastructure.
And we need to consider nested commands: e.g. you can run `kamal deploy`, which runs `kamal app boot`, or you can run `kamal app boot` directly.
Firstly, we'd need an integration point for all relevant commands, which we more or less already have with the `with_lock` method. That should already capture all the mutating commands, though with a slight wrinkle: we don't lock building a container. That's not an infrastructure change, so that makes sense, but if you are logging a deploy you'd want the build logs as part of it.
So we could rename `with_lock` to `modify` and add a `lock` argument.
So we'd have something like:
```ruby
def deploy
  modify do
    invoke "kamal:cli:build:deliver", [], invoke_options
    modify(lock: true) do
      run_hook "pre-deploy", secrets: true
      invoke "kamal:cli:app:boot", [], invoke_options
      # ...
    end
  end
end
```
Then the outer modify can trigger the output capture. This scopes capture to operations that change infrastructure — kamal config, kamal logs etc. don't use modify and aren't captured. All existing with_lock calls become modify(lock: true) and get output capture for free.
Notifications:
We already rely on ActiveSupport, so we could plug AS Notifications in. Something like:
```ruby
def invoke_command(command, *args)
  ActiveSupport::Notifications.instrument("command.kamal",
    command: self.command, subcommand: subcommand) do
    super
  end
end
```
Log output:
We could use standard Ruby loggers here. The default output continues going directly to stdout, but if we hook into Kamal::Cli::Base#say and also have a custom SSHKit Formatter, we could use those to capture the output.
Then we could configure loggers, and the OTel collector is just an option.
Finally, instead of a top-level `otel:` key, we'd configure one or more loggers:
```yaml
output:
  otel:
    endpoint: http://otel-gateway:4318
    level: debug
  file:
    path: /var/log/kamal/
    level: info
```
`loggers` would be a slightly better name for that key, but it's too similar to the existing `logging` one.
So the otel logger would subscribe to the notifications to get the command information and timing, and it would also be passed the logs. We might also want a `modify` event, so it knows when something important that it should track happens. I think that would be enough information to implement it as we need it?
lib/kamal/cli/base.rb (Outdated)
```ruby
    command: command, subcommand: subcommand, hosts: KAMAL.hosts.join(","))

ActiveSupport::Notifications.instrument("modify.kamal",
  command: command, subcommand: subcommand) do
```
No need to split these AS::N lines.
Pull request overview
Adds an output-logging subsystem intended to ship deploy (and other infrastructure-modifying) command output and lifecycle events to OpenTelemetry (OTLP/HTTP) and/or local files, wired through ActiveSupport notifications and a broadcast logger.
Changes:
- Introduces `Kamal::OtelShipper` and output loggers (`Kamal::Output::OtelLogger`, `Kamal::Output::FileLogger`) and config docs.
- Adds `output:` configuration (with backward compatibility from the top-level `otel:`), and wires output logger setup into `Kamal::Commander`.
- Wraps many CLI “modifying” commands with a new `modify(...)` helper that instruments start/complete notifications and attempts to capture CLI output.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/otel_shipper_test.rb | Adds unit tests for buffering and OTLP HTTP payload generation. |
| test/configuration/otel_test.rb | Tests top-level otel: configuration enablement/disablement. |
| lib/kamal/output/otel_logger.rb | New logger that subscribes to notifications and ships events + log lines via OtelShipper. |
| lib/kamal/output/file_logger.rb | New logger that writes captured lines into per-command log files. |
| lib/kamal/otel_shipper.rb | Implements best-effort OTLP/HTTP JSON log exporting with background flush thread. |
| lib/kamal/configuration/output.rb | Adds output: config wrapper and backward-compat mapping from top-level otel:. |
| lib/kamal/configuration/otel.rb | Adds/keeps top-level otel: config object. |
| lib/kamal/configuration/docs/output.yml | Documents output: configuration options. |
| lib/kamal/configuration/docs/otel.yml | Documents top-level otel: (deprecated) configuration. |
| lib/kamal/configuration/docs/configuration.yml | Adds references to output: and deprecated otel: sections. |
| lib/kamal/configuration.rb | Initializes new otel and output configuration objects. |
| lib/kamal/commander.rb | Adds broadcast logger plumbing and config-driven output logger setup. |
| lib/kamal/cli/base.rb | Adds modify wrapper + say override to feed messages into output logger and emit notifications. |
| lib/kamal/cli/main.rb | Wraps deploy/redeploy/rollback and related flows with modify instrumentation/locking. |
| lib/kamal/cli/server.rb | Switches locking wrapper from with_lock to modify(lock: true). |
| lib/kamal/cli/prune.rb | Switches locking wrapper from with_lock to modify(lock: true). |
| lib/kamal/cli/proxy.rb | Switches locking wrapper from with_lock to modify(lock: true). |
| lib/kamal/cli/app.rb | Switches locking wrapper from with_lock to modify(lock: true) (including conditional-lock helper). |
| lib/kamal/cli/accessory.rb | Switches locking wrapper from with_lock to modify(lock: true). |
lib/kamal/commander.rb (Outdated)
```ruby
if (otel_config = config.output.otel)
  output_logger.broadcast_to(
    Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
  )
end

if (file_config = config.output.file)
```
config.output.otel may be an empty Hash when the user sets output: { otel: {} }, because the configuration validator does not enforce presence of required keys. In that case otel_config["endpoint"] is nil and OtelLogger will start a shipper with an invalid endpoint (nil host), creating a background thread that continually attempts (and rescues) failed HTTP sends. Consider treating missing/blank endpoint as a configuration error or simply skipping OTel logger setup unless endpoint.present? (same idea for file.path).
Suggested change:
```diff
- if (otel_config = config.output.otel)
+ if (otel_config = config.output.otel.presence) && otel_config["endpoint"].present?
    output_logger.broadcast_to(
      Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
    )
  end
- if (file_config = config.output.file)
+ if (file_config = config.output.file.presence) && file_config["path"].present?
```
lib/kamal/otel_shipper.rb (Outdated)
```ruby
  formatted = attributes.any? ? "[#{name}] #{attributes.map { |k, v| "#{k}=#{v}" }.join(" ")}" : "[#{name}]"
  self << formatted
end

def flush
```
event enqueues into @buffer even after shutdown has been called, unlike << which returns early when @running is false. This can lead to unexpected background buffering (and potentially a final flush shipping) after shutdown/close. Consider making event consistent with << by no-op'ing when @running is false.
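A minimal sketch of that consistency fix, with class and method names assumed from the snippet above: `event` checks the same `@running` flag as `<<`, so nothing can be enqueued once shutdown has been called.

```ruby
# Sketch (names assumed from the review): event mirrors << so shutdown
# stops all enqueuing, including lifecycle events.
class SketchShipper
  def initialize
    @buffer = Queue.new
    @running = true
  end

  def <<(line)
    return unless @running
    @buffer << line
  end

  def event(name, **attributes)
    return unless @running  # mirror `<<`: no-op after shutdown
    formatted = attributes.any? ? "[#{name}] #{attributes.map { |k, v| "#{k}=#{v}" }.join(" ")}" : "[#{name}]"
    self << formatted
  end

  def shutdown
    @running = false
  end

  def buffered
    @buffer.size
  end
end

shipper = SketchShipper.new
shipper.event("deploy.start", command: "deploy")
shipper.shutdown
shipper.event("deploy.complete", command: "deploy")  # ignored after shutdown
```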
test/otel_shipper_test.rb (Outdated)
```ruby
  @shipper = Kamal::OtelShipper.new(
    endpoint: "http://localhost:4318",
    tags: @tags
  )
end
```
These tests inspect the internal buffer via drain_buffer, but Kamal::OtelShipper starts a background flush thread on initialization that will periodically flush (draining the buffer) every 5s. That can make the buffer-based assertions time-dependent/flaky on slow CI runs. Consider allowing the shipper to be initialized without the background thread (or with an injected flush interval) for tests, or stubbing start_flush_thread in setup so the buffer contents are stable.
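One way to do that, sketched with illustrative names (the `start_thread:` keyword and `drain_buffer` are assumptions, not the PR's exact API): tests construct the shipper without the background thread, so buffer assertions are deterministic.

```ruby
# Sketch: allow tests to skip the background flush thread so the buffer
# contents are stable. Names are illustrative.
class SketchBufferedShipper
  FLUSH_INTERVAL = 5

  def initialize(flush_interval: FLUSH_INTERVAL, start_thread: true)
    @buffer = Queue.new
    @flush_interval = flush_interval
    start_flush_thread if start_thread  # tests pass start_thread: false
  end

  def <<(line)
    @buffer << line
  end

  def drain_buffer
    records = []
    loop do
      records << @buffer.pop(true)
    rescue ThreadError
      break  # queue is empty
    end
    records
  end

  private
    def start_flush_thread
      @thread = Thread.new do
        loop do
          sleep @flush_interval
          drain_buffer  # the real shipper would POST these records
        end
      end
    end
end

shipper = SketchBufferedShipper.new(start_thread: false)
shipper << "line 1"
shipper << "line 2"
```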
```yaml
# When configured, Kamal streams all output from infrastructure-modifying
# commands to the specified destinations.
```
The docs say Kamal "streams all output" from infrastructure-modifying commands, but the current implementation only forwards strings passed through Thor's say (via KAMAL.log), and does not hook SSHKit output or generic $stdout/$stderr writes. Either narrow the wording here to what is actually captured, or wire output shipping into SSHKit / stdout-stderr so the doc is accurate.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
lib/kamal/output/otel_logger.rb:41
- These notifications are emitted for every `modify.kamal` instrumentation, and many non-deploy commands were switched to use `modify(...)`. As a result, `deploy.start`/`deploy.complete`/`deploy.failed` events will also be emitted for commands like `proxy boot`, `prune`, etc., which conflicts with the PR description claiming deploy lifecycle events only for deploy/redeploy/rollback. Either gate these events to those commands or rename the events to something command-agnostic (so consumers don't interpret a `prune` as a deploy).
```ruby
      error: "#{error_class}: #{error_message}")
  else
    @shipper.event("deploy.complete",
      command: payload[:command].to_s)
  end
end

def add(severity, message = nil, progname = nil, &block)
  msg = message || (block ? block.call : progname)
  @shipper << msg.to_s if msg
end

def close
  ActiveSupport::Notifications.unsubscribe(@subscription)
  @shipper.shutdown
end
```
lib/kamal/otel_shipper.rb (Outdated)
```ruby
req = Net::HTTP::Post.new(@endpoint.path, "Content-Type" => "application/json")
req.body = JSON.generate(payload)
http.request(req)
rescue
```
ship_records uses a bare rescue, which will also swallow non-StandardError exceptions (e.g., SystemExit, NoMemoryError, SignalException). For best-effort behavior without masking critical process-level exceptions, rescue StandardError explicitly.
Suggested change:
```diff
- rescue
+ rescue StandardError
```
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
lib/kamal/output/otel_logger.rb (Outdated)
```ruby
  @shipper.event("deploy.start",
    command: payload[:command].to_s,
    hosts: payload[:hosts].to_s)
end

def finish(name, id, payload)
  if payload[:exception]
    error_class, error_message = payload[:exception]
    @shipper.event("deploy.failed",
      command: payload[:command].to_s,
      error: "#{error_class}: #{error_message}")
  else
    @shipper.event("deploy.complete",
```
`start` emits a `deploy.start` event for every `modify.kamal` instrumentation, including non-deploy commands (proxy/prune/accessory/etc.). That makes the event name misleading for downstream consumers. Consider using an event name derived from `payload[:command]`/`payload[:subcommand]`, or emitting a generic `kamal.modify.start` + attributes instead of hard-coding `deploy.*`.
Suggested change:
```diff
+ base_event = payload[:command].to_s == "deploy" ? "deploy" : "kamal.modify"
- @shipper.event("deploy.start",
+ @shipper.event("#{base_event}.start",
    command: payload[:command].to_s,
    hosts: payload[:hosts].to_s)
  end

  def finish(name, id, payload)
+   base_event = payload[:command].to_s == "deploy" ? "deploy" : "kamal.modify"
    if payload[:exception]
      error_class, error_message = payload[:exception]
-     @shipper.event("deploy.failed",
+     @shipper.event("#{base_event}.failed",
        command: payload[:command].to_s,
        error: "#{error_class}: #{error_message}")
    else
-     @shipper.event("deploy.complete",
+     @shipper.event("#{base_event}.complete",
```
test/otel_shipper_test.rb (Outdated)
```ruby
stub_request = lambda do |req|
  request_body = JSON.parse(req.body)
  Net::HTTPOK.new("1.1", "200", "OK")
end
```
In these stubs, the with { |req| ... } block is used as a parameter matcher; returning a response object from the block is non-idiomatic and can be brittle. Consider making the matcher block return a boolean (e.g., true after capturing/parsing), and set the stubbed return value via .returns(...) for clarity.
```ruby
def shutdown
  @running = false
  @thread&.join(FLUSH_INTERVAL + 1.second)
  flush
end
```
shutdown uses Thread#kill on the flush thread. If the thread is killed while inside flush (holding @flush_mutex), the mutex can remain locked and later flush/shutdown calls can deadlock. Prefer a cooperative shutdown (set a flag, wake/join the thread) so the thread exits cleanly without being terminated while holding the mutex.
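A sketch of such a cooperative shutdown (interval and class names illustrative): the flush thread polls a flag and is joined, never killed, so it cannot die while holding the flush mutex.

```ruby
# Sketch: cooperative shutdown. The thread exits its own loop when the flag
# flips, so the mutex is always released before the final flush.
class SketchCooperativeShipper
  FLUSH_INTERVAL = 0.05

  def initialize
    @running = true
    @flush_mutex = Mutex.new
    @flush_count = 0
    @thread = Thread.new do
      while @running            # checked each cycle; no Thread#kill needed
        sleep FLUSH_INTERVAL
        flush
      end
    end
  end

  attr_reader :flush_count

  def flush
    @flush_mutex.synchronize { @flush_count += 1 }
  end

  def shutdown
    @running = false            # cooperative: ask the thread to stop
    @thread.join(FLUSH_INTERVAL * 2)
    flush                       # final flush; mutex is guaranteed free here
  end
end

shipper = SketchCooperativeShipper.new
shipper.shutdown
```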
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
```ruby
self.logging = false
@modify_depth = 0
@specifics = @specific_roles = @specific_hosts = nil
@config = @config_kwargs = nil
```
reset clears @output_logger without closing it first. Since Kamal::Output::OtelLogger starts a background flush thread (and FileLogger may hold an open file), dropping the reference can leak threads/file handles and also bypass the at_exit { @output_logger&.close } cleanup after a reset (e.g., when running aliases). Consider closing any existing output logger before setting it to nil (and/or ensuring loggers are shut down on reset).
Suggested change:
```diff
  @config = @config_kwargs = nil
+ @output_logger&.close
```
```ruby
SSHKit.config.output = Kamal::Output::Formatter.new($stdout, output_logger)

at_exit { @output_logger&.close }
```
configure_output_with registers an at_exit hook every time it runs. Since KAMAL.reset can lead to reconfiguration (e.g., aliases), this can accumulate multiple hooks and call close multiple times (or close a newer logger while older ones were leaked). Consider guarding so the hook is only registered once per process (e.g., via an ivar flag) and/or ensuring reset closes the previous logger.
Suggested change:
```diff
- at_exit { @output_logger&.close }
+ unless defined?(@@output_at_exit_hook_registered) && @@output_at_exit_hook_registered
+   at_exit { @output_logger&.close }
+   @@output_at_exit_hook_registered = true
+ end
```
```ruby
end

def log(line)
  output_logger << "#{line}\n" if logging
```
log writes to the broadcast output logger without any error isolation. If a backend logger raises (e.g., file write errors, shipper failures), this can bubble up from say and interrupt a deploy, which conflicts with the “best-effort, never block a deploy” goal. Consider rescuing exceptions here (and optionally disabling further logging after the first failure to avoid repeated errors).
Suggested change:
```diff
- output_logger << "#{line}\n" if logging
+ return unless logging
+ output_logger << "#{line}\n"
+ rescue => e
+   @logging = false
+   $stderr.puts "Log output failed: #{e.class}: #{e.message}"
+   $stderr.puts e.backtrace.join("\n") if ENV["VERBOSE"]
```
```ruby
def start(name, id, payload)
  @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  on_start(payload)
end

def finish(name, id, payload)
  runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
  on_finish(payload, runtime)
end

def add(severity, message = nil, progname = nil, &block)
  if msg = message || (block ? block.call : progname)
    self << msg.to_s
  end
end

def close
  ActiveSupport::Notifications.unsubscribe(@subscription)
  on_close
end
```
start/finish/close call on_start/on_finish/on_close directly without guarding against exceptions. Since these callbacks may perform IO (file open/write, network shipper shutdown), a failure can propagate out of ActiveSupport::Notifications and potentially interrupt the command. To keep output logging best-effort, consider rescuing exceptions around these hook calls (and optionally logging a one-time warning to stderr).
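A best-effort wrapper along those lines might look like this sketch (class and hook names illustrative; the `IOError` simulates a failing backend): the first failure logs one warning and nothing propagates to the caller.

```ruby
# Sketch: every subscriber hook runs inside a best-effort guard so a failing
# backend can never raise through the notification and abort the command.
class SketchGuardedLogger
  def initialize
    @warned = false
  end

  def start(name, id, payload)
    best_effort { on_start(payload) }
  end

  def finish(name, id, payload)
    best_effort { on_finish(payload) }
  end

  def warned?
    @warned
  end

  private
    def best_effort
      yield
    rescue StandardError => e
      unless @warned
        @warned = true
        warn "output logging failed: #{e.class}: #{e.message}"
      end
    end

    def on_start(payload)
      raise IOError, "disk full"  # simulated backend failure
    end

    def on_finish(payload)
    end
end

logger = SketchGuardedLogger.new
logger.start("modify.kamal", 1, {})   # does not raise despite the IOError
logger.finish("modify.kamal", 1, {})
```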
test/integration/main_test.rb (Outdated)
```ruby
complete_event = completes.find { |e| event_attr(e, "command") == command }
assert complete_event, "Expected command.complete event for #{command}"
assert_match /\d+\.\d+s/, event_attr(complete_event, "runtime")
```
event_attr only reads value.stringValue, but runtime is sent as a numeric OTLP attribute (doubleValue) per Kamal::OtelShipper#typed_value and is asserted as numeric in test/otel_shipper_test.rb. As written, event_attr(complete_event, "runtime") will be nil and the assert_match will fail. Consider updating the helper to handle doubleValue/intValue (or changing the emitted runtime attribute to a string consistently).
Suggested change:
```diff
- assert_match /\d+\.\d+s/, event_attr(complete_event, "runtime")
+ runtime = event_attr(complete_event, "runtime")
+ unless runtime
+   runtime_attr = complete_event["attributes"]&.find { |a| a["key"] == "runtime" }
+   runtime = runtime_attr && (runtime_attr.dig("value", "doubleValue") || runtime_attr.dig("value", "intValue"))
+ end
+ assert runtime, "Expected runtime attribute on #{command} complete event"
+ assert_match /\d+\.\d+s/, "#{runtime}s"
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
```ruby
def drain_buffer
  records = []
  records << @buffer.pop(true) until @buffer.empty?
```
drain_buffer uses a post-test until loop (records << @buffer.pop(true) until @buffer.empty?), which will always attempt at least one non-blocking pop even when the queue is empty. This will raise ThreadError on empty buffers (e.g., periodic flushes when nothing was logged), potentially killing the flush thread or breaking flush/shutdown. Rewrite the loop to only pop while the queue is non-empty (or handle ThreadError explicitly).
Suggested change:
```diff
- records << @buffer.pop(true) until @buffer.empty?
+ loop do
+   begin
+     records << @buffer.pop(true)
+   rescue ThreadError
+     break
+   end
+ end
```
```ruby
command = [ payload[:command], payload[:subcommand] ].compact.join("_")
[ Time.now.strftime("%Y-%m-%dT%H-%M-%S"), payload[:destination], command ].compact.join("_") + ".log"
end
```
filename_for incorporates payload[:destination] (user-provided via --destination) and command/subcommand directly into the filename. If these include path separators or .., path.join(...) can write outside the intended log directory (path traversal). Also, the timestamp only has second resolution, so two deploys started within the same second can collide and append to the same file, violating the “one file per deploy” expectation. Sanitize each filename segment to a safe subset and add a uniqueness component (e.g., monotonic ns / pid / random suffix).
Suggested change:
```diff
- command = [ payload[:command], payload[:subcommand] ].compact.join("_")
- [ Time.now.strftime("%Y-%m-%dT%H-%M-%S"), payload[:destination], command ].compact.join("_") + ".log"
- end
+ time = Time.now
+ timestamp = time.strftime("%Y-%m-%dT%H-%M-%S")
+ unique_suffix = "#{Process.pid}-#{time.nsec}"
+ command = [ payload[:command], payload[:subcommand] ].compact.join("_")
+ destination_segment = sanitize_segment(payload[:destination]) if payload[:destination]
+ command_segment = sanitize_segment(command) unless command.empty?
+ [ timestamp, unique_suffix, destination_segment, command_segment ].compact.join("_") + ".log"
+ end
+
+ def sanitize_segment(value)
+   segment = value.to_s
+   # Allow only a conservative set of characters to avoid path traversal and
+   # other filesystem-related issues; replace everything else with "_".
+   sanitized = segment.gsub(/[^A-Za-z0-9_.-]/, "_")
+   sanitized.empty? ? "unknown" : sanitized
+ end
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
```ruby
ensure
  output_logger.close if modify_finished
end

def log(line)
  output_logger << "#{line}\n" if logging
end
```
The PR description says output logging failures should never block or fail a deploy, but output_logger.close (in modify) and output_logger << ... (in log) can raise (e.g., file permission/disk full, IO errors) and would currently abort the command. Wrap these operations in a best-effort rescue (ideally logging the first failure to stderr) so deploys continue even if a backend logger fails.
test/integration/integration_test.rb
Outdated
| docker_compose "up --build -d" | ||
| wait_for_healthy | ||
| setup_deployer | ||
| deployer_exec("sh -c 'rm -f /tmp/otel/*.json'", workdir: "/") |
There was a problem hiding this comment.
Integration containers appear to be reused across multiple integration test cases, and the output config writes deploy logs to /tmp/kamal-deploy-logs. Only /tmp/otel/*.json is cleared here; if prior tests left *_deploy.log files, assert_deploy_log "*_deploy.log" in MainTest can end up reading multiple concatenated logs via the wildcard and become flaky/false-positive. Consider also removing /tmp/kamal-deploy-logs/*.log (or recreating the directory) during setup.
Suggested change:
```diff
- deployer_exec("sh -c 'rm -f /tmp/otel/*.json'", workdir: "/")
+ deployer_exec("sh -c 'rm -f /tmp/otel/*.json /tmp/kamal-deploy-logs/*.log'", workdir: "/")
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.
```ruby
end

def otel_payloads
  files = deployer_exec("sh -c 'ls /tmp/otel/*.json'", capture: true).strip.split("\n")
```
otel_payloads shells out to ls /tmp/otel/*.json; when there are no matching files yet, ls exits non-zero and docker_compose raises. Since wait_for_otel_events calls otel_events in a loop, this can make the wait logic fail immediately instead of waiting. Consider making the file listing resilient to an empty directory (e.g., redirect errors / use || true / use find), returning an empty list until payloads exist.
Suggested change:
```diff
- files = deployer_exec("sh -c 'ls /tmp/otel/*.json'", capture: true).strip.split("\n")
+ files = deployer_exec("sh -c 'find /tmp/otel -maxdepth 1 -name \"*.json\" -print'", capture: true).strip.split("\n")
```
```ruby
private
  def build_loggers
    output_config.filter_map do |key, settings|
      if (klass_name = LOGGER_TYPES[key])
```
build_loggers assumes each logger config value is a Hash. In YAML, a key like otel: (with no nested mapping) is parsed as nil, which would cause settings["endpoint"] inside the logger .build method to raise a NoMethodError instead of the intended ArgumentError with a helpful message. Consider normalizing settings to {} (or raising a clearer error) before calling .build.
Suggested change:
```diff
  if (klass_name = LOGGER_TYPES[key])
+   settings = {} if settings.nil?
```
```ruby
test "discards lines before start" do
  @logger << "before modify\n"

  assert_empty log_files
```
The PR description mentions the file logger “buffers pre-modify lines”, but the current implementation/test suite explicitly discards lines written before start (and FileLogger#<< is a no-op until a file is opened). Either implement buffering (so early CLI output isn’t lost) or update the PR description to match the actual behavior.
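If buffering were implemented, it could look like this sketch (class and method names are illustrative, and `StringIO` stands in for the real log file): lines written before `start` are held and replayed once the file opens, instead of being discarded.

```ruby
require "stringio"

# Sketch: buffer pre-start lines and replay them when the log file opens.
class SketchBufferingFileLogger
  def initialize
    @pending = []
    @file = nil
  end

  def <<(message)
    if @file
      @file.print(message)
    else
      @pending << message   # keep early CLI output instead of discarding it
    end
  end

  def start
    @file = StringIO.new    # the real logger would open the per-command file
    @pending.each { |line| @file.print(line) }  # replay buffered lines
    @pending.clear
  end

  def contents
    @file&.string
  end
end

logger = SketchBufferingFileLogger.new
logger << "before modify\n"
logger.start
logger << "after modify\n"
```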
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
```ruby
  super()
end

def <<(message)
  @file&.print(message)
end

private
  def on_start(payload)
    path.mkpath
    @file = File.open(path.join(filename_for(payload)), "a")
    @file.sync = true
  end

  def on_finish(payload, runtime)
    if @file
      if payload[:exception]
        error_class, error_message = payload[:exception]
        @file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
      else
        @file.puts "# Completed in #{runtime}s"
      end
      @file.close
      @file = nil
    end
  end

  def on_close
    if @file
      @file.close
      @file = nil
```
File operations here (mkpath, File.open, print, puts, close) can raise (permissions, ENOSPC, invalid path), which would currently propagate via the notification subscriber and potentially abort the deploy. To keep output logging best-effort, catch and swallow backend I/O errors (optionally logging the first failure to stderr and disabling this logger for the remainder of the run).
Suggested change:
```ruby
  @io_failed = false
  super()
end

def <<(message)
  safe_io do
    @file&.print(message)
  end
end

private
  def safe_io
    return if @io_failed
    yield
  rescue SystemCallError, IOError, StandardError => e
    unless @io_failed
      @io_failed = true
      begin
        warn "Kamal::Output::FileLogger disabled due to I/O error: #{e.class}: #{e.message}"
      rescue StandardError
        # Ignore errors writing to stderr
      end
    end
    nil
  end

  def on_start(payload)
    safe_io do
      path.mkpath
      @file = File.open(path.join(filename_for(payload)), "a")
      @file.sync = true
    end
  end

  def on_finish(payload, runtime)
    safe_io do
      if @file
        if payload[:exception]
          error_class, error_message = payload[:exception]
          @file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
        else
          @file.puts "# Completed in #{runtime}s"
        end
        @file.close
        @file = nil
      end
    end
  end

  def on_close
    safe_io do
      if @file
        @file.close
        @file = nil
      end
```
```ruby
def start(name, id, payload)
  @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  on_start(payload)
end

def finish(name, id, payload)
  runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
  on_finish(payload, runtime)
end
```
start/finish call on_start/on_finish directly; if a logger backend raises (e.g., file permission errors), it will bubble up through ActiveSupport::Notifications and can fail the deploy despite the PR’s stated “best-effort” behavior. Consider rescuing exceptions around these callback hooks (and optionally logging the first failure to stderr) so output backends can never abort the instrumented operation.
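A minimal sketch of that rescue wrapper, assuming the subscriber shape quoted above (the `warn` fallback and the simulated failing backend are illustrative):

```ruby
# Sketch: wrap subscriber hooks so a failing output backend never aborts
# the instrumented operation. on_start here deliberately raises to show
# the error being swallowed; the real hooks do file/network I/O.
class SafeSubscriber
  def start(name, id, payload)
    @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    on_start(payload)
  rescue StandardError => e
    warn "output logger start failed: #{e.class}: #{e.message}"
  end

  def finish(name, id, payload)
    runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
    on_finish(payload, runtime)
  rescue StandardError => e
    warn "output logger finish failed: #{e.class}: #{e.message}"
  end

  private
    def on_start(payload)
      raise IOError, "disk full"  # simulate a broken backend
    end

    def on_finish(payload, runtime)
    end
end

sub = SafeSubscriber.new
sub.start("modify.kamal", "1", {})   # warning printed, no exception raised
sub.finish("modify.kamal", "1", {})
```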
408e1f3 to
820cafa
Compare
Introduce a pluggable output logging system that captures deploy output
and ships it to configurable destinations via a new `output:` section
in deploy.yml.
```yaml
output:
  otel:
    endpoint: http://otel-gateway:4318
  file:
    path: /var/log/kamal/
```
OTel backend ships OTLP log records via HTTP to /v1/logs. Structured
lifecycle events (kamal.start/complete/failed) mark command boundaries
with deployment metadata. Stream output lines are tagged per-host
(server.address) and by stream type (log.iostream). Format follows
OTel semantic conventions: standard attributes where applicable
(deployment.*, exception.*, service.*), custom attributes namespaced
under kamal.*.
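A record following those conventions would look roughly like this (a hedged sketch of the OTLP/HTTP JSON shape; the attribute values are illustrative, not taken from the PR):

```ruby
require "json"

# Sketch of one OTLP/HTTP JSON log record; field names follow the OTLP
# logs data model, values are made up for illustration.
record = {
  resourceLogs: [ {
    resource: { attributes: [
      { key: "service.name", value: { stringValue: "app" } },
      { key: "deployment.environment", value: { stringValue: "production" } }
    ] },
    scopeLogs: [ {
      logRecords: [ {
        timeUnixNano: (Time.now.to_f * 1_000_000_000).to_i.to_s,
        severityText: "INFO",
        body: { stringValue: "kamal.start" },
        attributes: [
          { key: "server.address", value: { stringValue: "host-1" } },
          { key: "log.iostream", value: { stringValue: "stdout" } },
          { key: "kamal.command", value: { stringValue: "deploy" } }
        ]
      } ]
    } ]
  } ]
}

JSON.generate(record)  # body POSTed to <endpoint>/v1/logs
```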
File backend writes one timestamped log file per command
(e.g. 20260327_143022_production_deploy.log) with a completion/failure
summary line at the end.
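The per-command filename could be derived along these lines (a sketch; `filename_for` and the payload keys are assumptions based on the example name):

```ruby
# Sketch: build a timestamped log filename such as
# 20260327_143022_production_deploy.log. The :destination and :command
# payload keys are assumed for illustration.
def filename_for(payload, now: Time.now)
  parts = [ now.strftime("%Y%m%d_%H%M%S"), payload[:destination], payload[:command] ].compact
  "#{parts.join("_")}.log"
end

filename_for({ destination: "production", command: "deploy" },
             now: Time.new(2026, 3, 27, 14, 30, 22))
# => "20260327_143022_production_deploy.log"
```

`compact` drops the destination segment when no destination is configured, which keeps the filename scheme consistent either way.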
Key design decisions:
- `modify` method replaces `with_lock` as the integration point for all
infrastructure-modifying commands, scoping output capture and
instrumenting deploy lifecycle via ActiveSupport::Notifications
- SSHKit output captured via custom Formatter with thread-local context
for per-host and stream type tagging
- Best-effort: output logger failures never block or fail a deploy
Co-Authored-By: Donal McBreen <donal@37signals.com>
Summary
Add an output logging framework that captures deploy output and ships it to configurable destinations.
- `output:` config section with pluggable logger backends (OTel and file to start)
- `modify` method replaces `with_lock` as the integration point for all infrastructure-modifying commands, scoping output capture and instrumenting `ActiveSupport::Notifications` for deploy lifecycle events
- Captures local output (`say`) and remote command output (SSHKit formatter with per-host tagging)
- `ActiveSupport::BroadcastLogger` fans out to multiple loggers
- Lifecycle events (`kamal.start`/`kamal.complete`/`kamal.failed`) via OTLP HTTP, with OTel semantic convention attributes

Idea by @djmb
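The nest-safe `modify` behavior can be sketched with a depth counter (a plain-Ruby illustration; the real integration instruments `modify.kamal` via `ActiveSupport::Notifications` and also takes the deploy lock):

```ruby
# Sketch: only the outermost modify call starts/finishes instrumentation;
# nested calls just run their block. Modifier and :start/:finish events
# stand in for the real notifications wiring.
class Modifier
  attr_reader :events

  def initialize
    @depth = 0
    @events = []
  end

  def modify
    @depth += 1
    @events << :start if @depth == 1   # outermost call only
    yield
  ensure
    @events << :finish if @depth == 1  # close loggers on outermost exit
    @depth -= 1
  end
end

m = Modifier.new
m.modify do
  m.modify { }   # nested: no extra start/finish
end
m.events  # [:start, :finish]
```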
Usage
How it works

- `modify` replaces `with_lock` across all CLI commands. Every `modify` call enables output logging. The outermost `modify` instruments `modify.kamal` via AS::Notifications and closes loggers on exit. Nests safely.
- `Kamal::Output::Formatter` extends SSHKit's Pretty formatter to capture command output, tagging each line with `server.address` (host) and `log.iostream` (stdout/stderr) via thread-local context
- `Kamal::Output::OtelLogger` subscribes to `modify.kamal` start/finish for lifecycle events and receives log lines via BroadcastLogger `<<`. SSHKit severity levels are mapped to OTel severity numbers.
- `Kamal::Output::FileLogger` opens a timestamped log file on start, discards pre-start lines, closes on finish
- `Kamal::OtelShipper` buffers lines in a thread-safe Queue and flushes every 5s, or when the buffer reaches batch size, via OTLP HTTP to `/v1/logs`
- Standard attributes where applicable (`deployment.*`, `exception.*`, `service.*`, `server.address`, `log.iostream`); custom attributes namespaced under `kamal.*` (`kamal.command`, `kamal.runtime`, `kamal.deploy_version`, `kamal.performer`, `kamal.run_id`)

Test plan
- `kamal deploy` with `output: otel:` configured: verify log records in collector (integration test)
- `kamal deploy` with `output: file:` configured: verify timestamped log file with full output (integration test)
- Without `output:`: verify no behavioral change

🤖 Generated with Claude Code
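The shipper's buffer-and-flush behavior described in the how-it-works notes can be sketched as follows (batch size and the `exporter` hook are illustrative; the real class POSTs OTLP JSON to `/v1/logs` and runs a background thread that calls `flush` on the 5s interval, omitted here):

```ruby
# Sketch: thread-safe buffering with size-triggered flushing. The exporter
# block stands in for the OTLP HTTP POST; the periodic-flush thread is
# omitted to keep the example single-threaded.
class ShipperSketch
  BATCH_SIZE = 100  # assumed batch size

  def initialize(&exporter)
    @queue = Queue.new      # thread-safe buffer for log lines
    @exporter = exporter
  end

  def <<(line)
    @queue << line
    flush if @queue.size >= BATCH_SIZE
  end

  def flush
    batch = []
    batch << @queue.pop until @queue.empty?
    @exporter.call(batch) unless batch.empty?
  end
end

sent = []
shipper = ShipperSketch.new { |batch| sent << batch }
shipper << "line 1"
shipper << "line 2"
shipper.flush
sent  # [["line 1", "line 2"]]
```

Draining into a local array before calling the exporter keeps the queue lock hold time short, so writers are never blocked on the HTTP request.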