Skip to content

Add native OpenTelemetry support for deploy logs#1803

Merged
djmb merged 1 commit intomainfrom
otel-deploy-logs
Mar 27, 2026
Merged

Add native OpenTelemetry support for deploy logs#1803
djmb merged 1 commit intomainfrom
otel-deploy-logs

Conversation

@lewispb
Copy link
Copy Markdown
Member

@lewispb lewispb commented Mar 18, 2026

Summary

Add an output logging framework that captures deploy output and ships it to configurable destinations.

  • output: config section with pluggable logger backends (OTel and file to start)
  • modify method replaces with_lock as the integration point for all infrastructure-modifying commands — scopes output capture and instruments ActiveSupport::Notifications for deploy lifecycle events
  • Output captured from two sources: CLI messages (Thor say) and remote command output (SSHKit formatter with per-host tagging)
  • ActiveSupport::BroadcastLogger fans out to multiple loggers
  • OTel logger ships log lines and structured lifecycle events (kamal.start/kamal.complete/kamal.failed) via OTLP HTTP, with OTel semantic convention attributes
  • File logger writes one timestamped log file per command
  • Log destinations printed on command completion
  • Best-effort — output logger failures never block or fail a deploy

Idea by @djmb

Usage

# config/deploy.yml
output:
  otel:
    endpoint: http://otel-gateway:4318
  file:
    path: tmp/kamal/

How it works

  • modify replaces with_lock across all CLI commands. Every modify call enables output logging. The outermost modify instruments modify.kamal via AS::Notifications and closes loggers on exit. Nests safely.
  • Kamal::Output::Formatter extends SSHKit's Pretty formatter to capture command output, tagging each line with server.address (host) and log.iostream (stdout/stderr) via thread-local context
  • Kamal::Output::OtelLogger subscribes to modify.kamal start/finish for lifecycle events, receives log lines via BroadcastLogger <<. SSHKit severity levels mapped to OTel severity numbers.
  • Kamal::Output::FileLogger opens a timestamped log file on start, discards pre-start lines, closes on finish
  • Kamal::OtelShipper buffers lines in a thread-safe Queue, flushes every 5s or when buffer reaches batch size, via OTLP HTTP to /v1/logs
  • OTel format follows semantic conventions: standard attributes (deployment.*, exception.*, service.*, server.address, log.iostream), custom attributes namespaced under kamal.* (kamal.command, kamal.runtime, kamal.deploy_version, kamal.performer, kamal.run_id)

Test plan

  • kamal deploy with output: otel: configured — verify log records in collector (integration test)
  • kamal deploy with output: file: configured — verify timestamped log file with full output (integration test)
  • Deploy without output: — verify no behavioral change
  • Deploy with unreachable OTel endpoint — verify deploy completes normally (unit test)
  • Per-host tagging and iostream attributes on stream output (integration + unit tests)
  • Log summary output on command completion (integration + unit tests)
  • Full non-integration test suite passes

🤖 Generated with Claude Code

@lewispb lewispb force-pushed the otel-deploy-logs branch 4 times, most recently from de7c524 to 672a6a6 Compare March 18, 2026 18:36
@lewispb lewispb marked this pull request as ready for review March 18, 2026 19:07
Copilot AI review requested due to automatic review settings March 18, 2026 19:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces opt-in OpenTelemetry (OTLP/HTTP) export for Kamal deploy logs and deploy lifecycle events, configured via a new otel.endpoint section in deploy.yml.

Changes:

  • Add Kamal::OtelShipper (OTLP/HTTP JSON exporter) and Kamal::TeeIo (stdout/stderr tee) to stream deploy output.
  • Add Kamal::Configuration::Otel + documentation stubs to parse/validate otel: config and derive resource attributes from service/destination.
  • Emit deploy.start / deploy.complete / deploy.failed lifecycle events from Kamal::Cli::Main and wire shipper setup into Kamal::Commander.

Tip

If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
test/configuration/otel_test.rb Adds unit coverage for OTEL config enablement and derived attributes.
lib/kamal/tee_io.rb Implements an IO wrapper that duplicates output to the OTEL shipper.
lib/kamal/otel_shipper.rb Adds an OTLP/HTTP JSON exporter with buffering + periodic flushing.
lib/kamal/configuration/otel.rb Introduces otel config object (enabled via endpoint).
lib/kamal/configuration/docs/otel.yml Adds documentation for the new otel: config section.
lib/kamal/configuration/docs/configuration.yml Links the top-level config docs to the new OTEL section.
lib/kamal/configuration.rb Instantiates config.otel as part of configuration loading.
lib/kamal/commander.rb Wires shipper initialization into commander SSHKit setup and exposes OTEL helpers.
lib/kamal/cli/main.rb Emits deploy lifecycle events and ensures shipper shutdown after commands.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 18, 2026 19:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces native OpenTelemetry (OTLP/HTTP) log/event shipping for Kamal deploy workflows, configured via a new otel: section in config/deploy.yml, and implemented by teeing deploy output into a background batch shipper.

Tip

If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.

Changes:

  • Add Kamal::Configuration::Otel plus docs and tests for parsing/enabling otel.endpoint.
  • Add Kamal::OtelShipper (OTLP/HTTP JSON) and Kamal::TeeIo to stream CLI output to the shipper.
  • Emit deploy lifecycle events (deploy.start|complete|failed) from deploy, redeploy, and rollback, and wire shipper setup into Kamal::Commander.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/configuration/otel_test.rb Adds unit tests for otel config enablement and derived attributes.
lib/kamal/tee_io.rb Introduces IO wrapper that duplicates output into the OTel shipper.
lib/kamal/otel_shipper.rb Implements background batching + OTLP/HTTP log export.
lib/kamal/configuration/otel.rb Adds otel configuration object (endpoint + derived attrs).
lib/kamal/configuration/docs/otel.yml Adds new documentation section for otel: config.
lib/kamal/configuration/docs/configuration.yml Links top-level config docs to the new otel: section.
lib/kamal/configuration.rb Instantiates @otel configuration and exposes it.
lib/kamal/commander.rb Initializes the OTel shipper and swaps $stdout/$stderr; adds event/shutdown helpers.
lib/kamal/cli/main.rb Emits deploy lifecycle events and ensures shipper shutdown for deploy commands.
Comments suppressed due to low confidence (1)

lib/kamal/cli/main.rb:108

  • In rollback, deploy.start is emitted before KAMAL.config.version is set to the rollback target. Since the shipper is initialized lazily on the first otel_event, the resource attribute service.version will reflect the pre-rollback version (e.g., git SHA) rather than the rollback version. If you want rollback events/logs attributed to the rollback version, set the version before the first otel_event, or include the effective version as a log/event attribute instead of (or in addition to) a resource attribute.
    KAMAL.otel_event("deploy.start", command: "rollback", rollback_version: version, hosts: KAMAL.hosts.join(","))

    runtime = print_runtime do
      with_lock do
        invoke_options = deploy_options

        KAMAL.config.version = version
        old_version = nil

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lewispb lewispb requested review from Copilot, djmb and jeremy March 18, 2026 19:28

This comment was marked as outdated.

djmb

This comment was marked as outdated.

@djmb djmb self-requested a review March 20, 2026 11:02
Copy link
Copy Markdown
Collaborator

@djmb djmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could implement this in a way the also allows us to send output to files or other destinations.

We also probably want to log any command that might change the infrastructure.

And we need to consider nested commands - e.g you can run kamal deploy which runs kamal app boot, or you can run kamal app boot directly.

Firstly we'd need an integration point for all relevant commands, which we more or less already have with the with_lock method. This should already capture all the mutating commands, though with a slight wrinkle that we don't lock building a container. That's not an infrastructure change so that makes sense but if you are logging a deploy you would want the build logs as part of it.

So we could rename with_lock to modify and add a lock argument.

So we'd have something like:

def deploy
  modify do
    invoke "kamal:cli:build:deliver", [], invoke_options

    modify(lock: true) do
      run_hook "pre-deploy", secrets: true
      invoke "kamal:cli:app:boot", [], invoke_options
      # ...
    end
  end
end

Then the outer modify can trigger the output capture. This scopes capture to operations that change infrastructure — kamal config, kamal logs etc. don't use modify and aren't captured. All existing with_lock calls become modify(lock: true) and get output capture for free.

Notifications:

We already rely on ActiveSupport, so we could plug AS Notifications in. Something like:

def invoke_command(command, *args)
  ActiveSupport::Notifications.instrument("command.kamal",
    command: self.command, subcommand: subcommand) do
    super
  end
end

Log output:

We could use standard Ruby loggers here. The default output continues going directly to stdout, but if we hook into Kamal::Cli::Base#say and also have a custom SSHKit Formatter, we could use those to capture the output.

Then we could configure loggers, and the OTel collector is just an option.

Finally instead of a top-level otel: key, we'd configure one or more loggers

output:
  otel:
    endpoint: http://otel-gateway:4318
    level: debug
  file:
    path: /var/log/kamal/
    level: info

loggers would be a slightly better name for that key but it's too like the existing logging one.

So the otel logger would subscribe to the notifications to get the command information and timing, then it would also be passed the logs. We might also want a modify event, so it know when something important that it should track happens. I think that would be enough information to implement it as we need it?

Copilot AI review requested due to automatic review settings March 23, 2026 12:21
command: command, subcommand: subcommand, hosts: KAMAL.hosts.join(","))

ActiveSupport::Notifications.instrument("modify.kamal",
command: command, subcommand: subcommand) do
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to split these AS:N lines.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an output-logging subsystem intended to ship deploy (and other infrastructure-modifying) command output and lifecycle events to OpenTelemetry (OTLP/HTTP) and/or local files, wired through ActiveSupport notifications and a broadcast logger.

Tip

If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.

Changes:

  • Introduces Kamal::OtelShipper and output loggers (Kamal::Output::OtelLogger, Kamal::Output::FileLogger) and config docs.
  • Adds output: configuration (with backward-compat from top-level otel:), and wires output logger setup into Kamal::Commander.
  • Wraps many CLI “modifying” commands with a new modify(...) helper that instruments start/complete notifications and attempts to capture CLI output.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
test/otel_shipper_test.rb Adds unit tests for buffering and OTLP HTTP payload generation.
test/configuration/otel_test.rb Tests top-level otel: configuration enablement/disablement.
lib/kamal/output/otel_logger.rb New logger that subscribes to notifications and ships events + log lines via OtelShipper.
lib/kamal/output/file_logger.rb New logger that writes captured lines into per-command log files.
lib/kamal/otel_shipper.rb Implements best-effort OTLP/HTTP JSON log exporting with background flush thread.
lib/kamal/configuration/output.rb Adds output: config wrapper and backward-compat mapping from top-level otel:.
lib/kamal/configuration/otel.rb Adds/keeps top-level otel: config object.
lib/kamal/configuration/docs/output.yml Documents output: configuration options.
lib/kamal/configuration/docs/otel.yml Documents top-level otel: (deprecated) configuration.
lib/kamal/configuration/docs/configuration.yml Adds references to output: and deprecated otel: sections.
lib/kamal/configuration.rb Initializes new otel and output configuration objects.
lib/kamal/commander.rb Adds broadcast logger plumbing and config-driven output logger setup.
lib/kamal/cli/base.rb Adds modify wrapper + say override to feed messages into output logger and emit notifications.
lib/kamal/cli/main.rb Wraps deploy/redeploy/rollback and related flows with modify instrumentation/locking.
lib/kamal/cli/server.rb Switches locking wrapper from with_lock to modify(lock: true).
lib/kamal/cli/prune.rb Switches locking wrapper from with_lock to modify(lock: true).
lib/kamal/cli/proxy.rb Switches locking wrapper from with_lock to modify(lock: true).
lib/kamal/cli/app.rb Switches locking wrapper from with_lock to modify(lock: true) (including conditional-lock helper).
lib/kamal/cli/accessory.rb Switches locking wrapper from with_lock to modify(lock: true).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +190 to +196
if (otel_config = config.output.otel)
output_logger.broadcast_to(
Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
)
end

if (file_config = config.output.file)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config.output.otel may be an empty Hash when the user sets output: { otel: {} }, because the configuration validator does not enforce presence of required keys. In that case otel_config["endpoint"] is nil and OtelLogger will start a shipper with an invalid endpoint (nil host), creating a background thread that continually attempts (and rescues) failed HTTP sends. Consider treating missing/blank endpoint as a configuration error or simply skipping OTel logger setup unless endpoint.present? (same idea for file.path).

Suggested change
if (otel_config = config.output.otel)
output_logger.broadcast_to(
Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
)
end
if (file_config = config.output.file)
if (otel_config = config.output.otel.presence) && otel_config["endpoint"].present?
output_logger.broadcast_to(
Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
)
end
if (file_config = config.output.file.presence) && file_config["path"].present?

Copilot uses AI. Check for mistakes.
Comment on lines +39 to +43
formatted = attributes.any? ? "[#{name}] #{attributes.map { |k, v| "#{k}=#{v}" }.join(" ")}" : "[#{name}]"
self << formatted
end

def flush
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event enqueues into @buffer even after shutdown has been called, unlike << which returns early when @running is false. This can lead to unexpected background buffering (and potentially a final flush shipping) after shutdown/close. Consider making event consistent with << by no-op'ing when @running is false.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +15
@shipper = Kamal::OtelShipper.new(
endpoint: "http://localhost:4318",
tags: @tags
)
end
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests inspect the internal buffer via drain_buffer, but Kamal::OtelShipper starts a background flush thread on initialization that will periodically flush (draining the buffer) every 5s. That can make the buffer-based assertions time-dependent/flaky on slow CI runs. Consider allowing the shipper to be initialized without the background thread (or with an injected flush interval) for tests, or stubbing start_flush_thread in setup so the buffer contents are stable.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +6
# When configured, Kamal streams all output from infrastructure-modifying
# commands to the specified destinations.
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs say Kamal "streams all output" from infrastructure-modifying commands, but the current implementation only forwards strings passed through Thor's say (via KAMAL.log), and does not hook SSHKit output or generic $stdout/$stderr writes. Either narrow the wording here to what is actually captured, or wire output shipping into SSHKit / stdout-stderr so the doc is accurate.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 23, 2026 12:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

lib/kamal/output/otel_logger.rb:41

  • These notifications are emitted for every modify.kamal instrumentation, and many non-deploy commands were switched to use modify(...). As a result, deploy.start/deploy.complete/deploy.failed events will also be emitted for commands like proxy boot, prune, etc., which conflicts with the PR description claiming deploy lifecycle events only for deploy/redeploy/rollback. Either gate these events to those commands or rename the events to something command-agnostic (so consumers don’t interpret a prune as a deploy).
        error: "#{error_class}: #{error_message}")
    else
      @shipper.event("deploy.complete",
        command: payload[:command].to_s)
    end
  end

  def add(severity, message = nil, progname = nil, &block)
    msg = message || (block ? block.call : progname)
    @shipper << msg.to_s if msg
  end

  def close
    ActiveSupport::Notifications.unsubscribe(@subscription)
    @shipper.shutdown
  end
end


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

req = Net::HTTP::Post.new(@endpoint.path, "Content-Type" => "application/json")
req.body = JSON.generate(payload)
http.request(req)
rescue
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ship_records uses a bare rescue, which will also swallow non-StandardError exceptions (e.g., SystemExit, NoMemoryError, SignalException). For best-effort behavior without masking critical process-level exceptions, rescue StandardError explicitly.

Suggested change
rescue
rescue StandardError

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 23, 2026 13:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +13 to +25
@shipper.event("deploy.start",
command: payload[:command].to_s,
hosts: payload[:hosts].to_s)
end

def finish(name, id, payload)
if payload[:exception]
error_class, error_message = payload[:exception]
@shipper.event("deploy.failed",
command: payload[:command].to_s,
error: "#{error_class}: #{error_message}")
else
@shipper.event("deploy.complete",
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start emits a deploy.start event for every modify.kamal instrumentation, including non-deploy commands (proxy/prune/accessory/etc.). That makes the event name misleading for downstream consumers. Consider using an event name derived from payload[:command]/payload[:subcommand], or emitting a generic kamal.modify.start + attributes instead of hard-coding deploy.*.

Suggested change
@shipper.event("deploy.start",
command: payload[:command].to_s,
hosts: payload[:hosts].to_s)
end
def finish(name, id, payload)
if payload[:exception]
error_class, error_message = payload[:exception]
@shipper.event("deploy.failed",
command: payload[:command].to_s,
error: "#{error_class}: #{error_message}")
else
@shipper.event("deploy.complete",
base_event = payload[:command].to_s == "deploy" ? "deploy" : "kamal.modify"
@shipper.event("#{base_event}.start",
command: payload[:command].to_s,
hosts: payload[:hosts].to_s)
end
def finish(name, id, payload)
base_event = payload[:command].to_s == "deploy" ? "deploy" : "kamal.modify"
if payload[:exception]
error_class, error_message = payload[:exception]
@shipper.event("#{base_event}.failed",
command: payload[:command].to_s,
error: "#{error_class}: #{error_message}")
else
@shipper.event("#{base_event}.complete",

Copilot uses AI. Check for mistakes.
Comment on lines +56 to +59
stub_request = lambda do |req|
request_body = JSON.parse(req.body)
Net::HTTPOK.new("1.1", "200", "OK")
end
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these stubs, the with { |req| ... } block is used as a parameter matcher; returning a response object from the block is non-idiomatic and can be brittle. Consider making the matcher block return a boolean (e.g., true after capturing/parsing), and set the stubbed return value via .returns(...) for clarity.

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +57
def shutdown
@running = false
@thread&.join(FLUSH_INTERVAL + 1.second)
flush
end
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown uses Thread#kill on the flush thread. If the thread is killed while inside flush (holding @flush_mutex), the mutex can remain locked and later flush/shutdown calls can deadlock. Prefer a cooperative shutdown (set a flag, wake/join the thread) so the thread exits cleanly without being terminated while holding the mutex.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 23, 2026 14:40
Copilot AI review requested due to automatic review settings March 26, 2026 15:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

self.logging = false
@modify_depth = 0
@specifics = @specific_roles = @specific_hosts = nil
@config = @config_kwargs = nil
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset clears @output_logger without closing it first. Since Kamal::Output::OtelLogger starts a background flush thread (and FileLogger may hold an open file), dropping the reference can leak threads/file handles and also bypass the at_exit { @output_logger&.close } cleanup after a reset (e.g., when running aliases). Consider closing any existing output logger before setting it to nil (and/or ensuring loggers are shut down on reset).

Suggested change
@config = @config_kwargs = nil
@config = @config_kwargs = nil
@output_logger&.close

Copilot uses AI. Check for mistakes.

SSHKit.config.output = Kamal::Output::Formatter.new($stdout, output_logger)

at_exit { @output_logger&.close }
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configure_output_with registers an at_exit hook every time it runs. Since KAMAL.reset can lead to reconfiguration (e.g., aliases), this can accumulate multiple hooks and call close multiple times (or close a newer logger while older ones were leaked). Consider guarding so the hook is only registered once per process (e.g., via an ivar flag) and/or ensuring reset closes the previous logger.

Suggested change
at_exit { @output_logger&.close }
unless defined?(@@output_at_exit_hook_registered) && @@output_at_exit_hook_registered
at_exit { @output_logger&.close }
@@output_at_exit_hook_registered = true
end

Copilot uses AI. Check for mistakes.
end

def log(line)
output_logger << "#{line}\n" if logging
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log writes to the broadcast output logger without any error isolation. If a backend logger raises (e.g., file write errors, shipper failures), this can bubble up from say and interrupt a deploy, which conflicts with the “best-effort, never block a deploy” goal. Consider rescuing exceptions here (and optionally disabling further logging after the first failure to avoid repeated errors).

Suggested change
output_logger << "#{line}\n" if logging
return unless logging
output_logger << "#{line}\n"
rescue => e
@logging = false
$stderr.puts "Log output failed: #{e.class}: #{e.message}"
$stderr.puts e.backtrace.join("\n") if ENV["VERBOSE"]

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +28
def start(name, id, payload)
@started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
on_start(payload)
end

def finish(name, id, payload)
runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
on_finish(payload, runtime)
end

def add(severity, message = nil, progname = nil, &block)
if msg = message || (block ? block.call : progname)
self << msg.to_s
end
end

def close
ActiveSupport::Notifications.unsubscribe(@subscription)
on_close
end
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start/finish/close call on_start/on_finish/on_close directly without guarding against exceptions. Since these callbacks may perform IO (file open/write, network shipper shutdown), a failure can propagate out of ActiveSupport::Notifications and potentially interrupt the command. To keep output logging best-effort, consider rescuing exceptions around these hook calls (and optionally logging a one-time warning to stderr).

Copilot uses AI. Check for mistakes.

complete_event = completes.find { |e| event_attr(e, "command") == command }
assert complete_event, "Expected command.complete event for #{command}"
assert_match /\d+\.\d+s/, event_attr(complete_event, "runtime")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event_attr only reads value.stringValue, but runtime is sent as a numeric OTLP attribute (doubleValue) per Kamal::OtelShipper#typed_value and is asserted as numeric in test/otel_shipper_test.rb. As written, event_attr(complete_event, "runtime") will be nil and the assert_match will fail. Consider updating the helper to handle doubleValue/intValue (or changing the emitted runtime attribute to a string consistently).

Suggested change
assert_match /\d+\.\d+s/, event_attr(complete_event, "runtime")
runtime = event_attr(complete_event, "runtime")
unless runtime
runtime_attr = complete_event["attributes"]&.find { |a| a["key"] == "runtime" }
runtime = runtime_attr && (runtime_attr.dig("value", "doubleValue") || runtime_attr.dig("value", "intValue"))
end
assert runtime, "Expected runtime attribute on #{command} complete event"
assert_match /\d+\.\d+s/, "#{runtime}s"

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 27, 2026 09:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


def drain_buffer
records = []
records << @buffer.pop(true) until @buffer.empty?
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drain_buffer uses a post-test until loop (records << @buffer.pop(true) until @buffer.empty?), which will always attempt at least one non-blocking pop even when the queue is empty. This will raise ThreadError on empty buffers (e.g., periodic flushes when nothing was logged), potentially killing the flush thread or breaking flush/shutdown. Rewrite the loop to only pop while the queue is non-empty (or handle ThreadError explicitly).

Suggested change
records << @buffer.pop(true) until @buffer.empty?
loop do
begin
records << @buffer.pop(true)
rescue ThreadError
break
end
end

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +48
command = [ payload[:command], payload[:subcommand] ].compact.join("_")
[ Time.now.strftime("%Y-%m-%dT%H-%M-%S"), payload[:destination], command ].compact.join("_") + ".log"
end
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filename_for incorporates payload[:destination] (user-provided via --destination) and command/subcommand directly into the filename. If these include path separators or .., path.join(...) can write outside the intended log directory (path traversal). Also, the timestamp only has second resolution, so two deploys started within the same second can collide and append to the same file, violating the “one file per deploy” expectation. Sanitize each filename segment to a safe subset and add a uniqueness component (e.g., monotonic ns / pid / random suffix).

Suggested change
command = [ payload[:command], payload[:subcommand] ].compact.join("_")
[ Time.now.strftime("%Y-%m-%dT%H-%M-%S"), payload[:destination], command ].compact.join("_") + ".log"
end
time = Time.now
timestamp = time.strftime("%Y-%m-%dT%H-%M-%S")
unique_suffix = "#{Process.pid}-#{time.nsec}"
command = [ payload[:command], payload[:subcommand] ].compact.join("_")
destination_segment = sanitize_segment(payload[:destination]) if payload[:destination]
command_segment = sanitize_segment(command) unless command.empty?
[ timestamp, unique_suffix, destination_segment, command_segment ].compact.join("_") + ".log"
end
def sanitize_segment(value)
segment = value.to_s
# Allow only a conservative set of characters to avoid path traversal and
# other filesystem-related issues; replace everything else with "_".
sanitized = segment.gsub(/[^A-Za-z0-9_.-]/, "_")
sanitized.empty? ? "unknown" : sanitized
end

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 27, 2026 11:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 27, 2026 11:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +158 to +164
ensure
output_logger.close if modify_finished
end

def log(line)
output_logger << "#{line}\n" if logging
end
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says output logging failures should never block or fail a deploy, but output_logger.close (in modify) and output_logger << ... (in log) can raise (e.g., file permission/disk full, IO errors) and would currently abort the command. Wrap these operations in a best-effort rescue (ideally logging the first failure to stderr) so deploys continue even if a backend logger fails.

Copilot uses AI. Check for mistakes.
docker_compose "up --build -d"
wait_for_healthy
setup_deployer
deployer_exec("sh -c 'rm -f /tmp/otel/*.json'", workdir: "/")
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration containers appear to be reused across multiple integration test cases, and the output config writes deploy logs to /tmp/kamal-deploy-logs. Only /tmp/otel/*.json is cleared here; if prior tests left *_deploy.log files, assert_deploy_log "*_deploy.log" in MainTest can end up reading multiple concatenated logs via the wildcard and become flaky/false-positive. Consider also removing /tmp/kamal-deploy-logs/*.log (or recreating the directory) during setup.

Suggested change
deployer_exec("sh -c 'rm -f /tmp/otel/*.json'", workdir: "/")
deployer_exec("sh -c 'rm -f /tmp/otel/*.json /tmp/kamal-deploy-logs/*.log'", workdir: "/")

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 27, 2026 13:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

end

def otel_payloads
files = deployer_exec("sh -c 'ls /tmp/otel/*.json'", capture: true).strip.split("\n")
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otel_payloads shells out to ls /tmp/otel/*.json; when there are no matching files yet, ls exits non-zero and docker_compose raises. Since wait_for_otel_events calls otel_events in a loop, this can make the wait logic fail immediately instead of waiting. Consider making the file listing resilient to an empty directory (e.g., redirect errors / use || true / use find), returning an empty list until payloads exist.

Suggested change
files = deployer_exec("sh -c 'ls /tmp/otel/*.json'", capture: true).strip.split("\n")
files = deployer_exec("sh -c 'find /tmp/otel -maxdepth 1 -name \"*.json\" -print'", capture: true).strip.split("\n")

Copilot uses AI. Check for mistakes.
private
def build_loggers
output_config.filter_map do |key, settings|
if (klass_name = LOGGER_TYPES[key])
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_loggers assumes each logger config value is a Hash. In YAML, a key like otel: (with no nested mapping) is parsed as nil, which would cause settings["endpoint"] inside the logger .build method to raise a NoMethodError instead of the intended ArgumentError with a helpful message. Consider normalizing settings to {} (or raising a clearer error) before calling .build.

Suggested change
if (klass_name = LOGGER_TYPES[key])
if (klass_name = LOGGER_TYPES[key])
settings = {} if settings.nil?

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +18
test "discards lines before start" do
@logger << "before modify\n"

assert_empty log_files
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions the file logger “buffers pre-modify lines”, but the current implementation/test suite explicitly discards lines written before start (and FileLogger#<< is a no-op until a file is opened). Either implement buffering (so early CLI output isn’t lost) or update the PR description to match the actual behavior.

Copilot uses AI. Check for mistakes.
@djmb djmb force-pushed the otel-deploy-logs branch from 0f5044c to 9aebe5a Compare March 27, 2026 13:38
Copilot AI review requested due to automatic review settings March 27, 2026 13:46
@djmb djmb force-pushed the otel-deploy-logs branch from 9aebe5a to a50c61f Compare March 27, 2026 13:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +11 to +41
super()
end

def <<(message)
@file&.print(message)
end

private
def on_start(payload)
path.mkpath
@file = File.open(path.join(filename_for(payload)), "a")
@file.sync = true
end

def on_finish(payload, runtime)
if @file
if payload[:exception]
error_class, error_message = payload[:exception]
@file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
else
@file.puts "# Completed in #{runtime}s"
end
@file.close
@file = nil
end
end

def on_close
if @file
@file.close
@file = nil
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File operations here (mkpath, File.open, print, puts, close) can raise (permissions, ENOSPC, invalid path), which would currently propagate via the notification subscriber and potentially abort the deploy. To keep output logging best-effort, catch and swallow backend I/O errors (optionally logging the first failure to stderr and disabling this logger for the remainder of the run).

Suggested change
super()
end
def <<(message)
@file&.print(message)
end
private
def on_start(payload)
path.mkpath
@file = File.open(path.join(filename_for(payload)), "a")
@file.sync = true
end
def on_finish(payload, runtime)
if @file
if payload[:exception]
error_class, error_message = payload[:exception]
@file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
else
@file.puts "# Completed in #{runtime}s"
end
@file.close
@file = nil
end
end
def on_close
if @file
@file.close
@file = nil
@io_failed = false
super()
end
def <<(message)
safe_io do
@file&.print(message)
end
end
private
def safe_io
return if @io_failed
yield
rescue SystemCallError, IOError, StandardError => e
unless @io_failed
@io_failed = true
begin
warn "Kamal::Output::FileLogger disabled due to I/O error: #{e.class}: #{e.message}"
rescue StandardError
# Ignore errors writing to stderr
end
end
nil
end
def on_start(payload)
safe_io do
path.mkpath
@file = File.open(path.join(filename_for(payload)), "a")
@file.sync = true
end
end
def on_finish(payload, runtime)
safe_io do
if @file
if payload[:exception]
error_class, error_message = payload[:exception]
@file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
else
@file.puts "# Completed in #{runtime}s"
end
@file.close
@file = nil
end
end
end
def on_close
safe_io do
if @file
@file.close
@file = nil
end

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +17
def start(name, id, payload)
@started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
on_start(payload)
end

def finish(name, id, payload)
runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
on_finish(payload, runtime)
end
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start/finish call on_start/on_finish directly; if a logger backend raises (e.g., file permission errors), it will bubble up through ActiveSupport::Notifications and can fail the deploy despite the PR’s stated “best-effort” behavior. Consider rescuing exceptions around these callback hooks (and optionally logging the first failure to stderr) so output backends can never abort the instrumented operation.

Copilot uses AI. Check for mistakes.
@djmb djmb force-pushed the otel-deploy-logs branch from a50c61f to aaecf27 Compare March 27, 2026 14:02
Copilot AI review requested due to automatic review settings March 27, 2026 14:04
@djmb djmb force-pushed the otel-deploy-logs branch from aaecf27 to fc26d47 Compare March 27, 2026 14:04
@djmb djmb force-pushed the otel-deploy-logs branch 2 times, most recently from 408e1f3 to 820cafa Compare March 27, 2026 14:10
Introduce a pluggable output logging system that captures deploy output
and ships it to configurable destinations via a new `output:` section
in deploy.yml.

    output:
      otel:
        endpoint: http://otel-gateway:4318
      file:
        path: /var/log/kamal/

OTel backend ships OTLP log records via HTTP to /v1/logs. Structured
lifecycle events (kamal.start/complete/failed) mark command boundaries
with deployment metadata. Stream output lines are tagged per-host
(server.address) and by stream type (log.iostream). Format follows
OTel semantic conventions: standard attributes where applicable
(deployment.*, exception.*, service.*), custom attributes namespaced
under kamal.*.

File backend writes one timestamped log file per command
(e.g. 20260327_143022_production_deploy.log) with a completion/failure
summary line at the end.

Key design decisions:
- `modify` method replaces `with_lock` as the integration point for all
  infrastructure-modifying commands, scoping output capture and
  instrumenting deploy lifecycle via ActiveSupport::Notifications
- SSHKit output captured via custom Formatter with thread-local context
  for per-host and stream type tagging
- Best-effort: output logger failures never block or fail a deploy

Co-Authored-By: Donal McBreen <donal@37signals.com>
@djmb djmb force-pushed the otel-deploy-logs branch from 820cafa to 589688d Compare March 27, 2026 14:16
@djmb djmb merged commit 6a31d14 into main Mar 27, 2026
12 checks passed
@djmb djmb deleted the otel-deploy-logs branch March 27, 2026 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants