Add native OpenTelemetry support for deploy logs #1803
Conversation
Pull request overview
This PR introduces opt-in OpenTelemetry (OTLP/HTTP) export for Kamal deploy logs and deploy lifecycle events, configured via a new otel.endpoint section in deploy.yml.
Changes:
- Add `Kamal::OtelShipper` (OTLP/HTTP JSON exporter) and `Kamal::TeeIo` (stdout/stderr tee) to stream deploy output.
- Add `Kamal::Configuration::Otel` plus documentation stubs to parse/validate the `otel:` config and derive resource attributes from service/destination.
- Emit `deploy.start`/`deploy.complete`/`deploy.failed` lifecycle events from `Kamal::Cli::Main` and wire shipper setup into `Kamal::Commander`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| test/configuration/otel_test.rb | Adds unit coverage for OTEL config enablement and derived attributes. |
| lib/kamal/tee_io.rb | Implements an IO wrapper that duplicates output to the OTEL shipper. |
| lib/kamal/otel_shipper.rb | Adds an OTLP/HTTP JSON exporter with buffering + periodic flushing. |
| lib/kamal/configuration/otel.rb | Introduces the otel config object (enabled via endpoint). |
| lib/kamal/configuration/docs/otel.yml | Adds documentation for the new otel: config section. |
| lib/kamal/configuration/docs/configuration.yml | Links the top-level config docs to the new OTEL section. |
| lib/kamal/configuration.rb | Instantiates config.otel as part of configuration loading. |
| lib/kamal/commander.rb | Wires shipper initialization into the commander's SSHKit setup and exposes OTEL helpers. |
| lib/kamal/cli/main.rb | Emits deploy lifecycle events and ensures shipper shutdown after commands. |
Pull request overview
This PR introduces native OpenTelemetry (OTLP/HTTP) log/event shipping for Kamal deploy workflows, configured via a new otel: section in config/deploy.yml, and implemented by teeing deploy output into a background batch shipper.
Changes:
- Add `Kamal::Configuration::Otel` plus docs and tests for parsing/enabling `otel.endpoint`.
- Add `Kamal::OtelShipper` (OTLP/HTTP JSON) and `Kamal::TeeIo` to stream CLI output to the shipper.
- Emit deploy lifecycle events (`deploy.start|complete|failed`) from `deploy`, `redeploy`, and `rollback`, and wire shipper setup into `Kamal::Commander`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/configuration/otel_test.rb | Adds unit tests for otel config enablement and derived attributes. |
| lib/kamal/tee_io.rb | Introduces IO wrapper that duplicates output into the OTel shipper. |
| lib/kamal/otel_shipper.rb | Implements background batching + OTLP/HTTP log export. |
| lib/kamal/configuration/otel.rb | Adds otel configuration object (endpoint + derived attrs). |
| lib/kamal/configuration/docs/otel.yml | Adds new documentation section for otel: config. |
| lib/kamal/configuration/docs/configuration.yml | Links top-level config docs to the new otel: section. |
| lib/kamal/configuration.rb | Instantiates @otel configuration and exposes it. |
| lib/kamal/commander.rb | Initializes the OTel shipper and swaps $stdout/$stderr; adds event/shutdown helpers. |
| lib/kamal/cli/main.rb | Emits deploy lifecycle events and ensures shipper shutdown for deploy commands. |
Comments suppressed due to low confidence (1)
lib/kamal/cli/main.rb:108
- In `rollback`, `deploy.start` is emitted before `KAMAL.config.version` is set to the rollback target. Since the shipper is initialized lazily on the first `otel_event`, the resource attribute `service.version` will reflect the pre-rollback version (e.g., the git SHA) rather than the rollback version. If you want rollback events/logs attributed to the rollback version, set the version before the first `otel_event`, or include the effective version as a log/event attribute instead of (or in addition to) a resource attribute.
```ruby
KAMAL.otel_event("deploy.start", command: "rollback", rollback_version: version, hosts: KAMAL.hosts.join(","))
runtime = print_runtime do
  with_lock do
    invoke_options = deploy_options
    KAMAL.config.version = version
    old_version = nil
```
I think we could implement this in a way that also allows us to send output to files or other destinations.
We also probably want to log any command that might change the infrastructure.
And we need to consider nested commands: e.g. you can run `kamal deploy`, which runs `kamal app boot`, or you can run `kamal app boot` directly.
Firstly, we'd need an integration point for all relevant commands, which we more or less already have with the `with_lock` method. That should already capture all the mutating commands, though with a slight wrinkle: we don't lock building a container. That's not an infrastructure change, so that makes sense, but if you are logging a deploy you'd want the build logs as part of it.
So we could rename `with_lock` to `modify` and add a `lock` argument.
So we'd have something like:
```ruby
def deploy
  modify do
    invoke "kamal:cli:build:deliver", [], invoke_options
    modify(lock: true) do
      run_hook "pre-deploy", secrets: true
      invoke "kamal:cli:app:boot", [], invoke_options
      # ...
    end
  end
end
```
Then the outer modify can trigger the output capture. This scopes capture to operations that change infrastructure — kamal config, kamal logs etc. don't use modify and aren't captured. All existing with_lock calls become modify(lock: true) and get output capture for free.
Notifications:
We already rely on ActiveSupport, so we could plug AS Notifications in. Something like:
```ruby
def invoke_command(command, *args)
  ActiveSupport::Notifications.instrument("command.kamal",
    command: self.command, subcommand: subcommand) do
    super
  end
end
```
Log output:
We could use standard Ruby loggers here. The default output continues going directly to stdout, but if we hook into Kamal::Cli::Base#say and also have a custom SSHKit Formatter, we could use those to capture the output.
Then we could configure loggers, and the OTel collector is just an option.
Finally, instead of a top-level `otel:` key, we'd configure one or more loggers:
```yaml
output:
  otel:
    endpoint: http://otel-gateway:4318
    level: debug
  file:
    path: /var/log/kamal/
    level: info
```
`loggers` would be a slightly better name for that key, but it's too similar to the existing `logging` one.
So the otel logger would subscribe to the notifications to get the command information and timing, and it would also be passed the logs. We might also want a `modify` event, so it knows when something important that it should track happens. I think that would be enough information to implement it as we need it?
lib/kamal/cli/base.rb (Outdated)
```ruby
    command: command, subcommand: subcommand, hosts: KAMAL.hosts.join(","))

ActiveSupport::Notifications.instrument("modify.kamal",
  command: command, subcommand: subcommand) do
```
No need to split these AS::N lines.
Pull request overview
Adds an output-logging subsystem intended to ship deploy (and other infrastructure-modifying) command output and lifecycle events to OpenTelemetry (OTLP/HTTP) and/or local files, wired through ActiveSupport notifications and a broadcast logger.
Changes:
- Introduces `Kamal::OtelShipper` and output loggers (`Kamal::Output::OtelLogger`, `Kamal::Output::FileLogger`) and config docs.
- Adds `output:` configuration (with backward compatibility from the top-level `otel:`), and wires output logger setup into `Kamal::Commander`.
- Wraps many CLI “modifying” commands with a new `modify(...)` helper that instruments start/complete notifications and attempts to capture CLI output.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/otel_shipper_test.rb | Adds unit tests for buffering and OTLP HTTP payload generation. |
| test/configuration/otel_test.rb | Tests top-level otel: configuration enablement/disablement. |
| lib/kamal/output/otel_logger.rb | New logger that subscribes to notifications and ships events + log lines via OtelShipper. |
| lib/kamal/output/file_logger.rb | New logger that writes captured lines into per-command log files. |
| lib/kamal/otel_shipper.rb | Implements best-effort OTLP/HTTP JSON log exporting with background flush thread. |
| lib/kamal/configuration/output.rb | Adds output: config wrapper and backward-compat mapping from top-level otel:. |
| lib/kamal/configuration/otel.rb | Adds/keeps top-level otel: config object. |
| lib/kamal/configuration/docs/output.yml | Documents output: configuration options. |
| lib/kamal/configuration/docs/otel.yml | Documents top-level otel: (deprecated) configuration. |
| lib/kamal/configuration/docs/configuration.yml | Adds references to output: and deprecated otel: sections. |
| lib/kamal/configuration.rb | Initializes new otel and output configuration objects. |
| lib/kamal/commander.rb | Adds broadcast logger plumbing and config-driven output logger setup. |
| lib/kamal/cli/base.rb | Adds modify wrapper + say override to feed messages into output logger and emit notifications. |
| lib/kamal/cli/main.rb | Wraps deploy/redeploy/rollback and related flows with modify instrumentation/locking. |
| lib/kamal/cli/server.rb | Switches locking wrapper from with_lock to modify(lock: true). |
| lib/kamal/cli/prune.rb | Switches locking wrapper from with_lock to modify(lock: true). |
| lib/kamal/cli/proxy.rb | Switches locking wrapper from with_lock to modify(lock: true). |
| lib/kamal/cli/app.rb | Switches locking wrapper from with_lock to modify(lock: true) (including conditional-lock helper). |
| lib/kamal/cli/accessory.rb | Switches locking wrapper from with_lock to modify(lock: true). |
lib/kamal/commander.rb (Outdated)
```ruby
if (otel_config = config.output.otel)
  output_logger.broadcast_to(
    Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
  )
end

if (file_config = config.output.file)
```
config.output.otel may be an empty Hash when the user sets output: { otel: {} }, because the configuration validator does not enforce presence of required keys. In that case otel_config["endpoint"] is nil and OtelLogger will start a shipper with an invalid endpoint (nil host), creating a background thread that continually attempts (and rescues) failed HTTP sends. Consider treating missing/blank endpoint as a configuration error or simply skipping OTel logger setup unless endpoint.present? (same idea for file.path).
Suggested change:
```diff
- if (otel_config = config.output.otel)
+ if (otel_config = config.output.otel.presence) && otel_config["endpoint"].present?
    output_logger.broadcast_to(
      Kamal::Output::OtelLogger.new(endpoint: otel_config["endpoint"], tags: tags)
    )
  end
- if (file_config = config.output.file)
+ if (file_config = config.output.file.presence) && file_config["path"].present?
```
lib/kamal/otel_shipper.rb (Outdated)
```ruby
  formatted = attributes.any? ? "[#{name}] #{attributes.map { |k, v| "#{k}=#{v}" }.join(" ")}" : "[#{name}]"
  self << formatted
end

def flush
```
event enqueues into @buffer even after shutdown has been called, unlike << which returns early when @running is false. This can lead to unexpected background buffering (and potentially a final flush shipping) after shutdown/close. Consider making event consistent with << by no-op'ing when @running is false.
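A minimal sketch of that consistency fix, with class and method names assumed from the snippet above: `event` checks the same `@running` flag as `<<`, so nothing can be enqueued once shutdown has been called.

```ruby
# Sketch (names assumed from the review): event mirrors << so shutdown
# stops all enqueuing, including lifecycle events.
class SketchShipper
  def initialize
    @buffer = Queue.new
    @running = true
  end

  def <<(line)
    return unless @running
    @buffer << line
  end

  def event(name, **attributes)
    return unless @running  # mirror `<<`: no-op after shutdown
    formatted = attributes.any? ? "[#{name}] #{attributes.map { |k, v| "#{k}=#{v}" }.join(" ")}" : "[#{name}]"
    self << formatted
  end

  def shutdown
    @running = false
  end

  def buffered
    @buffer.size
  end
end

shipper = SketchShipper.new
shipper.event("deploy.start", command: "deploy")
shipper.shutdown
shipper.event("deploy.complete", command: "deploy")  # ignored after shutdown
```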
test/otel_shipper_test.rb (Outdated)
```ruby
  @shipper = Kamal::OtelShipper.new(
    endpoint: "http://localhost:4318",
    tags: @tags
  )
end
```
These tests inspect the internal buffer via drain_buffer, but Kamal::OtelShipper starts a background flush thread on initialization that will periodically flush (draining the buffer) every 5s. That can make the buffer-based assertions time-dependent/flaky on slow CI runs. Consider allowing the shipper to be initialized without the background thread (or with an injected flush interval) for tests, or stubbing start_flush_thread in setup so the buffer contents are stable.
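One way to do that, sketched with illustrative names (the `start_thread:` keyword and `drain_buffer` are assumptions, not the PR's exact API): tests construct the shipper without the background thread, so buffer assertions are deterministic.

```ruby
# Sketch: allow tests to skip the background flush thread so the buffer
# contents are stable. Names are illustrative.
class SketchBufferedShipper
  FLUSH_INTERVAL = 5

  def initialize(flush_interval: FLUSH_INTERVAL, start_thread: true)
    @buffer = Queue.new
    @flush_interval = flush_interval
    start_flush_thread if start_thread  # tests pass start_thread: false
  end

  def <<(line)
    @buffer << line
  end

  def drain_buffer
    records = []
    loop do
      records << @buffer.pop(true)
    rescue ThreadError
      break  # queue is empty
    end
    records
  end

  private
    def start_flush_thread
      @thread = Thread.new do
        loop do
          sleep @flush_interval
          drain_buffer  # the real shipper would POST these records
        end
      end
    end
end

shipper = SketchBufferedShipper.new(start_thread: false)
shipper << "line 1"
shipper << "line 2"
```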
```yaml
# When configured, Kamal streams all output from infrastructure-modifying
# commands to the specified destinations.
```
The docs say Kamal "streams all output" from infrastructure-modifying commands, but the current implementation only forwards strings passed through Thor's say (via KAMAL.log), and does not hook SSHKit output or generic $stdout/$stderr writes. Either narrow the wording here to what is actually captured, or wire output shipping into SSHKit / stdout-stderr so the doc is accurate.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
lib/kamal/output/otel_logger.rb:41
- These notifications are emitted for every `modify.kamal` instrumentation, and many non-deploy commands were switched to use `modify(...)`. As a result, `deploy.start`/`deploy.complete`/`deploy.failed` events will also be emitted for commands like `proxy boot`, `prune`, etc., which conflicts with the PR description claiming deploy lifecycle events only for deploy/redeploy/rollback. Either gate these events to those commands or rename the events to something command-agnostic (so consumers don't interpret a `prune` as a deploy).
```ruby
      error: "#{error_class}: #{error_message}")
  else
    @shipper.event("deploy.complete",
      command: payload[:command].to_s)
  end
end

def add(severity, message = nil, progname = nil, &block)
  msg = message || (block ? block.call : progname)
  @shipper << msg.to_s if msg
end

def close
  ActiveSupport::Notifications.unsubscribe(@subscription)
  @shipper.shutdown
end
```
lib/kamal/otel_shipper.rb (Outdated)
```ruby
req = Net::HTTP::Post.new(@endpoint.path, "Content-Type" => "application/json")
req.body = JSON.generate(payload)
http.request(req)
rescue
```
ship_records uses a bare rescue, which will also swallow non-StandardError exceptions (e.g., SystemExit, NoMemoryError, SignalException). For best-effort behavior without masking critical process-level exceptions, rescue StandardError explicitly.
Suggested change:
```diff
- rescue
+ rescue StandardError
```
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
lib/kamal/output/otel_logger.rb (Outdated)
```ruby
  @shipper.event("deploy.start",
    command: payload[:command].to_s,
    hosts: payload[:hosts].to_s)
end

def finish(name, id, payload)
  if payload[:exception]
    error_class, error_message = payload[:exception]
    @shipper.event("deploy.failed",
      command: payload[:command].to_s,
      error: "#{error_class}: #{error_message}")
  else
    @shipper.event("deploy.complete",
```
`start` emits a `deploy.start` event for every `modify.kamal` instrumentation, including non-deploy commands (proxy/prune/accessory/etc.). That makes the event name misleading for downstream consumers. Consider using an event name derived from `payload[:command]`/`payload[:subcommand]`, or emitting a generic `kamal.modify.start` + attributes instead of hard-coding `deploy.*`.
Suggested change:
```diff
+ base_event = payload[:command].to_s == "deploy" ? "deploy" : "kamal.modify"
- @shipper.event("deploy.start",
+ @shipper.event("#{base_event}.start",
    command: payload[:command].to_s,
    hosts: payload[:hosts].to_s)
  end

  def finish(name, id, payload)
+   base_event = payload[:command].to_s == "deploy" ? "deploy" : "kamal.modify"
    if payload[:exception]
      error_class, error_message = payload[:exception]
-     @shipper.event("deploy.failed",
+     @shipper.event("#{base_event}.failed",
        command: payload[:command].to_s,
        error: "#{error_class}: #{error_message}")
    else
-     @shipper.event("deploy.complete",
+     @shipper.event("#{base_event}.complete",
```
test/otel_shipper_test.rb (Outdated)
```ruby
stub_request = lambda do |req|
  request_body = JSON.parse(req.body)
  Net::HTTPOK.new("1.1", "200", "OK")
end
```
In these stubs, the with { |req| ... } block is used as a parameter matcher; returning a response object from the block is non-idiomatic and can be brittle. Consider making the matcher block return a boolean (e.g., true after capturing/parsing), and set the stubbed return value via .returns(...) for clarity.
```ruby
def shutdown
  @running = false
  @thread&.join(FLUSH_INTERVAL + 1.second)
  flush
end
```
shutdown uses Thread#kill on the flush thread. If the thread is killed while inside flush (holding @flush_mutex), the mutex can remain locked and later flush/shutdown calls can deadlock. Prefer a cooperative shutdown (set a flag, wake/join the thread) so the thread exits cleanly without being terminated while holding the mutex.
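A sketch of such a cooperative shutdown (interval and class names illustrative): the flush thread polls a flag and is joined, never killed, so it cannot die while holding the flush mutex.

```ruby
# Sketch: cooperative shutdown. The thread exits its own loop when the flag
# flips, so the mutex is always released before the final flush.
class SketchCooperativeShipper
  FLUSH_INTERVAL = 0.05

  def initialize
    @running = true
    @flush_mutex = Mutex.new
    @flush_count = 0
    @thread = Thread.new do
      while @running            # checked each cycle; no Thread#kill needed
        sleep FLUSH_INTERVAL
        flush
      end
    end
  end

  attr_reader :flush_count

  def flush
    @flush_mutex.synchronize { @flush_count += 1 }
  end

  def shutdown
    @running = false            # cooperative: ask the thread to stop
    @thread.join(FLUSH_INTERVAL * 2)
    flush                       # final flush; mutex is guaranteed free here
  end
end

shipper = SketchCooperativeShipper.new
shipper.shutdown
```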
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
```ruby
self.logging = false
@modify_depth = 0
@specifics = @specific_roles = @specific_hosts = nil
@config = @config_kwargs = nil
```
reset clears @output_logger without closing it first. Since Kamal::Output::OtelLogger starts a background flush thread (and FileLogger may hold an open file), dropping the reference can leak threads/file handles and also bypass the at_exit { @output_logger&.close } cleanup after a reset (e.g., when running aliases). Consider closing any existing output logger before setting it to nil (and/or ensuring loggers are shut down on reset).
Suggested change:
```diff
  @config = @config_kwargs = nil
+ @output_logger&.close
```
```ruby
SSHKit.config.output = Kamal::Output::Formatter.new($stdout, output_logger)

at_exit { @output_logger&.close }
```
configure_output_with registers an at_exit hook every time it runs. Since KAMAL.reset can lead to reconfiguration (e.g., aliases), this can accumulate multiple hooks and call close multiple times (or close a newer logger while older ones were leaked). Consider guarding so the hook is only registered once per process (e.g., via an ivar flag) and/or ensuring reset closes the previous logger.
Suggested change:
```diff
- at_exit { @output_logger&.close }
+ unless defined?(@@output_at_exit_hook_registered) && @@output_at_exit_hook_registered
+   at_exit { @output_logger&.close }
+   @@output_at_exit_hook_registered = true
+ end
```
```ruby
end

def log(line)
  output_logger << "#{line}\n" if logging
```
log writes to the broadcast output logger without any error isolation. If a backend logger raises (e.g., file write errors, shipper failures), this can bubble up from say and interrupt a deploy, which conflicts with the “best-effort, never block a deploy” goal. Consider rescuing exceptions here (and optionally disabling further logging after the first failure to avoid repeated errors).
Suggested change:
```diff
- output_logger << "#{line}\n" if logging
+ return unless logging
+ output_logger << "#{line}\n"
+ rescue => e
+   @logging = false
+   $stderr.puts "Log output failed: #{e.class}: #{e.message}"
+   $stderr.puts e.backtrace.join("\n") if ENV["VERBOSE"]
```
```ruby
def start(name, id, payload)
  @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  on_start(payload)
end

def finish(name, id, payload)
  runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
  on_finish(payload, runtime)
end

def add(severity, message = nil, progname = nil, &block)
  if msg = message || (block ? block.call : progname)
    self << msg.to_s
  end
end

def close
  ActiveSupport::Notifications.unsubscribe(@subscription)
  on_close
end
```
start/finish/close call on_start/on_finish/on_close directly without guarding against exceptions. Since these callbacks may perform IO (file open/write, network shipper shutdown), a failure can propagate out of ActiveSupport::Notifications and potentially interrupt the command. To keep output logging best-effort, consider rescuing exceptions around these hook calls (and optionally logging a one-time warning to stderr).
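A best-effort wrapper along those lines might look like this sketch (class and hook names illustrative; the `IOError` simulates a failing backend): the first failure logs one warning and nothing propagates to the caller.

```ruby
# Sketch: every subscriber hook runs inside a best-effort guard so a failing
# backend can never raise through the notification and abort the command.
class SketchGuardedLogger
  def initialize
    @warned = false
  end

  def start(name, id, payload)
    best_effort { on_start(payload) }
  end

  def finish(name, id, payload)
    best_effort { on_finish(payload) }
  end

  def warned?
    @warned
  end

  private
    def best_effort
      yield
    rescue StandardError => e
      unless @warned
        @warned = true
        warn "output logging failed: #{e.class}: #{e.message}"
      end
    end

    def on_start(payload)
      raise IOError, "disk full"  # simulated backend failure
    end

    def on_finish(payload)
    end
end

logger = SketchGuardedLogger.new
logger.start("modify.kamal", 1, {})   # does not raise despite the IOError
logger.finish("modify.kamal", 1, {})
```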
test/integration/main_test.rb (Outdated)
```ruby
complete_event = completes.find { |e| event_attr(e, "command") == command }
assert complete_event, "Expected command.complete event for #{command}"
assert_match /\d+\.\d+s/, event_attr(complete_event, "runtime")
```
event_attr only reads value.stringValue, but runtime is sent as a numeric OTLP attribute (doubleValue) per Kamal::OtelShipper#typed_value and is asserted as numeric in test/otel_shipper_test.rb. As written, event_attr(complete_event, "runtime") will be nil and the assert_match will fail. Consider updating the helper to handle doubleValue/intValue (or changing the emitted runtime attribute to a string consistently).
Suggested change:
```diff
- assert_match /\d+\.\d+s/, event_attr(complete_event, "runtime")
+ runtime = event_attr(complete_event, "runtime")
+ unless runtime
+   runtime_attr = complete_event["attributes"]&.find { |a| a["key"] == "runtime" }
+   runtime = runtime_attr && (runtime_attr.dig("value", "doubleValue") || runtime_attr.dig("value", "intValue"))
+ end
+ assert runtime, "Expected runtime attribute on #{command} complete event"
+ assert_match /\d+\.\d+s/, "#{runtime}s"
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
```ruby
def drain_buffer
  records = []
  records << @buffer.pop(true) until @buffer.empty?
```
drain_buffer uses a post-test until loop (records << @buffer.pop(true) until @buffer.empty?), which will always attempt at least one non-blocking pop even when the queue is empty. This will raise ThreadError on empty buffers (e.g., periodic flushes when nothing was logged), potentially killing the flush thread or breaking flush/shutdown. Rewrite the loop to only pop while the queue is non-empty (or handle ThreadError explicitly).
Suggested change:
```diff
- records << @buffer.pop(true) until @buffer.empty?
+ loop do
+   begin
+     records << @buffer.pop(true)
+   rescue ThreadError
+     break
+   end
+ end
```
```ruby
command = [ payload[:command], payload[:subcommand] ].compact.join("_")
[ Time.now.strftime("%Y-%m-%dT%H-%M-%S"), payload[:destination], command ].compact.join("_") + ".log"
end
```
filename_for incorporates payload[:destination] (user-provided via --destination) and command/subcommand directly into the filename. If these include path separators or .., path.join(...) can write outside the intended log directory (path traversal). Also, the timestamp only has second resolution, so two deploys started within the same second can collide and append to the same file, violating the “one file per deploy” expectation. Sanitize each filename segment to a safe subset and add a uniqueness component (e.g., monotonic ns / pid / random suffix).
Suggested change:
```diff
- command = [ payload[:command], payload[:subcommand] ].compact.join("_")
- [ Time.now.strftime("%Y-%m-%dT%H-%M-%S"), payload[:destination], command ].compact.join("_") + ".log"
- end
+ time = Time.now
+ timestamp = time.strftime("%Y-%m-%dT%H-%M-%S")
+ unique_suffix = "#{Process.pid}-#{time.nsec}"
+ command = [ payload[:command], payload[:subcommand] ].compact.join("_")
+ destination_segment = sanitize_segment(payload[:destination]) if payload[:destination]
+ command_segment = sanitize_segment(command) unless command.empty?
+ [ timestamp, unique_suffix, destination_segment, command_segment ].compact.join("_") + ".log"
+ end
+
+ def sanitize_segment(value)
+   segment = value.to_s
+   # Allow only a conservative set of characters to avoid path traversal and
+   # other filesystem-related issues; replace everything else with "_".
+   sanitized = segment.gsub(/[^A-Za-z0-9_.-]/, "_")
+   sanitized.empty? ? "unknown" : sanitized
+ end
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
```ruby
ensure
  output_logger.close if modify_finished
end

def log(line)
  output_logger << "#{line}\n" if logging
end
```
The PR description says output logging failures should never block or fail a deploy, but output_logger.close (in modify) and output_logger << ... (in log) can raise (e.g., file permission/disk full, IO errors) and would currently abort the command. Wrap these operations in a best-effort rescue (ideally logging the first failure to stderr) so deploys continue even if a backend logger fails.
test/integration/integration_test.rb
Outdated
| docker_compose "up --build -d" | ||
| wait_for_healthy | ||
| setup_deployer | ||
| deployer_exec("sh -c 'rm -f /tmp/otel/*.json'", workdir: "/") |
There was a problem hiding this comment.
Integration containers appear to be reused across multiple integration test cases, and the output config writes deploy logs to /tmp/kamal-deploy-logs. Only /tmp/otel/*.json is cleared here; if prior tests left *_deploy.log files, assert_deploy_log "*_deploy.log" in MainTest can end up reading multiple concatenated logs via the wildcard and become flaky/false-positive. Consider also removing /tmp/kamal-deploy-logs/*.log (or recreating the directory) during setup.
Suggested change:
```diff
- deployer_exec("sh -c 'rm -f /tmp/otel/*.json'", workdir: "/")
+ deployer_exec("sh -c 'rm -f /tmp/otel/*.json /tmp/kamal-deploy-logs/*.log'", workdir: "/")
```
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.
```ruby
end

def otel_payloads
  files = deployer_exec("sh -c 'ls /tmp/otel/*.json'", capture: true).strip.split("\n")
```
otel_payloads shells out to ls /tmp/otel/*.json; when there are no matching files yet, ls exits non-zero and docker_compose raises. Since wait_for_otel_events calls otel_events in a loop, this can make the wait logic fail immediately instead of waiting. Consider making the file listing resilient to an empty directory (e.g., redirect errors / use || true / use find), returning an empty list until payloads exist.
Suggested change:
```diff
- files = deployer_exec("sh -c 'ls /tmp/otel/*.json'", capture: true).strip.split("\n")
+ files = deployer_exec("sh -c 'find /tmp/otel -maxdepth 1 -name \"*.json\" -print'", capture: true).strip.split("\n")
```
```ruby
private
  def build_loggers
    output_config.filter_map do |key, settings|
      if (klass_name = LOGGER_TYPES[key])
```
build_loggers assumes each logger config value is a Hash. In YAML, a key like otel: (with no nested mapping) is parsed as nil, which would cause settings["endpoint"] inside the logger .build method to raise a NoMethodError instead of the intended ArgumentError with a helpful message. Consider normalizing settings to {} (or raising a clearer error) before calling .build.
Suggested change:
```diff
  if (klass_name = LOGGER_TYPES[key])
+   settings = {} if settings.nil?
```
```ruby
test "discards lines before start" do
  @logger << "before modify\n"

  assert_empty log_files
```
The PR description mentions the file logger “buffers pre-modify lines”, but the current implementation/test suite explicitly discards lines written before start (and FileLogger#<< is a no-op until a file is opened). Either implement buffering (so early CLI output isn’t lost) or update the PR description to match the actual behavior.
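If buffering were implemented, it could look like this sketch (class and method names are illustrative, and `StringIO` stands in for the real log file): lines written before `start` are held and replayed once the file opens, instead of being discarded.

```ruby
require "stringio"

# Sketch: buffer pre-start lines and replay them when the log file opens.
class SketchBufferingFileLogger
  def initialize
    @pending = []
    @file = nil
  end

  def <<(message)
    if @file
      @file.print(message)
    else
      @pending << message   # keep early CLI output instead of discarding it
    end
  end

  def start
    @file = StringIO.new    # the real logger would open the per-command file
    @pending.each { |line| @file.print(line) }  # replay buffered lines
    @pending.clear
  end

  def contents
    @file&.string
  end
end

logger = SketchBufferingFileLogger.new
logger << "before modify\n"
logger.start
logger << "after modify\n"
```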
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
```ruby
  super()
end

def <<(message)
  @file&.print(message)
end

private
  def on_start(payload)
    path.mkpath
    @file = File.open(path.join(filename_for(payload)), "a")
    @file.sync = true
  end

  def on_finish(payload, runtime)
    if @file
      if payload[:exception]
        error_class, error_message = payload[:exception]
        @file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
      else
        @file.puts "# Completed in #{runtime}s"
      end
      @file.close
      @file = nil
    end
  end

  def on_close
    if @file
      @file.close
      @file = nil
```
File operations here (mkpath, File.open, print, puts, close) can raise (permissions, ENOSPC, invalid path), which would currently propagate via the notification subscriber and potentially abort the deploy. To keep output logging best-effort, catch and swallow backend I/O errors (optionally logging the first failure to stderr and disabling this logger for the remainder of the run).
Suggested change:
```ruby
  @io_failed = false
  super()
end

def <<(message)
  safe_io do
    @file&.print(message)
  end
end

private
  def safe_io
    return if @io_failed
    yield
  rescue SystemCallError, IOError, StandardError => e
    unless @io_failed
      @io_failed = true
      begin
        warn "Kamal::Output::FileLogger disabled due to I/O error: #{e.class}: #{e.message}"
      rescue StandardError
        # Ignore errors writing to stderr
      end
    end
    nil
  end

  def on_start(payload)
    safe_io do
      path.mkpath
      @file = File.open(path.join(filename_for(payload)), "a")
      @file.sync = true
    end
  end

  def on_finish(payload, runtime)
    safe_io do
      if @file
        if payload[:exception]
          error_class, error_message = payload[:exception]
          @file.puts "# FAILED: #{error_class}: #{error_message} (#{runtime}s)"
        else
          @file.puts "# Completed in #{runtime}s"
        end
        @file.close
        @file = nil
      end
    end
  end

  def on_close
    safe_io do
      if @file
        @file.close
        @file = nil
      end
```
```ruby
def start(name, id, payload)
  @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  on_start(payload)
end

def finish(name, id, payload)
  runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
  on_finish(payload, runtime)
end
```
start/finish call on_start/on_finish directly; if a logger backend raises (e.g., file permission errors), it will bubble up through ActiveSupport::Notifications and can fail the deploy despite the PR’s stated “best-effort” behavior. Consider rescuing exceptions around these callback hooks (and optionally logging the first failure to stderr) so output backends can never abort the instrumented operation.
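A minimal sketch of that rescue wrapper, assuming the subscriber shape quoted above (the `warn` fallback and the simulated failing backend are illustrative):

```ruby
# Sketch: wrap subscriber hooks so a failing output backend never aborts
# the instrumented operation. on_start here deliberately raises to show
# the error being swallowed; the real hooks do file/network I/O.
class SafeSubscriber
  def start(name, id, payload)
    @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    on_start(payload)
  rescue StandardError => e
    warn "output logger start failed: #{e.class}: #{e.message}"
  end

  def finish(name, id, payload)
    runtime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at).round(1)
    on_finish(payload, runtime)
  rescue StandardError => e
    warn "output logger finish failed: #{e.class}: #{e.message}"
  end

  private
    def on_start(payload)
      raise IOError, "disk full"  # simulate a broken backend
    end

    def on_finish(payload, runtime)
    end
end

sub = SafeSubscriber.new
sub.start("modify.kamal", "1", {})   # warning printed, no exception raised
sub.finish("modify.kamal", "1", {})
```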
408e1f3 to
820cafa
Compare
Introduce a pluggable output logging system that captures deploy output
and ships it to configurable destinations via a new `output:` section
in deploy.yml.
```yaml
output:
  otel:
    endpoint: http://otel-gateway:4318
  file:
    path: /var/log/kamal/
```
OTel backend ships OTLP log records via HTTP to /v1/logs. Structured
lifecycle events (kamal.start/complete/failed) mark command boundaries
with deployment metadata. Stream output lines are tagged per-host
(server.address) and by stream type (log.iostream). Format follows
OTel semantic conventions: standard attributes where applicable
(deployment.*, exception.*, service.*), custom attributes namespaced
under kamal.*.
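A record following those conventions would look roughly like this (a hedged sketch of the OTLP/HTTP JSON shape; the attribute values are illustrative, not taken from the PR):

```ruby
require "json"

# Sketch of one OTLP/HTTP JSON log record; field names follow the OTLP
# logs data model, values are made up for illustration.
record = {
  resourceLogs: [ {
    resource: { attributes: [
      { key: "service.name", value: { stringValue: "app" } },
      { key: "deployment.environment", value: { stringValue: "production" } }
    ] },
    scopeLogs: [ {
      logRecords: [ {
        timeUnixNano: (Time.now.to_f * 1_000_000_000).to_i.to_s,
        severityText: "INFO",
        body: { stringValue: "kamal.start" },
        attributes: [
          { key: "server.address", value: { stringValue: "host-1" } },
          { key: "log.iostream", value: { stringValue: "stdout" } },
          { key: "kamal.command", value: { stringValue: "deploy" } }
        ]
      } ]
    } ]
  } ]
}

JSON.generate(record)  # body POSTed to <endpoint>/v1/logs
```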
File backend writes one timestamped log file per command
(e.g. 20260327_143022_production_deploy.log) with a completion/failure
summary line at the end.
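The per-command filename could be derived along these lines (a sketch; `filename_for` and the payload keys are assumptions based on the example name):

```ruby
# Sketch: build a timestamped log filename such as
# 20260327_143022_production_deploy.log. The :destination and :command
# payload keys are assumed for illustration.
def filename_for(payload, now: Time.now)
  parts = [ now.strftime("%Y%m%d_%H%M%S"), payload[:destination], payload[:command] ].compact
  "#{parts.join("_")}.log"
end

filename_for({ destination: "production", command: "deploy" },
             now: Time.new(2026, 3, 27, 14, 30, 22))
# => "20260327_143022_production_deploy.log"
```

`compact` drops the destination segment when no destination is configured, which keeps the filename scheme consistent either way.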
Key design decisions:
- `modify` method replaces `with_lock` as the integration point for all
infrastructure-modifying commands, scoping output capture and
instrumenting deploy lifecycle via ActiveSupport::Notifications
- SSHKit output captured via custom Formatter with thread-local context
for per-host and stream type tagging
- Best-effort: output logger failures never block or fail a deploy
Co-Authored-By: Donal McBreen <donal@37signals.com>
Summary
Add an output logging framework that captures deploy output and ships it to configurable destinations.
- `output:` config section with pluggable logger backends (OTel and file to start)
- `modify` method replaces `with_lock` as the integration point for all infrastructure-modifying commands, scoping output capture and instrumenting `ActiveSupport::Notifications` for deploy lifecycle events
- Captures local output (`say`) and remote command output (SSHKit formatter with per-host tagging)
- `ActiveSupport::BroadcastLogger` fans out to multiple loggers
- Lifecycle events (`kamal.start`/`kamal.complete`/`kamal.failed`) via OTLP HTTP, with OTel semantic convention attributes

Idea by @djmb
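The nest-safe `modify` behavior can be sketched with a depth counter (a plain-Ruby illustration; the real integration instruments `modify.kamal` via `ActiveSupport::Notifications` and also takes the deploy lock):

```ruby
# Sketch: only the outermost modify call starts/finishes instrumentation;
# nested calls just run their block. Modifier and :start/:finish events
# stand in for the real notifications wiring.
class Modifier
  attr_reader :events

  def initialize
    @depth = 0
    @events = []
  end

  def modify
    @depth += 1
    @events << :start if @depth == 1   # outermost call only
    yield
  ensure
    @events << :finish if @depth == 1  # close loggers on outermost exit
    @depth -= 1
  end
end

m = Modifier.new
m.modify do
  m.modify { }   # nested: no extra start/finish
end
m.events  # [:start, :finish]
```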
Usage
How it works

- `modify` replaces `with_lock` across all CLI commands. Every `modify` call enables output logging. The outermost `modify` instruments `modify.kamal` via AS::Notifications and closes loggers on exit. Nests safely.
- `Kamal::Output::Formatter` extends SSHKit's Pretty formatter to capture command output, tagging each line with `server.address` (host) and `log.iostream` (stdout/stderr) via thread-local context
- `Kamal::Output::OtelLogger` subscribes to `modify.kamal` start/finish for lifecycle events and receives log lines via BroadcastLogger `<<`. SSHKit severity levels are mapped to OTel severity numbers.
- `Kamal::Output::FileLogger` opens a timestamped log file on start, discards pre-start lines, closes on finish
- `Kamal::OtelShipper` buffers lines in a thread-safe Queue and flushes every 5s, or when the buffer reaches batch size, via OTLP HTTP to `/v1/logs`
- Standard attributes where applicable (`deployment.*`, `exception.*`, `service.*`, `server.address`, `log.iostream`); custom attributes namespaced under `kamal.*` (`kamal.command`, `kamal.runtime`, `kamal.deploy_version`, `kamal.performer`, `kamal.run_id`)

Test plan
- `kamal deploy` with `output: otel:` configured: verify log records in collector (integration test)
- `kamal deploy` with `output: file:` configured: verify timestamped log file with full output (integration test)
- Without `output:`: verify no behavioral change

🤖 Generated with Claude Code
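The shipper's buffer-and-flush behavior described in the how-it-works notes can be sketched as follows (batch size and the `exporter` hook are illustrative; the real class POSTs OTLP JSON to `/v1/logs` and runs a background thread that calls `flush` on the 5s interval, omitted here):

```ruby
# Sketch: thread-safe buffering with size-triggered flushing. The exporter
# block stands in for the OTLP HTTP POST; the periodic-flush thread is
# omitted to keep the example single-threaded.
class ShipperSketch
  BATCH_SIZE = 100  # assumed batch size

  def initialize(&exporter)
    @queue = Queue.new      # thread-safe buffer for log lines
    @exporter = exporter
  end

  def <<(line)
    @queue << line
    flush if @queue.size >= BATCH_SIZE
  end

  def flush
    batch = []
    batch << @queue.pop until @queue.empty?
    @exporter.call(batch) unless batch.empty?
  end
end

sent = []
shipper = ShipperSketch.new { |batch| sent << batch }
shipper << "line 1"
shipper << "line 2"
shipper.flush
sent  # [["line 1", "line 2"]]
```

Draining into a local array before calling the exporter keeps the queue lock hold time short, so writers are never blocked on the HTTP request.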