Conversation
|
alright, I've made it work with my own fork of The modifications were super light, actually it was just a matter of accepting a new config option for kamal-proxy (+ point to my own Docker image of kamal-proxy). So, my proxy:
ssl: true
hosts: [""]
tls_on_demand_url: "http://staging-app-web-latest:3000/locomotive/api/allowed_host"
app_port: 3000
forward_headers: trueNotes:
|
|
I also needed this feature to run the LocomotiveCMS hosting platform behind Kamal-Proxy 👉 did#1 |
|
thanks @kwilczynski for the code review! |
kevinmcconnell
left a comment
There was a problem hiding this comment.
Hey @did,
Thanks for getting this started! This does seem like a useful addition.
I left a couple of comments on the details. Let me know what you think about addressing those. (I'm also happy to make the changes myself if you don't have time).
internal/server/service.go
Outdated
| return autocert.HostWhitelist(hosts...), nil | ||
| } | ||
|
|
||
| _, err := url.ParseRequestURI(options.TLSOnDemandUrl) |
There was a problem hiding this comment.
I'd expect that the on demand URL will usually point to an endpoint in the application that's deployed, rather than to some other external app. In which case, it would be simpler for this to be a path rather than an absolute URL. We'd then automatically call it on the currently deployed target (a bit like how the health check paths work).
That way you don't have to worry about having a stable hostname to reach for all versions of the app, etc., because the proxy takes care of that for you.
I'm not sure if there's a common enough need to support an external on demand URL as well, but for simplicity's sake it would be nice to have this be path-only if possible.
What do you think?
There was a problem hiding this comment.
👌 I haven't thought about it!!!
In my production case, it would work perfectly because my Rails app was always in charge of returning that list of hostnames (even when I was hosting it with k8s).
Indeed, it'd have saved me a lot of time, trying to figure out the hostname of my endpoint.
Perhaps (it's a guess), we should keep the URL as well for developers who prefers to move the responsibility of this endpoint to another app (and probably deployed by Kamal too) for performance or architecture reasons.
Let's keep it simple in a first time so let's use the path only :-)
(I will make the modifications)
There was a problem hiding this comment.
I'd vote for full path as perhaps someone would want to host the check in a Kamal accessory instead?
There was a problem hiding this comment.
We can always allow both: if you supply a path we'll call that on the app target, but if you supply a URL to some other endpoint we'll use that.
That way we get to keep the simpler configuration when the app is responsible for checking, which I suspect would be the more common case.
There was a problem hiding this comment.
@kevinmcconnell I've implemented it. Let me know if it's okay for you. I also need to re-test this new implementation and see if it doesn't break my app.
|
This looks great! Keep up the good work. We run a CMS where we can set up new sites while the app is running. Our current solution is Passenger via OpenResty (Nginx with Lua) and |
Thanks! Funny, Passenger + OpenResty is exactly the solution I'm using to run the LocomotiveCMS hosting SaaS platform! I need to find some free time to work on Kevin's feedbacks. |
Yep, I've never had any problems with this setup. I wrote a custom Lua script to look at our database and find the authorised hostnames. Didn't even think about just hitting an endpoint on the app itself. So much simpler! :) |
|
Thanks for all your work on this, @did! I also need this feature and am happy to help finish this up. Could I help? |
|
thanks @jmadkins! Alright I'm back to this PR and I've just pushing come code. I won't say no to some help My next task is to use either a path or an URL (TlsOnDemandURL). |
|
[UPDATE] I've been testing my version of kamal-proxy on the pre-production Locomotive hosting platform with success 🎉. Here is my config file for Kamal (I'm using my own forked version of Kamal 2.7): |
|
I am also trying to build an app that offers custom domains. I finally got subdomains working with custom certificates which was a long process and now realizing this is a whole new obstacle but I'm excited somebody else has already done the work to make this possible :) Hope it gets merged soon. Would it be possible to use this in combination with a custom SSL certificate for main domain/subdomain routes |
|
I ended up using Caddy in front of kamal-proxy and just ran kamal-proxy in different ports. Works really well and you also get all the extra goodness that Caddy provides if necessary. Here's the basic gist: basecamp/kamal#1613 (comment) Good to see this become part of kamal-proxy but I'm not sure if I'll shift away from Caddy just yet :) |
|
Looking forward to see this feature merged! ❤️ Workaround with putting another ssl-terminating proxy in front of kamal-proxy seems to work only for one app per server. And adding additional layer and accessory complicates the infrastructure. |
|
I'm excited about this feature too <3 |
|
@did @kevinmcconnell What is missing here to get this PR merged? How can we help? |
hey @antonlitvinenko, I don't know. @kevinmcconnell do you need some help? (I can fix conflicts for instance). [UPDATE]: branch rebased. |
|
Thank you for working on this! |
|
Thanks for bringing this all up to date, @did! I've been a bit busy with other things, but I'll try to get this landed as soon as possible. |
|
@kevinmcconnell, my pleasure! Actually, I'm battle-testing the PR with my LocomotiveCMS hosting platform (it will be in production over the week-end). |
There was a problem hiding this comment.
Pull request overview
This PR implements On-Demand TLS functionality for Kamal Proxy, allowing dynamic TLS certificate issuance based on external validation instead of a static host whitelist. The feature is inspired by Caddy's On-Demand TLS and enables multi-tenant scenarios where hosts aren't known at startup.
Changes:
- Added
TLSOnDemandCheckerto validate certificate requests via external or local HTTP endpoints - Introduced
--tls-on-demand-urlCLI flag for configuring the validation endpoint - Modified service initialization to support dynamic host policy determination
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/server/tls_on_demand.go | New file implementing the on-demand TLS checker with external and local validation policies |
| internal/server/tls_on_demand_test.go | Comprehensive test coverage for on-demand TLS functionality |
| internal/server/service.go | Integration of on-demand checker into cert manager creation; fixed initialization order |
| internal/server/service_test.go | Minor formatting change (blank line added) |
| internal/server/router_test.go | Added test for state restoration with on-demand TLS; formatting updates |
| internal/cmd/deploy.go | Added CLI flag and validation logic for on-demand TLS configuration |
| internal/cmd/deploy_test.go | Tests for deploy command validation with on-demand URL |
| README.md | User-facing documentation for on-demand TLS feature |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/server/tls_on_demand.go
Outdated
| resp, err := client.Get(url) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
The context parameter passed to ExternalHostPolicy is not being used when making the HTTP request. This means the request won't respect cancellation, deadlines, or other context values. The http.Client.Get method should be replaced with http.NewRequestWithContext followed by client.Do(req) to properly propagate the context.
| resp, err := client.Get(url) | |
| if err != nil { | |
| return err | |
| } | |
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody) | |
| if err != nil { | |
| return err | |
| } | |
| resp, err := client.Do(req) | |
| if err != nil { | |
| return err | |
| } |
| func (c *TLSOnDemandChecker) LocalHostPolicy() autocert.HostPolicy { | ||
| return func(ctx context.Context, host string) error { | ||
| path := c.buildURLOrPath(host) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, path, http.NoBody) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| req.TLS = &tls.ConnectionState{} | ||
|
|
||
| // We use httptest.NewRecorder here to route the request through the service's | ||
| // load balancer and handler, capturing the response in-memory without making | ||
| // a real network request. This ensures the request is processed as if it were | ||
| // an external client, but avoids network overhead and complexity. | ||
| recorder := httptest.NewRecorder() | ||
| c.service.ServeHTTP(recorder, req) |
There was a problem hiding this comment.
Potential infinite recursion risk: LocalHostPolicy calls c.service.ServeHTTP which routes requests through the service's middleware and load balancer. If the validation endpoint path (TLSOnDemandUrl) triggers another TLS certificate request during the ACME challenge process, this could create an infinite loop. Consider adding documentation or safeguards to warn users that the validation endpoint should not require TLS or be excluded from the TLS certificate validation logic.
There was a problem hiding this comment.
@copilot code review[agent] sure but we're talking about a local path managed by Kamal-proxy. So you enabled OnDemandTLS on Kamal proxy, of course your underlying services are in http.
…enhancing the context of the check request
removing it to keep the diff focused on functional changes Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
include the actual error object with the "error" key for better debugging Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
internal/server/service.go:529
redirectURLIfNeededcurrently bypasses HTTP→HTTPS redirects for TLS on-demand checks, but it can still perform canonical-host redirects. Combined with the internal on-demand check request not having a real Host header, this can cause the check endpoint to get a 301 and be treated as “denied”. Consider skipping canonical-host redirects as well whencontextKeyTLSOnDemandCheckis set (or skipping redirects entirely for these internal checks).
desiredScheme := currentScheme
isTLSOnDemandCheck, _ := r.Context().Value(contextKeyTLSOnDemandCheck).(bool)
if s.options.TLSEnabled && s.options.TLSRedirect && currentScheme == "http" && !isTLSOnDemandCheck {
desiredScheme = "https"
}
desiredHost := host
if s.options.CanonicalHost != "" && host != s.options.CanonicalHost {
desiredHost = s.options.CanonicalHost
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/server/tls_on_demand.go
Outdated
| path := c.buildURLOrPath(host) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, path, http.NoBody) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // We need to set this context value to true to indicate that this is a TLS on demand check | ||
| ctx = context.WithValue(req.Context(), contextKeyTLSOnDemandCheck, true) | ||
| req = req.WithContext(ctx) | ||
|
|
||
| // We use httptest.NewRecorder here to route the request through the service's | ||
| // load balancer and handler, capturing the response in-memory without making | ||
| // a real network request. This ensures the request is processed as if it were | ||
| // an external client, but avoids network overhead and complexity. | ||
| recorder := httptest.NewRecorder() | ||
| c.service.ServeHTTP(recorder, req) | ||
|
|
There was a problem hiding this comment.
LocalHostPolicy builds an internal request without setting req.Host. That means the downstream app (and the proxy’s redirect logic) sees an empty Host header, which can trigger canonical-host redirects or app-level host validation failures, causing on-demand checks to be denied incorrectly. Set the Host header explicitly (typically to the host being validated) for the internal request.
|
|
||
| // Verify that hosts is not modified | ||
| assert.Equal(t, []string{"example.com"}, deployCmd.args.ServiceOptions.Hosts) | ||
| }) |
There was a problem hiding this comment.
There’s no coverage for the expected validation behavior when --tls-on-demand-url is set but TLS is configured on a non-root path prefix (which should be rejected, consistent with the existing TLS root-path rule). Adding a test case here will prevent regressions once preRun enforces the root-path requirement for on-demand TLS too.
| }) | |
| }) | |
| t.Run("TLS on-demand URL with non-root path prefix should return error", func(t *testing.T) { | |
| deployCmd := newDeployCommand() | |
| // Set flags for TLS with on-demand URL and non-root path prefix | |
| deployCmd.cmd.Flags().Set("target", "http://localhost:8080") | |
| deployCmd.cmd.Flags().Set("tls", "true") | |
| deployCmd.cmd.Flags().Set("tls-on-demand-url", "http://example.com/validate") | |
| deployCmd.cmd.Flags().Set("host", "example.com") | |
| deployCmd.cmd.Flags().Set("path-prefix", "/api") | |
| // Call preRun and expect an error due to non-root path prefix | |
| err := deployCmd.preRun(deployCmd.cmd, []string{"test-service"}) | |
| require.Error(t, err) | |
| }) |
internal/cmd/deploy.go
Outdated
| deployCommand.cmd.Flags().BoolVar(&deployCommand.args.ServiceOptions.TLSEnabled, "tls", false, "Configure TLS for this target (requires a non-empty host)") | ||
| deployCommand.cmd.Flags().StringVar(&deployCommand.args.ServiceOptions.TLSOnDemandUrl, "tls-on-demand-url", "", "Will make an HTTP request to the given URL, asking whether a host is allowed to have a certificate issued.") |
There was a problem hiding this comment.
The --tls flag help text says it "requires a non-empty host", but when --tls-on-demand-url is provided the code explicitly switches to wildcard hosts. Please update the flag description (and/or clarify how --host should be used with on-demand TLS) so CLI help matches the actual behavior.
| deployCommand.cmd.Flags().BoolVar(&deployCommand.args.ServiceOptions.TLSEnabled, "tls", false, "Configure TLS for this target (requires a non-empty host)") | |
| deployCommand.cmd.Flags().StringVar(&deployCommand.args.ServiceOptions.TLSOnDemandUrl, "tls-on-demand-url", "", "Will make an HTTP request to the given URL, asking whether a host is allowed to have a certificate issued.") | |
| deployCommand.cmd.Flags().BoolVar(&deployCommand.args.ServiceOptions.TLSEnabled, "tls", false, "Configure TLS for this target. Without --tls-on-demand-url, at least one host is typically required; with --tls-on-demand-url, hosts may be left empty to allow wildcard/on-demand hosts.") | |
| deployCommand.cmd.Flags().StringVar(&deployCommand.args.ServiceOptions.TLSOnDemandUrl, "tls-on-demand-url", "", "Make an HTTP request to the given URL to decide whether a host is allowed to have a certificate issued (may be used with empty --host for wildcard/on-demand hosts).") |
| func (c *TLSOnDemandChecker) ExternalHostPolicy() autocert.HostPolicy { | ||
| return func(ctx context.Context, host string) error { | ||
| client := &http.Client{Timeout: 2 * time.Second} | ||
| requestURL := c.buildURLOrPath(host) | ||
| resp, err := client.Get(requestURL) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| body := make([]byte, 256) | ||
| n, _ := resp.Body.Read(body) | ||
| bodyStr := string(body[:n]) | ||
| return c.handleError(host, resp.StatusCode, bodyStr) | ||
| } |
There was a problem hiding this comment.
ExternalHostPolicy ignores the ctx it’s passed (uses client.Get), so cancellations/deadlines from the autocert call stack won’t propagate. Also, reading the response body with a single Read can return fewer bytes than available, leading to empty/partial error bodies even when the endpoint returns content. Use a request created with http.NewRequestWithContext + client.Do, and read up to 256 bytes via a bounded read (e.g., a limit reader) for deterministic results.
internal/server/tls_on_demand.go
Outdated
| func (c *TLSOnDemandChecker) buildURLOrPath(host string) string { | ||
| return fmt.Sprintf("%s?host=%s", c.options.TLSOnDemandUrl, url.QueryEscape(host)) | ||
| } |
There was a problem hiding this comment.
buildURLOrPath always appends ?host=..., which breaks when tls_on_demand_url already contains a query string (it would produce ...?a=b?host=...). Build the URL via url.Parse and set/merge the host query param so both existing and new query params are handled correctly.
| // Verify the body is truncated to 256 bytes | ||
| assert.Len(t, err.Error(), 256+len("test.example.com is not allowed to get a certificate (status: 403, body: \"")+len("\")")) |
There was a problem hiding this comment.
This assertion depends on the exact error string length, which is brittle (any wording change in the error message will break the test even if truncation still works). Prefer asserting that the embedded body portion is truncated to 256 bytes (or that the total error length is within an upper bound) rather than matching an exact total length.
add an assertion that verifies a different host is denied when the external URL returns a non-200 status code Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| resp, err := client.Get(requestURL) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
ExternalHostPolicy ignores the provided ctx (it calls client.Get without binding ctx). This can cause the ACME host check to continue even after the caller cancels, and it prevents per-check deadlines from being honored. Build an http.Request with http.NewRequestWithContext(ctx, ...) and use client.Do(req).
| resp, err := client.Get(requestURL) | |
| if err != nil { | |
| return err | |
| } | |
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, requestURL, nil) | |
| if err != nil { | |
| return err | |
| } | |
| resp, err := client.Do(req) | |
| if err != nil { | |
| return err | |
| } |
internal/server/tls_on_demand.go
Outdated
| path := c.buildURLOrPath(host) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, path, http.NoBody) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // We need to set this context value to true to indicate that this is a TLS on demand check | ||
| ctx = context.WithValue(req.Context(), contextKeyTLSOnDemandCheck, true) | ||
| req = req.WithContext(ctx) | ||
|
|
There was a problem hiding this comment.
LocalHostPolicy creates a request with a path-only URL and never sets req.Host. This means r.Host will be empty inside the service, which can trigger unexpected canonical-host redirects (301) or break on-demand endpoints that depend on the Host header. Consider setting req.Host (and possibly the URL Host) to the "host" being validated so the on-demand handler sees a realistic request context.
| // Create a mock service that returns a large response body | ||
| largeBody := string(make([]byte, 500)) // 500 bytes | ||
|
|
There was a problem hiding this comment.
largeBody is built from a zero-filled byte slice (NUL bytes). That makes failures/log output hard to interpret and isn't representative of typical HTTP response bodies. Consider using a repeated printable character (e.g., strings.Repeat("a", 500)) so the truncation behavior is easier to reason about in the test.
| return func(ctx context.Context, host string) error { | ||
| path := c.buildURLOrPath(host) | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, path, http.NoBody) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
There was a problem hiding this comment.
LocalHostPolicy forwards the incoming ctx directly into the internal request, but unlike ExternalHostPolicy it doesn't enforce the documented 2s timeout. If the local handler/proxy path is slow or hangs, certificate issuance can block indefinitely. Consider wrapping ctx with context.WithTimeout (same duration as external) for the local check as well.
| // Verify the body is truncated to 256 bytes | ||
| assert.Len(t, err.Error(), 256+len("test.example.com is not allowed to get a certificate (status: 403, body: \"")+len("\")")) | ||
| } |
There was a problem hiding this comment.
This test asserts the exact length of err.Error() to verify truncation. That makes the test brittle to any future change in the error message formatting (even punctuation/escaping) while not materially improving confidence. Prefer asserting that the error contains the expected status and that the body portion is limited to 256 bytes (e.g., compare against the first 256 bytes of the response body).
fix: avoid potential host redirects if isTLSOnDemandCheck. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fix: avoid malformed external URL (TLSOnDemandURL) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (s *Service) initialize(options ServiceOptions, targetOptions TargetOptions) error { | ||
| previousOptions := s.options | ||
| previousTargetOptions := s.targetOptions | ||
|
|
||
| s.options = options | ||
| s.targetOptions = targetOptions | ||
|
|
||
| certManager, err := s.createCertManager(options) | ||
| if err != nil { | ||
| s.options = previousOptions | ||
| s.targetOptions = previousTargetOptions | ||
| return err | ||
| } | ||
|
|
||
| middleware, err := s.createMiddleware(options, certManager) | ||
| if err != nil { | ||
| s.options = previousOptions | ||
| s.targetOptions = previousTargetOptions | ||
| return err | ||
| } |
There was a problem hiding this comment.
In initialize, s.options/s.targetOptions are now assigned before createCertManager/createMiddleware succeed. Because serviceRequestWithTarget reads s.options on the hot path without a lock, live requests can observe partially-updated options while still running on the old middleware/cert manager (and even briefly observe options that later get rolled back on error). Consider restoring the previous “build first, then swap” pattern and instead make the TLS-on-demand host policy builder take options as an explicit input (so it doesn’t depend on mutating s.options early), or otherwise ensure the update is atomic/thread-safe.
| c.args.ServiceOptions.Hosts = []string{""} | ||
| return nil | ||
| } | ||
|
|
||
| if len(c.args.ServiceOptions.Hosts) == 0 { | ||
| return fmt.Errorf("host must be set when using TLS") | ||
| } |
There was a problem hiding this comment.
preRun calls ServiceOptions.Normalize() before validating TLS host requirements. Since NormalizeHosts converts an empty host list into []string{""}, the later check len(c.args.ServiceOptions.Hosts) == 0 will never catch “no host provided”, and --tls can slip through with an empty host when --tls-on-demand-url is not set. Consider validating that at least one non-empty host is present (e.g., Hosts[0] != "") when TLS is enabled and on-demand URL is empty, and avoid the early return nil so the rest of preRun validations continue to run.
| c.args.ServiceOptions.Hosts = []string{""} | |
| return nil | |
| } | |
| if len(c.args.ServiceOptions.Hosts) == 0 { | |
| return fmt.Errorf("host must be set when using TLS") | |
| } | |
| // When using TLS on demand, use an empty host sentinel and allow | |
| // the rest of the validations below to run. | |
| c.args.ServiceOptions.Hosts = []string{""} | |
| } else { | |
| // When TLS is enabled without on-demand URL, require at least one | |
| // non-empty host. Normalize() may have converted an empty host | |
| // list into []string{""}, so we must check for a non-empty value. | |
| if len(c.args.ServiceOptions.Hosts) == 0 || c.args.ServiceOptions.Hosts[0] == "" { | |
| return fmt.Errorf("host must be set when using TLS") | |
| } | |
| } |
Big fan of Kamal & Kamal proxy here. Great job 👏
However, for my 2 Rails projects (website hosting for LocomotiveCMS & Maglev), I was missing the On-Demand TLS feature in Caddy.
It turns out it was easy to implement it in Kamal-Proxy since the
autocert.HostPolicytype is just a function returning an error if the host is not allowed to get a certificate.So, just by setting a new config option (
--tls-on-demand-url), we can test dynamically if a host can get a TLS certificate, just by calling the--tls-on-demand-urlendpoint.It just has to return a 200 HTTP code (http://my-api-end-point/any-path?host=).
In order to test it on my server, I implemented a little Sinatra service to test but I didn't include in my PR because it was in Ruby and the single example in your repository was written in Go. Besides, this is a super niche feature, so it might not require an example.
My next step is to build a Kamal proxy docker image and use it in the Kamal deployment of one of my projects.
Let me know if you want me to re-work the PR to match your PR rules.
Thanks!