implement the On-Demand TLS feature#63

Open
did wants to merge 16 commits into basecamp:main from did:on-demand-tls

did commented Nov 1, 2024

Big fan of Kamal & Kamal proxy here. Great job 👏

However, for my 2 Rails projects (website hosting for LocomotiveCMS & Maglev), I was missing the On-Demand TLS feature that Caddy offers.

It turns out it was easy to implement it in Kamal-Proxy since the autocert.HostPolicy type is just a function returning an error if the host is not allowed to get a certificate.

So, just by setting a new config option (--tls-on-demand-url), we can test dynamically if a host can get a TLS certificate, just by calling the --tls-on-demand-url endpoint.
It just has to return a 200 HTTP code (http://my-api-end-point/any-path?host=).
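
As an illustration of the idea described above, here is a minimal sketch, not the PR's actual code. It declares a local `hostPolicy` type with the same signature as `autocert.HostPolicy` so that it needs only the standard library. The names `newOnDemandPolicy` and `demoCheck`, the `/check` path, and the stub server are assumptions made for this example.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// hostPolicy mirrors the signature of autocert.HostPolicy so this sketch
// compiles with the standard library only.
type hostPolicy func(ctx context.Context, host string) error

// newOnDemandPolicy (hypothetical name) returns a policy that asks checkURL
// whether host may get a certificate. Only a 200 response allows it.
func newOnDemandPolicy(client *http.Client, checkURL string) hostPolicy {
	return func(ctx context.Context, host string) error {
		u, err := url.Parse(checkURL)
		if err != nil {
			return err
		}
		// Pass the requested host as an escaped "host" query parameter.
		q := u.Query()
		q.Set("host", host)
		u.RawQuery = q.Encode()

		req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.String(), http.NoBody)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		defer resp.Body.Close()

		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("%s is not allowed to get a certificate (status %d)", host, resp.StatusCode)
		}
		return nil
	}
}

// demoCheck starts a stub endpoint that allows a single host, then runs the
// policy against it for the requested host.
func demoCheck(allowed, requested string) error {
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("host") == allowed {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusNotAcceptable)
	}))
	defer server.Close()

	policy := newOnDemandPolicy(server.Client(), server.URL+"/check")
	return policy(context.Background(), requested)
}

func main() {
	fmt.Println(demoCheck("paul.nocoffee.fr", "paul.nocoffee.fr")) // allowed host
	fmt.Println(demoCheck("paul.nocoffee.fr", "evil.example.com")) // denied host
}
```

A function of this shape could be assigned to the `HostPolicy` field of an `autocert.Manager`, which is what makes the feature a small change.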

To test it on my server, I implemented a small Sinatra service, but I didn't include it in my PR because it is written in Ruby and the single example in your repository is written in Go. Besides, this is a super niche feature, so it might not require an example.

require 'sinatra'

ALLOWED_HOSTS = %w(paul.nocoffee.fr sacha.nocoffee.fr)

get '/' do
  <<-HTML 
<html>
  <body>
    <h1>Hello #{request.host}!</h1>
  </body>
</html>
  HTML
end

get '/up' do
  200
end

get '/check' do
  ALLOWED_HOSTS.include?(params[:host]) ? [200, ['ok']] : [406, ['fail']]
end

My next step is to build a Kamal proxy docker image and use it in the Kamal deployment of one of my projects.

Let me know if you want me to re-work the PR to match your PR rules.

Thanks!

Author

did commented Nov 3, 2024

Alright, I've made it work with my own fork of Kamal.

The modifications were super light: it was just a matter of accepting a new config option for kamal-proxy (and pointing to my own Docker image of kamal-proxy).

So, my config/deploy.yml looks like:

proxy:
  ssl: true
  hosts: [""]
  tls_on_demand_url: "http://staging-app-web-latest:3000/locomotive/api/allowed_host"
  app_port: 3000
  forward_headers: true

Notes:

  • hosts: [""] is important. Basically, it tricks the kamal-proxy router by allowing the * (wildcard) route (originally, impossible with ssl: true). I'd rather put hosts: "*" instead but it would require some extra work on the kamal-proxy app.
  • staging-app-web-latest is the host of my app, following the Kamal app name pattern <service_name>-web-<version>. I had to set a specific version to make it work across deployments (export VERSION=latest).

Author

did commented Nov 29, 2024

I also needed this feature to run the LocomotiveCMS hosting platform behind Kamal-Proxy 👉 did#1

Author

did commented Dec 2, 2024

thanks @kwilczynski for the code review!

Collaborator

kevinmcconnell left a comment

Hey @did,

Thanks for getting this started! This does seem like a useful addition.

I left a couple of comments on the details. Let me know what you think about addressing those. (I'm also happy to make the changes myself if you don't have time).

return autocert.HostWhitelist(hosts...), nil
}

_, err := url.ParseRequestURI(options.TLSOnDemandUrl)

Collaborator

I'd expect that the on demand URL will usually point to an endpoint in the application that's deployed, rather than to some other external app. In which case, it would be simpler for this to be a path rather than an absolute URL. We'd then automatically call it on the currently deployed target (a bit like how the health check paths work).

That way you don't have to worry about having a stable hostname to reach for all versions of the app, etc., because the proxy takes care of that for you.

I'm not sure if there's a common enough need to support an external on demand URL as well, but for simplicity's sake it would be nice to have this be path-only if possible.

What do you think?

Author

👌 I hadn't thought about that!

In my production case, it would work perfectly because my Rails app was always in charge of returning that list of hostnames (even when I was hosting it with k8s).
Indeed, it would have saved me a lot of time trying to figure out the hostname of my endpoint.

Perhaps (it's a guess) we should keep the URL as well, for developers who prefer to move the responsibility for this endpoint to another app (probably deployed by Kamal too) for performance or architecture reasons.

Let's keep it simple at first and use the path only :-)

(I will make the modifications)

Collaborator

Sounds good!

Copy link

I'd vote for full path as perhaps someone would want to host the check in a Kamal accessory instead?

Collaborator

We can always allow both: if you supply a path we'll call that on the app target, but if you supply a URL to some other endpoint we'll use that.

That way we get to keep the simpler configuration when the app is responsible for checking, which I suspect would be the more common case.
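
One way the "path or URL" rule could be expressed is sketched below. This is an assumption about the approach, not the code that ended up in the PR: the helper `classifyOnDemandValue` and the `checkMode` constants are hypothetical names. A value starting with a single `/` is treated as a path to call on the app target, an absolute `http` or `https` URL is treated as an external endpoint, and anything else is rejected.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkMode says where the on-demand check should be sent.
type checkMode int

const (
	checkInvalid  checkMode = iota // neither a path nor an absolute URL
	checkLocal                     // a path, called on the currently deployed target
	checkExternal                  // an absolute URL, called as-is
)

// classifyOnDemandValue (hypothetical helper) decides how to treat the
// configured value: "/some/path" is local, "http(s)://host/..." is external.
func classifyOnDemandValue(value string) checkMode {
	// "//host/path" is a scheme-relative URL, not a path, so it is excluded here.
	if strings.HasPrefix(value, "/") && !strings.HasPrefix(value, "//") {
		return checkLocal
	}
	u, err := url.Parse(value)
	if err != nil {
		return checkInvalid
	}
	if (u.Scheme == "http" || u.Scheme == "https") && u.Host != "" {
		return checkExternal
	}
	return checkInvalid
}

func main() {
	for _, v := range []string{
		"/locomotive/api/allowed_host",
		"http://localhost:4567/check",
		"not a url",
	} {
		fmt.Printf("%q -> %d\n", v, classifyOnDemandValue(v))
	}
}
```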

Copy link

Sounds good to me :)

Author

@kevinmcconnell I've implemented it. Let me know if it's okay for you. I also need to re-test this new implementation and see if it doesn't break my app.

brendon commented Mar 6, 2025

This looks great! Keep up the good work. We run a CMS where we can set up new sites while the app is running. Our current solution is Passenger via OpenResty (Nginx with Lua) and https://github.com/auto-ssl/lua-resty-auto-ssl. The last project on the list there appears abandoned though so I'm looking at other options :)

Author

did commented Mar 6, 2025

Thanks! Funny, Passenger + OpenResty is exactly the solution I'm using to run the LocomotiveCMS hosting SaaS platform!
The staging env uses Kamal with my patches and it runs great.

I need to find some free time to work on Kevin's feedback.

brendon commented Mar 6, 2025

Yep, I've never had any problems with this setup. I wrote a custom Lua script to look at our database and find the authorised hostnames. Didn't even think about just hitting an endpoint on the app itself. So much simpler! :)

jmadkins commented Jul 2, 2025

Thanks for all your work on this, @did! I also need this feature and am happy to help finish this up. Could I help?

Author

did commented Jul 9, 2025

Thanks @jmadkins! Alright, I'm back on this PR and I've just pushed some code. I won't say no to some help ☺️

My next task is to accept either a path or a URL (TlsOnDemandURL).
In the case of a path, it's still unclear to me whether I need to resolve a hostname or can just naively use localhost.
And perhaps we will have to rename the config option as well, since TlsOnDemandUrl is a bit odd if we expect a path by default. Any thoughts?

did force-pushed the on-demand-tls branch from f7420e3 to 303332c on July 10, 2025, 18:03

Author

did commented Aug 8, 2025

[UPDATE]

I've been testing my version of kamal-proxy on the pre-production Locomotive hosting platform with success 🎉.
Thanks for pushing me to implement the local path; it's so much easier to set up indeed!

Here is my config file for Kamal (I'm using my own forked version of Kamal 2.7):

proxy: 
  ssl: true
  host: "*"
  tls_on_demand_url: "/locomotive/api/allowed_host"
  ssl_redirect: false
  app_port: 3000
  forward_headers: true  
  healthcheck: 
    interval: 3
    timeout: 60

@indigotechtutorials

I am also trying to build an app that offers custom domains. I finally got subdomains working with custom certificates which was a long process and now realizing this is a whole new obstacle but I'm excited somebody else has already done the work to make this possible :) Hope it gets merged soon.

Would it be possible to use this in combination with a custom SSL certificate for main domain/subdomain routes?

brendon commented Sep 9, 2025

I ended up using Caddy in front of kamal-proxy and just ran kamal-proxy in different ports. Works really well and you also get all the extra goodness that Caddy provides if necessary. Here's the basic gist: basecamp/kamal#1613 (comment)

Good to see this become part of kamal-proxy but I'm not sure if I'll shift away from Caddy just yet :)

@anatolyrr

Looking forward to seeing this feature merged! ❤️

The workaround of putting another SSL-terminating proxy in front of kamal-proxy seems to work for only one app per server, and adding an additional layer and accessory complicates the infrastructure.

leh commented Oct 4, 2025

I'm excited about this feature too <3

@antonlitvinenko

@did @kevinmcconnell What is missing here to get this PR merged? How can we help?

Author

did commented Feb 23, 2026

hey @antonlitvinenko, I don't know. @kevinmcconnell do you need some help? (I can fix conflicts for instance).

[UPDATE]: branch rebased.

@ronald2wing

Thank you for working on this!

@kevinmcconnell
Collaborator

Thanks for bringing this all up to date, @did! I've been a bit busy with other things, but I'll try to get this landed as soon as possible.

Author

did commented Feb 26, 2026

@kevinmcconnell, my pleasure!

Actually, I'm battle-testing the PR with my LocomotiveCMS hosting platform (it will be in production over the weekend).
As a result, yesterday I was able to fix a bug that occurred when restoring the state (I'm not sure about my fix though).
I'll keep you posted.

Copilot AI review requested due to automatic review settings February 26, 2026 21:56

Copilot AI left a comment

Pull request overview

This PR implements On-Demand TLS functionality for Kamal Proxy, allowing dynamic TLS certificate issuance based on external validation instead of a static host whitelist. The feature is inspired by Caddy's On-Demand TLS and enables multi-tenant scenarios where hosts aren't known at startup.

Changes:

  • Added TLSOnDemandChecker to validate certificate requests via external or local HTTP endpoints
  • Introduced --tls-on-demand-url CLI flag for configuring the validation endpoint
  • Modified service initialization to support dynamic host policy determination

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| internal/server/tls_on_demand.go | New file implementing the on-demand TLS checker with external and local validation policies |
| internal/server/tls_on_demand_test.go | Comprehensive test coverage for on-demand TLS functionality |
| internal/server/service.go | Integration of on-demand checker into cert manager creation; fixed initialization order |
| internal/server/service_test.go | Minor formatting change (blank line added) |
| internal/server/router_test.go | Added test for state restoration with on-demand TLS; formatting updates |
| internal/cmd/deploy.go | Added CLI flag and validation logic for on-demand TLS configuration |
| internal/cmd/deploy_test.go | Tests for deploy command validation with on-demand URL |
| README.md | User-facing documentation for on-demand TLS feature |

func testCreateServiceWithHandler(t *testing.T, options ServiceOptions, targetOptions TargetOptions, handler http.Handler) *Service {
server := httptest.NewServer(handler)

Copilot AI Feb 26, 2026

This whitespace-only change (adding a blank line) appears to be unintentional formatting that doesn't contribute to the feature implementation. Consider removing it to keep the diff focused on functional changes.

Suggested change

service := &Service{options: ServiceOptions{TLSOnDemandUrl: server.URL}}
checker := NewTLSOnDemandChecker(service)
policy, _ := checker.HostPolicy()

Copilot AI Feb 26, 2026

The error return value from checker.HostPolicy() is silently ignored using the blank identifier. While this test case is designed to succeed, it's better practice to use require.NoError(t, err) to ensure the test fails explicitly if an unexpected error occurs, making test failures clearer.

Comment on lines +83 to +86
resp, err := client.Get(url)
if err != nil {
return err
}

Copilot AI Feb 26, 2026

The context parameter passed to ExternalHostPolicy is not being used when making the HTTP request. This means the request won't respect cancellation, deadlines, or other context values. The http.Client.Get method should be replaced with http.NewRequestWithContext followed by client.Do(req) to properly propagate the context.

Suggested change
- resp, err := client.Get(url)
- if err != nil {
-     return err
- }
+ req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
+ if err != nil {
+     return err
+ }
+ resp, err := client.Do(req)
+ if err != nil {
+     return err
+ }

Comment on lines +82 to +83
url := c.buildURLOrPath(host)
resp, err := client.Get(url)

Copilot AI Feb 26, 2026

Variable shadowing: the local variable 'url' (line 82) shadows the imported package 'url' from line 9. This makes the url package methods inaccessible within this scope and could lead to confusion. Consider renaming the local variable to something like 'requestURL' or 'endpoint'.

Suggested change
- url := c.buildURLOrPath(host)
- resp, err := client.Get(url)
+ requestURL := c.buildURLOrPath(host)
+ resp, err := client.Get(requestURL)

})

checker := NewTLSOnDemandChecker(service)
policy, _ := checker.HostPolicy()

Copilot AI Feb 26, 2026

The error return value from checker.HostPolicy() is silently ignored using the blank identifier. While this test case is designed to succeed, it's better practice to use require.NoError(t, err) to ensure the test fails explicitly if an unexpected error occurs, making test failures clearer.

kamal-proxy deploy service1 --target web-1:3000 --host "" --tls --tls-on-demand-url="http://localhost:4567/check"

The On-demand URL endpoint must return a 200 HTTP status code to allow certificate issuance.
Kamal Proxy will call the on-demand URL with a query string of `?host=` containing the host received by Kamal Proxy.

Copilot AI Feb 26, 2026

The query string description could be clearer. Instead of "with a query string of ?host= containing the host", consider "with a query string parameter host containing the hostname" to make it clearer that the actual format is ?host=hostname.example.com.

Suggested change
- Kamal Proxy will call the on-demand URL with a query string of `?host=` containing the host received by Kamal Proxy.
+ Kamal Proxy will call the on-demand URL with a query string parameter `host` containing the hostname received by Kamal Proxy (for example, `?host=hostname.example.com`).
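
To make the documented request format concrete, here is a small sketch of how such a URL could be built with the standard `net/url` package. The helper name `withHostParam` is hypothetical and this is not necessarily how the PR's `buildURLOrPath` works; it only shows the `host` parameter being escaped and merged with any query already present.

```go
package main

import (
	"fmt"
	"net/url"
)

// withHostParam (hypothetical helper) appends host as a properly escaped
// "host" query parameter, preserving any query already present on base.
func withHostParam(base, host string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("host", host)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	checkURL, err := withHostParam("http://localhost:4567/check", "hostname.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(checkURL) // http://localhost:4567/check?host=hostname.example.com
}
```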

_, err := url.ParseRequestURI(c.options.TLSOnDemandUrl)

if err != nil {
slog.Error("Unable to parse the tls_on_demand_url URL")

Copilot AI Feb 26, 2026

The slog.Error call is missing the "error" parameter. According to the codebase pattern (e.g., cert.go:25, service.go:426), slog.Error calls should include the actual error object with the "error" key for better debugging.

Suggested change
- slog.Error("Unable to parse the tls_on_demand_url URL")
+ slog.Error("Unable to parse the tls_on_demand_url URL", "error", err, "url", c.options.TLSOnDemandUrl)

Comment on lines +105 to +107
if c.args.ServiceOptions.TLSOnDemandUrl != "" {
c.args.ServiceOptions.Hosts = []string{""}
return nil

Copilot AI Feb 26, 2026

When TLSOnDemandUrl is set, the function returns early (line 107), which bypasses the path prefix validation at lines 114-116 that ensures TLS settings are specified on the root path service. This could allow invalid configurations where on-demand TLS is enabled on non-root paths, which may not work correctly. Consider moving the path prefix validation before the TLSOnDemandUrl check or ensuring it applies to both cases.
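
The reordering suggested here could look roughly like the sketch below. It is only an illustration under assumptions: `deployArgs`, `validateTLS`, and both error messages are hypothetical stand-ins, since the real validation function in `internal/cmd/deploy.go` is not shown in this thread. The point is simply that the shared path-prefix check runs before the early return.

```go
package main

import (
	"errors"
	"fmt"
)

// deployArgs is a simplified, hypothetical stand-in for the real deploy options.
type deployArgs struct {
	TLSEnabled     bool
	TLSOnDemandUrl string
	Hosts          []string
	PathPrefixes   []string
}

// Both messages are assumptions made for this sketch.
var (
	errTLSNonRootPath = errors.New("TLS settings must be specified on the root path service")
	errTLSNeedsHost   = errors.New("host must be set when using TLS")
)

// validateTLS runs the path-prefix check before the on-demand branch, so the
// early return for on-demand TLS can no longer bypass it.
func validateTLS(args *deployArgs) error {
	if !args.TLSEnabled {
		return nil
	}
	for _, prefix := range args.PathPrefixes {
		if prefix != "/" {
			return errTLSNonRootPath
		}
	}
	if args.TLSOnDemandUrl != "" {
		// Mirrors the snippet under review: on-demand TLS implies the catch-all host.
		args.Hosts = []string{""}
		return nil
	}
	if len(args.Hosts) == 0 {
		return errTLSNeedsHost
	}
	return nil
}

func main() {
	args := &deployArgs{
		TLSEnabled:     true,
		TLSOnDemandUrl: "/check",
		PathPrefixes:   []string{"/api"},
	}
	fmt.Println(validateTLS(args)) // rejected: on-demand TLS on a non-root path
}
```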

Comment on lines +50 to +64
func (c *TLSOnDemandChecker) LocalHostPolicy() autocert.HostPolicy {
    return func(ctx context.Context, host string) error {
        path := c.buildURLOrPath(host)
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, path, http.NoBody)
        if err != nil {
            return err
        }
        req.TLS = &tls.ConnectionState{}

        // We use httptest.NewRecorder here to route the request through the service's
        // load balancer and handler, capturing the response in-memory without making
        // a real network request. This ensures the request is processed as if it were
        // an external client, but avoids network overhead and complexity.
        recorder := httptest.NewRecorder()
        c.service.ServeHTTP(recorder, req)

Copilot AI Feb 26, 2026

Potential infinite recursion risk: LocalHostPolicy calls c.service.ServeHTTP which routes requests through the service's middleware and load balancer. If the validation endpoint path (TLSOnDemandUrl) triggers another TLS certificate request during the ACME challenge process, this could create an infinite loop. Consider adding documentation or safeguards to warn users that the validation endpoint should not require TLS or be excluded from the TLS certificate validation logic.

assert.NoError(t, manager.HostPolicy(context.Background(), "tenant.example.com"))
}

Copilot AI Feb 26, 2026

The test only verifies that the on-demand policy allows a host when the external URL returns 200, but doesn't test the denial case. Consider adding an assertion that verifies a different host is denied when the external URL returns a non-200 status code, to ensure the validation logic works correctly in both directions.

Suggested change
func TestRouter_RestoreLastSavedState_WithTLSOnDemandURL_HostPolicyDeniesOnNon200(t *testing.T) {
    statePath := filepath.Join(t.TempDir(), "state-deny.json")
    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusForbidden)
    }))
    defer server.Close()
    state := fmt.Sprintf(`[{
        "name":"ondemand-deny",
        "options":{
            "hosts":[""],
            "path_prefixes":["/"],
            "tls_enabled":true,
            "tls_certificate_path":"",
            "tls_private_key_path":"",
            "tls_on_demand_url":"%s",
            "tls_redirect":false,
            "canonical_host":"",
            "acme_directory":"",
            "acme_cache_path":"",
            "error_page_path":"",
            "strip_prefix":true,
            "writer_affinity_timeout":1000000000,
            "read_targets_accept_websockets":false
        },
        "target_options":{
            "health_check_config":{"path":"/up","port":0,"interval":1000000000,"timeout":5000000000,"host":""},
            "response_timeout":30000000000,
            "buffer_requests":false,
            "buffer_responses":false,
            "max_memory_buffer_size":1048576,
            "max_request_body_size":0,
            "max_response_body_size":0,
            "log_request_headers":null,
            "log_response_headers":null,
            "forward_headers":false,
            "scope_cookie_paths":false
        },
        "active_targets":["localhost:3000"],
        "active_readers":[],
        "rollout_targets":null,
        "rollout_readers":null,
        "pause_controller":{"state":0,"stop_message":"","fail_after":0},
        "rollout_controller":null
    }]`, server.URL)
    require.NoError(t, os.WriteFile(statePath, []byte(state), 0600))
    router := NewRouter(statePath)
    require.NoError(t, router.RestoreLastSavedState())
    service := router.services.Get("ondemand-deny")
    require.NotNil(t, service)
    manager, ok := service.certManager.(*autocert.Manager)
    require.True(t, ok)
    require.NotNil(t, manager.HostPolicy)
    assert.Error(t, manager.HostPolicy(context.Background(), "denied.example.com"))
}
