Minding000

Issue

The Caddy image needs the NET_BIND_SERVICE capability, but there is no explanation of why or of how to avoid it.
Especially in Kubernetes, it is common to drop all capabilities and use unprivileged ports, so the default behavior is confusing there.

Changes

Added explanation on why the capability is required by default and how to change it.
Included the error message to make it searchable.

@Minding000 changed the title from "Added NET_BIND_SERVICE note" to "Caddy: Added NET_BIND_SERVICE note" on Sep 2, 2025
@Minding000 force-pushed the caddy/net-bind-service branch from d4bfda3 to b198010 on September 2, 2025 20:54
@tianon
Member

tianon commented Sep 2, 2025

cc caddy image maintainers for review/approval: @hairyhenderson @francislavoie

(see also moby/moby#41030, which changed net.ipv4.ip_unprivileged_port_start to default to 0 in Docker's network namespaces as of Docker 20.10+. That negates the original purpose of adding this capability to the binary, so it might be worth reconsidering now)

@francislavoie
Contributor

We added that in 2023 @tianon, it was necessary for certain users who wanted to run the container as non-root, I think.

Added in caddyserver/caddy-docker@b650917, issue for context: caddyserver/caddy-docker#104 (comment)

I don't really understand all the nuances here, maybe you could clarify with that context in mind 🤔

@tianon
Member

tianon commented Sep 2, 2025

At that time, I don't think this change had made its way into the common Kubernetes runtimes, and I believe that's changed now, but worth confirming. In other words, for the majority of use cases in containers now, "privileged ports" shouldn't be a thing, and any port should be usable regardless of the UID the container is running as (and that's safe due to the inherent isolation that is a network namespace).

So I don't think caddy needs to keep jumping through these hoops to set that capability on the binary at all. Removing it would also simplify this PR, because the problem would go away entirely. Perhaps @Minding000 can verify, by testing with their cap-less image, whether binding to ports lower than 1024 still works?

@tianon
Member

tianon commented Sep 2, 2025

To illustrate further, with just Docker:

$ docker run -it --rm --user 1000:1000 --cap-drop NET_BIND_SERVICE --name test "$(docker build -q - <<<$'FROM caddy\nRUN setcap cap_net_bind_service=-ep /usr/bin/caddy')"
2025/09/02 21:26:58.562	INFO	maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
2025/09/02 21:26:58.562	INFO	GOMEMLIMIT is updated	{"package": "github.com/KimMachineGun/automemlimit/memlimit", "GOMEMLIMIT": 60335323545, "previous": 9223372036854775807}
2025/09/02 21:26:58.562	INFO	using config from file	{"file": "/etc/caddy/Caddyfile"}
2025/09/02 21:26:58.563	INFO	adapted config to JSON	{"adapter": "caddyfile"}
2025/09/02 21:26:58.563	INFO	admin	admin endpoint started	{"address": "localhost:2019", "enforce_origin": false, "origins": ["//localhost:2019", "//[::1]:2019", "//127.0.0.1:2019"]}
2025/09/02 21:26:58.564	WARN	http.auto_https	server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server	{"server_name": "srv0", "http_port": 80}
2025/09/02 21:26:58.564	INFO	tls.cache.maintenance	started background certificate maintenance	{"cache": "0xc0006d6400"}
2025/09/02 21:26:58.564	WARN	http	HTTP/2 skipped because it requires TLS	{"network": "tcp", "addr": ":80"}
2025/09/02 21:26:58.564	WARN	http	HTTP/3 skipped because it requires TLS	{"network": "tcp", "addr": ":80"}
2025/09/02 21:26:58.564	INFO	http.log	server running	{"name": "srv0", "protocols": ["h1", "h2", "h3"]}
2025/09/02 21:26:58.564	ERROR	unable to autosave config	{"file": "/config/caddy/autosave.json", "error": "open /config/caddy/autosave.json: permission denied"}
2025/09/02 21:26:58.564	INFO	serving initial configuration
2025/09/02 21:26:58.564	WARN	tls	unable to get instance ID; storage clean stamps will be incomplete	{"error": "open /data/caddy/instance.uuid: permission denied"}
2025/09/02 21:26:58.564	ERROR	tls	could not clean default/global storage	{"error": "unable to acquire storage_clean lock: creating lock file: open /data/caddy/locks/storage_clean.lock: no such file or directory"}
2025/09/02 21:26:58.564	INFO	tls	finished cleaning storage units

(listening just fine on port 80, even when started as user 1000:1000, and the capability in question explicitly dropped)
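
Whether a given runtime behaves like this can be checked from inside the container by reading the sysctl mentioned above. The following is only a minimal sketch, assuming a Linux container with /proc mounted; the helper name port_is_privileged is hypothetical, and the fallback to 1024 for kernels that do not expose the sysctl is an assumption.

```shell
# Hypothetical helper: succeeds if binding to port $1 would need
# CAP_NET_BIND_SERVICE, given the threshold $2
# (the value of net.ipv4.ip_unprivileged_port_start).
port_is_privileged() {
    [ "$1" -lt "$2" ]
}

# Read the threshold for the current network namespace.
# Assumption: fall back to the traditional 1024 if the file is missing.
threshold="$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)"

if port_is_privileged 80 "$threshold"; then
    echo "port 80 is privileged here (threshold: $threshold)"
else
    echo "port 80 is unprivileged here (threshold: $threshold)"
fi
```

A threshold of 0 means no port is privileged in that namespace, which matches the Docker 20.10+ default described above.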

@francislavoie
Contributor

I see, good to know.

We still have issues with non-root though, namely /data and /config dirs don't have the correct permissions since they get created as root in the build (as you can see in the logs). I don't know how to resolve that, they need to be writable most of the time.

FYI @abjugard, you had done the patch for setcap, do you have any objection to us reverting that, now that k8s should have caught up?

@Minding000
Author

@francislavoie My understanding is:

  • Binding to privileged ports requires NET_BIND_SERVICE
  • Processes started by the root user have this capability, others don't (by default)
  • setcap was added to the image to allow non-root users to bind to privileged ports (e.g. 80/443)
  • setcap also requires the container runtime to give the process the capability NET_BIND_SERVICE
  • Most container runtimes add basic capabilities by default, but in hardened environments all capabilities are usually dropped
  • The moby runtime has moved to declaring all ports unprivileged (see PR linked by @tianon)
  • containerd / runc still declare ports below 1024 privileged @tianon:

Error: loading initial config: loading new config: http app module: start: listening on :80: listen tcp :80: bind: permission denied

What is confusing as a user is that you need to set the capability even when you're not using a privileged port.
Of course there is no way to know that at build time for the caddy-docker repository.
So I'm suggesting a note in the documentation.
An alternative would be a nocap image tag, e.g. caddy:2.10.0-alpine-nocap, without the setcap command, but that might be too much of a hassle.
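
To see whether the runtime actually grants the capability discussed in the list above, the capability masks in /proc can be decoded. This is a rough sketch, assuming a Linux container with /proc mounted and awk available; the helper name cap_is_set is hypothetical. It relies on CAP_NET_BIND_SERVICE being capability number 10, and on my understanding that the bounding set (CapBnd) is the relevant mask for a binary with file capabilities.

```shell
# Hypothetical helper: succeeds if capability bit $2 is set in the
# hexadecimal mask $1 (as printed in /proc/<pid>/status).
cap_is_set() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# CAP_NET_BIND_SERVICE is capability number 10 on Linux.
NET_BIND_SERVICE_BIT=10

# Read the bounding set of the current process; empty if /proc is unavailable.
bnd="$(awk '/^CapBnd:/ {print $2}' /proc/self/status 2>/dev/null)"

if [ -n "$bnd" ]; then
    if cap_is_set "$bnd" "$NET_BIND_SERVICE_BIT"; then
        echo "NET_BIND_SERVICE is in the bounding set ($bnd)"
    else
        echo "NET_BIND_SERVICE is NOT in the bounding set ($bnd)"
    fi
fi
```

If the capability is missing from the bounding set (for example after dropping all capabilities), a binary that had it added via setcap is the case the documentation note is meant to explain.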

@francislavoie
Contributor

> containerd / runc still declare ports below 1024 privileged

Argh, what a mess 🙈

@Minding000
Author

Minding000 commented Sep 2, 2025

I think even containerd might allow privileged ports by default starting at version 2: containerd/containerd#6924
I am running v1.7.27 (even there it is configurable), the latest version available in the Docker APT repository.

I agree with @tianon that it makes sense to reconsider including setcap in the image, especially with Docker and containerd treating all ports as unprivileged going forward. Many projects running in Kubernetes use unprivileged ports for HTTP (e.g. 3000) and rely on the Ingress to route it to 443. Other projects can include setcap in their own image derivations.

But with many people still running older runtimes that would be a breaking change, so I don't expect this to happen (quickly).
Until then I think the documentation note is useful for anyone that is confused by the behavior.

Sidenote: nginx has a separate unprivileged image that exposes a different port by default avoiding the setcap issue: https://hub.docker.com/r/nginxinc/nginx-unprivileged

@abjugard

abjugard commented Sep 4, 2025

> FYI @abjugard, you had done the patch for setcap, do you have any objection to us reverting that, now that k8s should have caught up?

If the exposed ports are changed from 80/443 to higher numbers, I don't see a reason for calling setcap. But as @Minding000 mentions, many Linux distributions still treat ports below 1024 as privileged, which requires special capabilities to be set on the binary in order to run as a non-root user.
