Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps #4452


Open

wants to merge 3 commits into main

Conversation

Member

@acosta11 acosta11 commented Jul 14, 2025

Proposed Change

Add a configuration option to allow or disallow the use of the 'root' and '0' user for processes and tasks of docker lifecycle apps.

Use 'vcap' as the default user in the case that the 'root' user is disallowed. Edit: opting to leave the user as is, since docker images need to add the desired user anyway, which makes a non-root default less clear.

Use Case

Operators can harden their runtime infrastructure by preventing privilege escalation to the root user of the underlying OS namespace that realizes the desired Process.

Dev Notes

Given the scope of docker lifecycle apps, the validation logic for this feature seems to have to live in the policy layer. If implemented in the message layer, it could have the unintended effect of allowing the root user in the net-new context of buildpack lifecycle apps when the flag is configured to match the current default of allowed for docker apps (assuming the flag exists at this level of coarseness). Edit: the user allow list functionality in the policy layer works to this end, but we still need to ensure docker apps that set the user in the Dockerfile are appropriately handled.
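A rough Ruby sketch of how a policy-layer check could compose the user allow list with the root-user flag; the class name, method names, and error strings are illustrative, not the actual cloud_controller_ng implementation:

```ruby
# Hypothetical policy-layer check combining a user allow list with the
# allow_docker_root_user flag. Names are illustrative, not the real classes.
class DockerUserPolicy
  ROOT_USERS = %w[root 0].freeze

  def initialize(requested_user, allowed_users:, allow_docker_root_user:)
    @requested_user = requested_user
    @allowed_users = allowed_users
    @allow_docker_root_user = allow_docker_root_user
  end

  # Returns validation error messages; an empty array means the user is valid.
  def errors
    return [] if @requested_user.nil?

    errs = []
    errs << "user '#{@requested_user}' is not permitted" unless @allowed_users.include?(@requested_user)
    # Root stays blocked even when it appears in the allow list, as long as
    # the flag disallows it.
    errs << "'root' user not permitted" if ROOT_USERS.include?(@requested_user) && !@allow_docker_root_user
    errs
  end
end
```

Keeping the root-user flag as a separate check, rather than filtering the allow list, lets the two controls be configured independently.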

Confirmed in manual end to end testing that disallowing 'root' user on the cloud_controller_clock will cause an error in the nsync process for apps with the root user that were previously valid. Triggered a new sync by manually deleting the lrp on the diego cell. The error does not get surfaced to the api/cli. Instead, a user would have to go into the bosh logs and see the attempted sync on the scheduler job. Note that allowing the 'root' user on the clock job avoids this error, but still prevents new process/task updates from using the 'root' user, so it seems like a reasonable option for backwards compatibility.

Example error:

{
  "timestamp": "2025-07-29T20:36:27.040856189Z",
  "message": "error-updating-lrp-state",
  "log_level": "error",
  "source": "cc.diego.sync.processes",
  "data": {
    "error": "RunnerError",
    "error_message": "Runner error: 'root' user not permitted.",
    "error_backtrace": "/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/docker/lifecycle_protocol.rb:34:in `desired_lrp_builder'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/app_recipe_builder.rb:54:in `app_lrp_arguments'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/app_recipe_builder.rb:31:in `build_app_lrp'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:13:in `block in desire_app'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:107:in `handle_diego_errors'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:12:in `desire_app'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:104:in `block (3 levels) in desire_lrps'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:41:in `block (3 levels) in create_workpool_thread'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:39:in `loop'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:39:in `block (2 levels) in create_workpool_thread'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:38:in `catch'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:38:in `block in create_workpool_thread'\n..."
  },
  "thread_id": 59180,
  "fiber_id": 59200,
  "process_id": 7,
  "file": "/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb",
  "lineno": 80,
  "method": "block in process_workpool_exceptions"
}

Checklist

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests


linux-foundation-easycla bot commented Jul 14, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@Gerg Gerg requested review from tcdowney and Gerg July 14, 2025 21:36
@acosta11 acosta11 changed the title from "Optionally disallow the use of root user for Processes and Tasks of docker lifecycle Apps" to "Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps" Jul 14, 2025
Member Author

acosta11 commented Jul 14, 2025

Oh looks like I had pivotal-cf membership visibility set to private. Is there a way to rerun the CLA check to see if I need any other updates?

Edit: I still had to click sign again. But it seems to be working now.

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from 7c07dd0 to c5fcbde Compare July 14, 2025 21:58
Member

Gerg commented Jul 14, 2025

@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release

@acosta11
Member Author

@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release

Draft PR: cloudfoundry/capi-release#561

Still need to test this end to end with the bosh release config.

@@ -138,7 +138,7 @@ def docker_user
    end
  end

-      container_user.presence || AppModel::DEFAULT_DOCKER_CONTAINER_USER
+      container_user.presence || (Config.config.get(:allow_process_root_user) ? AppModel::DEFAULT_DOCKER_CONTAINER_USER : AppModel::DEFAULT_CONTAINER_USER)
Member
This is semantically a bit confusing, since it means in some cases the docker_user default is not the DEFAULT_DOCKER_USER, but is instead the DEFAULT_CONTAINER_USER. Also we have to duplicate the same ternary in both droplet and process model.

Maybe instead (roughly):

  1. Encapsulate the ternary in a method (on AppModel?) default_docker_user
  2. DEFAULT_DOCKER_CONTAINER_USER goes away (since it's only used in one place)
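A rough sketch of the suggested refactor; the constant values and config plumbing are assumptions for illustration, not the merged code:

```ruby
# Hypothetical refactor: encapsulate the ternary in a single
# default_docker_user method so the droplet and process models share it.
# Constant values here are assumed for illustration.
class AppModel
  DEFAULT_CONTAINER_USER = 'vcap'.freeze
  DEFAULT_DOCKER_CONTAINER_USER = 'root'.freeze

  # The allow_root_user flag would come from config at the call site.
  def self.default_docker_user(allow_root_user)
    allow_root_user ? DEFAULT_DOCKER_CONTAINER_USER : DEFAULT_CONTAINER_USER
  end
end
```

Call sites would then collapse to something like `container_user.presence || AppModel.default_docker_user(Config.config.get(:allow_process_root_user))`, removing the duplicated ternary.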

Member Author
Indeed, I could see trying to move this up higher in the hierarchy in general to avoid duplication. I'll give it a try and see how it looks.

Member

Gerg commented Jul 15, 2025

I haven't validated this myself yet, but I don't think this PR covers the case where the root user is coming from the Dockerfile. In that case, the user will be returned from staging as root and stored in the Droplet's docker_exec_metadata.

@acosta11
Member Author

I haven't validated this myself yet, but I don't think this PR covers the case where the root user is coming from the Dockerfile. In that case, the user will be returned from staging as root and stored in the Droplet's docker_exec_metadata.

Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory.

Member

Gerg commented Jul 16, 2025

Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory.

There is some complexity there, since the user specified in the Dockerfile -> execution_metadata will be set on the Droplet as a result of staging. I don't think we want to fail the staging because:

  1. The app dev could still override what user will be used via the Process/Task APIs
  2. The check wouldn't catch older Droplets that were staged with the root user prior to enabling the new check

I think we need to check the user-that-will-be-used when actually starting to run the process/task (but, I'm open to other ideas). I'm fine if you want to break that into a separate PR, to keep things more manageable with this one.

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch from c5fcbde to ce9433b Compare July 22, 2025 21:12
@acosta11
Member Author

Initial pass on the runtime check for 'root' user leverages the desired_lrp and task_action builder layer to avoid disrupting staging. Talking with Greg, I also considered going into all the relevant action implementations to raise the error a bit earlier in the call chain. That said, the builder-level check ended up fairly succinct and easy to test, so I'm slightly favoring this implementation over the larger set of changes in the action layer, where it would be easier to miss something.

Curious if it looks reasonable to y'all.

@Gerg Gerg self-requested a review July 24, 2025 21:10
@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from d7fd389 to a3f85ca Compare July 30, 2025 20:20
Member Author

acosta11 commented Jul 30, 2025

Wrapped up end to end testing, added some error handling from an issue detected as part of that process, and rebased on latest main with the task handler and ccng config updates. Should be good for review now.

See updated PR description for nsync edge case that I also validated in the e2e env. If anything, taking one last look at our handling of the USER UID:GID directive instead of just USER root. Technically it would be relevant to also prevent the use of user id or group id 0.
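A small illustrative helper for the USER UID:GID concern: treat 'root', '0', and any uid:gid form whose uid or gid is 0 as the root user. This is a sketch of the matching logic, not the actual implementation:

```ruby
# Hypothetical predicate covering the Dockerfile USER forms mentioned above:
# USER root, USER 0, USER 0:0, USER 0:<gid>, and USER <uid>:0.
def root_user?(user)
  return false if user.to_s.strip.empty?

  uid, gid = user.to_s.split(':', 2).map(&:strip)
  %w[root 0].include?(uid) || gid == '0'
end
```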

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch from a3f85ca to f31d46d Compare August 5, 2025 22:58
Member Author

acosta11 commented Aug 5, 2025

Rebased on main and renamed property to allow_docker_root_user instead of allow_process_root_user since this applies to both Processes and Tasks. Bump @Gerg @tcdowney if y'all had any other thoughts.

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from ef38937 to 824088f Compare August 7, 2025 19:35
TestConfig.override(allow_docker_root_user: false)
end

context 'and the process does not set a user' do
Member
I think we also want to block the case where:

  1. allowed_users includes root
  2. process/task user is set to root
  3. allow_docker_root_user is set to false

This case may never happen in the real world, but I think blocking it is the correct behavior.

Member
Do we need equivalent logic for surfacing the error when running tasks?

Member Author

@acosta11 acosta11 Aug 8, 2025

Good question, I'm not familiar with task execution error cases and only found this particular layer with some guidance from Tim. Is it a problem for a new execution of a task to fail after the cutover to blocking the root user? I assume that a previously running task isn't an availability concern the same way an already deployed running app would be.

* Ignore user in droplet docker execution metadata via dockerfile for staging
  because user may be subsequently overridden on Process or Task model
* Enforce that 'root' or '0' user is not used at runtime by task_action and
  desired_lrp builders
* Re-raise the error in desire app handler. If not re-raised, the error
  is suppressed with no clear user feedback.
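The re-raise behavior described in the commit notes can be sketched roughly as follows; the error class and method shape are illustrative, not the actual handler code:

```ruby
# Hypothetical sketch of the desire-app handler fix: log the runner error,
# then re-raise so the failure surfaces instead of being silently swallowed.
class RunnerError < StandardError; end

def desire_app(user:, allow_docker_root_user:)
  raise RunnerError, "'root' user not permitted" if user == 'root' && !allow_docker_root_user

  :desired
rescue RunnerError => e
  warn "error-desiring-app: #{e.message}"
  raise # without this, the error is suppressed with no clear user feedback
end
```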
@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch from 824088f to 188d860 Compare August 12, 2025 17:36
@acosta11
Member Author

Rebased on main again and ultimately removed all of the model layer changes in favor of the user allow list. Added more explicit tests at the diego builder layer for the cases where user is '0' or absent in the Dockerfile to round out the ways by which a root user may be specified.
