Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps #4452


Open

wants to merge 3 commits into main

Conversation

Member

@acosta11 acosta11 commented Jul 14, 2025

Proposed Change

Add a configuration option to allow or disallow the use of the 'root' and '0' user for processes and tasks of docker lifecycle apps.

Use 'vcap' as the default user in the case that the 'root' user is disallowed. Edit: opting to leave the user as is, since docker images need to add the desired user anyway, which makes a non-root default less clear.

Use Case

Operators can harden their runtime infrastructure by preventing privilege escalation to the root user of the underlying OS namespace that realizes the desired Process.

Dev Notes

Given the scope of docker lifecycle apps, the validation logic for this feature seems to have to live in the policy layer. If implemented in the message layer, it could have the unintended effect of allowing the root user in the net-new context of buildpack lifecycle apps when the flag is configured to match the current default of allowed for docker apps (assuming the flag exists at this level of coarseness). Edit: the user allow list functionality in the policy layer works to this end, but we still need to ensure docker apps that set the user in the Dockerfile are appropriately handled.
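A rough Ruby sketch of how a policy-layer check could compose the user allow list with the root-user flag; the class name, method names, and error strings are illustrative, not the actual cloud_controller_ng implementation:

```ruby
# Hypothetical policy-layer check combining a user allow list with the
# allow_docker_root_user flag. Names are illustrative, not the real classes.
class DockerUserPolicy
  ROOT_USERS = %w[root 0].freeze

  def initialize(requested_user, allowed_users:, allow_docker_root_user:)
    @requested_user = requested_user
    @allowed_users = allowed_users
    @allow_docker_root_user = allow_docker_root_user
  end

  # Returns validation error messages; an empty array means the user is valid.
  def errors
    return [] if @requested_user.nil?

    errs = []
    errs << "user '#{@requested_user}' is not permitted" unless @allowed_users.include?(@requested_user)
    # Root stays blocked even when it appears in the allow list, as long as
    # the flag disallows it.
    errs << "'root' user not permitted" if ROOT_USERS.include?(@requested_user) && !@allow_docker_root_user
    errs
  end
end
```

Keeping the root-user flag as a separate check, rather than filtering the allow list, lets the two controls be configured independently.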

Confirmed in manual end to end testing that disallowing 'root' user on the cloud_controller_clock will cause an error in the nsync process for apps with the root user that were previously valid. Triggered a new sync by manually deleting the lrp on the diego cell. The error does not get surfaced to the api/cli. Instead, a user would have to go into the bosh logs and see the attempted sync on the scheduler job. Note that allowing the 'root' user on the clock job avoids this error, but still prevents new process/task updates from using the 'root' user, so it seems like a reasonable option for backwards compatibility.

Example error:

{
  "timestamp": "2025-07-29T20:36:27.040856189Z",
  "message": "error-updating-lrp-state",
  "log_level": "error",
  "source": "cc.diego.sync.processes",
  "data": {
    "error": "RunnerError",
    "error_message": "Runner error: 'root' user not permitted.",
    "error_backtrace": "/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/docker/lifecycle_protocol.rb:34:in `desired_lrp_builder'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/app_recipe_builder.rb:54:in `app_lrp_arguments'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/app_recipe_builder.rb:31:in `build_app_lrp'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:13:in `block in desire_app'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:107:in `handle_diego_errors'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:12:in `desire_app'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:104:in `block (3 levels) in desire_lrps'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:41:in `block (3 levels) in create_workpool_thread'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:39:in `loop'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:39:in `block (2 levels) in create_workpool_thread'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:38:in `catch'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:38:in `block in create_workpool_thread'\n..."
  },
  "thread_id": 59180,
  "fiber_id": 59200,
  "process_id": 7,
  "file": "/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb",
  "lineno": 80,
  "method": "block in process_workpool_exceptions"
}

Checklist

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests


linux-foundation-easycla bot commented Jul 14, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@Gerg Gerg requested review from tcdowney and Gerg July 14, 2025 21:36
@acosta11 acosta11 changed the title from "Optionally disallow the use of root user for Processes and Tasks of docker lifecycle Apps" to "Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps" Jul 14, 2025
Member Author

acosta11 commented Jul 14, 2025

Oh looks like I had pivotal-cf membership visibility set to private. Is there a way to rerun the CLA check to see if I need any other updates?

Edit: I still had to click sign again. But it seems to be working now.

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from 7c07dd0 to c5fcbde Compare July 14, 2025 21:58
Member

Gerg commented Jul 14, 2025

@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release

@acosta11
Member Author

@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release

Draft PR: cloudfoundry/capi-release#561

Still need to test this end to end with the bosh release config.

@@ -138,7 +138,7 @@ def docker_user
    end
  end

-      container_user.presence || AppModel::DEFAULT_DOCKER_CONTAINER_USER
+      container_user.presence || (Config.config.get(:allow_process_root_user) ? AppModel::DEFAULT_DOCKER_CONTAINER_USER : AppModel::DEFAULT_CONTAINER_USER)
Member
This is semantically a bit confusing, since it means in some cases the docker_user default is not the DEFAULT_DOCKER_USER, but is instead the DEFAULT_CONTAINER_USER. Also we have to duplicate the same ternary in both droplet and process model.

Maybe instead (roughly):

  1. Encapsulate the ternary in a method (on AppModel?) default_docker_user
  2. DEFAULT_DOCKER_CONTAINER_USER goes away (since it's only used in one place)
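A rough sketch of the suggested refactor; the constant values and config plumbing are assumptions for illustration, not the merged code:

```ruby
# Hypothetical refactor: encapsulate the ternary in a single
# default_docker_user method so the droplet and process models share it.
# Constant values here are assumed for illustration.
class AppModel
  DEFAULT_CONTAINER_USER = 'vcap'.freeze
  DEFAULT_DOCKER_CONTAINER_USER = 'root'.freeze

  # The allow_root_user flag would come from config at the call site.
  def self.default_docker_user(allow_root_user)
    allow_root_user ? DEFAULT_DOCKER_CONTAINER_USER : DEFAULT_CONTAINER_USER
  end
end
```

Call sites would then collapse to something like `container_user.presence || AppModel.default_docker_user(Config.config.get(:allow_process_root_user))`, removing the duplicated ternary.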

Member Author
Indeed, I could see trying to move this up higher in the hierarchy in general to avoid duplication. I'll give it a try and see how it looks.

Member

Gerg commented Jul 15, 2025

I haven't validated this myself yet, but I don't think this PR covers the case where the root user is coming from the Dockerfile. In that case, the user will be returned from staging as root and stored in the Droplet's docker_exec_metadata.

@acosta11
Member Author

I haven't validated this myself yet, but I don't think this PR covers the case where the root user is coming from the Dockerfile. In that case, the user will be returned from staging as root and stored in the Droplet's docker_exec_metadata.

Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory.

Member

Gerg commented Jul 16, 2025

Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory.

There is some complexity there, since the user specified in the Dockerfile -> execution_metadata will be set on the Droplet as a result of staging. I don't think we want to fail the staging because:

  1. The app dev could still override what user will be used via the Process/Task APIs
  2. The check wouldn't catch older Droplets that were staged with the root user prior to enabling the new check

I think we need to check the user-that-will-be-used when actually starting to run the process/task (but, I'm open to other ideas). I'm fine if you want to break that into a separate PR, to keep things more manageable with this one.

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch from c5fcbde to ce9433b Compare July 22, 2025 21:12
@acosta11
Member Author

Initial pass on the runtime check for 'root' user leverages the desired_lrp and task_action builder layer to avoid disrupting staging. Talking with Greg, I also considered going into all the relevant action implementations to raise the error a bit earlier in the call chain. That said, the builder-level check ended up fairly succinct and easy to test, so I'm slightly favoring this implementation over the larger set of changes in the action layer, where it would be easier to miss something.

Curious if it looks reasonable to y'all.

@Gerg Gerg self-requested a review July 24, 2025 21:10
@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from d7fd389 to a3f85ca Compare July 30, 2025 20:20
Member Author

acosta11 commented Jul 30, 2025

Wrapped up end to end testing, added some error handling from an issue detected as part of that process, and rebased on latest main with the task handler and ccng config updates. Should be good for review now.

See updated PR description for nsync edge case that I also validated in the e2e env. If anything, taking one last look at our handling of the USER UID:GID directive instead of just USER root. Technically it would be relevant to also prevent the use of user id or group id 0.
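A small illustrative helper for the USER UID:GID concern: treat 'root', '0', and any uid:gid form whose uid or gid is 0 as the root user. This is a sketch of the matching logic, not the actual implementation:

```ruby
# Hypothetical predicate covering the Dockerfile USER forms mentioned above:
# USER root, USER 0, USER 0:0, USER 0:<gid>, and USER <uid>:0.
def root_user?(user)
  return false if user.to_s.strip.empty?

  uid, gid = user.to_s.split(':', 2).map(&:strip)
  %w[root 0].include?(uid) || gid == '0'
end
```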

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch from a3f85ca to f31d46d Compare August 5, 2025 22:58
Member Author

acosta11 commented Aug 5, 2025

Rebased on main and renamed property to allow_docker_root_user instead of allow_process_root_user since this applies to both Processes and Tasks. Bump @Gerg @tcdowney if y'all had any other thoughts.

@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from ef38937 to 824088f Compare August 7, 2025 19:35
TestConfig.override(allow_docker_root_user: false)
end

context 'and the process does not set a user' do
Member
I think we also want to block the case where:

  1. allowed_users includes root
  2. process/task user is set to root
  3. allow_docker_root_user is set to false

This case may never happen in the real world, but I think blocking it is the correct behavior.

Member
Do we need equivalent logic for surfacing the error when running tasks?

Member Author

@acosta11 acosta11 Aug 8, 2025

Good question, I'm not familiar with task execution error cases and only found this particular layer with some guidance from Tim. Is it a problem for a new execution of a task to fail after the cutover to blocking the root user? I assume that a previously running task isn't an availability concern the same way an already deployed running app would be.

* Ignore user in droplet docker execution metadata via dockerfile for staging
  because user may be subsequently overridden on Process or Task model
* Enforce that 'root' or '0' user is not used at runtime by task_action and
  desired_lrp builders
* Re-raise the error in desire app handler. If not re-raised, the error
  is suppressed with no clear user feedback.
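The re-raise behavior described in the commit notes can be sketched roughly as follows; the error class and method shape are illustrative, not the actual handler code:

```ruby
# Hypothetical sketch of the desire-app handler fix: log the runner error,
# then re-raise so the failure surfaces instead of being silently swallowed.
class RunnerError < StandardError; end

def desire_app(user:, allow_docker_root_user:)
  raise RunnerError, "'root' user not permitted" if user == 'root' && !allow_docker_root_user

  :desired
rescue RunnerError => e
  warn "error-desiring-app: #{e.message}"
  raise # without this, the error is suppressed with no clear user feedback
end
```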
@acosta11 acosta11 force-pushed the feat/allow-root-user-config-flag branch from 824088f to 188d860 Compare August 12, 2025 17:36
@acosta11
Member Author

Rebased on main again and ultimately removed all of the model layer changes in favor of the user allow list. Added more explicit tests at the diego builder layer for the cases where user is '0' or absent in the Dockerfile to round out the ways by which a root user may be specified.
