Conversation

Member

@FloThinksPi FloThinksPi commented Jul 16, 2025

Click Here for a better reviewable/readable version.

Related RFC-0040

@beyhan beyhan requested review from a team, rkoster, beyhan, Gerg, stephanme and cweibel and removed request for a team July 16, 2025 15:32
@beyhan beyhan added toc rfc CFF community RFC labels Jul 16, 2025
@cweibel cweibel left a comment


I understand that the forced migration to cflinuxfs4 was hard (and it sounds like some folks have not made that jump yet). Unless the desired stack is being maintained in some way, I would be cautious about allowing that stack to be the default.

I do like the idea of being able to natively support alternate stacks (in our case, creating a "hardened" cflinuxfs4 stack), but for every additional stack provided to customers we need to make sure smoke/acceptance tests still pass.

provided one or a remote one by checking if the stack is an exact match
in the stacks table (it already does this to check the validity of the
manifest/request) and, if it is not an exact match, try to evaluate it as a
remote container image reference. If it does not match the container URL schema, produce an error message.
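The resolution order described in this snippet could be sketched as follows. This is a hypothetical illustration: the function name, the simplified reference pattern, and the return values are made up, not actual CAPI code.

```python
import re

# Simplified OCI-style reference pattern: registry host (optional port),
# repository path, then an optional tag or sha256 digest.
OCI_REF = re.compile(
    r"^[\w.\-]+(?::\d+)?/"                      # registry host, optional port
    r"[\w.\-/]+"                                # repository path
    r"(?::[\w.\-]+|@sha256:[0-9a-f]{64})?$"     # optional :tag or @digest
)

def resolve_stack(requested: str, known_stacks: set[str]) -> tuple[str, str]:
    """Return ("system", name) for an exact stacks-table match,
    ("custom", ref) for a valid image reference, else raise."""
    if requested in known_stacks:     # exact match in the stacks table
        return ("system", requested)
    if OCI_REF.match(requested):      # otherwise try it as a remote image ref
        return ("custom", requested)
    raise ValueError(
        f"'{requested}' is neither a known stack nor a valid container image reference"
    )
```

A system stack name like `cflinuxfs4` never matches the reference pattern (no registry/repository separator), so the two namespaces do not collide in this sketch.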

To make our compliance folks happy, it would be nice to generate a checksum or similar, so that the stack image used is one which has already been scanned and approved by the operators (instead of blindly relying on a URL whose content could have changes/updates/injections that would be hard to spot).

Member Author

This is already part of the Docker features. CF has a way to reference images: https://docs.cloudfoundry.org/devguide/deploy-apps/push-docker.html
The tag can already be a digest today; it is a bit hidden in the docs, as it is called "version" there.
This RFC just reuses what is already there, so this feature is included as well.
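To illustrate why a digest reference satisfies the checksum requirement: an OCI digest is the SHA-256 of the image manifest, so the reference is content-addressed. The registry name and helper below are made up for illustration.

```python
import hashlib

def digest_ref(repo: str, manifest_bytes: bytes) -> str:
    """Build a content-addressed reference from an image manifest
    (illustrative; real digests are computed by the registry/client)."""
    return f"{repo}@sha256:{hashlib.sha256(manifest_bytes).hexdigest()}"

ref_v1 = digest_ref("registry.example.com/cflinuxfs4", b"manifest-v1")
ref_v2 = digest_ref("registry.example.com/cflinuxfs4", b"manifest-v2")

# Any change to the manifest yields a different reference, so a scanned and
# approved image cannot be silently replaced behind the same reference.
assert ref_v1 != ref_v2
```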

Contributor

From a compliance point of view, it would be good to be able to force people to only use digests (as a feature flag).
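Such a digest-only restriction could look roughly like this. Everything here is hypothetical: the flag, the function, and the error wording are illustrative, not an existing CF API feature.

```python
import re

# A reference is considered pinned only if it ends in a full sha256 digest.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")

def validate_stack_ref(image_ref: str, require_digest: bool) -> None:
    """Reject tag-only references when the (hypothetical) digest-only
    feature flag is enabled."""
    if require_digest and not DIGEST_SUFFIX.search(image_ref):
        raise ValueError(
            f"stack image '{image_ref}' must be pinned by digest "
            "(digest-only enforcement is enabled)"
        )
```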

@FloThinksPi
Member Author

I do like the idea of being able to natively support alternate stacks (in our case creating a "hardened" cflinuxfs4 stack) but for every additional stack provided to customers we need to make sure smoke/acceptance tests still pass

We explicitly do not have to assure that! We only have to do it for system stacks that are shipped as part of cf-deployment. Similar to how we handle buildpacks, we only test the buildpacks we ship in cf-deployment. If a customer uses a custom buildpack (that feature has already existed for years) and thereby takes ownership of the buildpack they use (https://docs.cloudfoundry.org/buildpacks/custom.html), then it is already their obligation to make sure it is compatible with the system stack.
With custom stacks, as written in the RFC, we also require the use of a custom buildpack. It is not possible to use a custom stack with a system buildpack. Thus the app developer can take full ownership of this stack, if they require that for whatever reason, similar to what they could already do partially with custom buildpacks. It is an optional thing for a CF user to opt in to, and under no circumstances is the CF community obliged to test a custom buildpack or a custom stack.

Member

@beyhan beyhan left a comment


Please follow the RFC draft creation process and change the name of the file to rfc-draft-provide-custom-stacks-functionality.md, because our automation generates and assigns the RFC number when it is accepted and merged.

@beyhan beyhan moved this from Inbox to In Progress in CF Community Jul 22, 2025
@beyhan beyhan self-requested a review July 23, 2025 09:03
@Gerg
Member

Gerg commented Jul 29, 2025

Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this feature. While a registry dependency has existed for Docker/CNB lifecycles, this would be a first for regular buildpacks. This may be an adoption barrier, since these buildpack apps wouldn't require a registry until their stack is removed.

As a thought exercise, I could imagine an alternate implementation where the app stack is a tar file that is uploaded to the CAPI blobstore.

@FloThinksPi
Member Author

Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this feature. While a registry dependency has existed for Docker/CNB lifecycles, this would be a first for regular buildpacks. This may be an adoption barrier, since these buildpack apps wouldn't require a registry until their stack is removed.

As a thought exercise, I could imagine an alternate implementation where the app stack is a tar file that is uploaded to the CAPI blobstore.

True, I covered this in https://github.com/cloudfoundry/community/pull/1251/files#diff-b9b4cb8a848bbbf5ae034e92f8810d910e77b55cdb30882d8c111fe7f19db8bdR358-R364

Since fixing the availability issues of the docker lifecycle is another big topic, the idea was to propose a separate RFC specifically for that :)
As long as this is not fixed, we might add to this RFC that this feature flag (which defaults to off) is experimental for this reason.

@beyhan
Member

beyhan commented Jul 30, 2025

Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this feature. While a registry dependency has existed for Docker/CNB lifecycles, this would be a first for regular buildpacks. This may be an adoption barrier, since these buildpack apps wouldn't require a registry until their stack is removed.
As a thought exercise, I could imagine an alternate implementation where the app stack is a tar file that is uploaded to the CAPI blobstore.

True, i covered this in https://github.com/cloudfoundry/community/pull/1251/files#diff-b9b4cb8a848bbbf5ae034e92f8810d910e77b55cdb30882d8c111fe7f19db8bdR358-R364

Since fixing the availabillity issues for docker lifecycle is another big topic the idea was to propose another RFC for that specifically :) As long as this is not fixed we might can add to this RFC that this feature flag(which is defaulted to off) is experimental due to this reason.

To my understanding, @Gerg's concern isn't about whether registries are reliable, but about whether we should introduce that external dependency at all. Introducing a registry can impact adoption. One additional use case for this could be an air-gapped environment, where teams would be forced to maintain a private registry if they wanted to use this feature. That extra infrastructure brings operational overhead and complexity, whereas a solution that relies solely on the Cloud Foundry components already in place works out of the box and behaves predictably in both connected and disconnected environments.

@Gerg
Member

Gerg commented Aug 5, 2025

Observation: This RFC introduces a platform dependency on an external container registry, if you want to use this

...

To my understanding @Gerg concern isn’t about whether registries are reliable, but about whether we should introduce that external dependency at all. Introducing a registry can impact adoption. One additional use case for this could be an air-gapped environment where teams will be forced to maintain a private registry in case they would like to use this feature.. That extra infrastructure brings operational overhead and complexity, whereas a solution that relies solely on the Cloud Foundry components already in place works out of the box, behaves predictably in both connected and disconnected environments.

As an example scenario:

I'm a Cloud Foundry operator, and I have a number of apps running on cflinuxfs3 in my environment. I want to move over to cflinuxfs4 only, for security reasons and to keep up-to-date with the latest CF releases. I only support traditional buildpack apps on my environment (no Docker, no CNB).

Currently, my only option is to force-update them to cflinuxfs4, using something like Stack Auditor (assuming I can't get the app devs to do it), which isn't guaranteed to work.

This RFC gives me another option, but only if I have access to a container registry in my environment. I can't use public registries (e.g. Docker Hub) for security/compliance reasons. So, I'd now have to deploy/operate my own private registry in order to use this feature (and only for this feature), which is a significant barrier to entry.


It could be that most/all CF operators would already have a container registry, either for CF Docker apps, or for other platforms (e.g. Kubernetes). Maybe the above scenario is too much of an edge case to worry about in 2025.

Alternatively, this could be evidence that CF should start including a container registry as part of the "batteries included" experience (similar to how we include the WebDAV blobstore). Though, in this particular case, I'm not sure it buys us much over just storing the stacks in the existing blobstore.

@Gerg
Member

Gerg commented Aug 5, 2025

If the primary use case is for stack migration, I could imagine a UX where the stack is automatically persisted for the app, behind the scenes. Something like:

$ cf freeze-app-stack my-app

That command would take a snapshot of the stack currently used by the app and copy it to the CC blobstore. In the future, the app will use the "frozen" app stack, until the app is updated to use a regular stack.

This makes the stack migration use case more seamless, but it doesn't support use cases like app developers running custom stacks (which could be a good thing 🤔).
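That hypothetical freeze flow could be sketched like this. Nothing here exists in CAPI today; `App`, `Blobstore`, `freeze_app_stack`, and the `frozen:` stack prefix are all invented names for illustration.

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    stack: str  # e.g. "cflinuxfs3", or "frozen:<blobstore-key>" after freezing

class Blobstore:
    """Stand-in for the CC blobstore."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self._blobs[key] = data
        return key

def freeze_app_stack(app: App, stack_rootfs: bytes, blobstore: Blobstore) -> App:
    """Snapshot the app's current stack into the blobstore and repoint
    the app at the frozen copy."""
    key = blobstore.put(f"stacks/{app.name}/{app.stack}", stack_rootfs)
    app.stack = f"frozen:{key}"  # future staging uses the frozen snapshot
    return app
```

The app keeps the frozen stack until it is updated to a regular stack again, matching the UX described above.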

@rkoster
Contributor

rkoster commented Aug 6, 2025

If the primary use case is for stack migration, I could imagine a UX where the stack is automatically persisted for the app, behind the scenes. Something like:

$ cf freeze-app-stack my-app

That command would take a snapshot of the stack currently used by the app and copy it to the CC blobstore. In the future, the app will use the "frozen" app stack, until the app is updated to use a regular stack.

This makes the stack migration use case more seamless, but it doesn't support use cases like app developers running custom stacks (which could be a good thing 🤔).

I like the idea from a UX point of view, but maybe it should go even further and just freeze the whole app, meaning droplet + stack. Just freezing the stack won't help much if system buildpacks have removed support for that stack.

Basically you are taking an existing app and making an OCI image out of the stack + droplet, but storing it in the blobstore instead of an OCI registry.

@beyhan
Member

beyhan commented Aug 6, 2025

If the primary use case is for stack migration, I could imagine a UX where the stack is automatically persisted for the app, behind the scenes. Something like:

$ cf freeze-app-stack my-app

That command would take a snapshot of the stack currently used by the app and copy it to the CC blobstore. In the future, the app will use the "frozen" app stack, until the app is updated to use a regular stack.
This makes the stack migration use case more seamless, but it doesn't support use cases like app developers running custom stacks (which could be a good thing 🤔).

I like the idea from a UX point of view, but maybe it should go even further and just freeze the whole app, meaning droplet + stack. Just freezing the stack won't help much of system buildpacks have removed support for that stack.

Basically you are taking an existing app and making an OCI image out of the stack + droplet, but store it in the blobstore instead of an OCI registry.

I have concerns about freezing the app at this stage, as it would prevent any updates until the migration to the next technology stack is complete. In my experience, teams typically need the flexibility to continue updating and maintaining their applications throughout the migration process, rather than having a hard freeze in place.

@stephanme
Member

In a standard cf-deployment, you have the last 5 droplets as history. This history is kept if staging fails because of missing system buildpacks and/or because the stack was disabled (see #1220, disabled = apps continue to run but can't be staged anymore).

If a user has ignored the deprecation timeline and all announcements (happens only too often) but still "needs the flexibility to continue updating and maintaining their applications throughout the migration process", the user needs to do something with the app before the app can be staged again:

  • configure a custom stack
  • configure a custom buildpack if the buildpacks for the old stacks got already removed from the system buildpacks

The main use case that I see for custom stacks is to provide a rather quick solution for a user escalation. I don't think that this has to be effortless for the user, nor does it have to provide the same nice experience that CF usually provides for buildpack apps. And I don't see this as a permanent solution for apps to use old stacks; it is just a workaround.

That said, I think a registry based solution for custom stacks is good enough for the stack migration use case. Maintaining the custom stack in blobstore (e.g. via cf freeze-app-stack my-app or a dedicated custom stack upload) could be the next iteration and address more use cases than stack migration.

@beyhan
Member

beyhan commented Aug 13, 2025

In a standard cf-deployment, you have the last 5 droplets as history. This history is kept if staging fails because of missing system buildpacks and/or because the stack was disabled (see #1220, disabled = apps continue to run but can't be staged anymore).

If a user has ignored the deprecation timeline and all announcements (happens only too often) but still "needs the flexibility to continue updating and maintaining their applications throughout the migration process", the user needs to do something with the app before the app can be staged again:

  • configure a custom stack
  • configure a custom buildpack if the buildpacks for the old stacks got already removed from the system buildpacks

The main use case that I see for custom stacks it to provide a rather quick solution for a user escalation. I don't think that this has to be effortless for the user nor does it have to provide the same nice experience as CF usually provides for buildpack apps. And I don't see this as a permanent solution for apps to use old stacks - just as a workaround.

That said, I think a registry based solution for custom stacks is good enough for the stack migration use case. Maintaining the custom stack in blobstore (e.g. via cf freeze-app-stack my-app or a dedicated custom stack upload) could be the next iteration and address more use cases than stack migration.

I think shrinking the scope of this RFC to only the migration use case and keeping it hidden behind a feature flag will leave enough options to evolve the feature in the future, or to not use the current state at all.


##### CF API Changes

First of all, the CF API SHOULD add a new feature flag, similar to the `diego_docker` feature flag, that allows enabling the use of docker-lifecycle container images. This flag SHOULD be called `diego_custom_stacks` and be disabled by default in the CF API.
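A sketch of how staging could be gated on that flag. Only the flag names `diego_docker` and `diego_custom_stacks` come from the RFC text; the data structure, function, and error are illustrative.

```python
# Proposed defaults: both lifecycle-extension flags start disabled.
FEATURE_FLAGS = {"diego_docker": False, "diego_custom_stacks": False}

def check_staging_allowed(requested_stack: str, known_stacks: set[str]) -> None:
    """Allow system stacks always; allow custom (non-system) stacks only
    when the diego_custom_stacks flag is enabled."""
    if requested_stack in known_stacks:
        return  # system stacks are always allowed
    if not FEATURE_FLAGS["diego_custom_stacks"]:
        raise PermissionError("custom stacks are disabled by an administrator")
```

Operators would presumably toggle it the same way as `diego_docker`, i.e. via `cf enable-feature-flag` / `cf disable-feature-flag`.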
Member Author

Should we have a disable_custom_stacks feature flag then? That naming sounds like the default would be enabled. But maybe we can set the default to disable_custom_stacks = true; then at least it aligns with custom buildpacks, since the features are so similar anyway?

Should I change the RFC? @Gerg

- When we would prevent staging with a deprecated/locked/removed stack (RFC-40),
we still CAN offer users a way forward, under their full responsibility,
to be unblocked and to own the whole application stack end-to-end and
take their own decisions.
Contributor

Docker images already provide this functionality. If users want to go this route, shouldn't we just provide a way to build an image given a buildpack and a rootfs image, similar to what the pack CLI provides for Cloud Native Buildpacks?

My concerns are mostly with the fact that this proposal blurs the responsibilities between platform operators and developers. The clean boundaries that exist today are gone: yes, developers can take responsibility for their own stack, but kernel bumps and system buildpacks dropping stack support will still break them. So I prefer the idea of being really explicit about all the responsibility they take on, by forcing them to set up their own staging process using a provided tool (and output an OCI image).

Member Author

The clean boundaries still exist, I think, and follow the exact pattern that has existed for custom buildpacks: once you choose a custom buildpack, it is your obligation to update it once in a while.
This RFC just pulls this pattern through to the end, which means also owning the stack and the whole dependency chain, even for apps that do have a staging process. This RFC does not propose to provide a system buildpack that works with custom stacks; operators can decide for themselves whether they provide a system buildpack without a stack requirement. The idea is that you can have either a system buildpack with the system stack, a custom buildpack with the system stack, or a custom buildpack with a custom stack.

By the way, currently nothing guarantees that a custom buildpack will run with every system stack version either.

Indeed, docker apps would offer this full freedom. However, a customer would need to set up a CI system, rebuild the internal staging process for classical buildpacks (which is not straightforward to reverse engineer), buy a container registry, push the image, and consume it in CF. Technically it is possible, but it is very challenging for a consumer.
On top of that, the docker lifecycle is broken in the sense that when the registry hosting the image is down, your app can go down as well. That is not a process one can offer a CF user as an alternative to a stack removal, unfortunately; it is far too complex and requires internal CF knowledge.

@FloThinksPi
Member Author

From @Gerg

Alternatively, this could be evidence that CF should start including a container registry as part of the "batteries included" experience (similar to how we include the WebDAV blobstore). Though, in this particular case, I'm not sure it buys us much over just storing the stacks in the existing blobstore.

The point is, I think, that we need to include an OCI-compliant registry in CF anyway to make the docker lifecycle feature HA. Currently, when an external registry is down and a cell evacuation runs, e.g. due to a CF update, the result is downtime for the applications running on the docker lifecycle. The availability qualities of the docker lifecycle and the buildpack/CNB lifecycle are different.

Also, the blobstore in that sense is already some sort of container image registry (a very basic, custom one that is incompatible with OCI and lacks all its features). A stack is nothing else than a base layer; technically it is a container image in tar format. And a droplet is a container image in tar format without the base layers. Combined/rebased onto each other, you have something executable. How we manage these traditional images in the blobstore is also not very optimized, since the blobstore lacks the newer OCI features that optimize image usage, e.g. layering, pulling only a diff onto a cell, and lazy pulling to decrease startup times drastically, but it works well enough for us, I think.
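The layering described here can be sketched conceptually. This is purely illustrative: the function and file names are made up, and this is not how the blobstore actually stores these artifacts.

```python
def compose_image(stack_layer: str, droplet_layers: list[str]) -> list[str]:
    """Return the ordered layer list of a runnable image (base layer first),
    mirroring how a stack (base) and droplet (top) combine into something
    executable."""
    return [stack_layer] + droplet_layers

# Same droplet rebased onto a patched stack: only the base layer changes.
image = compose_image("cflinuxfs4.tar", ["droplet.tar"])
rebased = compose_image("cflinuxfs4-patched.tar", ["droplet.tar"])
```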

However, the docker lifecycle is a bit broken for productive use cases, and to fix that it may make sense to introduce a transparent container registry acting as a proxy into the system anyway. One could also use the blobstore, but then you lose all the optimizations a registry has built in.

One way or another, I think that when we allow the system to pull base layers from a container registry, the app behaves more like a docker app than a traditional one from a Diego point of view. Wherever this base layer may come from, it is not on the local filesystem like the stack is for current buildpack-lifecycle apps. As said, it is basically a docker app with a buildpack lifecycle on top. So instead of building a third workflow (buildpacks/CNBs, docker apps, custom stacks) that all work differently, I would rather rely on just the two working mechanisms that we have (the stack is precached on the local filesystem + the image is pulled by Diego at execution time).

As we need to fix docker apps anyway, imho this would also make the feature more suitable for consumers who do not have their own registry.

However, I would also note that to use this extra feature, it is, I think, acceptable to demand that an operator or CF user have a registry available. This feature is nothing else than custom buildpacks, where we do not provide a git server as part of CF either. We require the customer to have a git server themselves, to take the buildpack from public github.com, or to rely on system buildpacks. The same applies to custom stacks: they integrate well into the custom-buildpack paradigm from a CF user perspective.

The only difference is that if github/the page that hosts the buildpack is down, you cannot restage, because the buildpack cannot be pulled. If the registry of a docker image is down, the app fails to restart and can become unavailable. Thus, Diego workflows that pull an image should have a CF-local cache for container images, like the ones that exist for droplets (blobstore) and system stacks (local Diego filesystem). But this is separate from this proposal, I would say, and just influences whether this feature is flagged as stable or experimental, with the known downsides of docker apps in Diego.
