68 changes: 68 additions & 0 deletions toc/rfc/rfc-draft-cc-blobstore-storage-cli.md
# Meta
[meta]: #meta
- Name: Cloud Controller Blobstore Type: storage-cli
- Start Date: 2025-07-18
- Author(s): @johha, @stephanme
- Status: Draft <!-- Acceptable values: Draft, Approved, On Hold, Superseded -->
- RFC Pull Request: [community #1253](https://github.com/cloudfoundry/community/pull/1253)


## Summary

Add a new blobstore type `storage-cli` to the Cloud Controller that is based on the Bosh storage CLIs. Long-term, the `storage-cli` blobstore type shall replace the blobstore type `fog`.
The RFC also proposes to create a new "Storage CLI" area in the Foundational Infrastructure WG to enable cooperation between the Bosh and CAPI teams and to consolidate the Bosh storage CLIs in one repository for easier code reuse.

## Problem

Cloud Controller uses the fog gem family to interface with the blobstores of different IaaS providers like Azure, AWS, GCP, and Alibaba Cloud.
These Ruby gems are largely unmaintained, introducing risks such as:
* Dependency on deprecated SDKs (e.g. Azure SDK for Ruby has reached EOL)
* Blocking Ruby version upgrades
* Potential for unpatched CVEs

## Proposal

Bosh faced similar issues, as it is also written in Ruby and interacts with blobstores. To address this, Bosh introduced standalone CLI tools which shell out from Ruby to handle all blobstore operations:
- https://github.com/cloudfoundry/bosh-azure-storage-cli
- https://github.com/cloudfoundry/bosh-s3cli
- https://github.com/cloudfoundry/bosh-gcscli
- https://github.com/cloudfoundry/bosh-ali-storage-cli

These storage CLIs are implemented in Go and use the respective provider golang SDKs that are well supported for the foreseeable future.

Cloud Controller shall implement a new blobstore type `storage-cli` that uses the mentioned storage CLIs for blobstore operations. Missing functionality needed by the Cloud Controller shall be added to the storage CLIs in a compatible way:
> **Author:** I know of at least 2 more special connection parameters that we configure:
>
> - Google: Uniform Bucket Access
> - Azure: timeout for block writes
>
> I see two strategies how to deal with it:
>
> 1. add connection parameters as needed/requested by operators over time
> 2. add a generic config param list similar to fog that gets somehow mapped to the connection configuration in the used golang SDK per IaaS
>
> Option 1 is simpler and I would prefer it if there is a way to keep the fog implementation for some transition period. If we have to remove fog due to a Ruby version update, then option 2 might be the way to go if we see at least an indication from operators that they can't live with the reduced configuration parameter set.
>
> I would postpone a decision for option 2 until we have a storage-cli provider running for at least one IaaS in (near-)production.

> **Reviewer:** We agreed during the TOC that this is not blocking and could be decided during the implementation phase.

- missing commands such as `copy`, `list`, `properties`, `ensure-bucket-exists`
- missing configuration parameters such as GCP Uniform Bucket Access and timeout parameters

It shall be possible to switch from blobstore type `fog` to type `storage-cli` in a production Cloud Foundry installation. Once blobstore type `storage-cli` supports all four mentioned IaaS providers, the blobstore type `fog` can be removed from Cloud Controller.
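A minimal sketch of how the Cloud Controller side could shell out to such a CLI. The binary path, the `-c <config>` flag, and the command names (`put`, `get`) mirror the existing bosh storage CLIs, but the wrapper class and its interface here are assumptions for illustration, not a settled design:

```ruby
require "open3"

# Hypothetical sketch only: how Cloud Controller might wrap a storage CLI.
# The binary path, "-c <config>" flag, and command names mirror the existing
# bosh storage CLIs but are assumptions, not a final interface.
class StorageCliClient
  def initialize(cli_path:, config_path:)
    @cli_path = cli_path
    @config_path = config_path
  end

  # Uploads a local file, e.g. `storage-cli -c config.json put /tmp/pkg abc123`
  def put(local_path, blob_key)
    run("put", local_path, blob_key)
  end

  # Downloads a blob to a local file
  def get(blob_key, local_path)
    run("get", blob_key, local_path)
  end

  private

  # Shells out to the CLI and raises on a non-zero exit status
  def run(command, *args)
    stdout, stderr, status = Open3.capture3(@cli_path, "-c", @config_path, command, *args)
    raise "storage-cli #{command} failed: #{stderr}" unless status.success?
    stdout
  end
end
```

For illustration, pointing `cli_path` at `/bin/echo` simply echoes the constructed argument vector back, which makes the shell-out plumbing easy to exercise without a real blobstore.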

### Storage CLI

A new area "Storage CLI" shall be added to the Foundational Infrastructure WG to enable cooperation between the Bosh and CAPI teams:

- create a new "Storage CLI" area
- add existing approvers of areas "VM deployment lifecycle (BOSH)" (FI) and "CAPI" (ARI) as initial approvers to this new area
- move the existing 4 bosh storage CLI repos from area "VM deployment lifecycle (BOSH)" into the new area
- create a new repository `storage-cli` in this area with the goal to consolidate all existing bosh storage CLIs here
> **Reviewer:** Should this be `storage-clis` since it will host an arbitrary set of them, or is the proposal that there is one single `storage-cli` which is the interface to all supported backing blobstores?

> **Contributor:** My understanding was that there will be one `storage-cli` which supports different providers. Which provider is used could be determined via the config file, which needs to be provided as a CLI argument. E.g. for Azure:
>
> ```json
> {
>   "provider": "azure",
>   "account_name": "<string> (required)",
>   "account_key": "<string> (required)",
>   "container_name": "<string> (required)",
>   "environment": "<string> (optional, default: 'AzureCloud')"
> }
> ```
>
> So the `provider` field would be required and all other fields would be provider-specific. S3, for example, has completely different parameters (example). The format of the config can be discussed after the RFC is accepted.

> **Reviewer:** This idea makes sense. I think it would be helpful for this to be a bit more structured, so that `provider` determines the shape of a sub-section (maybe named `credentials`), leaving the top level with keys that are consistent across providers (e.g. bucket name).
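A structured variant of the config along the lines of that suggestion might look like the following. All field names below are illustrative assumptions, not a proposed final schema:

```json
{
  "provider": "azure",
  "bucket_name": "my-container",
  "credentials": {
    "account_name": "<string> (required)",
    "account_key": "<string> (required)",
    "environment": "<string> (optional, default: 'AzureCloud')"
  }
}
```

Here only `provider` and `bucket_name` would be provider-independent, while the `credentials` sub-section takes a provider-specific shape.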

> **Reviewer:** It sounds like the goal is a wholesale replacement of the existing four CLIs with a single unified CLI. I am a little worried about this becoming a complete re-write, and needing to integrate this into the various Bosh codebases in one big cut-over. If we take the approach of consolidating into a single binary with an updated interface, it makes #1253 (comment) less relevant. The git history of the individual repos should be preserved, however the consolidation takes place.

> **Reviewer:** I think there's a lot of value in having the CLIs be distinct and simple vs. trying to cram them all together with lots of different flags and configs. I do support consolidating the repos, for sure.

> **Author:** @rkoster brought up the idea to consolidate the CLIs in one repo. Arguments were:
>
> - code reuse: command line parsing, config handling (the blobstore access itself probably won't allow for any reuse)
> - easy refactoring and consistency across all CLIs
>
> Having the old CLIs and the new CLI side by side for some time decouples development, for good and for bad. From the CC point of view, we would start the implementation with Azure until it runs in production. This ensures that we get the interface and needed blobstore operations right (e.g. the POC allows resource matching from a functional point of view, but it likely won't perform, so we may need a CLI command that supports resource matching for a list of blobs). Once the Azure CLI has stabilised, we would go for the remaining IaaS. At least this was the plan.
>
> I don't see a blocker for adding the new blobstore commands to the existing bosh-azure-storage-cli in a compatible way that doesn't break Bosh. I understood that the existing CI would check this compatibility, which is good.
>
> I'm open to different ways forward, whatever fits best:
>
> 1. new repo, copy bosh-azure-storage-cli, extend for CC, same for other IaaS, Bosh switches to the new storage-cli when ready
>    - the approach currently proposed in the RFC
>    - development of CC commands is decoupled, no risk to Bosh
>    - higher effort for the later migration from old to new CLI in Bosh, risk that incompatibilities sneak in unnoticed (could be mitigated by additional CI validation in Bosh)
> 2. new repo, consolidate first, then extend for CC
>    - consolidate the CLIs first into a new storage-cli[s] repo (e.g. using the approach proposed below) without adding the new commands needed by CC; Bosh switches to the new, consolidated CLI asap
>    - development of CC commands gets initially delayed a bit but might benefit from the now-possible code reuse and easier refactoring
>    - no late surprises
> 3. no new repo, add the new commands in the existing CLIs
>
> For options 2 and 3 we should move the existing CLIs into the new area so that CAPI approvers can work on them.

> **@aramprice (Jul 28, 2025):** My vote would be for #2 above. I do not have strong feelings about whether the CLIs are unified (though I think I lean towards not unifying them), and I do feel strongly that they should be in the same repository / pipeline. The option to consolidate the CLIs remains open as a possible incremental step, or even a build-time option ¯\\_(ツ)_/¯.

> **Author:** I changed to weaker wording for the CLI consolidation, i.e. leaving the option open in the RFC as an implementation detail. We should then move the 4 existing storage CLIs into the new area and go from there. A pragmatic way forward could be to implement the missing operations in bosh-azure-storage-cli, validate the approach with an adapted CC (CATS, perf tests), then consolidate into one repo and implement for the remaining IaaS. But we can sort this out in a WG meeting.

> **Reviewer:** I'm all for moving the CLIs to an area that's accessible to the CAPI folks. I would like the CLIs to be consolidated into a single repo as part of this move.

- setup CI, consolidate CLIs into the new `storage-cli` repo, implement missing commands and configuration parameters for each IaaS

### Bosh

- eventually switch from (old) bosh storage CLIs to consolidated `storage-cli`
- finally archive the old bosh storage CLI repos

### Cloud Controller

- add a new blobstore type `storage-cli` that shells out to `storage-cli` for blobstore operations
- validate functionality with CATS
- benchmark blobstore operation performance and compare with blobstore type `fog`, enhance performance tests where necessary
- eventually deprecate and remove the blobstore type `fog` once all IaaS providers are covered

### cf-deployment

- add experimental ops files per IaaS provider for using the `storage-cli` blobstore type
- eventually promote those ops files and replace the existing fog-based blobstore ops files

## Additional Information

- [cloud_controller_ng #4443](https://github.com/cloudfoundry/cloud_controller_ng/pull/4443) - ADR: Use Bosh Storage CLIs for Blobstore Operations