Mark Bainter edited this page Feb 7, 2026 · 1 revision

Note

This is a living document intended to reflect my vision for how this tool will function. It is not fixed, and will likely change radically as I think through the challenges I'm trying to solve.

Constraints

Simplicity of Configuration

I really do not want to create my own DSL for this. I want this tool to be reasonably simple to adopt for a range of users, and I don't want adopting it to require a whole new skill. For example, where resources such as "rulesets" can be exported, the tool should be able to apply those exported rulesets directly, using some form of standard templating language to fill in dynamic information such as users.
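As a rough illustration of the templating idea, here is a minimal sketch using Python's standard `string.Template`. The ruleset body, its field names, the `$release_team_id` placeholder, and the `render_ruleset` helper are all assumptions made for the example; they are not a confirmed export format or part of the tool.

```python
import json
from string import Template

# Hypothetical exported ruleset. The field names are illustrative only, and
# the $release_team_id placeholder was added by hand after export.
EXPORTED_RULESET = """
{
  "name": "protect-main",
  "target": "branch",
  "enforcement": "active",
  "bypass_actors": [
    {"actor_id": $release_team_id, "actor_type": "Team", "bypass_mode": "always"}
  ]
}
"""


def render_ruleset(template_text: str, values: dict) -> dict:
    """Fill in the placeholders, then parse the result as JSON.

    Template.substitute raises KeyError if a placeholder has no value,
    which surfaces a missing input before anything is applied.
    """
    rendered = Template(template_text).substitute(values)
    return json.loads(rendered)


ruleset = render_ruleset(EXPORTED_RULESET, {"release_team_id": 1234})
```

The point of the sketch is that the exported document stays recognizable: someone who knows the export format only has to learn the placeholder syntax.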

Otherwise we should leverage existing ways of defining configuration that are easy to understand: for example, an approach similar to safe-settings, or at least field names that mirror the API, so that it is easy to understand what is happening and map it to the functionality being manipulated.

Minimal state management

To handle events such as a resource being renamed in the web UI we will have to track some state, but at most this should be the mapping from a resource's unique identifier to its definition in policy. If a user renames a group and then changes the code, we should be able to recognize that and not make a mess of things. But if a user changes a setting unrelated to defined policy, we should not care.
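A minimal sketch of what that state could look like, assuming each policy definition has a stable key and each live resource has a numeric identifier. The `plan` function, the dictionary shapes, and the action names are hypothetical and only illustrate how an ID mapping separates a rename from a delete-and-create.

```python
def plan(policy: dict[str, str], state: dict[str, int], live: dict[int, str]) -> dict[str, str]:
    """Decide one action per policy definition.

    policy: policy key -> desired resource name
    state:  policy key -> resource ID recorded on an earlier run
    live:   resource ID -> name currently reported by the platform
    """
    actions = {}
    for key, desired_name in policy.items():
        resource_id = state.get(key)
        if resource_id is None or resource_id not in live:
            # Nothing tracked for this definition, or the resource is gone.
            actions[key] = "create"
        elif live[resource_id] != desired_name:
            # Same resource, different name: rename rather than recreate.
            actions[key] = "rename"
        else:
            actions[key] = "noop"
    return actions


# A team renamed in the web UI, with the policy updated to match afterwards.
actions = plan(
    policy={"platform-team": "platform-eng"},
    state={"platform-team": 42},
    live={42: "platform-eng"},
)
```

Because only the key-to-ID mapping is stored, settings that policy does not mention never enter the state at all.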

Configuration by Policy

For this I think the answer is probably to design an effective CUE schema for GitHub resources, and then a Policy.cue that can be used to set expectations for the configuration, with appropriate deploy definitions for resources. This should then be usable both for checking a given configuration and for an apply run that brings a resource into compliance if required.
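To show how one definition can drive both a check and an apply run, here is a small sketch. It is written in Python only to keep the examples in one language and stands in for what a CUE policy would express; the `check` and `apply` helpers and the setting names are assumptions, not part of any schema.

```python
def check(expected: dict, actual: dict) -> dict:
    """Return {field: (actual_value, expected_value)} for each drifted field.

    Only fields named in the policy are compared, so settings the policy
    does not mention are ignored.
    """
    return {
        field: (actual.get(field), want)
        for field, want in expected.items()
        if actual.get(field) != want
    }


def apply(expected: dict, actual: dict) -> dict:
    """Return a copy of the configuration with drifted fields corrected."""
    drift = check(expected, actual)
    return {**actual, **{field: want for field, (_, want) in drift.items()}}


# Hypothetical repository settings; the names are illustrative only.
expected = {"has_wiki": False, "delete_branch_on_merge": True}
actual = {"has_wiki": True, "delete_branch_on_merge": True, "description": "demo"}

drift = check(expected, actual)
fixed = apply(expected, actual)
```

In this shape, check mode reports `drift` and stops, while apply mode uses the same comparison to build the change.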

Note

OPA might also be an option here -- but OPA is somewhat difficult for some folks to reason about.

Governance by Policy / Tests

There can optionally be policy defined for outcomes at the enterprise and/or organization level, such as:

  • "All organizations must have X set"
  • "All repositories must grant read access to "
  • "Copilot licenses must only be granted at Enterprise level"
  • "Only should be repository admins outside of user spaces"

The goal here is to provide a definition of the outcomes that the organization wants to be true, and to ensure that changes do not violate those goals unintentionally. This could perhaps come with some standardized library capabilities, tied to something like Cucumber, that would allow those outcomes to be written in more expressive ways.
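One way to picture outcome rules like those listed above is as named predicates evaluated against a proposed end state. This is only a sketch: the data shape, the `example_setting` name (a stand-in for "X"), the `copilot_seats` field, and the `violations` helper are hypothetical.

```python
from typing import Callable

# A rule pairs a human-readable outcome with a predicate over the state.
Rule = tuple[str, Callable[[dict], bool]]

RULES: list[Rule] = [
    (
        "All organizations must have example_setting set",
        lambda ent: all(org["settings"].get("example_setting") is True for org in ent["orgs"]),
    ),
    (
        "Copilot licenses must only be granted at Enterprise level",
        lambda ent: all(not org.get("copilot_seats") for org in ent["orgs"]),
    ),
]


def violations(enterprise: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return the description of every rule the proposed state breaks."""
    return [description for description, holds in rules if not holds(enterprise)]


proposed = {
    "orgs": [
        {"name": "org-a", "settings": {"example_setting": True}, "copilot_seats": []},
        {"name": "org-b", "settings": {}, "copilot_seats": ["some-user"]},
    ]
}
broken = violations(proposed)
```

Keeping the description next to the predicate means a failed check can report the outcome in the organization's own words.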

OPA would probably be well suited to this, and it's the way I would probably solve this if it were just for me, but I think it will be helpful for more people if I use a more approachable solution.

Concepts and Implementation

Core Infrastructure

One or more GitHub repositories define a high-level governing policy, not only for GitHub but also for custodian itself. This defines how and when other sources of configuration and policy are authorized and applied, delegating authority explicitly.
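Explicit delegation could be sketched as a table in the core policy that maps each downstream source to the scopes it may manage. The repository names, the scope paths, and the `is_authorized` helper are invented for the example.

```python
# Hypothetical delegation table: source repository -> scopes it may manage.
DELEGATIONS = {
    "example-enterprise/org-a-policy": ["org-a"],
    "example-enterprise/team-x-policy": ["org-a/repo-1", "org-a/repo-2"],
}


def is_authorized(source: str, target: str, delegations: dict = DELEGATIONS) -> bool:
    """True if `source` was delegated `target` or a scope that contains it."""
    for scope in delegations.get(source, []):
        # Match the scope itself or anything nested under it; the trailing
        # slash stops "org-a" from matching "org-ab".
        if target == scope or target.startswith(scope + "/"):
            return True
    return False
```

Anything not listed is denied by default, which matches the idea that authority exists only where it was delegated.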

Changes to these repositories run the custodian tool in check mode. After the first pass, it also pulls in all downstream sources of configuration to validate that the outcomes are expected, and then the core part is packaged up into policy and deployed as a versioned and signed artifact. Think of this like the integration step in a CI pipeline, ensuring that the changes to global policy will still have the desired outcomes downstream.
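The "versioned and signed artifact" step might look roughly like this. The sketch uses an HMAC from the Python standard library purely to stay self-contained; a real pipeline would more likely use asymmetric signatures, and the `package` and `verify` names and the artifact layout are assumptions.

```python
import hashlib
import hmac
import json


def package(policy: dict, version: str, key: bytes) -> dict:
    """Serialize the policy deterministically and attach a signature."""
    payload = json.dumps({"version": version, "policy": policy}, sort_keys=True)
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify(artifact: dict, key: bytes) -> dict:
    """Return the payload contents, or raise if the signature does not match."""
    expected = hmac.new(key, artifact["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, artifact["signature"]):
        raise ValueError("signature mismatch")
    return json.loads(artifact["payload"])


key = b"example-key-for-illustration-only"
artifact = package({"orgs": {"example_setting": True}}, "1.0.0", key)
contents = verify(artifact, key)
```

Downstream jobs would call something like `verify` before trusting the artifact they pulled.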

Top-Level Application

Jobs are triggered by changes in the core repositories, run on a schedule, or both. They pull the policy artifact and walk the enterprise, organizations, teams, repositories, etc. as needed to apply the changes. This should be smart, capable of splitting steps across multiple iterations as it converges towards the desired end state.

As the job moves down the tree it is aware of other sources of configuration and policy, and pulls them in as needed. Ideally there is a break point here somewhere around the organization level that is the lower boundary of this job. However, this may be tricky to implement in some cases, where changes to the leaf nodes (like teams and repositories) might be required before core changes can be made.

This convergence will be the most difficult aspect of this implementation.
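To make the difficulty concrete, here is a deliberately simple sketch of convergence over several passes, assuming each step can declare which other steps must finish first. The step names and the `converge` helper are hypothetical, and real steps would also have to cope with partial failures and with state changing between passes.

```python
def converge(steps: dict[str, set[str]], max_passes: int = 10) -> list[list[str]]:
    """Group steps into passes so each step runs after its prerequisites.

    steps maps a step name to the set of step names it depends on.
    """
    done: set[str] = set()
    passes: list[list[str]] = []
    for _ in range(max_passes):
        ready = sorted(
            name for name, needs in steps.items()
            if name not in done and needs <= done
        )
        if not ready:
            break  # finished, or blocked by a cycle or unknown prerequisite
        passes.append(ready)
        done.update(ready)
    if done != set(steps):
        raise RuntimeError(f"did not converge; stuck on {sorted(set(steps) - done)}")
    return passes


# A leaf-level change that must land before a core change can be made.
order = converge({
    "org:set-base-permission": {"repo:migrate-collaborators"},
    "repo:migrate-collaborators": set(),
    "team:create-maintainers": set(),
})
```

Here the repository-level step lands in the first pass and the organization-level step in the second, which is the leaf-before-core ordering described above.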

Remaining application

From there, other authorized sources may be triggered by the change -- for example suborgs, and from there repositories depending on how this is implemented.

More commonly, other tree or leaf nodes (orgs, suborgs, repos) with localized configuration or policy will update on changes to their own configuration or policy. Those also run with checks that pull the global policy to validate outcomes. These are similarly packaged and deployed as signed artifacts.

Important

One key problem here will be the relationship between multiple sources, including handling authorized version updates, change timing, etc.

Misc Thoughts

This might be best managed via a standardized coordinated workflow. That way for a simple org, the pipeline is simple. For complex orgs, it's complex.
