Commit 9d6ed99
Introduce a policy-driven quota system w/ real-time enforcement (#322)
## Summary
Adds a first-class, policy-driven quota system to Milo: a v1alpha1 API
surface for registering quota-managed resource types, allocating
capacity to consumers, creating claims at admission time, and enforcing
limits. Policies (CEL-based) automate both grant and claim creation.
Real-time enforcement occurs in the API server admission chain, with
decisions computed by a single-writer AllowanceBucket controller.
Tracing and Prometheus metrics are included.
## API Surface (v1alpha1)
See **[Quota API types and documentation]** for a detailed description
of the API. In brief:
- **ResourceRegistration** — declares a quota-managed resource type
(units, name-gen, constraints; default-deny surface).
- **ResourceGrant** — allocates capacity (static allowances) to a
consumer; only `Active=True` grants count.
- **AllowanceBucket** — aggregated view per consumer+resourceType:
`limit`, `allocated`, `available`, counts, and `contributingGrantRefs`.
- **ResourceClaim** — request for capacity (static amounts) evaluated
per create; status conveys `Granted` vs `Denied` with reasons.
- **GrantCreationPolicy** — watches “trigger” resources; when CEL
conditions match, renders and creates/updates/deletes `ResourceGrant`s
(supports hierarchical parent context).
- **ClaimCreationPolicy** — matches incoming creates by GVK + CEL;
renders a `ResourceClaim` used by admission for request-time
enforcement.
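
To make the relationship between grants and bucket fields concrete, here is a minimal Go sketch. The `Grant` and `Bucket` types and the `aggregate` function are hypothetical simplifications, not the v1alpha1 types. It also assumes `available = limit - allocated` (floored at zero), which the field names suggest but this summary does not state.

```go
package main

import "fmt"

// Grant is a hypothetical, simplified stand-in for a ResourceGrant.
type Grant struct {
	Name   string
	Amount int64
	Active bool // mirrors the Active=True condition
}

// Bucket is a hypothetical, simplified stand-in for AllowanceBucket status.
type Bucket struct {
	Limit                 int64
	Allocated             int64
	Available             int64
	ContributingGrantRefs []string
}

// aggregate sums capacity from active grants only and derives Available,
// assuming available = limit - allocated, floored at zero.
func aggregate(grants []Grant, allocated int64) Bucket {
	b := Bucket{Allocated: allocated}
	for _, g := range grants {
		if !g.Active {
			continue // inactive grants do not count toward the limit
		}
		b.Limit += g.Amount
		b.ContributingGrantRefs = append(b.ContributingGrantRefs, g.Name)
	}
	if b.Limit > b.Allocated {
		b.Available = b.Limit - b.Allocated
	}
	return b
}

func main() {
	b := aggregate([]Grant{
		{Name: "base", Amount: 10, Active: true},
		{Name: "pending", Amount: 5, Active: false},
	}, 4)
	fmt.Printf("limit=%d allocated=%d available=%d grants=%v\n",
		b.Limit, b.Allocated, b.Available, b.ContributingGrantRefs)
}
```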
## Architecture & Flow
See the **[quota system architecture overview]** for diagrams and
end-to-end flow.
1. **Register types**: Operators create **ResourceRegistration** for
each quota-managed type.
2. **Provision capacity**: Operators/policies create **ResourceGrant**s
for consumers; a controller validates and sets `Active=True`.
3. **Aggregate state**: **AllowanceBucketController**
materializes/updates per-consumer buckets and becomes the **single
writer** of allocation fields (SSA field ownership).
4. **Enforce at admission (create)**: The **Admission Plugin**
intercepts object **create**, finds `Ready=True`
**ClaimCreationPolicy**(ies), renders a **ResourceClaim** (labeled
`auto-created=true`), waits for the bucket’s decision, and
allows/denies.
5. **Lifecycle maintenance**: If **Granted**, the **Ownership
Controller** sets ownerReferences to the created object; if **Denied**
and auto-created, the **Cleanup Controller** deletes the claim on a
safety timer.
## Components
### Policy evaluation & automation
- **Policy Engine** — CELEngine (conditions + name expr), TemplateEngine
(CEL-template rendering for claims/grants), in-memory cache for O(1)
policy lookups during admission; metrics for informer/workqueue
performance. See **[Policy engine for template rendering and
evaluation]**.
- **Policy Controllers & Dynamic Informers** — Ready/validation
controllers for Claim/Grant policies and a dynamic informer framework to
watch trigger GVKs at runtime; includes a **Grant Creation Executor**
and **ParentContext Resolver** for hierarchical targeting. See **[Policy
controllers and dynamic informer framework]**.
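
One way an in-memory cache can give constant-time policy lookups during admission is to index policies by the GVK they target. The sketch below is an illustration under that assumption only; `GVK`, `Policy`, and `PolicyCache` are hypothetical names, and the real engine may index differently.

```go
package main

import (
	"fmt"
	"sync"
)

// GVK is a hypothetical key type; a real implementation would more likely
// use schema.GroupVersionKind from k8s.io/apimachinery.
type GVK struct {
	Group, Version, Kind string
}

// Policy is a hypothetical, simplified ClaimCreationPolicy.
type Policy struct {
	Name  string
	Ready bool // mirrors the Ready=True condition
}

// PolicyCache indexes policies by target GVK so a lookup is a map read.
type PolicyCache struct {
	mu    sync.RWMutex
	byGVK map[GVK][]Policy
}

func NewPolicyCache() *PolicyCache {
	return &PolicyCache{byGVK: make(map[GVK][]Policy)}
}

// Set replaces the policies for a GVK, e.g. from an informer event handler.
func (c *PolicyCache) Set(gvk GVK, policies []Policy) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byGVK[gvk] = policies
}

// ReadyFor returns only the Ready policies registered for a GVK.
func (c *PolicyCache) ReadyFor(gvk GVK) []Policy {
	c.mu.RLock()
	defer c.mu.RUnlock()
	var out []Policy
	for _, p := range c.byGVK[gvk] {
		if p.Ready {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cache := NewPolicyCache()
	gvk := GVK{Group: "example.test", Version: "v1", Kind: "Widget"}
	cache.Set(gvk, []Policy{{Name: "widgets", Ready: true}, {Name: "draft", Ready: false}})
	fmt.Println(cache.ReadyFor(gvk))
}
```

The read-write mutex lets many admission requests read concurrently while informer-driven updates take the write lock briefly.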
### Core enforcement
- **AllowanceBucketController** — centralizes limit/consumption
aggregation and produces grant/deny decisions; owns allocation fields
via **SSA**.
- **ResourceClaimController** — aggregates allocation status into a
clear `Granted` condition (separate from bucket writer).
- **ResourceGrantController** — validates grants against active
registrations; sets `Active` condition.
- **ResourceRegistrationController** — marks registrations `Active` once
valid. See **[Core quota controllers]**.
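
The single-writer idea can be sketched as one code path that both checks capacity and records the allocation, so two claims can never be granted against the same remaining headroom. The `Claim` and `Bucket` types below are hypothetical and omit SSA, conditions, and multi-resource claims.

```go
package main

import "fmt"

// Claim is a hypothetical, simplified ResourceClaim for one resource type.
type Claim struct {
	Name   string
	Amount int64
}

// Bucket is a hypothetical, simplified AllowanceBucket.
type Bucket struct {
	Limit     int64
	Allocated int64
}

// decide grants the claim only if it fits in the remaining capacity and, if
// so, records the allocation. It is meant to be called from a single writer
// (one goroutine per bucket), so it does no locking of its own.
func (b *Bucket) decide(c Claim) bool {
	if c.Amount < 0 || b.Allocated+c.Amount > b.Limit {
		return false
	}
	b.Allocated += c.Amount
	return true
}

func main() {
	b := &Bucket{Limit: 5}
	for _, c := range []Claim{{"a", 3}, {"b", 3}, {"c", 2}} {
		fmt.Printf("%s granted=%v allocated=%d\n", c.Name, b.decide(c), b.Allocated)
	}
}
```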
### Admission
- **Claim-Creation Admission Plugin** — positioned after authz and
before storage; matches policies, renders claims, **blocks** until
decision or timeout; uses a shared informer/watch to avoid N×
connections; exposes Prometheus metrics for latency, waiters, results.
See **[Admission plugin for request-time enforcement]**.
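
The "block until decision or timeout" behavior with a shared watch can be approximated by a registry of per-claim channels fed by one watcher. This is a sketch under that assumption; `Waiters`, `Wait`, and `demoTimeout` are hypothetical names, and the real plugin's timeout and keying are not described here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Decision is a hypothetical outcome type for a claim.
type Decision string

const (
	Granted Decision = "Granted"
	Denied  Decision = "Denied"

	demoTimeout = 200 * time.Millisecond // illustrative value only
)

var ErrTimeout = errors.New("timed out waiting for quota decision")

// Waiters fans results out from one shared watch to many blocked requests,
// so each request does not need its own connection.
type Waiters struct {
	mu sync.Mutex
	ch map[string]chan Decision // keyed by claim name
}

func NewWaiters() *Waiters { return &Waiters{ch: make(map[string]chan Decision)} }

// Register should be called before the claim is created so that a fast
// decision cannot be missed.
func (w *Waiters) Register(claim string) <-chan Decision {
	w.mu.Lock()
	defer w.mu.Unlock()
	c := make(chan Decision, 1) // buffered so Notify never blocks
	w.ch[claim] = c
	return c
}

// Notify is what the shared watch handler would call on a status change.
func (w *Waiters) Notify(claim string, d Decision) {
	w.mu.Lock()
	c, ok := w.ch[claim]
	delete(w.ch, claim)
	w.mu.Unlock()
	if ok {
		c <- d
	}
}

// Unregister drops a waiter that gave up.
func (w *Waiters) Unregister(claim string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	delete(w.ch, claim)
}

// Wait blocks until a decision arrives or the timeout elapses.
func Wait(c <-chan Decision, timeout time.Duration) (Decision, error) {
	t := time.NewTimer(timeout)
	defer t.Stop()
	select {
	case d := <-c:
		return d, nil
	case <-t.C:
		return "", ErrTimeout
	}
}

func main() {
	w := NewWaiters()
	c := w.Register("claim-a")
	go w.Notify("claim-a", Granted) // simulates the watch observing a decision
	d, err := Wait(c, demoTimeout)
	fmt.Println(d, err)

	c2 := w.Register("claim-b") // no decision ever arrives for this claim
	_, err = Wait(c2, demoTimeout)
	w.Unregister("claim-b")
	fmt.Println(err)
}
```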
### Lifecycle hygiene
- **DeniedAutoClaimCleanupController** — deletes denied, auto-created
claims after a grace period; preserves manual claims.
- **ResourceClaimOwnershipController** — sets ownerReferences for
granted claims; dual-path (fast path vs. safety-net grace). See **[Claim
lifecycle controllers]**.
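
The cleanup rule (delete denied, auto-created claims after a grace period; leave manual claims alone) reduces to a small predicate. The sketch below uses a hypothetical `Claim` struct and a made-up `defaultGrace`; the actual grace period and how the denial time is recorded are not specified in this summary.

```go
package main

import (
	"fmt"
	"time"
)

// defaultGrace is an illustrative value, not the controller's real setting.
const defaultGrace = 5 * time.Minute

// Claim is a hypothetical, simplified ResourceClaim.
type Claim struct {
	Name        string
	AutoCreated bool // mirrors the auto-created=true label
	Denied      bool
	DeniedAt    time.Time
}

// shouldDelete reports whether a claim is eligible for cleanup: it must be
// auto-created, denied, and denied for at least the grace period.
func shouldDelete(c Claim, now time.Time, grace time.Duration) bool {
	if !c.AutoCreated || !c.Denied {
		return false // manual or non-denied claims are preserved
	}
	return now.Sub(c.DeniedAt) >= grace
}

func main() {
	now := time.Now()
	claims := []Claim{
		{Name: "auto-old", AutoCreated: true, Denied: true, DeniedAt: now.Add(-10 * time.Minute)},
		{Name: "auto-new", AutoCreated: true, Denied: true, DeniedAt: now.Add(-1 * time.Minute)},
		{Name: "manual", AutoCreated: false, Denied: true, DeniedAt: now.Add(-10 * time.Minute)},
	}
	for _, c := range claims {
		fmt.Printf("%s delete=%v\n", c.Name, shouldDelete(c, now, defaultGrace))
	}
}
```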
### Integration, config, and observability
- **System Integration** — registers plugin + all controllers, ensures
initialization order and graceful shutdown. See **[Integrate quota into
apiserver and controller manager]**.
- **System Configurations** — controller/plugin flags, RBAC roles, and
controller-manager wiring; explicit metrics coverage. See **[Quota
system configurations]**.
- **Tracing** — API server tracing support (OpenTelemetry) for admission
path visibility. See **[Distributed tracing support for API server]**.
### Validation & docs
- **Shared Validation Package** — unified CEL env, resource type
validation via informer cache, and template validators; reused by
engine, controllers, and admission. See **[Validation package for quota
resources]**.
- **CEL rule placement** — CEL checks on resource refs moved from CRD
schemas to controller/admission validation for flexibility. See
**[Remove CRD-level CEL validations on resource refs]**.
- **Docs & E2E** — architecture + API docs and end-to-end tests covering
dynamic informers, policy validation, grant automation, bucket
decisions, and admission wait/deny paths. See **[End-to-end tests for
quota system]**.
## Out of Scope (Not implemented in this series)
- **Usage-based / metered quotas** (time-series, rolling windows,
rate-limits).
- **Overage handling & burst semantics** (grace, debt, preemption).
- **Autoscaling of buckets or sharded accounting stores**.
- **UI/UX surfaces** (dashboards, self-service management).
- **Billing integration** (prices, entitlements, invoicing).
- **Policy catalogs/marketplace** beyond the policy CRDs themselves.
### Links
- #283
- #295
- #305
- #307
- #306
- #308
- #309
- #310
- #311
- #312
- #313
- #315
- #316
- #330
- #333
### Improvements
These improvements were made to the original set of PRs based on
feedback received or issues encountered during testing.
- #340
- #342
- #330
- #346
- #351
- #353
### Outstanding Items
These items were identified during review of the PRs above and need to
be addressed in follow-up PRs.
- [X] Convert controllers to be multi-cluster aware
- [X] Update admission plugin to support managing claims in project
control planes

[Quota API types and documentation]: #283
[Quota system architecture overview]: #295
[Validation package for quota resources]: #305
[Policy engine for template rendering and evaluation]: #307
[Policy controllers and dynamic informer framework]: #306
[Claim lifecycle controllers]: #308
[Core quota controllers]: #309
[Admission plugin for request-time enforcement]: #310
[Integrate quota into apiserver and controller manager]: #311
[Distributed tracing support for API server]: #312
[Quota system configurations]: #313
[End-to-end tests for quota system]: #315
[Operational visibility section]: #316
[Remove CRD-level CEL validations on resource refs]: #330
[Remove k8s native resource quota system]: https://github.com/datum-cloud/milo/pull/333

File tree (239 files changed, +29447 -242 lines changed):
- cmd/milo
- apiserver
- controller-manager
- config
- apiserver
- components
- apiserver-tracing
- prometheus-monitoring
- controller-manager
- base
- overlays/core-control-plane/rbac
- crd
- bases/quota
- overlays/core-control-plane
- overlays/test-infra
- patches
- services
- quota
- iam
- protected-resources
- roles
- telemetry
- metrics/control-plane
- telemetry
- recording-rules
- quota
- resource-metrics-collector
- docs
- api
- architecture
- internal
- apiserver/storage/project
- controllers/resourcemanager
- informer
- quota
- admission
- cel
- controllers
- core
- lifecycle
- policy
- engine
- templateutil
- validation
- pkg
- apis/quota
- v1alpha1
- multicluster-runtime/milo
- test
- admission/validating-admission-policy
- group
- iam/user-deletion-garbage-collection
- quota
- core-functionality
- assertions
- test-data
- enforcement-edge-cases
- test-data
- grant-creation-policy
- assertions
- test-data
- multi-cluster-enforcement
- test-data
- multi-resource-claims
- assertions
- test-data
- registration-validation
- assertions
- test-data
- resource-management
- organizationmembership-fieldselector
- project-creation