Commit 9d6ed99

Introduce a policy-driven quota system w/ real-time enforcement (#322)
## Summary

Adds a first-class, policy-driven quota system to Milo: a v1alpha1 API surface for registering quota-managed resource types, allocating capacity to consumers, creating claims at admission time, and enforcing limits. Policies (CEL-based) automate both grant and claim creation. Real-time enforcement occurs in the API server admission chain, with decisions computed by a single-writer AllowanceBucket controller. Tracing and Prometheus metrics are included.

## API Surface (v1alpha1)

See **[Quota API types and documentation]** for a detailed overview of the API. Here's the brief overview:

- **ResourceRegistration**: declares a quota-managed resource type (units, name-gen, constraints; default-deny surface).
- **ResourceGrant**: allocates capacity (static allowances) to a consumer; only `Active=True` grants count.
- **AllowanceBucket**: aggregated view per consumer+resourceType: `limit`, `allocated`, `available`, counts, and `contributingGrantRefs`.
- **ResourceClaim**: request for capacity (static amounts) evaluated per create; status conveys `Granted` vs `Denied` with reasons.
- **GrantCreationPolicy**: watches “trigger” resources; when CEL conditions match, renders and creates/updates/deletes `ResourceGrant`s (supports hierarchical parent context).
- **ClaimCreationPolicy**: matches incoming creates by GVK + CEL; renders a `ResourceClaim` used by admission for request-time enforcement.

## Architecture & Flow

See the **[quota system architecture overview]** for diagrams and end-to-end flow.

1. **Register types**: Operators create **ResourceRegistration** for each quota-managed type.
2. **Provision capacity**: Operators/policies create **ResourceGrant**s for consumers; a controller validates and sets `Active=True`.
3. **Aggregate state**: **AllowanceBucketController** materializes/updates per-consumer buckets and becomes the **single writer** of allocation fields (SSA field ownership).
4. **Enforce at admission (create)**: The **Admission Plugin** intercepts object **create**, finds `Ready=True` **ClaimCreationPolicy**(ies), renders a **ResourceClaim** (labeled `auto-created=true`), waits for the bucket’s decision, and allows/denies.
5. **Lifecycle maintenance**: If **Granted**, the **Ownership Controller** sets ownerReferences to the created object; if **Denied** and auto-created, the **Cleanup Controller** deletes the claim on a safety timer.

## Components

### Policy evaluation & automation

- **Policy Engine**: CELEngine (conditions + name expr), TemplateEngine (CEL-template rendering for claims/grants), in-memory cache for O(1) policy lookups during admission; metrics for informer/workqueue performance. See **[Policy engine for template rendering and evaluation]**.
- **Policy Controllers & Dynamic Informers**: Ready/validation controllers for Claim/Grant policies and a dynamic informer framework to watch trigger GVKs at runtime; includes a **Grant Creation Executor** and **ParentContext Resolver** for hierarchical targeting. See **[Policy controllers and dynamic informer framework]**.

### Core enforcement

- **AllowanceBucketController**: centralizes limit/consumption aggregation and produces grant/deny decisions; owns allocation fields via **SSA**.
- **ResourceClaimController**: aggregates allocation status into a clear `Granted` condition (separate from bucket writer).
- **ResourceGrantController**: validates grants against active registrations; sets `Active` condition.
- **ResourceRegistrationController**: marks registrations `Active` once valid.

See **[Core quota controllers]**.

### Admission

- **Claim-Creation Admission Plugin**: positioned after authz and before storage; matches policies, renders claims, **blocks** until decision or timeout; uses a shared informer/watch to avoid N× connections; exposes Prometheus metrics for latency, waiters, results.

See **[Admission plugin for request-time enforcement]**.
### Lifecycle hygiene

- **DeniedAutoClaimCleanupController**: deletes denied, auto-created claims after a grace period; preserves manual claims.
- **ResourceClaimOwnershipController**: sets ownerReferences for granted claims; dual-path (fast path vs. safety-net grace).

See **[Claim lifecycle controllers]**.

### Integration, config, and observability

- **System Integration**: registers plugin + all controllers, ensures initialization order and graceful shutdown. See **[Integrate quota into apiserver and controller manager]**.
- **System Configurations**: controller/plugin flags, RBAC roles, and controller-manager wiring; explicit metrics coverage. See **[Quota system configurations]**.
- **Tracing**: API server tracing support (OpenTelemetry) for admission path visibility. See **[Distributed tracing support for API server]**.

### Validation & docs

- **Shared Validation Package**: unified CEL env, resource type validation via informer cache, and template validators; reused by engine, controllers, and admission. See **[Validation package for quota resources]**.
- **CEL rule placement**: CEL checks on resource refs moved from CRD schemas to controller/admission validation for flexibility. See **[Remove CRD-level CEL validations on resource refs]**.
- **Docs & E2E**: architecture + API docs and end-to-end tests covering dynamic informers, policy validation, grant automation, bucket decisions, and admission wait/deny paths. See **[End-to-end tests for quota system]**.

## Out of Scope (Not implemented in this series)

- **Usage-based / metered quotas** (time-series, rolling windows, rate-limits).
- **Overage handling & burst semantics** (grace, debt, preemption).
- **Autoscaling of buckets or sharded accounting stores**.
- **UI/UX surfaces** (dashboards, self-service management).
- **Billing integration** (prices, entitlements, invoicing).
- **Policy catalogs/marketplace** beyond the policy CRDs themselves.

### Links

- #283
- #295
- #305
- #307
- #306
- #308
- #309
- #310
- #311
- #312
- #313
- #315
- #316
- #330
- #333

### Improvements

These improvements were made to the original set of PRs based on feedback received or issues encountered during testing.

- #340
- #342
- #330
- #346
- #351
- #353

### Outstanding Items

These items were identified during review of the PRs above and need to be addressed in follow-up PRs.

- [X] Convert controllers to be multi-cluster aware
- [X] Update admission plugin to support managing claims in project control planes

[Quota API types and documentation]: #283
[Quota system architecture overview]: #295
[Validation package for quota resources]: #305
[Policy engine for template rendering and evaluation]: #307
[Policy controllers and dynamic informer framework]: #306
[Claim lifecycle controllers]: #308
[Core quota controllers]: #309
[Admission plugin for request-time enforcement]: #310
[Integrate quota into apiserver and controller manager]: #311
[Distributed tracing support for API server]: #312
[Quota system configurations]: #313
[End-to-end tests for quota system]: #315
[Operational visibility section]: #316
[Remove CRD-level CEL validations on resource refs]: #330
[Remove k8s native resource quota system]: https://github.com/datum-cloud/milo/pull/pull/333
2 parents b47a0e2 + 6eb17ae commit 9d6ed99

239 files changed, +29447 -242 lines changed

.dockerignore

Lines changed: 3 additions & 0 deletions
@@ -22,6 +22,9 @@ dev-temp/

 # Configuration and manifests
 config/
+# Exception: CRD bootstrap package needs to be included for embedded CRDs
+!config/crd/bootstrap.go
+!config/crd/bases/
 Taskfile.yaml
 Taskfile.yml
 docker-compose.yaml

Taskfile.yaml

Lines changed: 53 additions & 27 deletions
@@ -94,62 +94,46 @@ tasks:
         set -e
         echo "🚀 Deploying complete Milo control plane to test-infra cluster..."

-        # Deploy all components including etcd, API server, controller manager, webhooks, RBAC, and networking
-        # Everything is deployed in the milo-system namespace for simplified testing
         echo "📋 Deploying Milo control plane with etcd storage backend..."
         task test-infra:kubectl -- apply -k config/overlays/test-infra/

-        # Wait for etcd Helm release to be complete (ensures deployment is fully reconciled)
         echo "⏳ Waiting for etcd Helm release to be ready..."
         task test-infra:kubectl -- wait --for=condition=Ready helmrelease/etcd -n milo-system --timeout=300s

-        # Wait for etcd pod readiness (API server needs etcd to store data)
         echo "⏳ Waiting for etcd pod to be ready..."
         task test-infra:kubectl -- wait --for=condition=Ready pod -l app.kubernetes.io/component=etcd -n milo-system --timeout=180s

-        # Wait for API server to be ready (required for CRD installation)
-        # API server must be running before we can install custom resources
         echo "⏳ Waiting for API server to be ready..."
         task test-infra:kubectl -- wait --for=condition=Ready pod -l app.kubernetes.io/name=milo-apiserver -n milo-system --timeout=180s

-        # Install CRDs into Milo API server (defines custom resource schemas)
-        # Controller manager needs these CRDs to watch and reconcile custom resources
-        echo "📋 Installing core control plane CRDs into Milo API server..."
-        task kubectl -- apply -k config/crd/overlays/core-control-plane/
-        task kubectl -- wait --for=condition=Established customresourcedefinitions --all
+        echo "✅ Verifying CRDs were bootstrapped..."
+        CRD_COUNT=$(task kubectl -- get crd --no-headers 2>/dev/null | grep miloapis | wc -l || echo "0")
+        if [ "$CRD_COUNT" -eq "0" ]; then
+          echo "⚠️ Warning: No Milo CRDs found - checking API server logs..."
+        else
+          echo "✓ Found $CRD_COUNT Milo CRDs bootstrapped by API server"
+        fi

-        # Step 6b: Install infrastructure control plane CRDs into Milo API server
-        # This includes ProjectControlPlanes needed by the controller manager
         echo "📋 Installing infrastructure control plane CRDs into the infrastructure cluster..."
         task test-infra:kubectl -- apply -k config/crd/overlays/infra-control-plane/
         task test-infra:kubectl -- wait --for=condition=Established customresourcedefinitions projectcontrolplanes.infrastructure.miloapis.com

-        # Step 7: Verify CRDs are properly installed (sanity check)
-        echo "✅ Verifying CRDs are installed..."
-        CRD_COUNT=$(task kubectl -- get crd --no-headers | grep miloapis | wc -l || echo "0")
-        echo "Installed $CRD_COUNT Milo CRDs in API server"
-
-        # Step 7b: Create test users in Milo API server
         echo "👤 Creating test users in Milo API server..."
         task kubectl -- apply -f config/overlays/test-infra/resources/test-users.yaml

-        # Step 8: Wait for controller manager (now that CRDs exist for it to reconcile)
-        # Controller manager can only start successfully after CRDs are available
         echo "⏳ Waiting for controller manager to be ready..."
         task test-infra:kubectl -- wait --for=condition=Ready pod -l app.kubernetes.io/name=milo-controller-manager -n milo-system --timeout=120s

-        # Step 8b: Install webhook configurations to Milo API server
-        # Webhooks validate and mutate resources submitted to the Milo API server
         echo "🔒 Installing webhook configurations to Milo API server..."
         task kubectl -- apply -k config/webhook/
-        # Webhook configurations don't have conditions to wait for - they're ready immediately after creation
         echo "✅ Webhook configurations installed successfully"

         WEBHOOK_COUNT=$(task kubectl -- get mutatingwebhookconfigurations,validatingwebhookconfigurations --no-headers | grep resourcemanager.miloapis.com | wc -l || echo "0")
         echo "Installed $WEBHOOK_COUNT webhook configurations in Milo API server"

-        # Update kubeconfig for easy developer access
-        echo "📝 Updating kubeconfig for developer access..."
+        echo "⚙️ Installing service configurations to Milo API server..."
+        task kubectl -- apply -k config/services/
+        echo "✅ Service configurations installed successfully"

         echo ""
         echo "✅ Milo API server and storage deployed successfully!"
@@ -308,6 +292,7 @@ tasks:
     cmds:
       - task: generate:code
       - task: generate:docs
+      - task: generate:test-docs

   generate:code:
     desc: Generate code including deepcopy, objects, CRDs, and potentially protobuf marshallers
@@ -333,7 +318,7 @@ tasks:
       - "\"{{.TOOL_DIR}}/controller-gen\" webhook paths=\"./internal/webhooks/...\" output:dir=\"./config/webhook\""
       # Generate RBAC rules for the controllers.
       - echo "Generating RBAC rules for the controllers..."
-      - "\"{{.TOOL_DIR}}/controller-gen\" rbac:roleName=milo-controller-manager paths=\"./internal/controllers/...\" output:dir=\"./config/controller-manager/overlays/core-control-plane/rbac\""
+      - "\"{{.TOOL_DIR}}/controller-gen\" rbac:roleName=milo-controller-manager paths=\"./internal/controllers/...\" paths=\"./internal/quota/controllers/...\" output:dir=\"./config/controller-manager/overlays/core-control-plane/rbac\""
     silent: true

   generate:openapi:identity:
@@ -375,6 +360,47 @@ tasks:
         done;
     silent: true

+  generate:test-docs:
+    desc: Generate test documentation using Chainsaw build docs
+    deps:
+      - task: install-go-tool
+        vars:
+          NAME: chainsaw
+          PACKAGE: github.com/kyverno/chainsaw
+          VERSION: "{{.CHAINSAW_VERSION}}"
+    cmds:
+      - |
+        set -e
+        echo "Generating test documentation..."
+
+        # Find all test directories containing chainsaw-test.yaml files
+        TEST_DIRS=$(find test -name "chainsaw-test.yaml" -exec dirname {} \; | sort -u)
+
+        if [ -z "$TEST_DIRS" ]; then
+          echo "No chainsaw test files found"
+          exit 0
+        fi
+
+        # Store the absolute path to chainsaw
+        CHAINSAW_BIN="{{.TOOL_DIR}}/chainsaw"
+
+        # Generate documentation for each test directory
+        for test_dir in $TEST_DIRS; do
+          echo "Generating README.md for $test_dir..."
+
+          # Generate README.md in the test directory
+          (cd "$test_dir" && "$CHAINSAW_BIN" build docs --test-dir .)
+
+          if [ -f "$test_dir/README.md" ]; then
+            echo " ✓ $test_dir/README.md"
+          else
+            echo " ⚠️ No README.md generated for $test_dir"
+          fi
+        done
+
+        echo "✅ Test documentation generated in test directories"
+    silent: true
+
   install-go-tool:
     desc: Install a Go tool to {{.TOOL_DIR}}/{{.NAME}} (symlinked from {{.TOOL_DIR}}/{{.NAME}}-{{.VERSION}})
     silent: true

cmd/milo/apiserver/admission.go

Lines changed: 43 additions & 15 deletions
@@ -7,25 +7,53 @@ import (
 	mutatingwebhook "k8s.io/apiserver/pkg/admission/plugin/webhook/mutating"
 	validatingwebhook "k8s.io/apiserver/pkg/admission/plugin/webhook/validating"
 	"k8s.io/kubernetes/pkg/kubeapiserver/options"
+
+	admissionquota "go.miloapis.com/milo/internal/quota/admission"
+)
+
+// GetMiloOrderedPlugins returns the complete ordered list of admission plugins,
+// including both Kubernetes built-in plugins and our custom Milo plugins.
+// Custom plugins are inserted before the webhook plugins as recommended.
+func GetMiloOrderedPlugins() []string {
+	// Start with Kubernetes' built-in ordered plugin list
+	plugins := make([]string, 0, len(options.AllOrderedPlugins)+1)
+
+	// Find where to insert our custom plugins
+	for _, plugin := range options.AllOrderedPlugins {
+		if plugin == "ValidatingAdmissionPolicy" {
+			// Insert our custom validating plugins here, before ValidatingAdmissionPolicy
+			// This ensures they run before the generic webhook validators
+			plugins = append(plugins, admissionquota.PluginName)
+		}
+		plugins = append(plugins, plugin)
+	}
+
+	return plugins
+}
+
+// MiloAdmissionPlugins lists all Milo-specific admission plugins
+var MiloAdmissionPlugins = sets.New[string](
+	admissionquota.PluginName, // ResourceQuotaEnforcement
+	// Add future Milo admission plugins here
 )

-// DefaultOffAdmissionPlugins get admission plugins off by default for kube-apiserver.
+// DefaultOffAdmissionPlugins returns admission plugins that should be disabled by default.
+// This keeps only essential plugins and our custom Milo plugins enabled.
 func DefaultOffAdmissionPlugins() sets.Set[string] {
-	defaultOnPlugins := sets.New[string](
-		lifecycle.PluginName, // NamespaceLifecycle
-		// defaulttolerationseconds.PluginName, // DefaultTolerationSeconds
-		mutatingwebhook.PluginName, // MutatingAdmissionWebhook
-		validatingwebhook.PluginName, // ValidatingAdmissionWebhook
-		// resourcequota.PluginName, // ResourceQuota
-		// certapproval.PluginName, // CertificateApproval
-		// certsigning.PluginName, // CertificateSigning
-		// ctbattest.PluginName, // ClusterTrustBundleAttest
-		// certsubjectrestriction.PluginName, // CertificateSubjectRestriction
-		validatingadmissionpolicy.PluginName, // ValidatingAdmissionPolicy, only active when feature gate ValidatingAdmissionPolicy is enabled
+	// Plugins we want ON by default
+	defaultOnPlugins := sets.New(
+		lifecycle.PluginName, // NamespaceLifecycle
+		mutatingwebhook.PluginName, // MutatingAdmissionWebhook
+		validatingwebhook.PluginName, // ValidatingAdmissionWebhook
+		validatingadmissionpolicy.PluginName, // ValidatingAdmissionPolicy
 	)

-	universe := sets.New(options.AllOrderedPlugins...)
-	universe.Insert(lifecycle.PluginName)
+	// Add all Milo plugins to the enabled set
+	defaultOnPlugins = defaultOnPlugins.Union(MiloAdmissionPlugins)
+
+	// Get the complete plugin list including our custom plugins
+	allPlugins := sets.New(GetMiloOrderedPlugins()...)

-	return universe.Difference(defaultOnPlugins)
+	// Return plugins to turn OFF (all plugins minus the ones we want ON)
+	return allPlugins.Difference(defaultOnPlugins)
 }
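
The ordering and default-off logic in this file can be tried out without any Kubernetes imports. The sketch below is an illustration only: `insertBefore` and `disabled` are hypothetical helpers, and the plugin list is a shortened stand-in for `options.AllOrderedPlugins`. One property worth noticing is that if the marker name is ever missing from the built-in list, the custom plugin is silently left out.

```go
package main

import "fmt"

// insertBefore copies ordered and places extra immediately before each
// occurrence of marker. If marker never appears, extra is not added at all.
func insertBefore(ordered []string, marker, extra string) []string {
	out := make([]string, 0, len(ordered)+1)
	for _, name := range ordered {
		if name == marker {
			out = append(out, extra)
		}
		out = append(out, name)
	}
	return out
}

// disabled returns every name in all that is not in enabled, keeping order.
func disabled(all []string, enabled map[string]bool) []string {
	var off []string
	for _, name := range all {
		if !enabled[name] {
			off = append(off, name)
		}
	}
	return off
}

func main() {
	// Shortened, illustrative list; the real one is options.AllOrderedPlugins.
	builtin := []string{
		"NamespaceLifecycle",
		"MutatingAdmissionWebhook",
		"ValidatingAdmissionPolicy",
		"ValidatingAdmissionWebhook",
		"ResourceQuota",
	}
	ordered := insertBefore(builtin, "ValidatingAdmissionPolicy", "ResourceQuotaEnforcement")

	// Plugins kept on by default, plus the Milo plugin.
	enabled := map[string]bool{
		"NamespaceLifecycle":         true,
		"MutatingAdmissionWebhook":   true,
		"ValidatingAdmissionWebhook": true,
		"ValidatingAdmissionPolicy":  true,
		"ResourceQuotaEnforcement":   true,
	}

	fmt.Println(ordered)
	fmt.Println(disabled(ordered, enabled))
}
```

Computing the off-set from the extended list rather than the built-in one keeps the custom plugin inside the known universe, which is what the switch from `universe` to `allPlugins` in the diff does.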

cmd/milo/apiserver/config.go

Lines changed: 78 additions & 2 deletions
@@ -4,12 +4,16 @@ import (
 	"net/http"
 	"time"

+	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
+	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
+	"go.opentelemetry.io/otel/trace"
 	corev1 "k8s.io/api/core/v1"
 	apiextensionsapiserver "k8s.io/apiextensions-apiserver/pkg/apiserver"
 	"k8s.io/apimachinery/pkg/runtime"
 	"k8s.io/apiserver/pkg/admission"
 	"k8s.io/apiserver/pkg/endpoints/filterlatency"
 	genericapifilters "k8s.io/apiserver/pkg/endpoints/filters"
+	"k8s.io/apiserver/pkg/endpoints/request"
 	genericfeatures "k8s.io/apiserver/pkg/features"
 	"k8s.io/apiserver/pkg/server"
 	genericfilters "k8s.io/apiserver/pkg/server/filters"
@@ -18,6 +22,7 @@ import (
 	"k8s.io/apiserver/pkg/util/webhook"
 	"k8s.io/client-go/discovery"
 	"k8s.io/client-go/rest"
+	"k8s.io/component-base/tracing"
 	utilversion "k8s.io/component-base/version"
 	aggregatorapiserver "k8s.io/kube-aggregator/pkg/apiserver"
 	aggregatorscheme "k8s.io/kube-aggregator/pkg/apiserver/scheme"
@@ -40,8 +45,10 @@ import (
 	"go.miloapis.com/milo/internal/apiserver/admission/initializer"
 	sessionsbackend "go.miloapis.com/milo/internal/apiserver/identity/sessions"
 	identitystorage "go.miloapis.com/milo/internal/apiserver/storage/identity"
+	admissionquota "go.miloapis.com/milo/internal/quota/admission"
 	identityapi "go.miloapis.com/milo/pkg/apis/identity"
 	identityopenapi "go.miloapis.com/milo/pkg/apis/identity/v1alpha1"
+	quotaapi "go.miloapis.com/milo/pkg/apis/quota"
 	datumfilters "go.miloapis.com/milo/pkg/server/filters"
 	openapicommon "k8s.io/kube-openapi/pkg/common"
 )
@@ -147,13 +154,16 @@ func NewConfig(opts options.CompletedOptions) (*Config, error) {
 		Options: opts,
 	}

-	// Initialize Milo scheme with identity APIs and install into legacy scheme
+	// Initialize Milo scheme with identity and quota APIs and install into legacy scheme
 	miloScheme := runtime.NewScheme()
 	identityapi.Install(miloScheme)
 	identityapi.Install(legacyscheme.Scheme)
+	quotaapi.Install(miloScheme)
+	quotaapi.Install(legacyscheme.Scheme)

 	apiResourceConfigSource := controlplane.DefaultAPIResourceConfigSource()
 	apiResourceConfigSource.DisableResources(corev1.SchemeGroupVersion.WithResource("serviceaccounts"))
+	apiResourceConfigSource.DisableResources(corev1.SchemeGroupVersion.WithResource("resourcequotas"))
 	// Enable identity group/version served by virtual storage
 	apiResourceConfigSource.EnableVersions(identityopenapi.SchemeGroupVersion)

@@ -195,6 +205,12 @@ func NewConfig(opts options.CompletedOptions) (*Config, error) {
 	c.ControlPlane = kubeAPIs
 	c.ControlPlane.Generic.EffectiveVersion = utilversion.DefaultKubeEffectiveVersion()

+	// Configure tracing for the loopback client to ensure trace context propagation
+	// to admission plugins that use dynamic clients (e.g., quota admission plugin)
+	if kubeAPIs.Generic.LoopbackClientConfig != nil && kubeAPIs.Generic.TracerProvider != nil {
+		kubeAPIs.Generic.LoopbackClientConfig.Wrap(tracing.WrapperFor(kubeAPIs.Generic.TracerProvider))
+	}
+
 	combinedInits := append(upstreamInits, loopbackInit)

 	authInfoResolver := webhook.NewDefaultAuthenticationInfoResolverWrapper(kubeAPIs.ProxyTransport, kubeAPIs.Generic.EgressSelector, kubeAPIs.Generic.LoopbackClientConfig, kubeAPIs.Generic.TracerProvider)
@@ -210,6 +226,15 @@ func NewConfig(opts options.CompletedOptions) (*Config, error) {
 	}
 	c.APIExtensions = apiExtensions

+	// Add readiness check for quota validator to ensure cache is synced before serving traffic
+	kubeAPIs.Generic.AddReadyzChecks(admissionquota.ReadinessCheck())
+
+	// Add post-start hook to bootstrap CRDs from embedded filesystem
+	// This installs all CRDs EXCEPT infrastructure.miloapis.com group, which should remain in the infrastructure cluster
+	kubeAPIs.Generic.AddPostStartHookOrDie("bootstrap-crds", func(ctx server.PostStartHookContext) error {
+		return bootstrapCRDsHook(ctx, kubeAPIs.Generic.LoopbackClientConfig)
+	})
+
 	// TODO(jreese) create an admission plugin that will prohibit the creation of
 	// a Secret with a type of `kubernetes.io/service-account-token`
 	c.APIExtensions.GenericConfig.DisabledPostStartHooks.Insert("start-legacy-token-tracking-controller")
@@ -311,7 +336,7 @@ func DefaultBuildHandlerChain(apiHandler http.Handler, c *server.Config) http.Ha
 	// }
 	handler = genericfilters.WithHTTPLogging(handler)
 	if c.FeatureGate.Enabled(genericfeatures.APIServerTracing) {
-		handler = genericapifilters.WithTracing(handler, c.TracerProvider)
+		handler = withCustomTracing(handler, c.TracerProvider)
 	}
 	handler = genericapifilters.WithLatencyTrackers(handler)
 	// WithRoutine will execute future handlers in a separate goroutine and serving
@@ -334,3 +359,54 @@ func DefaultBuildHandlerChain(apiHandler http.Handler, c *server.Config) http.Ha

 	return handler
 }
+
+// withCustomTracing provides tracing that always creates child spans (not links)
+// ensuring proper trace hierarchy for all requests including internal loopback requests.
+func withCustomTracing(handler http.Handler, tp trace.TracerProvider) http.Handler {
+	opts := []otelhttp.Option{
+		otelhttp.WithPropagators(tracing.Propagators()),
+		otelhttp.WithTracerProvider(tp),
+		otelhttp.WithServerName("milo-apiserver"),
+		otelhttp.WithSpanNameFormatter(func(operation string, r *http.Request) string {
+			ctx := r.Context()
+			info, exist := request.RequestInfoFrom(ctx)
+			if !exist || !info.IsResourceRequest {
+				return r.Method
+			}
+			return getSpanNameFromRequestInfo(info, r)
+		}),
+		// Note: NOT using WithPublicEndpoint() to always create child spans instead of links
+	}
+
+	wrappedHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		// Add the http.target attribute to the span
+		if r.URL != nil {
+			trace.SpanFromContext(r.Context()).SetAttributes(semconv.HTTPTarget(r.URL.RequestURI()))
+		}
+		handler.ServeHTTP(w, r)
+	})
+
+	// With Noop TracerProvider, the otelhttp still handles context propagation.
+	// See https://github.com/open-telemetry/opentelemetry-go/tree/main/example/passthrough
+	return otelhttp.NewHandler(wrappedHandler, "MiloAPI", opts...)
+}
+
+// getSpanNameFromRequestInfo creates span names from Kubernetes request info
+func getSpanNameFromRequestInfo(info *request.RequestInfo, r *http.Request) string {
+	spanName := "/" + info.APIPrefix
+	if info.APIGroup != "" {
+		spanName += "/" + info.APIGroup
+	}
+	spanName += "/" + info.APIVersion
+	if info.Namespace != "" {
+		spanName += "/namespaces/{:namespace}"
+	}
+	spanName += "/" + info.Resource
+	if info.Name != "" {
+		spanName += "/" + "{:name}"
+	}
+	if info.Subresource != "" {
+		spanName += "/" + info.Subresource
+	}
+	return r.Method + " " + spanName
+}
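
To see what span names `getSpanNameFromRequestInfo` produces, the same string-building logic can be run against a local struct. This is a sketch for experimentation: `requestInfo` is a hypothetical stand-in carrying only the fields the function reads from `request.RequestInfo`, and the group, resource, and object names in `main` are made up.

```go
package main

import "fmt"

// requestInfo is a local stand-in for the fields of request.RequestInfo
// that the span-name logic reads; it is not the real Kubernetes type.
type requestInfo struct {
	APIPrefix   string
	APIGroup    string
	APIVersion  string
	Namespace   string
	Resource    string
	Name        string
	Subresource string
}

// spanName follows the same steps as getSpanNameFromRequestInfo: concrete
// namespace and object names are replaced by fixed placeholders.
func spanName(method string, info requestInfo) string {
	name := "/" + info.APIPrefix
	if info.APIGroup != "" {
		name += "/" + info.APIGroup
	}
	name += "/" + info.APIVersion
	if info.Namespace != "" {
		name += "/namespaces/{:namespace}"
	}
	name += "/" + info.Resource
	if info.Name != "" {
		name += "/{:name}"
	}
	if info.Subresource != "" {
		name += "/" + info.Subresource
	}
	return method + " " + name
}

func main() {
	// Namespaced object with a subresource (illustrative values).
	fmt.Println(spanName("GET", requestInfo{
		APIPrefix:   "apis",
		APIGroup:    "example.com",
		APIVersion:  "v1",
		Namespace:   "team-a",
		Resource:    "widgets",
		Name:        "w-123",
		Subresource: "status",
	}))

	// Cluster-scoped list in the core group (no group, namespace, or name).
	fmt.Println(spanName("GET", requestInfo{
		APIPrefix:  "api",
		APIVersion: "v1",
		Resource:   "namespaces",
	}))
}
```

Because the namespace and object name never appear in the result, requests for different objects of the same resource share one span name, which keeps the set of distinct span names small.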
