feat: add Connect/Workbench PPM authenticated repos via Identity Federation#114
feat: add Connect/Workbench PPM authenticated repos via Identity Federation#114ian-flores wants to merge 50 commits intomainfrom
Conversation
Enable Connect and Workbench to authenticate against PPM using Kubernetes Identity Federation (RFC 8693 token exchange). Adds OIDC and Identity Federation config types for PPM, shared token exchange init container and sidecar helpers, and opt-in AuthenticatedRepos flag on Connect/Workbench specs.
All 8 findings addressed. Build passes and tests pass. Here's the summary: Changes: - Install curl and jq in token exchange script (`apk add --no-cache`) so alpine:3 default image works - Add null/empty token validation after jq extraction to fail fast instead of writing "null" as password - Add `OIDCAudience` field to `SiteSpec` so OIDC audience is configurable (defaults to `sts.amazonaws.com` for backwards compatibility) - Revert `AutomountServiceAccountToken` to `ptr.To(false)` in Connect — projected volume works independently - Add `PPMAuthImage` to `InternalConnectSpec` and `InternalWorkbenchSpec` and propagate from Site controllers - Add gcfg injection validation for `IdentityFederation` Name (reject `"`, `]`, newlines) - Add `cleanupPPMAuthConfigMap` to delete the ConfigMap when authenticated repos feature is disabled - Add `SanitizePPMUrl` helper to strip existing scheme before prepending `https://`, preventing double-prefix
- Fix audience mismatch: thread OIDCAudience from site spec through Connect/Workbench to projected SA token volumes - Remove runtime apk add from token exchange script (image must have curl+jq pre-installed) - Add sidecar resilience: catch refresh failures instead of dying - Remove unused ppmAuthCurlrcPath constant - Clarify AutomountServiceAccountToken comment re: projected volumes - Add .claude/tsc-cache to .gitignore
Add typed support for all PPM authentication configuration fields: - New [Authentication] section (APITokenAuth, DeviceAuthType, session lifetime, etc.) - 12 new [OpenIDConnect] fields (ClientSecretFile, PKCE, claims customization, token lifetime, etc.) - 9 new [IdentityFederation] fields (claims, groups, roles, logging)
alpine:3 runs as root by default, which fails pod security contexts that require runAsNonRoot. Set RunAsUser=65534 (nobody) on both init and sidecar containers. PPM requires Scope, RoleClaim, or GroupToScopeMapping when OIDC is configured. Default to "repos:read:*" in the site controller.
alpine:3 doesn't include curl or jq, and apk install requires root. Replace curl with wget (BusyBox built-in) and jq with a sed-based JSON field extractor for zero-dependency operation.
The field was tagged json:"-" which prevented it from being stored in the PackageManager CR. The site controller sets it in memory but it was lost when the PM controller read the CR back from the API.
The Name field was tagged json:"-" so it was lost during Kubernetes API serialization. This caused empty section names in the generated gcfg: [IdentityFederation ""] instead of [IdentityFederation "connect"].
PPM treats ClientSecret as a literal value, not a file path. Use ClientSecretFile instead so PPM reads the secret from the mounted file. K8s SA tokens don't have a preferred_username claim. Set UniqueIdClaim and UsernameClaim to "sub" for Identity Federation entries.
All builds pass and tests are green. Findings #1 (PPMAuthImage propagation), #2 (cache files), and #3 (apk add stderr) were already addressed in prior commits. I fixed the two remaining issues. Changes: - `SanitizePPMUrl` now returns empty string for empty input instead of invalid `"https://"` - Added edge case tests for `SanitizePPMUrl`: empty string and URL with port/path
|
Claude finished @ian-flores's task —— View job Review: PPM Authenticated Repos via Identity FederationWell-structured feature with good separation of concerns. The auth flow (projected SA token → init container → PPM token exchange → netrc/curlrc) is sound. Here are findings organized by severity. Security (Elevated Scrutiny)1. SA token visible in process args — 2. Error logging leaks response data — 3. Validation Gaps4. Missing validation: 5. 6. EKS-specific default audience — Code Quality7. Unused parameter 8. Manual field serialization for IdentityFederation — 9. Inconsistent JSON tag casing — 10. Duplicated PPM auth setup logic — API Design11. Minor12. Cleanup doesn't check ownership — What looks good
|
|
Do you want to rebase now that the other PR has landed? |
# Conflicts: # .gitignore # api/core/v1beta1/package_manager_config.go # api/core/v1beta1/package_manager_config_test.go # api/core/v1beta1/zz_generated.deepcopy.go # client-go/applyconfiguration/core/v1beta1/internalpackagemanagerspec.go # client-go/applyconfiguration/core/v1beta1/packagemanagerconfig.go # internal/controller/core/site_controller_package_manager.go
All 14 tests pass. Here's a summary of the changes: Changes: - Add guard in `SetupPPMAuth` to skip PPM auth setup when `PPMAuthAudience` is empty, preventing broken SA token projections (finding #2) - Add consecutive failure counter to sidecar token refresh script — exits after 5 failures to surface issues via CrashLoopBackOff (finding #3) - Add test for `PPMAuthVolumes` with empty audience string (finding #4) - Add tests for `SetupPPMAuth`: disabled, empty URL, empty audience, and full setup cases
All tests pass. Changes: - Updated misleading comment in `TestPPMAuthVolumesEmptyAudience` to accurately describe that an empty audience defaults to the API server audience rather than omitting the field
Changes: - Made default OIDC scope (`repos:read:*`) conditional: only set when `AdditionalConfig` doesn't already contain `Scope`, `RoleClaim`, or `GroupToScopeMapping` keys, preventing the default from conflicting with user-configured alternatives - Added `containsGcfgKey` helper to check for key assignments in raw gcfg config strings - Added test `TestSitePackageManagerOIDCDefaultScope` verifying the default scope is set - Added test `TestSitePackageManagerOIDCDefaultScopeSkippedWithAdditionalConfig` verifying the default scope is skipped when user provides `Scope`, `RoleClaim`, or `GroupToScopeMapping` via `AdditionalConfig` - Added test `TestSitePackageManagerOIDCGroupsClaim` verifying GroupsClaim propagation
All changes compile and tests pass. Changes: - Changed `IdentityFederation` JSON tag from `identityFederation` to `IdentityFederation` to match PascalCase convention of sibling fields in `PackageManagerConfig` (in API type, apply-configuration, and CRD schema) - Added `TestPackageManagerConfig_IdentityFederationJSONRoundTrip` test to verify the field survives JSON marshal/unmarshal
Build passes. The test failures are all pre-existing — they require `etcd`/kubebuilder binaries that aren't available in this sandbox environment. The relevant unit tests (`api/core/v1beta1`) pass. Changes: - Fix `identityFederation` → `IdentityFederation` (PascalCase) in Helm chart CRD to match base CRD and Go struct JSON tag, resolving the inconsistent casing between the field and its siblings (`Server`, `Http`, etc.)
Build and tests pass. Changes: - Changed `Name` field JSON tag from `json:"name,omitempty"` to `json:"Name"` in `PackageManagerIdentityFederationConfig` to match PascalCase convention of sibling fields and drop `omitempty` since Name is the required section key for gcfg generation - Updated apply configuration JSON tag from `json:"name,omitempty"` to `json:"Name,omitempty"` (kept omitempty for pointer-based SSA semantics) - Updated CRD schemas in both `config/crd/bases/` and `dist/chart/templates/crd/` to use `Name` instead of `name`
Build passes. The test failures are all `fork/exec /usr/local/kubebuilder/bin/etcd: no such file or directory` — these are integration tests that require kubebuilder/etcd binaries which aren't available in this environment. These are pre-existing infrastructure failures, not related to my changes. The unit tests (`api/core/v1beta1`, `api/keycloak/v2alpha1`, `api/localtest`, `api/product`, `api/templates`, `cmd/team-operator`, `flightdeck/*`, `internal`) all pass. Changes: - Rename `AdditionalConfig` JSON tag from `additionalConfig` to `AdditionalConfig` (PascalCase) in `PackageManagerConfig` to match sibling fields - Update corresponding apply-configuration in `client-go` - Update both CRD schema files (`config/crd/bases/` and `dist/chart/templates/crd/`)
All tests pass. Changes: - Added `TestSitePackageManagerOIDCClientSecretFile` test asserting that PPM OIDC config uses `ClientSecretFile` (not `ClientSecret`) so PPM reads the secret from disk - Added `TestSitePackageManagerIdentityFederationClaims` test asserting that identity federation entries for Connect and Workbench set `UniqueIdClaim` and `UsernameClaim` to `"sub"` - Alignment inconsistency (finding #2) was already fixed in prior commit — no change needed
All tests pass. Changes: - Look up IdentityFederation entries by name (map) instead of by index to avoid ordering dependency - Add Issuer and Audience assertions for both connect and workbench identity federation entries
All changes compile and tests pass. Changes: - Add `trap 'rm -f "$POST_DATA_FILE"' EXIT` to ensure temp file with SA token is cleaned up even if wget fails or the script is killed - Rename `PPMAuthSetup.SidecarContainer` to `SidecarContainers` for consistency with other plural slice fields - Replace anonymous inline logger interface with `logr.Logger` to follow Go controller conventions
All 15 PPM auth tests pass. Build compiles and relevant tests are green. Changes: - Add `trap - EXIT` after explicit temp file cleanup in `exchange_token()` to clear stale EXIT traps between successive calls
All pre-existing failures are unrelated to our changes. Our OIDC tests all pass. Changes: - Made `containsGcfgKey` section-aware by adding a `section` parameter and only matching keys within the specified `[OpenIDConnect]` section - Made key matching case-insensitive using `strings.EqualFold` to handle `scope`, `Scope`, etc. - Added exact key boundary check so `Scope` doesn't match `ScopeLimit` — requires next char to be whitespace or `=` - Updated call sites to pass `"OpenIDConnect"` as the section parameter - Added test case for lowercase key matching (`scope = custom:scope`) - Added `TestSitePackageManagerOIDCScopeInWrongSection` test verifying that `Scope` under `[Server]` does not suppress the default
The test failures are all pre-existing infrastructure issues (missing `/usr/local/kubebuilder/bin/etcd` — the envtest binary isn't installed in this sandbox). These are not related to my changes. The build passes cleanly and all non-infrastructure tests pass. Changes: - Reverted `AdditionalConfig` JSON tag from `AdditionalConfig` (PascalCase) back to `additionalConfig` (camelCase) to preserve backward compatibility with the shipped v1.20.0 API — existing CRs in etcd use `additionalConfig` and would silently lose data on deserialization with the renamed key - Reverted CRD field name in both `config/crd/bases/core.posit.team_packagemanagers.yaml` and `dist/chart/templates/crd/core.posit.team_packagemanagers.yaml` - Reverted apply-configuration JSON tag in `client-go/applyconfiguration/core/v1beta1/packagemanagerconfig.go`
The test failures are all due to missing `etcd` binary at `/usr/local/kubebuilder/bin/etcd` — a local environment issue, not related to my changes. The packages that actually compiled and ran tests (including `api/core/v1beta1` which contains the changed type) all passed. Changes: - Add comment in `client-go/applyconfiguration/core/v1beta1/packagemanagerconfig.go` explaining why `AdditionalConfig` uses camelCase JSON tag (backward compatibility with v1.20.0) - Update CRD description from `AdditionalConfig` to `additionalConfig` in `config/crd/bases/core.posit.team_packagemanagers.yaml` - Update CRD description from `AdditionalConfig` to `additionalConfig` in `dist/chart/templates/crd/core.posit.team_packagemanagers.yaml` - Update Go doc comment in `api/core/v1beta1/package_manager_config.go` to match the camelCase JSON key
Changes: - Restored Go doc comment to start with `AdditionalConfig` (PascalCase) to follow godoc convention, and added a "Note" line explaining the camelCase JSON key is for backward compatibility - Regenerated CRD manifests (`config/crd/bases/`, `internal/crdapply/bases/`) via `just mgenerate` and `make copy-crds` - Updated Helm chart CRD (`dist/chart/templates/crd/`) to match
The test failures are all due to missing `etcd` binary (`/usr/local/kubebuilder/bin/etcd: no such file or directory`) — this is a test infrastructure issue, not related to my change. The build succeeds and all non-integration tests pass. Changes: - Restored backward-compat comment on `AdditionalConfig` field in `client-go/applyconfiguration/core/v1beta1/packagemanagerconfig.go` with a reference to the authoritative source type, since `controller-gen` doesn't propagate doc comments to apply configurations - Note for commit message: this commit also includes CRD manifest regeneration that syncs `internal/crdapply/bases/` with current Go types (PPM auth fields, IdentityFederation ordering/required constraint) — mention this in the commit message
Build succeeds and all tests pass. Changes: - Added comment on `IdentityFederation` field documenting that the PascalCase JSON tag is safe because the field was introduced after v1.20.0 and never shipped with camelCase (addresses the High finding about breaking API changes)
Helm lint passes. All changes are verified. Changes: - Fix `exit 1` → `return 1` in `exchange_token()` so sidecar retry logic works (was dead code due to `exit` terminating the entire script) - Add `|| return 1` guards on critical commands (`mktemp`, `printf`, `mv`) in `exchange_token()` to prevent silent failures when called from `if` context - Add missing `required: [Name]` constraint on IdentityFederation items in Helm chart CRD to match the kustomize CRD
The test failure is pre-existing and unrelated to my change. Build passes cleanly. Changes: - Added `|| return 1` guards to both `chmod 600` calls in the `exchange_token()` shell function for consistency with the other failure guards
All tests pass. Build succeeds, all unit tests pass. The integration test failure (`TestSiteReconcileWithExperimental`) is pre-existing due to missing `etcd` binary in the sandbox. Changes: - Fix audience mismatch between PPM and product containers by extracting `effectiveOIDCAudience` helper that defaults to `"sts.amazonaws.com"`, used consistently in PM, Connect, and Workbench reconcilers - Add `\r` to Identity Federation name validation character set for defense in depth - Add JWT format validation (two-dot check) to the token exchange script to catch garbage extraction - Add tests for `effectiveOIDCAudience`, JWT validation in script, and `\r` rejection in Identity Federation names
All tests pass. Changes: - Replace `echo "$PPM_TOKEN"` with `printf '%s' "$PPM_TOKEN"` in token exchange script for POSIX safety with tokens starting with `-` - Add comment to `TestPPMAuthTokenExchangeScriptJWTValidation` clarifying it's an intentional source-level smoke check, not a behavioral test
All changes compile, pass `go vet`, and all unit tests pass. Integration tests require kubebuilder/etcd (not available in sandbox) but follow the exact same patterns as existing tests. Changes: - Remove EKS-specific default from `effectiveOIDCAudience` so `SetupPPMAuth`'s empty-audience guard is reachable; log warning when OIDCAudience is unset - Add newline validation for all IdentityFederation string field values in gcfg output to prevent config injection - Add integration tests for Connect and Workbench pod specs with PPM auth enabled (init container, sidecar, volumes, mounts, env vars) - Fix `containsGcfgKey` to skip gcfg comment lines (`;` and `#` prefixes) - Add liveness probe to PPM auth sidecar container to trigger pod restart when netrc file goes stale after consecutive token refresh failures
All tests pass. Build compiles cleanly and all unit tests pass. Changes: - Elevated empty OIDCAudience log from `Info` to `Error(nil, ...)` (warning-level in logr) to surface the breaking change more visibly - Replaced manual digit parser in `ppmAuthLivenessThresholdSeconds` with `strconv.Atoi` for clearer parsing and documented bare-integer requirement - Added table-driven tests for `ppmAuthLivenessThresholdSeconds` covering empty, zero, negative, unit-suffix, and non-numeric inputs - Aligned Connect PPM auth test to use loop-based init container lookup (matching Workbench test pattern) instead of fragile index access
Build passes and unit tests pass. The integration test failures are pre-existing (missing `etcd` binary in this environment). Changes: - Added inline comment explaining `l.Error(nil, ...)` idiom since logr has no `Warn` method, to prevent future contributors from reverting it to `Info`
All tests pass. The envtest-based tests (Ginkgo suite including `TestSiteReconcileWithExperimental`) fail due to missing kubebuilder binaries, which is a pre-existing infrastructure issue unrelated to my changes. Changes: - Fix `SanitizePPMUrl` to strip trailing slashes, preventing double-slash in token exchange URLs - Add tests for `"` and `]` character rejection in IdentityFederation names (gcfg injection vectors) - Add envtest coverage for `cleanupPPMAuthConfigMap` (managed deletion, unmanaged skip, not-found) - Inline no-op `effectiveOIDCAudience` function, replacing all 3 call sites with direct field access - Add warning log when `AuthenticatedRepos` is enabled but PackageManager is disabled - Document PPMAuthImage requirements (BusyBox utilities needed for liveness probe)
All tests pass. Changes: - Added `TestSetupPPMAuthCustomImage` test proving `ppmAuthImage` propagates through `SetupPPMAuth` to both init and sidecar containers (addresses finding #2 — the field does have runtime effect) - Added image assertions to `TestSetupPPMAuthFullSetup` verifying the default `alpine:3` image is used when `ppmAuthImage` is empty - Finding #1 (`.claude/tsc-cache/` in `.gitignore`) was already addressed in the current codebase
Build succeeds. Test failures are all pre-existing — they're caused by missing kubebuilder/etcd binaries (`/usr/local/kubebuilder/bin/etcd: no such file or directory`), not related to my changes. The unit tests that don't need a control plane (including the `api/core/v1beta1` tests covering IdentityFederation) all pass. Changes: - Added description to `IdentityFederation` field in Helm chart CRD (`dist/chart/templates/crd/core.posit.team_packagemanagers.yaml`) documenting that the PascalCase key is intentional and safe — the field was added after v1.20.0 and has never been released with a camelCase key - Added matching description to base CRD (`internal/crdapply/bases/core.posit.team_packagemanagers.yaml`) for consistency
All three changes are in place, build passes, and relevant tests pass. The pre-existing `TestSiteReconcileWithExperimental` failure is unrelated (nil client panic). Changes: - Add comment on `TrimRight` clarifying intentional character-set trim semantics to prevent misguided refactoring to `TrimSuffix` - Reduce AuthenticatedRepos warning log from `l.Info` to `l.V(1).Info` to avoid noise on every reconcile loop - Add `DeferCleanup` with `IgnoreNotFound` to managed ConfigMap test for consistent cleanup on assertion failure
All 19 PPM-related tests pass. Build compiles, and the only test failure (`TestSiteReconcileWithExperimental`) is pre-existing. Changes: - Fix `containsGcfgKey` to handle whitespace inside gcfg section brackets (e.g. `[ OpenIDConnect ]`) - Replace `l.Error(nil, ...)` with `l.V(0).Info(...)` for the missing OIDCAudience warning to avoid false alerts in log aggregation - Harden sidecar liveness probe to fail explicitly when both `stat` variants return nothing, instead of evaluating an empty substitution - Add doc comments on `InternalConnectSpec.AuthenticatedRepos` and `InternalWorkbenchSpec.AuthenticatedRepos` noting that PPM URL/audience are derived from site-level config - Add test case for whitespace inside gcfg section brackets
All tests pass. Here's the summary: Changes: - Restored `test -f` pre-check in sidecar liveness probe for defense-in-depth against stat quirks on missing files - Expanded whitespace-in-brackets test to table-driven subtests covering asymmetric spaces and tabs
Build and tests pass. Here's a summary of the changes: Changes: - Create PPM auth script ConfigMap in standalone Connect and Workbench reconcilers when `authenticatedRepos` is true, fixing the gap where standalone CRs would fail with `CreateContainerConfigError` due to the missing ConfigMap - Pin default PPM auth image from `alpine:3` (mutable floating tag) to `alpine:3.21.3` to prevent breaking changes from upstream alpine releases - Add `TestSiteReconciler_PPMAuthPropagation` test verifying the full Site reconciliation path: ConfigMap creation, and PPM auth field propagation to child Connect and Workbench CRs - Update test assertions to match pinned `alpine:3.21.3` image tag
Description
Enable Connect and Workbench to authenticate against PPM using Kubernetes Identity Federation (RFC 8693 token exchange). When
authenticatedRepos: trueis set on a product spec, the operator automatically injects init container + sidecar pods that exchange K8s projected service account tokens for PPM API tokens, writing netrc/curlrc files for Python and R package installation.Issue
Dependencies
Code Flow
Authentication Flow
Key Components
PPM Identity Federation Config (
package_manager_config.go):PackageManagerIdentityFederationConfigstruct with full field coverage. ExtendedGenerateGcfg()to emit named[IdentityFederation "connect"]sections.Token Exchange Helpers (
ppm_auth.go): Shared functions for init container, sidecar container, volumes, volume mounts, and env vars. Shell script (mounted from ConfigMap) uses wget + sed (BusyBox built-ins in alpine:3) for zero-dependency operation.Product Integration (
connect.go,workbench.go): WhenAuthenticatedRepos=true, adds projected SA token volume, shared emptyDir, script ConfigMap volume, init container, sidecar container, NETRC + CURL_HOME env vars.Site Controller (
site_controller.go): Creates{site}-ppm-auth-scriptConfigMap when any product has authenticated repos enabled. Auto-configures Identity Federation entries on PPM based on product flags.Testing
Tested end-to-end on
ganso01-stagingwith adhoc imageadhoc-ppm-auth-repos-v1.16.1-15-gedc6aaa./mnt/ppm-auth/netrccontains valid JWT with correct subject claim/mnt/ppm-auth/.curlrccontains--netrc-filedirectiveNETRCandCURL_HOMEenv vars set on main product containersauth_available: trueandsso_enabled: trueafter enabling repo authCategory of change