Skip to content

docs: auth realm isolation PRD#2033

Merged
Benbentwo merged 10 commits intomainfrom
feature/dev-3960-fix-caching-issue-with-super-user-identities-between
Jan 29, 2026
Merged

docs: auth realm isolation PRD#2033
Benbentwo merged 10 commits intomainfrom
feature/dev-3960-fix-caching-issue-with-super-user-identities-between

Conversation

@Benbentwo
Copy link
Member

@Benbentwo Benbentwo commented Jan 28, 2026

what

  • Added comprehensive PRD for credential namespace isolation feature
  • Addresses credential collision issue when same identity names are used across different repositories
  • Documents hybrid namespace approach with environment variable, config file, and automatic path hash precedence

why

When engineers work with multiple customer repositories that use identical identity names (e.g., core-root/terraform), AWS credentials collide and cause cross-contamination. This PRD outlines the design for isolating credentials using repository-specific namespaces, preventing accidental use of the wrong customer's credentials.

references

  • DEV-3960: Fix caching issue with super user identities between customers

Summary by CodeRabbit

  • Documentation
    • Added three PRDs defining credential realm isolation: realm concept, selection precedence (env/config/automatic), naming/validation and sanitization rules.
    • Documented new per-realm credential storage layout and provider-specific Azure path and environment wiring.
    • Added migration guidance (breaking path change, no automatic migration), user-facing messaging, security considerations, testing guidance, and success metrics.

✏️ Tip: You can customize this high-level summary in your review settings.

Document the design for fixing credential caching collisions between
customers with identical identity names. Introduces hybrid namespace
approach with three precedence levels: environment variable,
explicit config, and automatic path hash.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@Benbentwo Benbentwo requested a review from a team as a code owner January 28, 2026 20:13
@github-actions github-actions bot added the size/m Medium size PR label Jan 28, 2026
@github-actions
Copy link

github-actions bot commented Jan 28, 2026

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

@Benbentwo Benbentwo added documentation Improvements or additions to documentation no-release Do not create a new release (wait for additional code changes) labels Jan 28, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

Adds realm-based credential isolation: a computed or specified top-level realm (ATMOS_AUTH_REALM, atmos.yaml, or SHA256(CliConfigPath)) that is sanitized and used to scope filesystem credential paths, keyring keys, schema fields, file managers, provider wiring, environment preparation, docs, and migration/testing guidance.

Changes

Cohort / File(s) Summary
PRD & Architecture Docs
docs/prd/auth-user-credential-realm-isolation.md, docs/prd/auth-realm-architecture.md, docs/prd/azure-auth-file-isolation.md
New/updated PRDs and architecture docs defining realms, precedence, validation/sanitization rules, path/key formats, migration guidance, testing, security analysis, and changelog.
Realm package & logic
pkg/auth/realm/..., pkg/auth/realm/realm.go
New realm computation and sanitization logic; precedence (env, atmos.yaml, SHA256(CliConfigPath)); RealmInfo export and validation-only sanitization policy.
Schema & Interfaces
pkg/schema/*.go, pkg/auth/*.go, pkg/auth/manager/*.go
Adds Realm to AuthConfiguration and threads realm through auth structs and manager interfaces (PostAuthenticateParams, WhoamiInfo, CredentialStore, etc.).
File managers (AWS, Azure, others)
pkg/*/filemanager/... (e.g., pkg/aws/filemanager/..., pkg/azure/filemanager/...)
Path helpers and APIs updated to accept/compute realm-aware XDG-style base paths (e.g., ~/.config/atmos/{realm}/...); Get*Path functions now accept realm parameter.
Azure provider integration
pkg/azure/... (multiple files)
Azure file manager and provider code made realm-aware: new BasePath/ConfigDir fields, PrepareEnvironment updates (AZURE_CONFIG_DIR), MSAL/profile cache and credential paths moved under realm, device-code provider wiring updated, provider spec files.base_path supported.
Keyring & key format
pkg/keyring/...
Key names updated to include realm so keyring entries are scoped by realm.
Auth flow, hooks & UI
cmd/..., pkg/hooks/..., pkg/auth/...
Realm computed at runtime and propagated through auth flows, hooks, post-auth messages, status/Whoami outputs, and environment preparation.
Migration, testing & changelog
docs/..., test/...
Documents migration (breaking change in credential paths, no automatic migration), unit/integration/manual test plans, security considerations, and success metrics; changelog entries added.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as Client (CLI)
  participant Realm as RealmService\n(pkg/auth/realm)
  participant FM as FileManager
  participant KR as Keyring
  participant Prov as Provider\n(Azure/AWS)
  CLI->>Realm: Request realm (ATMOS_AUTH_REALM / atmos.yaml / SHA256(CliConfigPath))
  Realm-->>CLI: Return sanitized realm
  CLI->>FM: Compute realm-scoped paths (credentials, config) with realm
  FM-->>Prov: Provide paths (msal_token_cache.json, credentials.json, config.json)
  CLI->>KR: Store/retrieve credential key including realm in key name
  KR-->>CLI: Return credential blob
  CLI->>Prov: Use files/env (e.g., AZURE_CONFIG_DIR) to authenticate
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • aknysh
  • osterman
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'docs: auth realm isolation PRD' directly summarizes the main change—adding PRD documentation for auth realm isolation.
Linked Issues check ✅ Passed The PR addresses DEV-3960 by documenting a realm isolation solution to prevent credential collision across customers with identical identity names.
Out of Scope Changes check ✅ Passed All changes are documentation-focused PRDs explaining the realm isolation feature; no implementation code is included, keeping changes within scope.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/dev-3960-fix-caching-issue-with-super-user-identities-between

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/prd/auth-user-credential-namespace-isolation.md`:
- Around line 116-132: The sanitize() referenced by getCredentialNamespace
should be explicitly implemented to enforce a safe namespace: allow only ASCII
alphanumerics, hyphen (-) and underscore (_), replace any other character with a
safe replacement (e.g., '-') rather than rejecting, collapse consecutive
replacements to a single '-' and trim leading/trailing non-alphanumerics;
enforce a maximum length (e.g., 64 chars) by truncating and ensure the result is
not empty—if empty, return a deterministic fallback (use the first 8 chars of
the sha256 of atmosConfig.CliConfigPath as current logic). Apply this
sanitization to values from ATMOS_AUTH_NAMESPACE and atmosConfig.Auth.Namespace
inside getCredentialNamespace.
🧹 Nitpick comments (3)
docs/prd/auth-user-credential-namespace-isolation.md (3)

130-131: Consider documenting hash collision probability.

Truncating SHA256 to 8 characters provides ~4.3 billion unique values, which should be sufficient for typical usage. However, it might be worth adding a brief note about collision probability or the rationale for choosing 8 characters (e.g., "provides sufficient uniqueness for typical repository counts while keeping paths readable").


186-196: Note the keyring breaking change in the migration section.

The keyring key format change is another breaking change that affects users with existing keyring entries. Consider adding a note in the Migration Considerations section (lines 243-262) that users relying on OS keyring storage will also need to re-authenticate, not just those using file-based credentials.


232-232: Defer markdown linting issues to cleanup PR.

The static analysis tools flag a missing language specification for the fenced code block at line 232, and LanguageTool suggests using em dashes at lines 328-330. These are minor formatting issues that should be addressed in a separate documentation cleanup commit and don't block this PR.

Based on learnings: For PRD docs in cloudposse/atmos, markdownlint issues like MD040 should be deferred to a separate documentation cleanup commit and must not block the current PR.

Address CodeRabbit feedback by specifying the sanitize() function:
- Define allowed character set (alphanumeric, hyphen, underscore)
- Document sanitization rules (replacement, collapsing, trimming)
- Enforce 64 character maximum length
- Include security considerations for path traversal prevention

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.52%. Comparing base (b731596) to head (4166c96).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2033      +/-   ##
==========================================
+ Coverage   75.49%   75.52%   +0.03%     
==========================================
  Files         793      793              
  Lines       73699    73699              
==========================================
+ Hits        55641    55664      +23     
+ Misses      14565    14542      -23     
  Partials     3493     3493              
Flag Coverage Δ
unittests 75.52% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Benbentwo and others added 2 commits January 28, 2026 12:42
Replace all instances of "namespace" with "realm" for credential
isolation terminology:
- ATMOS_AUTH_NAMESPACE → ATMOS_AUTH_REALM
- auth.namespace → auth.realm
- Namespace → Realm in all documentation and code examples

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add formal definition explaining that a realm is a complete, isolated
authentication universe that determines identity existence, authentication
methods, and credential storage/resolution.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 28, 2026
Create comprehensive PRD documenting how realms should be used throughout
the atmos auth system:

- Realm as top-level directory (not appended to provider)
- Default realm computed as SHA256 hash of CliConfigPath
- Data flow from hooks through manager to credential storage
- All touchpoints: files, keyring, PostAuthenticateParams
- Schema and interface changes required
- User experience with atmos auth status

Update existing realm isolation PRD to reference new architecture doc
and align directory structure.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added size/l Large size PR and removed size/m Medium size PR labels Jan 28, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 28, 2026
@osterman osterman changed the title docs: auth credential namespace isolation PRD docs: auth credential realm isolation PRD Jan 28, 2026
@osterman osterman changed the title docs: auth credential realm isolation PRD docs: auth realm isolation PRD Jan 28, 2026
- Replace sanitization with validation (error on invalid characters)
- Update directory structure: ~/.config/atmos/{realm}/{cloud}/{provider}/
- All cloud providers now share common base path with realm as top-level
- Add clear error examples for invalid realm values

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/prd/auth-user-credential-realm-isolation.md`:
- Around line 166-192: The validate function currently lacks a check for
consecutive separators; add a rule in validate (after the invalidChars check or
before the start/end checks) that scans the input for the sequences "--", "__",
"-_", and "_-" (e.g., using strings.Contains or a regex like `(--|__|-_|_-)`)
and return a clear error (e.g., "realm must not contain consecutive hyphens or
underscores") when any are found so the function rejects inputs with consecutive
separator patterns.

- Update Azure PRD with realm directory structure (~/.config/atmos/{realm}/azure/)
- Add implementation scope notes to realm PRDs (AWS now, Azure when implemented)
- Update file manager examples to include realm parameter
- Add cross-references between Azure and realm PRDs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added size/xl Extra large size PR and removed size/l Large size PR labels Jan 28, 2026
@mergify
Copy link

mergify bot commented Jan 28, 2026

Warning

This PR exceeds the recommended limit of 1,000 lines.

Large PRs are difficult to review and may be rejected due to their size.

Please verify that this PR does not address multiple issues.
Consider refactoring it into smaller, more focused PRs to facilitate a smoother review process.

Benbentwo and others added 2 commits January 28, 2026 15:30
Add missing check for consecutive hyphens/underscores (--/__/-_/_-)
as specified in validation rule #4.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
osterman
osterman previously approved these changes Jan 28, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/prd/azure-auth-file-isolation.md (1)

356-441: Realm parameter is missing in usage and interface after signature change.

The PRD updates GetProviderDir to accept realm, but SetupFiles, Cleanup, DeleteIdentity, and the universal FileManager interface still show the old signature. This is a spec inconsistency that will cause mismatched implementations.

🔧 Suggested doc fixes (align all snippets and interface with realm)
- func (m *AzureFileManager) GetProviderDir(providerName, realm string) string {
+ func (m *AzureFileManager) GetProviderDir(providerName, realm string) string {
     if realm != "" {
         return filepath.Join(m.baseDir, realm, "azure", providerName)
     }
     return filepath.Join(m.baseDir, "azure", providerName)
 }

- func (m *AzureFileManager) Cleanup(providerName string) error {
+ func (m *AzureFileManager) Cleanup(providerName, realm string) error {
     defer perf.Track(nil, "azure.files.Cleanup")()

-    providerDir := m.GetProviderDir(providerName)
+    providerDir := m.GetProviderDir(providerName, realm)
     ...
 }

- func (m *AzureFileManager) DeleteIdentity(ctx context.Context, providerName, identityName string) error {
+ func (m *AzureFileManager) DeleteIdentity(ctx context.Context, providerName, identityName, realm string) error {
     ...
-    return m.Cleanup(providerName)
+    return m.Cleanup(providerName, realm)
 }
 type FileManager interface {
-    GetProviderDir(providerName string) string
+    GetProviderDir(providerName, realm string) string
     GetBaseDir() string
     GetDisplayPath() string
-    Cleanup(providerName string) error
-    DeleteIdentity(ctx context.Context, providerName, identityName string) error
+    Cleanup(providerName, realm string) error
+    DeleteIdentity(ctx context.Context, providerName, identityName, realm string) error
     CleanupAll() error
 }

Also applies to: 902-925

🤖 Fix all issues with AI agents
In `@docs/prd/auth-realm-architecture.md`:
- Around line 89-96: Update the Azure “Current Path” entry in the
auth-realm-architecture table to accurately reflect Azure CLI's default storage
(replace `~/.azure/atmos/{provider}/` with `~/.azure/` or use “N/A” if you
prefer to indicate not implemented); adjust the accompanying Azure Note if
needed to remain consistent with the new table value so the paths and statements
about implementation status in the table and the paragraph below match.
- Around line 127-138: Update the Keyring Storage format to include the provider
field in the key (replace the shown format "atmos:{realm}:{identity}" with a
provider-aware format such as "atmos:{realm}:{provider}:{identity}" or
"{provider}:atmos:{realm}:{identity}" to match other PRDs), update the table
header and examples under "Keyring Storage" and change the example lines (e.g.,
Current: `core-root/terraform` → With realm+provider:
`atmos:a1b2c3d4:aws:core-root/terraform`) so the docs align with existing
keyring patterns and avoid conflicting implementations.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/prd/azure-auth-file-isolation.md (1)

356-441: Make realm usage consistent in FileManager API and examples.

GetProviderDir is introduced with a realm parameter, but later code snippets and the FileManager interface drop it. That mismatch will cause confusion when implementing realm-aware cleanup/setup.

Suggested doc alignment
-type FileManager interface {
-    // GetProviderDir returns the provider-specific directory.
-    GetProviderDir(providerName string) string
+type FileManager interface {
+    // GetProviderDir returns the provider-specific directory with realm.
+    GetProviderDir(providerName, realm string) string
-    providerDir := fileManager.GetProviderDir(providerName)
+    providerDir := fileManager.GetProviderDir(providerName, realm)

Also applies to: 902-925

🤖 Fix all issues with AI agents
In `@docs/prd/auth-realm-architecture.md`:
- Around line 311-320: The call to realm.GetRealm in NewAuthManager ignores its
error return; update NewAuthManager to capture (realmInfo, err) :=
realm.GetRealm(config.Realm, cliConfigPath), check if err != nil and return a
wrapped or propagated error (or handle it appropriately) before proceeding, so
any realm validation failure stops initialization; refer to NewAuthManager and
realm.GetRealm to locate and fix the call site.

In `@docs/prd/auth-user-credential-realm-isolation.md`:
- Around line 273-282: The createKeyringKey implementation in
pkg/auth/credentials/store.go currently injects providerName into the key, but
the architecture PRD expects keys of the form atmos:{realm}:{identity}; update
the createKeyringKey function to produce "atmos:{realm}:{identity}" (and when
realm is empty produce "atmos:{identity}") instead of including providerName,
remove providerName from the function signature and all callers (or pass only
identityName and realm), and update any tests or usages that construct/parse
keys to match the new format.

- Fix NewAuthManager to properly handle error from realm.GetRealm()
- Fix createKeyringKey to use atmos:{realm}:{identity} format without providerName
  (consistent with architecture PRD's keyring storage design)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Benbentwo Benbentwo merged commit 7c5190f into main Jan 29, 2026
57 checks passed
@Benbentwo Benbentwo deleted the feature/dev-3960-fix-caching-issue-with-super-user-identities-between branch January 29, 2026 16:38
@github-actions
Copy link

These changes were released in v1.205.1-rc.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation no-release Do not create a new release (wait for additional code changes) size/xl Extra large size PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants