feat(auth): Simplify user identity management with email-first approach #25538
Document the design for simplifying OpenMetadata's authentication system to use email as the primary user identifier. This includes:
- New configuration schema with emailClaim, displayNameClaim
- New authorizer config with adminEmails, allowedEmailDomains, botDomain
- Deprecation strategy for old configs (jwtPrincipalClaims, etc.)
- Authentication flow with email-first resolution
- Name generation with collision handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Detailed TDD implementation plan with 16 tasks covering:
- Configuration schema changes (emailClaim, displayNameClaim, adminEmails, etc.)
- Utility methods for email extraction and validation
- Updates to all auth providers (OIDC, SAML, LDAP, Basic)
- Username generation with collision handling
- Deprecation warnings for old config options
- Test coverage for all new functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add simplified configuration for email-first identity resolution:
- emailClaim: JWT claim for user email (default: 'email')
- displayNameClaim: JWT claim for display name (default: 'name')
- Mark jwtPrincipalClaims and jwtPrincipalClaimsMapping as deprecated

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add simplified authorizer configuration:
- adminEmails: email-based admin list (replaces adminPrincipals)
- allowedEmailDomains: restrict authentication to specific domains
- botDomain: domain for system-created bot emails
- Mark adminPrincipals and principalDomain as deprecated

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add simplified email extraction that:
- Extracts email from specified claim
- Validates email format
- Lowercases result
- Throws clear errors for missing/invalid claims

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
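The extraction steps listed in this commit can be sketched roughly as follows. This is an illustration only: the class name EmailClaimSketch, the method signature, the regular expression, and the exception type are assumptions, not the PR's actual extractEmailFromClaim implementation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

final class EmailClaimSketch {
  // Deliberately simple format check (assumption); the PR's real validation rule is not shown.
  private static final Pattern SIMPLE_EMAIL =
      Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

  /** Reads the configured claim, validates it, and returns it lowercased. */
  static String extractEmail(Map<String, ?> claims, String emailClaim) {
    Object raw = claims.get(emailClaim);
    if (raw == null || raw.toString().isBlank()) {
      throw new IllegalArgumentException("Missing or empty claim '" + emailClaim + "'");
    }
    String email = raw.toString().trim().toLowerCase(Locale.ROOT);
    if (!SIMPLE_EMAIL.matcher(email).matches()) {
      throw new IllegalArgumentException(
          "Claim '" + emailClaim + "' does not contain a valid email: " + email);
    }
    return email;
  }
}
```

Lowercasing before any comparison keeps later lookups (admin list, allowed domains) case-insensitive without extra work at each call site.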
Extract display name from claim with fallback to email prefix when claim is missing or empty.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
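A minimal sketch of that fallback, assuming the claims arrive as a map and the email has already been validated. The class and method names here are hypothetical; only the behavior (claim first, email prefix otherwise) comes from the commit message.

```java
import java.util.Map;

final class DisplayNameSketch {
  /** Returns the display-name claim if present and non-blank, else the email prefix. */
  static String displayName(Map<String, ?> claims, String displayNameClaim, String email) {
    Object raw = claims.get(displayNameClaim);
    if (raw != null && !raw.toString().isBlank()) {
      return raw.toString().trim();
    }
    // Fallback: everything before the '@' in the email.
    int at = email.indexOf('@');
    return at > 0 ? email.substring(0, at) : email;
  }
}
```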
Generate unique usernames from email with:
- Email prefix as base username
- Random 4-char suffix on collision
- Lowercase normalization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
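The generation rule can be sketched as below. The exists predicate mirrors the ExistsChecker mentioned in the review comments, and the "_" separator follows the john.doe_x7k2 example in the PR description; the suffix alphabet, the retry limit, and all names are assumptions made for illustration.

```java
import java.security.SecureRandom;
import java.util.Locale;
import java.util.function.Predicate;

final class UsernameSketch {
  private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"; // assumed
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final int MAX_ATTEMPTS = 10; // assumed bound

  /** Uses the lowercased email prefix, adding a random 4-char suffix on collision. */
  static String generateUsername(String email, Predicate<String> exists) {
    if (exists == null) {
      throw new IllegalArgumentException("exists predicate cannot be null");
    }
    String base = email.toLowerCase(Locale.ROOT).split("@")[0];
    if (!exists.test(base)) {
      return base;
    }
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      String candidate = base + "_" + randomSuffix(4);
      if (!exists.test(candidate)) {
        return candidate;
      }
    }
    throw new IllegalStateException("Could not find a free username for " + base);
  }

  private static String randomSuffix(int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    }
    return sb.toString();
  }
}
```

The check and the later insert are separate steps, so this alone does not rule out two concurrent signups choosing the same name.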
Validate email domain against allowedEmailDomains list:
- Empty list allows all domains
- Case-insensitive comparison
- Clear error message for disallowed domains

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
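A sketch of the three rules above, under the assumption that rejection is signalled with an exception. The exception type and message, and the names DomainCheckSketch and validateDomain, are hypothetical and may differ from the PR's validateEmailDomain.

```java
import java.util.List;
import java.util.Locale;

final class DomainCheckSketch {
  /** Throws if the email's domain is not allowed; an empty or null list allows everything. */
  static void validateDomain(String email, List<String> allowedDomains) {
    if (allowedDomains == null || allowedDomains.isEmpty()) {
      return; // no restriction configured
    }
    String domain = email.substring(email.lastIndexOf('@') + 1).toLowerCase(Locale.ROOT);
    for (String allowed : allowedDomains) {
      if (allowed != null && allowed.trim().toLowerCase(Locale.ROOT).equals(domain)) {
        return;
      }
    }
    throw new IllegalArgumentException(
        "Email domain '" + domain + "' is not in allowedEmailDomains " + allowedDomains);
  }
}
```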
Check if email is in adminEmails list:
- Case-insensitive comparison
- Handles null/empty list

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
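For illustration, the admin check might look like the following. The names are hypothetical; only the case-insensitive comparison and the null/empty handling are taken from the commit message.

```java
import java.util.Collection;

final class AdminCheckSketch {
  /** True when the email appears in adminEmails, ignoring case; null or empty lists match nothing. */
  static boolean isAdminEmail(String email, Collection<String> adminEmails) {
    if (email == null || adminEmails == null || adminEmails.isEmpty()) {
      return false;
    }
    return adminEmails.stream().anyMatch(admin -> admin != null && admin.equalsIgnoreCase(email));
  }
}
```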
Log warnings at startup when deprecated configs are used:
- jwtPrincipalClaims -> use emailClaim
- jwtPrincipalClaimsMapping -> use emailClaim + displayNameClaim
- adminPrincipals -> use adminEmails
- principalDomain -> use botDomain + allowedEmailDomains

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
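One way to picture this mapping is a helper that collects a message for each deprecated option in use. The boolean parameters and message wording are assumptions; the PR presumably inspects the real configuration objects and writes to its logger.

```java
import java.util.ArrayList;
import java.util.List;

final class DeprecationWarningSketch {
  /** Builds one warning per deprecated option that is still set. */
  static List<String> warnings(
      boolean hasJwtPrincipalClaims,
      boolean hasJwtPrincipalClaimsMapping,
      boolean hasAdminPrincipals,
      boolean hasPrincipalDomain) {
    List<String> out = new ArrayList<>();
    if (hasJwtPrincipalClaims) {
      out.add("jwtPrincipalClaims is deprecated; use emailClaim");
    }
    if (hasJwtPrincipalClaimsMapping) {
      out.add("jwtPrincipalClaimsMapping is deprecated; use emailClaim + displayNameClaim");
    }
    if (hasAdminPrincipals) {
      out.add("adminPrincipals is deprecated; use adminEmails");
    }
    if (hasPrincipalDomain) {
      out.add("principalDomain is deprecated; use botDomain + allowedEmailDomains");
    }
    return out;
  }
}
```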
Add email-first authentication flow when emailClaim is configured:
- Extract email from specified claim using extractEmailFromClaim
- Validate against allowedEmailDomains using validateEmailDomain
- Extract display name from claim or email prefix
- Fall back to legacy jwtPrincipalClaims flow for backward compatibility

The new flow is used when:
- emailClaim is configured (not null or empty)
- jwtPrincipalClaimsMapping is NOT configured (empty)
- The user is NOT a bot (bots use legacy flow)

Also adds comprehensive tests for the email-first flow covering:
- Successful extraction with email and display name
- Domain validation success and failure
- Missing email claim handling
- Custom email claim configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
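The three conditions for choosing the new flow reduce to a single predicate. The sketch below assumes the mapping is exposed as a list of strings and that bot detection has already happened; the class and method names are hypothetical.

```java
import java.util.List;

final class FlowSelectionSketch {
  /** True when the email-first flow should be used instead of the legacy flow. */
  static boolean useEmailFirstFlow(
      String emailClaim, List<String> jwtPrincipalClaimsMapping, boolean isBot) {
    boolean emailClaimConfigured = emailClaim != null && !emailClaim.isEmpty();
    boolean mappingEmpty =
        jwtPrincipalClaimsMapping == null || jwtPrincipalClaimsMapping.isEmpty();
    return emailClaimConfigured && mappingEmpty && !isBot;
  }
}
```

Any request that fails one of these checks goes through the legacy jwtPrincipalClaims path, which is what keeps existing deployments working unchanged.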
Update AuthenticationCodeFlowHandler to:
- Use emailClaim for email extraction
- Validate against allowedEmailDomains
- Generate unique username from email
- Check adminEmails for admin status
- Respect enableSelfSignup setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update SamlAuthServletHandler to:
- Use emailClaim for SAML attribute extraction
- Validate against allowedEmailDomains
- Generate unique username from email
- Check adminEmails for admin status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update LdapAuthenticator to:
- Use emailClaim for LDAP attribute (default: 'mail')
- Use displayNameClaim (default: 'displayName')
- Validate against allowedEmailDomains
- Generate unique username from email
- Check adminEmails for admin status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create bot emails using configurable botDomain:
- Add createBotEmail(botName, botDomain) utility method
- Add overloaded addOrUpdateBotUser(botName, authConfig) method
- Default: openmetadata.org
- Lowercases bot name and domain

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
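A rough sketch of the bot email helper follows. It assumes the address has the form botName@botDomain and that a null or blank domain falls back to the default; neither detail is confirmed by the commit message, which only names the method, the default domain, and the lowercasing.

```java
import java.util.Locale;

final class BotEmailSketch {
  private static final String DEFAULT_BOT_DOMAIN = "openmetadata.org";

  /** Builds a lowercased bot address, falling back to the default domain (assumed behavior). */
  static String createBotEmail(String botName, String botDomain) {
    String domain =
        (botDomain == null || botDomain.isBlank()) ? DEFAULT_BOT_DOMAIN : botDomain.trim();
    return (botName.trim() + "@" + domain).toLowerCase(Locale.ROOT);
  }
}
```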
Support email-based admin user creation:
- Add createOrUpdateAdminUsers method to process adminEmails config
- Fall back to adminPrincipals for backward compatibility
- Create users with generated usernames from email using generateUsernameFromEmail
- Look up existing users by email first, update admin status if needed
- Prevent promoting bot users to admins
- Update UserRepository.initializeUsers to use the new method

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add examples of new simplified configuration:
- emailClaim, displayNameClaim for authentication
- adminEmails, allowedEmailDomains, botDomain for authorization
- Mark deprecated options with comments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TypeScript types have been updated based on the JSON schema changes in the PR.
🔍 CI failure analysis for 8cccdcc: SonarCloud CI fails with same Maven tests as postgresql CI (Apps, AWS); duplicate failure pattern

CI Status - SonarCloud Analysis
New Job: maven-sonarcloud-ci (Job 61512116051)
Status: Failed with same Maven test failures as previously analyzed
Test Results: 7878 tests run, 1 failure, 3 errors, 701 skipped
Failed Tests (identical to maven-postgresql-ci):

Analysis
This is a duplicate failure pattern: the same Maven integration test failures occurring in a different CI job (SonarCloud analysis).
Why this is the same issue:
Not related to this PR (as previously established):

Complete CI Summary
Maven Test Failures (affecting multiple CI jobs):
Playwright E2E Tests:
Python S3 Tests:
Integration Tests:

Key Insights
Maven Test Failures Are Widespread:
All Failures Remain Unrelated to Authentication Changes:

Conclusion
SonarCloud CI failure is a duplicate of the maven-postgresql-ci failures. The same Maven integration tests are failing across multiple CI jobs, confirming these are persistent test issues unrelated to the authentication changes in this PR.
Recommendation: These CI failures should not block this PR as all failures are in completely separate code areas from the authentication changes.

Code Review 👍 Approved with suggestions
0 resolved / 4 findings
Well-structured authentication refactor with solid backward compatibility, but has a potential race condition in concurrent username generation and minor issues in the JWT filter's email-first flow.
validateEmailDomain(email, allowedEmailDomains);
String displayName = extractDisplayNameFromClaim(claims, displayNameClaim, email);
In the JwtFilter's email-first flow, the username is derived directly from the email prefix (email.split("@")[0]) without checking for collisions with existing users. However, in the OIDC, SAML, and LDAP handlers, UserUtil.generateUsernameFromEmail() is used which properly handles collisions with random suffixes.
This could cause issues when authenticating via JWT (non-initial login) where a user with the same email prefix already exists, since the SecurityContext would be set with a potentially non-unique username that doesn't match the actual user's stored username.
Impact: The principal name in the security context might not match the actual stored username for users who had collision handling applied during account creation.
Suggested fix: Consider using the actual stored username from the user lookup instead of re-deriving it from the email prefix. Alternatively, document that this behavior is intentional for the filter context.
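The first option in the suggested fix could look roughly like this. The lookup function findUsernameByEmail is hypothetical and stands in for whatever user lookup the filter has available; falling back to the email prefix when no user exists yet is also an assumption.

```java
import java.util.Optional;
import java.util.function.Function;

final class PrincipalResolutionSketch {
  /** Prefers the stored username for this email; falls back to the email prefix if none exists. */
  static String resolvePrincipal(
      String email, Function<String, Optional<String>> findUsernameByEmail) {
    return findUsernameByEmail.apply(email).orElseGet(() -> email.split("@")[0]);
  }
}
```

With this shape, a user stored as john.doe_x7k2 keeps that principal name on later JWT-authenticated requests instead of being re-derived as john.doe.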
  throw new IllegalArgumentException("ExistsChecker predicate cannot be null");
}

String baseUsername = email.toLowerCase().split("@")[0];
The generateUsernameFromEmail method has a TOCTOU (time-of-check-to-time-of-use) race condition. Between checking existsChecker.test(baseUsername) returns false and the caller creating the user, another concurrent request could create a user with the same username.
This affects concurrent user signups for users with identical email prefixes (e.g., john.doe@company1.com and john.doe@company2.com signing up simultaneously).
Impact: Could result in duplicate key constraint violations under concurrent load.
Suggested fix: Wrap user creation in a retry loop that catches duplicate key exceptions and regenerates the username with a new suffix.
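The suggested retry loop might be sketched as follows. UserStore, DuplicateUsernameException, and the suffix supplier are hypothetical stand-ins for the real repository, the database's duplicate-key error, and the random suffix generator; none of them are taken from the PR.

```java
import java.util.Locale;
import java.util.function.Supplier;

final class CreateWithRetrySketch {
  /** Stand-in for the duplicate-key error raised by the persistence layer (assumption). */
  static final class DuplicateUsernameException extends RuntimeException {
    DuplicateUsernameException(String username) {
      super("Username already exists: " + username);
    }
  }

  /** Stand-in for the user repository; insert must fail atomically on a duplicate name. */
  interface UserStore {
    void insert(String username, String email);
  }

  /** Tries the base username first, then retries with a fresh suffix after each conflict. */
  static String createUser(
      String email, UserStore store, Supplier<String> suffixes, int maxAttempts) {
    String base = email.toLowerCase(Locale.ROOT).split("@")[0];
    String candidate = base;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        store.insert(candidate, email);
        return candidate; // the insert itself is the uniqueness check
      } catch (DuplicateUsernameException e) {
        candidate = base + "_" + suffixes.get();
      }
    }
    throw new IllegalStateException("No free username for " + base + " after " + maxAttempts + " attempts");
  }
}
```

Because the unique constraint decides the winner, two simultaneous signups with the same prefix both succeed: one keeps the base name and the other retries with a suffix.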
Summary
This PR simplifies OpenMetadata's authentication system by using email as the primary user identifier, replacing the complex claim resolution logic with a straightforward email-first approach.
Closes #23285
Key Changes
- emailClaim and displayNameClaim replace the confusing jwtPrincipalClaims and jwtPrincipalClaimsMapping
- adminEmails list replaces adminPrincipals for clearer admin specification
- allowedEmailDomains for restricting authentication to specific email domains
- botDomain for system-created bot email addresses

Configuration Changes
New fields in authenticationConfiguration:
- emailClaim (default: "email" for OIDC/SAML, "mail" for LDAP)
- displayNameClaim (default: "name" for OIDC/SAML, "displayName" for LDAP)

New fields in authorizerConfiguration:
- adminEmails - List of admin user email addresses
- allowedEmailDomains - Restrict authentication to specific domains
- botDomain - Domain for system bot emails (default: "openmetadata.org")

Deprecated (still functional with warnings):
- jwtPrincipalClaims → use emailClaim
- jwtPrincipalClaimsMapping → use emailClaim + displayNameClaim
- adminPrincipals → use adminEmails
- principalDomain → use botDomain for bots, allowedEmailDomains for user restrictions

Username Generation
Usernames are auto-generated from email addresses:
- john.doe@company.com → john.doe
- On collision, a random 4-character suffix is appended: john.doe_x7k2

Test plan
- SecurityUtil email extraction and domain validation (28 tests)
- UserUtil username generation, admin check, bot email (21 tests)
- mvn spotless:apply