Skip to content

Conversation

@mohityadav766
Copy link
Member

@mohityadav766 mohityadav766 commented Jan 26, 2026

Summary

This PR simplifies OpenMetadata's authentication system by using email as the primary user identifier, replacing the complex claim resolution logic with a straightforward email-first approach.

Closes #23285

Key Changes

  • Email as primary identifier: Users are now identified and looked up by email address instead of complex claim resolution
  • Simplified configuration: New emailClaim and displayNameClaim replace the confusing jwtPrincipalClaims and jwtPrincipalClaimsMapping
  • Email-based admin management: New adminEmails list replaces adminPrincipals for clearer admin specification
  • Domain restrictions: New allowedEmailDomains for restricting authentication to specific email domains
  • Bot email configuration: New botDomain for system-created bot email addresses
  • Backward compatibility: All deprecated configs continue to work with deprecation warnings logged at startup

Configuration Changes

New fields in authenticationConfiguration:

  • emailClaim (default: "email" for OIDC/SAML, "mail" for LDAP)
  • displayNameClaim (default: "name" for OIDC/SAML, "displayName" for LDAP)

New fields in authorizerConfiguration:

  • adminEmails - List of admin user email addresses
  • allowedEmailDomains - Restrict authentication to specific domains
  • botDomain - Domain for system bot emails (default: "openmetadata.org")

Deprecated (still functional with warnings):

  • jwtPrincipalClaims → use emailClaim
  • jwtPrincipalClaimsMapping → use emailClaim + displayNameClaim
  • adminPrincipals → use adminEmails
  • principalDomain → use botDomain for bots, allowedEmailDomains for user restrictions

Username Generation

Usernames are auto-generated from email addresses:

  • john.doe@company.comjohn.doe
  • On collision, append random suffix: john.doe_x7k2

Test plan

  • Unit tests for SecurityUtil email extraction and domain validation (28 tests)
  • Unit tests for UserUtil username generation, admin check, bot email (21 tests)
  • All tests pass
  • Code formatted with mvn spotless:apply
  • Manual testing with OIDC provider
  • Manual testing with SAML provider
  • Manual testing with LDAP provider

mohityadav766 and others added 23 commits January 26, 2026 15:54
Document the design for simplifying OpenMetadata's authentication
system to use email as the primary user identifier. This includes:
- New configuration schema with emailClaim, displayNameClaim
- New authorizer config with adminEmails, allowedEmailDomains, botDomain
- Deprecation strategy for old configs (jwtPrincipalClaims, etc.)
- Authentication flow with email-first resolution
- Name generation with collision handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Detailed TDD implementation plan with 16 tasks covering:
- Configuration schema changes (emailClaim, displayNameClaim, adminEmails, etc.)
- Utility methods for email extraction and validation
- Updates to all auth providers (OIDC, SAML, LDAP, Basic)
- Username generation with collision handling
- Deprecation warnings for old config options
- Test coverage for all new functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add simplified configuration for email-first identity resolution:
- emailClaim: JWT claim for user email (default: 'email')
- displayNameClaim: JWT claim for display name (default: 'name')
- Mark jwtPrincipalClaims and jwtPrincipalClaimsMapping as deprecated

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add simplified authorizer configuration:
- adminEmails: email-based admin list (replaces adminPrincipals)
- allowedEmailDomains: restrict authentication to specific domains
- botDomain: domain for system-created bot emails
- Mark adminPrincipals and principalDomain as deprecated

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add simplified email extraction that:
- Extracts email from specified claim
- Validates email format
- Lowercases result
- Throws clear errors for missing/invalid claims

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract display name from claim with fallback to email prefix
when claim is missing or empty.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Generate unique usernames from email with:
- Email prefix as base username
- Random 4-char suffix on collision
- Lowercase normalization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Validate email domain against allowedEmailDomains list:
- Empty list allows all domains
- Case-insensitive comparison
- Clear error message for disallowed domains

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Check if email is in adminEmails list:
- Case-insensitive comparison
- Handles null/empty list

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Log warnings at startup when deprecated configs are used:
- jwtPrincipalClaims -> use emailClaim
- jwtPrincipalClaimsMapping -> use emailClaim + displayNameClaim
- adminPrincipals -> use adminEmails
- principalDomain -> use botDomain + allowedEmailDomains

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add email-first authentication flow when emailClaim is configured:
- Extract email from specified claim using extractEmailFromClaim
- Validate against allowedEmailDomains using validateEmailDomain
- Extract display name from claim or email prefix
- Fall back to legacy jwtPrincipalClaims flow for backward compatibility

The new flow is used when:
- emailClaim is configured (not null or empty)
- jwtPrincipalClaimsMapping is NOT configured (empty)
- The user is NOT a bot (bots use legacy flow)

Also adds comprehensive tests for the email-first flow covering:
- Successful extraction with email and display name
- Domain validation success and failure
- Missing email claim handling
- Custom email claim configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update AuthenticationCodeFlowHandler to:
- Use emailClaim for email extraction
- Validate against allowedEmailDomains
- Generate unique username from email
- Check adminEmails for admin status
- Respect enableSelfSignup setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update SamlAuthServletHandler to:
- Use emailClaim for SAML attribute extraction
- Validate against allowedEmailDomains
- Generate unique username from email
- Check adminEmails for admin status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update LdapAuthenticator to:
- Use emailClaim for LDAP attribute (default: 'mail')
- Use displayNameClaim (default: 'displayName')
- Validate against allowedEmailDomains
- Generate unique username from email
- Check adminEmails for admin status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create bot emails using configurable botDomain:
- Add createBotEmail(botName, botDomain) utility method
- Add overloaded addOrUpdateBotUser(botName, authConfig) method
- Default: openmetadata.org
- Lowercases bot name and domain

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Support email-based admin user creation:
- Add createOrUpdateAdminUsers method to process adminEmails config
- Fall back to adminPrincipals for backward compatibility
- Create users with generated usernames from email using generateUsernameFromEmail
- Look up existing users by email first, update admin status if needed
- Prevent promoting bot users to admins
- Update UserRepository.initializeUsers to use the new method

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add examples of new simplified configuration:
- emailClaim, displayNameClaim for authentication
- adminEmails, allowedEmailDomains, botDomain for authorization
- Mark deprecated options with comments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jan 26, 2026
@mohityadav766 mohityadav766 changed the title User auth improvements feat(auth): Simplify user identity management with email-first approach Jan 26, 2026
@github-actions
Copy link
Contributor

TypeScript types have been updated based on the JSON schema changes in the PR

@mohityadav766 mohityadav766 requested a review from aji-aju January 26, 2026 19:05
@gitar-bot
Copy link

gitar-bot bot commented Jan 26, 2026

🔍 CI failure analysis for 8cccdcc: SonarCloud CI fails with same Maven tests as postgresql CI (Apps, AWS); duplicate failure pattern

CI Status - SonarCloud Analysis

New Job: maven-sonarcloud-ci (Job 61512116051)

Status: Failed with same Maven test failures as previously analyzed

Test Results: 7878 tests run, 1 failure, 3 errors, 701 skipped

Failed Tests (identical to maven-postgresql-ci):

  1. AppsResourceTest.post_trigger_app_200: Max retries exceeded polling for app trigger
  2. AwsCredentialsUtilTest (3 errors): Credential validation tests throwing IllegalArgumentException

Analysis

This is a duplicate failure pattern - the same Maven integration test failures occurring in a different CI job (SonarCloud analysis).

Why this is the same issue:

  • ✅ Identical test failures: AppsResourceTest + AwsCredentialsUtilTest
  • ✅ Same error messages
  • ✅ Same test count (7878 tests run)
  • ✅ Different CI job but running same test suite

Not related to this PR (as previously established):

  • ✅ No Apps/Application code modified
  • ✅ No AWS credentials code modified
  • ✅ This PR only changes authentication (JwtFilter, SecurityUtil, UserUtil)
  • ✅ Zero overlap with failing test areas

Complete CI Summary

Maven Test Failures (affecting multiple CI jobs):

  • ❌ maven-postgresql-ci: 6 failures (Feed, Apps, AWS)
  • ❌ maven-sonarcloud-ci: 4 failures (Apps, AWS) [subset of postgresql failures]
  • Pattern: Same underlying test issues appearing in multiple CI jobs

Playwright E2E Tests:

  • ❌ Retry completed (attempt 2/2), still failed
  • Max retries reached per Flaky Test Retry rule

Python S3 Tests:

  • ❌ Consistent failure across all runs

Integration Tests:

  • ✅ Fixed by retry (TagRecognizerFeedbackIT)

Key Insights

Maven Test Failures Are Widespread:

  • Same test failures appearing in both postgresql and sonarcloud CI jobs
  • Suggests these are not environment-specific but actual test issues
  • Likely caused by:
    • Recent changes to Apps/Feed/AWS logic merged to main
    • Test assertion problems
    • Test data setup issues

All Failures Remain Unrelated to Authentication Changes:

  • ✅ This PR: Java backend authentication only
  • ❌ Failures: Apps, Feed, AWS, S3, Playwright E2E
  • ✅ Zero code overlap

Conclusion

SonarCloud CI failure is a duplicate of the maven-postgresql-ci failures. The same Maven integration tests are failing across multiple CI jobs, confirming these are persistent test issues unrelated to the authentication changes in this PR.

Recommendation: These CI failures should not block this PR as all failures are in completely separate code areas from the authentication changes.

Code Review 👍 Approved with suggestions 0 resolved / 4 findings

Well-structured authentication refactor with solid backward compatibility, but has a potential race condition in concurrent username generation and minor issues in the JWT filter's email-first flow.

⚠️ Edge Case: Email-first flow uses email prefix as username without collision check

📄 openmetadata-service/src/main/java/org/openmetadata/service/security/JwtFilter.java:217-218

In the JwtFilter's email-first flow, the username is derived directly from the email prefix (email.split("@")[0]) without checking for collisions with existing users. However, in the OIDC, SAML, and LDAP handlers, UserUtil.generateUsernameFromEmail() is used which properly handles collisions with random suffixes.

This could cause issues when authenticating via JWT (non-initial login) where a user with the same email prefix already exists, since the SecurityContext would be set with a potentially non-unique username that doesn't match the actual user's stored username.

Impact: The principal name in the security context might not match the actual stored username for users who had collision handling applied during account creation.

Suggested fix: Consider using the actual stored username from the user lookup instead of re-deriving it from the email prefix. Alternatively, document that this behavior is intentional for the filter context.

⚠️ Bug: Race condition between username existence check and user creation

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java:96-100

The generateUsernameFromEmail method has a TOCTOU (time-of-check-to-time-of-use) race condition. Between checking existsChecker.test(baseUsername) returns false and the caller creating the user, another concurrent request could create a user with the same username.

This affects concurrent user signups for users with identical email prefixes (e.g., john.doe@company1.com and john.doe@company2.com signing up simultaneously).

Impact: Could result in duplicate key constraint violations under concurrent load.

Suggested fix: Wrap user creation in a retry loop that catches duplicate key exceptions and regenerates the username with a new suffix.

💡 Quality: Use equalsIgnoreCase instead of toLowerCase().equals()

📄 openmetadata-service/src/main/java/org/openmetadata/service/security/SecurityUtil.java:2105

In validateEmailDomain, the domain comparison uses d.toLowerCase().equals(domain) which creates a new String object for each comparison in the stream.

boolean allowed = allowedEmailDomains.stream().anyMatch(d -> d.toLowerCase().equals(domain));

Impact: Minor performance overhead due to unnecessary String allocations.

Suggested fix: Use equalsIgnoreCase:

boolean allowed = allowedEmailDomains.stream().anyMatch(d -> d.equalsIgnoreCase(domain));

The same pattern appears in UserUtil.isAdminEmail at line 3023:

.anyMatch(adminEmail -> adminEmail.toLowerCase().equals(normalizedEmail));
💡 Edge Case: Removed retry delay in LDAP connection attempts

📄 openmetadata-service/src/main/java/org/openmetadata/service/security/auth/LdapAuthenticator.java:2296-2299

The refactored getLdapUserInfo method removed the baseDelayMs variable and the exponential backoff delay between retry attempts:

-  private String getUserDnFromLdap(String email) {
-    // Retry configuration (will be made configurable in future)
+  private LdapUserInfo getLdapUserInfo(String email) {
     final int maxRetries = 3;
-    final int baseDelayMs = 500;

Without a delay between retries, the code will immediately retry failed LDAP connections, which could overwhelm a recovering LDAP server or fail faster than necessary without giving the server time to recover.

Impact: Reduced resilience during LDAP server transient failures.

Suggested fix: Reintroduce the delay with exponential backoff between retry attempts.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Comment on lines +217 to +218
validateEmailDomain(email, allowedEmailDomains);
String displayName = extractDisplayNameFromClaim(claims, displayNameClaim, email);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Edge Case: Email-first flow uses email prefix as username without collision check

Details

In the JwtFilter's email-first flow, the username is derived directly from the email prefix (email.split("@")[0]) without checking for collisions with existing users. However, in the OIDC, SAML, and LDAP handlers, UserUtil.generateUsernameFromEmail() is used which properly handles collisions with random suffixes.

This could cause issues when authenticating via JWT (non-initial login) where a user with the same email prefix already exists, since the SecurityContext would be set with a potentially non-unique username that doesn't match the actual user's stored username.

Impact: The principal name in the security context might not match the actual stored username for users who had collision handling applied during account creation.

Suggested fix: Consider using the actual stored username from the user lookup instead of re-deriving it from the email prefix. Alternatively, document that this behavior is intentional for the filter context.


Was this helpful? React with 👍 / 👎

Comment on lines +96 to +100
throw new IllegalArgumentException("ExistsChecker predicate cannot be null");
}

String baseUsername = email.toLowerCase().split("@")[0];

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Race condition between username existence check and user creation

Details

The generateUsernameFromEmail method has a TOCTOU (time-of-check-to-time-of-use) race condition. Between checking existsChecker.test(baseUsername) returns false and the caller creating the user, another concurrent request could create a user with the same username.

This affects concurrent user signups for users with identical email prefixes (e.g., john.doe@company1.com and john.doe@company2.com signing up simultaneously).

Impact: Could result in duplicate key constraint violations under concurrent load.

Suggested fix: Wrap user creation in a retry loop that catches duplicate key exceptions and regenerates the username with a new suffix.


Was this helpful? React with 👍 / 👎

@github-actions
Copy link
Contributor

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 65%
65.8% (55627/84538) 44.83% (28695/64002) 47.68% (8738/18325)

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Simplify User Identity Management: Just Tell us your Email

2 participants