fix(bots): skip bot upsert when nothing changed to stop team-strip + reindex loop on boot #28128

Draft
joaopamaral wants to merge 1 commit into open-metadata:main from Automattic:bot-team-strip-fix

@joaopamaral

Summary

BotResource.initialize() runs UserUtil.addOrUpdateBotUser(user) for every bot on every OM boot. The in-memory User built by UserUtil.user(...) does not have the teams field populated, so the PUT path through userRepository.createOrUpdate -> UserUpdater.entitySpecificUpdate runs updateTeams(original, updated) with original.teams = [Organization] (or the bot's real stored teams) and updated.teams = null. updateTeams then executes:

deleteTo(original.getId(), USER, Relationship.HAS, Entity.TEAM);
assignTeams(updated, updated.getTeams()); // updated.getTeams() == null

…which strips every stored team membership the bot had, bumps the user version, and triggers an Elasticsearch reindex of both the user and each affected team. With several bots this fires on every restart and produces the reindex storm we observed in production logs:

storeNewVersion called for entity: user <bot-id>,
  changeDescription=fieldsDeleted=[
    FieldChange[name=teams,
                oldValue=[{"id":"...","name":"Organization",...}],
                newValue=<null>]]

This repeats for profiler-bot, governance-bot, usage-bot, ingestion-bot, …, each one taking ~100–200 ms plus a team-side reindex. During the same window SubjectContext also logged "Circular dependency detected in team hierarchy for team: Organization. Skipping to prevent StackOverflowError.": the boot-time team churn was tripping the cycle guard.

In a real environment with several bots this added ~3 minutes to every boot.
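
The delete-then-assign behaviour described above can be illustrated with a small standalone sketch. Everything in it is hypothetical: `TeamStripSketch`, its in-memory map, and the `data-platform` team name are stand-ins for illustration, not OpenMetadata classes, and `assignTeams` is assumed to write nothing when handed `null`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the relationship store; not the OpenMetadata DAO.
final class TeamStripSketch {
  private final Map<String, List<String>> teamsByUser = new HashMap<>();

  // Assumed behaviour: a null team list adds no memberships.
  void assignTeams(String userId, List<String> teams) {
    if (teams == null) {
      return;
    }
    teamsByUser.computeIfAbsent(userId, k -> new ArrayList<>()).addAll(teams);
  }

  // Delete-then-assign: the shape of updateTeams as described in the summary.
  void updateTeams(String userId, List<String> updatedTeams) {
    teamsByUser.remove(userId);
    assignTeams(userId, updatedTeams);
  }

  List<String> teamsOf(String userId) {
    return teamsByUser.getOrDefault(userId, List.of());
  }

  public static void main(String[] args) {
    TeamStripSketch store = new TeamStripSketch();
    store.assignTeams("ingestion-bot", List.of("Organization", "data-platform"));
    // Boot-time upsert: the freshly built user carries teams == null.
    store.updateTeams("ingestion-bot", null);
    System.out.println(store.teamsOf("ingestion-bot")); // prints []
  }
}
```

The point of the sketch is only that the delete step is unconditional while the assign step depends on the incoming value, so an upsert that omits `teams` is indistinguishable from one that clears them.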

Fix

Short-circuit addOrUpdateBotUser when the incoming bot has no real change vs. the persisted row: compare description, displayName, and roles. If they all match, return the original user and skip the PUT entirely — no UserUpdater, no team strip, no version bump, no reindex.

Two small adjustments to make the guard actually fire:

  • retrieveWithAuthMechanism now also loads "roles" (was loading only "authenticationMechanism"). description and displayName are scalar JSON-column fields and were already populated by the base read.
  • Compare roles via listOrEmpty(...) on both sides because the database-loaded original returns an empty list while the freshly built in-memory user returns null, and Objects.equals(null, []) is false.

The first call still hits the existing code path (no originalUser -> guard skipped), so seeding new bots is unchanged.
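
A minimal standalone sketch of the guard, to make the null-versus-empty-list point concrete. The `BotUser` record and the local `listOrEmpty` helper are hypothetical stand-ins written for this example; only the comparison shape follows the description above.

```java
import java.util.List;
import java.util.Objects;

final class BotGuardSketch {
  // Hypothetical minimal stand-in for the User entity: only the compared fields.
  record BotUser(String displayName, String description, List<String> roles) {}

  // Local helper for this sketch: null becomes an empty list.
  static <T> List<T> listOrEmpty(List<T> list) {
    return list == null ? List.of() : list;
  }

  // True when the incoming bot matches the stored one on all compared fields.
  static boolean unchanged(BotUser stored, BotUser incoming) {
    return stored != null
        && Objects.equals(listOrEmpty(stored.roles()), listOrEmpty(incoming.roles()))
        && Objects.equals(stored.description(), incoming.description())
        && Objects.equals(stored.displayName(), incoming.displayName());
  }

  public static void main(String[] args) {
    BotUser stored = new BotUser("IngestionBot", "ingests metadata", List.of());
    BotUser incoming = new BotUser("IngestionBot", "ingests metadata", null);
    // Without normalisation, an empty list and null compare as different.
    System.out.println(Objects.equals(stored.roles(), incoming.roles())); // false
    System.out.println(unchanged(stored, incoming)); // true
    // No stored row: the guard is skipped and the normal seeding path would run.
    System.out.println(unchanged(null, incoming)); // false
  }
}
```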

Reproducer

  1. Boot OpenMetadata. The standard set of bots (ingestion-bot, profiler-bot, governance-bot, usage-bot, …) is upserted by BotResource.initialize().
  2. As an admin, assign one of those bots to a real (non-Organization) team.
  3. Restart OpenMetadata.
  4. Observed (before this fix): the bot is no longer in that team, only Organization. Boot logs show fieldsDeleted=[name=teams, oldValue=[...], newValue=null] for every bot, each followed by user + team ES reindex log lines.
  5. After this fix: the bot still belongs to the assigned team after the restart. None of those fieldsDeleted=[teams] log lines appear. Boot time drops accordingly (~3 minutes saved in production).

Test plan

Added openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest with two Mockito unit tests:

  • addOrUpdateBotUserShortCircuitsWhenNothingChanged — mocks UserRepository, stubs getByName to return a stored bot whose roles/description/displayName match the incoming user, calls addOrUpdateBotUser(boundary), asserts the return is the same instance as the stored user, and verifies userRepository.createOrUpdate(...) was never called. Verified to fail before the fix (test reaches UserUtil.addOrUpdateUser and explodes on the unstubbed createOrUpdate); passes after the fix.
  • addOrUpdateBotUserGoesThroughUpsertWhenDisplayNameChanged — same setup but with mismatching displayName; verifies userRepository.createOrUpdate(...) is called, so we don't accidentally short-circuit on real changes.
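
What the two tests assert can be sketched without Mockito, using a hand-rolled fake that counts upsert calls. This is an illustration of the verification idea only: `FakeRepo`, `Bot`, and the simplified `addOrUpdateBot` (which compares displayName alone) are hypothetical and do not mirror the real `UserRepository` signatures.

```java
import java.util.Objects;

final class UpsertCallCountSketch {
  record Bot(String name, String displayName) {}

  // Hand-rolled fake: returns a fixed stored row and counts upsert calls.
  static final class FakeRepo {
    final Bot stored;
    int upserts = 0;

    FakeRepo(Bot stored) {
      this.stored = stored;
    }

    Bot getByName(String name) {
      return stored;
    }

    Bot createOrUpdate(Bot bot) {
      upserts++;
      return bot;
    }
  }

  // Simplified guard for the sketch: skip the upsert when displayName is unchanged.
  static Bot addOrUpdateBot(FakeRepo repo, Bot incoming) {
    Bot original = repo.getByName(incoming.name());
    if (original != null && Objects.equals(original.displayName(), incoming.displayName())) {
      return original;
    }
    return repo.createOrUpdate(incoming);
  }

  public static void main(String[] args) {
    Bot stored = new Bot("usage-bot", "UsageBot");

    // Case 1: nothing changed, so the stored instance comes back and no upsert runs.
    FakeRepo same = new FakeRepo(stored);
    Bot result = addOrUpdateBot(same, new Bot("usage-bot", "UsageBot"));
    System.out.println(result == stored && same.upserts == 0); // true

    // Case 2: displayName differs, so the upsert runs exactly once.
    FakeRepo changed = new FakeRepo(stored);
    addOrUpdateBot(changed, new Bot("usage-bot", "Usage Bot v2"));
    System.out.println(changed.upserts == 1); // true
  }
}
```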

Local manual verification: spun the fix into the 1.12.7 backport branch we run in production, restarted, observed no fieldsDeleted=[teams] log entries for the bot users and no follow-up bot/team reindex log lines. Boot duration dropped by ~3 minutes.

Opening as draft for maintainer feedback on:

  • whether the comparison set (roles, description, displayName) is acceptable or you'd prefer a broader/narrower field check;
  • the chosen test style (Mockito unit test) vs. a different pattern you'd rather see.

🤖 Generated with Claude Code

…reindex loop on boot

`BotResource.initialize()` runs `UserUtil.addOrUpdateBotUser(user)` for
every bot on every OM boot. The in-memory `User` built by
`UserUtil.user(...)` does not have the `teams` field populated, so the
PUT path through `userRepository.createOrUpdate ->
UserUpdater.entitySpecificUpdate` runs `updateTeams(original, updated)`
with `original.teams = [Organization]` (or the bot's real stored teams)
and `updated.teams = null`. `updateTeams` then executes
`deleteTo(user, HAS, TEAM) + assignTeams(null)`, which strips every
stored team membership the bot had, bumps the user version, and triggers
an Elasticsearch reindex of both the user and each affected team. With
several bots this fires on every restart and produces the reindex storm
plus "Circular dependency detected in team hierarchy for team:
Organization" warnings in the boot logs. In one production deployment
this added almost 3 minutes to every boot.

Short-circuit when the incoming bot has no real change vs. the persisted
row: compare `description`, `displayName`, and `roles`. If they all
match, return the original user and skip the PUT entirely — no
`UserUpdater`, no team strip, no version bump, no reindex.

Two adjustments to make the guard actually fire:
- `retrieveWithAuthMechanism` now also loads `"roles"` (was loading only
  `"authenticationMechanism"`); `description` and `displayName` are
  scalar JSON-column fields and were already populated by the base read.
- Compare `roles` via `listOrEmpty(...)` on both sides because the
  database-loaded original returns an empty list while the freshly built
  in-memory user returns null, and `Objects.equals(null, [])` is false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment on lines +66 to +67
new org.openmetadata.schema.entity.teams.AuthenticationMechanism()
    .withAuthType(org.openmetadata.schema.entity.teams.AuthenticationMechanism.AuthType.JWT));

💡 Quality: Fully qualified class names used instead of imports

The test uses fully qualified names org.openmetadata.schema.entity.teams.AuthenticationMechanism inline (lines 66-67 and 108-109) instead of importing the class. Per project conventions, wildcard imports and fully qualified names should be avoided — add a proper import statement instead.

Import the class and use short names:

// Add to imports section:
import org.openmetadata.schema.entity.teams.AuthenticationMechanism;
import org.openmetadata.schema.entity.teams.AuthenticationMechanism.AuthType;

// Then replace inline usages with:
.withAuthenticationMechanism(
    new AuthenticationMechanism()
        .withAuthType(AuthType.JWT));

Comment on lines +125 to +135
    try {
      User result = UserUtil.addOrUpdateBotUser(incoming);
      assertNotEquals(
          stored,
          result,
          "When fields differ the upsert path must run and produce a different User");
    } catch (RuntimeException ignored) {
      // The downstream createOrUpdate call may throw against the mock; the assertion we
      // care about is that the short-circuit guard did NOT fire, which we verify below.
    }
  }

💡 Quality: Swallowed RuntimeException uses flow-control exception pattern

The test at line 131 catches RuntimeException ignored to handle the case where the mock doesn't fully stub the downstream path. This is a flow-control exception anti-pattern and makes the test fragile — if addOrUpdateBotUser throws for an unexpected reason, the test silently passes. Consider stubbing createOrUpdate to return a value (or use doNothing()/doReturn(...)) so the method completes normally, and assert on the result instead.

Stub the mock properly instead of catching RuntimeException:

// Stub createOrUpdate to return the incoming user so the method completes normally:
when(userRepository.createOrUpdate(any(), any(User.class), any())).thenReturn(incoming);

try (MockedStatic<Entity> entityStatic = mockStatic(Entity.class)) {
  entityStatic.when(() -> Entity.getEntityRepository(Entity.USER)).thenReturn(userRepository);
  User result = UserUtil.addOrUpdateBotUser(incoming);
  assertNotEquals(
      stored,
      result,
      "When fields differ the upsert path must run and produce a different User");
}

Comment on lines +140 to +145
@SuppressWarnings("unused")
private static List<EntityReference> roleRef(String name) {
  List<EntityReference> refs = new ArrayList<>();
  refs.add(new EntityReference().withId(UUID.randomUUID()).withName(name).withType("role"));
  return refs;
}

💡 Quality: Unused private helper method roleRef should be removed

The private method roleRef at lines 140-145 is annotated with @SuppressWarnings("unused") and is never called. Commented-out or dead code should be removed per project conventions. If it's intended for future tests, add it when those tests are written.

Comment on lines +342 to +346
if (originalUser != null
    && Objects.equals(listOrEmpty(originalUser.getRoles()), listOrEmpty(user.getRoles()))
    && Objects.equals(originalUser.getDescription(), user.getDescription())
    && Objects.equals(originalUser.getDisplayName(), user.getDisplayName())) {
  return originalUser;

⚠️ Edge Case: Short-circuit doesn't compare email field changes

The guard at lines 342-346 compares roles, description, and displayName but not email. The UserUtil.user(...) method (which builds the in-memory bot user) may set an email based on domain. If an admin changes the domain configuration between restarts, the email update would be silently skipped by the short-circuit. Consider whether email should be included in the comparison, or document why it's excluded.
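
One way to keep the compared set explicit and easy to extend is to hold the field accessors in a list, so adding email is a one-line change. This is a sketch under assumptions, not the PR's code: the `BotUser` record, the `COMPARED` list, and the example addresses are hypothetical, and roles are left out to keep it short.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

final class ComparedFieldsSketch {
  // Hypothetical stand-in for the User entity with the scalar fields under discussion.
  record BotUser(String displayName, String description, String email) {}

  // Fields that must all match for the upsert to be skipped; email is the suggested addition.
  static final List<Function<BotUser, Object>> COMPARED =
      List.of(BotUser::displayName, BotUser::description, BotUser::email);

  static boolean unchanged(BotUser stored, BotUser incoming) {
    return stored != null
        && COMPARED.stream()
            .allMatch(field -> Objects.equals(field.apply(stored), field.apply(incoming)));
  }

  public static void main(String[] args) {
    BotUser stored = new BotUser("IngestionBot", "ingests metadata", "ingestion-bot@old.example");
    BotUser sameEmail = new BotUser("IngestionBot", "ingests metadata", "ingestion-bot@old.example");
    BotUser newDomain = new BotUser("IngestionBot", "ingests metadata", "ingestion-bot@new.example");
    System.out.println(unchanged(stored, sameEmail)); // true
    System.out.println(unchanged(stored, newDomain)); // false
  }
}
```

With email in the list, a domain change between restarts falls through to the upsert path instead of being skipped.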

@gitar-bot

gitar-bot Bot commented May 14, 2026

Code Review ⚠️ Changes requested 0 resolved / 4 findings

Prevents redundant bot upserts during boot by adding a comparison check for roles, description, and display name. The change is currently blocked by a missing email field comparison in the guard and several minor cleanup issues in the unit test.

⚠️ Edge Case: Short-circuit doesn't compare email field changes

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java:342-346

The guard at line 342-346 compares roles, description, and displayName but not email. The UserUtil.user(...) method (which builds the in-memory bot user) may set an email based on domain. If an admin changes the domain configuration between restarts, the email update would be silently skipped by the short-circuit. Consider whether email should be included in the comparison, or document why it's excluded.

💡 Quality: Fully qualified class names used instead of imports

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:66-67 📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:108-109

The test uses fully qualified names org.openmetadata.schema.entity.teams.AuthenticationMechanism inline (lines 66-67 and 108-109) instead of importing the class. Per project conventions, wildcard imports and fully qualified names should be avoided — add a proper import statement instead.

Import the class and use short names
// Add to imports section:
import org.openmetadata.schema.entity.teams.AuthenticationMechanism;
import org.openmetadata.schema.entity.teams.AuthenticationMechanism.AuthType;

// Then replace inline usages with:
.withAuthenticationMechanism(
    new AuthenticationMechanism()
        .withAuthType(AuthType.JWT));

💡 Quality: Swallowed RuntimeException uses flow-control exception pattern

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:125-135

The test at line 131 catches RuntimeException ignored to handle the case where the mock doesn't fully stub the downstream path. This is a flow-control exception anti-pattern and makes the test fragile — if addOrUpdateBotUser throws for an unexpected reason, the test silently passes. Consider stubbing createOrUpdate to return a value (or use doNothing()/doReturn(...)) so the method completes normally, and assert on the result instead.

Stub the mock properly instead of catching RuntimeException
// Stub createOrUpdate to return the incoming user so the method completes normally:
when(userRepository.createOrUpdate(any(), any(User.class), any())).thenReturn(incoming);

try (MockedStatic<Entity> entityStatic = mockStatic(Entity.class)) {
  entityStatic.when(() -> Entity.getEntityRepository(Entity.USER)).thenReturn(userRepository);
  User result = UserUtil.addOrUpdateBotUser(incoming);
  assertNotEquals(
      stored,
      result,
      "When fields differ the upsert path must run and produce a different User");
}

💡 Quality: Unused private helper method roleRef should be removed

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:140-145

The private method roleRef at lines 140-145 is annotated with @SuppressWarnings("unused") and is never called. Commented-out or dead code should be removed per project conventions. If it's intended for future tests, add it when those tests are written.
