fix(bots): skip bot upsert when nothing changed to stop team-strip + reindex loop on boot by joaopamaral · Pull Request #28128 · open-metadata/OpenMetadata

joaopamaral · 2026-05-14T20:21:20Z

Summary

BotResource.initialize() runs UserUtil.addOrUpdateBotUser(user) for every bot on every OM boot. The in-memory User built by UserUtil.user(...) does not have the teams field populated, so the PUT path through userRepository.createOrUpdate -> UserUpdater.entitySpecificUpdate runs updateTeams(original, updated) with original.teams = [Organization] (or the bot's real stored teams) and updated.teams = null. updateTeams then executes:

deleteTo(original.getId(), USER, Relationship.HAS, Entity.TEAM);
assignTeams(updated, updated.getTeams()); // updated.getTeams() == null

…which strips every stored team membership the bot had, bumps the user version, and triggers an Elasticsearch reindex of both the user and each affected team. With several bots this fires on every restart and produces the reindex storm we observed in production logs:

storeNewVersion called for entity: user <bot-id>,
  changeDescription=fieldsDeleted=[
    FieldChange[name=teams,
                oldValue=[{"id":"...","name":"Organization",...}],
                newValue=<null>]]

This was repeated for profiler-bot, governance-bot, usage-bot, ingestion-bot, …, each one taking ~100–200 ms plus a team-side reindex. We also saw the warning "Circular dependency detected in team hierarchy for team: Organization. Skipping to prevent StackOverflowError." from SubjectContext during the same window; the boot-time team churn was tripping the cycle guard.

In a real environment with several bots this added ~3 minutes to every boot.
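The delete-then-assign sequence described in this summary can be reproduced in isolation. The sketch below is not OpenMetadata code: `TeamStripSketch`, its in-memory map, and the `data-platform` team name are hypothetical stand-ins, and it assumes that assigning a null list adds nothing. Per the reproducer, the real code path appears to leave the bot in Organization only rather than in no team at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the user-to-team relationship store; not the real repository API.
class TeamStripSketch {
  private final Map<String, List<String>> teamsByUser = new HashMap<>();

  // Assumption: a null list means "nothing to assign".
  void assignTeams(String userId, List<String> teams) {
    if (teams == null) {
      return;
    }
    teamsByUser.computeIfAbsent(userId, k -> new ArrayList<>()).addAll(teams);
  }

  void deleteTeams(String userId) {
    teamsByUser.remove(userId);
  }

  // Same shape as the updateTeams excerpt: delete every stored membership, then re-assign.
  void updateTeams(String userId, List<String> updatedTeams) {
    deleteTeams(userId);
    assignTeams(userId, updatedTeams);
  }

  List<String> teamsOf(String userId) {
    return List.copyOf(teamsByUser.getOrDefault(userId, List.of()));
  }

  public static void main(String[] args) {
    TeamStripSketch store = new TeamStripSketch();
    store.assignTeams("ingestion-bot", List.of("Organization", "data-platform"));
    // The boot-time upsert passes an in-memory user whose teams field was never populated.
    store.updateTeams("ingestion-bot", null);
    System.out.println(store.teamsOf("ingestion-bot")); // prints []
  }
}
```

The point of the sketch is that the update is destructive by construction: any caller that omits teams wipes the stored memberships, which is why skipping the update entirely avoids the problem.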

Fix

Short-circuit addOrUpdateBotUser when the incoming bot has no real change vs. the persisted row: compare description, displayName, and roles. If they all match, return the original user and skip the PUT entirely — no UserUpdater, no team strip, no version bump, no reindex.

Two small adjustments to make the guard actually fire:

retrieveWithAuthMechanism now also loads "roles" (was loading only "authenticationMechanism"). description and displayName are scalar JSON-column fields and were already populated by the base read.
Compare roles via listOrEmpty(...) on both sides because the database-loaded original returns an empty list while the freshly built in-memory user returns null, and Objects.equals(null, []) is false.

The first call still hits the existing code path (no originalUser -> guard skipped), so seeding new bots is unchanged.
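The null-versus-empty mismatch behind the second adjustment can be seen in a few lines. This is a minimal sketch, assuming roles are plain strings; `RolesGuardSketch` and its local `listOrEmpty` are illustrative stand-ins for the helper the PR reuses from the codebase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class RolesGuardSketch {
  // Local stand-in for the codebase helper: maps null to an empty list.
  static <T> List<T> listOrEmpty(List<T> list) {
    return list == null ? List.of() : list;
  }

  // Direct comparison: an empty list and null are not equal.
  static boolean naiveSameRoles(List<String> stored, List<String> incoming) {
    return Objects.equals(stored, incoming);
  }

  // Normalized comparison: null and empty are treated as the same "no roles" state.
  static boolean sameRoles(List<String> stored, List<String> incoming) {
    return Objects.equals(listOrEmpty(stored), listOrEmpty(incoming));
  }

  public static void main(String[] args) {
    List<String> stored = new ArrayList<>(); // database-loaded user: empty list
    List<String> incoming = null;            // freshly built in-memory user: null

    System.out.println(naiveSameRoles(stored, incoming)); // prints false
    System.out.println(sameRoles(stored, incoming));      // prints true
  }
}
```

Without the normalization the guard would never fire for bots that have no roles, and every boot would still go through the upsert.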

Reproducer

Boot OpenMetadata. The standard set of bots (ingestion-bot, profiler-bot, governance-bot, usage-bot, …) is upserted by BotResource.initialize().
As an admin, assign one of those bots to a real (non-Organization) team.
Restart OpenMetadata.
Observed (before this fix): the bot is no longer in that team, only Organization. Boot logs show fieldsDeleted=[name=teams, oldValue=[...], newValue=null] for every bot, each followed by user + team ES reindex log lines.
After this fix: the bot still belongs to the assigned team after the restart. None of those fieldsDeleted=[teams] log lines appear. Boot time drops accordingly (~3 minutes saved in production).

Test plan

Added openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest with two Mockito unit tests:

addOrUpdateBotUserShortCircuitsWhenNothingChanged — mocks UserRepository, stubs getByName to return a stored bot whose roles/description/displayName match the incoming user, calls addOrUpdateBotUser(boundary), asserts the return is the same instance as the stored user, and verifies userRepository.createOrUpdate(...) was never called. Verified to fail before the fix (test reaches UserUtil.addOrUpdateUser and explodes on the unstubbed createOrUpdate); passes after the fix.
addOrUpdateBotUserGoesThroughUpsertWhenDisplayNameChanged — same setup but with mismatching displayName; verifies userRepository.createOrUpdate(...) is called, so we don't accidentally short-circuit on real changes.

Local manual verification: spun the fix into the 1.12.7 backport branch we run in production, restarted, observed no fieldsDeleted=[teams] log entries for the bot users and no follow-up bot/team reindex log lines. Boot duration dropped by ~3 minutes.

Opening as draft for maintainer feedback on:

whether the comparison set (roles, description, displayName) is acceptable or you'd prefer a broader/narrower field check;
the chosen test style (Mockito unit test) vs. a different pattern you'd rather see.

🤖 Generated with Claude Code

…reindex loop on boot

`BotResource.initialize()` runs `UserUtil.addOrUpdateBotUser(user)` for every bot on every OM boot. The in-memory `User` built by `UserUtil.user(...)` does not have the `teams` field populated, so the PUT path through `userRepository.createOrUpdate -> UserUpdater.entitySpecificUpdate` runs `updateTeams(original, updated)` with `original.teams = [Organization]` (or the bot's real stored teams) and `updated.teams = null`.

`updateTeams` then executes `deleteTo(user, HAS, TEAM) + assignTeams(null)`, which strips every stored team membership the bot had, bumps the user version, and triggers an Elasticsearch reindex of both the user and each affected team. With several bots this fires on every restart and produces the reindex storm plus "Circular dependency detected in team hierarchy for team: Organization" warnings in the boot logs. In one production deployment this added almost 3 minutes to every boot.

Short-circuit when the incoming bot has no real change vs. the persisted row: compare `description`, `displayName`, and `roles`. If they all match, return the original user and skip the PUT entirely: no `UserUpdater`, no team strip, no version bump, no reindex.

Two adjustments to make the guard actually fire:

- `retrieveWithAuthMechanism` now also loads `"roles"` (was loading only `"authenticationMechanism"`); `description` and `displayName` are scalar JSON-column fields and were already populated by the base read.
- Compare `roles` via `listOrEmpty(...)` on both sides because the database-loaded original returns an empty list while the freshly built in-memory user returns null, and `Objects.equals(null, [])` is false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-14T20:21:45Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

gitar-bot · 2026-05-14T20:23:43Z

+                new org.openmetadata.schema.entity.teams.AuthenticationMechanism()
+                    .withAuthType(org.openmetadata.schema.entity.teams.AuthenticationMechanism.AuthType.JWT));


💡 Quality: Fully qualified class names used instead of imports

The test uses fully qualified names org.openmetadata.schema.entity.teams.AuthenticationMechanism inline (lines 66-67 and 108-109) instead of importing the class. Per project conventions, wildcard imports and fully qualified names should be avoided — add a proper import statement instead.

Import the class and use short names:

// Add to imports section:
import org.openmetadata.schema.entity.teams.AuthenticationMechanism;
import org.openmetadata.schema.entity.teams.AuthenticationMechanism.AuthType;

// Then replace inline usages with:
.withAuthenticationMechanism(
    new AuthenticationMechanism()
        .withAuthType(AuthType.JWT));


gitar-bot · 2026-05-14T20:23:45Z

+      try {
+        User result = UserUtil.addOrUpdateBotUser(incoming);
+        assertNotEquals(
+            stored,
+            result,
+            "When fields differ the upsert path must run and produce a different User");
+      } catch (RuntimeException ignored) {
+        // The downstream createOrUpdate call may throw against the mock; the assertion we
+        // care about is that the short-circuit guard did NOT fire, which we verify below.
+      }
+    }


💡 Quality: Swallowed RuntimeException uses flow-control exception pattern

The test at line 131 catches RuntimeException ignored to handle the case where the mock doesn't fully stub the downstream path. This is a flow-control exception anti-pattern and makes the test fragile — if addOrUpdateBotUser throws for an unexpected reason, the test silently passes. Consider stubbing createOrUpdate to return a value (or use doNothing()/doReturn(...)) so the method completes normally, and assert on the result instead.

Stub the mock properly instead of catching RuntimeException:

// Stub createOrUpdate to return the incoming user so the method completes normally:
when(userRepository.createOrUpdate(any(), any(User.class), any())).thenReturn(incoming);

try (MockedStatic<Entity> entityStatic = mockStatic(Entity.class)) {
  entityStatic.when(() -> Entity.getEntityRepository(Entity.USER)).thenReturn(userRepository);
  User result = UserUtil.addOrUpdateBotUser(incoming);
  assertNotEquals(
      stored,
      result,
      "When fields differ the upsert path must run and produce a different User");
}


gitar-bot · 2026-05-14T20:23:47Z

+  @SuppressWarnings("unused")
+  private static List<EntityReference> roleRef(String name) {
+    List<EntityReference> refs = new ArrayList<>();
+    refs.add(new EntityReference().withId(UUID.randomUUID()).withName(name).withType("role"));
+    return refs;
+  }


💡 Quality: Unused private helper method roleRef should be removed

The private method roleRef at lines 140-145 is annotated with @SuppressWarnings("unused") and is never called. Commented-out or dead code should be removed per project conventions. If it's intended for future tests, add it when those tests are written.


gitar-bot · 2026-05-14T20:23:49Z

+    if (originalUser != null
+        && Objects.equals(listOrEmpty(originalUser.getRoles()), listOrEmpty(user.getRoles()))
+        && Objects.equals(originalUser.getDescription(), user.getDescription())
+        && Objects.equals(originalUser.getDisplayName(), user.getDisplayName())) {
+      return originalUser;


⚠️ Edge Case: Short-circuit doesn't compare email field changes

The guard at line 342-346 compares roles, description, and displayName but not email. The UserUtil.user(...) method (which builds the in-memory bot user) may set an email based on domain. If an admin changes the domain configuration between restarts, the email update would be silently skipped by the short-circuit. Consider whether email should be included in the comparison, or document why it's excluded.
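One way to picture the option raised here is a guard that also compares email. The sketch below is illustrative only: `BotGuardSketch`, its `BotUser` class, and the example addresses are hypothetical, roles are simplified to strings, and whether email belongs in the comparison is still an open question for the maintainers.

```java
import java.util.List;
import java.util.Objects;

class BotGuardSketch {
  // Hypothetical minimal view of a bot user; the real User class has many more fields.
  static final class BotUser {
    final String email;
    final String description;
    final String displayName;
    final List<String> roles;

    BotUser(String email, String description, String displayName, List<String> roles) {
      this.email = email;
      this.description = description;
      this.displayName = displayName;
      this.roles = roles;
    }
  }

  static <T> List<T> listOrEmpty(List<T> list) {
    return list == null ? List.of() : list;
  }

  // True when the upsert could be skipped. A missing stored user never short-circuits,
  // so first-time seeding still goes through the normal path.
  static boolean unchanged(BotUser stored, BotUser incoming) {
    return stored != null
        && Objects.equals(listOrEmpty(stored.roles), listOrEmpty(incoming.roles))
        && Objects.equals(stored.description, incoming.description)
        && Objects.equals(stored.displayName, incoming.displayName)
        && Objects.equals(stored.email, incoming.email); // the extra check under discussion
  }

  public static void main(String[] args) {
    BotUser stored =
        new BotUser("ingestion-bot@old.example", "Ingestion bot", "ingestion-bot", List.of());
    BotUser sameEmail =
        new BotUser("ingestion-bot@old.example", "Ingestion bot", "ingestion-bot", null);
    BotUser newDomain =
        new BotUser("ingestion-bot@new.example", "Ingestion bot", "ingestion-bot", null);

    System.out.println(unchanged(stored, sameEmail)); // prints true
    System.out.println(unchanged(stored, newDomain)); // prints false
    System.out.println(unchanged(null, sameEmail));   // prints false
  }
}
```

With the extra comparison, a domain change between restarts falls through to the upsert path instead of being skipped.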


gitar-bot · 2026-05-14T20:23:49Z

Code Review ⚠️ Changes requested 0 resolved / 4 findings

Prevents redundant bot upserts during boot by adding a comparison check for roles, description, and display name. The change is currently blocked by a missing email field comparison in the guard and several minor cleanup issues in the unit test.

⚠️ Edge Case: Short-circuit doesn't compare email field changes

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java:342-346

The guard at line 342-346 compares roles, description, and displayName but not email. The UserUtil.user(...) method (which builds the in-memory bot user) may set an email based on domain. If an admin changes the domain configuration between restarts, the email update would be silently skipped by the short-circuit. Consider whether email should be included in the comparison, or document why it's excluded.

💡 Quality: Fully qualified class names used instead of imports

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:66-67 📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:108-109

The test uses fully qualified names org.openmetadata.schema.entity.teams.AuthenticationMechanism inline (lines 66-67 and 108-109) instead of importing the class. Per project conventions, wildcard imports and fully qualified names should be avoided — add a proper import statement instead.

Import the class and use short names

// Add to imports section:
import org.openmetadata.schema.entity.teams.AuthenticationMechanism;
import org.openmetadata.schema.entity.teams.AuthenticationMechanism.AuthType;

// Then replace inline usages with:
.withAuthenticationMechanism(
    new AuthenticationMechanism()
        .withAuthType(AuthType.JWT));

💡 Quality: Swallowed RuntimeException uses flow-control exception pattern

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:125-135

The test at line 131 catches RuntimeException ignored to handle the case where the mock doesn't fully stub the downstream path. This is a flow-control exception anti-pattern and makes the test fragile — if addOrUpdateBotUser throws for an unexpected reason, the test silently passes. Consider stubbing createOrUpdate to return a value (or use doNothing()/doReturn(...)) so the method completes normally, and assert on the result instead.

Stub the mock properly instead of catching RuntimeException

// Stub createOrUpdate to return the incoming user so the method completes normally:
when(userRepository.createOrUpdate(any(), any(User.class), any())).thenReturn(incoming);

try (MockedStatic<Entity> entityStatic = mockStatic(Entity.class)) {
  entityStatic.when(() -> Entity.getEntityRepository(Entity.USER)).thenReturn(userRepository);
  User result = UserUtil.addOrUpdateBotUser(incoming);
  assertNotEquals(
      stored,
      result,
      "When fields differ the upsert path must run and produce a different User");
}

💡 Quality: Unused private helper method roleRef should be removed

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:140-145

The private method roleRef at lines 140-145 is annotated with @SuppressWarnings("unused") and is never called. Commented-out or dead code should be removed per project conventions. If it's intended for future tests, add it when those tests are written.


joaopamaral wants to merge 1 commit into open-metadata:main from Automattic:bot-team-strip-fix
