
perf: Fix N+1 query problems when creating users with group membership #23103

Draft
jason-p-pickering wants to merge 3 commits into master from perf-user-creation

@jason-p-pickering jason-p-pickering commented Feb 27, 2026

Fix N+1 query problems when creating users with group membership

Background

Creating a new user in a large DHIS2 instance was significantly slower than expected due to two independent N+1 query problems, both triggered within the same request. This PR addresses both. Parts of the PR have been extracted from a previous attempt to fix this problem, but after review, it was decided to add performance tests for each step of the process (creation, update, deletion). This PR addresses the first part, namely creation of new users.

Problem 1: SchemaToDataFetcher loads all users for uniqueness checking

During preheat (which runs before every metadata import, including API-driven user creation), SchemaToDataFetcher.fetch() was issuing an unbounded query to load all records of a given type in order to check for uniqueness conflicts:

SELECT username, code FROM userinfo -- returns ALL 221K users

This is unnecessary because uniqueness only needs to be checked against the specific values being imported. The fix scopes the query to only records whose unique property values match what is being imported:

SELECT username, code FROM userinfo WHERE username IN (:usernameValues)

For a database with 221K users, this eliminates a 221K-row full table scan on every user creation request.

The problem is not limited to the users table: the preheat also collects referenced types transitively, so the same unbounded query runs for each of them.

With the old fetch(schema):

  • For every class in that traversal (whether you're importing 0 or 1000 of them), it fires SELECT unique_fields FROM entity - loading every row from that table
  • CategoryCombo has name as a unique property → full categorycombo table scan
  • CategoryOption has name unique → full scan
  • And so on

With the new fetch(schema, objectsBeingImported):

  • For CategoryCombo, objectsBeingImported is [] (we are not importing any category combos) → returns List.of() immediately, no query
  • For User, objectsBeingImported is [theUserBeingCreated] → queries only WHERE username IN ('bobbytables') OR code IN ('BOBBYTABLES')

This eliminates full-table scans for classes that are not being imported, and narrows the query to only the relevant rows for classes that are. CategoryCombo appears in a user import at all because of how deeply the preheat traverses the object graph: it is being defensive about uniqueness checking across the whole schema. That is unnecessary here, since there is no uniqueness constraint on category combos with respect to users. Glowroot traces showed that trace entities were reduced from 106 to 5 during the import of a single user.
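
The scoping described above can be illustrated with a minimal, standalone sketch. This is not the SchemaToDataFetcher code from this PR: the class ScopedUniquenessQuery, its build method, and the representation of imported objects as maps are all hypothetical, chosen only to show the two behaviours (skip the query when nothing is being imported, otherwise restrict it with IN predicates).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration; not the actual SchemaToDataFetcher.
final class ScopedUniquenessQuery {

    /** The generated SQL plus the named parameter values to bind to it. */
    record Query(String sql, Map<String, Set<Object>> params) {}

    /**
     * Builds a uniqueness query restricted to the values being imported.
     * Returns an empty Optional when there is nothing to check, so the
     * caller can skip the database round trip entirely.
     *
     * Table and column names are assumed to come from trusted schema
     * metadata; imported values are only ever passed as bound parameters.
     */
    static Optional<Query> build(String table, List<String> uniqueColumns,
                                 List<Map<String, Object>> objectsBeingImported) {
        if (objectsBeingImported.isEmpty() || uniqueColumns.isEmpty()) {
            return Optional.empty();
        }
        Map<String, Set<Object>> params = new LinkedHashMap<>();
        List<String> predicates = new ArrayList<>();
        for (String column : uniqueColumns) {
            // Collect the distinct non-null values of this unique property.
            Set<Object> values = new LinkedHashSet<>();
            for (Map<String, Object> object : objectsBeingImported) {
                Object value = object.get(column);
                if (value != null) {
                    values.add(value);
                }
            }
            if (!values.isEmpty()) {
                String paramName = column + "Values";
                params.put(paramName, values);
                predicates.add(column + " IN (:" + paramName + ")");
            }
        }
        if (predicates.isEmpty()) {
            return Optional.empty();
        }
        String sql = "SELECT " + String.join(", ", uniqueColumns)
            + " FROM " + table
            + " WHERE " + String.join(" OR ", predicates);
        return Optional.of(new Query(sql, params));
    }
}
```

In this sketch, calling build("categorycombo", List.of("name"), List.of()) yields an empty Optional, which corresponds to the "no query" case, while a single imported user with a username and a code produces one query with two IN predicates joined by OR.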

Problem 2: Adding a user to a group loads the entire group membership

DefaultUserGroupService.addUserToGroups was using Hibernate to add users to groups by calling userGroup.addUser(user) followed by userGroupStore.updateNoAcl(userGroup). Because UserGroup.members is the owning side of the usergroupmembers join table, Hibernate had to load the entire members collection before it could detect what changed and write the join table, even when only adding a single user.

For a group with 42K members this results in a 42K-row query per group the new user is assigned to.

The fix adds a direct SQL INSERT to HibernateUserGroupStore that writes to usergroupmembers without loading the members collection, along with targeted cache eviction for both User.groups and the UserGroup entity.
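
As a rough sketch of the idea, and not the code in HibernateUserGroupStore, the write reduces to a single parameterised statement against the join table. The SqlExecutor interface below is a hypothetical stand-in for whatever executes SQL in the store, and the column names usergroupid and userid are assumptions about the usergroupmembers table.

```java
// Hypothetical illustration; not the actual HibernateUserGroupStore code.
final class DirectMembershipInsert {

    /** Hypothetical stand-in for a component that runs parameterised SQL. */
    interface SqlExecutor {
        /** Executes the statement and returns the number of affected rows. */
        int update(String sql, Object... params);
    }

    // Column names are assumed, not taken from the PR.
    static final String INSERT_MEMBER =
        "INSERT INTO usergroupmembers (usergroupid, userid) VALUES (?, ?)";

    /**
     * Adds one membership row without touching the members collection,
     * so the cost does not depend on how many members the group has.
     * Any second-level cache entries for the affected user and group
     * still have to be evicted separately by the caller.
     */
    static int addMember(SqlExecutor executor, long userGroupId, long userId) {
        return executor.update(INSERT_MEMBER, userGroupId, userId);
    }
}
```

The sketch does not guard against inserting a membership that already exists; a real implementation would need to handle that case, for example by checking beforehand.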

Performance results (Gatling, 3 iterations, group with ~42K members)

| Metric | Master | This branch | Improvement |
| --- | --- | --- | --- |
| Min response time | 1,436ms | 225ms | 6.4× faster |
| Mean response time | 2,009ms | 397ms | 5.1× faster |
| p95 response time | 2,844ms | 621ms | 4.6× faster |
| Throughput | 0.43 rps | 1.5 rps | 3.5× |

On master, 100% of requests exceeded 1.2 seconds. On this branch, 100% completed under 800ms. The test was conducted on a copy of the database where this issue was originally observed.

Changes

  • SchemaToDataFetcher: scope uniqueness queries to values being imported instead of full table scan
  • UserGroupStore / HibernateUserGroupStore: add addMember() SQL bypass and updateLastUpdated() with cache eviction
  • UserGroupService / DefaultUserGroupService: add canAddOrRemoveMember(UserGroup, UserDetails) overload (avoids double-fetching the group); wire addUserToGroups through the new SQL path
  • UserCreationPerformanceTest: new focused Gatling simulation for benchmarking user creation with group assignment

Notes

  • User deletion has a related N+1 (UserRoleDeletionHandler, UserGroupDeletionHandler) that is not addressed here and will be covered in a separate PR
  • The addMember SQL bypass manually evicts the Hibernate L2 cache for User.groups and UserGroup.members. In a clustered deployment, other nodes will not receive a cache invalidation signal for these entries until TTL expires. This is a known limitation documented in code comments and will need to be addressed separately.

@jason-p-pickering jason-p-pickering requested review from a team as code owners February 27, 2026 15:25
@jason-p-pickering jason-p-pickering marked this pull request as draft February 27, 2026 15:29
@jason-p-pickering jason-p-pickering added run-api-tests This label will trigger an api-test job for the PR. run-perf-tests Enables performance tests labels Feb 27, 2026