Improve node upsert performances #5966

LucasG0 · 2025-03-07T19:06:03Z

Improve node upsert performance by switching from a check-if-exists-then-create-or-update behavior to a try-create-else-update behavior. This avoids the extra cost of checking whether the node already exists in the database. Some modifications around NodeGroupedUniquenessConstraint now differentiate hfid violations from other violations, because violating the hfid uniqueness constraint means the node already exists in the database.
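A toy sketch may help compare the two strategies. It assumes an in-memory store standing in for the database, a simplified, hypothetical `HFIDViolatedError` that carries the id of the conflicting node (loosely modeled on the error introduced in this PR), and it counts one "query" per store call. None of these names are Infrahub's actual API.

```python
from dataclasses import dataclass, field


class HFIDViolatedError(Exception):
    """Hypothetical, simplified: a create would duplicate an existing HFID."""

    def __init__(self, matching_nodes_ids: set[str]) -> None:
        super().__init__(f"HFID already used by {sorted(matching_nodes_ids)}")
        self.matching_nodes_ids = matching_nodes_ids


@dataclass
class InMemoryStore:
    """Toy stand-in for the database: nodes indexed by id and by HFID."""

    nodes: dict[str, dict] = field(default_factory=dict)
    hfid_index: dict[tuple, str] = field(default_factory=dict)
    queries: int = 0  # counts store calls, to compare strategies

    def create(self, hfid: tuple, data: dict) -> str:
        self.queries += 1
        if hfid in self.hfid_index:
            # The uniqueness check that runs on create doubles as the existence check.
            raise HFIDViolatedError({self.hfid_index[hfid]})
        node_id = f"node-{len(self.nodes) + 1}"
        self.nodes[node_id] = dict(data)
        self.hfid_index[hfid] = node_id
        return node_id

    def update(self, node_id: str, data: dict) -> str:
        self.queries += 1
        self.nodes[node_id].update(data)
        return node_id


def upsert(store: InMemoryStore, hfid: tuple, data: dict) -> tuple[str, bool]:
    """Try to create first; fall back to update only when the HFID is taken.

    Returns (node_id, created).
    """
    try:
        return store.create(hfid, data), True
    except HFIDViolatedError as exc:
        (existing_id,) = exc.matching_nodes_ids  # an HFID matches at most one node
        return store.update(existing_id, data), False
```

In this model, creating a new node costs one call instead of two (no preliminary existence check), while updating an existing node still costs two (a failed create, then an update), so the gain comes from the create path.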

Note that the upsert mutation does not benefit from this performance boost if:

the schema defines a default_filter
id is in payload -> we currently do not support (at least not officially) creating a node with a given id, so this case does not matter
hfid is in payload -> currently, the SDK sends hfid when upserting. I believe we don't need it: an upsert may create the node, so the payload must already contain all mandatory fields required for creation, which makes hfid redundant. I have an SDK PR to test that: Remove hfid in upsert payload infrahub-sdk-python#312.
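The three exceptions above can be summarized as a single predicate. This is a hypothetical helper, not a function from the codebase: `data` stands for the mutation payload and `default_filter` for the schema setting.

```python
from __future__ import annotations


def can_use_fast_path(data: dict, default_filter: str | None) -> bool:
    """Hypothetical: True when an upsert can go straight to try-create-else-update."""
    if default_filter:  # the schema defines a default_filter
        return False
    if "id" in data:  # the node is looked up by id first
        return False
    if "hfid" in data:  # the node is looked up by hfid first (extra query if it exists)
        return False
    return True
```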

Regarding performance, the current benchmarks do not work properly yet, so they should be ignored (they also fail in #6109). I tested locally by creating 5000 nodes on a non-main branch with this fix versus release-1.2, and the performance boost is 12% (76s vs 86s). Based on previous observations of real scenarios (see a Slack discussion), I believe the gain may reach 15-20% depending on the database state and schemas.
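The 12% figure is the time saved relative to the release-1.2 baseline:

```latex
\text{gain} = \frac{t_{\text{release-1.2}} - t_{\text{PR}}}{t_{\text{release-1.2}}}
            = \frac{86 - 76}{86} \approx 0.116 \approx 12\%
```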

codspeed-hq · 2025-03-07T19:12:14Z

CodSpeed Performance Report

Merging #5966 will degrade performances by 20.7%

Comparing lgu-improve-upsert (1f13406) with release-1.2 (2c7ef4b)

Summary

❌ 1 regression
✅ 9 untouched benchmarks
🆕 4 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
🆕	`test_create_nodes_batch[branch2-100-False]`	N/A	6.7 s	N/A
🆕	`test_create_nodes_batch[branch2-100-True]`	N/A	6.7 s	N/A
🆕	`test_create_nodes_batch[main-100-False]`	N/A	6.9 s	N/A
🆕	`test_create_nodes_batch[main-100-True]`	N/A	6.9 s	N/A
❌	`test_get_schema`	236.8 ms	298.6 ms	-20.7%
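The reported -20.7% for `test_get_schema` matches a change in speed (the inverse of run time) rather than a change in run time; this reading is inferred from the two numbers, not taken from CodSpeed's documentation. Expressed as run time, the same measurement is about 26% slower:

```latex
\frac{t_{\text{BASE}}}{t_{\text{HEAD}}} - 1 = \frac{236.8}{298.6} - 1 \approx -0.207,
\qquad
\frac{t_{\text{HEAD}}}{t_{\text{BASE}}} - 1 = \frac{298.6}{236.8} - 1 \approx +0.261
```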

backend/infrahub/core/constraint/node/runner.py

ajtmccarty · 2025-03-21T19:23:26Z

backend/infrahub/core/migrations/graph/m018_uniqueness_nulls.py

-            for constraint_group in schema_constraint_path_groups:
-                for schema_attribute_path in constraint_group:
+            for uniqueness_constraint_path in uniqueness_constraint_paths:
+                for schema_attribute_path in uniqueness_constraint_path.attributes_paths:


I would think that we generally leave migrations alone once they are merged, but I'm not sure whether others agree.

I think code imported by the migration might change as well, so the migration's behavior might differ slightly anyway.

backend/infrahub/core/schema/basenode_schema.py

ajtmccarty · 2025-03-21T20:17:09Z

backend/infrahub/core/schema/basenode_schema.py

+        if uniqueness_constraint == hfid:
+            return UniquenessConstraintType.HFID
+        if all(path in hfid for path in uniqueness_constraint):
+            return UniquenessConstraintType.SUBSET_OF_HFID


I would have thought this was not possible.
I thought that, with your latest changes to how we build the HFID, the uniqueness_constraints always contain the HFID. Do you have an example of when a uniqueness constraint is a subset of the HFID?

the uniqueness_constraints always contain the HFID.

Yes, but nothing prevents a schema from defining another uniqueness constraint that is a subset of the hfid. For instance, the schema {"uniqueness_constraints": [["name__value"]], "human_friendly_id": ["name__value", "color__value"]} will contain 2 uniqueness constraints after schema processing: [["name__value"], ["name__value", "color__value"]]. While checking the name__value one, we would hit the condition above.
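The example above can be checked against the condition in the diff with a small sketch. The `HFID` and `SUBSET_OF_HFID` members and the two checks come from the diff; the `STANDARD` member and the `get_constraint_type` function are hypothetical names used for illustration.

```python
from __future__ import annotations

from enum import Enum


class UniquenessConstraintType(Enum):
    HFID = "hfid"
    SUBSET_OF_HFID = "subset_of_hfid"
    STANDARD = "standard"  # hypothetical name for all other constraints


def get_constraint_type(
    uniqueness_constraint: list[str], hfid: list[str] | None
) -> UniquenessConstraintType:
    """Classify one uniqueness constraint relative to the schema's HFID."""
    if hfid:
        if uniqueness_constraint == hfid:  # list comparison, so order matters
            return UniquenessConstraintType.HFID
        if all(path in hfid for path in uniqueness_constraint):
            return UniquenessConstraintType.SUBSET_OF_HFID
    return UniquenessConstraintType.STANDARD


hfid = ["name__value", "color__value"]
# After schema processing, the HFID is added to the declared constraints.
for constraint in [["name__value"], ["name__value", "color__value"]]:
    print(constraint, get_constraint_type(constraint, hfid).name)
# prints:
# ['name__value'] SUBSET_OF_HFID
# ['name__value', 'color__value'] HFID
```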

ajtmccarty · 2025-03-21T20:23:40Z

backend/infrahub/core/schema/basenode_schema.py

+class UniquenessConstraintViolation:
+    nodes_ids: set[str]
+    fields: list[str]
+    typ: UniquenessConstraintType


this is out of scope for this PR, but it would be more SOLID to move all of this logic for uniqueness constraints and parsing schema paths into their own separate module(s) instead of putting it all on the BaseNodeSchema
I believe I tried to do this at some point, but did not succeed for a reason that I do not recall

ajtmccarty · 2025-03-21T20:41:08Z

backend/infrahub/graphql/mutations/main.py

+            # Currently support for creating a node with a given id is not supported,
+            # so this will raise an error if node id does not exist in db.
+            # Note that upserting with an `id` in the payload should not happen then, as it would mean the node already
+            # exists, in which case we should update client side instead of upsert.


comments should only explain what this code is doing; anything referencing how the client should call this mutation belongs in a separate issue

I believe comments should explain why rather than what. When reading this code, it is important to know why we need to handle id here, and that this code could probably be removed. Maybe you are suggesting to create a dedicated SDK issue and link it here? But IMO we still need this information here; it would have saved me some time if I had had it in the first place.

if a change is required on the SDK, then this should be an issue on the SDK
this code should not be concerned with how the SDK builds its GraphQL requests
the first two lines of the comment are just describing the behavior of NodeManager.get_one(..., raise_on_error=True), which is unnecessary
the final two lines should be covered by an issue on the SDK

Right, the first two lines are unnecessary, and I definitely agree we need dedicated issues for the clients.
What I mean is that it's worth indicating here that we need this path handling id because of how clients currently use this API; otherwise we could be tempted to remove this code.

if you're worried about someone mistakenly thinking this code is unnecessary and deleting it, then I'd say add a unit test that covers it and include the comments there
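Following that suggestion, the rationale could live in a test docstring instead of the mutation code. A hypothetical sketch: `upsert`, `NodeNotFoundError`, and the dict-backed store are illustrative stand-ins, not Infrahub's mutation code or test fixtures.

```python
class NodeNotFoundError(Exception):
    """Hypothetical stand-in for the error raised when an id is unknown."""


def upsert(db: dict[str, dict], data: dict) -> tuple[str, bool]:
    """Hypothetical upsert over an in-memory dict; returns (node_id, created)."""
    if "id" in data:
        node_id = data["id"]
        if node_id not in db:
            # Creating a node with a caller-chosen id is not supported.
            raise NodeNotFoundError(node_id)
        db[node_id].update({k: v for k, v in data.items() if k != "id"})
        return node_id, False
    node_id = f"node-{len(db) + 1}"
    db[node_id] = dict(data)
    return node_id, True


def test_upsert_with_id_updates_existing_node() -> None:
    """Keep the `id` branch of upsert.

    Clients may currently send `id` when upserting a node that already exists,
    so this branch must stay even though payloads without `id` go through
    try-create-else-update.
    """
    db = {"node-1": {"name": "red"}}
    assert upsert(db, {"id": "node-1", "name": "blue"}) == ("node-1", False)
    assert db["node-1"] == {"name": "blue"}


def test_upsert_with_unknown_id_fails() -> None:
    """Creating a node with a caller-chosen id is not supported."""
    try:
        upsert({}, {"id": "missing", "name": "red"})
    except NodeNotFoundError:
        return
    raise AssertionError("expected NodeNotFoundError")
```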

ajtmccarty · 2025-03-21T20:43:46Z

backend/infrahub/graphql/mutations/main.py

+        if "hfid" in data:
+            # Node might already exist or not. Note that if it exists, an extra query is performed
+            # thus client should avoid specifying `hfid` as input.
+            # It is supposed to be pointless as payload already contains fields composing hfid (mandatory fields while creating).


same here. comments should only reference this code

Same response as above.

this also should just be an issue on the SDK

backend/infrahub/graphql/mutations/main.py

backend/infrahub/exceptions.py

ajtmccarty · 2025-03-24T16:15:25Z

backend/infrahub/core/node/constraints/grouped_uniqueness.py

+            if violation.typ == UniquenessConstraintType.HFID:
+                error_msg = f"Violates uniqueness constraint '{'-'.join(violation.fields)}'"
+                errors = [ValidationError({field_name: error_msg}) for field_name in violation.fields]
+                raise HFIDViolatedError(errors, matching_nodes_ids=violation.nodes_ids)
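To illustrate how an HFID violation is turned into a dedicated error, here is a self-contained sketch built around the diff above. `UniquenessConstraintViolation`, the `HFID` member, and the error construction come from the PR; the simplified `ValidationError`, the `STANDARD` enum member, the `raise_for_violations` helper, checking HFID violations first, and raising a plain `ValidationError` for other violations are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class UniquenessConstraintType(Enum):
    HFID = "hfid"
    SUBSET_OF_HFID = "subset_of_hfid"
    STANDARD = "standard"  # hypothetical name for the remaining constraints


@dataclass
class UniquenessConstraintViolation:
    nodes_ids: set[str]
    fields: list[str]
    typ: UniquenessConstraintType


class ValidationError(Exception):
    """Simplified stand-in: wraps a {field_name: message} mapping."""

    def __init__(self, detail: dict[str, str]) -> None:
        super().__init__(detail)
        self.detail = detail


class HFIDViolatedError(Exception):
    """Simplified stand-in: carries the ids of the node(s) owning the HFID."""

    def __init__(self, errors: list[ValidationError], matching_nodes_ids: set[str]) -> None:
        super().__init__(errors)
        self.errors = errors
        self.matching_nodes_ids = matching_nodes_ids


def raise_for_violations(violations: list[UniquenessConstraintViolation]) -> None:
    """Raise HFIDViolatedError for an HFID violation, ValidationError otherwise."""
    for violation in violations:
        if violation.typ == UniquenessConstraintType.HFID:
            # Same construction as in the diff: one error per field of the HFID.
            error_msg = f"Violates uniqueness constraint '{'-'.join(violation.fields)}'"
            errors = [ValidationError({field_name: error_msg}) for field_name in violation.fields]
            raise HFIDViolatedError(errors, matching_nodes_ids=violation.nodes_ids)
    if violations:
        # Assumption: any other violation is reported as a plain ValidationError.
        first = violations[0]
        error_msg = f"Violates uniqueness constraint '{'-'.join(first.fields)}'"
        raise ValidationError({first.fields[0]: error_msg})
```

Keeping HFID violations in their own exception type lets the upsert path catch only that case and fall back to an update, while every other violation still surfaces as a validation error.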


please add unit tests for this, probably in TestNodeGroupedUniquenessConstraint
and graphql-level tests for upserting using an HFID would be good, but they might already exist

I added some unit tests; we already have several for GraphQL upserts.

ajtmccarty · 2025-03-24T16:24:22Z

backend/infrahub/graphql/mutations/main.py

+            # Currently support for creating a node with a given id is not supported,
+            # so this will raise an error if node id does not exist in db.
+            # Note that upserting with an `id` in the payload should not happen then, as it would mean the node already
+            # exists, in which case we should update client side instead of upsert.


if a change is required on the SDK, then this should be an issue on the SDK
this code should not be concerned with how the SDK builds its GraphQL requests
the first two lines of the comment are just describing the behavior of NodeManager.get_one(..., raise_on_error=True), which is unnecessary
the final two lines should be covered by an issue on the SDK

ajtmccarty · 2025-03-24T16:27:43Z

backend/infrahub/graphql/mutations/main.py

+        if "hfid" in data:
+            # Node might already exist or not. Note that if it exists, an extra query is performed
+            # thus client should avoid specifying `hfid` as input.
+            # It is supposed to be pointless as payload already contains fields composing hfid (mandatory fields while creating).


this also should just be an issue on the SDK

ajtmccarty

I think this all looks good. The only outstanding items are the comments I've left.

ajtmccarty · 2025-03-25T18:57:49Z

backend/infrahub/graphql/mutations/main.py

+            # Currently support for creating a node with a given id is not supported,
+            # so this will raise an error if node id does not exist in db.
+            # Note that upserting with an `id` in the payload should not happen then, as it would mean the node already
+            # exists, in which case we should update client side instead of upsert.


if you're worried about someone mistakenly thinking this code is unnecessary and deleting it, then I'd say add a unit test that covers it and include the comments there

dgarros

Looks good. I think we are just missing a newsfragment for the release notes.

github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Mar 7, 2025

LucasG0 force-pushed the lgu-improve-upsert branch 5 times, most recently from fdf3acd to f653f07 March 12, 2025 13:55

LucasG0 force-pushed the lgu-improve-upsert branch 2 times, most recently from 98a5e4d to 5914a60 March 18, 2025 18:02

github-actions bot added type/documentation Improvements or additions to documentation group/frontend Issue related to the frontend (React) group/ci Issue related to the CI pipeline labels Mar 18, 2025

LucasG0 changed the base branch from stable to release-1.2 March 18, 2025 18:03

LucasG0 force-pushed the lgu-improve-upsert branch from 5914a60 to ccaee6d March 19, 2025 09:27

github-actions bot removed type/documentation Improvements or additions to documentation group/frontend Issue related to the frontend (React) group/ci Issue related to the CI pipeline labels Mar 19, 2025

LucasG0 force-pushed the lgu-improve-upsert branch 13 times, most recently from 0227a88 to 7ec17d7 March 20, 2025 16:37

LucasG0 force-pushed the lgu-improve-upsert branch 3 times, most recently from ca98d9d to 2eb8923 March 21, 2025 09:16

github-actions bot added the group/ci Issue related to the CI pipeline label Mar 21, 2025

LucasG0 changed the title ~~WIP: Improve node upsert performances~~ Improve node upsert performances Mar 21, 2025

LucasG0 marked this pull request as ready for review March 21, 2025 11:27

LucasG0 requested a review from a team as a code owner March 21, 2025 11:27

LucasG0 force-pushed the lgu-improve-upsert branch from 40c6ad2 to ea953d1 March 21, 2025 16:14

LucasG0 requested review from a team as code owners March 21, 2025 16:14

LucasG0 changed the base branch from release-1.2 to stable March 21, 2025 16:14

ajtmccarty reviewed Mar 21, 2025


LucasG0 force-pushed the lgu-improve-upsert branch from ea953d1 to 1c89f5e March 24, 2025 09:45

LucasG0 requested a review from ajtmccarty March 24, 2025 13:04

LucasG0 force-pushed the lgu-improve-upsert branch from 1c89f5e to b1ef45c March 24, 2025 14:16

ajtmccarty reviewed Mar 24, 2025


LucasG0 force-pushed the lgu-improve-upsert branch 3 times, most recently from 163479b to 6923eb9 March 25, 2025 13:57

ajtmccarty reviewed Mar 25, 2025


LucasG0 added 4 commits March 26, 2025 10:59

Improve node upsert performances

152381e

minor fixes

3360f10

Add some tests

67a9f99

remove extra comments

e294000

LucasG0 force-pushed the lgu-improve-upsert branch from 6923eb9 to e294000 March 26, 2025 10:08

dgarros approved these changes Mar 26, 2025


changelog

871db9d

LucasG0 merged commit 4450b4a into stable Mar 26, 2025
33 of 35 checks passed

LucasG0 deleted the lgu-improve-upsert branch March 26, 2025 11:10

3 participants
