@LucasG0
Contributor

@LucasG0 LucasG0 commented Mar 7, 2025

Improve node upsert performance by switching from a check-if-exists-then-create-or-update approach to a try-create-else-update approach. This avoids the extra cost of checking whether the node already exists in the database. Some modifications to NodeGroupedUniquenessConstraint now distinguish HFID violations from other violations, since violating the HFID uniqueness constraint means the node already exists in the database.
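A minimal sketch of the try-create-else-update idea, using a toy in-memory store keyed by HFID instead of the real database. `InMemoryStore` and `upsert` are hypothetical names, and the exception is a simplified stand-in for the `HFIDViolatedError` introduced in this PR; the query counter only shows where the saving comes from.

```python
from dataclasses import dataclass, field


class HFIDViolatedError(Exception):
    """Simplified stand-in: a create collided with an existing node's HFID."""

    def __init__(self, matching_nodes_ids: set[str]) -> None:
        super().__init__("HFID uniqueness constraint violated")
        self.matching_nodes_ids = matching_nodes_ids


@dataclass
class InMemoryStore:
    """Toy store standing in for the database; nodes are indexed by HFID."""

    nodes: dict[str, dict] = field(default_factory=dict)
    ids_by_hfid: dict[tuple, str] = field(default_factory=dict)
    queries: int = 0  # number of store calls, to compare strategies

    def create(self, hfid: tuple, data: dict) -> str:
        self.queries += 1
        if hfid in self.ids_by_hfid:
            # The uniqueness check performed on create reveals the existing node.
            raise HFIDViolatedError({self.ids_by_hfid[hfid]})
        node_id = f"node-{len(self.nodes) + 1}"
        self.nodes[node_id] = dict(data)
        self.ids_by_hfid[hfid] = node_id
        return node_id

    def update(self, node_id: str, data: dict) -> str:
        self.queries += 1
        self.nodes[node_id].update(data)
        return node_id


def upsert(store: InMemoryStore, hfid: tuple, data: dict) -> tuple[str, bool]:
    """Try to create first and fall back to an update only on an HFID collision.

    Returns the node id and whether the node was created.
    """
    try:
        return store.create(hfid, data), True
    except HFIDViolatedError as exc:
        (node_id,) = exc.matching_nodes_ids  # an HFID matches at most one node
        return store.update(node_id, data), False
```

In this sketch, upserting a new node costs one store call instead of two (lookup plus create), while upserting an existing node still costs two (failed create plus update), so the gain depends on how many upserts actually create nodes.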

Note that the upsert mutation does not benefit from this performance boost if:

  • schema defines default_filter
  • id is in payload -> we currently do not support (at least officially) creating a node with a given id, so this case does not matter
  • hfid is in payload -> currently, the SDK sends hfid when upserting. I believe we don't need it: if we attempt an upsert, we may want to create the node, so all mandatory fields required for node creation must already be in the payload, which makes hfid redundant. I have an SDK PR to test that: Remove hfid in upsert payload infrahub-sdk-python#312.
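The three cases above can be summarized as a dispatch. This is a hypothetical helper (`choose_upsert_path` and `UpsertPath` are illustrative names, and the order of the checks is an assumption), not the actual mutation code.

```python
from enum import Enum


class UpsertPath(Enum):
    LOOKUP_FIRST = "lookup_first"  # previous check-then-create-or-update behavior
    TRY_CREATE = "try_create"  # faster try-create-else-update behavior


def choose_upsert_path(data: dict, has_default_filter: bool) -> UpsertPath:
    """Hypothetical: pick the upsert strategy from the schema and the payload."""
    if has_default_filter:
        return UpsertPath.LOOKUP_FIRST
    if "id" in data:
        # Creating a node with a caller-supplied id is not officially supported,
        # so an `id` means the node is expected to exist already.
        return UpsertPath.LOOKUP_FIRST
    if "hfid" in data:
        # The node may or may not exist; resolving the hfid costs an extra query.
        return UpsertPath.LOOKUP_FIRST
    return UpsertPath.TRY_CREATE
```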

Regarding performance, the current benchmarks do not work properly yet, so they should be ignored (they are also failing here: #6109). I tested locally by creating 5000 nodes on a non-main branch with this fix versus release-1.2, and the performance boost is 12% (76s vs 86s). Based on previous observations of real scenarios (see a Slack discussion), I believe the gain may reach 15-20% depending on database state and schemas.
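The 12% figure is the time saved relative to the release-1.2 baseline; a quick check of the arithmetic (measured against the new duration instead, the same 10s would read as about 13%):

```python
def relative_improvement(baseline_s: float, candidate_s: float) -> float:
    """Fraction of the baseline duration saved by the candidate."""
    return (baseline_s - candidate_s) / baseline_s


# 5000 node creations on a non-main branch: release-1.2 took 86 s, this branch 76 s.
improvement = relative_improvement(baseline_s=86.0, candidate_s=76.0)
print(f"{improvement:.1%}")  # 11.6%
```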

@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Mar 7, 2025
@codspeed-hq

codspeed-hq bot commented Mar 7, 2025

CodSpeed Performance Report

Merging #5966 will degrade performance by 20.7%

Comparing lgu-improve-upsert (1f13406) with release-1.2 (2c7ef4b)

Summary

❌ 1 regression
✅ 9 untouched benchmarks
🆕 4 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| 🆕 test_create_nodes_batch[branch2-100-False] | N/A | 6.7 s | N/A |
| 🆕 test_create_nodes_batch[branch2-100-True] | N/A | 6.7 s | N/A |
| 🆕 test_create_nodes_batch[main-100-False] | N/A | 6.9 s | N/A |
| 🆕 test_create_nodes_batch[main-100-True] | N/A | 6.9 s | N/A |
| test_get_schema | 236.8 ms | 298.6 ms | -20.7% |

@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch 5 times, most recently from fdf3acd to f653f07 Compare March 12, 2025 13:55
@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch 2 times, most recently from 98a5e4d to 5914a60 Compare March 18, 2025 18:02
@github-actions github-actions bot added type/documentation Improvements or additions to documentation group/frontend Issue related to the frontend (React) group/ci Issue related to the CI pipeline labels Mar 18, 2025
@LucasG0 LucasG0 changed the base branch from stable to release-1.2 March 18, 2025 18:03
@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch from 5914a60 to ccaee6d Compare March 19, 2025 09:27
@github-actions github-actions bot removed type/documentation Improvements or additions to documentation group/frontend Issue related to the frontend (React) group/ci Issue related to the CI pipeline labels Mar 19, 2025
@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch 13 times, most recently from 0227a88 to 7ec17d7 Compare March 20, 2025 16:37
@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch 3 times, most recently from ca98d9d to 2eb8923 Compare March 21, 2025 09:16
@github-actions github-actions bot added the group/ci Issue related to the CI pipeline label Mar 21, 2025
@LucasG0 LucasG0 changed the title WIP: Improve node upsert performances Improve node upsert performances Mar 21, 2025
@LucasG0 LucasG0 marked this pull request as ready for review March 21, 2025 11:27
@LucasG0 LucasG0 requested a review from a team as a code owner March 21, 2025 11:27
@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch from 40c6ad2 to ea953d1 Compare March 21, 2025 16:14
@LucasG0 LucasG0 requested review from a team as code owners March 21, 2025 16:14
@LucasG0 LucasG0 changed the base branch from release-1.2 to stable March 21, 2025 16:14
- for constraint_group in schema_constraint_path_groups:
-     for schema_attribute_path in constraint_group:
+ for uniqueness_constraint_path in uniqueness_constraint_paths:
+     for schema_attribute_path in uniqueness_constraint_path.attributes_paths:
Contributor

I would think that we generally leave migrations alone once they are merged, but not sure if others agree with that

Contributor Author

I think code imported by the migration might change as well, so migration behavior might be slightly different anyway

if uniqueness_constraint == hfid:
    return UniquenessConstraintType.HFID
if all(path in hfid for path in uniqueness_constraint):
    return UniquenessConstraintType.SUBSET_OF_HFID
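A runnable version of the classification above, assuming a third `STANDARD` member for constraints that are neither the HFID nor a subset of it (the name is a guess) and a hypothetical function name `get_constraint_type`. Comparing lists with `==` is order-sensitive, so in this sketch a constraint listing the HFID fields in another order is classified as SUBSET_OF_HFID.

```python
from enum import Enum


class UniquenessConstraintType(Enum):
    HFID = "hfid"
    SUBSET_OF_HFID = "subset_of_hfid"
    STANDARD = "standard"  # assumed name for the remaining case


def get_constraint_type(uniqueness_constraint: list[str], hfid: list[str]) -> UniquenessConstraintType:
    # Exact match: a violation means a node with the same HFID already exists.
    if uniqueness_constraint == hfid:
        return UniquenessConstraintType.HFID
    # Every path of the constraint is also part of the HFID.
    if all(path in hfid for path in uniqueness_constraint):
        return UniquenessConstraintType.SUBSET_OF_HFID
    return UniquenessConstraintType.STANDARD
```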
Contributor

I would have thought this was not possible.
I thought that with your latest changes to how we build the HFID that the uniqueness_constraints always contain the HFID. do you have an example of when a uniqueness constraint is a subset of the HFID?

Contributor Author

the uniqueness_constraints always contain the HFID.

Yes, but nothing prevents having another uniqueness constraint that is a subset of the HFID. For instance, the schema {"uniqueness_constraints": [["name__value"]], "human_friendly_id": ["name__value", "color__value"]} will contain 2 uniqueness constraints after schema processing: [["name__value"], ["name__value", "color__value"]]. While checking the name__value one, we would hit the condition above.
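To illustrate the example above, a sketch of a hypothetical `process_uniqueness_constraints` step that appends the HFID to the schema's uniqueness constraints when it is missing; the real schema processing may differ.

```python
def process_uniqueness_constraints(schema: dict) -> list[list[str]]:
    """Hypothetical: add the HFID as a uniqueness constraint when it is not already listed."""
    constraints = [list(c) for c in schema.get("uniqueness_constraints", [])]
    hfid = schema.get("human_friendly_id")
    if hfid and list(hfid) not in constraints:
        constraints.append(list(hfid))
    return constraints


schema = {
    "uniqueness_constraints": [["name__value"]],
    "human_friendly_id": ["name__value", "color__value"],
}
hfid = schema["human_friendly_id"]
constraints = process_uniqueness_constraints(schema)
print(constraints)  # [['name__value'], ['name__value', 'color__value']]

for constraint in constraints:
    if constraint == hfid:
        kind = "HFID"
    elif all(path in hfid for path in constraint):
        kind = "SUBSET_OF_HFID"  # the branch discussed in this thread
    else:
        kind = "STANDARD"
    print(constraint, kind)
# ['name__value'] SUBSET_OF_HFID
# ['name__value', 'color__value'] HFID
```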

class UniquenessConstraintViolation:
    nodes_ids: set[str]
    fields: list[str]
    typ: UniquenessConstraintType
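As a self-contained sketch, the violation type can be a dataclass. `STANDARD` and the `means_node_exists` property are hypothetical additions showing why only a full HFID collision lets an upsert fall back to an update, while a SUBSET_OF_HFID collision (for example, same `name__value` but a different `color__value`) remains a genuine conflict.

```python
from dataclasses import dataclass
from enum import Enum


class UniquenessConstraintType(Enum):
    HFID = "hfid"
    SUBSET_OF_HFID = "subset_of_hfid"
    STANDARD = "standard"  # assumed name for the remaining case


@dataclass
class UniquenessConstraintViolation:
    nodes_ids: set[str]
    fields: list[str]
    typ: UniquenessConstraintType

    @property
    def means_node_exists(self) -> bool:
        # Hypothetical helper: only a full HFID collision identifies the existing node.
        return self.typ is UniquenessConstraintType.HFID and len(self.nodes_ids) == 1
```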
Contributor

this is out of scope for this PR, but it would be more SOLID to move all of this logic for uniqueness constraints and parsing schema paths into their own separate module(s) instead of putting it all on the BaseNodeSchema
I believe I tried to do this at some point, but did not succeed for a reason that I do not recall

# Currently support for creating a node with a given id is not supported,
# so this will raise an error if node id does not exist in db.
# Note that upserting with an `id` in the payload should not happen then, as it would mean the node already
# exists, in which case we should update client side instead of upsert.
Contributor

comments should only explain what this code is doing, anything referencing how the client should call this mutation belongs in a separate issue

Contributor Author

I believe comments should explain why rather than what; when reading this code, it is important to know why we need to handle id here and that this code could probably be removed. Maybe you are suggesting we create a dedicated SDK issue and link it here? But in my opinion we need this information here; it would have saved me some time if I had had it in the first place.

Contributor

if a change is required on the SDK, then this should be an issue on the SDK
this code should not be concerned with how the SDK builds its GraphQL requests
the first two lines of the comment are just describing the behavior of NodeManager.get_one(..., raise_on_error=True), which is unnecessary
the final two lines should be covered by an issue on the SDK

Contributor Author

Right, the first two lines are unnecessary, and I definitely agree we need dedicated issues for clients.
What I mean is that it's worth indicating here that we need this code path handling id because of how clients currently use this API; otherwise we could be tempted to remove this code.

Contributor

if you're worried about someone mistakenly thinking this code is unnecessary and deleting it, then I'd say add a unit test that covers it and include the comments there
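Following the suggestion above, a sketch of unit tests that pin down the `id` code path and carry the explanation in their comments. `upsert_node` and `NodeNotFoundError` are hypothetical stand-ins operating on a plain dict, not the real mutation or `NodeManager.get_one`.

```python
class NodeNotFoundError(Exception):
    """Hypothetical error raised when a node id is unknown."""


def upsert_node(db: dict[str, dict], data: dict) -> str:
    """Toy upsert: an `id` in the payload must refer to an existing node."""
    if "id" in data:
        node_id = data["id"]
        if node_id not in db:
            raise NodeNotFoundError(node_id)
        db[node_id].update({k: v for k, v in data.items() if k != "id"})
        return node_id
    node_id = f"node-{len(db) + 1}"
    db[node_id] = dict(data)
    return node_id


def test_upsert_with_id_updates_existing_node() -> None:
    # Clients currently rely on sending `id` in upserts, so this path must keep updating in place.
    db = {"node-1": {"name": "red"}}
    assert upsert_node(db, {"id": "node-1", "name": "blue"}) == "node-1"
    assert db == {"node-1": {"name": "blue"}}


def test_upsert_with_unknown_id_raises() -> None:
    # Creating a node with a caller-supplied id is not supported, so an unknown id is an error.
    db: dict[str, dict] = {}
    try:
        upsert_node(db, {"id": "missing", "name": "red"})
    except NodeNotFoundError:
        pass
    else:
        raise AssertionError("expected NodeNotFoundError")
    assert db == {}


test_upsert_with_id_updates_existing_node()
test_upsert_with_unknown_id_raises()
```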

if "hfid" in data:
    # Node might already exist or not. Note that if it exists, an extra query is performed
    # thus client should avoid specifying `hfid` as input.
    # It is supposed to be pointless as payload already contains fields composing hfid (mandatory fields while creating).
Contributor

same here. comments should only reference this code

Contributor Author

Same response as above

Contributor

this also should just be an issue on the SDK

@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch from ea953d1 to 1c89f5e Compare March 24, 2025 09:45
@LucasG0 LucasG0 requested a review from ajtmccarty March 24, 2025 13:04
@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch from 1c89f5e to b1ef45c Compare March 24, 2025 14:16
if violation.typ == UniquenessConstraintType.HFID:
    error_msg = f"Violates uniqueness constraint '{'-'.join(violation.fields)}'"
    errors = [ValidationError({field_name: error_msg}) for field_name in violation.fields]
    raise HFIDViolatedError(errors, matching_nodes_ids=violation.nodes_ids)
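A self-contained sketch of how the snippet above could fit into a checker: HFID violations are reported through `HFIDViolatedError` so the upsert can switch to an update, while other violations stay plain validation errors. `ValidationError` is a simplified stand-in, `raise_for_violations` and `STANDARD` are hypothetical names, and giving HFID violations priority over other violations is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class UniquenessConstraintType(Enum):
    HFID = "hfid"
    SUBSET_OF_HFID = "subset_of_hfid"
    STANDARD = "standard"  # assumed name for the remaining case


@dataclass
class UniquenessConstraintViolation:
    nodes_ids: set[str]
    fields: list[str]
    typ: UniquenessConstraintType


class ValidationError(Exception):
    """Simplified stand-in carrying a {field_name: message} mapping."""

    def __init__(self, detail: dict[str, str]) -> None:
        super().__init__(detail)
        self.detail = detail


class HFIDViolatedError(Exception):
    def __init__(self, errors: list[ValidationError], matching_nodes_ids: set[str]) -> None:
        super().__init__(errors)
        self.errors = errors
        self.matching_nodes_ids = matching_nodes_ids


def raise_for_violations(violations: list[UniquenessConstraintViolation]) -> None:
    # Assumption: an HFID collision is reported first, whatever its position in the list.
    for violation in violations:
        if violation.typ == UniquenessConstraintType.HFID:
            error_msg = f"Violates uniqueness constraint '{'-'.join(violation.fields)}'"
            errors = [ValidationError({field_name: error_msg}) for field_name in violation.fields]
            raise HFIDViolatedError(errors, matching_nodes_ids=violation.nodes_ids)
    if violations:
        # Any other violation is a genuine conflict for both create and upsert.
        raise ValidationError({"uniqueness": f"{len(violations)} constraint(s) violated"})
```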
Contributor

please add unit tests for this, probably in TestNodeGroupedUniquenessConstraint
and graphql-level tests for upserting using an HFID would be good, but they might already exist

Contributor Author

I added some unit tests, we already have multiple ones for GraphQL upserts.

@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch 3 times, most recently from 163479b to 6923eb9 Compare March 25, 2025 13:57
Contributor

@ajtmccarty ajtmccarty left a comment

I think this all looks good. The only outstanding thing is the comments that I've commented on.

@LucasG0 LucasG0 force-pushed the lgu-improve-upsert branch from 6923eb9 to e294000 Compare March 26, 2025 10:08
Collaborator

@dgarros dgarros left a comment

Looks good, I think we are just missing a newsfragment for the release note

@LucasG0 LucasG0 merged commit 4450b4a into stable Mar 26, 2025
33 of 35 checks passed
@LucasG0 LucasG0 deleted the lgu-improve-upsert branch March 26, 2025 11:10
Labels

ci/run-intensive-benchmarks Run intensive CodSpeed benchmarks, usually excluded. group/backend Issue related to the backend (API Server, Git Agent) group/ci Issue related to the CI pipeline
