
Conversation

@fatih-acar (Contributor) commented Oct 3, 2025

Many small changes here:

  • enable caching of the git repositories by leveraging Prefect's cache policy; I set a hardcoded TTL of 30s to match the 1-minute Git Sync scheduled job
  • AI-generated commit to improve fetching of artifact target attributes/rels: we were previously fetching ALL attributes even though most of them are not needed
  • batch the artifact generation workflow submits
  • ensure the schema cache is set for the Infrahub SDK client, which avoids unnecessary calls to the API server to retrieve the schema on each task

We are now able to generate 1,000 artifacts in under two minutes without errors, whereas it previously took more than five minutes and produced failures.
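
Below is a rough sketch of the batched submission pattern (third bullet above), using the SDK batch API that appears in the review excerpts further down this thread. The import path and the build_generate_model helper are illustrative stand-ins for the real RequestArtifactGenerate construction in backend/infrahub/git/tasks.py:

from infrahub.workflows.catalogue import REQUEST_ARTIFACT_GENERATE  # assumed catalogue path

async def submit_artifact_workflows(client, group, context, get_workflow) -> None:
    """Queue one artifact-generation workflow per group member in a single SDK batch."""
    batch = await client.create_batch()
    for relationship in group.members.peers:
        # Hypothetical helper: the real code builds a RequestArtifactGenerate
        # model (including query_id) for each member.
        model = build_generate_model(relationship)
        batch.add(
            task=get_workflow().submit_workflow,
            workflow=REQUEST_ARTIFACT_GENERATE,
            context=context,
            parameters={"model": model},
        )
    async for _, _ in batch.execute():  # drain the batch; submissions are executed by the batch runner
        pass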

Summary by CodeRabbit

Release Notes

  • Performance Improvements

    • Artifact generation and proposed change checks now execute faster through optimized caching and reduced GraphQL queries.
  • Changes

    • Query field is deprecated in artifact-related definitions; use query_id instead for future compatibility.

@github-actions bot added the group/backend label (issues related to the backend: API Server, Git Agent) on Oct 3, 2025
@coderabbitai bot (Contributor) commented Oct 3, 2025

Walkthrough

This pull request introduces performance optimizations and API refinements across the artifact and schema management systems. Key changes include: (1) adding query_id fields and deprecating query fields in artifact-related models across multiple modules, (2) implementing TTL-based caching for repository initialization and per-branch SDK schema caching in SchemaManager, (3) introducing batch submission for artifact generation tasks, (4) adding utility functions for efficient prefetching of definition targets with parameterized field resolution, and (5) updating dependencies to include cachetools-async. The changes maintain backward compatibility while refactoring internal data access patterns.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): coverage is 13.33%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve it.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title "fix(backend): performance issues when generating artifacts" clearly and accurately summarizes the main objective of the changeset, follows the conventional commit format, and is concise and specific; a reader scanning the project history would immediately understand that this PR addresses performance problems in artifact generation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fac-fix-artifacts-scale


@codspeed-hq bot commented Oct 3, 2025

CodSpeed Performance Report

Merging #7320 will not alter performance

Comparing fac-fix-artifacts-scale (f9bde88) with stable (a874ac3)¹

Summary

✅ 10 untouched

Footnotes

  1. No successful run was found on stable (59168ed) during the generation of this report, so a874ac3 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 72e7b83 to 408ce30 on October 3, 2025 18:06
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 73e9d80 to 9662b59 on October 5, 2025 23:04
@fatih-acar (Contributor, Author) commented:

There's a follow-up PR related to memory usage when generating artifacts: #7327

@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 207ffc8 to 4bd0e78 on October 6, 2025 13:56
@fatih-acar force-pushed the fac-fix-artifacts-scale branch 2 times, most recently from ea5302a to 9dcedfc, on October 6, 2025 20:58
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 9dcedfc to dd81c80 on October 15, 2025 13:44
@fatih-acar marked this pull request as ready for review on October 15, 2025 14:09
@fatih-acar requested a review from a team as a code owner on October 15, 2025 14:09
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/infrahub/core/schema/manager.py (1)

148-149: Consider documenting the side effect in get_schema_branch.

The method now has a side effect of calling set_schema_branch when the branch doesn't exist, which populates the hash mapping. While this ensures consistency, it changes the method from a pure getter to one with side effects.

Consider adding a brief comment explaining this behavior:

 def get_schema_branch(self, name: str) -> SchemaBranch:
     if name in self._branches:
         return self._branches[name]
 
+    # Create and register a new branch schema to ensure hash mapping exists
     self.set_schema_branch(name, schema=SchemaBranch(cache=self._cache, name=name))
     return self._branches[name]
backend/infrahub/artifacts/models.py (1)

28-29: Improve field descriptions and deprecation handling.

Both query and query_id have identical descriptions, which is confusing. Additionally, the deprecation is only indicated via a comment.

Consider these improvements:

-    query: str = Field(..., description="The name of the query to use when collecting data")  # Deprecated
-    query_id: str = Field(..., description="The name of the query to use when collecting data")
+    query: str = Field(..., description="(Deprecated: use query_id) The name of the query to use when collecting data", deprecated=True)
+    query_id: str = Field(..., description="The ID of the query to use when collecting data")

Note: If using Pydantic v2, the deprecated parameter provides formal deprecation warnings. If on Pydantic v1, consider adding the deprecation notice in the description.
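
For reference, a minimal sketch of the Pydantic v2 approach; the deprecated parameter on Field landed in Pydantic 2.7, so this assumes the project's pin is at least that recent, and the model name here is illustrative:

from pydantic import BaseModel, Field

class ArtifactQueryModel(BaseModel):
    # Accessing `query` emits a DeprecationWarning at runtime (Pydantic >= 2.7).
    query: str = Field(default="", deprecated="Use query_id instead")
    query_id: str = Field(..., description="The ID of the query to use when collecting data")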

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b036efb and dd81c80.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • backend/infrahub/artifacts/models.py (1 hunks)
  • backend/infrahub/computed_attribute/tasks.py (1 hunks)
  • backend/infrahub/core/schema/manager.py (4 hunks)
  • backend/infrahub/core/schema/schema_branch.py (1 hunks)
  • backend/infrahub/git/integrator.py (1 hunks)
  • backend/infrahub/git/models.py (1 hunks)
  • backend/infrahub/git/repository.py (3 hunks)
  • backend/infrahub/git/tasks.py (7 hunks)
  • backend/infrahub/message_bus/types.py (1 hunks)
  • backend/infrahub/proposed_change/tasks.py (3 hunks)
  • backend/infrahub/workers/dependencies.py (2 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/git/models.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/workers/dependencies.py
  • backend/infrahub/git/tasks.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking


Files:

  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/git/models.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/workers/dependencies.py
  • backend/infrahub/git/tasks.py
{pyproject.toml,poetry.lock}

📄 CodeRabbit inference engine (.github/instructions/tooling.instructions.md)

Use Poetry to manage the Python project and its dependencies

Files:

  • pyproject.toml
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Use Poetry to manage the Python project and dependencies

Files:

  • pyproject.toml
🧬 Code graph analysis (6)
backend/infrahub/core/schema/schema_branch.py (2)
backend/infrahub/core/models.py (2)
  • nodes (61-63)
  • generics (66-68)
backend/infrahub/core/schema/__init__.py (1)
  • get (65-72)
backend/infrahub/computed_attribute/tasks.py (1)
backend/infrahub/trigger/models.py (1)
  • name (79-80)
backend/infrahub/core/schema/manager.py (1)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (303-329)
  • to_dict_api_schema_object (165-171)
  • get_hash (139-149)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
  • get (14-35)
backend/infrahub/workers/dependencies.py (1)
backend/infrahub/core/schema/manager.py (2)
  • get_branches (793-794)
  • get_sdk_schema_branch (151-159)
backend/infrahub/git/tasks.py (4)
backend/infrahub/types.py (1)
  • Any (312-319)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
backend/infrahub/workers/dependencies.py (1)
  • get_workflow (122-123)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
🔇 Additional comments (18)
backend/infrahub/computed_attribute/tasks.py (1)

106-112: LGTM! Query ID transition looks correct.

The change from transform.query.peer.name.value to transform.query.id aligns with the broader migration to use query IDs instead of query names across the codebase.

backend/infrahub/git/models.py (1)

41-42: LGTM! Proper deprecation pattern.

The addition of query_id while keeping the deprecated query field ensures backward compatibility during the migration period.

backend/infrahub/message_bus/types.py (1)

92-93: LGTM! Consistent deprecation pattern.

The addition of query_id while deprecating query_name follows the same pattern used elsewhere in the codebase, ensuring consistent migration.

backend/infrahub/core/schema/schema_branch.py (1)

165-171: LGTM! API-friendly serialization method.

The new to_dict_api_schema_object method provides an API-focused serialization that returns lists of dictionaries (via model_dump()), complementing the existing to_dict_schema_object which returns schema objects. This aligns with the SDK-backed schema caching introduced in the manager.

backend/infrahub/core/schema/manager.py (3)

46-47: LGTM! LRU cache addresses memory concerns.

The LRU cache with maxsize=10 provides a good balance between performance and memory usage, addressing the concerns raised in previous review discussions about memory consumption with many branches. The hash-based keying ensures efficient cache invalidation when schemas change.


151-159: LGTM! Efficient SDK schema retrieval with hash-based caching.

The method correctly uses the branch hash as the cache key, ensuring that identical schemas across different branches share the same cached SDK representation. The lazy initialization pattern (check cache, construct if missing) is appropriate.
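
A minimal sketch of the pattern, with attribute names (_branch_hash_by_name, _sdk_branches) taken from the review comments below and the cached value simplified to a dict:

def get_sdk_schema_branch(self, name: str) -> dict:
    """Return the SDK-facing schema for a branch, converting at most once per schema hash."""
    schema_branch = self.get_schema_branch(name=name)
    schema_hash = schema_branch.get_hash()
    if schema_hash not in self._sdk_branches:
        # Branches with identical schemas share a hash, hence the same cache entry.
        self._sdk_branches[schema_hash] = schema_branch.to_dict_api_schema_object()
    return self._sdk_branches[schema_hash]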


783-784: LGTM! Proper cleanup in purge logic.

The addition correctly removes hash mappings when branches are purged, preventing memory leaks and stale cache entries.

backend/infrahub/workers/dependencies.py (1)

38-44: LGTM! Pre-population of SDK schema cache.

The initialization-time population of the client schema cache ensures that SDK representations are available immediately without triggering expensive on-the-fly conversions. The guard if registry.schema safely handles cases where the registry isn't fully initialized yet.

backend/infrahub/git/repository.py (2)

254-259: LGTM! TTL cache aligns with Git Sync schedule.

The 30-second TTL cache for repository initialization aligns well with the 1-minute Git Sync scheduled job mentioned in the PR description. The cache key correctly includes all parameters that affect repository state (repository_id, name, kind, commit).


272-282: LGTM! Clean separation of caching and Prefect task concerns.

The wrapper pattern correctly separates the cached implementation (_get_initialized_repo) from the Prefect task wrapper (get_initialized_repo), with cache_policy=NONE preventing duplicate caching by Prefect. This ensures caching is handled exclusively by cachetools as intended.
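
A condensed sketch of the two layers, assuming the async-aware cached decorator comes from cachetools-async (import path assumed) and Prefect's NONE cache policy; signatures are abbreviated:

from cachetools import TTLCache
from cachetools.keys import hashkey
from cachetools_async import cached  # assumption: async-aware decorator from the cachetools-async package
from prefect import task
from prefect.cache_policies import NONE

@cached(
    TTLCache(maxsize=100, ttl=30),  # 30s TTL, matching the 1-minute Git Sync job
    key=lambda *_, **kwargs: hashkey(
        kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
    ),
)
async def _get_initialized_repo(client, repository_id: str, name: str, repository_kind: str, commit: str | None = None):
    ...  # clone/fetch the repository at the given commit and return the wrapper

@task(name="Fetch repository commit", cache_policy=NONE)  # NONE stops Prefect from caching on top
async def get_initialized_repo(client, repository_id: str, name: str, repository_kind: str, commit: str | None = None):
    return await _get_initialized_repo(
        client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
    )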

pyproject.toml (1)

79-79: No update required for cachetools-async version
The constraint ^0.0.5 already references the latest published version (0.0.5) on PyPI.

Likely an incorrect or invalid review comment.

backend/infrahub/git/integrator.py (1)

1366-1366: LGTM! Consistent with query_id propagation.

The change from message.query to message.query_id aligns with the broader refactoring to use query identifiers throughout the artifact generation flow.

backend/infrahub/artifacts/models.py (1)

33-33: LGTM! Context field addition is appropriate.

Adding the context field enables better context propagation through the artifact processing pipeline.

backend/infrahub/proposed_change/tasks.py (1)

694-694: LGTM! Consistent query_id propagation throughout the flow.

The changes correctly:

  1. Pass query_id to CheckArtifactCreate (line 694)
  2. Fetch the query node's id in the GraphQL query (line 1249)
  3. Parse and assign the query_id from the response (line 1471)

The implementation is consistent with the broader query_id refactoring across the codebase.

Also applies to: 1249-1249, 1471-1471

backend/infrahub/git/tasks.py (4)

62-78: LGTM! Helper function correctly extracts parameter field names.

The _collect_parameter_first_segments function properly extracts the first segment of parameter keys (e.g., "foo__bar" → "foo") by recursively walking through nested data structures.


81-126: LGTM! Efficient prefetching optimization.

The _prefetch_group_member_nodes function implements a smart optimization by:

  1. Grouping member IDs by kind to batch requests
  2. Filtering attributes and relationships to fetch only required fields
  3. Disabling ordering for better performance
  4. Using the SDK's batch execution for efficiency

This should significantly reduce both the number of API calls and the amount of data transferred when processing artifact definitions with many targets.
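
A hedged sketch of that prefetch shape; client.filters with ids=/exclude= and the schema field lists are assumptions about the SDK surface, and the real helper also disables ordering:

from collections import defaultdict

async def prefetch_group_member_nodes(client, branch: str, members, required_fields: set[str]) -> None:
    """Fetch group members kind by kind, pulling only the fields the parameters reference."""
    ids_by_kind: dict[str, list[str]] = defaultdict(list)
    for member in members:
        ids_by_kind[member.typename].append(member.id)

    batch = await client.create_batch()
    for kind, ids in ids_by_kind.items():
        schema = await client.schema.get(kind=kind, branch=branch)
        all_fields = [attr.name for attr in schema.attributes] + [rel.name for rel in schema.relationships]
        exclude = [field for field in all_fields if field not in required_fields]
        # assumption: filters() accepts ids= to batch by id and exclude= to trim fields
        batch.add(task=client.filters, kind=kind, branch=branch, ids=ids, exclude=exclude)
    async for _, _ in batch.execute():
        pass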


396-409: LGTM! Improved target fetching with prefetching optimization.

The changes improve artifact generation by:

  1. Explicitly fetching the group with members using client.get()
  2. Extracting required parameter fields from the artifact definition
  3. Prefetching member node data upfront to avoid N+1 queries

This should significantly improve performance when generating artifacts for definitions with many targets.


441-480: LGTM! Batched workflow submissions improve throughput.

The changes batch artifact generation workflow submissions, which shifts the bottleneck to the Prefect API servers (as discussed in previous reviews). This enables better scalability by allowing the Prefect infrastructure to be scaled independently.

The addition of query_id on line 462 correctly propagates the query identifier through the artifact generation flow.

When generating many artifacts at the same time (for example when
merging a huge branch), repository initialization will be done for each
artifact, concurrently.
Avoid useless initialization steps that trigger many GraphQL queries to
fetch and update the repo status by leveraging Prefect caching with a
small TTL.

Signed-off-by: Fatih Acar <[email protected]>
Fetch only the attributes/relationships required by the artifact target parameters.

AI

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 93246c1 to ee7c390 on October 16, 2025 21:11
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/infrahub/git/tasks.py (1)

327-397: Use the correct keyword when calling fetch_definition_targets.

fetch_definition_targets expects a definition= argument. Passing artifact_definition= raises TypeError: got an unexpected keyword argument 'artifact_definition' and aborts the flow before any artifacts are queued. Update the call to match the helper’s signature.

-    group = await fetch_definition_targets(client=client, branch=model.branch, artifact_definition=artifact_definition)
+    group = await fetch_definition_targets(client=client, branch=model.branch, definition=artifact_definition)
🧹 Nitpick comments (1)
backend/infrahub/artifacts/models.py (1)

28-29: Fix the query_id description.

The new field still says “The name of the query…”, which contradicts the field’s intent and the deprecation notice above it. Please update the description so it clearly references the query ID.

-    query_id: str = Field(..., description="The name of the query to use when collecting data")
+    query_id: str = Field(..., description="The id of the query to use when collecting data")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd81c80 and ee7c390.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • backend/infrahub/artifacts/models.py (1 hunks)
  • backend/infrahub/computed_attribute/tasks.py (1 hunks)
  • backend/infrahub/core/schema/manager.py (4 hunks)
  • backend/infrahub/core/schema/schema_branch.py (1 hunks)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/integrator.py (1 hunks)
  • backend/infrahub/git/models.py (1 hunks)
  • backend/infrahub/git/repository.py (3 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/message_bus/types.py (1 hunks)
  • backend/infrahub/proposed_change/tasks.py (7 hunks)
  • backend/infrahub/workers/dependencies.py (2 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/infrahub/workers/dependencies.py
  • pyproject.toml
  • backend/infrahub/computed_attribute/tasks.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/models.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking


Files:

  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/models.py
🧬 Code graph analysis (7)
backend/infrahub/core/schema/schema_branch.py (3)
backend/infrahub/core/schema/__init__.py (2)
  • duplicate (97-99)
  • get (65-72)
backend/infrahub/core/schema/manager.py (1)
  • get (69-92)
backend/infrahub/database/__init__.py (1)
  • get (84-88)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
  • get (14-35)
backend/infrahub/proposed_change/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (123-153)
backend/infrahub/core/schema/manager.py (1)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (303-329)
  • to_dict_api_schema_object (165-171)
  • get_hash (139-149)
backend/infrahub/generators/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (123-153)
backend/infrahub/git/tasks.py (3)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (123-153)
backend/infrahub/workers/dependencies.py (1)
  • get_workflow (122-123)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-validate-generated
  • GitHub Check: backend-tests-integration
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-docker-integration
  • GitHub Check: Cloudflare Pages

Comment on lines 98 to 104
keep_attributes = {field for field in required_fields if field in schema.attribute_names}
keep_relationships = {field for field in required_fields if field in schema.relationship_names}

exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]

@coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🔴 Critical

Call schema name helpers before iterating.

schema.attribute_names / schema.relationship_names are callable helpers, not iterables. Using them without () raises TypeError: argument of type 'method' is not iterable the first time we hit a branch with members. Call the helpers before computing the include/exclude sets.

-        keep_attributes = {field for field in required_fields if field in schema.attribute_names}
-        keep_relationships = {field for field in required_fields if field in schema.relationship_names}
+        attribute_names = schema.attribute_names()
+        relationship_names = schema.relationship_names()
+        keep_attributes = {field for field in required_fields if field in attribute_names}
+        keep_relationships = {field for field in required_fields if field in relationship_names}
@@
-        exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
-        exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
+        exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
+        exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
keep_attributes = {field for field in required_fields if field in schema.attribute_names}
keep_relationships = {field for field in required_fields if field in schema.relationship_names}
exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
attribute_names = schema.attribute_names()
relationship_names = schema.relationship_names()
keep_attributes = {field for field in required_fields if field in attribute_names}
keep_relationships = {field for field in required_fields if field in relationship_names}
exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 98 to 103, schema.attribute_names
and schema.relationship_names are methods but are being treated as iterables;
call these helpers first (e.g., attribute_names = schema.attribute_names() and
relationship_names = schema.relationship_names()) and then use those returned
iterables when computing keep_attributes, keep_relationships,
exclude_attributes, and exclude_relationships so the comprehensions iterate over
the actual lists/sets rather than the method objects.

@fatih-acar force-pushed the fac-fix-artifacts-scale branch 3 times, most recently from db50bd5 to 72ca590, on October 16, 2025 22:17
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/infrahub/git/utils.py (1)

99-103: Call attribute_names() and relationship_names() methods.

schema.attribute_names and schema.relationship_names are methods, not properties. Using them without () will raise TypeError: argument of type 'method' is not iterable at runtime.

Based on the relevant code snippet at backend/infrahub/core/schema/basenode_schema.py (lines 338-343), these are methods that return lists.

Apply this diff:

+        attribute_names = schema.attribute_names()
+        relationship_names = schema.relationship_names()
-        keep_attributes = {field for field in required_fields if field in schema.attribute_names}
-        keep_relationships = {field for field in required_fields if field in schema.relationship_names}
+        keep_attributes = {field for field in required_fields if field in attribute_names}
+        keep_relationships = {field for field in required_fields if field in relationship_names}
 
-        exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
-        exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
+        exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
+        exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
🧹 Nitpick comments (1)
backend/infrahub/git/utils.py (1)

132-138: Consider defensive error handling for unexpected definition types.

If an InfrahubNode is passed that doesn't match ARTIFACTDEFINITION or CHECKDEFINITION, group_id and parameters remain uninitialized, causing an UnboundLocalError at line 140.

While the type hint should prevent this at static analysis time, consider adding an explicit error for robustness:

if isinstance(definition, InfrahubNode):
    if definition.typename in [InfrahubKind.ARTIFACTDEFINITION, InfrahubKind.CHECKDEFINITION]:
        group_id = definition.targets.id
        parameters = definition.parameters.value
    else:
        raise ValueError(f"Unsupported InfrahubNode typename: {definition.typename}")
elif isinstance(definition, ProposedChangeGeneratorDefinition):
    group_id = definition.group_id
    parameters = definition.parameters
else:
    raise TypeError(f"Unsupported definition type: {type(definition)}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db50bd5 and 72ca590.

📒 Files selected for processing (4)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/proposed_change/tasks.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/infrahub/generators/tasks.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/tasks.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking


Files:

  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/tasks.py
🧬 Code graph analysis (3)
backend/infrahub/proposed_change/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (124-155)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (124-155)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: validate-generated-documentation
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-benchmark
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-docker-integration
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (6)
backend/infrahub/git/tasks.py (3)

327-328: LGTM! Centralized target resolution.

The refactor to use fetch_definition_targets consolidates target group retrieval and enables prefetching of member nodes, reducing API calls and improving performance.


359-398: LGTM! Batch submission improves throughput.

The batch-based workflow submission pattern shifts the bottleneck to Prefect API servers, enabling better scalability. The addition of query_id at line 380 aligns with the model changes to support query ID-based lookups.


580-580: LGTM! Consistent use of centralized target resolution.

backend/infrahub/proposed_change/tasks.py (3)

656-656: LGTM! Consistent use of centralized target resolution.


693-693: LGTM! Complete query_id propagation.

The addition of query_id to CheckArtifactCreate (line 693), the GraphQL query (line 1243), and the parsing logic (line 1465) ensures consistent propagation of query identifiers throughout the artifact generation flow.

Also applies to: 1243-1243, 1465-1465


920-922: LGTM! Centralized target resolution for generators.

Comment on lines +58 to +74
def _collect_parameter_first_segments(params: Any) -> set[str]:
segments: set[str] = set()

def _walk(value: Any) -> None:
if isinstance(value, str):
segment = value.split("__", 1)[0]
if segment:
segments.add(segment)
elif isinstance(value, dict):
for nested in value.values():
_walk(nested)
elif isinstance(value, (list, tuple, set)):
for nested in value:
_walk(nested)

_walk(params)
return segments
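
For context, what the helper yields on an illustrative input (values are walked, keys are not):

params = {"device": "device__name", "tags": ["tag__value", "site__name"]}
assert _collect_parameter_first_segments(params) == {"device", "tag", "site"}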
@coderabbitai bot (Contributor) commented Oct 16, 2025

🛠️ Refactor suggestion | 🟠 Major

Add docstrings to new utility functions.

All three new functions (_collect_parameter_first_segments, _prefetch_group_member_nodes, and fetch_definition_targets) lack docstrings.

As per coding guidelines, Python functions should include Google-style docstrings with:

  • A brief one-line description
  • An Args section documenting parameters (without types)
  • A Returns section describing the return value

Example for fetch_definition_targets:

async def fetch_definition_targets(
    client: InfrahubClient,
    branch: str,
    definition: CoreArtifactDefinition | CoreCheckDefinition | ProposedChangeGeneratorDefinition,
) -> CoreGroup:
    """Fetch target group for a definition with prefetched member nodes.
    
    Retrieves the target group associated with the given artifact, check, or generator
    definition, and prefetches member nodes based on the parameters defined in the
    definition.
    
    Args:
        client: The Infrahub SDK client for API interactions
        branch: The branch name to fetch the group from
        definition: The definition object containing targets and parameters
        
    Returns:
        The target group with prefetched member nodes populated in the client store
    """

Also applies to: 77-122, 124-155

🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 58-74, 77-122 and 124-155, the
three new functions (_collect_parameter_first_segments,
_prefetch_group_member_nodes, and fetch_definition_targets) are missing
docstrings; add Google-style docstrings to each function consisting of a
one-line summary, an Args section listing parameters (no types), and a Returns
section describing the return value (matching existing project style). For
fetch_definition_targets follow the provided example (brief summary, purpose,
Args: client, branch, definition, and Returns: the target group with prefetched
member nodes) and for the other two document their parameters and return
sets/structures similarly, ensuring consistency and placement immediately below
each def line.

@ogenstad (Contributor) commented:

Hey coderabbit, where is this requirement for docstring coming from?

@coderabbitai bot (Contributor) replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Look for coding guidelines or contribution documentation
echo "=== Looking for coding guidelines documentation ==="
fd -t f -e md | xargs grep -l -i "docstring\|coding.*guideline\|style.*guide\|contribution" 2>/dev/null | head -10

echo -e "\n=== Checking for Python linter configurations ==="
# Check for ruff, pylint, or other linter configs
fd -t f "ruff.toml|pylint|.pylintrc|pyproject.toml" -x cat {}

echo -e "\n=== Sampling docstring usage in existing utility functions ==="
# Check a few existing Python files in backend/infrahub for docstring patterns
fd -e py -t f . backend/infrahub/core backend/infrahub/git | head -5 | xargs -I {} sh -c 'echo "=== File: {} ===" && head -50 {}'

Length of output: 32967


@ogenstad The requirement for docstrings is not enforced by your linter configuration. Looking at your pyproject.toml, I can see that:

ignore = [
    "D",        # pydocstyle
    "DOC",      # pydoclint
    ...
]

Both pydocstyle (D) and pydoclint (DOC) rules are explicitly disabled.

However, I noticed that existing functions in the codebase often include docstrings. For example, the get_repositories_commit_per_branch function in the same file (backend/infrahub/git/utils.py) has a Google-style docstring:

async def get_repositories_commit_per_branch(
    db: InfrahubDatabase,
) -> dict[str, RepositoryData]:
    """Get a list of all repositories and their commit on each branches.

    This method is similar to 'get_list_repositories' method in the Python SDK.

    NOTE: At some point, we should refactor this function to use a single Database query instead of one per branch
    """

My suggestion was based on this existing pattern in the codebase as a best practice, not a hard requirement from your configuration.

If you'd prefer not to receive docstring suggestions for this project, I can remember that for future reviews.


🧬 Code Graph Analysis Results
  • File: backend/infrahub/git/utils.py | Lines 1-28

    • Full snippet:
      async def get_repositories_commit_per_branch(
      db: InfrahubDatabase,
      ) -> dict[str, RepositoryData]:
      """Get a list of all repositories and their commit on each branches.

      This method is similar to 'get_list_repositories' method in the Python SDK.
      
      NOTE: At some point, we should refactor this function to use a single Database query instead of one per branch
      """
      
      repositories: dict[str, RepositoryData] = {}
      
      for branch in list(registry.branch.values()):
          repos: list[CoreGenericRepository] = await NodeManager.query(
              db=db,
              branch=branch,
              fields={"id": None, "name": None, "commit": None, "internal_status": None},
              schema=InfrahubKind.GENERICREPOSITORY,
          )
      
          for repository in repos:
              repo_name = repository.name.value
              if repo_name not in repositories:
                  repositories[repo_name] = RepositoryData(
                      repository_id=repository.get_id(),
                      repository_name=repo_name,
                      branches={},
                  )
      
              repositories[repo_name].branches[branch.name] = repository.commit.value  # type: ignore[attr-defined]
              repositories[repo_name].branch_info[branch.name] = RepositoryBranchInfo(
                  internal_status=repository.internal_status.value
              )
      
      return repositories
      
  • File: backend/infrahub/core/manager.py | Lines 1-60

    • Summary:
      Docstring for NodeManager.query describes its purpose and arguments. It states that the method:
      • Queries one or multiple nodes of a given type based on filter arguments.
      • Accepts parameters including schema (NodeSchema or type), filters, fields, offset, limit, at, branch, include_source, include_owner, prefetch_relationships, account, partial_match, branch_agnostic, and order.
      • Returns a list of Node objects (or a list of SchemaProtocol objects when the more specific overload is used).
      • Includes behavior notes such as how branch resolution and HFID handling influence querying, and how results are formed.
      • Includes a Python-style Args/Returns documentation block within the method docstring.
  • File: backend/infrahub/core/manager.py | Lines 60-110

    • Summary:
      Docstring for NodeManager.count describes its purpose, parameters, and return value for counting nodes that match filters. It notes:
      • Accepts db, schema (type[SchemaProtocol] | NodeSchema | GenericSchema | ProfileSchema | TemplateSchema | str), filters, at, branch, account, partial_match, branch_agnostic.
      • Returns an int representing the number of responses found.
  • File: backend/infrahub/core/manager.py | Lines 110-170

    • Summary:
      Docstring for NodeManager.query_peers describes its purpose and parameters for querying peer nodes. It notes:
      • Accepts db, ids, source_kind, schema, filters, fields (optional), offset (optional), limit (optional), at (optional), branch (optional), branch_agnostic, fetch_peers.
      • Returns a list of Relationship objects (or related node representations) depending on inputs.
      • Includes behavior about how fields and HFID-related field selection are handled when requested.

@ajtmccarty (Contributor) left a comment:

approving b/c it sounds like you need this, but I think whatever LLM wrote the functions in infrahub.git.utils did not do a good job

@dgarros (Collaborator) left a comment:

Please can you add a changelog fragment so this is included in the release notes?

name="Fetch repository commit",
description="Retrieve a git repository at a given commit, if it does not already exist locally",
cache_policy=NONE,
@cached(
@dgarros (Collaborator) commented:

Not a blocker for this PR, but I'm wondering if we could leverage Prefect for the caching? And what limitation did you hit if you already tried?

@fatih-acar (Contributor, Author) replied:

It was done using the Prefect caching mechanism in the first iteration, but it was reverted, as described in commit 02b289d:

- local caching requires a lot of setup (wrt Blocks, etc.)
- it's memory intensive due to excessive pickle serialization in our case (we don't need it)

Revert to basic local TTL cache.

@fatih-acar (Contributor, Author) added:

Basically, Prefect caching is useful for distributed caching, but here we want a local, efficient cache.

Rebuilding the SDK SchemaBranch structure on each initialization is
memory intensive and leads to spikes in memory usage.
Avoid this by introducing a new cache attribute in the SchemaManager.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 72ca590 to ce5b6b5 on October 17, 2025 08:35
Unfortunately we cannot leverage Prefect's caching because:
- local caching requires a lot of setup (wrt Blocks, etc)
- it's memory intensive due to excessive serialization in our case

Revert to basic local TTL cache.

Signed-off-by: Fatih Acar <[email protected]>
Small optimization to avoid an unnecessary database query to find a
GraphQL query by name.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from ce5b6b5 to 2ce5840 on October 17, 2025 08:43
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
backend/infrahub/computed_attribute/tasks.py (2)

267-269: Fix typo in comment.

-    # NOTE we only need the id of the nodes, we need to ooptimize the query here
+    # NOTE we only need the id of the nodes; we should optimize the query here

106-112: Based on my comprehensive verification of the codebase and external sources, I can now provide a definitive assessment.

The method query_gql_query() does not exist in the Infrahub SDK or in this codebase—calling it will result in an AttributeError at runtime.

Evidence:

  • The Python SDK does not expose a method named query_gql_query; use execute_graphql / executeGraphQL to run raw GraphQL
  • Repository-wide search found zero definitions of query_gql_query (not in backend, not in bundled SDK, not monkeypatched anywhere)
  • Code calls client.query_gql_query() and self.sdk.query_gql_query() despite the method not existing
  • python_sdk directory in the repository is empty (0 bytes)
  • InfrahubClient is imported from the official infrahub_sdk.client package, which does not contain this method

The code at lines 106–112 in backend/infrahub/computed_attribute/tasks.py (and similar calls in backend/infrahub/git/integrator.py) will fail immediately when executed. Replace query_gql_query() with the correct SDK method (likely execute_graphql() based on the web search result) and adjust parameters accordingly.
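
If execute_graphql is indeed the right replacement (worth confirming the exact signature against the pinned SDK), the call shape would be roughly:

# Assumption: execute_graphql(query=..., variables=..., branch_name=...) is the SDK's raw-query entry point.
query_text = "query { CoreGraphQLQuery { count } }"  # illustrative query
result = await client.execute_graphql(query=query_text, branch_name=branch)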

backend/infrahub/message_bus/types.py (1)

88-96: Avoid breaking change: make deprecated query_name optional and validate “one-of” with query_id.

Both query_name and query_id are required here. That breaks producers still sending only the deprecated field. Make query_name optional; optionally accept query_id or query_name (at least one), and prefer query_id downstream.

 class ProposedChangeArtifactDefinition(BaseModel):
@@
-    query_name: str  # Deprecated
-    query_id: str
+    query_name: str | None = Field(default=None)  # Deprecated
+    query_id: str | None = Field(default=None)
@@
     timeout: int
+
+    @staticmethod
+    def _missing(b: str | None) -> bool:
+        return b is None or b == ""
+
+    @model_validator(mode="after")
+    def _ensure_query_identifier(self) -> "ProposedChangeArtifactDefinition":
+        # Require at least one of query_id or query_name for backward compatibility
+        if self._missing(self.query_id) and self._missing(self.query_name):
+            raise ValueError("Either query_id or query_name must be provided")
+        return self

Add import at top:

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator
backend/infrahub/git/models.py (1)

41-46: Avoid breaking change: accept query or query_id (prefer query_id) with validator.

Same concern as above: both fields are required. Make the deprecated field optional and allow either; enforce “one-of”.

-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
@@
-    query: str = Field(..., description="The name of the query to use when collecting data")  # Deprecated
-    query_id: str = Field(..., description="The id of the query to use when collecting data")
+    query: str | None = Field(default=None, description="The name of the query to use when collecting data")  # Deprecated
+    query_id: str | None = Field(default=None, description="The id of the query to use when collecting data")
@@
     context: InfrahubContext = Field(..., description="The context of the task")
+
+    @model_validator(mode="after")
+    def _ensure_query_identifier(self) -> "RequestArtifactGenerate":
+        if not self.query_id and not self.query:
+            raise ValueError("Either query_id or query must be provided")
+        return self
♻️ Duplicate comments (1)
backend/infrahub/git/tasks.py (1)

359-399: Verify batching semantics for workflow submissions.

client.create_batch is an SDK facility; passing get_workflow().submit_workflow may not benefit from SDK batching and could be a no-op or even serialize behind one transport. Please verify this supports arbitrary callables and actually runs them concurrently; otherwise prefer asyncio.gather around submit_workflow.

If unsupported, apply:

-    batch = await client.create_batch()
-    ...
-        batch.add(
-            task=get_workflow().submit_workflow,
-            workflow=REQUEST_ARTIFACT_GENERATE,
-            context=context,
-            parameters={"model": request_artifact_generate_model},
-        )
-    async for _, _ in batch.execute():
-        pass
+    import asyncio
+    coros = []
+    for relationship in group.members.peers:
+        ...
+        coros.append(
+            get_workflow().submit_workflow(
+                workflow=REQUEST_ARTIFACT_GENERATE,
+                context=context,
+                parameters={"model": request_artifact_generate_model},
+            )
+        )
+    await asyncio.gather(*coros)

Run to confirm Batch.add accepts non-SDK callables and executes them concurrently:

#!/bin/bash
# Inspect SDK batch API in repo for accepted task types and execution model
rg -nP -C3 'class\s+Batch\b|def\s+create_batch\b|def\s+add\(' --type=py
🧹 Nitpick comments (5)
backend/infrahub/core/schema/schema_branch.py (1)

165-171: Add a short docstring and consider JSON/alias dump for API stability.

This new public serializer looks good. To align with our Python docstring guidelines and avoid future alias/None drift in API payloads, add a docstring and switch to model_dump(mode="json", by_alias=True, exclude_none=True).

 def to_dict_api_schema_object(self) -> dict[str, list[dict]]:
-        return {
-            "nodes": [self.get(name, duplicate=False).model_dump() for name in self.nodes],
-            "profiles": [self.get(name, duplicate=False).model_dump() for name in self.profiles],
-            "generics": [self.get(name, duplicate=False).model_dump() for name in self.generics],
-            "templates": [self.get(name, duplicate=False).model_dump() for name in self.templates],
-        }
+        """Return an API-friendly representation of this branch's schema.
+
+        Returns:
+            Dict with keys: nodes, profiles, generics, templates. Each value is a list of Pydantic dumps.
+        """
+        return {
+            "nodes": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.nodes],
+            "profiles": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.profiles],
+            "generics": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.generics],
+            "templates": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.templates],
+        }
backend/infrahub/core/schema/manager.py (1)

770-791: Purge orphaned SDK cache entries when branches are removed.

Currently, purge_inactive_branches deletes _branches and _branch_hash_by_name entries but leaves stale items in _sdk_branches (keyed by old hashes). Clean them up to bound memory and avoid serving stale SDK schemas.

         for branch_name in list(self._branches.keys()):
             if branch_name not in active_branches:
                 del self._branches[branch_name]
                 if branch_name in self._branch_hash_by_name:
                     del self._branch_hash_by_name[branch_name]
                 removed_branches.append(branch_name)

+        # Drop SDK cache entries for purged branches
+        active_hashes = set(self._branch_hash_by_name.values())
+        for key in list(self._sdk_branches.keys()):
+            if key not in active_hashes:
+                del self._sdk_branches[key]
backend/infrahub/git/repository.py (1)

254-282: Make cache key resilient to positional calls and add docstrings.

The cache key uses only kwargs; if someone inadvertently calls the private function positionally, the key may degenerate. Enforce keyword-only params and add docstrings per guidelines.

-@cached(
+@cached(
     TTLCache(maxsize=100, ttl=30),
     key=lambda *_, **kwargs: hashkey(
         kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
     ),
 )
-async def _get_initialized_repo(
-    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
-) -> InfrahubReadOnlyRepository | InfrahubRepository:
+async def _get_initialized_repo(
+    *, client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
+) -> InfrahubReadOnlyRepository | InfrahubRepository:
+    """Create or retrieve a cached repository wrapper for a given repo/commit."""
@@
-async def get_initialized_repo(
-    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
-) -> InfrahubReadOnlyRepository | InfrahubRepository:
+async def get_initialized_repo(
+    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
+) -> InfrahubReadOnlyRepository | InfrahubRepository:
+    """Prefect task wrapper: fetch a repository at a given commit (local cache, no Prefect caching)."""
     return await _get_initialized_repo(
         client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
     )

All current call sites pass keyword args (good). This change prevents accidental positional calls in the future.
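
A quick self-contained demo of the degeneration (plain sync cachetools here for brevity; the async decorator used in the PR shares the same key semantics):

from cachetools import TTLCache, cached
from cachetools.keys import hashkey


@cached(
    TTLCache(maxsize=100, ttl=30),
    key=lambda *_, **kwargs: hashkey(kwargs.get("repository_id"), kwargs.get("commit")),
)
def fetch(repository_id: str, commit: str) -> str:
    return f"{repository_id}@{commit}"


print(fetch(repository_id="repo-a", commit="c1"))  # repo-a@c1, keyed ("repo-a", "c1")
print(fetch("repo-b", "c2"))  # positional call -> keyed (None, None)
print(fetch("repo-c", "c3"))  # also (None, None): returns stale repo-b@c2

Keyword-only parameters turn the third call into a TypeError instead of a silent cache collision.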

backend/infrahub/git/tasks.py (1)

329-335: Iterate relationships explicitly for clarity.

group.members.peers yields Relationship objects; consider naming the loop variable relationship and accessing relationship.id or relationship.peer for readability and to avoid misuse. Example:

-    current_members = [member.id for member in group.members.peers]
+    current_members = [rel.id for rel in group.members.peers]  # relationship ids
backend/infrahub/git/utils.py (1)

58-75: Consider walking dict keys too when extracting parameter segments.

Some parameter schemas encode field paths in keys (e.g., {"device__name": "r1"}). The current walker inspects only values. Optionally include keys:

-    def _walk(value: Any) -> None:
+    def _walk(value: Any) -> None:
         if isinstance(value, str):
             segment = value.split("__", 1)[0]
             if segment:
                 segments.add(segment)
         elif isinstance(value, dict):
-            for nested in value.values():
-                _walk(nested)
+            for k, v in value.items():
+                _walk(k)
+                _walk(v)
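
A runnable sketch of the extended walker; note that walking the values of such dicts also picks up plain data like "r1", which callers would need to tolerate:

from typing import Any


def collect_parameter_first_segments(params: Any) -> set[str]:
    segments: set[str] = set()

    def _walk(value: Any) -> None:
        if isinstance(value, str):
            segment = value.split("__", 1)[0]
            if segment:
                segments.add(segment)
        elif isinstance(value, dict):
            for key, nested in value.items():
                _walk(key)  # key-encoded paths, e.g. "device__name"
                _walk(nested)
        elif isinstance(value, (list, tuple, set)):
            for nested in value:
                _walk(nested)

    _walk(params)
    return segments


print(collect_parameter_first_segments({"device__name": "r1"}))
# {'device', 'r1'} (set order may vary)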
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72ca590 and 2ce5840.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • CHANGELOG.md (1 hunks)
  • backend/infrahub/artifacts/models.py (1 hunks)
  • backend/infrahub/computed_attribute/tasks.py (1 hunks)
  • backend/infrahub/core/schema/manager.py (4 hunks)
  • backend/infrahub/core/schema/schema_branch.py (1 hunks)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/integrator.py (1 hunks)
  • backend/infrahub/git/models.py (1 hunks)
  • backend/infrahub/git/repository.py (3 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/message_bus/types.py (1 hunks)
  • backend/infrahub/proposed_change/tasks.py (8 hunks)
  • backend/infrahub/workers/dependencies.py (2 hunks)
  • changelog/+785ca0cd.fixed.md (1 hunks)
  • pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/workers/dependencies.py
  • pyproject.toml
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/proposed_change/tasks.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/git/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/core/schema/schema_branch.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with async def
Await asynchronous calls with await
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy

Files:

  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/git/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/core/schema/schema_branch.py
**/*.{md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint Markdown/MDX files with markdownlint using .markdownlint.yaml

Files:

  • CHANGELOG.md
  • changelog/+785ca0cd.fixed.md
🧬 Code graph analysis (5)
backend/infrahub/git/tasks.py (1)
backend/infrahub/git/utils.py (2)
  • fetch_artifact_definition_targets (143-150)
  • fetch_check_definition_targets (153-160)
backend/infrahub/core/schema/manager.py (2)
backend/infrahub/core/registry.py (2)
  • schema (79-83)
  • schema (86-87)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (303-329)
  • to_dict_api_schema_object (165-171)
  • get_hash (139-149)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
  • get (14-35)
backend/infrahub/core/schema/schema_branch.py (4)
backend/infrahub/core/schema/manager.py (1)
  • get (69-92)
backend/infrahub/database/__init__.py (1)
  • get (84-88)
backend/infrahub/core/schema/__init__.py (2)
  • get (65-72)
  • duplicate (97-99)
backend/infrahub/core/models.py (3)
  • duplicate (500-502)
  • nodes (61-63)
  • generics (66-68)
🪛 LanguageTool
changelog/+785ca0cd.fixed.md

[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-docker-integration
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (4)
backend/infrahub/core/schema/manager.py (1)

151-160: Verify that the codebase can accept the hash scope limitation (profiles/templates excluded from the SDK cache key), or implement the get_hash_full() approach.

The review identifies a real issue: line 152 can raise KeyError if get_sdk_schema_branch() is called before a branch is initialized. The suggested fix is sound—calling get_schema_branch() first ensures both the branch and its hash entry exist, since set_schema_branch() atomically sets both at lines 163–164.

However, the review also raises a valid concern about hash scope: the cache key uses get_hash() which only incorporates nodes and generics (line 140), not profiles or templates (line 155). If profiles or templates change independently, the SDK cache will become stale. The suggested diff still uses get_hash() and leaves this unresolved, though the caveat acknowledges it.

Consider:

  • Accept this limitation (profiles/templates changes are rare or externally managed)
  • Or adopt the suggested approach of using get_hash_full() or including profiles/templates in the cache key derivation, as sketched below
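
A sketch of the second option, assuming a full-scope key is derived from the API-facing dump (sdk_cache_key is a hypothetical helper, not the existing manager API):

import hashlib
from typing import Any


def sdk_cache_key(schema_branch: Any) -> str:
    # Cover nodes, profiles, generics, and templates so a change to any
    # collection invalidates the corresponding SDK cache entry.
    payload = schema_branch.to_dict_api_schema_object()
    digest = hashlib.sha256()
    for section in ("nodes", "profiles", "generics", "templates"):
        for item in payload[section]:
            digest.update(repr(sorted(item.items())).encode())
    return digest.hexdigest()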
backend/infrahub/git/tasks.py (2)

327-328: Good switch to prefetched targets.

Using fetch_artifact_definition_targets centralizes logic and avoids per-item fetches.


379-381: Propagating query_id is correct.

Including query_id alongside query reduces lookups and aligns with the new models.

backend/infrahub/git/utils.py (1)

119-131: Ensure members.peers are populated before prefetch.

include=["members"] may not hydrate relationship peers in some SDK versions. If peers are empty, fetch them to get ids/kinds.

     group = await client.get(
         kind=CoreGroup,
         id=group_id,
         branch=branch,
         include=["members"],
     )
+    if not group.members.peers:
+        await group.members.fetch()

Comment on lines +77 to +83
async def _prefetch_group_member_nodes(
client: InfrahubClient,
members: RelationshipManager,
branch: str,
required_fields: set[str],
) -> None:
ids_per_kind: dict[str, set[str]] = defaultdict(set)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add Google‑style docstrings to new utilities.

Per repo guidelines, add docstrings to _prefetch_group_member_nodes, fetch_definition_targets, and public fetch* helpers.

Minimal examples:

 async def _prefetch_group_member_nodes(...):
+    """Batch‑prefetch group member nodes for required fields.
+
+    Args:
+        client: SDK client.
+        members: Relationship manager for group.members.
+        branch: Branch to query.
+        required_fields: Top‑level attribute/relationship names needed.
+
+    Returns:
+        None. Populates the client store with fetched nodes.
+    """
@@
 async def _fetch_definition_targets(...)-> CoreGroup:
+    """Fetch target group and prefetch members based on definition parameters."""
@@
 async def fetch_artifact_definition_targets(...)-> CoreGroup:
+    """Public helper to resolve targets for an artifact definition."""
@@
 async def fetch_check_definition_targets(...)-> CoreGroup:
+    """Public helper to resolve targets for a check definition."""
@@
 async def fetch_proposed_change_generator_definition_targets(...)-> CoreGroup:
+    """Public helper to resolve targets for a proposed change generator definition."""

Also applies to: 119-141, 143-170

Comment on lines +84 to +87
for peer in members.peers:
if peer.id and peer.typename:
ids_per_kind[peer.typename].add(peer.id)

Contributor

⚠️ Potential issue | 🔴 Critical

Wrong entity from RelationshipManager.peers; use relationship.peer.

peers yields Relationship objects. Access the node via relationship.peer; otherwise peer.typename may not exist and ids_per_kind stays empty.

-    for peer in members.peers:
-        if peer.id and peer.typename:
-            ids_per_kind[peer.typename].add(peer.id)
+    for relationship in members.peers:
+        node = relationship.peer
+        if node.id and node.typename:
+            ids_per_kind[node.typename].add(node.id)
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 84 to 87, the loop is iterating
Relationship objects from members.peers but treats them as if they were nodes;
change the loop to treat each item as a relationship, retrieve the node via
relationship.peer, then check that that peer has id and typename before adding
to ids_per_kind (e.g., for relationship in members.peers: peer =
relationship.peer; if peer and peer.id and peer.typename:
ids_per_kind[peer.typename].add(peer.id)). Ensure you don't assume peers yields
raw nodes and guard against missing peer attributes.

Comment on lines 94 to 99
schema = await client.schema.get(kind=kind, branch=branch)

# FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
keep_relationships = set(schema.relationship_names) & required_fields
cleaned_fields = required_fields ^ valid_fields
Contributor

⚠️ Potential issue | 🔴 Critical

Call schema helpers, fix exclude set, and guard empty required_fields.

  • attribute_names()/relationship_names() are methods; call them.
  • Use difference (valid_fields - required_fields) for exclude to avoid unknown keys from XOR.
  • Skip prefetch when required_fields is empty to avoid requesting “nothing”.
-        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
-        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
-        keep_relationships = set(schema.relationship_names) & required_fields
-        cleaned_fields = required_fields ^ valid_fields
+        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
+        attribute_names = set(schema.attribute_names())
+        relationship_names = set(schema.relationship_names())
+        valid_fields = attribute_names | relationship_names
+        keep_relationships = relationship_names & required_fields
+        cleaned_fields = valid_fields - required_fields
@@
-        kwargs: dict[str, Any] = {
+        kwargs: dict[str, Any] = {
             "kind": kind,
             "ids": ids,
             "branch": branch,
             "exclude": cleaned_fields,
             "populate_store": True,
             "order": Order(disable=True),
         }
@@
-    parameter_fields = _collect_parameter_first_segments(parameters)
-    await _prefetch_group_member_nodes(
-        client=client,
-        members=group.members,
-        branch=branch,
-        required_fields=parameter_fields,
-    )
+    parameter_fields = _collect_parameter_first_segments(parameters)
+    if parameter_fields:
+        await _prefetch_group_member_nodes(
+            client=client,
+            members=group.members,
+            branch=branch,
+            required_fields=parameter_fields,
+        )

Also applies to: 101-114, 132-139

🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 94-99, 101-114, and 132-139, fix
three issues: call schema.attribute_names() and schema.relationship_names()
(they are methods), compute the exclude set as valid_fields - required_fields
instead of symmetric difference (required_fields ^ valid_fields), and skip the
prefetch/prefetch-related logic when required_fields is empty to avoid
requesting nothing; update any keep_relationships calculations to use
set(schema.relationship_names()) & required_fields accordingly and apply these
same changes in the other noted ranges.

CHANGELOG.md Outdated
Comment on lines 14 to 46
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17

### Added

- The login form now automatically focuses on the first field.

### Fixed

- - Consistent font size for all events in the Proposed Change timeline
- Proposed Change action buttons now keep their size and does not strectch anymore
- Prevent overflow on the create new relationship button within the relationship input
- fixed typos

([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
- Enhanced error handling to properly support all error codes returned by identity providers

([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
- Added a fallback to plain text for unsupported content types.

([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
which would cause the application and workers to crash with an error message similar to the following:

> ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']
Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.

Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Duplicate version header and typos.

  • Two entries named v1.4.10 (2025-10-17 and 2025-10-01). Please bump the new entry’s version (e.g., v1.4.11) or merge into the existing 1.4.10 section.
  • Fix typos: “strectch” → “stretch”.
-## [Infrahub - v1.4.10] ... - 2025-10-17
+## [Infrahub - v1.4.11] ... - 2025-10-17  # or merge into 1.4.10
@@
-  - Proposed Change action buttons now keep their size and does not strectch anymore
+  - Proposed Change action buttons now keep their size and do not stretch anymore

🏁 Script executed:

#!/bin/bash
# Check for duplicate v1.4.10 entries and find the typo
echo "=== All v1.4.10 entries in CHANGELOG.md ==="
rg -n "v1\.4\.10" CHANGELOG.md

echo ""
echo "=== Line with 'strectch' typo ==="
rg -n "strectch" CHANGELOG.md

Length of output: 364


Duplicate version header and typos confirmed—resolve version conflict and fix typos.

Verification confirms:

  • Two v1.4.10 entries exist (lines 14 and 47 with dates 2025-10-17 and 2025-10-01 respectively)
  • Typo "strectch" on line 23
  • Grammar error "does not" should be "do not" (plural subject "buttons")

Resolve by either bumping the newer entry to v1.4.11 or merging both entries into a single v1.4.10 section, and apply the typo/grammar fixes:

-  - Proposed Change action buttons now keep their size and does not strectch anymore
+  - Proposed Change action buttons now keep their size and do not stretch anymore
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17
### Added
- The login form now automatically focuses on the first field.
### Fixed
- - Consistent font size for all events in the Proposed Change timeline
- Proposed Change action buttons now keep their size and does not strectch anymore
- Prevent overflow on the create new relationship button within the relationship input
- fixed typos
([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
- Enhanced error handling to properly support all error codes returned by identity providers
([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
- Added a fallback to plain text for unsupported content types.
([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
which would cause the application and workers to crash with an error message similar to the following:
> ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']
Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17
### Added
- The login form now automatically focuses on the first field.
### Fixed
- - Consistent font size for all events in the Proposed Change timeline
- Proposed Change action buttons now keep their size and do not stretch anymore
- Prevent overflow on the create new relationship button within the relationship input
- fixed typos
([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
- Enhanced error handling to properly support all error codes returned by identity providers
([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
- Added a fallback to plain text for unsupported content types.
([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
which would cause the application and workers to crash with an error message similar to the following:
> ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']
Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.
🤖 Prompt for AI Agents
In CHANGELOG.md around lines 14 to 46 there is a duplicate v1.4.10 header (also
present at line ~47) and two text issues: "strectch" should be "stretch" and
"does not" should be "do not" for the plural "buttons"; resolve the duplication
by either renaming the newer entry to "Infrahub - v1.4.11" with the 2025-10-17
date or by merging the two v1.4.10 blocks into one consolidated v1.4.10 entry
(remove the duplicate block), then apply the typo fixes ("stretch") and grammar
fix ("do not") and ensure references/issue links remain intact.

@@ -0,0 +1 @@
Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
Contributor

⚠️ Potential issue | 🟡 Minor

Tighten phrasing.

Suggest active voice and remove “excessive”:

-Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
+Improve artifact generation and proposed change checks by leveraging caching and reducing GraphQL queries.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
Improve artifact generation and proposed change checks by leveraging caching and reducing GraphQL queries.
🧰 Tools
🪛 LanguageTool

[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.

(QB_NEW_EN)

🤖 Prompt for AI Agents
In changelog/+785ca0cd.fixed.md around line 1, the sentence uses passive
phrasing and the word “excessive”; change it to active voice and remove
“excessive.” Replace the line with a concise active version such as: "Improved
artifacts generation and sped up proposed change checks by leveraging caching
and reducing GraphQL queries."

@fatih-acar fatih-acar force-pushed the fac-fix-artifacts-scale branch from 2ce5840 to 072eece Compare October 17, 2025 09:04
@fatih-acar fatih-acar force-pushed the fac-fix-artifacts-scale branch from 072eece to f9bde88 Compare October 17, 2025 09:11
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
backend/infrahub/git/utils.py (2)

97-99: Call schema methods to retrieve field names.

schema.attribute_names and schema.relationship_names are methods that must be called with () to return the actual lists of field names.

Apply this diff:

-        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
-        keep_relationships = set(schema.relationship_names) & required_fields
+        attribute_names = set(schema.attribute_names())
+        relationship_names = set(schema.relationship_names())
+        valid_fields = attribute_names | relationship_names
+        keep_relationships = relationship_names & required_fields

84-86: Fix peers iteration to access node attributes correctly.

members.peers yields Relationship objects, not nodes directly. You need to access relationship.peer to get the actual node before checking id and typename.

Apply this diff:

-    for peer in members.peers:
-        if peer.id and peer.typename:
-            ids_per_kind[peer.typename].add(peer.id)
+    for relationship in members.peers:
+        node = relationship.peer
+        if node and node.id and node.typename:
+            ids_per_kind[node.typename].add(node.id)
🧹 Nitpick comments (4)
backend/infrahub/git/utils.py (4)

58-74: Consider adding a docstring.

Per coding guidelines, Python functions should include Google-style docstrings. This helper would benefit from a brief description of its purpose and parameters.

Example:

def _collect_parameter_first_segments(params: Any) -> set[str]:
    """Extract top-level field names from parameter structures.
    
    Recursively walks nested parameter structures (dicts, lists, tuples, sets)
    and extracts the first segment (before '__') from string values.
    
    Args:
        params: Parameter structure to walk (can be dict, list, or scalar)
        
    Returns:
        Set of top-level field names referenced in the parameters
    """

77-116: Consider adding a docstring.

Per coding guidelines, this async function should have a Google-style docstring documenting its purpose, parameters, and behavior.

Example:

async def _prefetch_group_member_nodes(
    client: InfrahubClient,
    members: RelationshipManager,
    branch: str,
    required_fields: set[str],
) -> None:
    """Batch-prefetch group member nodes with required fields.
    
    Groups member nodes by kind, computes minimal field sets based on
    required_fields, and uses batched client requests to populate the
    client store efficiently.
    
    Args:
        client: SDK client for API interactions
        members: RelationshipManager for group.members
        branch: Branch name to query
        required_fields: Top-level attribute/relationship names needed
        
    Returns:
        None. Populates the client store with fetched nodes.
    """

119-140: Consider adding a docstring.

This core utility function would benefit from a Google-style docstring explaining its purpose and flow.

Example:

async def _fetch_definition_targets(
    client: InfrahubClient,
    branch: str,
    group_id: str,
    parameters: Any,
) -> CoreGroup:
    """Fetch target group and prefetch member nodes based on definition parameters.
    
    Retrieves the target group by ID, analyzes the definition parameters to
    determine required fields, and prefetches member nodes with those fields
    populated in the client store.
    
    Args:
        client: SDK client for API interactions
        branch: Branch name to query
        group_id: ID of the target group
        parameters: Definition parameters used to determine required fields
        
    Returns:
        The target group with prefetched member nodes
    """

143-170: Consider adding docstrings to public functions.

These public API functions should have Google-style docstrings per coding guidelines. Brief examples:

async def fetch_artifact_definition_targets(
    client: InfrahubClient,
    branch: str,
    definition: CoreArtifactDefinition,
) -> CoreGroup:
    """Fetch target group for an artifact definition with prefetched members.
    
    Args:
        client: SDK client for API interactions
        branch: Branch name to query
        definition: Artifact definition containing targets and parameters
        
    Returns:
        Target group with prefetched member nodes
    """

Similar docstrings for fetch_check_definition_targets and fetch_proposed_change_generator_definition_targets.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 072eece and f9bde88.

📒 Files selected for processing (5)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/proposed_change/tasks.py (8 hunks)
  • changelog/+785ca0cd.fixed.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/git/tasks.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/utils.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with async def
Await asynchronous calls with await
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy

Files:

  • backend/infrahub/git/tasks.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/utils.py
**/*.{md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint Markdown/MDX files with markdownlint using .markdownlint.yaml

Files:

  • changelog/+785ca0cd.fixed.md
🧬 Code graph analysis (4)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (2)
  • fetch_artifact_definition_targets (143-150)
  • fetch_check_definition_targets (153-160)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
backend/infrahub/proposed_change/tasks.py (2)
backend/infrahub/core/protocols.py (1)
  • CoreArtifactDefinition (280-287)
backend/infrahub/git/utils.py (2)
  • fetch_artifact_definition_targets (143-150)
  • fetch_proposed_change_generator_definition_targets (163-170)
backend/infrahub/generators/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_proposed_change_generator_definition_targets (163-170)
backend/infrahub/git/utils.py (4)
backend/infrahub/core/relationship/model.py (1)
  • RelationshipManager (718-1242)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
🪛 LanguageTool
changelog/+785ca0cd.fixed.md

[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-tests-functional
  • GitHub Check: documentation
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-docker-integration
🔇 Additional comments (11)
backend/infrahub/generators/tasks.py (2)

24-24: LGTM!

The import is correctly added to support the new fetch utility function used below.


181-183: LGTM!

The switch to the prefetch utility function is correct and will improve performance by batching member node fetches based on definition parameters.

backend/infrahub/proposed_change/tasks.py (5)

47-47: LGTM!

The new imports support the updated artifact definition handling and prefetch utilities used throughout the file.

Also applies to: 61-61


616-616: LGTM!

Using the protocol type CoreArtifactDefinition provides better type safety, and the fetch utility efficiently prefetches member nodes based on the definition's parameters.

Also applies to: 656-658


695-695: LGTM!

Adding query_id enables artifact generation to use the query identifier directly, reducing the need for name-based lookups and improving performance.


922-924: LGTM!

Consistent use of the prefetch utility for generator definitions improves performance through batched member node retrieval.


1245-1245: LGTM!

Adding the query id field to the GraphQL query and parsing it into query_id supports the migration from name-based to ID-based query references throughout the artifact generation flow.

Also applies to: 1467-1467

backend/infrahub/git/tasks.py (4)

56-56: LGTM!

The import provides the prefetch utilities used throughout the file for efficient target resolution.


327-328: LGTM!

The prefetch utility efficiently resolves artifact definition targets and pre-populates member nodes in the client store based on parameter requirements.


359-398: LGTM!

The batch submission pattern correctly groups artifact generation requests to reduce Prefect API overhead. Including query_id (line 380) supports the shift from name-based to ID-based query references, which should reduce database lookups during artifact generation.
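
For context, a hedged sketch of the batching shape (stand-in names; submit_in_batches and the submit_workflow call are illustrative, not the exact service adapter signature):

import asyncio
from typing import Any

BATCH_SIZE = 50  # assumption: tune to Prefect API limits


async def submit_in_batches(service: Any, requests: list[Any]) -> None:
    # Submit artifact-generation workflows in fixed-size chunks instead of
    # awaiting one submission per artifact.
    for start in range(0, len(requests), BATCH_SIZE):
        chunk = requests[start : start + BATCH_SIZE]
        await asyncio.gather(
            *(service.workflow.submit_workflow(workflow=r.workflow, parameters=r.parameters) for r in chunk)
        )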


580-580: LGTM!

Consistent use of the prefetch utility for check definitions ensures member nodes are efficiently fetched before iteration.

TTLCache(maxsize=100, ttl=30),
key=lambda *_, **kwargs: hashkey(
kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
),
Contributor

Given the TTL it might not have a practical meaning now, but since commit can be None I think we might occasionally end up with a repo using an earlier commit than we would have expected. In another iteration we could instead cache after we have the commit. repository_id and commit are probably enough to use for the key.

Contributor Author

The TTL is lower than the git sync scheduled job's interval on purpose, because I'd expect this kind of edge case... I think it's fine like that for the moment.
+1 for the key.
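
A sketch of that follow-up (hypothetical _resolve_latest_commit/_init_repo helpers standing in for the real initialization path): resolve the commit first, then cache on (repository_id, commit) only, so a None commit can never pin callers to an older snapshot:

from typing import Any

from cachetools import TTLCache
from cachetools.keys import hashkey

_repo_cache: TTLCache = TTLCache(maxsize=100, ttl=30)


async def get_initialized_repo_v2(
    *, client: Any, repository_id: str, name: str, repository_kind: str, commit: str | None = None
) -> Any:
    if commit is None:
        # Resolve before caching so the key is always a concrete commit.
        # _resolve_latest_commit is a hypothetical helper.
        commit = await _resolve_latest_commit(client=client, repository_id=repository_id)
    key = hashkey(repository_id, commit)
    if key not in _repo_cache:
        # _init_repo is a hypothetical helper wrapping the real setup.
        _repo_cache[key] = await _init_repo(
            client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
        )
    return _repo_cache[key]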

return repositories


def _collect_parameter_first_segments(params: Any) -> set[str]:
Contributor

Do we need an Any here? It looks like we expect str | list | tuple | set | dict and won't use anything else.

Contributor Author

I think the LLM used Any here because we pass parameters straight from the JSONAttribute protocol, where value is typed as Any. This looks legit?
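
If we wanted to tighten it anyway, a sketch with a recursive alias (the JSONAttribute value is still Any upstream, so an isinstance check or cast is needed at the boundary):

from typing import TypeAlias

ParamValue: TypeAlias = str | list["ParamValue"] | tuple["ParamValue", ...] | set[str] | dict[str, "ParamValue"]


def collect_parameter_first_segments(params: ParamValue | None) -> set[str]:
    segments: set[str] = set()
    if isinstance(params, str):
        head = params.split("__", 1)[0]
        if head:
            segments.add(head)
    elif isinstance(params, dict):
        for nested in params.values():
            segments |= collect_parameter_first_segments(nested)
    elif isinstance(params, (list, tuple, set)):
        for nested in params:
            segments |= collect_parameter_first_segments(nested)
    return segments


print(collect_parameter_first_segments({"device": ["interface__name"]}))
# {'interface'}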

Comment on lines +58 to +74
def _collect_parameter_first_segments(params: Any) -> set[str]:
segments: set[str] = set()

def _walk(value: Any) -> None:
if isinstance(value, str):
segment = value.split("__", 1)[0]
if segment:
segments.add(segment)
elif isinstance(value, dict):
for nested in value.values():
_walk(nested)
elif isinstance(value, (list, tuple, set)):
for nested in value:
_walk(nested)

_walk(params)
return segments
Contributor

Hey coderabbit, where is this requirement for docstrings coming from?

@fatih-acar fatih-acar merged commit 35e5793 into stable Oct 17, 2025
40 checks passed
@fatih-acar fatih-acar deleted the fac-fix-artifacts-scale branch October 17, 2025 12:47