
Conversation

@fatih-acar (Contributor) commented Oct 3, 2025

Many small changes here:

  • enable caching of the git repositories by leveraging Prefect's cache policy; I set a hardcoded TTL of 30s to match the 1-minute Git Sync scheduled job
  • AI-generated commit to improve fetching of artifact target attributes/rels: we were previously fetching ALL attributes even though most of them are not needed
  • batch the artifact generation workflow submits
  • ensure the schema cache is set for the Infrahub SDK client, which avoids unnecessary calls to the API server to retrieve the schema on each task

We are now able to generate 1,000 artifacts in under two minutes without errors, whereas it previously took more than five minutes and produced failures.
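
Below is a rough sketch of the batched submission pattern (third bullet above), using the SDK batch API that appears in the review excerpts further down this thread. The import path and the build_generate_model helper are illustrative stand-ins for the real RequestArtifactGenerate construction in backend/infrahub/git/tasks.py:

from infrahub.workflows.catalogue import REQUEST_ARTIFACT_GENERATE  # assumed catalogue path

async def submit_artifact_workflows(client, group, context, get_workflow) -> None:
    """Queue one artifact-generation workflow per group member in a single SDK batch."""
    batch = await client.create_batch()
    for relationship in group.members.peers:
        # Hypothetical helper: the real code builds a RequestArtifactGenerate
        # model (including query_id) for each member.
        model = build_generate_model(relationship)
        batch.add(
            task=get_workflow().submit_workflow,
            workflow=REQUEST_ARTIFACT_GENERATE,
            context=context,
            parameters={"model": model},
        )
    async for _, _ in batch.execute():  # drain the batch; submissions are executed by the batch runner
        pass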

Summary by CodeRabbit

Release Notes

  • Performance Improvements

    • Artifact generation and proposed change checks now execute faster through optimized caching and reduced GraphQL queries.
  • Changes

    • Query field is deprecated in artifact-related definitions; use query_id instead for future compatibility.

@github-actions bot added the group/backend label (issues related to the backend: API Server, Git Agent) on Oct 3, 2025
@coderabbitai bot (Contributor) commented Oct 3, 2025

Walkthrough

This pull request introduces performance optimizations and API refinements across the artifact and schema management systems. Key changes include: (1) adding query_id fields and deprecating query fields in artifact-related models across multiple modules, (2) implementing TTL-based caching for repository initialization and per-branch SDK schema caching in SchemaManager, (3) introducing batch submission for artifact generation tasks, (4) adding utility functions for efficient prefetching of definition targets with parameterized field resolution, and (5) updating dependencies to include cachetools-async. The changes maintain backward compatibility while refactoring internal data access patterns.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): coverage is 13.33%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve it.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title "fix(backend): performance issues when generating artifacts" clearly and accurately summarizes the main objective of the changeset, follows the conventional commit format, and is concise and specific; a reader scanning the project history would immediately understand that this PR addresses performance problems in artifact generation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fac-fix-artifacts-scale


@codspeed-hq bot commented Oct 3, 2025

CodSpeed Performance Report

Merging #7320 will not alter performance

Comparing fac-fix-artifacts-scale (f9bde88) with stable (a874ac3)¹

Summary

✅ 10 untouched

Footnotes

  1. No successful run was found on stable (59168ed) during the generation of this report, so a874ac3 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 72e7b83 to 408ce30 on October 3, 2025 18:06
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 73e9d80 to 9662b59 on October 5, 2025 23:04
@fatih-acar (Contributor, Author) commented:

There's a follow-up PR related to memory usage when generating artifacts: #7327

@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 207ffc8 to 4bd0e78 on October 6, 2025 13:56
@fatih-acar force-pushed the fac-fix-artifacts-scale branch 2 times, most recently from ea5302a to 9dcedfc, on October 6, 2025 20:58
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 9dcedfc to dd81c80 on October 15, 2025 13:44
@fatih-acar marked this pull request as ready for review on October 15, 2025 14:09
@fatih-acar requested a review from a team as a code owner on October 15, 2025 14:09
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/infrahub/core/schema/manager.py (1)

148-149: Consider documenting the side effect in get_schema_branch.

The method now has a side effect of calling set_schema_branch when the branch doesn't exist, which populates the hash mapping. While this ensures consistency, it changes the method from a pure getter to one with side effects.

Consider adding a brief comment explaining this behavior:

 def get_schema_branch(self, name: str) -> SchemaBranch:
     if name in self._branches:
         return self._branches[name]
 
+    # Create and register a new branch schema to ensure hash mapping exists
     self.set_schema_branch(name, schema=SchemaBranch(cache=self._cache, name=name))
     return self._branches[name]
backend/infrahub/artifacts/models.py (1)

28-29: Improve field descriptions and deprecation handling.

Both query and query_id have identical descriptions, which is confusing. Additionally, the deprecation is only indicated via a comment.

Consider these improvements:

-    query: str = Field(..., description="The name of the query to use when collecting data")  # Deprecated
-    query_id: str = Field(..., description="The name of the query to use when collecting data")
+    query: str = Field(..., description="(Deprecated: use query_id) The name of the query to use when collecting data", deprecated=True)
+    query_id: str = Field(..., description="The ID of the query to use when collecting data")

Note: If using Pydantic v2, the deprecated parameter provides formal deprecation warnings. If on Pydantic v1, consider adding the deprecation notice in the description.
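
For reference, a minimal sketch of the Pydantic v2 approach; the deprecated parameter on Field landed in Pydantic 2.7, so this assumes the project's pin is at least that recent, and the model name here is illustrative:

from pydantic import BaseModel, Field

class ArtifactQueryModel(BaseModel):
    # Accessing `query` emits a DeprecationWarning at runtime (Pydantic >= 2.7).
    query: str = Field(default="", deprecated="Use query_id instead")
    query_id: str = Field(..., description="The ID of the query to use when collecting data")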

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b036efb and dd81c80.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • backend/infrahub/artifacts/models.py (1 hunks)
  • backend/infrahub/computed_attribute/tasks.py (1 hunks)
  • backend/infrahub/core/schema/manager.py (4 hunks)
  • backend/infrahub/core/schema/schema_branch.py (1 hunks)
  • backend/infrahub/git/integrator.py (1 hunks)
  • backend/infrahub/git/models.py (1 hunks)
  • backend/infrahub/git/repository.py (3 hunks)
  • backend/infrahub/git/tasks.py (7 hunks)
  • backend/infrahub/message_bus/types.py (1 hunks)
  • backend/infrahub/proposed_change/tasks.py (3 hunks)
  • backend/infrahub/workers/dependencies.py (2 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/git/models.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/workers/dependencies.py
  • backend/infrahub/git/tasks.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking


Files:

  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/git/models.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/workers/dependencies.py
  • backend/infrahub/git/tasks.py
{pyproject.toml,poetry.lock}

📄 CodeRabbit inference engine (.github/instructions/tooling.instructions.md)

Use Poetry to manage the Python project and its dependencies

Files:

  • pyproject.toml
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Use Poetry to manage the Python project and dependencies

Files:

  • pyproject.toml
🧬 Code graph analysis (6)
backend/infrahub/core/schema/schema_branch.py (2)
backend/infrahub/core/models.py (2)
  • nodes (61-63)
  • generics (66-68)
backend/infrahub/core/schema/__init__.py (1)
  • get (65-72)
backend/infrahub/computed_attribute/tasks.py (1)
backend/infrahub/trigger/models.py (1)
  • name (79-80)
backend/infrahub/core/schema/manager.py (1)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (303-329)
  • to_dict_api_schema_object (165-171)
  • get_hash (139-149)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
  • get (14-35)
backend/infrahub/workers/dependencies.py (1)
backend/infrahub/core/schema/manager.py (2)
  • get_branches (793-794)
  • get_sdk_schema_branch (151-159)
backend/infrahub/git/tasks.py (4)
backend/infrahub/types.py (1)
  • Any (312-319)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
backend/infrahub/workers/dependencies.py (1)
  • get_workflow (122-123)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
🔇 Additional comments (18)
backend/infrahub/computed_attribute/tasks.py (1)

106-112: LGTM! Query ID transition looks correct.

The change from transform.query.peer.name.value to transform.query.id aligns with the broader migration to use query IDs instead of query names across the codebase.

backend/infrahub/git/models.py (1)

41-42: LGTM! Proper deprecation pattern.

The addition of query_id while keeping the deprecated query field ensures backward compatibility during the migration period.

backend/infrahub/message_bus/types.py (1)

92-93: LGTM! Consistent deprecation pattern.

The addition of query_id while deprecating query_name follows the same pattern used elsewhere in the codebase, ensuring consistent migration.

backend/infrahub/core/schema/schema_branch.py (1)

165-171: LGTM! API-friendly serialization method.

The new to_dict_api_schema_object method provides an API-focused serialization that returns lists of dictionaries (via model_dump()), complementing the existing to_dict_schema_object which returns schema objects. This aligns with the SDK-backed schema caching introduced in the manager.

backend/infrahub/core/schema/manager.py (3)

46-47: LGTM! LRU cache addresses memory concerns.

The LRU cache with maxsize=10 provides a good balance between performance and memory usage, addressing the concerns raised in previous review discussions about memory consumption with many branches. The hash-based keying ensures efficient cache invalidation when schemas change.


151-159: LGTM! Efficient SDK schema retrieval with hash-based caching.

The method correctly uses the branch hash as the cache key, ensuring that identical schemas across different branches share the same cached SDK representation. The lazy initialization pattern (check cache, construct if missing) is appropriate.
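
A minimal sketch of the pattern, with attribute names (_branch_hash_by_name, _sdk_branches) taken from the review comments below and the cached value simplified to a dict:

def get_sdk_schema_branch(self, name: str) -> dict:
    """Return the SDK-facing schema for a branch, converting at most once per schema hash."""
    schema_branch = self.get_schema_branch(name=name)
    schema_hash = schema_branch.get_hash()
    if schema_hash not in self._sdk_branches:
        # Branches with identical schemas share a hash, hence the same cache entry.
        self._sdk_branches[schema_hash] = schema_branch.to_dict_api_schema_object()
    return self._sdk_branches[schema_hash]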


783-784: LGTM! Proper cleanup in purge logic.

The addition correctly removes hash mappings when branches are purged, preventing memory leaks and stale cache entries.

backend/infrahub/workers/dependencies.py (1)

38-44: LGTM! Pre-population of SDK schema cache.

The initialization-time population of the client schema cache ensures that SDK representations are available immediately without triggering expensive on-the-fly conversions. The guard if registry.schema safely handles cases where the registry isn't fully initialized yet.

backend/infrahub/git/repository.py (2)

254-259: LGTM! TTL cache aligns with Git Sync schedule.

The 30-second TTL cache for repository initialization aligns well with the 1-minute Git Sync scheduled job mentioned in the PR description. The cache key correctly includes all parameters that affect repository state (repository_id, name, kind, commit).


272-282: LGTM! Clean separation of caching and Prefect task concerns.

The wrapper pattern correctly separates the cached implementation (_get_initialized_repo) from the Prefect task wrapper (get_initialized_repo), with cache_policy=NONE preventing duplicate caching by Prefect. This ensures caching is handled exclusively by cachetools as intended.
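
A condensed sketch of the two layers, assuming the async-aware cached decorator comes from cachetools-async (import path assumed) and Prefect's NONE cache policy; signatures are abbreviated:

from cachetools import TTLCache
from cachetools.keys import hashkey
from cachetools_async import cached  # assumption: async-aware decorator from the cachetools-async package
from prefect import task
from prefect.cache_policies import NONE

@cached(
    TTLCache(maxsize=100, ttl=30),  # 30s TTL, matching the 1-minute Git Sync job
    key=lambda *_, **kwargs: hashkey(
        kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
    ),
)
async def _get_initialized_repo(client, repository_id: str, name: str, repository_kind: str, commit: str | None = None):
    ...  # clone/fetch the repository at the given commit and return the wrapper

@task(name="Fetch repository commit", cache_policy=NONE)  # NONE stops Prefect from caching on top
async def get_initialized_repo(client, repository_id: str, name: str, repository_kind: str, commit: str | None = None):
    return await _get_initialized_repo(
        client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
    )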

pyproject.toml (1)

79-79: No update required for cachetools-async version
The constraint ^0.0.5 already references the latest published version (0.0.5) on PyPI.

Likely an incorrect or invalid review comment.

backend/infrahub/git/integrator.py (1)

1366-1366: LGTM! Consistent with query_id propagation.

The change from message.query to message.query_id aligns with the broader refactoring to use query identifiers throughout the artifact generation flow.

backend/infrahub/artifacts/models.py (1)

33-33: LGTM! Context field addition is appropriate.

Adding the context field enables better context propagation through the artifact processing pipeline.

backend/infrahub/proposed_change/tasks.py (1)

694-694: LGTM! Consistent query_id propagation throughout the flow.

The changes correctly:

  1. Pass query_id to CheckArtifactCreate (line 694)
  2. Fetch the query node's id in the GraphQL query (line 1249)
  3. Parse and assign the query_id from the response (line 1471)

The implementation is consistent with the broader query_id refactoring across the codebase.

Also applies to: 1249-1249, 1471-1471

backend/infrahub/git/tasks.py (4)

62-78: LGTM! Helper function correctly extracts parameter field names.

The _collect_parameter_first_segments function properly extracts the first segment of parameter keys (e.g., "foo__bar" → "foo") by recursively walking through nested data structures.


81-126: LGTM! Efficient prefetching optimization.

The _prefetch_group_member_nodes function implements a smart optimization by:

  1. Grouping member IDs by kind to batch requests
  2. Filtering attributes and relationships to fetch only required fields
  3. Disabling ordering for better performance
  4. Using the SDK's batch execution for efficiency

This should significantly reduce both the number of API calls and the amount of data transferred when processing artifact definitions with many targets.
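
A hedged sketch of that prefetch shape; client.filters with ids=/exclude= and the schema field lists are assumptions about the SDK surface, and the real helper also disables ordering:

from collections import defaultdict

async def prefetch_group_member_nodes(client, branch: str, members, required_fields: set[str]) -> None:
    """Fetch group members kind by kind, pulling only the fields the parameters reference."""
    ids_by_kind: dict[str, list[str]] = defaultdict(list)
    for member in members:
        ids_by_kind[member.typename].append(member.id)

    batch = await client.create_batch()
    for kind, ids in ids_by_kind.items():
        schema = await client.schema.get(kind=kind, branch=branch)
        all_fields = [attr.name for attr in schema.attributes] + [rel.name for rel in schema.relationships]
        exclude = [field for field in all_fields if field not in required_fields]
        # assumption: filters() accepts ids= to batch by id and exclude= to trim fields
        batch.add(task=client.filters, kind=kind, branch=branch, ids=ids, exclude=exclude)
    async for _, _ in batch.execute():
        pass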


396-409: LGTM! Improved target fetching with prefetching optimization.

The changes improve artifact generation by:

  1. Explicitly fetching the group with members using client.get()
  2. Extracting required parameter fields from the artifact definition
  3. Prefetching member node data upfront to avoid N+1 queries

This should significantly improve performance when generating artifacts for definitions with many targets.


441-480: LGTM! Batched workflow submissions improve throughput.

The changes batch artifact generation workflow submissions, which shifts the bottleneck to the Prefect API servers (as discussed in previous reviews). This enables better scalability by allowing the Prefect infrastructure to be scaled independently.

The addition of query_id on line 462 correctly propagates the query identifier through the artifact generation flow.

When generating many artifacts at the same time (for example when
merging a huge branch), repository initialization will be done for each
artifact, concurrently.
Avoid useless initialization steps that trigger many GraphQL queries to
fetch and update the repo status by leveraging Prefect caching with a
small TTL.

Signed-off-by: Fatih Acar <[email protected]>
Fetch only the attributes/relationships required by the artifact target parameters.

AI

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 93246c1 to ee7c390 on October 16, 2025 21:11
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/infrahub/git/tasks.py (1)

327-397: Use the correct keyword when calling fetch_definition_targets.

fetch_definition_targets expects a definition= argument. Passing artifact_definition= raises TypeError: got an unexpected keyword argument 'artifact_definition' and aborts the flow before any artifacts are queued. Update the call to match the helper’s signature.

-    group = await fetch_definition_targets(client=client, branch=model.branch, artifact_definition=artifact_definition)
+    group = await fetch_definition_targets(client=client, branch=model.branch, definition=artifact_definition)
🧹 Nitpick comments (1)
backend/infrahub/artifacts/models.py (1)

28-29: Fix the query_id description.

The new field still says “The name of the query…”, which contradicts the field’s intent and the deprecation notice above it. Please update the description so it clearly references the query ID.

-    query_id: str = Field(..., description="The name of the query to use when collecting data")
+    query_id: str = Field(..., description="The id of the query to use when collecting data")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd81c80 and ee7c390.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • backend/infrahub/artifacts/models.py (1 hunks)
  • backend/infrahub/computed_attribute/tasks.py (1 hunks)
  • backend/infrahub/core/schema/manager.py (4 hunks)
  • backend/infrahub/core/schema/schema_branch.py (1 hunks)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/integrator.py (1 hunks)
  • backend/infrahub/git/models.py (1 hunks)
  • backend/infrahub/git/repository.py (3 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/message_bus/types.py (1 hunks)
  • backend/infrahub/proposed_change/tasks.py (7 hunks)
  • backend/infrahub/workers/dependencies.py (2 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/infrahub/workers/dependencies.py
  • pyproject.toml
  • backend/infrahub/computed_attribute/tasks.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/models.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking


Files:

  • backend/infrahub/message_bus/types.py
  • backend/infrahub/core/schema/schema_branch.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/models.py
🧬 Code graph analysis (7)
backend/infrahub/core/schema/schema_branch.py (3)
backend/infrahub/core/schema/__init__.py (2)
  • duplicate (97-99)
  • get (65-72)
backend/infrahub/core/schema/manager.py (1)
  • get (69-92)
backend/infrahub/database/__init__.py (1)
  • get (84-88)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
  • get (14-35)
backend/infrahub/proposed_change/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (123-153)
backend/infrahub/core/schema/manager.py (1)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (303-329)
  • to_dict_api_schema_object (165-171)
  • get_hash (139-149)
backend/infrahub/generators/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (123-153)
backend/infrahub/git/tasks.py (3)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (123-153)
backend/infrahub/workers/dependencies.py (1)
  • get_workflow (122-123)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-validate-generated
  • GitHub Check: backend-tests-integration
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-docker-integration
  • GitHub Check: Cloudflare Pages

Comment on lines 98 to 104
keep_attributes = {field for field in required_fields if field in schema.attribute_names}
keep_relationships = {field for field in required_fields if field in schema.relationship_names}

exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]

@coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🔴 Critical

Call schema name helpers before iterating.

schema.attribute_names / schema.relationship_names are callable helpers, not iterables. Using them without () raises TypeError: argument of type 'method' is not iterable the first time we hit a branch with members. Call the helpers before computing the include/exclude sets.

-        keep_attributes = {field for field in required_fields if field in schema.attribute_names}
-        keep_relationships = {field for field in required_fields if field in schema.relationship_names}
+        attribute_names = schema.attribute_names()
+        relationship_names = schema.relationship_names()
+        keep_attributes = {field for field in required_fields if field in attribute_names}
+        keep_relationships = {field for field in required_fields if field in relationship_names}
@@
-        exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
-        exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
+        exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
+        exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
keep_attributes = {field for field in required_fields if field in schema.attribute_names}
keep_relationships = {field for field in required_fields if field in schema.relationship_names}
exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
attribute_names = schema.attribute_names()
relationship_names = schema.relationship_names()
keep_attributes = {field for field in required_fields if field in attribute_names}
keep_relationships = {field for field in required_fields if field in relationship_names}
exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 98 to 103, schema.attribute_names
and schema.relationship_names are methods but are being treated as iterables;
call these helpers first (e.g., attribute_names = schema.attribute_names() and
relationship_names = schema.relationship_names()) and then use those returned
iterables when computing keep_attributes, keep_relationships,
exclude_attributes, and exclude_relationships so the comprehensions iterate over
the actual lists/sets rather than the method objects.

@fatih-acar force-pushed the fac-fix-artifacts-scale branch 3 times, most recently from db50bd5 to 72ca590, on October 16, 2025 22:17
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/infrahub/git/utils.py (1)

99-103: Call attribute_names() and relationship_names() methods.

schema.attribute_names and schema.relationship_names are methods, not properties. Using them without () will raise TypeError: argument of type 'method' is not iterable at runtime.

Based on the relevant code snippet at backend/infrahub/core/schema/basenode_schema.py (lines 338-343), these are methods that return lists.

Apply this diff:

+        attribute_names = schema.attribute_names()
+        relationship_names = schema.relationship_names()
-        keep_attributes = {field for field in required_fields if field in schema.attribute_names}
-        keep_relationships = {field for field in required_fields if field in schema.relationship_names}
+        keep_attributes = {field for field in required_fields if field in attribute_names}
+        keep_relationships = {field for field in required_fields if field in relationship_names}
 
-        exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
-        exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
+        exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
+        exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
🧹 Nitpick comments (1)
backend/infrahub/git/utils.py (1)

132-138: Consider defensive error handling for unexpected definition types.

If an InfrahubNode is passed that doesn't match ARTIFACTDEFINITION or CHECKDEFINITION, group_id and parameters remain uninitialized, causing an UnboundLocalError at line 140.

While the type hint should prevent this at static analysis time, consider adding an explicit error for robustness:

if isinstance(definition, InfrahubNode):
    if definition.typename in [InfrahubKind.ARTIFACTDEFINITION, InfrahubKind.CHECKDEFINITION]:
        group_id = definition.targets.id
        parameters = definition.parameters.value
    else:
        raise ValueError(f"Unsupported InfrahubNode typename: {definition.typename}")
elif isinstance(definition, ProposedChangeGeneratorDefinition):
    group_id = definition.group_id
    parameters = definition.parameters
else:
    raise TypeError(f"Unsupported definition type: {type(definition)}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db50bd5 and 72ca590.

📒 Files selected for processing (4)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/proposed_change/tasks.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/infrahub/generators/tasks.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/tasks.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking


Files:

  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/tasks.py
🧬 Code graph analysis (3)
backend/infrahub/proposed_change/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (124-155)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (1)
  • fetch_definition_targets (124-155)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: validate-generated-documentation
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-benchmark
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-docker-integration
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (6)
backend/infrahub/git/tasks.py (3)

327-328: LGTM! Centralized target resolution.

The refactor to use fetch_definition_targets consolidates target group retrieval and enables prefetching of member nodes, reducing API calls and improving performance.


359-398: LGTM! Batch submission improves throughput.

The batch-based workflow submission pattern shifts the bottleneck to Prefect API servers, enabling better scalability. The addition of query_id at line 380 aligns with the model changes to support query ID-based lookups.


580-580: LGTM! Consistent use of centralized target resolution.

backend/infrahub/proposed_change/tasks.py (3)

656-656: LGTM! Consistent use of centralized target resolution.


693-693: LGTM! Complete query_id propagation.

The addition of query_id to CheckArtifactCreate (line 693), the GraphQL query (line 1243), and the parsing logic (line 1465) ensures consistent propagation of query identifiers throughout the artifact generation flow.

Also applies to: 1243-1243, 1465-1465


920-922: LGTM! Centralized target resolution for generators.

Comment on lines +58 to +74
def _collect_parameter_first_segments(params: Any) -> set[str]:
segments: set[str] = set()

def _walk(value: Any) -> None:
if isinstance(value, str):
segment = value.split("__", 1)[0]
if segment:
segments.add(segment)
elif isinstance(value, dict):
for nested in value.values():
_walk(nested)
elif isinstance(value, (list, tuple, set)):
for nested in value:
_walk(nested)

_walk(params)
return segments
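
For context, what the helper yields on an illustrative input (values are walked, keys are not):

params = {"device": "device__name", "tags": ["tag__value", "site__name"]}
assert _collect_parameter_first_segments(params) == {"device", "tag", "site"}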
@coderabbitai bot (Contributor) commented Oct 16, 2025

🛠️ Refactor suggestion | 🟠 Major

Add docstrings to new utility functions.

All three new functions (_collect_parameter_first_segments, _prefetch_group_member_nodes, and fetch_definition_targets) lack docstrings.

As per coding guidelines, Python functions should include Google-style docstrings with:

  • A brief one-line description
  • An Args section documenting parameters (without types)
  • A Returns section describing the return value

Example for fetch_definition_targets:

async def fetch_definition_targets(
    client: InfrahubClient,
    branch: str,
    definition: CoreArtifactDefinition | CoreCheckDefinition | ProposedChangeGeneratorDefinition,
) -> CoreGroup:
    """Fetch target group for a definition with prefetched member nodes.
    
    Retrieves the target group associated with the given artifact, check, or generator
    definition, and prefetches member nodes based on the parameters defined in the
    definition.
    
    Args:
        client: The Infrahub SDK client for API interactions
        branch: The branch name to fetch the group from
        definition: The definition object containing targets and parameters
        
    Returns:
        The target group with prefetched member nodes populated in the client store
    """

Also applies to: 77-122, 124-155

🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 58-74, 77-122 and 124-155, the
three new functions (_collect_parameter_first_segments,
_prefetch_group_member_nodes, and fetch_definition_targets) are missing
docstrings; add Google-style docstrings to each function consisting of a
one-line summary, an Args section listing parameters (no types), and a Returns
section describing the return value (matching existing project style). For
fetch_definition_targets follow the provided example (brief summary, purpose,
Args: client, branch, definition, and Returns: the target group with prefetched
member nodes) and for the other two document their parameters and return
sets/structures similarly, ensuring consistency and placement immediately below
each def line.

@ogenstad (Contributor) commented:

Hey coderabbit, where is this requirement for docstring coming from?

@coderabbitai bot (Contributor) replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Look for coding guidelines or contribution documentation
echo "=== Looking for coding guidelines documentation ==="
fd -t f -e md | xargs grep -l -i "docstring\|coding.*guideline\|style.*guide\|contribution" 2>/dev/null | head -10

echo -e "\n=== Checking for Python linter configurations ==="
# Check for ruff, pylint, or other linter configs
fd -t f "ruff.toml|pylint|.pylintrc|pyproject.toml" -x cat {}

echo -e "\n=== Sampling docstring usage in existing utility functions ==="
# Check a few existing Python files in backend/infrahub for docstring patterns
fd -e py -t f . backend/infrahub/core backend/infrahub/git | head -5 | xargs -I {} sh -c 'echo "=== File: {} ===" && head -50 {}'

Length of output: 32967


@ogenstad The requirement for docstrings is not enforced by your linter configuration. Looking at your pyproject.toml, I can see that:

ignore = [
    "D",        # pydocstyle
    "DOC",      # pydoclint
    ...
]

Both pydocstyle (D) and pydoclint (DOC) rules are explicitly disabled.

However, I noticed that existing functions in the codebase often include docstrings. For example, the get_repositories_commit_per_branch function in the same file (backend/infrahub/git/utils.py) has a Google-style docstring:

async def get_repositories_commit_per_branch(
    db: InfrahubDatabase,
) -> dict[str, RepositoryData]:
    """Get a list of all repositories and their commit on each branches.

    This method is similar to 'get_list_repositories' method in the Python SDK.

    NOTE: At some point, we should refactor this function to use a single Database query instead of one per branch
    """

My suggestion was based on this existing pattern in the codebase as a best practice, not a hard requirement from your configuration.

If you'd prefer not to receive docstring suggestions for this project, I can remember that for future reviews.


🧬 Code Graph Analysis Results
  • File: backend/infrahub/git/utils.py | Lines 1-28

    • Full snippet:
      async def get_repositories_commit_per_branch(
      db: InfrahubDatabase,
      ) -> dict[str, RepositoryData]:
      """Get a list of all repositories and their commit on each branches.

      This method is similar to 'get_list_repositories' method in the Python SDK.
      
      NOTE: At some point, we should refactor this function to use a single Database query instead of one per branch
      """
      
      repositories: dict[str, RepositoryData] = {}
      
      for branch in list(registry.branch.values()):
          repos: list[CoreGenericRepository] = await NodeManager.query(
              db=db,
              branch=branch,
              fields={"id": None, "name": None, "commit": None, "internal_status": None},
              schema=InfrahubKind.GENERICREPOSITORY,
          )
      
          for repository in repos:
              repo_name = repository.name.value
              if repo_name not in repositories:
                  repositories[repo_name] = RepositoryData(
                      repository_id=repository.get_id(),
                      repository_name=repo_name,
                      branches={},
                  )
      
              repositories[repo_name].branches[branch.name] = repository.commit.value  # type: ignore[attr-defined]
              repositories[repo_name].branch_info[branch.name] = RepositoryBranchInfo(
                  internal_status=repository.internal_status.value
              )
      
      return repositories
      
  • File: backend/infrahub/core/manager.py | Lines 1-60

    • Summary:
      Docstring for NodeManager.query describes its purpose and arguments. It states that the method:
      • Queries one or multiple nodes of a given type based on filter arguments.
      • Accepts parameters including schema (NodeSchema or type), filters, fields, offset, limit, at, branch, include_source, include_owner, prefetch_relationships, account, partial_match, branch_agnostic, and order.
      • Returns a list of Node objects (or a list of SchemaProtocol objects when the more specific overload is used).
      • Includes behavior notes such as how branch resolution and HFID handling influence querying, and how results are formed.
      • Includes a Python-style Args/Returns documentation block within the method docstring.
  • File: backend/infrahub/core/manager.py | Lines 60-110

    • Summary:
      Docstring for NodeManager.count describes its purpose, parameters, and return value for counting nodes that match filters. It notes:
      • Accepts db, schema (type[SchemaProtocol] | NodeSchema | GenericSchema | ProfileSchema | TemplateSchema | str), filters, at, branch, account, partial_match, branch_agnostic.
      • Returns an int representing the number of responses found.
  • File: backend/infrahub/core/manager.py | Lines 110-170

    • Summary:
      Docstring for NodeManager.query_peers describes its purpose and parameters for querying peer nodes. It notes:
      • Accepts db, ids, source_kind, schema, filters, fields (optional), offset (optional), limit (optional), at (optional), branch (optional), branch_agnostic, fetch_peers.
      • Returns a list of Relationship objects (or related node representations) depending on inputs.
      • Includes behavior about how fields and HFID-related field selection are handled when requested.

@ajtmccarty (Contributor) left a comment:

approving b/c it sounds like you need this, but I think whatever LLM wrote the functions in infrahub.git.utils did not do a good job

@dgarros (Collaborator) left a comment:

Please can you add a changelog fragment so this is included in the release notes?

name="Fetch repository commit",
description="Retrieve a git repository at a given commit, if it does not already exist locally",
cache_policy=NONE,
@cached(
@dgarros (Collaborator) commented:

Not a blocker for this PR, but I'm wondering if we could leverage Prefect for the caching? And what limitation did you hit if you already tried?

@fatih-acar (Contributor, Author) replied:

It was done using the Prefect caching mechanism in the first iteration, but it was reverted, as described in commit 02b289d:

- local caching requires a lot of setup (wrt Blocks, etc.)
- it's memory intensive due to excessive pickle serialization in our case (we don't need it)

Revert to basic local TTL cache.

@fatih-acar (Contributor, Author) added:

Basically, Prefect caching is useful for distributed caching, but here we want a local, efficient cache.

Rebuilding the SDK SchemaBranch structure on each initialization is
memory intensive and leads to spikes in memory usage.
Avoid this by introducing a new cache attribute in the SchemaManager.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from 72ca590 to ce5b6b5 on October 17, 2025 08:35
Unfortunately we cannot leverage Prefect's caching because:
- local caching requires a lot of setup (wrt Blocks, etc)
- it's memory intensive due to excessive serialization in our case

Revert to basic local TTL cache.

Signed-off-by: Fatih Acar <[email protected]>
Small optimization to avoid an unnecessary database query to find a
GraphQL query by name.

Signed-off-by: Fatih Acar <[email protected]>
@fatih-acar force-pushed the fac-fix-artifacts-scale branch from ce5b6b5 to 2ce5840 on October 17, 2025 08:43
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
backend/infrahub/computed_attribute/tasks.py (2)

267-269: Fix typo in comment.

-    # NOTE we only need the id of the nodes, we need to ooptimize the query here
+    # NOTE we only need the id of the nodes; we should optimize the query here

106-112: Based on my comprehensive verification of the codebase and external sources, I can now provide a definitive assessment.

The method query_gql_query() does not exist in the Infrahub SDK or in this codebase—calling it will result in an AttributeError at runtime.

Evidence:

  • The Python SDK does not expose a method named query_gql_query; use execute_graphql / executeGraphQL to run raw GraphQL
  • Repository-wide search found zero definitions of query_gql_query (not in backend, not in bundled SDK, not monkeypatched anywhere)
  • Code calls client.query_gql_query() and self.sdk.query_gql_query() despite the method not existing
  • python_sdk directory in the repository is empty (0 bytes)
  • InfrahubClient is imported from the official infrahub_sdk.client package, which does not contain this method

The code at lines 106–112 in backend/infrahub/computed_attribute/tasks.py (and similar calls in backend/infrahub/git/integrator.py) will fail immediately when executed. Replace query_gql_query() with the correct SDK method (likely execute_graphql() based on the web search result) and adjust parameters accordingly.
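
If execute_graphql is indeed the right replacement (worth confirming the exact signature against the pinned SDK), the call shape would be roughly:

# Assumption: execute_graphql(query=..., variables=..., branch_name=...) is the SDK's raw-query entry point.
query_text = "query { CoreGraphQLQuery { count } }"  # illustrative query
result = await client.execute_graphql(query=query_text, branch_name=branch)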

backend/infrahub/message_bus/types.py (1)

88-96: Avoid breaking change: make deprecated query_name optional and validate “one-of” with query_id.

Both query_name and query_id are required here. That breaks producers still sending only the deprecated field. Make query_name optional; optionally accept query_id or query_name (at least one), and prefer query_id downstream.

 class ProposedChangeArtifactDefinition(BaseModel):
@@
-    query_name: str  # Deprecated
-    query_id: str
+    query_name: str | None = Field(default=None)  # Deprecated
+    query_id: str | None = Field(default=None)
@@
     timeout: int
+
+    @staticmethod
+    def _missing(b: str | None) -> bool:
+        return b is None or b == ""
+
+    @model_validator(mode="after")
+    def _ensure_query_identifier(self) -> "ProposedChangeArtifactDefinition":
+        # Require at least one of query_id or query_name for backward compatibility
+        if self._missing(self.query_id) and self._missing(self.query_name):
+            raise ValueError("Either query_id or query_name must be provided")
+        return self

Add import at top:

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator
backend/infrahub/git/models.py (1)

41-46: Avoid breaking change: accept query or query_id (prefer query_id) with validator.

Same concern as above: both fields are required. Make the deprecated field optional and allow either; enforce “one-of”.

-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
@@
-    query: str = Field(..., description="The name of the query to use when collecting data")  # Deprecated
-    query_id: str = Field(..., description="The id of the query to use when collecting data")
+    query: str | None = Field(default=None, description="The name of the query to use when collecting data")  # Deprecated
+    query_id: str | None = Field(default=None, description="The id of the query to use when collecting data")
@@
     context: InfrahubContext = Field(..., description="The context of the task")
+
+    @model_validator(mode="after")
+    def _ensure_query_identifier(self) -> "RequestArtifactGenerate":
+        if not self.query_id and not self.query:
+            raise ValueError("Either query_id or query must be provided")
+        return self
♻️ Duplicate comments (1)
backend/infrahub/git/tasks.py (1)

359-399: Verify batching semantics for workflow submissions.

client.create_batch is an SDK facility; passing get_workflow().submit_workflow may not benefit from SDK batching and could be a no-op or even serialize behind one transport. Please verify this supports arbitrary callables and actually runs them concurrently; otherwise prefer asyncio.gather around submit_workflow.

If unsupported, apply:

-    batch = await client.create_batch()
-    ...
-        batch.add(
-            task=get_workflow().submit_workflow,
-            workflow=REQUEST_ARTIFACT_GENERATE,
-            context=context,
-            parameters={"model": request_artifact_generate_model},
-        )
-    async for _, _ in batch.execute():
-        pass
+    import asyncio
+    coros = []
+    for relationship in group.members.peers:
+        ...
+        coros.append(
+            get_workflow().submit_workflow(
+                workflow=REQUEST_ARTIFACT_GENERATE,
+                context=context,
+                parameters={"model": request_artifact_generate_model},
+            )
+        )
+    await asyncio.gather(*coros)

Run to confirm Batch.add accepts non-SDK callables and executes them concurrently:

#!/bin/bash
# Inspect SDK batch API in repo for accepted task types and execution model
rg -nP -C3 'class\s+Batch\b|def\s+create_batch\b|def\s+add\(' --type=py
🧹 Nitpick comments (5)
backend/infrahub/core/schema/schema_branch.py (1)

165-171: Add a short docstring and consider JSON/alias dump for API stability.

This new public serializer looks good. To align with our Python docstring guidelines and avoid future alias/None drift in API payloads, add a docstring and switch to model_dump(mode="json", by_alias=True, exclude_none=True).

 def to_dict_api_schema_object(self) -> dict[str, list[dict]]:
-        return {
-            "nodes": [self.get(name, duplicate=False).model_dump() for name in self.nodes],
-            "profiles": [self.get(name, duplicate=False).model_dump() for name in self.profiles],
-            "generics": [self.get(name, duplicate=False).model_dump() for name in self.generics],
-            "templates": [self.get(name, duplicate=False).model_dump() for name in self.templates],
-        }
+        """Return an API-friendly representation of this branch's schema.
+
+        Returns:
+            Dict with keys: nodes, profiles, generics, templates. Each value is a list of Pydantic dumps.
+        """
+        return {
+            "nodes": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.nodes],
+            "profiles": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.profiles],
+            "generics": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.generics],
+            "templates": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.templates],
+        }
backend/infrahub/core/schema/manager.py (1)

770-791: Purge orphaned SDK cache entries when branches are removed.

Currently, purge_inactive_branches deletes _branches and _branch_hash_by_name entries but leaves stale items in _sdk_branches (keyed by old hashes). Clean them up to bound memory and avoid serving stale SDK schemas.

         for branch_name in list(self._branches.keys()):
             if branch_name not in active_branches:
                 del self._branches[branch_name]
                 if branch_name in self._branch_hash_by_name:
                     del self._branch_hash_by_name[branch_name]
                 removed_branches.append(branch_name)

+        # Drop SDK cache entries for purged branches
+        active_hashes = set(self._branch_hash_by_name.values())
+        for key in list(self._sdk_branches.keys()):
+            if key not in active_hashes:
+                del self._sdk_branches[key]
backend/infrahub/git/repository.py (1)

254-282: Make cache key resilient to positional calls and add docstrings.

The cache key uses only kwargs; if someone inadvertently calls the private function positionally, the key may degenerate. Enforce keyword-only params and add docstrings per guidelines.

-@cached(
+@cached(
     TTLCache(maxsize=100, ttl=30),
     key=lambda *_, **kwargs: hashkey(
         kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
     ),
 )
-async def _get_initialized_repo(
-    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
-) -> InfrahubReadOnlyRepository | InfrahubRepository:
+async def _get_initialized_repo(
+    *, client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
+) -> InfrahubReadOnlyRepository | InfrahubRepository:
+    """Create or retrieve a cached repository wrapper for a given repo/commit."""
@@
-async def get_initialized_repo(
-    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
-) -> InfrahubReadOnlyRepository | InfrahubRepository:
+async def get_initialized_repo(
+    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
+) -> InfrahubReadOnlyRepository | InfrahubRepository:
+    """Prefect task wrapper: fetch a repository at a given commit (local cache, no Prefect caching)."""
     return await _get_initialized_repo(
         client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
     )

All current call sites pass keyword args (good). This change prevents accidental positional calls in the future.
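
A quick self-contained demo of the degeneration (plain sync cachetools here for brevity; the async decorator used in the PR shares the same key semantics):

from cachetools import TTLCache, cached
from cachetools.keys import hashkey


@cached(
    TTLCache(maxsize=100, ttl=30),
    key=lambda *_, **kwargs: hashkey(kwargs.get("repository_id"), kwargs.get("commit")),
)
def fetch(repository_id: str, commit: str) -> str:
    return f"{repository_id}@{commit}"


print(fetch(repository_id="repo-a", commit="c1"))  # repo-a@c1, keyed ("repo-a", "c1")
print(fetch("repo-b", "c2"))  # positional call -> keyed (None, None)
print(fetch("repo-c", "c3"))  # also (None, None): returns stale repo-b@c2

Keyword-only parameters turn the third call into a TypeError instead of a silent cache collision.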

backend/infrahub/git/tasks.py (1)

329-335: Iterate relationships explicitly for clarity.

group.members.peers yields Relationship objects; consider naming the loop variable relationship and accessing relationship.id or relationship.peer for readability and to avoid misuse. Example:

-    current_members = [member.id for member in group.members.peers]
+    current_members = [rel.id for rel in group.members.peers]  # relationship ids
backend/infrahub/git/utils.py (1)

58-75: Consider walking dict keys too when extracting parameter segments.

Some parameter schemas encode field paths in keys (e.g., {"device__name": "r1"}). The current walker inspects only values. Optionally include keys:

-    def _walk(value: Any) -> None:
+    def _walk(value: Any) -> None:
         if isinstance(value, str):
             segment = value.split("__", 1)[0]
             if segment:
                 segments.add(segment)
         elif isinstance(value, dict):
-            for nested in value.values():
-                _walk(nested)
+            for k, v in value.items():
+                _walk(k)
+                _walk(v)
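
A runnable sketch of the extended walker; note that walking the values of such dicts also picks up plain data like "r1", which callers would need to tolerate:

from typing import Any


def collect_parameter_first_segments(params: Any) -> set[str]:
    segments: set[str] = set()

    def _walk(value: Any) -> None:
        if isinstance(value, str):
            segment = value.split("__", 1)[0]
            if segment:
                segments.add(segment)
        elif isinstance(value, dict):
            for key, nested in value.items():
                _walk(key)  # key-encoded paths, e.g. "device__name"
                _walk(nested)
        elif isinstance(value, (list, tuple, set)):
            for nested in value:
                _walk(nested)

    _walk(params)
    return segments


print(collect_parameter_first_segments({"device__name": "r1"}))
# {'device', 'r1'} (set order may vary)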
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72ca590 and 2ce5840.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • CHANGELOG.md (1 hunks)
  • backend/infrahub/artifacts/models.py (1 hunks)
  • backend/infrahub/computed_attribute/tasks.py (1 hunks)
  • backend/infrahub/core/schema/manager.py (4 hunks)
  • backend/infrahub/core/schema/schema_branch.py (1 hunks)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/integrator.py (1 hunks)
  • backend/infrahub/git/models.py (1 hunks)
  • backend/infrahub/git/repository.py (3 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/message_bus/types.py (1 hunks)
  • backend/infrahub/proposed_change/tasks.py (8 hunks)
  • backend/infrahub/workers/dependencies.py (2 hunks)
  • changelog/+785ca0cd.fixed.md (1 hunks)
  • pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • backend/infrahub/artifacts/models.py
  • backend/infrahub/git/integrator.py
  • backend/infrahub/workers/dependencies.py
  • pyproject.toml
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/proposed_change/tasks.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/git/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/core/schema/schema_branch.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with async def
Await asynchronous calls with await
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy

Files:

  • backend/infrahub/computed_attribute/tasks.py
  • backend/infrahub/message_bus/types.py
  • backend/infrahub/git/tasks.py
  • backend/infrahub/core/schema/manager.py
  • backend/infrahub/git/models.py
  • backend/infrahub/git/utils.py
  • backend/infrahub/git/repository.py
  • backend/infrahub/core/schema/schema_branch.py
**/*.{md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint Markdown/MDX files with markdownlint using .markdownlint.yaml

Files:

  • CHANGELOG.md
  • changelog/+785ca0cd.fixed.md
🧬 Code graph analysis (5)
backend/infrahub/git/tasks.py (1)
backend/infrahub/git/utils.py (2)
  • fetch_artifact_definition_targets (143-150)
  • fetch_check_definition_targets (153-160)
backend/infrahub/core/schema/manager.py (2)
backend/infrahub/core/registry.py (2)
  • schema (79-83)
  • schema (86-87)
backend/infrahub/core/schema/schema_branch.py (3)
  • get (303-329)
  • to_dict_api_schema_object (165-171)
  • get_hash (139-149)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
  • get (14-35)
backend/infrahub/core/schema/schema_branch.py (4)
backend/infrahub/core/schema/manager.py (1)
  • get (69-92)
backend/infrahub/database/__init__.py (1)
  • get (84-88)
backend/infrahub/core/schema/__init__.py (2)
  • get (65-72)
  • duplicate (97-99)
backend/infrahub/core/models.py (3)
  • duplicate (500-502)
  • nodes (61-63)
  • generics (66-68)
🪛 LanguageTool
changelog/+785ca0cd.fixed.md

[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-docker-integration
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (4)
backend/infrahub/core/schema/manager.py (1)

151-160: Verify that the codebase can accept the hash scope limitation (profiles/templates excluded from the SDK cache key), or implement the get_hash_full() approach.

The review identifies a real issue: line 152 can raise KeyError if get_sdk_schema_branch() is called before a branch is initialized. The suggested fix is sound—calling get_schema_branch() first ensures both the branch and its hash entry exist, since set_schema_branch() atomically sets both at lines 163–164.

However, the review also raises a valid concern about hash scope: the cache key uses get_hash() which only incorporates nodes and generics (line 140), not profiles or templates (line 155). If profiles or templates change independently, the SDK cache will become stale. The suggested diff still uses get_hash() and leaves this unresolved, though the caveat acknowledges it.

Consider:

  • Accept this limitation (profiles/templates changes are rare or externally managed)
  • Or adopt the suggested approach of using get_hash_full() or including profiles/templates in the cache key derivation, as sketched below
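
A sketch of the second option, assuming a full-scope key is derived from the API-facing dump (sdk_cache_key is a hypothetical helper, not the existing manager API):

import hashlib
from typing import Any


def sdk_cache_key(schema_branch: Any) -> str:
    # Cover nodes, profiles, generics, and templates so a change to any
    # collection invalidates the corresponding SDK cache entry.
    payload = schema_branch.to_dict_api_schema_object()
    digest = hashlib.sha256()
    for section in ("nodes", "profiles", "generics", "templates"):
        for item in payload[section]:
            digest.update(repr(sorted(item.items())).encode())
    return digest.hexdigest()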
backend/infrahub/git/tasks.py (2)

327-328: Good switch to prefetched targets.

Using fetch_artifact_definition_targets centralizes logic and avoids per-item fetches.


379-381: Propagating query_id is correct.

Including query_id alongside query reduces lookups and aligns with the new models.

backend/infrahub/git/utils.py (1)

119-131: Ensure members.peers are populated before prefetch.

include=["members"] may not hydrate relationship peers in some SDK versions. If peers are empty, fetch them to get ids/kinds.

     group = await client.get(
         kind=CoreGroup,
         id=group_id,
         branch=branch,
         include=["members"],
     )
+    if not group.members.peers:
+        await group.members.fetch()

Comment on lines +77 to +83
async def _prefetch_group_member_nodes(
client: InfrahubClient,
members: RelationshipManager,
branch: str,
required_fields: set[str],
) -> None:
ids_per_kind: dict[str, set[str]] = defaultdict(set)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add Google‑style docstrings to new utilities.

Per repo guidelines, add docstrings to _prefetch_group_member_nodes, fetch_definition_targets, and public fetch* helpers.

Minimal examples:

 async def _prefetch_group_member_nodes(...):
+    """Batch‑prefetch group member nodes for required fields.
+
+    Args:
+        client: SDK client.
+        members: Relationship manager for group.members.
+        branch: Branch to query.
+        required_fields: Top‑level attribute/relationship names needed.
+
+    Returns:
+        None. Populates the client store with fetched nodes.
+    """
@@
 async def _fetch_definition_targets(...)-> CoreGroup:
+    """Fetch target group and prefetch members based on definition parameters."""
@@
 async def fetch_artifact_definition_targets(...)-> CoreGroup:
+    """Public helper to resolve targets for an artifact definition."""
@@
 async def fetch_check_definition_targets(...)-> CoreGroup:
+    """Public helper to resolve targets for a check definition."""
@@
 async def fetch_proposed_change_generator_definition_targets(...)-> CoreGroup:
+    """Public helper to resolve targets for a proposed change generator definition."""

Also applies to: 119-141, 143-170

Comment on lines +84 to +87
for peer in members.peers:
if peer.id and peer.typename:
ids_per_kind[peer.typename].add(peer.id)

Contributor

⚠️ Potential issue | 🔴 Critical

Wrong entity from RelationshipManager.peers; use relationship.peer.

peers yields Relationship objects. Access the node via relationship.peer; otherwise peer.typename may not exist and ids_per_kind stays empty.

-    for peer in members.peers:
-        if peer.id and peer.typename:
-            ids_per_kind[peer.typename].add(peer.id)
+    for relationship in members.peers:
+        node = relationship.peer
+        if node.id and node.typename:
+            ids_per_kind[node.typename].add(node.id)
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 84 to 87, the loop is iterating
Relationship objects from members.peers but treats them as if they were nodes;
change the loop to treat each item as a relationship, retrieve the node via
relationship.peer, then check that that peer has id and typename before adding
to ids_per_kind (e.g., for relationship in members.peers: peer =
relationship.peer; if peer and peer.id and peer.typename:
ids_per_kind[peer.typename].add(peer.id)). Ensure you don't assume peers yields
raw nodes and guard against missing peer attributes.

Comment on lines 94 to 99
schema = await client.schema.get(kind=kind, branch=branch)

# FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
keep_relationships = set(schema.relationship_names) & required_fields
cleaned_fields = required_fields ^ valid_fields
Contributor

⚠️ Potential issue | 🔴 Critical

Call schema helpers, fix exclude set, and guard empty required_fields.

  • attribute_names()/relationship_names() are methods; call them.
  • Use difference (valid_fields - required_fields) for exclude to avoid unknown keys from XOR.
  • Skip prefetch when required_fields is empty to avoid requesting “nothing”.
-        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
-        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
-        keep_relationships = set(schema.relationship_names) & required_fields
-        cleaned_fields = required_fields ^ valid_fields
+        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
+        attribute_names = set(schema.attribute_names())
+        relationship_names = set(schema.relationship_names())
+        valid_fields = attribute_names | relationship_names
+        keep_relationships = relationship_names & required_fields
+        cleaned_fields = valid_fields - required_fields
@@
-        kwargs: dict[str, Any] = {
+        kwargs: dict[str, Any] = {
             "kind": kind,
             "ids": ids,
             "branch": branch,
             "exclude": cleaned_fields,
             "populate_store": True,
             "order": Order(disable=True),
         }
@@
-    parameter_fields = _collect_parameter_first_segments(parameters)
-    await _prefetch_group_member_nodes(
-        client=client,
-        members=group.members,
-        branch=branch,
-        required_fields=parameter_fields,
-    )
+    parameter_fields = _collect_parameter_first_segments(parameters)
+    if parameter_fields:
+        await _prefetch_group_member_nodes(
+            client=client,
+            members=group.members,
+            branch=branch,
+            required_fields=parameter_fields,
+        )

Also applies to: 101-114, 132-139

🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 94-99, 101-114, and 132-139, fix
three issues: call schema.attribute_names() and schema.relationship_names()
(they are methods), compute the exclude set as valid_fields - required_fields
instead of symmetric difference (required_fields ^ valid_fields), and skip the
prefetch/prefetch-related logic when required_fields is empty to avoid
requesting nothing; update any keep_relationships calculations to use
set(schema.relationship_names()) & required_fields accordingly and apply these
same changes in the other noted ranges.

CHANGELOG.md Outdated
Comment on lines 14 to 46
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17

### Added

- The login form now automatically focuses on the first field.

### Fixed

- - Consistent font size for all events in the Proposed Change timeline
- Proposed Change action buttons now keep their size and does not strectch anymore
- Prevent overflow on the create new relationship button within the relationship input
- fixed typos

([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
- Enhanced error handling to properly support all error codes returned by identity providers

([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
- Added a fallback to plain text for unsupported content types.

([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
which would cause the application and workers to crash with an error message similar to the following:

> ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']
Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.

Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Duplicate version header and typos.

  • Two entries named v1.4.10 (2025-10-17 and 2025-10-01). Please bump the new entry’s version (e.g., v1.4.11) or merge into the existing 1.4.10 section.
  • Fix typos: “strectch” → “stretch”.
-## [Infrahub - v1.4.10] ... - 2025-10-17
+## [Infrahub - v1.4.11] ... - 2025-10-17  # or merge into 1.4.10
@@
-  - Proposed Change action buttons now keep their size and does not strectch anymore
+  - Proposed Change action buttons now keep their size and do not stretch anymore

🏁 Script executed:

#!/bin/bash
# Check for duplicate v1.4.10 entries and find the typo
echo "=== All v1.4.10 entries in CHANGELOG.md ==="
rg -n "v1\.4\.10" CHANGELOG.md

echo ""
echo "=== Line with 'strectch' typo ==="
rg -n "strectch" CHANGELOG.md

Length of output: 364


Duplicate version header and typos confirmed—resolve version conflict and fix typos.

Verification confirms:

  • Two v1.4.10 entries exist (lines 14 and 47 with dates 2025-10-17 and 2025-10-01 respectively)
  • Typo "strectch" on line 23
  • Grammar error "does not" should be "do not" (plural subject "buttons")

Resolve by either bumping the newer entry to v1.4.11 or merging both entries into a single v1.4.10 section, and apply the typo/grammar fixes:

-  - Proposed Change action buttons now keep their size and does not strectch anymore
+  - Proposed Change action buttons now keep their size and do not stretch anymore
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17
### Added
- The login form now automatically focuses on the first field.
### Fixed
- - Consistent font size for all events in the Proposed Change timeline
- Proposed Change action buttons now keep their size and does not strectch anymore
- Prevent overflow on the create new relationship button within the relationship input
- fixed typos
([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
- Enhanced error handling to properly support all error codes returned by identity providers
([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
- Added a fallback to plain text for unsupported content types.
([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
which would cause the application and workers to crash with an error message similar to the following:
> ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']
Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17
### Added
- The login form now automatically focuses on the first field.
### Fixed
- - Consistent font size for all events in the Proposed Change timeline
- Proposed Change action buttons now keep their size and do not stretch anymore
- Prevent overflow on the create new relationship button within the relationship input
- fixed typos
([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
- Enhanced error handling to properly support all error codes returned by identity providers
([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
- Added a fallback to plain text for unsupported content types.
([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
which would cause the application and workers to crash with an error message similar to the following:
> ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']
Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.
🤖 Prompt for AI Agents
In CHANGELOG.md around lines 14 to 46 there is a duplicate v1.4.10 header (also
present at line ~47) and two text issues: "strectch" should be "stretch" and
"does not" should be "do not" for the plural "buttons"; resolve the duplication
by either renaming the newer entry to "Infrahub - v1.4.11" with the 2025-10-17
date or by merging the two v1.4.10 blocks into one consolidated v1.4.10 entry
(remove the duplicate block), then apply the typo fixes ("stretch") and grammar
fix ("do not") and ensure references/issue links remain intact.

@@ -0,0 +1 @@
Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
Contributor

⚠️ Potential issue | 🟡 Minor

Tighten phrasing.

Suggest active voice and remove “excessive”:

-Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
+Improve artifact generation and proposed change checks by leveraging caching and reducing GraphQL queries.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
Improve artifact generation and proposed change checks by leveraging caching and reducing GraphQL queries.
🧰 Tools
🪛 LanguageTool

[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.

(QB_NEW_EN)

🤖 Prompt for AI Agents
In changelog/+785ca0cd.fixed.md around line 1, the sentence uses passive
phrasing and the word “excessive”; change it to active voice and remove
“excessive.” Replace the line with a concise active version such as: "Improved
artifacts generation and sped up proposed change checks by leveraging caching
and reducing GraphQL queries."

@fatih-acar fatih-acar force-pushed the fac-fix-artifacts-scale branch from 2ce5840 to 072eece Compare October 17, 2025 09:04
@fatih-acar fatih-acar force-pushed the fac-fix-artifacts-scale branch from 072eece to f9bde88 Compare October 17, 2025 09:11
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
backend/infrahub/git/utils.py (2)

97-99: Call schema methods to retrieve field names.

schema.attribute_names and schema.relationship_names are methods that must be called with () to return the actual lists of field names.

Apply this diff:

-        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
-        keep_relationships = set(schema.relationship_names) & required_fields
+        attribute_names = set(schema.attribute_names())
+        relationship_names = set(schema.relationship_names())
+        valid_fields = attribute_names | relationship_names
+        keep_relationships = relationship_names & required_fields

84-86: Fix peers iteration to access node attributes correctly.

members.peers yields Relationship objects, not nodes directly. You need to access relationship.peer to get the actual node before checking id and typename.

Apply this diff:

-    for peer in members.peers:
-        if peer.id and peer.typename:
-            ids_per_kind[peer.typename].add(peer.id)
+    for relationship in members.peers:
+        node = relationship.peer
+        if node and node.id and node.typename:
+            ids_per_kind[node.typename].add(node.id)
🧹 Nitpick comments (4)
backend/infrahub/git/utils.py (4)

58-74: Consider adding a docstring.

Per coding guidelines, Python functions should include Google-style docstrings. This helper would benefit from a brief description of its purpose and parameters.

Example:

def _collect_parameter_first_segments(params: Any) -> set[str]:
    """Extract top-level field names from parameter structures.
    
    Recursively walks nested parameter structures (dicts, lists, tuples, sets)
    and extracts the first segment (before '__') from string values.
    
    Args:
        params: Parameter structure to walk (can be dict, list, or scalar)
        
    Returns:
        Set of top-level field names referenced in the parameters
    """

77-116: Consider adding a docstring.

Per coding guidelines, this async function should have a Google-style docstring documenting its purpose, parameters, and behavior.

Example:

async def _prefetch_group_member_nodes(
    client: InfrahubClient,
    members: RelationshipManager,
    branch: str,
    required_fields: set[str],
) -> None:
    """Batch-prefetch group member nodes with required fields.
    
    Groups member nodes by kind, computes minimal field sets based on
    required_fields, and uses batched client requests to populate the
    client store efficiently.
    
    Args:
        client: SDK client for API interactions
        members: RelationshipManager for group.members
        branch: Branch name to query
        required_fields: Top-level attribute/relationship names needed
        
    Returns:
        None. Populates the client store with fetched nodes.
    """

119-140: Consider adding a docstring.

This core utility function would benefit from a Google-style docstring explaining its purpose and flow.

Example:

async def _fetch_definition_targets(
    client: InfrahubClient,
    branch: str,
    group_id: str,
    parameters: Any,
) -> CoreGroup:
    """Fetch target group and prefetch member nodes based on definition parameters.
    
    Retrieves the target group by ID, analyzes the definition parameters to
    determine required fields, and prefetches member nodes with those fields
    populated in the client store.
    
    Args:
        client: SDK client for API interactions
        branch: Branch name to query
        group_id: ID of the target group
        parameters: Definition parameters used to determine required fields
        
    Returns:
        The target group with prefetched member nodes
    """

143-170: Consider adding docstrings to public functions.

These public API functions should have Google-style docstrings per coding guidelines. Brief examples:

async def fetch_artifact_definition_targets(
    client: InfrahubClient,
    branch: str,
    definition: CoreArtifactDefinition,
) -> CoreGroup:
    """Fetch target group for an artifact definition with prefetched members.
    
    Args:
        client: SDK client for API interactions
        branch: Branch name to query
        definition: Artifact definition containing targets and parameters
        
    Returns:
        Target group with prefetched member nodes
    """

Similar docstrings for fetch_check_definition_targets and fetch_proposed_change_generator_definition_targets.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 072eece and f9bde88.

📒 Files selected for processing (5)
  • backend/infrahub/generators/tasks.py (2 hunks)
  • backend/infrahub/git/tasks.py (6 hunks)
  • backend/infrahub/git/utils.py (2 hunks)
  • backend/infrahub/proposed_change/tasks.py (8 hunks)
  • changelog/+785ca0cd.fixed.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/git/tasks.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/utils.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with async def
Await asynchronous calls with await
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy

Files:

  • backend/infrahub/git/tasks.py
  • backend/infrahub/proposed_change/tasks.py
  • backend/infrahub/generators/tasks.py
  • backend/infrahub/git/utils.py
**/*.{md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint Markdown/MDX files with markdownlint using .markdownlint.yaml

Files:

  • changelog/+785ca0cd.fixed.md
🧬 Code graph analysis (4)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (2)
  • fetch_artifact_definition_targets (143-150)
  • fetch_check_definition_targets (153-160)
backend/infrahub/services/adapters/workflow/local.py (1)
  • submit_workflow (33-41)
backend/infrahub/proposed_change/tasks.py (2)
backend/infrahub/core/protocols.py (1)
  • CoreArtifactDefinition (280-287)
backend/infrahub/git/utils.py (2)
  • fetch_artifact_definition_targets (143-150)
  • fetch_proposed_change_generator_definition_targets (163-170)
backend/infrahub/generators/tasks.py (1)
backend/infrahub/git/utils.py (1)
  • fetch_proposed_change_generator_definition_targets (163-170)
backend/infrahub/git/utils.py (4)
backend/infrahub/core/relationship/model.py (1)
  • RelationshipManager (718-1242)
backend/infrahub/core/protocols.py (3)
  • CoreArtifactDefinition (280-287)
  • CoreCheckDefinition (308-318)
  • CoreGroup (130-138)
backend/infrahub/generators/models.py (1)
  • ProposedChangeGeneratorDefinition (34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
  • attribute_names (339-340)
  • relationship_names (343-344)
🪛 LanguageTool
changelog/+785ca0cd.fixed.md

[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: backend-benchmark
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: validate-generated-documentation
  • GitHub Check: backend-tests-functional
  • GitHub Check: documentation
  • GitHub Check: backend-tests-integration
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-docker-integration
🔇 Additional comments (11)
backend/infrahub/generators/tasks.py (2)

24-24: LGTM!

The import is correctly added to support the new fetch utility function used below.


181-183: LGTM!

The switch to the prefetch utility function is correct and will improve performance by batching member node fetches based on definition parameters.

backend/infrahub/proposed_change/tasks.py (5)

47-47: LGTM!

The new imports support the updated artifact definition handling and prefetch utilities used throughout the file.

Also applies to: 61-61


616-616: LGTM!

Using the protocol type CoreArtifactDefinition provides better type safety, and the fetch utility efficiently prefetches member nodes based on the definition's parameters.

Also applies to: 656-658


695-695: LGTM!

Adding query_id enables artifact generation to use the query identifier directly, reducing the need for name-based lookups and improving performance.


922-924: LGTM!

Consistent use of the prefetch utility for generator definitions improves performance through batched member node retrieval.


1245-1245: LGTM!

Adding the query id field to the GraphQL query and parsing it into query_id supports the migration from name-based to ID-based query references throughout the artifact generation flow.

Also applies to: 1467-1467

backend/infrahub/git/tasks.py (4)

56-56: LGTM!

The import provides the prefetch utilities used throughout the file for efficient target resolution.


327-328: LGTM!

The prefetch utility efficiently resolves artifact definition targets and pre-populates member nodes in the client store based on parameter requirements.


359-398: LGTM!

The batch submission pattern correctly groups artifact generation requests to reduce Prefect API overhead. Including query_id (line 380) supports the shift from name-based to ID-based query references, which should reduce database lookups during artifact generation.
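
For context, a hedged sketch of the batching shape (stand-in names; submit_in_batches and the submit_workflow call are illustrative, not the exact service adapter signature):

import asyncio
from typing import Any

BATCH_SIZE = 50  # assumption: tune to Prefect API limits


async def submit_in_batches(service: Any, requests: list[Any]) -> None:
    # Submit artifact-generation workflows in fixed-size chunks instead of
    # awaiting one submission per artifact.
    for start in range(0, len(requests), BATCH_SIZE):
        chunk = requests[start : start + BATCH_SIZE]
        await asyncio.gather(
            *(service.workflow.submit_workflow(workflow=r.workflow, parameters=r.parameters) for r in chunk)
        )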


580-580: LGTM!

Consistent use of the prefetch utility for check definitions ensures member nodes are efficiently fetched before iteration.

TTLCache(maxsize=100, ttl=30),
key=lambda *_, **kwargs: hashkey(
kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
),
Contributor

Given the TTL it might not have a practical meaning now, but since commit can be None I think we might occasionally end up with a repo using an earlier commit than we would have expected. In another iteration we could instead cache after we have the commit. repository_id and commit are probably enough to use for the key.

Contributor Author

The TTL is lower than the git sync scheduled job's interval on purpose, because I'd expect this kind of edge case... I think it's fine like that for the moment.
+1 for the key.
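
A sketch of that follow-up (hypothetical _resolve_latest_commit/_init_repo helpers standing in for the real initialization path): resolve the commit first, then cache on (repository_id, commit) only, so a None commit can never pin callers to an older snapshot:

from typing import Any

from cachetools import TTLCache
from cachetools.keys import hashkey

_repo_cache: TTLCache = TTLCache(maxsize=100, ttl=30)


async def get_initialized_repo_v2(
    *, client: Any, repository_id: str, name: str, repository_kind: str, commit: str | None = None
) -> Any:
    if commit is None:
        # Resolve before caching so the key is always a concrete commit.
        # _resolve_latest_commit is a hypothetical helper.
        commit = await _resolve_latest_commit(client=client, repository_id=repository_id)
    key = hashkey(repository_id, commit)
    if key not in _repo_cache:
        # _init_repo is a hypothetical helper wrapping the real setup.
        _repo_cache[key] = await _init_repo(
            client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
        )
    return _repo_cache[key]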

return repositories


def _collect_parameter_first_segments(params: Any) -> set[str]:
Contributor

Do we need an Any here? It looks like we expect str | list | tuple | set | dict and won't use anything else.

Contributor Author

I think the LLM used Any here because we pass parameters straight from the JSONAttribute protocol, where value is typed as Any. This looks legit?
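
If we wanted to tighten it anyway, a sketch with a recursive alias (the JSONAttribute value is still Any upstream, so an isinstance check or cast is needed at the boundary):

from typing import TypeAlias

ParamValue: TypeAlias = str | list["ParamValue"] | tuple["ParamValue", ...] | set[str] | dict[str, "ParamValue"]


def collect_parameter_first_segments(params: ParamValue | None) -> set[str]:
    segments: set[str] = set()
    if isinstance(params, str):
        head = params.split("__", 1)[0]
        if head:
            segments.add(head)
    elif isinstance(params, dict):
        for nested in params.values():
            segments |= collect_parameter_first_segments(nested)
    elif isinstance(params, (list, tuple, set)):
        for nested in params:
            segments |= collect_parameter_first_segments(nested)
    return segments


print(collect_parameter_first_segments({"device": ["interface__name"]}))
# {'interface'}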

Comment on lines +58 to +74
def _collect_parameter_first_segments(params: Any) -> set[str]:
segments: set[str] = set()

def _walk(value: Any) -> None:
if isinstance(value, str):
segment = value.split("__", 1)[0]
if segment:
segments.add(segment)
elif isinstance(value, dict):
for nested in value.values():
_walk(nested)
elif isinstance(value, (list, tuple, set)):
for nested in value:
_walk(nested)

_walk(params)
return segments
Contributor

Hey coderabbit, where is this requirement for docstrings coming from?

@fatih-acar fatih-acar merged commit 35e5793 into stable Oct 17, 2025
40 checks passed
@fatih-acar fatih-acar deleted the fac-fix-artifacts-scale branch October 17, 2025 12:47