feat: resolve lru cache issues when variables for graph or vector engine are ordered in different way #1938

Vasilije1990 · 2025-12-28T17:28:22Z

Description

Acceptance Criteria

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Code refactoring
Performance improvement
Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

I have tested my changes thoroughly before submitting this PR
This PR contains minimal changes necessary to address the issue/feature
My code follows the project's coding standards and style guidelines
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if applicable)
All new and existing tests pass
I have searched existing PRs to ensure this change hasn't been submitted already
I have linked any relevant issues in the description
My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Note

Ensures consistent caching by normalizing config kwargs order to target engine signatures.

Add canonicalize_kwargs_for_signature utility to merge defaults with context and order keys per function signature
Update get_graph_context_config and get_vectordb_context_config to return canonical mappings aligned with create_graph_engine(...) and create_vector_engine(...)
Add VectorConfig.to_hashable_dict and adjust GraphConfig.to_hashable_dict ordering to align with engine signatures

^{Written by Cursor Bugbot for commit 9582e1b. This will update automatically on new commits. Configure here.}

Summary by CodeRabbit

Refactor
- Improved internal database configuration canonicalization for more stable and consistent parameter handling across graph and vector database engines.

Note: This release contains primarily internal infrastructure improvements with no direct user-visible changes.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

pull-checklist · 2025-12-28T17:28:27Z

Please make sure all the checkboxes are checked:

I have tested these changes locally.
I have reviewed the code changes.
I have added end-to-end and unit tests (if applicable).
I have updated the documentation and README.md file (if necessary).
I have removed unnecessary code and debug statements.
PR title is clear and follows the convention.
I have tagged reviewers or team members for feedback.

coderabbitai · 2025-12-28T17:28:35Z

Walkthrough

This PR standardizes configuration handling for graph and vector databases by introducing argument canonicalization. A new utility function orders configuration parameters to match their target engine function signatures, ensuring consistent argument ordering for stable caching keys.

Changes

Cohort / File(s)	Summary
Canonicalization utility `cognee/infrastructure/databases/utils/canonicalize.py`	Introduces `canonicalize_kwargs_for_signature()` to order kwargs matching target function signature; merges defaults, filters accepted parameters, supports POSITIONAL_OR_KEYWORD and KEYWORD_ONLY parameters
Graph config standardization `cognee/infrastructure/databases/graph/config.py`	Updates `get_graph_context_config()` to use canonicalization for stable ordering; refactors conditional logic to compose base config with context-specific overrides; adjusted `to_hashable_dict()` entry positioning
Vector config standardization `cognee/infrastructure/databases/vector/config.py`	Adds `VectorConfig.to_hashable_dict()` method; updates `get_vectordb_context_config()` to return canonicalized argument mapping via `canonicalize_kwargs_for_signature()` for consistent cache keys

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description provides only a template without a concrete human-written summary of changes, acceptance criteria, test verification, or checklist completion details required by the template.	Fill in the Description section with human-written explanation of changes, complete Acceptance Criteria with requirements and test instructions, check the relevant Type of Change option, and confirm pre-submission checklist items.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main objective: resolving LRU cache issues by standardizing the order of configuration variables for graph and vector engines.
Linked Issues check	✅ Passed	The changes directly address issue COG-3483 by introducing canonicalization utilities and updating config functions to ensure consistent, order-independent configuration handling for LRU caching.
Out of Scope Changes check	✅ Passed	All changes are tightly scoped to resolve the LRU cache ordering issue: new canonicalization utility, updated graph and vector config functions, and corresponding to_hashable_dict methods.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch COG-3483

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

cognee/infrastructure/databases/utils/canonicalize.py (1)
5-29: Consider using public API inspect.Parameter.empty instead of inspect._empty.

The implementation is correct and well-documented. However, line 25 uses inspect._empty, which is a private/internal constant. The public equivalent is inspect.Parameter.empty, which is more stable across Python versions.
🔎 Proposed fix
-            elif param.default is not inspect._empty:
+            elif param.default is not inspect.Parameter.empty:
As per coding guidelines, the docstring could be enhanced to include explicit parameter and return type documentation in the docstring body for completeness.
cognee/infrastructure/databases/vector/config.py (1)
11-24: Update class docstring to include to_hashable_dict.

The class docstring lists to_dict under "Public methods" but doesn't include the newly added to_hashable_dict. If this method is intended to be kept, please update the docstring for consistency.
🔎 Proposed fix
     """
     Manage the configuration settings for the vector database.

     Public methods:
     - to_dict: Convert the configuration to a dictionary.
+    - to_hashable_dict: Return a canonical, hashable configuration dictionary.

     Instance variables:

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 310e9e9 and 9582e1b.

📒 Files selected for processing (3)

cognee/infrastructure/databases/graph/config.py
cognee/infrastructure/databases/utils/canonicalize.py
cognee/infrastructure/databases/vector/config.py

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

cognee/infrastructure/databases/utils/canonicalize.py
cognee/infrastructure/databases/graph/config.py
cognee/infrastructure/databases/vector/config.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.

As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.

As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.

As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.

As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

cognee/infrastructure/databases/utils/canonicalize.py
cognee/infrastructure/databases/graph/config.py
cognee/infrastructure/databases/vector/config.py

cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

cognee/infrastructure/databases/utils/canonicalize.py
cognee/infrastructure/databases/graph/config.py
cognee/infrastructure/databases/vector/config.py

cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

cognee/infrastructure/databases/utils/canonicalize.py
cognee/infrastructure/databases/graph/config.py
cognee/infrastructure/databases/vector/config.py

🧬 Code graph analysis (1)

cognee/infrastructure/databases/graph/config.py (2)

cognee/infrastructure/databases/graph/get_graph_engine.py (2)

get_graph_engine (10-24)

create_graph_engine (28-169)

cognee/infrastructure/databases/utils/canonicalize.py (1)

canonicalize_kwargs_for_signature (5-30)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)

GitHub Check: End-to-End Tests / Test Entity Extraction
GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
GitHub Check: End-to-End Tests / Test Feedback Enrichment
GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
GitHub Check: End-to-End Tests / Deduplication Test
GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
GitHub Check: End-to-End Tests / Concurrent Subprocess access test
GitHub Check: End-to-End Tests / Test graph edge ingestion
GitHub Check: End-to-End Tests / Test adding of label for data in Cognee
GitHub Check: End-to-End Tests / Conversation sessions test (FS)
GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
GitHub Check: End-to-End Tests / S3 Bucket Test
GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
GitHub Check: End-to-End Tests / Server Start Test
GitHub Check: Basic Tests / Run Unit Tests
GitHub Check: Basic Tests / Run Simple Examples BAML
GitHub Check: Basic Tests / Run Integration Tests
GitHub Check: Basic Tests / Run Simple Examples
GitHub Check: CLI Tests / CLI Integration Tests
GitHub Check: CLI Tests / CLI Functionality Tests
GitHub Check: Cursor Bugbot

🔇 Additional comments (3)

cognee/infrastructure/databases/utils/canonicalize.py (1)

1-2: LGTM on imports.

The imports are appropriate for the functionality. Using Mapping for input parameters is correct as it allows for dict-like objects without requiring a mutable type.

cognee/infrastructure/databases/graph/config.py (1)

147-163: Implementation looks correct for the stated goal.

The function correctly:

Retrieves context-specific configuration (or empty dict)

Obtains base configuration from the singleton

Canonicalizes the merged parameters aligned with create_graph_engine signature

The function-level imports (lines 153-157) are a reasonable choice to avoid circular import issues between config and engine modules.

cognee/infrastructure/databases/vector/config.py (1)

105-121: Consistent implementation with graph config, LGTM.

The function correctly mirrors the approach in get_graph_context_config:

Uses function-level imports to avoid circular dependencies

Merges context-specific config over base defaults

Canonicalizes via signature alignment

The consistency between graph and vector config implementations is good for maintainability.

coderabbitai · 2025-12-28T17:31:42Z

cognee/infrastructure/databases/graph/config.py

        return {
            "graph_database_provider": self.graph_database_provider,
+            "graph_file_path": self.graph_file_path,
            "graph_database_url": self.graph_database_url,
            "graph_database_name": self.graph_database_name,
            "graph_database_username": self.graph_database_username,
            "graph_database_password": self.graph_database_password,
            "graph_database_port": self.graph_database_port,
            "graph_database_key": self.graph_database_key,
-            "graph_file_path": self.graph_file_path,
            "graph_dataset_database_handler": self.graph_dataset_database_handler,
        }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "to_hashable_dict" --type py

Repository: topoteretes/cognee

Length of output: 319

🏁 Script executed:

cat -n cognee/infrastructure/databases/graph/config.py | head -30

Repository: topoteretes/cognee

Length of output: 1100

🏁 Script executed:

cat -n cognee/infrastructure/databases/graph/config.py | sed -n '104,130p'

Repository: topoteretes/cognee

Length of output: 1306

🏁 Script executed:

cat -n cognee/infrastructure/databases/graph/config.py | sed -n '130,165p'

Repository: topoteretes/cognee

Length of output: 1636

Remove the unused to_hashable_dict method and update the class docstring.

The to_hashable_dict method (lines 104–127) is not called anywhere in the codebase. The new get_graph_context_config function uses to_dict() combined with canonicalize_kwargs_for_signature instead. Remove this method and update the class docstring (line 20) to remove the reference to to_hashable_dict as a public method.

🤖 Prompt for AI Agents

In cognee/infrastructure/databases/graph/config.py around lines 104 to 127, remove the unused to_hashable_dict method entirely and update the class docstring at line 20 to remove any mention of to_hashable_dict as a public API; ensure no other references remain in the class or module, and run tests/type checks to confirm nothing else depends on that method.

coderabbitai · 2025-12-28T17:31:42Z

cognee/infrastructure/databases/vector/config.py

+    def to_hashable_dict(self) -> dict:
+        """
+        Return a canonical, hashable configuration dictionary for vector DB engines.
+        The keys and their order are aligned with create_vector_engine(...) signature to
+        ensure consistent lru_cache keys regardless of how the config was originally set.
+        """
+        # NOTE: Order here mirrors create_vector_engine signature
+        return {
+            "vector_db_provider": self.vector_db_provider,
+            "vector_db_url": self.vector_db_url,
+            "vector_db_name": self.vector_db_name,
+            "vector_db_port": self.vector_db_port,
+            "vector_db_key": self.vector_db_key,
+            "vector_dataset_database_handler": self.vector_dataset_database_handler,
+        }
+


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "to_hashable_dict" --type py -C 2

Repository: topoteretes/cognee

Length of output: 1222

🏁 Script executed:

rg -n "\.to_hashable_dict" --type py

Repository: topoteretes/cognee

Length of output: 44

🏁 Script executed:

rg -n "to_hashable_dict" --type py | grep -v "def to_hashable_dict" | grep -v "- to_hashable_dict"

Repository: topoteretes/cognee

Length of output: 172

🏁 Script executed:

rg "to_hashable_dict" --type py | rg -v "def to_hashable_dict" | rg -v "to_hashable_dict$"

Repository: topoteretes/cognee

Length of output: 44

Remove to_hashable_dict or verify its intended use.

The to_hashable_dict method is defined but never called anywhere in the codebase. Additionally, get_vectordb_context_config uses to_dict() instead (line 118). This introduces dead code and potential maintenance confusion. Either remove the method or clarify if it's intended for external consumption.

🤖 Prompt for AI Agents

In cognee/infrastructure/databases/vector/config.py around lines 70 to 85, the to_hashable_dict method is dead code (never referenced) while get_vectordb_context_config uses to_dict(), causing potential confusion; either delete to_hashable_dict entirely (and remove its docstring) or replace usages of to_dict() in get_vectordb_context_config (and any other call sites) with to_hashable_dict to ensure a canonical, ordered dict is used for caching/keys; if you keep to_hashable_dict, update its docstring to state it’s the canonical representation and add or modify unit tests to cover its usage, otherwise remove the method and run tests/lint to confirm no references remain.

cursor

This PR is being reviewed by Cursor Bugbot

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

cursor · 2025-12-28T17:33:53Z

cognee/infrastructure/databases/graph/config.py

+    )
+
+    context_cfg = graph_db_config.get() or {}
+    base_cfg = get_graph_config().to_dict()


Missing graph_database_name causes config value loss

The get_graph_context_config() function now uses to_dict() instead of to_hashable_dict(), but to_dict() is missing graph_database_name while to_hashable_dict() includes it. Since create_graph_engine accepts graph_database_name as a parameter, any configured value for graph_database_name in GraphConfig will be silently lost and replaced with the function's default empty string. This breaks database name configuration for graph databases.

dexters1 · 2026-01-13T11:15:58Z

The Kuzu test failing with: "AssertionError: Kuzu graph database is not empty" signalizes that the lru_cache is not working in that case. So some additional changes are necessary.

I got it to work with the DatasetDatabase handlers by making sure parameter order was the same when calling the function, but also that the default values of the function call are explicitly set even if they are not provided like:

graph_dataset_database_handler=""

added fixes

9582e1b

coderabbitai bot reviewed Dec 28, 2025

View reviewed changes

cursor bot reviewed Dec 28, 2025

View reviewed changes

Merge branch 'dev' into COG-3483

7bc750e

dexters1 self-requested a review January 14, 2026 13:43

dexters1 marked this pull request as draft January 14, 2026 13:43

Vasilije1990 closed this Jan 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: resolve lru cache issues when variables for graph or vector engine are ordered in different way #1938

feat: resolve lru cache issues when variables for graph or vector engine are ordered in different way #1938

Vasilije1990 commented Dec 28, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

pull-checklist bot commented Dec 28, 2025

Uh oh!

coderabbitai bot commented Dec 28, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 28, 2025

Uh oh!

coderabbitai bot Dec 28, 2025

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Dec 28, 2025

Uh oh!

dexters1 commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: resolve lru cache issues when variables for graph or vector engine are ordered in different way #1938

feat: resolve lru cache issues when variables for graph or vector engine are ordered in different way #1938

Conversation

Vasilije1990 commented Dec 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Acceptance Criteria

Type of Change

Screenshots/Videos (if applicable)

Pre-submission Checklist

DCO Affirmation

Summary by CodeRabbit

Uh oh!

pull-checklist bot commented Dec 28, 2025

Please make sure all the checkboxes are checked:

Uh oh!

coderabbitai bot commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

This PR is being reviewed by Cursor Bugbot

Uh oh!

cursor bot Dec 28, 2025

Choose a reason for hiding this comment

Missing graph_database_name causes config value loss

Uh oh!

dexters1 commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Vasilije1990 commented Dec 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 28, 2025 •

edited

Loading

Missing `graph_database_name` causes config value loss