feat: Configurable batch size #1941
Conversation
Please make sure all the checkboxes are checked:
Walkthrough: makes the chunk batch size configurable.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
76-81: Update endpoint documentation with new parameter.

The endpoint docstring comprehensively documents other request parameters but is missing documentation for the new `chunks_per_batch` field. API users may not discover this configuration option.

🔎 Proposed documentation addition
```diff
 ## Request Parameters

 - **datasets** (Optional[List[str]]): List of dataset names to process. Dataset names are resolved to datasets owned by the authenticated user.
 - **dataset_ids** (Optional[List[UUID]]): List of existing dataset UUIDs to process. UUIDs allow processing of datasets not owned by the user (if permitted).
 - **run_in_background** (Optional[bool]): Whether to execute processing asynchronously. Defaults to False (blocking).
 - **custom_prompt** (Optional[str]): Custom prompt for entity extraction and graph generation. If provided, this prompt will be used instead of the default prompts for knowledge graph extraction.
 - **ontology_key** (Optional[List[str]]): Reference to one or more previously uploaded ontology files to use for knowledge graph construction.
+- **chunks_per_batch** (Optional[int]): Number of chunks to process per task batch. Higher values (e.g., 50) can speed up processing for large documents but may cause max_token errors if too large. Defaults to 100 for standard tasks, 10 for temporal tasks.
```

cognee/api/v1/cognify/cognify.py (1)
47-47: Fix type annotation for optional parameter.

The parameter `chunks_per_batch: int = None` should use `Optional[int]` to properly indicate it accepts None. This is particularly important for public API functions per the coding guidelines.

🔎 Proposed fix

```diff
- chunks_per_batch: int = None,
+ chunks_per_batch: Optional[int] = None,
```
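As a quick illustration of the annotation issue (a minimal sketch, not the project's actual function — the real `cognify()` is async and takes many more parameters), a strict type checker rejects `int = None` because `None` is not an `int`, while `Optional[int]` makes the None default explicit:

```python
from typing import Optional


def cognify_stub(chunks_per_batch: Optional[int] = None) -> int:
    """Hypothetical stub mirroring the signature under review."""
    # Fall back to the documented default of 100 when nothing is passed.
    return chunks_per_batch if chunks_per_batch is not None else 100
```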
🧹 Nitpick comments (1)
cognee/cli/commands/cognify_command.py (1)
16-33: Consider mentioning batch size configuration in command description.

The command description thoroughly explains the processing pipeline but doesn't mention the new `--chunks-per-batch` option. While users can discover it via `--help`, adding a brief mention would improve discoverability.

Example addition

```diff
 Processing Pipeline:
 1. **Document Classification**: Identifies document types and structures
 2. **Permission Validation**: Ensures user has processing rights
 3. **Text Chunking**: Breaks content into semantically meaningful segments
 4. **Entity Extraction**: Identifies key concepts, people, places, organizations
 5. **Relationship Detection**: Discovers connections between entities
 6. **Graph Construction**: Builds semantic knowledge graph with embeddings
 7. **Content Summarization**: Creates hierarchical summaries for navigation
+
+Performance tuning is available via --chunks-per-batch to control batch processing size.
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- cognee/api/v1/cognify/cognify.py
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/cli/commands/cognify_command.py
- cognee/modules/cognify/config.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/cli/commands/cognify_command.py
- cognee/modules/cognify/config.py
- cognee/api/v1/cognify/cognify.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/cli/commands/cognify_command.py
- cognee/modules/cognify/config.py
- cognee/api/v1/cognify/cognify.py
cognee/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Public APIs should be type-annotated in Python where practical
Files:
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/api/v1/cognify/cognify.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/cli/commands/cognify_command.py
- cognee/modules/cognify/config.py
- cognee/api/v1/cognify/cognify.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/modules/cognify/config.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
Applied to files:
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/cli/commands/cognify_command.py
- cognee/api/v1/cognify/cognify.py
📚 Learning: 2024-10-07T11:20:44.876Z
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-07T11:20:44.876Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.
Applied to files:
cognee/api/v1/cognify/cognify.py
🧬 Code graph analysis (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
new-examples/demos/custom_graph_model_entity_schema_definition.py (1)
Field(35-38)
cognee/api/v1/cognify/cognify.py (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
cognify (60-162)

cognee/modules/cognify/config.py (1)

get_cognify_config (25-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
- GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
- GitHub Check: End-to-End Tests / Test adding of label for data in Cognee
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: CLI Tests / CLI Functionality Tests
- GitHub Check: CLI Tests / CLI Integration Tests
🔇 Additional comments (4)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
49-53: LGTM! Well-documented field.

The `chunks_per_batch` field is properly typed, has a clear description, and includes helpful examples. Good attention to API documentation.

cognee/cli/commands/cognify_command.py (1)
65-69: LGTM! CLI option properly configured.

The `--chunks-per-batch` argument is well-defined with helpful guidance for users. The practical example ("try 50 for large single documents") aligns with the PR objective of improving processing speed.

cognee/api/v1/cognify/cognify.py (2)
278-281: LGTM! Config-driven default properly implemented.

The logic correctly derives `chunks_per_batch` from configuration when not explicitly provided, with a sensible fallback to 100. This aligns with the PR objective of making batch size configurable.
278-281: Verify different default values are intentional.The fallback values differ between task types: 100 for default tasks (line 280) and 10 for temporal tasks (line 338). While this appears to preserve existing behavior, please confirm this difference is intentional and appropriate for the different processing characteristics.
Consider adding a comment explaining why temporal tasks use a different default:

```python
# Temporal tasks use smaller batch size (10) to handle time-series processing efficiently
chunks_per_batch = configured if configured is not None else 10
```

Also applies to: 334-338
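The precedence the reviewer describes (explicit argument, then configured value, then a task-type fallback) can be sketched as follows. The constants and function name are illustrative, taken from the defaults quoted in the review, not the project's actual code:

```python
from typing import Optional

DEFAULT_TASKS_FALLBACK = 100   # fallback for standard tasks (per the review)
TEMPORAL_TASKS_FALLBACK = 10   # fallback for temporal tasks (per the review)


def resolve_chunks_per_batch(
    explicit: Optional[int], configured: Optional[int], temporal: bool = False
) -> int:
    """Resolve the effective batch size: explicit arg > config value > fallback."""
    if explicit is not None:
        return explicit
    if configured is not None:
        return configured
    return TEMPORAL_TASKS_FALLBACK if temporal else DEFAULT_TASKS_FALLBACK
```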
```diff
  config: Config = None,
  custom_prompt: Optional[str] = None,
- chunks_per_batch: int = 100,
+ chunks_per_batch: int = None,
```
Fix type annotation for optional parameter.
The parameter chunks_per_batch: int = None should use Optional[int] for proper type checking, consistent with the Optional import at line 3 and the coding guidelines requiring type annotations for public APIs.
🔎 Proposed fix
```diff
- chunks_per_batch: int = None,
+ chunks_per_batch: Optional[int] = None,
```

🤖 Prompt for AI Agents
In cognee/api/v1/cognify/cognify.py around line 255, the optional parameter is
annotated as "chunks_per_batch: int = None" but should use Optional[int]; update
the signature to "chunks_per_batch: Optional[int] = None" (the Optional import
exists at line 3) so static type checkers and the public API follow the
project's typing guidelines.
```diff
  async def get_temporal_tasks(
-     user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
+     user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
```
Fix type annotation for optional parameter.
The parameter chunks_per_batch: int = None should use Optional[int] for consistency with other functions in this file and proper type checking.
🔎 Proposed fix
```diff
- user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
+ user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: Optional[int] = None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: Optional[int] = None
```
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/cognify.py around line 313, the parameter annotation
chunks_per_batch: int = None is incorrect for an optional parameter; change it
to chunks_per_batch: Optional[int] and ensure typing.Optional is imported (add
"from typing import Optional" if not already present) so the signature matches
other functions and type checkers accept None as a valid value.
```diff
  if chunks_per_batch is None:
-     chunks_per_batch = 10
+     from cognee.modules.cognify.config import get_cognify_config
+
+     configured = get_cognify_config().chunks_per_batch
+     chunks_per_batch = configured if configured is not None else 10
```
🛠️ Refactor suggestion | 🟠 Major
Remove redundant import.
The lazy import of get_cognify_config at line 335 is unnecessary since it's already imported at the module level (line 6).
🔎 Proposed fix
```diff
 if chunks_per_batch is None:
-    from cognee.modules.cognify.config import get_cognify_config
-
     configured = get_cognify_config().chunks_per_batch
     chunks_per_batch = configured if configured is not None else 10
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if chunks_per_batch is None:
    configured = get_cognify_config().chunks_per_batch
    chunks_per_batch = configured if configured is not None else 10
```
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/cognify.py around lines 334 to 338, remove the
redundant lazy import of get_cognify_config inside the if block (the function is
already imported at module level on line 6); delete the "from
cognee.modules.cognify.config import get_cognify_config" line and leave the code
to call get_cognify_config() directly so chunks_per_batch is assigned from the
already-imported function.
```diff
  classification_model: object = DefaultContentPrediction
  summarization_model: object = SummarizedContent
  triplet_embedding: bool = False
+ chunks_per_batch: Optional[int] = None
```
🛠️ Refactor suggestion | 🟠 Major
Add field documentation.
The new chunks_per_batch field lacks a docstring or Field description explaining its purpose, valid range, and environment variable name. Based on the coding guidelines, undocumented fields are considered incomplete.
🔎 Proposed documentation addition
+ """
+ chunks_per_batch: Number of chunks to process per task batch in Cognify.
+ Can be configured via CHUNKS_PER_BATCH environment variable.
+ Higher values (e.g., 50) can improve processing speed for large documents,
+ but may cause max_token errors if set too high. Defaults to 100 for default tasks
+ and 10 for temporal tasks when not specified.
+ """
chunks_per_batch: Optional[int] = NoneAlternatively, use Pydantic's Field with description:
```diff
-    chunks_per_batch: Optional[int] = None
+    chunks_per_batch: Optional[int] = Field(
+        default=None,
+        description="Number of chunks to process per task batch (configurable via CHUNKS_PER_BATCH env var)"
+    )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
chunks_per_batch: Optional[int] = Field(
    default=None,
    description="Number of chunks to process per task batch in Cognify. Can be configured via CHUNKS_PER_BATCH environment variable. Higher values can improve processing speed but may cause max_token errors if set too high."
)
```
🤖 Prompt for AI Agents
In cognee/modules/cognify/config.py around line 12, the new chunks_per_batch
field is missing documentation; add a concise Field description (or a docstring
above the attribute) that states its purpose (how it controls batching of
chunks), the valid range (e.g., positive integer limits or None meaning no
batching), and the corresponding environment variable name if applicable; use
Pydantic's Field(description="...") to include this metadata and update any
README or env docs to match.
Fix AttributeError when args.chunks_per_batch is not present in the argparse.Namespace object. Use getattr() with default value of None to safely access the optional chunks_per_batch parameter. This resolves test failures in test_cli_edge_cases.py where Namespace objects were created without the chunks_per_batch attribute.

Changes:
- Use getattr(args, 'chunks_per_batch', None) instead of direct access
- Update test assertion to expect chunks_per_batch=None parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
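The getattr() pattern from the commit above can be demonstrated in isolation (a sketch with a stripped-down Namespace, not the actual CLI wiring):

```python
import argparse

# Namespace built the way the older tests build it, without the new attribute.
legacy_args = argparse.Namespace(datasets=None, chunk_size=None)

# Direct attribute access (legacy_args.chunks_per_batch) would raise
# AttributeError; getattr degrades gracefully to the None default.
chunks_per_batch = getattr(legacy_args, "chunks_per_batch", None)
```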
…meter

Update all cognify test assertions to expect the chunks_per_batch=None parameter that was added to the CLI command. This fixes three failing tests:
- test_execute_basic_cognify
- test_cognify_invalid_chunk_size
- test_cognify_nonexistent_ontology_file

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py (1)
259-265: Update all test assertions to include chunks_per_batch parameter.

Only `test_cognify_empty_datasets_list` was updated to expect `chunks_per_batch=None` (line 376), but the other test methods in this class (`test_cognify_invalid_chunk_size` at lines 259-265 and `test_cognify_nonexistent_ontology_file` at lines 292-298) still use the old assertion without `chunks_per_batch`. Since `assert_awaited_once_with()` performs exact parameter matching, these tests will fail if the actual `cognify()` call now includes the `chunks_per_batch` parameter.

🔧 Proposed fix to add chunks_per_batch to all test assertions
Fix for test_cognify_invalid_chunk_size:
```diff
 mock_cognee.cognify.assert_awaited_once_with(
     datasets=None,
     chunk_size=-100,
     ontology_file_path=None,
     chunker=TextChunker,
     run_in_background=False,
+    chunks_per_batch=None,
 )
```

Fix for test_cognify_nonexistent_ontology_file:

```diff
 mock_cognee.cognify.assert_awaited_once_with(
     datasets=None,
     chunk_size=None,
     ontology_file_path="/nonexistent/path/ontology.owl",
     chunker=TextChunker,
     run_in_background=False,
+    chunks_per_batch=None,
 )
```

Also applies to: 292-298, 370-377
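The exact-matching behavior of `assert_awaited_once_with()` that makes these assertion updates necessary can be reproduced with a bare AsyncMock (a standalone sketch, independent of the cognee tests):

```python
import asyncio
from unittest.mock import AsyncMock

mock_cognify = AsyncMock()
asyncio.run(mock_cognify(datasets=None, chunks_per_batch=None))

# Passes only because every keyword matches exactly; omitting
# chunks_per_batch in this assertion would raise AssertionError.
mock_cognify.assert_awaited_once_with(datasets=None, chunks_per_batch=None)
```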
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- CLAUDE.md
- cognee/cli/commands/cognify_command.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/cli/commands/cognify_command.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
cognee/tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests
Files:
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
Applied to files:
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: CLI Tests / CLI Integration Tests
Actionable comments posted: 0
🧹 Nitpick comments (2)
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py (1)
265-265: Consider adding test coverage for the new `--chunks-per-batch` CLI option.

The tests correctly verify that `chunks_per_batch=None` is passed when the CLI argument is not provided. However, there are no tests verifying the positive case where a user actually provides `--chunks-per-batch` with various values.

Consider adding test cases such as:
- Valid positive integer value (e.g., `--chunks-per-batch 50`)
- Edge cases (e.g., `--chunks-per-batch 1`, very large values)
- Invalid values if validation exists (e.g., negative numbers, zero, non-integers)

This would ensure the CLI option is properly parsed, validated, and threaded through to the cognify function as intended by the PR objectives.

Also applies to: 299-299, 378-378
cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py (1)
241-241: Add test coverage for the `--chunks-per-batch` CLI option.

While the test correctly verifies the default behavior (`chunks_per_batch=None`), there's no test case in this file that verifies what happens when a user actually provides the `--chunks-per-batch` argument.

Consider adding a test method such as:

```python
@patch("cognee.cli.commands.cognify_command.asyncio.run", side_effect=_mock_run)
def test_execute_with_chunks_per_batch(self, mock_asyncio_run):
    """Test execute with chunks_per_batch specified"""
    mock_cognee = MagicMock()
    mock_cognee.cognify = AsyncMock(return_value="success")
    with patch.dict(sys.modules, {"cognee": mock_cognee}):
        command = CognifyCommand()
        args = argparse.Namespace(
            datasets=None,
            chunk_size=None,
            ontology_file=None,
            chunker="TextChunker",
            background=False,
            verbose=False,
            chunks_per_batch=50,
        )
        command.execute(args)

        from cognee.modules.chunking.TextChunker import TextChunker

        mock_cognee.cognify.assert_awaited_once_with(
            datasets=None,
            chunk_size=None,
            ontology_file_path=None,
            chunker=TextChunker,
            run_in_background=False,
            chunks_per_batch=50,
        )
```

This would verify that the new configurable parameter is properly threaded through the command execution path, which is a key objective of this PR.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
cognee/tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests
Files:
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-07T11:20:44.876Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-16T07:06:28.669Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
Applied to files:
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
📚 Learning: 2024-10-16T07:06:28.669Z
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-16T07:06:28.669Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.
Applied to files:
- cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
- cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Test adding of label for data in Cognee
- GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: CLI Tests / CLI Integration Tests
- GitHub Check: CLI Tests / CLI Functionality Tests
Description
Acceptance Criteria
Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
New Features
Style / Documentation
Tests