
Conversation

@Vasilije1990 Vasilije1990 (Contributor) commented Dec 28, 2025

Description

Acceptance Criteria

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

  • New Features

    • Added configurable chunks-per-batch to control per-batch processing size via CLI flag, API payload, and configuration; defaults are now driven by config with an automatic fallback.
  • Style / Documentation

    • Updated contribution/style guidelines (formatting, line length, string-quote rule, pre-commit note).
  • Tests

    • Updated CLI tests to verify propagation of the new chunks-per-batch parameter.


@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai coderabbitai bot (Contributor) commented Dec 28, 2025

Walkthrough

Makes chunks_per_batch configurable for cognify: adds the field to CognifyConfig, threads an optional chunks_per_batch through API router and CLI into core task builders, and moves defaulting to read from CognifyConfig with a fallback to 100.

Changes

  • Configuration (cognee/modules/cognify/config.py): Added optional chunks_per_batch: Optional[int] = None to CognifyConfig and included it in to_dict().
  • Core cognify logic (cognee/api/v1/cognify/cognify.py): Updated get_default_tasks and get_temporal_tasks signatures to accept chunks_per_batch: int = None; derive the default from get_cognify_config() at runtime with a fallback of 100; added lazy config retrieval in the temporal flow.
  • API router (cognee/api/v1/cognify/routers/get_cognify_router.py): Added chunks_per_batch: Optional[int] to CognifyPayloadDTO and pass payload.chunks_per_batch into the cognify invocation.
  • CLI (cognee/cli/commands/cognify_command.py): Added a --chunks-per-batch CLI option and forward chunks_per_batch=args.chunks_per_batch to the cognify call.
  • Tests (cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py, cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py): Updated tests to expect chunks_per_batch=None threaded through the CognifyCommand → cognify call path.
  • Docs / Style (CLAUDE.md): Documentation/style edits (formatter, double-quote rule, line length, pre-commit instruction).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

run-checks

Suggested reviewers

  • dexters1
  • lxobr
  • hajdul88
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (3 warnings)
  • Description check ⚠️ Warning: The PR description contains only the template boilerplate with no actual content filled in; all required sections (Description, Acceptance Criteria, Type of Change, Pre-submission Checklist) are empty. Resolution: Provide a human-written description explaining the changes, acceptance criteria for testing, type of change classification, and completion status of the pre-submission checklist items.
  • Out of Scope Changes check ⚠️ Warning: The PR includes an unrelated documentation update to CLAUDE.md (Code Style guidelines) that is out of scope for the batch size configuration feature [COG-3146]. Resolution: Remove the CLAUDE.md changes or move them to a separate PR, as they are unrelated to the batch size configuration objective in COG-3146.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'feat: Configurable batch size' clearly and concisely summarizes the main change, which aligns with the implementation of making batch size configurable via the chunks_per_batch parameter.
  • Linked Issues check ✅ Passed: The code changes successfully implement the requirement to make batch_size configurable. The chunks_per_batch parameter has been added to the cognify pipeline (API, CLI, config, and core logic), allowing users to configure batch size instead of relying on fixed defaults [COG-3146].



Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)

76-81: Update endpoint documentation with new parameter.

The endpoint docstring comprehensively documents other request parameters but is missing documentation for the new chunks_per_batch field. API users may not discover this configuration option.

🔎 Proposed documentation addition
         ## Request Parameters
         - **datasets** (Optional[List[str]]): List of dataset names to process. Dataset names are resolved to datasets owned by the authenticated user.
         - **dataset_ids** (Optional[List[UUID]]): List of existing dataset UUIDs to process. UUIDs allow processing of datasets not owned by the user (if permitted).
         - **run_in_background** (Optional[bool]): Whether to execute processing asynchronously. Defaults to False (blocking).
         - **custom_prompt** (Optional[str]): Custom prompt for entity extraction and graph generation. If provided, this prompt will be used instead of the default prompts for knowledge graph extraction.
         - **ontology_key** (Optional[List[str]]): Reference to one or more previously uploaded ontology files to use for knowledge graph construction.
+        - **chunks_per_batch** (Optional[int]): Number of chunks to process per task batch. Higher values (e.g., 50) can speed up processing for large documents but may cause max_token errors if too large. Defaults to 100 for standard tasks, 10 for temporal tasks.
cognee/api/v1/cognify/cognify.py (1)

47-47: Fix type annotation for optional parameter.

The parameter chunks_per_batch: int = None should use Optional[int] to properly indicate it accepts None. This is particularly important for public API functions per the coding guidelines.

🔎 Proposed fix
-    chunks_per_batch: int = None,
+    chunks_per_batch: Optional[int] = None,
🧹 Nitpick comments (1)
cognee/cli/commands/cognify_command.py (1)

16-33: Consider mentioning batch size configuration in command description.

The command description thoroughly explains the processing pipeline but doesn't mention the new --chunks-per-batch option. While users can discover it via --help, adding a brief mention would improve discoverability.

Example addition
 Processing Pipeline:
 1. **Document Classification**: Identifies document types and structures
 2. **Permission Validation**: Ensures user has processing rights
 3. **Text Chunking**: Breaks content into semantically meaningful segments
 4. **Entity Extraction**: Identifies key concepts, people, places, organizations
 5. **Relationship Detection**: Discovers connections between entities
 6. **Graph Construction**: Builds semantic knowledge graph with embeddings
 7. **Content Summarization**: Creates hierarchical summaries for navigation
+
+Performance tuning is available via --chunks-per-batch to control batch processing size.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 310e9e9 and d1b13b1.

📒 Files selected for processing (4)
  • cognee/api/v1/cognify/cognify.py
  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/cli/commands/cognify_command.py
  • cognee/modules/cognify/config.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/cli/commands/cognify_command.py
  • cognee/modules/cognify/config.py
  • cognee/api/v1/cognify/cognify.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/cli/commands/cognify_command.py
  • cognee/modules/cognify/config.py
  • cognee/api/v1/cognify/cognify.py
cognee/api/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Public APIs should be type-annotated in Python where practical

Files:

  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/api/v1/cognify/cognify.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/cli/commands/cognify_command.py
  • cognee/modules/cognify/config.py
  • cognee/api/v1/cognify/cognify.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/modules/cognify/config.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.

Applied to files:

  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/cli/commands/cognify_command.py
  • cognee/api/v1/cognify/cognify.py
📚 Learning: 2024-10-07T11:20:44.876Z
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-07T11:20:44.876Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.

Applied to files:

  • cognee/api/v1/cognify/cognify.py
🧬 Code graph analysis (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
new-examples/demos/custom_graph_model_entity_schema_definition.py (1)
  • Field (35-38)
cognee/api/v1/cognify/cognify.py (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
  • cognify (60-162)
cognee/modules/cognify/config.py (1)
  • get_cognify_config (25-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
  • GitHub Check: End-to-End Tests / Test Entity Extraction
  • GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
  • GitHub Check: End-to-End Tests / Test Feedback Enrichment
  • GitHub Check: End-to-End Tests / Concurrent Subprocess access test
  • GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
  • GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
  • GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
  • GitHub Check: End-to-End Tests / Test adding of label for data in Cognee
  • GitHub Check: End-to-End Tests / Conversation sessions test (FS)
  • GitHub Check: End-to-End Tests / Test graph edge ingestion
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: Basic Tests / Run Simple Examples BAML
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: CLI Tests / CLI Functionality Tests
  • GitHub Check: CLI Tests / CLI Integration Tests
🔇 Additional comments (4)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)

49-53: LGTM! Well-documented field.

The chunks_per_batch field is properly typed, has a clear description, and includes helpful examples. Good attention to API documentation.

cognee/cli/commands/cognify_command.py (1)

65-69: LGTM! CLI option properly configured.

The --chunks-per-batch argument is well-defined with helpful guidance for users. The practical example ("try 50 for large single documents") aligns with the PR objective of improving processing speed.

cognee/api/v1/cognify/cognify.py (2)

278-281: LGTM! Config-driven default properly implemented.

The logic correctly derives chunks_per_batch from configuration when not explicitly provided, with a sensible fallback to 100. This aligns with the PR objective of making batch size configurable.


278-281: Verify different default values are intentional.

The fallback values differ between task types: 100 for default tasks (line 280) and 10 for temporal tasks (line 338). While this appears to preserve existing behavior, please confirm this difference is intentional and appropriate for the different processing characteristics.

Consider adding a comment explaining why temporal tasks use a different default:

# Temporal tasks use smaller batch size (10) to handle time-series processing efficiently
chunks_per_batch = configured if configured is not None else 10

Also applies to: 334-338

     config: Config = None,
     custom_prompt: Optional[str] = None,
-    chunks_per_batch: int = 100,
+    chunks_per_batch: int = None,

⚠️ Potential issue | 🟡 Minor

Fix type annotation for optional parameter.

The parameter chunks_per_batch: int = None should use Optional[int] for proper type checking, consistent with the Optional import at line 3 and the coding guidelines requiring type annotations for public APIs.

🔎 Proposed fix
-    chunks_per_batch: int = None,
+    chunks_per_batch: Optional[int] = None,
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/cognify.py around line 255, the optional parameter is
annotated as "chunks_per_batch: int = None" but should use Optional[int]; update
the signature to "chunks_per_batch: Optional[int] = None" (the Optional import
exists at line 3) so static type checkers and the public API follow the
project's typing guidelines.


 async def get_temporal_tasks(
-    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
+    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None

⚠️ Potential issue | 🟡 Minor

Fix type annotation for optional parameter.

The parameter chunks_per_batch: int = None should use Optional[int] for consistency with other functions in this file and proper type checking.

🔎 Proposed fix
-    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
+    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: Optional[int] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
+    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: Optional[int] = None
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/cognify.py around line 313, the parameter annotation
chunks_per_batch: int = None is incorrect for an optional parameter; change it
to chunks_per_batch: Optional[int] and ensure typing.Optional is imported (add
"from typing import Optional" if not already present) so the signature matches
other functions and type checkers accept None as a valid value.

Comment on lines 334 to +338
     if chunks_per_batch is None:
-        chunks_per_batch = 10
+        from cognee.modules.cognify.config import get_cognify_config
+
+        configured = get_cognify_config().chunks_per_batch
+        chunks_per_batch = configured if configured is not None else 10

🛠️ Refactor suggestion | 🟠 Major

Remove redundant import.

The lazy import of get_cognify_config at line 335 is unnecessary since it's already imported at the module level (line 6).

🔎 Proposed fix
     if chunks_per_batch is None:
-        from cognee.modules.cognify.config import get_cognify_config
-
         configured = get_cognify_config().chunks_per_batch
         chunks_per_batch = configured if configured is not None else 10
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     if chunks_per_batch is None:
-        from cognee.modules.cognify.config import get_cognify_config
-
         configured = get_cognify_config().chunks_per_batch
         chunks_per_batch = configured if configured is not None else 10
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/cognify.py around lines 334 to 338, remove the
redundant lazy import of get_cognify_config inside the if block (the function is
already imported at module level on line 6); delete the "from
cognee.modules.cognify.config import get_cognify_config" line and leave the code
to call get_cognify_config() directly so chunks_per_batch is assigned from the
already-imported function.

     classification_model: object = DefaultContentPrediction
     summarization_model: object = SummarizedContent
     triplet_embedding: bool = False
+    chunks_per_batch: Optional[int] = None

🛠️ Refactor suggestion | 🟠 Major

Add field documentation.

The new chunks_per_batch field lacks a docstring or Field description explaining its purpose, valid range, and environment variable name. Based on the coding guidelines, undocumented fields are considered incomplete.

🔎 Proposed documentation addition
+    """
+    chunks_per_batch: Number of chunks to process per task batch in Cognify.
+    Can be configured via CHUNKS_PER_BATCH environment variable.
+    Higher values (e.g., 50) can improve processing speed for large documents,
+    but may cause max_token errors if set too high. Defaults to 100 for default tasks
+    and 10 for temporal tasks when not specified.
+    """
     chunks_per_batch: Optional[int] = None

Alternatively, use Pydantic's Field with description:

-    chunks_per_batch: Optional[int] = None
+    chunks_per_batch: Optional[int] = Field(
+        default=None,
+        description="Number of chunks to process per task batch (configurable via CHUNKS_PER_BATCH env var)"
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    chunks_per_batch: Optional[int] = None
+    chunks_per_batch: Optional[int] = Field(
+        default=None,
+        description="Number of chunks to process per task batch in Cognify. Can be configured via CHUNKS_PER_BATCH environment variable. Higher values can improve processing speed but may cause max_token errors if set too high."
+    )
🤖 Prompt for AI Agents
In cognee/modules/cognify/config.py around line 12, the new chunks_per_batch
field is missing documentation; add a concise Field description (or a docstring
above the attribute) that states its purpose (how it controls batching of
chunks), the valid range (e.g., positive integer limits or None meaning no
batching), and the corresponding environment variable name if applicable; use
Pydantic's Field(description="...") to include this metadata and update any
README or env docs to match.

@Vasilije1990 Vasilije1990 requested a review from dexters1 January 10, 2026 07:49
Vasilije1990 and others added 5 commits January 11, 2026 16:15
Fix AttributeError when args.chunks_per_batch is not present in the
argparse.Namespace object. Use getattr() with default value of None
to safely access the optional chunks_per_batch parameter.

This resolves test failures in test_cli_edge_cases.py where Namespace
objects were created without the chunks_per_batch attribute.

Changes:
- Use getattr(args, 'chunks_per_batch', None) instead of direct access
- Update test assertion to expect chunks_per_batch=None parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…meter

Update all cognify test assertions to expect the chunks_per_batch=None
parameter that was added to the CLI command. This fixes three failing tests:
- test_execute_basic_cognify
- test_cognify_invalid_chunk_size
- test_cognify_nonexistent_ontology_file

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
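The getattr() fix described in the commit message above is a common defensive pattern for optional argparse attributes; a minimal sketch follows. The attribute name chunks_per_batch mirrors the commit text, while the execute helper and its return shape are purely illustrative.

```python
import argparse


def execute(args: argparse.Namespace) -> dict:
    """Build cognify keyword arguments, tolerating Namespace objects
    created without the optional chunks_per_batch attribute
    (as in older test fixtures)."""
    # getattr with a default avoids AttributeError when the attribute is absent
    chunks_per_batch = getattr(args, "chunks_per_batch", None)
    return {"chunks_per_batch": chunks_per_batch}


# A Namespace built without the attribute no longer raises:
legacy_args = argparse.Namespace(datasets=None)
new_args = argparse.Namespace(datasets=None, chunks_per_batch=50)
```

Direct attribute access (args.chunks_per_batch) would raise AttributeError on legacy_args, which is exactly the test failure the commit describes.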
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py (1)

259-265: Update all test assertions to include chunks_per_batch parameter.

Only test_cognify_empty_datasets_list was updated to expect chunks_per_batch=None (line 376), but the other test methods in this class (test_cognify_invalid_chunk_size at lines 259-265 and test_cognify_nonexistent_ontology_file at lines 292-298) still use the old assertion without chunks_per_batch. Since assert_awaited_once_with() performs exact parameter matching, these tests will fail if the actual cognify() call now includes the chunks_per_batch parameter.

🔧 Proposed fix to add chunks_per_batch to all test assertions

Fix for test_cognify_invalid_chunk_size:

 mock_cognee.cognify.assert_awaited_once_with(
     datasets=None,
     chunk_size=-100,
     ontology_file_path=None,
     chunker=TextChunker,
     run_in_background=False,
+    chunks_per_batch=None,
 )

Fix for test_cognify_nonexistent_ontology_file:

 mock_cognee.cognify.assert_awaited_once_with(
     datasets=None,
     chunk_size=None,
     ontology_file_path="/nonexistent/path/ontology.owl",
     chunker=TextChunker,
     run_in_background=False,
+    chunks_per_batch=None,
 )

Also applies to: 292-298, 370-377

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d1b13b1 and 98394fc.

📒 Files selected for processing (3)
  • CLAUDE.md
  • cognee/cli/commands/cognify_command.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/cli/commands/cognify_command.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
cognee/tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.

Applied to files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
  • GitHub Check: End-to-End Tests / Test Entity Extraction
  • GitHub Check: End-to-End Tests / Test Feedback Enrichment
  • GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: End-to-End Tests / Concurrent Subprocess access test
  • GitHub Check: End-to-End Tests / Conversation sessions test (FS)
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: CLI Tests / CLI Integration Tests

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py (1)

265-265: Consider adding test coverage for the new --chunks-per-batch CLI option.

The tests correctly verify that chunks_per_batch=None is passed when the CLI argument is not provided. However, there are no tests verifying the positive case where a user actually provides --chunks-per-batch with various values.

Consider adding test cases such as:

  • Valid positive integer value (e.g., --chunks-per-batch 50)
  • Edge cases (e.g., --chunks-per-batch 1, very large values)
  • Invalid values if validation exists (e.g., negative numbers, zero, non-integers)

This would ensure the CLI option is properly parsed, validated, and threaded through to the cognify function as intended by the PR objectives.

Also applies to: 299-299, 378-378
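The parsing behavior those suggested cases would exercise can be sketched with a standalone argparse parser. This is only an illustration: the flag name comes from the PR, but the parser setup and the `positive_int` validator here are assumptions, not cognee's actual CLI wiring.

```python
import argparse

def positive_int(value: str) -> int:
    """Parse a CLI value as a positive integer, rejecting zero and negatives."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value}")
    return number

# Illustrative parser; cognee's real command classes declare their own arguments.
parser = argparse.ArgumentParser(prog="cognify")
parser.add_argument("--chunks-per-batch", type=positive_int, default=None)

# Positive case: the value is parsed to an int.
args = parser.parse_args(["--chunks-per-batch", "50"])
assert args.chunks_per_batch == 50

# Default case: omitting the flag yields None, matching the existing tests.
assert parser.parse_args([]).chunks_per_batch is None
```

Invalid inputs (zero, negatives, non-integers) would make argparse exit with an error, which is what an edge-case test would assert via `SystemExit`.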

cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py (1)

241-241: Add test coverage for the --chunks-per-batch CLI option.

While the test correctly verifies the default behavior (chunks_per_batch=None), there's no test case in this file that verifies what happens when a user actually provides the --chunks-per-batch argument.

Consider adding a test method such as:

@patch("cognee.cli.commands.cognify_command.asyncio.run", side_effect=_mock_run)
def test_execute_with_chunks_per_batch(self, mock_asyncio_run):
    """Test execute with chunks_per_batch specified"""
    mock_cognee = MagicMock()
    mock_cognee.cognify = AsyncMock(return_value="success")

    with patch.dict(sys.modules, {"cognee": mock_cognee}):
        command = CognifyCommand()
        args = argparse.Namespace(
            datasets=None,
            chunk_size=None,
            ontology_file=None,
            chunker="TextChunker",
            background=False,
            verbose=False,
            chunks_per_batch=50,
        )
        command.execute(args)

    from cognee.modules.chunking.TextChunker import TextChunker
    mock_cognee.cognify.assert_awaited_once_with(
        datasets=None,
        chunk_size=None,
        ontology_file_path=None,
        chunker=TextChunker,
        run_in_background=False,
        chunks_per_batch=50,
    )

This would verify that the new configurable parameter is properly threaded through the command execution path, which is a key objective of this PR.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 98394fc and b7d5bf5.

📒 Files selected for processing (2)
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
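For reference, the line-length setting mentioned above would live in `pyproject.toml` roughly as follows; this is a minimal sketch of the ruff configuration keys, not a copy of the project's actual file.

```toml
[tool.ruff]
line-length = 100
```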

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
cognee/tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests

Files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
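The naming and async-marker convention above can be sketched as follows. `cognify_stub` is a made-up stand-in, not a real cognee API; the file would be named `test_*.py` and the marker assumes pytest-asyncio is installed.

```python
import asyncio
import pytest

async def cognify_stub() -> str:
    """Hypothetical coroutine standing in for an awaited cognee call."""
    await asyncio.sleep(0)
    return "success"

@pytest.mark.asyncio
async def test_cognify_stub():
    # pytest-asyncio drives the event loop for tests marked this way.
    assert await cognify_stub() == "success"

# Outside pytest, the same coroutine can be driven directly:
result = asyncio.run(cognify_stub())
```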
🧠 Learnings (3)
📓 Common learnings
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-07T11:20:44.876Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.

Applied to files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
📚 Learning: 2024-10-16T07:06:28.669Z
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-16T07:06:28.669Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.

Applied to files:

  • cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
  • cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: End-to-End Tests / Test Feedback Enrichment
  • GitHub Check: End-to-End Tests / Test graph edge ingestion
  • GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
  • GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
  • GitHub Check: End-to-End Tests / Conversation sessions test (FS)
  • GitHub Check: End-to-End Tests / Test Entity Extraction
  • GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
  • GitHub Check: End-to-End Tests / Concurrent Subprocess access test
  • GitHub Check: End-to-End Tests / Test adding of label for data in Cognee
  • GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: Basic Tests / Run Simple Examples BAML
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: CLI Tests / CLI Integration Tests
  • GitHub Check: CLI Tests / CLI Functionality Tests

@Vasilije1990 Vasilije1990 merged commit 1d674d4 into dev Jan 16, 2026
170 of 175 checks passed
@Vasilije1990 Vasilije1990 deleted the COG-3146 branch January 16, 2026 14:20
