10 changes: 6 additions & 4 deletions CLAUDE.md
@@ -427,10 +427,12 @@ git checkout -b feature/your-feature-name

## Code Style

- - Ruff for linting and formatting (configured in `pyproject.toml`)
- - Line length: 100 characters
- - Pre-commit hooks run ruff automatically
- - Type hints encouraged (mypy checks enabled)
+ - **Formatter**: Ruff (configured in `pyproject.toml`)
+ - **Line length**: 100 characters
+ - **String quotes**: Use double quotes `"` not single quotes `'` (enforced by ruff-format)
+ - **Pre-commit hooks**: Run ruff linting and formatting automatically
+ - **Type hints**: Encouraged (mypy checks enabled)
+ - **Important**: Always run `pre-commit run --all-files` before committing to catch formatting issues

## Testing Strategy

17 changes: 11 additions & 6 deletions cognee/api/v1/cognify/cognify.py
@@ -252,7 +252,7 @@ async def get_default_tasks(  # TODO: Find out a better way to do this (Boris's
    chunk_size: int = None,
    config: Config = None,
    custom_prompt: Optional[str] = None,
-    chunks_per_batch: int = 100,
+    chunks_per_batch: int = None,
⚠️ Potential issue | 🟡 Minor

Fix type annotation for optional parameter.

The parameter `chunks_per_batch: int = None` should use `Optional[int]` for proper type checking, consistent with the `Optional` import at line 3 and the coding guidelines requiring type annotations for public APIs.

🔎 Proposed fix
-    chunks_per_batch: int = None,
+    chunks_per_batch: Optional[int] = None,
🤖 Prompt for AI Agents
In `cognee/api/v1/cognify/cognify.py` around line 255, the optional parameter is annotated as `chunks_per_batch: int = None` but should use `Optional[int]`; update the signature to `chunks_per_batch: Optional[int] = None` (the `Optional` import exists at line 3) so static type checkers and the public API follow the project's typing guidelines.

    **kwargs,
) -> list[Task]:
    if config is None:
@@ -272,12 +272,14 @@ async def get_default_tasks(  # TODO: Find out a better way to do this (Boris's
"ontology_config": {"ontology_resolver": get_default_ontology_resolver()}
}

if chunks_per_batch is None:
chunks_per_batch = 100

cognify_config = get_cognify_config()
embed_triplets = cognify_config.triplet_embedding

if chunks_per_batch is None:
chunks_per_batch = (
cognify_config.chunks_per_batch if cognify_config.chunks_per_batch is not None else 100
)

default_tasks = [
Task(classify_documents),
Task(
@@ -308,7 +310,7 @@ async def get_default_tasks(  # TODO: Find out a better way to do this (Boris's


async def get_temporal_tasks(
-    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
+    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
⚠️ Potential issue | 🟡 Minor

Fix type annotation for optional parameter.

The parameter `chunks_per_batch: int = None` should use `Optional[int]` for consistency with other functions in this file and proper type checking.

🔎 Proposed fix
-    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = None
+    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: Optional[int] = None
🤖 Prompt for AI Agents
In `cognee/api/v1/cognify/cognify.py` around line 313, the parameter annotation `chunks_per_batch: int = None` is incorrect for an optional parameter; change it to `Optional[int]` and ensure `typing.Optional` is imported (add `from typing import Optional` if not already present) so the signature matches other functions and type checkers accept `None` as a valid value.

) -> list[Task]:
    """
    Builds and returns a list of temporal processing tasks to be executed in sequence.
@@ -330,7 +332,10 @@ async def get_temporal_tasks(
        list[Task]: A list of Task objects representing the temporal processing pipeline.
    """
    if chunks_per_batch is None:
-        chunks_per_batch = 10
+        from cognee.modules.cognify.config import get_cognify_config
+
+        configured = get_cognify_config().chunks_per_batch
+        chunks_per_batch = configured if configured is not None else 10
Comment on lines 334 to +338
🛠️ Refactor suggestion | 🟠 Major

Remove redundant import.

The lazy import of `get_cognify_config` at line 335 is unnecessary since it's already imported at the module level (line 6).

🔎 Proposed fix
     if chunks_per_batch is None:
-        from cognee.modules.cognify.config import get_cognify_config
-
         configured = get_cognify_config().chunks_per_batch
         chunks_per_batch = configured if configured is not None else 10
🤖 Prompt for AI Agents
In `cognee/api/v1/cognify/cognify.py` around lines 334 to 338, remove the redundant lazy import of `get_cognify_config` inside the `if` block (the function is already imported at module level on line 6); delete the `from cognee.modules.cognify.config import get_cognify_config` line and leave the code to call `get_cognify_config()` directly so `chunks_per_batch` is assigned from the already-imported function.


    temporal_tasks = [
        Task(classify_documents),
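Taken together, these hunks give `chunks_per_batch` a three-level precedence: an explicit argument wins, then the `CognifyConfig.chunks_per_batch` setting, then a hard-coded fallback (100 for the default pipeline, 10 for the temporal one). A minimal sketch of that resolution order; `resolve_chunks_per_batch` is a hypothetical helper, not a function in this PR:

```python
from typing import Optional

def resolve_chunks_per_batch(
    explicit: Optional[int],
    configured: Optional[int],
    fallback: int = 100,  # the temporal pipeline would pass fallback=10
) -> int:
    # Explicit caller argument takes precedence over everything else.
    if explicit is not None:
        return explicit
    # Next, honor CognifyConfig.chunks_per_batch (e.g. from the environment).
    if configured is not None:
        return configured
    # Finally, fall back to the pipeline's historical default.
    return fallback

assert resolve_chunks_per_batch(50, 20) == 50       # argument wins
assert resolve_chunks_per_batch(None, 20) == 20     # config wins
assert resolve_chunks_per_batch(None, None) == 100  # fallback
```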
6 changes: 6 additions & 0 deletions cognee/api/v1/cognify/routers/get_cognify_router.py
@@ -46,6 +46,11 @@ class CognifyPayloadDTO(InDTO):
        examples=[[]],
        description="Reference to one or more previously uploaded ontologies",
    )
+    chunks_per_batch: Optional[int] = Field(
+        default=None,
+        description="Number of chunks to process per task batch in Cognify (overrides default).",
+        examples=[10, 20, 50, 100],
+    )


def get_cognify_router() -> APIRouter:
@@ -146,6 +151,7 @@ async def cognify(payload: CognifyPayloadDTO, user: User = Depends(get_authentic
            config=config_to_use,
            run_in_background=payload.run_in_background,
            custom_prompt=payload.custom_prompt,
+            chunks_per_batch=payload.chunks_per_batch,
        )

    # If any cognify run errored return JSONResponse with proper error status code
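With the DTO field and the pass-through above, API callers can set the batch size per request. A hedged sketch of such a call; the route path, port, and auth header are assumptions, not confirmed by this diff:

```python
import httpx

payload = {
    "datasets": ["my_dataset"],  # hypothetical dataset name
    "chunks_per_batch": 50,      # the new optional override
}

# Assumed local server and route; adjust to your deployment.
response = httpx.post(
    "http://localhost:8000/api/v1/cognify",
    json=payload,
    headers={"Authorization": "Bearer <token>"},
)
response.raise_for_status()
```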
6 changes: 6 additions & 0 deletions cognee/cli/commands/cognify_command.py
@@ -62,6 +62,11 @@ def configure_parser(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument(
            "--verbose", "-v", action="store_true", help="Show detailed progress information"
        )
+        parser.add_argument(
+            "--chunks-per-batch",
+            type=int,
+            help="Number of chunks to process per task batch (try 50 for large single documents).",
+        )

    def execute(self, args: argparse.Namespace) -> None:
        try:
@@ -111,6 +116,7 @@ async def run_cognify():
                chunk_size=args.chunk_size,
                ontology_file_path=args.ontology_file,
                run_in_background=args.background,
+                chunks_per_batch=getattr(args, "chunks_per_batch", None),
            )
            return result
        except Exception as e:
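The `getattr(args, "chunks_per_batch", None)` guard is belt-and-braces: argparse already maps `--chunks-per-batch` to `args.chunks_per_batch` and defaults it to `None` when the flag is omitted. A small self-contained sketch of that behavior (not the PR's code):

```python
import argparse

parser = argparse.ArgumentParser(prog="cognee cognify")
parser.add_argument(
    "--chunks-per-batch",
    type=int,
    help="Number of chunks to process per task batch.",
)

# Dashes become underscores on the namespace; omitted flags default to None.
assert parser.parse_args([]).chunks_per_batch is None
assert parser.parse_args(["--chunks-per-batch", "50"]).chunks_per_batch == 50
```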
2 changes: 2 additions & 0 deletions cognee/modules/cognify/config.py
@@ -9,13 +9,15 @@ class CognifyConfig(BaseSettings):
    classification_model: object = DefaultContentPrediction
    summarization_model: object = SummarizedContent
    triplet_embedding: bool = False
+    chunks_per_batch: Optional[int] = None
🛠️ Refactor suggestion | 🟠 Major

Add field documentation.

The new `chunks_per_batch` field lacks a docstring or `Field` description explaining its purpose, valid range, and environment variable name. Based on the coding guidelines, undocumented fields are considered incomplete.

🔎 Proposed documentation addition
+    """
+    chunks_per_batch: Number of chunks to process per task batch in Cognify.
+    Can be configured via CHUNKS_PER_BATCH environment variable.
+    Higher values (e.g., 50) can improve processing speed for large documents,
+    but may cause max_token errors if set too high. Defaults to 100 for default tasks
+    and 10 for temporal tasks when not specified.
+    """
     chunks_per_batch: Optional[int] = None

Alternatively, use Pydantic's Field with description:

-    chunks_per_batch: Optional[int] = None
+    chunks_per_batch: Optional[int] = Field(
+        default=None,
+        description="Number of chunks to process per task batch (configurable via CHUNKS_PER_BATCH env var)"
+    )
🤖 Prompt for AI Agents
In `cognee/modules/cognify/config.py` around line 12, the new `chunks_per_batch` field is missing documentation; add a concise `Field` description (or a docstring above the attribute) that states its purpose (how it controls batching of chunks), the valid range (e.g., positive integer limits or `None` meaning no batching), and the corresponding environment variable name if applicable; use Pydantic's `Field(description="...")` to include this metadata and update any README or env docs to match.

    model_config = SettingsConfigDict(env_file=".env", extra="allow")

    def to_dict(self) -> dict:
        return {
            "classification_model": self.classification_model,
            "summarization_model": self.summarization_model,
            "triplet_embedding": self.triplet_embedding,
+            "chunks_per_batch": self.chunks_per_batch,
        }


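Because `CognifyConfig` is a pydantic `BaseSettings`, the new field can be populated from the environment without code changes: pydantic-settings matches the field name to a `CHUNKS_PER_BATCH` variable case-insensitively. A minimal sketch with a stand-in class (requires the `pydantic-settings` package):

```python
import os
from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict

class CognifyConfigSketch(BaseSettings):
    # Stand-in for cognee's CognifyConfig; only the new field is reproduced.
    chunks_per_batch: Optional[int] = None
    model_config = SettingsConfigDict(env_file=".env", extra="allow")

os.environ["CHUNKS_PER_BATCH"] = "50"
assert CognifyConfigSketch().chunks_per_batch == 50  # parsed from the env var
```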
1 change: 1 addition & 0 deletions cognee/tests/cli_tests/cli_unit_tests/test_cli_commands.py
@@ -238,6 +238,7 @@ def test_execute_basic_cognify(self, mock_asyncio_run):
            ontology_file_path=None,
            chunker=TextChunker,
            run_in_background=False,
+            chunks_per_batch=None,
        )

@patch("cognee.cli.commands.cognify_command.asyncio.run")
3 changes: 3 additions & 0 deletions cognee/tests/cli_tests/cli_unit_tests/test_cli_edge_cases.py
@@ -262,6 +262,7 @@ def test_cognify_invalid_chunk_size(self, mock_asyncio_run):
            ontology_file_path=None,
            chunker=TextChunker,
            run_in_background=False,
+            chunks_per_batch=None,
        )

@patch("cognee.cli.commands.cognify_command.asyncio.run", side_effect=_mock_run)
Expand Down Expand Up @@ -295,6 +296,7 @@ def test_cognify_nonexistent_ontology_file(self, mock_asyncio_run):
ontology_file_path="/nonexistent/path/ontology.owl",
chunker=TextChunker,
run_in_background=False,
chunks_per_batch=None,
)

@patch("cognee.cli.commands.cognify_command.asyncio.run")
Expand Down Expand Up @@ -373,6 +375,7 @@ def test_cognify_empty_datasets_list(self, mock_asyncio_run):
            ontology_file_path=None,
            chunker=TextChunker,
            run_in_background=False,
+            chunks_per_batch=None,
        )

