Feat/minimal custom labels #1937

Anushre2005 · 2025-12-27T13:21:41Z

Description

Acceptance Criteria

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Code refactoring
Performance improvement
Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

I have tested my changes thoroughly before submitting this PR
This PR contains minimal changes necessary to address the issue/feature
My code follows the project's coding standards and style guidelines
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if applicable)
All new and existing tests pass
I have searched existing PRs to ensure this change hasn't been submitted already
I have linked any relevant issues in the description
My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

New Features
- Added optional label support for datasets and data items. Labels are now extracted during data ingestion, stored persistently, and included in API responses.

coderabbitai · 2025-12-27T13:21:53Z

Walkthrough

Adds an optional label field to the Data model across three layers: the core data model with database column support, the API serialization layer via DataDTO, and the data ingestion pipeline to extract and propagate labels from input items.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Data Model `cognee/modules/data/models/Data.py` | Adds nullable `label` string field as a new database column; includes label in `to_json()` serialization output |
| API/DTO Layer `cognee/api/v1/datasets/routers/get_datasets_router.py` | Adds optional `label` field to DataDTO for API response serialization |
| Ingestion Pipeline `cognee/tasks/ingestion/ingest_data.py` | Extracts label from input data items; propagates label through data creation and update flows in ingestion process |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

[Feature]: Custom label names #1769: Directly related—implements the same feature of adding nullable label field to Data model, propagating through ingestion, and exposing in API serialization.

Possibly related PRs

feat: Add custom label by contributor: apenade #1913: Adds identical label field support across Data model, Data.to_json, DataDTO, and ingest_data ingestion flow.

Suggested labels

core-team

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Feat/minimal custom labels' clearly summarizes the main feature being added—custom labels for data items—and accurately reflects the PR's primary objective. |
| Description check | ✅ Passed | The PR description covers all required template sections: a clear human-written explanation of changes, acceptance criteria with verification steps, feature type selection, and pre-submission checklist completion. However, testing was incomplete (no new tests added, full test suite not run locally). |

github-actions

Hello @Anushre2005, thank you for submitting a PR! We will respond as soon as possible.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

cognee/modules/data/models/Data.py (1)
49-60: Consider adding a docstring to to_json method.

Per coding guidelines, undocumented function definitions are considered incomplete. A brief docstring would improve maintainability.
🔎 Proposed docstring
 def to_json(self) -> dict:
+    """Serialize the Data instance to a JSON-compatible dictionary."""
     return {
         "id": str(self.id),
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
316-328: Update docstring to document the new label field.

The response documentation should include the new label field for API completeness.
🔎 Proposed docstring update
         ## Response
         Returns a list of data objects containing:
         - **id**: Unique data item identifier
         - **name**: Data item name
         - **created_at**: When the data was added
         - **updated_at**: When the data was last updated
         - **extension**: File extension
         - **mime_type**: MIME type of the data
         - **raw_data_location**: Storage location of the raw data
+        - **label**: Optional user-provided label for the data item

         ## Error Codes
cognee/tasks/ingestion/ingest_data.py (1)
25-32: Consider adding a docstring to ingest_data function.

Per coding guidelines, undocumented function definitions are considered incomplete. Given the function's complexity and public-facing nature, a brief docstring documenting parameters and return value would improve maintainability.
🔎 Proposed docstring
 async def ingest_data(
     data: Any,
     dataset_name: str,
     user: User,
     node_set: Optional[List[str]] = None,
     dataset_id: UUID = None,
     preferred_loaders: dict[str, dict[str, Any]] = None,
 ):
+    """Ingest data items into a dataset.
+
+    Args:
+        data: Data item(s) to ingest (single item or list).
+        dataset_name: Name of the target dataset.
+        user: User performing the ingestion.
+        node_set: Optional list of node identifiers for the data.
+        dataset_id: Optional existing dataset UUID to add data to.
+        preferred_loaders: Optional loader configuration by file type.
+
+    Returns:
+        List of Data objects created or updated during ingestion.
+    """
     if not user:
         user = await get_default_user()

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1061258 and e072407.

📒 Files selected for processing (3)

cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/modules/data/models/Data.py
cognee/tasks/ingestion/ingest_data.py

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/modules/data/models/Data.py
cognee/tasks/ingestion/ingest_data.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.

As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.

As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.

As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.

As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/modules/data/models/Data.py
cognee/tasks/ingestion/ingest_data.py

cognee/api/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Public APIs should be type-annotated in Python where practical

Files:

cognee/api/v1/datasets/routers/get_datasets_router.py

cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/modules/data/models/Data.py
cognee/tasks/ingestion/ingest_data.py

cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

cognee/modules/data/models/Data.py
cognee/tasks/ingestion/ingest_data.py

🔇 Additional comments (5)

cognee/modules/data/models/Data.py (1)

32-32: LGTM! The new label column is correctly defined.

The nullable string column is appropriate for an optional, user-provided label. This maintains backward compatibility with existing data.
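
The backward-compatibility point can be illustrated with a small sketch. It uses the standard-library `sqlite3` module as a stand-in for the project's real database layer, and the table and column names are assumptions chosen for illustration, not taken from the PR. It shows that adding a nullable column leaves existing rows readable, with the new field coming back as `None`.

```python
import sqlite3

# In-memory database standing in for the real store; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO data (id, name) VALUES ('1', 'old.txt')")

# Adding a nullable column needs no default value and no backfill of old rows.
conn.execute("ALTER TABLE data ADD COLUMN label TEXT")

# New rows may supply a label; rows written before the change simply have NULL.
conn.execute("INSERT INTO data (id, name, label) VALUES ('2', 'new.txt', 'invoice')")

rows = conn.execute("SELECT id, name, label FROM data ORDER BY id").fetchall()
conn.close()
```

Here the row inserted before the schema change reads back with `label` set to `None`, which is why callers that never pass a label are unaffected.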

cognee/api/v1/datasets/routers/get_datasets_router.py (1)

44-54: LGTM! The label field is correctly added to the DTO.

The Optional[str] = None typing is appropriate and aligns with the nullable column in the Data model.
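
As a rough illustration of why the `Optional[str] = None` default matters, the sketch below uses a standard-library dataclass as a stand-in for the DTO. `DataDTOSketch` and its fields are hypothetical; the real `DataDTO` definition is not shown in this review.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataDTOSketch:
    """Hypothetical stand-in for DataDTO, reduced to three fields."""

    id: str
    name: str
    label: Optional[str] = None  # optional, so records without a label still serialize


# A record stored before labels existed: no label is supplied.
unlabeled = DataDTOSketch(id="1", name="old.txt")

# A record ingested with a label.
labeled = DataDTOSketch(id="2", name="new.txt", label="invoice")

unlabeled_payload = asdict(unlabeled)
labeled_payload = asdict(labeled)
```

Because the field defaults to `None`, both payloads share the same set of keys, and clients that ignore `label` keep working.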

cognee/tasks/ingestion/ingest_data.py (3)

123-125: LGTM! Label extraction is safely handled.

Using getattr(data_item, "label", None) correctly handles cases where the label attribute may not exist, returning None as the fallback. The inline comment adds clarity.
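
A minimal sketch of that extraction pattern follows. `LabeledItem` and `extract_label` are hypothetical names introduced for illustration; only the `getattr(data_item, "label", None)` call itself comes from the review comment.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LabeledItem:
    """Hypothetical wrapper pairing raw content with an optional label."""

    content: str
    label: Optional[str] = None


def extract_label(data_item: Any) -> Optional[str]:
    # Falls back to None when the item has no `label` attribute at all,
    # for example a plain string or a file path.
    return getattr(data_item, "label", None)


from_plain_string = extract_label("some raw text")
from_labeled_item = extract_label(LabeledItem("some raw text", label="invoice"))
from_unlabeled_item = extract_label(LabeledItem("some raw text"))
```

The three-argument form of `getattr` avoids an `AttributeError` for inputs that were never designed to carry a label, so existing callers need no changes.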

144-144: LGTM! Label is correctly propagated in the update path.

The label assignment is consistent with other field updates in this block.

157-177: LGTM! Label is correctly included in the Data constructor.

The label=label parameter is properly positioned and follows the same pattern as node_set handling.
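
The create and update paths described in these comments can be sketched roughly as below. This is an in-memory approximation under stated assumptions: `DataRecord`, `upsert`, and the dictionary store are hypothetical, and the real `ingest_data` works against the database and handles many more fields.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, Optional


@dataclass
class DataRecord:
    """Hypothetical, reduced stand-in for the Data model."""

    id: str
    name: str
    label: Optional[str] = None


def upsert(store: Dict[str, DataRecord], data_id: str, name: str, data_item: Any) -> DataRecord:
    label = getattr(data_item, "label", None)
    existing = store.get(data_id)
    if existing is not None:
        # Update path: the label is assigned alongside the other fields.
        existing.name = name
        existing.label = label
        return existing
    # Create path: the label is passed to the constructor.
    record = DataRecord(id=data_id, name=name, label=label)
    store[data_id] = record
    return record


store: Dict[str, DataRecord] = {}
created = upsert(store, "1", "a.txt", SimpleNamespace(label="invoice"))
label_after_create = created.label

# Re-ingesting the same id with an item that has no label attribute.
updated = upsert(store, "1", "a.txt", "plain text")
label_after_update = updated.label
```

In this sketch, re-ingesting an item without a label resets the stored label to `None`, because the update path assigns whatever was extracted. Whether the PR behaves the same way cannot be determined from the review comments alone.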

Anushre2005 · 2026-01-01T13:20:22Z

Hi maintainers
This PR adds minimal optional label support across the data model, ingestion, and API layers.
I’ve kept the change scoped and non-breaking.
Happy to rebase, adjust, or close this PR if it overlaps with existing work — just let me know.
Thanks for your time!

Vasilije1990 · 2026-01-09T06:46:39Z

@Anushre2005 please check contributing.md, open against right branch and provide screenshots of tests

Vasilije1990 · 2026-01-16T14:24:58Z

Already implemented. Closing. We will add this to API

Anushre2005 added 3 commits December 27, 2025 17:57

feat(data): add optional label field to Data model

e29a7df

feat(ingestion): persist optional label when provided

3da1156

feat(api): include label in dataset data response

e072407

github-actions bot reviewed Dec 27, 2025

coderabbitai bot reviewed Dec 27, 2025

Vasilije1990 closed this Jan 16, 2026

2 participants
