-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Feat/minimal custom labels #1937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/minimal custom labels #1937
Conversation
WalkthroughAdds an optional Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Possibly related PRs
Suggested labels
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @Anushre2005, thank you for submitting a PR! We will respond as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
cognee/modules/data/models/Data.py (1)
49-60: Consider adding a docstring toto_jsonmethod.Per coding guidelines, undocumented function definitions are considered incomplete. A brief docstring would improve maintainability.
🔎 Proposed docstring
def to_json(self) -> dict: + """Serialize the Data instance to a JSON-compatible dictionary.""" return { "id": str(self.id),cognee/api/v1/datasets/routers/get_datasets_router.py (1)
316-328: Update docstring to document the newlabelfield.The response documentation should include the new
labelfield for API completeness.🔎 Proposed docstring update
## Response Returns a list of data objects containing: - **id**: Unique data item identifier - **name**: Data item name - **created_at**: When the data was added - **updated_at**: When the data was last updated - **extension**: File extension - **mime_type**: MIME type of the data - **raw_data_location**: Storage location of the raw data + - **label**: Optional user-provided label for the data item ## Error Codescognee/tasks/ingestion/ingest_data.py (1)
25-32: Consider adding a docstring toingest_datafunction.Per coding guidelines, undocumented function definitions are considered incomplete. Given the function's complexity and public-facing nature, a brief docstring documenting parameters and return value would improve maintainability.
🔎 Proposed docstring
async def ingest_data( data: Any, dataset_name: str, user: User, node_set: Optional[List[str]] = None, dataset_id: UUID = None, preferred_loaders: dict[str, dict[str, Any]] = None, ): + """Ingest data items into a dataset. + + Args: + data: Data item(s) to ingest (single item or list). + dataset_name: Name of the target dataset. + user: User performing the ingestion. + node_set: Optional list of node identifiers for the data. + dataset_id: Optional existing dataset UUID to add data to. + preferred_loaders: Optional loader configuration by file type. + + Returns: + List of Data objects created or updated during ingestion. + """ if not user: user = await get_default_user()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
cognee/api/v1/datasets/routers/get_datasets_router.pycognee/modules/data/models/Data.pycognee/tasks/ingestion/ingest_data.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/api/v1/datasets/routers/get_datasets_router.pycognee/modules/data/models/Data.pycognee/tasks/ingestion/ingest_data.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/api/v1/datasets/routers/get_datasets_router.pycognee/modules/data/models/Data.pycognee/tasks/ingestion/ingest_data.py
cognee/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Public APIs should be type-annotated in Python where practical
Files:
cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/api/v1/datasets/routers/get_datasets_router.pycognee/modules/data/models/Data.pycognee/tasks/ingestion/ingest_data.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/modules/data/models/Data.pycognee/tasks/ingestion/ingest_data.py
🔇 Additional comments (5)
cognee/modules/data/models/Data.py (1)
32-32: LGTM! The newlabelcolumn is correctly defined.The nullable string column is appropriate for an optional, user-provided label. This maintains backward compatibility with existing data.
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
44-54: LGTM! Thelabelfield is correctly added to the DTO.The
Optional[str] = Nonetyping is appropriate and aligns with the nullable column in the Data model.cognee/tasks/ingestion/ingest_data.py (3)
123-125: LGTM! Label extraction is safely handled.Using
getattr(data_item, "label", None)correctly handles cases where the label attribute may not exist, returningNoneas the fallback. The inline comment adds clarity.
144-144: LGTM! Label is correctly propagated in the update path.The label assignment is consistent with other field updates in this block.
157-177: LGTM! Label is correctly included in the Data constructor.The
label=labelparameter is properly positioned and follows the same pattern asnode_sethandling.
|
Hi maintainers |
|
@Anushre2005 please check contributing.md, open against right branch and provide screenshots of tests |
|
Already implemented. Closing. We will add this to API |
Description
Acceptance Criteria
Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.