Skip to content

Conversation

Dishant1804
Copy link
Collaborator

Proposed change

Resolves #1873

  • Define Context model with required fields and generic relation to OWASP entities.
  • Update Chunk model to reference Context.
  • Write migration to move existing content_type and object_id from Chunk to Context.
  • Update any code paths that rely on Chunk.content_object to use Chunk.context.
  • Add tests to ensure data integrity and backward compatibility.

Checklist

  • I've read and followed the contributing guidelines.
  • I've run make check-test locally; all checks and tests passed.

Copy link
Contributor

coderabbitai bot commented Jul 28, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a Context system powering richer, entity-aware RAG results with enhanced source metadata.
    • Added management commands to update context and chunks for chapters, committees, events, projects, and Slack messages.
    • New extractors generate high-quality prose/metadata per entity.
    • Admin: Context listing with filters/search; improved Chunk admin filters.
  • Refactor

    • Moved chunks to context-based storage; unified commands on new base classes; retriever now entity-driven.
    • Improved embeddings creation/regeneration reliability.
  • Chores

    • Updated Makefile to “update” targets; lowered default similarity threshold to 0.4.
  • Tests

    • Extensive new and updated test coverage across RAG, commands, models, and extractors.

Walkthrough

Add a new Context model and migrate Chunk to reference Context; update retrieval to use context.entity; refactor chunk/context creation into shared Base commands and extractors; update admin, Makefile, utilities, RAG tooling, and tests; adjust constants and error handling.

Changes

Cohort / File(s) Summary
Models & Migrations
backend/apps/ai/models/context.py, backend/apps/ai/models/chunk.py, backend/apps/ai/migrations/0005_*, .../0006_*, .../0007_*, .../0008_*, .../0009_*
Add Context model (content, entity_type/entity_id GenericFK, source); remove GenericFK fields from Chunk, add context = ForeignKey(Context, ...); update unique constraints and rename fields via migrations.
Admin
backend/apps/ai/admin.py
Register Context with ContextAdmin; update ChunkAdmin list_display/list_filter/search_fields to surface context/entity fields.
Retriever & RAG
backend/apps/ai/agent/tools/rag/retriever.py, backend/apps/ai/agent/tools/rag/rag_tool.py
Retriever now uses chunk.context / context.entity; rename SUPPORTED_CONTENT_TYPES → SUPPORTED_ENTITY_TYPES; adjust signatures to accept entity; RagTool constructor no longer swallows init exceptions.
Utilities & Constants
backend/apps/ai/common/utils.py, backend/apps/ai/common/constants.py
create_chunks_and_embeddings signature changed to (chunk_texts, context, openai_client, ...); added regenerate_chunks_for_context; tightened OpenAI error handling; DEFAULT_SIMILARITY_THRESHOLD changed 0.5→0.4.
Base command infra
backend/apps/ai/common/base/__init__.py, .../ai_command.py, .../chunk_command.py, .../context_command.py
Add BaseAICommand, BaseChunkCommand, BaseContextCommand to centralize OpenAI setup, batching, and chunk/context workflows.
Extractors
backend/apps/ai/common/extractors/*
Add extractors for chapter, committee, event, project producing (prose, metadata) tuples and package init.
Management commands
backend/apps/ai/management/commands/ai_update_*_context.py, .../ai_update_*_chunks.py, .../ai_create_*.py, .../ai_update_slack_message_*
Convert many commands to subclass the new base classes, delegate extraction to extractors, standardize CLI and batch behavior, add Slack-specific hooks.
Makefile
backend/apps/ai/Makefile
Replace ai-create-* and ai-run-rag-tool targets with ai-update--chunks and ai-update--context targets.
Tests
backend/tests/apps/ai/...
Add/update extensive tests for Context model, Chunk changes, utils, base commands, extractors, management commands, RAG components; update OpenAI error logging expectations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues

Objective Addressed Explanation
Define Context model with GenericFK and metadata (#1873)
Update Chunk to reference Context instead of GenericFK (#1873)
Migrate existing Chunk.content_type/object_id data to Context (#1873) Migration files alter schemas and rename fields but no explicit data-migration RunPython copying Chunk rows into new Context is present in diff.
Update code paths to use Chunk.context (#1873)

Out-of-scope changes

Code Change (file) Explanation
Removal of try/except around RagTool init (backend/apps/ai/agent/tools/rag/rag_tool.py) Behavioral error-handling change not required by the Context/Chunk migration objective.
DEFAULT_SIMILARITY_THRESHOLD change (backend/apps/ai/common/constants.py) Configuration tuning unrelated to schema migration or Context model implementation.
Makefile target renames and removal of ai-run-rag-tool (backend/apps/ai/Makefile) Infrastructure/Makefile updates not required by the linked issue's schema/model tasks.

Possibly related PRs

Suggested labels

nestbot

Suggested reviewers

  • kasya

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Dishant1804 Dishant1804 changed the title Context model NestBot AI Assistant Contexts Jul 28, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (7)
backend/apps/ai/models/context.py (1)

23-33: Improve the string representation method.

The current implementation could raise AttributeError and has redundant logic.

Consider this more defensive approach:

 def __str__(self):
     """Human readable representation."""
-    entity = (
-        getattr(self.content_object, "name", None)
-        or getattr(self.content_object, "key", None)
-        or str(self.content_object)
-    )
+    if self.content_object:
+        entity = (
+            getattr(self.content_object, "name", None)
+            or getattr(self.content_object, "key", None)
+            or str(self.content_object)
+        )
+    else:
+        entity = "No linked object"
+        
     return (
         f"{self.content_type.model if self.content_type else 'None'} {entity}: "
         f"{self.generated_text[:50]}"
     )
backend/apps/ai/management/commands/ai_run_rag_tool.py (2)

34-39: Improve the help text for content-types argument.

The current help text "Content types to filter by (e.g., project chapter)" could be more descriptive.

Consider improving the help text:

-            help="Content types to filter by (e.g., project chapter)",
+            help="Content types to filter by (e.g., 'project', 'chapter', 'committee')",

64-70: Add input validation and improve output formatting.

Consider adding validation for argument values and better output formatting.

Add validation before processing:

+        # Validate arguments
+        if not (0.0 <= options["threshold"] <= 1.0):
+            self.stderr.write(self.style.ERROR("Threshold must be between 0.0 and 1.0"))
+            return
+        if options["limit"] <= 0:
+            self.stderr.write(self.style.ERROR("Limit must be positive"))
+            return
+            
         self.stdout.write("\nProcessing query...")
         result = rag_tool.query(
             question=options["query"],
             limit=options["limit"],
             similarity_threshold=options["threshold"],
             content_types=options["content_types"],
         )
-        self.stdout.write(f"\nAnswer: {result}")
+        self.stdout.write(self.style.SUCCESS(f"\nAnswer: {result}"))
backend/tests/apps/ai/models/context_test.py (1)

10-15: Consider extracting shared test utility.

The create_model_mock function is duplicated from chunk_test.py. Consider extracting this to a shared test utility module to avoid code duplication.

You could create a shared test utilities module:

# backend/tests/apps/ai/test_utils.py
from unittest.mock import Mock

def create_model_mock(model_class):
    mock = Mock(spec=model_class)
    mock._state = Mock()
    mock.pk = 1
    mock.id = 1
    return mock

Then import it in both test files.

backend/apps/ai/agent/tools/rag/generator.py (2)

16-32: Fix grammatical issues in SYSTEM_PROMPT.

The system prompt contains several grammatical inconsistencies that should be corrected for professionalism:

-3. you will answer questions only related to OWASP and within the scope of OWASP.
+3. You will answer questions only related to OWASP and within the scope of OWASP.
 4. Be concise and directly answer the user's query.
 5. Provide the necessary link if the context contains a URL.
-6. If there is any query based on location, you need to look for latitude and longitude in the
-context and provide the nearest OWASP chapter based on that.
+6. If there is any query based on location, you need to look for latitude and longitude in the context and provide the nearest OWASP chapter based on that.
 7. You can ask for more information if the query is very personalized or user-centric.
-8. after trying all of the above, If the context does not contain the information or you think that
-it is out of scope for OWASP, you MUST state: "please ask question related to OWASP."
+8. After trying all of the above, if the context does not contain the information or you think that it is out of scope for OWASP, you MUST state: "please ask question related to OWASP."

115-119: Add query context to exception logging.

When logging OpenAI API errors, include relevant context to aid debugging.

         except openai.OpenAIError:
-            logger.exception("OpenAI API error")
+            logger.exception("OpenAI API error for query: %s", query[:100])
             answer = "I'm sorry, I'm currently unable to process your request."
backend/apps/ai/agent/tools/rag/retriever.py (1)

75-183: Refactor get_additional_context for better maintainability.

This method is over 100 lines with repetitive patterns. Consider extracting content-type-specific handlers.

def get_additional_context(self, content_object, content_type: str) -> dict[str, Any]:
    """Get additional context information based on content type."""
    clean_content_type = content_type.split(".")[-1] if "." in content_type else content_type
    
    handlers = {
        "chapter": self._get_chapter_context,
        "project": self._get_project_context,
        "event": self._get_event_context,
        "committee": self._get_committee_context,
        "message": self._get_message_context,
    }
    
    handler = handlers.get(clean_content_type)
    if handler:
        context = handler(content_object)
        return {k: v for k, v in context.items() if v is not None}
    
    return {}

def _get_chapter_context(self, content_object) -> dict[str, Any]:
    """Extract context for chapter objects."""
    return {
        "location": getattr(content_object, "suggested_location", None),
        "region": getattr(content_object, "region", None),
        # ... rest of the attributes
    }

This approach would make the code more modular and easier to extend with new content types.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00437c0 and e120962.

📒 Files selected for processing (13)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/admin.py (2 hunks)
  • backend/apps/ai/agent/tools/rag/generator.py (1 hunks)
  • backend/apps/ai/agent/tools/rag/rag_tool.py (1 hunks)
  • backend/apps/ai/agent/tools/rag/retriever.py (1 hunks)
  • backend/apps/ai/common/constants.py (1 hunks)
  • backend/apps/ai/common/utils.py (3 hunks)
  • backend/apps/ai/management/commands/ai_run_rag_tool.py (1 hunks)
  • backend/apps/ai/migrations/0005_context_alter_chunk_unique_together_chunk_context_and_more.py (1 hunks)
  • backend/apps/ai/models/chunk.py (4 hunks)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/models/chunk_test.py (5 hunks)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (8)
backend/apps/ai/management/commands/ai_run_rag_tool.py (1)
backend/apps/ai/agent/tools/rag/rag_tool.py (2)
  • RagTool (13-72)
  • query (38-72)
backend/apps/ai/admin.py (1)
backend/apps/ai/models/context.py (1)
  • Context (10-33)
backend/tests/apps/ai/models/chunk_test.py (2)
backend/apps/ai/models/context.py (1)
  • Context (10-33)
backend/apps/ai/models/chunk.py (2)
  • Chunk (12-74)
  • update_data (47-74)
backend/apps/ai/common/utils.py (1)
backend/apps/ai/models/context.py (1)
  • Context (10-33)
backend/apps/ai/agent/tools/rag/retriever.py (3)
backend/apps/ai/models/chunk.py (1)
  • Chunk (12-74)
backend/apps/ai/agent/tools/rag/rag_tool.py (1)
  • query (38-72)
backend/apps/slack/models/message.py (1)
  • text (83-85)
backend/tests/apps/ai/models/context_test.py (3)
backend/apps/ai/models/context.py (1)
  • Context (10-33)
backend/tests/apps/ai/models/chunk_test.py (1)
  • create_model_mock (7-12)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/ai/models/context.py (2)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/ai/models/chunk.py (1)
  • Meta (15-18)
backend/apps/ai/models/chunk.py (2)
backend/apps/ai/models/context.py (1)
  • Context (10-33)
backend/apps/common/utils.py (1)
  • truncate (164-176)
🔇 Additional comments (27)
backend/apps/ai/Makefile (1)

21-23: LGTM!

The new Makefile target follows the established pattern and provides a convenient way to invoke the RAG tool management command.

backend/apps/ai/common/constants.py (1)

4-5: LGTM!

The new constants have sensible default values and follow the established naming conventions. The retrieval limit of 5 and similarity threshold of 0.5 are reasonable defaults for RAG functionality.

backend/apps/ai/common/utils.py (2)

12-12: LGTM!

The import of the Context model is correctly added to support the new data model structure.


63-63: LGTM!

The chunk creation correctly uses the new context-based association instead of the previous direct content_object relationship.

backend/apps/ai/models/context.py (1)

19-21: LGTM!

The Meta configuration is appropriate with a clear database table name and verbose name.

backend/apps/ai/management/commands/ai_run_rag_tool.py (1)

15-51: LGTM! Well-structured arguments with good defaults.

The argument definitions are comprehensive and use appropriate defaults from the constants module. The help text is clear and informative.

backend/apps/ai/admin.py (4)

6-6: LGTM!

Proper import of the new Context model.


9-20: Well-configured admin interface for Context model.

The ContextAdmin provides comprehensive management capabilities with appropriate fields for display, search, and filtering. The configuration aligns well with the Context model's structure.


29-32: Admin updates correctly reflect the model refactoring.

The changes properly adapt the ChunkAdmin to use the new context relationship instead of the previous generic foreign key approach. The list_filter on context__content_type maintains the ability to filter by content type through the related Context model.


35-35: LGTM!

Proper registration of the Context model with its admin class.

backend/tests/apps/ai/models/chunk_test.py (5)

4-4: LGTM!

Proper import of the new Context model for testing.


17-26: Test properly updated for new Context relationship.

The test_str_method correctly mocks the context attribute instead of the previous generic foreign key fields, maintaining test coverage for the string representation functionality.


57-58: Update data tests correctly adapted to new Context model.

The test properly uses Context mocks and validates the context parameter in the update_data method calls, reflecting the model refactoring accurately.

Also applies to: 67-67, 69-69


129-129: Unique constraint updated correctly.

The unique constraint test properly reflects the change from ("content_type", "object_id", "text") to ("context", "text"), aligning with the model schema changes.


131-137: Good test coverage for the new context relationship.

The test validates the foreign key relationship properties including the related model, null/blank settings, ensuring the relationship is properly configured.

backend/apps/ai/migrations/0005_context_alter_chunk_unique_together_chunk_context_and_more.py (1)

14-42: Context model creation looks good.

The Context model is properly defined with all necessary fields including the generic foreign key components (content_type, object_id) and additional metadata fields.

backend/apps/ai/agent/tools/rag/rag_tool.py (2)

16-36: Good initialization with proper error handling.

The constructor properly initializes the retriever and generator components with appropriate error handling and logging. The exception handling ensures any initialization failures are properly logged and re-raised.


58-64: Well-structured retrieval logic.

The retrieval step properly passes all the necessary parameters and includes appropriate logging for debugging purposes.

backend/tests/apps/ai/models/context_test.py (4)

19-67: Excellent test coverage for string representation.

The tests comprehensively cover various scenarios for the __str__ method including edge cases like empty text, exact 50-character limit, and text truncation. This ensures the string representation works correctly in all situations.


69-100: Thorough field property validation.

The tests properly validate all field properties including types, constraints, defaults, and relationships. This ensures the model is correctly configured.


115-151: Good coverage of model manager operations.

The tests properly validate the standard Django model manager operations with appropriate mocking, ensuring the Context model integrates correctly with Django's ORM.


153-176: Proper validation testing.

The tests cover both successful validation and validation error scenarios, ensuring the model's validation logic works correctly.

backend/apps/ai/agent/tools/rag/generator.py (1)

35-52: LGTM!

The initialization method properly validates the API key and provides clear error messaging.

backend/apps/ai/models/chunk.py (2)

26-29: LGTM!

The __str__ method properly handles nullable context and provides clear representation.


46-74: All Chunk.update_data Callers Updated

No remaining calls use the old content_object parameter. Verified in:

  • backend/apps/ai/common/utils.py
  • backend/tests/apps/ai/models/chunk_test.py
  • backend/apps/ai/admin.py
  • backend/apps/ai/agent/tools/rag/retriever.py
  • All ai_create_*_chunks management commands

No further changes are needed.

backend/apps/ai/agent/tools/rag/retriever.py (2)

44-66: LGTM!

Excellent error handling with appropriate logging and re-raising of exceptions.


255-278: LGTM!

Clean implementation with good handling of singular/plural forms.

@Dishant1804 Dishant1804 marked this pull request as ready for review July 29, 2025 18:46
@Dishant1804 Dishant1804 requested a review from arkid15r as a code owner July 29, 2025 18:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 981277a and 8b46f08.

📒 Files selected for processing (7)
  • backend/apps/ai/admin.py (2 hunks)
  • backend/apps/ai/common/utils.py (3 hunks)
  • backend/apps/ai/migrations/0006_rename_generated_text_context_content_and_more.py (1 hunks)
  • backend/apps/ai/models/chunk.py (4 hunks)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/models/chunk_test.py (5 hunks)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • backend/apps/ai/common/utils.py
  • backend/apps/ai/models/context.py
  • backend/apps/ai/admin.py
  • backend/tests/apps/ai/models/context_test.py
  • backend/tests/apps/ai/models/chunk_test.py
  • backend/apps/ai/models/chunk.py
🔇 Additional comments (1)
backend/apps/ai/migrations/0006_rename_generated_text_context_content_and_more.py (1)

13-17: LGTM! Field rename looks correct.

The renaming of generated_text to content in the Context model improves naming consistency and aligns with the broader refactor objectives.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b46f08 and 16fabcf.

📒 Files selected for processing (2)
  • backend/apps/ai/migrations/0007_alter_chunk_context_alter_context_unique_together.py (1 hunks)
  • backend/apps/ai/models/chunk.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/apps/ai/models/chunk.py
🔇 Additional comments (2)
backend/apps/ai/migrations/0007_alter_chunk_context_alter_context_unique_together.py (2)

24-27: LGTM! Proper unique constraint for generic foreign key pattern.

The unique constraint on (content_type, object_id) is appropriate for preventing duplicate contexts for the same OWASP entity. This follows Django best practices for generic foreign key relationships.


1-11: LGTM! Well-structured Django migration.

The migration follows Django conventions with proper dependencies, imports, and class structure. The dependency on the previous migration maintains the correct sequence for the Context model refactoring.

@Dishant1804 Dishant1804 requested a review from arkid15r July 30, 2025 19:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/tests/apps/ai/common/utils_test.py (1)

124-147: Misleading test name/docstring: it asserts no sleep but says/calls “sleep_called”.

This is the same issue previously flagged. The test name and docstring claim sleep is called, but the assertions enforce that it is not called.

Apply this rename to align with actual behavior:

-    def test_create_chunks_and_embeddings_sleep_called(self, mock_datetime, mock_sleep):
-        """Tests that sleep is called when needed."""
+    def test_create_chunks_and_embeddings_no_sleep_on_first_call(self, mock_datetime, mock_sleep):
+        """Tests that sleep is not called on the first request."""
🧹 Nitpick comments (24)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (1)

3-6: Avoid N+1 queries by selecting related repository

extract_committee_content accesses committee.owasp_repository. Selecting it up-front improves performance for batch processing.

Apply:

+from django.db.models import QuerySet

And add:

 class Command(BaseChunkCommand):
     key_field_name = "key"
     model_class = Committee
 
     def extract_content(self, entity: Committee) -> tuple[str, str]:
         """Extract content from the committee."""
         return extract_committee_content(entity)
+
+    def get_base_queryset(self) -> QuerySet:
+        """Select related fields to cut down on DB queries."""
+        return super().get_base_queryset().select_related("owasp_repository")

Also applies to: 8-14

backend/apps/ai/management/commands/ai_update_chapter_context.py (1)

3-6: Select related repository to improve batch performance

extract_chapter_content uses chapter.owasp_repository; selecting it reduces per-entity DB hits.

Apply:

+from django.db.models import QuerySet

And add:

 class Command(BaseContextCommand):
     key_field_name = "key"
     model_class = Chapter
 
     def extract_content(self, entity: Chapter) -> tuple[str, str]:
         """Extract content from the chapter."""
         return extract_chapter_content(entity)
+
+    def get_base_queryset(self) -> QuerySet:
+        """Select related fields to avoid N+1 queries when extracting content."""
+        return super().get_base_queryset().select_related("owasp_repository")

Also applies to: 8-14

backend/apps/ai/management/commands/ai_update_project_chunks.py (1)

18-20: Select related repository to reduce DB roundtrips

Since extract_project_content references project.owasp_repository, selecting it avoids N+1 queries during batch processing.

Apply:

 def get_base_queryset(self) -> QuerySet:
     """Return the base queryset with ordering."""
-    return super().get_base_queryset()
+    return super().get_base_queryset().select_related("owasp_repository")
backend/tests/apps/ai/management/commands/ai_update_project_chunks_test.py (2)

22-22: Rename test class for consistency with command name

The test class is named “Create...”, but the command is ai_update_project_chunks. Rename for clarity.

Apply:

-class TestAiCreateProjectChunksCommand:
+class TestAiUpdateProjectChunksCommand:

1-7: Strengthen inheritance assertion

Also assert instance of BaseChunkCommand to better reflect the intended contract.

Apply:

 from unittest.mock import Mock, patch
 
 import pytest
 from django.core.management.base import BaseCommand
+from apps.ai.common.base.chunk_command import BaseChunkCommand

And:

     def test_command_inheritance(self, command):
-        assert isinstance(command, BaseCommand)
+        assert isinstance(command, BaseCommand)
+        assert isinstance(command, BaseChunkCommand)

Also applies to: 23-25

backend/tests/apps/ai/common/utils_test.py (3)

26-30: Unnecessary test setup: mocking datetime.UTC and datetime.timedelta.

In utils, UTC and timedelta are imported into module scope (from datetime import UTC, timedelta). Patching apps.ai.common.utils.datetime is sufficient; setting mock_datetime.UTC and mock_datetime.timedelta has no effect.

Drop these lines to simplify tests.

Also applies to: 126-130, 157-161, 193-195


44-50: Rename local variable for clarity: it’s a Context, not content_object.

Now that utils takes a Context, prefer naming accordingly to avoid confusion.

-        mock_content_object = MagicMock()
+        mock_context = MagicMock()
@@
-            mock_content_object,
+            mock_context,
@@
-                    context=mock_content_object,
+                    context=mock_context,
@@
-                    context=mock_content_object,
+                    context=mock_context,

Also applies to: 57-71


183-217: Strengthen assertion: verify the actual sleep duration under rate limit.

Given DEFAULT_LAST_REQUEST_OFFSET_SECONDS=5 and MIN_REQUEST_INTERVAL_SECONDS=10, expected sleep is 5 seconds. Assert the exact call.

-            mock_sleep.assert_called_once()
+            mock_sleep.assert_called_once_with(5)
backend/tests/apps/ai/common/extractors/project_test.py (1)

1-514: Thorough and high-signal test suite.

Covers wide scenarios: full/minimal data, partial/zero stats, URL filtering (with/without invalid_urls), date-only, zero health score, repo-only topics, and empty strings. The expectations align with extractor behavior.

Minor optional refinements:

  • Consider parametrizing date formatting and statistics cases to reduce duplication.
  • You can assert DELIMITER-aware boundaries by splitting metadata on the delimiter if you want to validate sections more strictly without relying on substring contains.
backend/tests/apps/ai/management/commands/ai_update_chapter_chunks_test.py (1)

22-24: Nit: class name references “create” while command is “ai_update_*”.

For consistency with the command name and other modules, prefer “Update” in the test class name.

-class TestAiCreateChapterChunksCommand:
+class TestAiUpdateChapterChunksCommand:
backend/tests/apps/ai/management/commands/ai_update_event_chunks_test.py (2)

1-1: Docstring mismatch: refers to ai_create_event_chunks.

Update module docstring to “ai_update_event_chunks” to match the command and file path.

-"""Tests for the ai_create_event_chunks Django management command."""
+"""Tests for the ai_update_event_chunks Django management command."""

26-28: Nit: class name/docstring still use “create”.

Align names with “update” for consistency across the codebase.

-class TestAiCreateEventChunksCommand:
-    """Test suite for the ai_create_event_chunks command."""
+class TestAiUpdateEventChunksCommand:
+    """Test suite for the ai_update_event_chunks command."""
backend/tests/apps/ai/management/commands/ai_update_slack_message_chunks_test.py (1)

22-23: Nit: class name references “create” while command is “ai_update_*”.

Rename for consistency with the command and other tests.

-class TestAiCreateSlackMessageChunksCommand:
+class TestAiUpdateSlackMessageChunksCommand:
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (1)

13-31: Make --message-key and --all mutually exclusive

Avoid ambiguous CLI usage by preventing both flags from being supplied together.

Apply:

-        parser.add_argument(
-            "--message-key",
-            type=str,
-            help="Process only the message with this key",
-        )
-        parser.add_argument(
-            "--all",
-            action="store_true",
-            help="Process all the messages",
-        )
+        group = parser.add_mutually_exclusive_group()
+        group.add_argument(
+            "--message-key",
+            type=str,
+            help="Process only the message with this key",
+        )
+        group.add_argument(
+            "--all",
+            action="store_true",
+            help="Process all the messages",
+        )
backend/apps/ai/models/context.py (1)

44-66: Limit DB writes to changed fields and keep timestamps fresh

When updating existing contexts, save only the modified fields and ensure the auto-updated timestamp is included in update_fields to reflect the change.

Apply:

-        if not created and (context.content != content or context.source != source):
-            context.content = content
-            context.source = source
-            if save:
-                context.save()
+        if not created and (context.content != content or context.source != source):
+            context.content = content
+            context.source = source
+            if save:
+                # Include nest_updated_at so the auto_now timestamp is refreshed even with update_fields.
+                context.save(update_fields=["content", "source", "nest_updated_at"])

Note: Always returning the Context instance is consistent with the maintainers’ preference (per retrieved learnings).

backend/apps/ai/Makefile (1)

5-44: Declare .PHONY targets to satisfy checkmake and avoid name clashes

Marking these targets as phony is a low-effort improvement that pacifies checkmake and prevents accidental filename conflicts.

Apply:

+ .PHONY: ai-run-rag-tool ai-update-chapter-chunks ai-update-chapter-context \
+ ai-update-committee-chunks ai-update-committee-context \
+ ai-update-event-chunks ai-update-event-context \
+ ai-update-project-chunks ai-update-project-context \
+ ai-update-slack-message-chunks ai-update-slack-message-context

If you want to fully appease checkmake’s minphony rules for this sub-makefile, you could also add no-op “all/clean/test” targets here or at a higher level, but that’s optional in this context.

backend/tests/apps/ai/common/base/context_command_test.py (2)

53-55: Use new Context field names (entity_type_id/entity_id) in mocks

The Context model uses entity_type/entity_id now. These mock attributes still reflect the old names and can cause confusion when reading tests.

Apply this diff:

-    context.content_type_id = 1
-    context.object_id = 1
+    context.entity_type_id = 1
+    context.entity_id = 1

121-134: Unreachable “creation fails” branch given update_data always returns a Context

Context.update_data currently always returns a Context instance (truthy). This test simulates a None return to drive the failure branch. If that’s intentional (to validate defensive handling), consider adding a brief comment in the test to explain why we mock a falsey return despite the implementation never doing so. Otherwise, this branch is effectively dead and could be removed.

backend/tests/apps/ai/common/base/ai_command_test.py (2)

78-85: Tighten get_base_queryset test and remove confusing Mock callability

Mocking objects as callable can mislead future readers. Also assert the returned queryset for stronger verification.

Apply this diff:

     def test_get_base_queryset(self, command):
         with patch.object(MockTestModel, "objects") as mock_objects:
             mock_manager = Mock()
             mock_objects.all.return_value = mock_manager
-            mock_objects.return_value = mock_manager
-
-            command.get_base_queryset()
+            result = command.get_base_queryset()
             mock_objects.all.assert_called_once()
+            assert result == mock_manager

99-120: Make add_argument assertions order-agnostic (optional)

Relying on the exact order of parser.add_argument calls makes the test brittle. Consider asserting using a mapping over the flags rather than sequence positions.

backend/tests/apps/ai/common/base/chunk_command_test.py (2)

55-56: Update mock Context fields to match the new schema

These fields still use the legacy names. Align them with Context’s entity_type/entity_id to avoid future confusion.

Apply this diff:

-    context.content_type_id = 1
-    context.object_id = 1
+    context.entity_type_id = 1
+    context.entity_id = 1

189-231: Consider adding a test for the “chunks up to date” branch

You comprehensively cover creation/update paths. A small test where latest_chunk_timestamp is present and >= context.nest_updated_at would validate the “already up to date” message and the no-op path.

Happy to draft this test if you want it included.

backend/tests/apps/ai/management/commands/ai_update_committee_chunks_test.py (2)

1-1: Docstring mismatch with command name

The file tests ai_update_committee_chunks but the docstring says ai_create_committee_chunks. Align the wording for clarity.

Apply this diff:

-"""Tests for the ai_create_committee_chunks command."""
+"""Tests for the ai_update_committee_chunks command."""

29-31: Rename test class to reflect “update” (not “create”)

The class name still references “create,” which can be misleading.

Apply this diff:

-class TestAiCreateCommitteeChunksCommand:
+class TestAiUpdateCommitteeChunksCommand:
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2d86dcb and 9c2556c.

📒 Files selected for processing (26)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/common/base/ai_command.py (1 hunks)
  • backend/apps/ai/common/base/chunk_command.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_chapter_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_chapter_context.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_committee_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_committee_context.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_event_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_event_context.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_project_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_project_context.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_update_slack_message_context.py (1 hunks)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/common/base/ai_command_test.py (1 hunks)
  • backend/tests/apps/ai/common/base/chunk_command_test.py (1 hunks)
  • backend/tests/apps/ai/common/base/context_command_test.py (1 hunks)
  • backend/tests/apps/ai/common/extractors/project_test.py (1 hunks)
  • backend/tests/apps/ai/common/utils_test.py (4 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_chapter_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_committee_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_event_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_project_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_slack_message_chunks_test.py (1 hunks)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
  • backend/tests/apps/common/open_ai_test.py (0 hunks)
💤 Files with no reviewable changes (1)
  • backend/tests/apps/common/open_ai_test.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • backend/apps/ai/management/commands/ai_update_committee_context.py
  • backend/apps/ai/common/base/chunk_command.py
  • backend/tests/apps/ai/models/context_test.py
  • backend/apps/ai/management/commands/ai_update_project_context.py
  • backend/apps/ai/management/commands/ai_update_slack_message_context.py
  • backend/apps/ai/common/base/ai_command.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-16T12:07:38.116Z
Learnt from: Dishant1804
PR: OWASP/Nest#1891
File: backend/apps/ai/models/context.py:44-66
Timestamp: 2025-08-16T12:07:38.116Z
Learning: In the Context.update_data method, the Context object should always be returned (never None), even when no changes are made to the content or source fields. This ensures consistent behavior for callers.

Applied to files:

  • backend/apps/ai/models/context.py
📚 Learning: 2025-07-28T14:51:14.736Z
Learnt from: adithya-naik
PR: OWASP/Nest#1894
File: frontend/src/components/TopContributorsList.tsx:74-74
Timestamp: 2025-07-28T14:51:14.736Z
Learning: In the OWASP/Nest project, the maintainer adithya-naik prefers not to create separate components for code that's only used in two specific cases, following the YAGNI principle to avoid over-engineering when the duplication is limited and manageable.

Applied to files:

  • backend/apps/ai/models/context.py
🧬 Code Graph Analysis (18)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (4)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/ai/common/extractors/committee.py (1)
  • extract_committee_content (6-55)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_chunks.py (2)
  • Command (10-20)
  • extract_content (14-16)
backend/apps/ai/management/commands/ai_update_project_chunks.py (7)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/ai/common/extractors/project.py (1)
  • extract_project_content (6-97)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (2)
  • Command (9-41)
  • extract_content (32-34)
backend/apps/ai/management/commands/ai_update_event_chunks.py (3)
  • Command (10-24)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/apps/ai/management/commands/ai_create_project_chunks.py (1)
  • Command (14-178)
backend/tests/apps/ai/management/commands/ai_update_project_chunks_test.py (3)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_chunks.py (3)
  • Command (10-20)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/apps/ai/management/commands/ai_update_event_chunks.py (3)
  • Command (10-24)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/tests/apps/ai/management/commands/ai_update_slack_message_chunks_test.py (1)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (5)
  • Command (9-41)
  • source_name (40-41)
  • extract_content (32-34)
  • get_default_queryset (36-38)
  • add_arguments (13-30)
backend/tests/apps/ai/management/commands/ai_update_chapter_chunks_test.py (2)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/tests/apps/ai/management/commands/ai_update_event_chunks_test.py (3)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_chunks.py (3)
  • Command (10-20)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/apps/ai/management/commands/ai_update_event_chunks.py (4)
  • Command (10-24)
  • extract_content (14-16)
  • get_base_queryset (18-20)
  • get_default_queryset (22-24)
backend/tests/apps/ai/common/extractors/project_test.py (1)
backend/apps/ai/common/extractors/project.py (1)
  • extract_project_content (6-97)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (10)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/owasp/api/internal/queries/chapter.py (1)
  • chapter (14-19)
backend/apps/ai/common/extractors/chapter.py (1)
  • extract_chapter_content (6-76)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_chunks.py (2)
  • Command (10-20)
  • extract_content (14-16)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (2)
  • Command (9-41)
  • extract_content (32-34)
backend/apps/ai/management/commands/ai_update_event_chunks.py (2)
  • Command (10-24)
  • extract_content (14-16)
backend/tests/apps/ai/common/base/ai_command_test.py (1)
  • extract_content (50-51)
backend/tests/apps/ai/common/base/chunk_command_test.py (1)
  • extract_content (23-25)
backend/tests/apps/ai/common/base/context_command_test.py (1)
  • extract_content (21-22)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (6)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/slack/models/message.py (1)
  • cleaned_text (52-63)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_chunks.py (3)
  • Command (10-20)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/apps/ai/management/commands/ai_update_event_chunks.py (4)
  • Command (10-24)
  • extract_content (14-16)
  • get_default_queryset (22-24)
  • get_base_queryset (18-20)
backend/apps/ai/management/commands/ai_update_event_chunks.py (6)
backend/apps/ai/common/base/chunk_command.py (1)
  • BaseChunkCommand (12-91)
backend/apps/ai/common/extractors/event.py (1)
  • extract_event_content (6-49)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_chunks.py (3)
  • Command (10-20)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (3)
  • Command (9-41)
  • extract_content (32-34)
  • get_default_queryset (36-38)
backend/tests/apps/ai/common/base/ai_command_test.py (1)
backend/apps/ai/common/base/ai_command.py (10)
  • BaseAICommand (12-109)
  • source_name (25-27)
  • get_base_queryset (29-31)
  • get_default_queryset (33-35)
  • add_common_arguments (37-54)
  • add_arguments (56-58)
  • get_queryset (60-69)
  • get_entity_key (71-73)
  • setup_openai_client (75-83)
  • handle_batch_processing (85-109)
backend/tests/apps/ai/common/base/chunk_command_test.py (4)
backend/apps/ai/common/base/chunk_command.py (4)
  • BaseChunkCommand (12-91)
  • help (15-17)
  • process_chunks_batch (19-77)
  • handle (79-91)
backend/apps/ai/models/chunk.py (1)
  • Chunk (12-76)
backend/apps/ai/models/context.py (1)
  • Context (21-65)
backend/apps/ai/common/base/context_command.py (2)
  • help (12-14)
  • handle (45-54)
backend/apps/ai/management/commands/ai_update_event_context.py (6)
backend/apps/ai/common/base/context_command.py (1)
  • BaseContextCommand (9-54)
backend/apps/ai/common/extractors/event.py (1)
  • extract_event_content (6-49)
backend/apps/ai/management/commands/ai_update_chapter_context.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_committee_context.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_project_context.py (3)
  • Command (10-20)
  • extract_content (14-16)
  • get_base_queryset (18-20)
backend/apps/ai/common/base/ai_command.py (2)
  • get_base_queryset (29-31)
  • get_default_queryset (33-35)
backend/tests/apps/ai/common/utils_test.py (2)
backend/apps/ai/common/utils.py (2)
  • create_chunks_and_embeddings (19-70)
  • regenerate_chunks_for_context (73-101)
backend/apps/ai/models/context.py (1)
  • regenerate_chunks_for_context (14-18)
backend/tests/apps/ai/management/commands/ai_update_committee_chunks_test.py (4)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/management/commands/ai_update_chapter_chunks.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/common/base/ai_command.py (1)
  • get_entity_key (71-73)
backend/apps/ai/common/base/chunk_command.py (1)
  • help (15-17)
backend/tests/apps/ai/common/base/context_command_test.py (2)
backend/apps/ai/common/base/context_command.py (4)
  • BaseContextCommand (9-54)
  • help (12-14)
  • process_context_batch (16-43)
  • handle (45-54)
backend/apps/ai/models/context.py (2)
  • Context (21-65)
  • update_data (45-65)
backend/apps/ai/models/context.py (3)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/ai/common/utils.py (1)
  • regenerate_chunks_for_context (73-101)
backend/apps/ai/models/chunk.py (2)
  • Meta (15-18)
  • update_data (45-76)
backend/apps/ai/management/commands/ai_update_chapter_context.py (7)
backend/apps/ai/common/base/context_command.py (1)
  • BaseContextCommand (9-54)
backend/apps/owasp/api/internal/queries/chapter.py (1)
  • chapter (14-19)
backend/apps/ai/common/extractors/chapter.py (1)
  • extract_chapter_content (6-76)
backend/apps/ai/management/commands/ai_update_committee_context.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/tests/apps/ai/common/base/ai_command_test.py (1)
  • extract_content (50-51)
backend/tests/apps/ai/common/base/chunk_command_test.py (1)
  • extract_content (23-25)
backend/tests/apps/ai/common/base/context_command_test.py (1)
  • extract_content (21-22)
🪛 checkmake (0.2.2)
backend/apps/ai/Makefile

[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🔇 Additional comments (20)
backend/apps/ai/management/commands/ai_update_committee_chunks.py (1)

8-14: Solid, minimal command skeleton

Clean delegation to the extractor and proper BaseChunkCommand usage. Looks good.

backend/apps/ai/management/commands/ai_update_chapter_context.py (1)

8-14: Straightforward context updater

Delegation to extract_chapter_content and BaseContextCommand usage are correct.

backend/apps/ai/management/commands/ai_update_project_chunks.py (1)

10-17: Good adherence to the shared chunking flow

The command correctly sets model_class/key_field_name and delegates extraction.

backend/apps/ai/management/commands/ai_update_event_context.py (1)

10-24: LGTM: sensible defaults and delegation

  • Delegates to extract_event_content.
  • get_default_queryset uses Event.upcoming_events(), which better fits events than a generic is_active filter.
backend/tests/apps/ai/common/utils_test.py (3)

245-294: Regeneration flow looks solid.

Good coverage: deletes existing chunks when present, splits content, instantiates OpenAI, and delegates to create_chunks_and_embeddings with the expected args. Logging assertions are precise.


295-327: Correct handling of no-content case.

Appropriate early exit with a clear warning; avoids unnecessary OpenAI calls. Assertions look accurate.


329-372: No-existing-chunks path is correct.

Skips delete when count is zero, proceeds to split and create. Assertions capture expected behavior.

backend/tests/apps/ai/management/commands/ai_update_chapter_chunks_test.py (1)

40-47: Delegation test is precise and correct.

Patches the extractor at the command module path, verifies single call with the entity, and validates returned tuple. Nicely done.

backend/tests/apps/ai/management/commands/ai_update_event_chunks_test.py (2)

61-69: Base queryset delegation check is good.

Verifies the super call path without over-specifying internals. Looks correct.


71-79: Default queryset logic is correct.

Asserting Event.upcoming_events() is used by default matches the command contract.

backend/tests/apps/ai/management/commands/ai_update_slack_message_chunks_test.py (2)

64-84: Argument wiring test is clear and sufficient.

Verifies the expected flags and defaults without over-constraining implementation details. Looks good.


44-55: cleaned_text is a @Property, not a method—no changes needed

The Message model in backend/apps/slack/models/message.py (lines 52–54) defines cleaned_text with @property, so extract_content always receives a string (or None). The existing tests already cover both non-empty and None cases. You can safely ignore the proposed callable‐branch change and additional test.

Likely an incorrect or invalid review comment.

backend/apps/ai/management/commands/ai_update_chapter_chunks.py (1)

12-14: Extractor delegation and return order confirmed

File: backend/apps/ai/management/commands/ai_update_chapter_chunks.py
Lines: 12-14

Snippet showing the final state of code at these lines

    def extract_content(self, entity: Chapter) -> tuple[str, str]:
        """Extract content from the chapter."""
        return extract_chapter_content(entity)

extract_chapter_content’s implementation (lines 73–76 in backend/apps/ai/common/extractors/chapter.py) returns

  1. prose content
  2. metadata content
    per its signature and docstring, matching BaseChunkCommand’s expected (prose, metadata) contract. Ready to merge.
backend/apps/ai/management/commands/ai_update_slack_message_chunks.py (2)

36-41: LGTM on queryset and source naming

Defaulting to the base queryset (no is_active) for messages and naming the source "slack_message" fit the model and logging conventions.


32-35: No change needed for cleaned_text usage
cleaned_text is decorated with @property, so entity.cleaned_text already returns a str. Calling it as a method isn’t required and would actually error out.

Likely an incorrect or invalid review comment.

backend/apps/ai/models/context.py (2)

24-34: Solid GFK design and uniqueness guarantees

Required entity_type/entity_id with a GenericForeignKey and unique_together enforces the “one context per entity” invariant discussed in the PR. This aligns with the project direction and keeps lookups efficient via the implicit unique index.


14-19: Wrapper to avoid circular import is fine

The local indirection for regenerate_chunks_for_context avoids import cycles cleanly. No issues.

backend/apps/ai/management/commands/ai_update_event_chunks.py (2)

14-16: Good use of extractor with BaseChunkCommand

Delegation to extract_event_content(entity) matches the shared chunking flow and keeps entity-specific logic out of the command.


22-24: Sensible default queryset for events

Defaulting to upcoming events instead of an is_active filter is appropriate for reducing churn and processing scope.

backend/apps/ai/Makefile (1)

1-3: Makefile target valid – ai_run_rag_tool exists

The ai_run_rag_tool management command is present at
• backend/apps/ai/management/commands/ai_run_rag_tool.py
and is exercised by tests in
• backend/tests/apps/ai/management/commands/ai_run_rag_tool_test.py

No changes to the ai-run-rag-tool: target are needed.

@Dishant1804 Dishant1804 requested a review from arkid15r August 18, 2025 18:07
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
backend/apps/ai/models/context.py (3)

30-34: Prefer named UniqueConstraint over unique_together (clearer, non-deprecated style).

UniqueConstraint provides a named, explicit constraint that’s easier to manage in migrations.

Apply:

 class Meta:
     db_table = "ai_contexts"
     verbose_name = "Context"
-    unique_together = ("entity_type", "entity_id")
+    constraints = [
+        models.UniqueConstraint(
+            fields=("entity_type", "entity_id"),
+            name="uniq_ai_context_entity",
+        )
+    ]

35-43: Make str robust when the related entity is missing.

If the generic target has been deleted, self.entity may be None, resulting in “model None: …”. Prefer a clearer placeholder.

-        entity = (
-            getattr(self.entity, "name", None)
-            or getattr(self.entity, "key", None)
-            or str(self.entity)
-        )
+        entity_obj = self.entity
+        entity = (
+            getattr(entity_obj, "name", None)
+            or getattr(entity_obj, "key", None)
+            or (str(entity_obj) if entity_obj is not None else "-")
+        )

14-19: Consider removing the wrapper regenerate_chunks_for_context to reduce coupling.

The indirection exists only to sidestep import cycles, but it reintroduces a model-to-utils dependency surface. Since update_data no longer calls it, prefer importing utils where needed or keeping the wrapper in a non-model module (e.g., apps.ai.common.init).

If retained, add a type hint for clarity:

-def regenerate_chunks_for_context(context):
+def regenerate_chunks_for_context(context: "Context"):
     """Import regenerate_chunks_for_context to avoid circular import."""
backend/tests/apps/ai/models/context_test.py (3)

32-38: Rename to reflect field name (entity_type).

The test name still uses the old “content_type” wording.

-    def test_content_type_field_properties(self):
+    def test_entity_type_field_properties(self):

39-42: Rename to reflect field name and, if adopting BigInteger, update the asserted type.

If you switch entity_id to PositiveBigIntegerField, the test should assert that. Otherwise, just rename the test.

-    def test_object_id_field_properties(self):
+    def test_entity_id_field_properties(self):
         field = Context._meta.get_field("entity_id")
-        assert field.__class__.__name__ == "PositiveIntegerField"
+        assert field.__class__.__name__ == "PositiveBigIntegerField"

If you keep PositiveIntegerField, only apply the rename and keep the original assertion.


377-377: Add a test to ensure unchanged contexts don’t trigger save (preserve timestamps).

This guards against accidental regressions that would bump nest_updated_at unnecessarily.

+    @patch("apps.ai.models.context.Context.objects.get_or_create")
+    def test_update_data_noop_does_not_save_when_unchanged(self, mock_get_or_create):
+        content_object = Mock()
+        content_object.pk = 1
+        with patch("apps.ai.models.context.ContentType.objects.get_for_model") as mock_gfm:
+            ct = Mock(spec=ContentType)
+            ct.get_source_expressions = Mock(return_value=[])
+            mock_gfm.return_value = ct
+            existing = create_model_mock(Context)
+            existing.content = "same"
+            existing.source = "src"
+            existing.save = Mock()
+            mock_get_or_create.return_value = (existing, False)
+
+            result = Context.update_data("same", content_object, source="src", save=True)
+
+            assert result == existing
+            existing.save.assert_not_called()
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9c2556c and 4dc3800.

📒 Files selected for processing (2)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-16T12:07:38.116Z
Learnt from: Dishant1804
PR: OWASP/Nest#1891
File: backend/apps/ai/models/context.py:44-66
Timestamp: 2025-08-16T12:07:38.116Z
Learning: In the Context.update_data method, the Context object should always be returned (never None), even when no changes are made to the content or source fields. This ensures consistent behavior for callers.

Applied to files:

  • backend/apps/ai/models/context.py
📚 Learning: 2025-07-28T14:51:14.736Z
Learnt from: adithya-naik
PR: OWASP/Nest#1894
File: frontend/src/components/TopContributorsList.tsx:74-74
Timestamp: 2025-07-28T14:51:14.736Z
Learning: In the OWASP/Nest project, the maintainer adithya-naik prefers not to create separate components for code that's only used in two specific cases, following the YAGNI principle to avoid over-engineering when the duplication is limited and manageable.

Applied to files:

  • backend/apps/ai/models/context.py
🧬 Code Graph Analysis (2)
backend/apps/ai/models/context.py (4)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/ai/common/utils.py (1)
  • regenerate_chunks_for_context (73-101)
backend/apps/owasp/admin/entity_member.py (1)
  • entity (61-74)
backend/apps/ai/models/chunk.py (2)
  • Meta (15-18)
  • update_data (45-76)
backend/tests/apps/ai/models/context_test.py (2)
backend/apps/ai/models/context.py (2)
  • Context (21-67)
  • update_data (45-67)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)

Copy link
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is still far from what'd approve with no hesitations. Please address the current comments, create issue to track your technical debt (both refactoring I suggested and tests). This can be merged after that.

Comment on lines 14 to 18
def regenerate_chunks_for_context(context):
"""Import regenerate_chunks_for_context to avoid circular import."""
from apps.ai.common.utils import regenerate_chunks_for_context as _regenerate_chunks

return _regenerate_chunks(context)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is here for what purpose?

Copy link
Collaborator Author

@Dishant1804 Dishant1804 Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to resolve the circular import I cleaned out the code but this missed out, removed it

from apps.ai.common.constants import DELIMITER


def extract_chapter_content(chapter) -> tuple[str, str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 2 suggestions for the extractors:

  • why do you need separation for prose/metadata if you join them later anyway?
  • this needs to be migrated to Jinja templates like ones we have for Slack responses

Please create an issue to track this refactoring.

"""
from apps.ai.models.chunk import Chunk

old_chunk_count = context.chunks.count()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to delete them w/o pre-checks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes -- removed the checks

additional_context = self.get_additional_context(
chunk.content_object, chunk.content_type.model
chunk.context.entity, chunk.context.entity_type.model
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you don't need the second param as you can figure that our from the model instance inside of get_additional_context

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/apps/ai/agent/tools/rag/retriever.py (1)

59-67: Update exception handling for OpenAI SDK v1.x

The v1.x Python client no longer defines OpenAIError. Instead you should catch its new base exception (openai.APIError) and/or the more specific subclasses (e.g., RateLimitError, APIConnectionError, APITimeoutError) to handle errors appropriately.

• Affected file: backend/apps/ai/agent/tools/rag/retriever.py (lines 59–67)
• Replace the generic except openai.OpenAIError: with granular catches and a final APIError fallback.

Proposed diff:

         try:
             response = self.openai_client.embeddings.create(
                 input=[query],
                 model=self.embedding_model,
             )
             return response.data[0].embedding
-        except openai.OpenAIError:
-            logger.exception("OpenAI API error")
-            raise
+        except openai.RateLimitError:
+            logger.warning("Rate limit exceeded, backing off before retry")
+            # backoff & retry logic
+        except (openai.APIConnectionError, openai.APITimeoutError) as e:
+            logger.warning("Network error from OpenAI API: %s", e)
+            # retry logic
+        except openai.APIError as e:
+            logger.exception("Unexpected OpenAI API error")
+            raise
♻️ Duplicate comments (1)
backend/apps/ai/agent/tools/rag/retriever.py (1)

217-222: Good fix: filter through Context entity_type (matches the model changes)

Filtering by context__entity_type__… is correct post-refactor away from Chunk.content_type. This addresses the earlier review note.

🧹 Nitpick comments (12)
backend/tests/apps/ai/management/commands/ai_update_committee_context_test.py (4)

69-79: Patch the correct base method: target BaseAICommand.get_base_queryset, not BaseContextCommand.

get_base_queryset is defined on BaseAICommand per the shared snippets. Patching BaseContextCommand here works by overshadowing via attribute injection, but it doesn’t validate the real inheritance path and can hide regressions.

Apply this diff:

-        with patch(
-            "apps.ai.common.base.context_command.BaseContextCommand.get_base_queryset"
-        ) as mock_super:
+        with patch(
+            "apps.ai.common.base.ai_command.BaseAICommand.get_base_queryset"
+        ) as mock_super:

152-173: Mirror production semantics: have update_data return a truthy object, not True.

Context.update_data returns a Context instance (truthy), not a boolean. Returning a mock object here reduces the risk of a future mismatch if the implementation changes.

-        with patch("apps.ai.common.base.context_command.Context") as mock_context_class:
-            mock_context_class.update_data.return_value = True
+        with patch("apps.ai.common.base.context_command.Context") as mock_context_class:
+            mock_context_class.update_data.return_value = Mock()

188-212: Failure path may be unreachable with current update_data contract.

Context.update_data returns a Context object and raises on DB errors; it doesn’t return False. The tested “failure” branch will not occur in production unless the contract changes or you explicitly return falsy values from update_data.

Consider one of:

  • Adjust BaseContextCommand.process_context_batch to catch exceptions from update_data and log an error (then add a test using side_effect=Exception("db error")), or
  • Document that update_data returns a truthy object on success and update this test to validate exception handling rather than a False return.

1-1: Rename stale ai_create_committee_context references in test file

Please update all remaining occurrences of ai_create_committee_context to ai_update_committee_context and rename the test class accordingly:

• File: backend/tests/apps/ai/management/commands/ai_update_committee_context_test.py
– Line 1:
diff -"""Tests for the ai_create_committee_context command.""" +"""Tests for the ai_update_committee_context command."""
– Around line 27–29:
diff -class TestAiCreateCommitteeContextCommand: - """Test suite for the ai_create_committee_context command.""" +class TestAiUpdateCommitteeContextCommand: + """Test suite for the ai_update_committee_context command."""

This will ensure clarity and make future greps accurate.

backend/apps/ai/agent/tools/rag/retriever.py (8)

24-30: Make SUPPORTED_ENTITY_TYPES immutable for intent and tiny membership wins

Tuples are fine, but a frozenset communicates immutability and gives O(1) membership checks (you iterate and also check membership later). This is a tiny nit, not a blocker.

-    SUPPORTED_ENTITY_TYPES = (
+    SUPPORTED_ENTITY_TYPES = frozenset((
         "chapter",
         "committee",
         "event",
         "message",
         "project",
-    )
+    ))

72-78: Prefer persisted source_name when available; tighten docstring and add a slug fallback

If Context stores a source/source_name (as suggested by the AI command snippets), prefer using that value when available for consistency across pipelines. Also, the docstring still says “content object”; here it’s an entity. Adding “slug” is a cheap, useful fallback.

-    def get_source_name(self, entity) -> str:
-        """Get the name/identifier for the content object."""
-        for attr in ("name", "title", "login", "key", "summary"):
+    def get_source_name(self, entity) -> str:
+        """Get a human-readable identifier for the entity."""
+        for attr in ("name", "title", "login", "key", "summary", "slug"):
             if getattr(entity, attr, None):
                 return str(getattr(entity, attr))
         return str(entity)

79-85: Docstring drift: “content type” → “entity type”; clarify params

The function operates on entities now. Tightening the docstring avoids future confusion.

-    def get_additional_context(self, entity, entity_type: str) -> dict[str, Any]:
-        """Get additional context information based on content type.
+    def get_additional_context(self, entity, entity_type: str) -> dict[str, Any]:
+        """Get additional context information based on entity type.
@@
-            entity: The source object.
-            entity_type: The model name of the content object.
+            entity: The related entity (from Context.entity).
+            entity_type: The entity model name (e.g., "project", "chapter").

91-91: Minor naming clarity: clean_content_type → clean_entity_type

Variable name can be misleading after the refactor to entities. Not a blocker; consider renaming across this method when convenient.


224-224: Avoid hauling the embedding payload; defer the “embedding” field

After the annotate is done, the embedding value isn’t used in Python. Deferring it trims network I/O and memory, especially with large vector dims.

-        chunks = queryset.select_related("context__entity_type").order_by("-similarity")[:limit]
+        chunks = (
+            queryset
+            .select_related("context__entity_type")
+            .defer("embedding")
+            .order_by("-similarity")[:limit]
+        )

228-235: Avoid double entity fetches, correct log message, and prefer persisted context source_name

Accessing GenericForeignKey triggers a DB hit; you access it for the guard and then again for source/extra context. Cache it once, fix the log message, and prefer a stored source_name when available to avoid recomputing.

-        for chunk in chunks:
-            if not chunk.context or not chunk.context.entity:
-                logger.warning("Content object is None for chunk %s. Skipping.", chunk.id)
-                continue
-
-            source_name = self.get_source_name(chunk.context.entity)
-            additional_context = self.get_additional_context(
-                chunk.context.entity, chunk.context.entity_type.model
-            )
+        for chunk in chunks:
+            if not chunk.context:
+                logger.warning("Context is None for chunk %s. Skipping.", chunk.id)
+                continue
+
+            entity = getattr(chunk.context, "entity", None)
+            if not entity:
+                logger.warning("Context entity is None for chunk %s. Skipping.", chunk.id)
+                continue
+
+            source_name = getattr(chunk.context, "source_name", None) or self.get_source_name(entity)
+            additional_context = self.get_additional_context(entity, chunk.context.entity_type.model)

If Context does not have source/source_name, ignore that piece of the change and keep the fallback.


263-265: Optional: enrich keyword detection with synonyms

The simple singular/plural check works; consider a small synonyms map (e.g., “repo”, “repos” → “project”; “chapter(s)” → “chapter”) to improve recall.


232-235: Use Context metadata where possible to reduce N+1s

Longer-term, consider persisting and reading “additional_context” and “source_name” from Context at creation time. That would avoid hitting the GFK per chunk at retrieval time, especially when limit is high.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc3800 and 346d324.

📒 Files selected for processing (6)
  • backend/apps/ai/agent/tools/rag/retriever.py (4 hunks)
  • backend/apps/ai/common/base/context_command.py (1 hunks)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/agent/tools/rag/retriever_test.py (1 hunks)
  • backend/tests/apps/ai/common/base/context_command_test.py (1 hunks)
  • backend/tests/apps/ai/management/commands/ai_update_committee_context_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • backend/apps/ai/common/base/context_command.py
  • backend/tests/apps/ai/agent/tools/rag/retriever_test.py
  • backend/tests/apps/ai/common/base/context_command_test.py
  • backend/apps/ai/models/context.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/tests/apps/ai/management/commands/ai_update_committee_context_test.py (4)
backend/apps/ai/management/commands/ai_update_committee_context.py (2)
  • Command (8-14)
  • extract_content (12-14)
backend/apps/ai/common/base/context_command.py (4)
  • BaseContextCommand (9-54)
  • help (12-14)
  • process_context_batch (16-43)
  • handle (45-54)
backend/apps/ai/common/base/ai_command.py (6)
  • get_base_queryset (29-31)
  • get_default_queryset (33-35)
  • add_arguments (56-58)
  • get_queryset (60-69)
  • get_entity_key (71-73)
  • source_name (25-27)
backend/apps/ai/models/context.py (1)
  • update_data (45-67)
backend/apps/ai/agent/tools/rag/retriever.py (3)
backend/apps/owasp/admin/entity_member.py (1)
  • entity (61-74)
backend/apps/ai/common/base/ai_command.py (1)
  • source_name (25-27)
backend/apps/ai/management/commands/ai_update_slack_message_context.py (1)
  • source_name (41-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/tests/apps/ai/management/commands/ai_update_committee_context_test.py (2)

79-90: LGTM on default queryset filtering.

Good verification that the default path applies is_active=True.


213-231: LGTM on handle() batch plumbing.

Solid assertion of option forwarding and process_batch_func binding.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (9)
backend/apps/ai/models/context.py (2)

49-60: Close race on create and avoid bumping timestamps when unchanged

Two concurrent callers can both miss the get() and try to create, causing IntegrityError on save. Also, saving when content/source are unchanged will bump nest_updated_at, breaking “stale-chunk” detection based on timestamps.

Apply atomic get_or_create and only save when fields actually change; always return Context per your established contract:

     @staticmethod
     def update_data(
         content: str,
         entity,
         source: str = "",
         *,
         save: bool = True,
-    ) -> "Context":
+    ) -> "Context":
         """Create or update context for a given entity."""
-        entity_type = ContentType.objects.get_for_model(entity)
-        entity_id = entity.pk
-
-        try:
-            context = Context.objects.get(entity_type=entity_type, entity_id=entity_id)
-        except Context.DoesNotExist:
-            context = Context(entity_type=entity_type, entity_id=entity_id)
-
-        context.content = content
-        context.source = source
-
-        if save:
-            context.save()
-
-        return context
+        entity_type = ContentType.objects.get_for_model(entity)
+        entity_id = entity.pk
+
+        if save:
+            # Atomic to avoid duplicate rows under concurrency
+            context, created = Context.objects.get_or_create(
+                entity_type=entity_type,
+                entity_id=entity_id,
+                defaults={"content": content, "source": source},
+            )
+            if created:
+                return context
+            # No-op if unchanged; preserve timestamps
+            if context.content == content and context.source == source:
+                return context
+            # Update only changed fields
+            context.content = content
+            context.source = source
+            context.save(update_fields=["content", "source"])
+            return context
+        else:
+            # Return unsaved instance for bulk-save flows
+            try:
+                context = Context.objects.get(entity_type=entity_type, entity_id=entity_id)
+            except Context.DoesNotExist:
+                return Context(
+                    entity_type=entity_type, entity_id=entity_id, content=content, source=source
+                )
+            context.content = content
+            context.source = source
+            return context

This keeps the “always return Context” behavior while removing the race and spurious timestamp updates.


19-21: Upgrade entity_id to PositiveBigIntegerField to match 64-bit primary keys

We’ve confirmed that your project’s DEFAULT_AUTO_FIELD is set to BigAutoField in backend/settings/base.py:194. Since GenericForeignKey may point to any model using the default auto field, you should update the entity_id field to PositiveBigIntegerField to avoid potential overflows.

• File: backend/apps/ai/models/context.py, Lines 19–21

-    entity_id = models.PositiveIntegerField()
+    entity_id = models.PositiveBigIntegerField()

• After changing the field type, generate and apply a Django migration to alter the column in the database.

backend/tests/apps/ai/agent/tools/rag/retriever_test.py (2)

19-20: Patch where the symbol is used: target apps.ai.agent.tools.rag.retriever.openai.OpenAI

Patching the top-level openai.OpenAI is brittle and can leak across modules. Patch the symbol in the module under test.

-            patch("openai.OpenAI") as mock_openai,
+            patch("apps.ai.agent.tools.rag.retriever.openai.OpenAI") as mock_openai,

Apply the same change to all occurrences listed in the line ranges.

Also applies to: 45-46, 54-55, 74-75, 89-90, 104-105, 118-120, 135-136, 151-153, 171-172, 451-454, 483-485, 534-536, 567-569


70-84: Make exception type resilient across OpenAI SDK versions

openai.OpenAIError may not exist in some SDK versions. Use the exception from the retriever module’s namespace and assert accordingly.

-        with (
-            patch.dict(os.environ, {"DJANGO_OPEN_AI_SECRET_KEY": "test-key"}),
-            patch("openai.OpenAI") as mock_openai,
-        ):
+        from apps.ai.agent.tools.rag import retriever as retriever_mod
+        with (
+            patch.dict(os.environ, {"DJANGO_OPEN_AI_SECRET_KEY": "test-key"}),
+            patch("apps.ai.agent.tools.rag.retriever.openai.OpenAI") as mock_openai,
+        ):
             mock_client = MagicMock()
-            mock_client.embeddings.create.side_effect = openai.OpenAIError("API Error")
+            Exc = getattr(retriever_mod.openai, "OpenAIError", Exception)
+            mock_client.embeddings.create.side_effect = Exc("API Error")
             mock_openai.return_value = mock_client
 
             retriever = Retriever()
 
-            with pytest.raises(openai.OpenAIError):
+            with pytest.raises(Exc):
                 retriever.get_query_embedding("test query")

Repeat similarly for any other tests that rely on OpenAIError.

backend/apps/ai/common/utils.py (3)

39-41: Docstring claims ValueError is raised, but code does not raise it

Either enforce a ValueError when context is invalid, or align the docstring with current behavior (log-and-return-empty on error). Given current tests, remove the incorrect Raises section.

-    Raises:
-        ValueError: If context is None or invalid
-

43-44: Remove duplicate import of Chunk inside the function

Chunk is already imported at module scope. The inner import is leftover and unnecessary.

-    from apps.ai.models.chunk import Chunk

91-99: Log success only when chunks were actually (re)created

Currently logs success unconditionally. Use the return value to decide.

-    create_chunks_and_embeddings(
+    new_chunks = create_chunks_and_embeddings(
         chunk_texts=new_chunk_texts,
         context=context,
         openai_client=openai_client,
         save=True,
     )
 
-    logger.info("Successfully completed chunk regeneration for new context")
+    if not new_chunks:
+        logger.error(
+            "Chunk regeneration failed or produced no chunks for context id=%s",
+            getattr(context, "id", None),
+        )
+        return
+    logger.info(
+        "Successfully completed chunk regeneration for context id=%s (%d chunks)",
+        getattr(context, "id", None),
+        len(new_chunks),
+    )
backend/tests/apps/ai/common/utils_test.py (1)

117-121: Rename to reflect actual assertion (sleep is NOT called here)

The name/docstring say “sleep called” but the test asserts not_called.

-    def test_create_chunks_and_embeddings_sleep_called(
+    def test_create_chunks_and_embeddings_no_sleep_on_first_call(
         self, mock_datetime, mock_sleep, mock_update_data
     ):
-        """Tests that sleep is called when needed."""
+        """Tests that sleep is not called on the first request."""
backend/apps/ai/agent/tools/rag/retriever.py (1)

151-158: Normalize leaders to a JSON-serializable shape (avoid M2M manager)

For committees, leaders currently reads the relation (likely a manager/QuerySet), which won’t serialize cleanly and is inconsistent with leaders_raw used elsewhere.

-                    "leaders": getattr(entity, "leaders", []),
+                    "leaders": getattr(entity, "leaders_raw", []),

If you must use the relation, convert explicitly, e.g., [u.login for u in entity.leaders.all()].

🧹 Nitpick comments (9)
backend/apps/ai/models/context.py (2)

23-27: Prefer UniqueConstraint over unique_together (modern Django API)

unique_together is legacy. Use UniqueConstraint for clarity and flexibility (e.g., conditionals, names).

 class Meta:
     db_table = "ai_contexts"
     verbose_name = "Context"
-    unique_together = ("entity_type", "entity_id")
+    constraints = [
+        models.UniqueConstraint(
+            fields=("entity_type", "entity_id"),
+            name="uniq_ai_context_entity",
+        ),
+    ]

28-35: Minor: friendlier fallback in str when entity is missing

If the entity was deleted, str(None) renders “None”, which can be confusing in admin logs. Consider a clearer placeholder.

-        entity = (
+        entity = (
             getattr(self.entity, "name", None)
             or getattr(self.entity, "key", None)
-            or str(self.entity)
+            or ("<deleted>" if self.entity is None else str(self.entity))
         )
backend/apps/ai/common/utils.py (3)

45-53: Rate limiting logic has no state; time_since_last_request is constant

You re-derive “last request time” from now, so time_since_last_request is always DEFAULT_LAST_REQUEST_OFFSET_SECONDS, not the real elapsed time. Consider tracking last_request_at across calls.

Example approach:

  • Add a module-global last_request_at: datetime | None = None
  • Before calling the API, compare now - last_request_at to MIN_REQUEST_INTERVAL_SECONDS
  • Update last_request_at = now after the request

I can draft a patch if you want to persist this state here or behind a small helper.


66-68: Catch ValueError from zip(strict=True) length mismatches

If the embeddings response length differs, zip(..., strict=True) raises ValueError. Include it in the handled exceptions.

-    except (OpenAIError, AttributeError, TypeError):
+    except (OpenAIError, AttributeError, TypeError, ValueError):
         logger.exception("Failed to create chunks and embeddings")
         return []

Optionally, validate len(response.data) == len(chunk_texts) and log a clearer error before zipping.


60-65: Optional: deduplicate chunk_texts before embedding to reduce cost

Sending duplicate texts increases API usage and can hit uniqueness on (context, text). You can preserve order with a simple de-dup.

# Before calling embeddings.create(...)
seen = set()
unique_texts = []
for t in chunk_texts:
    if t not in seen:
        unique_texts.append(t)
        seen.add(t)
chunk_texts = unique_texts
backend/tests/apps/ai/common/utils_test.py (2)

82-92: Use the exception class from the module under test

To avoid SDK-version drift, set side_effect to the OpenAIError imported by utils, not openai.OpenAIError.

-        mock_openai_client.embeddings.create.side_effect = openai.OpenAIError(
-            "API connection failed"
-        )
+        from apps.ai.common import utils as utils_mod
+        mock_openai_client.embeddings.create.side_effect = utils_mod.OpenAIError(
+            "API connection failed"
+        )

263-281: Stabilize success-path assertion by returning a non-empty list

With the proposed utils change to log success only when chunks are created, ensure the mocked helper returns a truthy, non-empty list.

         mock_openai_client = MagicMock()
         mock_openai_class.return_value = mock_openai_client
 
+        mock_create_chunks.return_value = [MagicMock()]
         regenerate_chunks_for_context(context)
 
         mock_existing_chunks.all.assert_called_once()
         mock_existing_chunks.all().delete.assert_called_once()

Repeat similarly in test_regenerate_chunks_for_context_no_existing_chunks if you want to assert the success log path.

backend/apps/ai/agent/tools/rag/retriever.py (2)

72-77: Nit: docstring still says “content object”; it’s actually an entity

Minor naming mismatch; update for clarity.

-    def get_source_name(self, entity) -> str:
-        """Get the name/identifier for the content object."""
+    def get_source_name(self, entity) -> str:
+        """Get the name/identifier for the entity."""

223-224: Performance nit: also select_related("context")

You already select_related("context__entity_type"). Adding "context" avoids an extra fetch for the FK itself.

-        chunks = queryset.select_related("context__entity_type").order_by("-similarity")[:limit]
+        chunks = queryset.select_related("context", "context__entity_type").order_by("-similarity")[:limit]
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 346d324 and 6c353d1.

📒 Files selected for processing (6)
  • backend/apps/ai/agent/tools/rag/retriever.py (4 hunks)
  • backend/apps/ai/common/utils.py (2 hunks)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/agent/tools/rag/retriever_test.py (1 hunks)
  • backend/tests/apps/ai/common/utils_test.py (4 hunks)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/apps/ai/models/context_test.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-16T12:07:38.124Z
Learnt from: Dishant1804
PR: OWASP/Nest#1891
File: backend/apps/ai/models/context.py:44-66
Timestamp: 2025-08-16T12:07:38.124Z
Learning: In the Context.update_data method, the Context object should always be returned (never None), even when no changes are made to the content or source fields. This ensures consistent behavior for callers.

Applied to files:

  • backend/apps/ai/models/context.py
📚 Learning: 2025-07-28T14:51:14.736Z
Learnt from: adithya-naik
PR: OWASP/Nest#1894
File: frontend/src/components/TopContributorsList.tsx:74-74
Timestamp: 2025-07-28T14:51:14.736Z
Learning: In the OWASP/Nest project, the maintainer adithya-naik prefers not to create separate components for code that's only used in two specific cases, following the YAGNI principle to avoid over-engineering when the duplication is limited and manageable.

Applied to files:

  • backend/apps/ai/models/context.py
🧬 Code graph analysis (5)
backend/apps/ai/common/utils.py (2)
backend/apps/ai/models/chunk.py (3)
  • Chunk (12-76)
  • update_data (45-76)
  • split_text (35-42)
backend/apps/ai/models/context.py (2)
  • Context (14-60)
  • update_data (38-60)
backend/apps/ai/models/context.py (3)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/owasp/admin/entity_member.py (1)
  • entity (61-74)
backend/apps/ai/models/chunk.py (2)
  • Meta (15-18)
  • update_data (45-76)
backend/tests/apps/ai/agent/tools/rag/retriever_test.py (1)
backend/apps/ai/agent/tools/rag/retriever.py (6)
  • Retriever (21-268)
  • get_query_embedding (49-70)
  • get_source_name (72-77)
  • get_additional_context (79-182)
  • extract_content_types_from_query (247-268)
  • retrieve (184-245)
backend/tests/apps/ai/common/utils_test.py (1)
backend/apps/ai/common/utils.py (2)
  • create_chunks_and_embeddings (19-70)
  • regenerate_chunks_for_context (73-98)
backend/apps/ai/agent/tools/rag/retriever.py (1)
backend/apps/owasp/admin/entity_member.py (1)
  • entity (61-74)
🔇 Additional comments (2)
backend/apps/ai/models/context.py (1)

17-21: Confirm “optional GFK” requirement vs current required fields

The PR/issue description mentions an optional GenericForeignKey on Context. Here both entity_type and entity_id are required. If contexts without an entity are needed (e.g., global/system contexts), make these fields nullable and validate the pair (both set or both null).

If optional linkage is intended:

-    entity_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
-    entity_id = models.PositiveIntegerField()
+    entity_type = models.ForeignKey(ContentType, on_delete=models.CASCADE, null=True, blank=True)
+    entity_id = models.PositiveBigIntegerField(null=True, blank=True)

Add a clean() or DB constraint to ensure both-null or both-set for integrity. If the “required” design is intentional, please update the docs/issue to reflect that.

backend/tests/apps/ai/common/utils_test.py (1)

96-113: Docstring says “failure path when context is None” but behavior is log-and-empty

This matches current behavior. If you adopt an explicit ValueError in utils (alternative path), update this test accordingly. Otherwise no action needed.

@@ -60,6 +60,3 @@ def test_complete_general_exception(self, mock_openai, mock_logger, openai_insta
response = openai_instance.complete()

assert response is None
mock_logger.exception.assert_called_once_with(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I acidentally removed it while I was testing other assertions -- I have added it back

result = str(context)
assert result == (
"test_model Test Object: This is test content that is longer than 50 charac"
# cspell:ignore charac
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not something we normally do. Use truncate for Context::str instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made the changes

@Dishant1804 Dishant1804 requested a review from arkid15r August 21, 2025 17:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
backend/apps/ai/models/context.py (1)

38-61: Make update_data concurrency-safe and avoid unnecessary timestamp bumps.

The current get/except-create pattern can race under concurrent calls and always saves even when nothing changed, unnecessarily bumping nest_updated_at (hurting “chunks stale if context updated” logic). Use get_or_create for atomicity and only persist when content/source actually change. This keeps the “always return Context” behavior (per the retrieved learning).

@@
     @staticmethod
     def update_data(
-        content: str,
-        entity,
+        content: str,
+        entity: models.Model,
         source: str = "",
         *,
         save: bool = True,
     ) -> "Context":
-        """Create or update context for a given entity."""
-        entity_type = ContentType.objects.get_for_model(entity)
-        entity_id = entity.pk
-
-        try:
-            context = Context.objects.get(entity_type=entity_type, entity_id=entity_id)
-        except Context.DoesNotExist:
-            context = Context(entity_type=entity_type, entity_id=entity_id)
-
-        context.content = content
-        context.source = source
-
-        if save:
-            context.save()
-
-        return context
+        """Create or update context for a given entity.
+        Always returns a Context instance (never None)."""
+        entity_type = ContentType.objects.get_for_model(entity)
+        entity_id = entity.pk
+
+        if save:
+            context, created = Context.objects.get_or_create(
+                entity_type=entity_type,
+                entity_id=entity_id,
+                defaults={"content": content, "source": source},
+            )
+            if not created and (context.content != content or context.source != source):
+                context.content = content
+                context.source = source
+                # Include nest_updated_at to ensure the timestamp is persisted.
+                context.save(update_fields=["content", "source", "nest_updated_at"])
+            return context
+        else:
+            # Support bulk-save flows by returning an unsaved instance when missing.
+            try:
+                context = Context.objects.get(entity_type=entity_type, entity_id=entity_id)
+            except Context.DoesNotExist:
+                return Context(
+                    entity_type=entity_type,
+                    entity_id=entity_id,
+                    content=content,
+                    source=source,
+                )
+            # Update in-memory only (caller decides when to save).
+            context.content = content
+            context.source = source
+            return context
backend/tests/apps/ai/models/context_test.py (1)

297-336: Update “new context with save=True” test to patch get_or_create.

This mirrors the creation path after the refactor.

-    @patch("apps.ai.models.context.Context.objects.get")
-    @patch("apps.ai.models.context.Context.__init__")
-    def test_update_data_new_context_with_save(self, mock_init, mock_get):
+    @patch("apps.ai.models.context.Context.objects.get_or_create")
+    def test_update_data_new_context_with_save(self, mock_get_or_create):
         """Test update_data creating a new context with save=True."""
         from apps.ai.models.context import Context as ContextModel
 
-        mock_get.side_effect = ContextModel.DoesNotExist
-        mock_init.return_value = None  # __init__ should return None
+        new_ctx = create_model_mock(ContextModel)
+        mock_get_or_create.return_value = (new_ctx, True)
@@
-                mock_get_for_model.assert_called_once_with(mock_content_object)
-                mock_get.assert_called_once_with(
-                    entity_type=mock_content_type,
-                    entity_id=1,
-                )
-                mock_init.assert_called_once_with(
-                    entity_type=mock_content_type,
-                    entity_id=1,
-                )
-                assert result.content == content
-                assert result.source == source
-                mock_save.assert_called_once()
+                mock_get_for_model.assert_called_once_with(mock_content_object)
+                mock_get_or_create.assert_called_once_with(
+                    entity_type=mock_content_type,
+                    entity_id=1,
+                    defaults={"content": content, "source": source},
+                )
+                assert result == new_ctx
🧹 Nitpick comments (8)
backend/apps/ai/models/context.py (4)

29-36: Harden str when the related entity is missing.

If the target object is deleted, self.entity may be None and str(self.entity) yields "None". Fall back to entity_id for a clearer label.

-        entity = (
-            getattr(self.entity, "name", None)
-            or getattr(self.entity, "key", None)
-            or str(self.entity)
-        )
-        return f"{self.entity_type.model} {entity}: {truncate(self.content, 50)}"
+        obj = self.entity
+        label = (
+            getattr(obj, "name", None)
+            or getattr(obj, "key", None)
+            or (str(obj) if obj is not None else str(self.entity_id))
+        )
+        return f"{self.entity_type.model} {label}: {truncate(self.content, 50)}"

24-28: Optional: prefer UniqueConstraint over unique_together (named, modern style).

Not urgent, but UniqueConstraint provides naming and advanced options.

     class Meta:
         db_table = "ai_contexts"
         verbose_name = "Context"
-        unique_together = ("entity_type", "entity_id")
+        constraints = [
+            models.UniqueConstraint(
+                fields=("entity_type", "entity_id"),
+                name="uq_ai_context_entity",
+            )
+        ]

3-3: Remove unused logger.

logging and logger are unused in this module.

-import logging
@@
-logger = logging.getLogger(__name__)

Also applies to: 12-12


20-21: Use PositiveBigIntegerField for entity_id to match BigAutoField default PK

Confirmed that DEFAULT_AUTO_FIELD is set to django.db.models.BigAutoField in backend/settings/base.py:194, and most apps (and their migrations) use BigAutoField for primary keys. To avoid potential overflow when referencing large PKs via the GenericForeignKey, switch entity_id to a 64-bit field.

• File to update: backend/apps/ai/models/context.py

 -    entity_id = models.PositiveIntegerField()
 +    entity_id = models.PositiveBigIntegerField()

• After applying, run ./manage.py makemigrations ai to generate an AlterField migration.
• Review any tests or fixtures that set or compare entity_id and update them if they assume a 32-bit range.

backend/tests/apps/ai/models/context_test.py (4)

66-78: Make manager.create test reflect required fields.

Context requires entity_type and entity_id; the current test calls create without them (only works because of patching). Include those args for realism.

-    @patch("apps.ai.models.context.Context.objects.create")
+    @patch("apps.ai.models.context.Context.objects.create")
     def test_context_manager_create(self, mock_create):
         mock_context = create_model_mock(Context)
         mock_create.return_value = mock_context
 
         content = "Test text"
         source = "test_source"
 
-        result = Context.objects.create(content=content, source=source)
+        mock_ct = Mock(spec=ContentType)
+        result = Context.objects.create(
+            content=content,
+            source=source,
+            entity_type=mock_ct,
+            entity_id=1,
+        )
 
-        mock_create.assert_called_once_with(content=content, source=source)
+        mock_create.assert_called_once_with(
+            content=content,
+            source=source,
+            entity_type=mock_ct,
+            entity_id=1,
+        )
         assert result == mock_context

233-264: Adjust str expectation when entity is missing.

With the hardened str, it falls back to entity_id instead of “Unknown”.

@@
-            result = str(context)
-            assert result == "test_model Unknown: Another test content"
+            result = str(context)
+            assert result == "test_model 456: Another test content"

40-43: Update field type assertion if switching to PositiveBigIntegerField.

If you adopt the 64-bit entity_id change, update this assertion accordingly.

-        assert field.__class__.__name__ == "PositiveIntegerField"
+        assert field.__class__.__name__ == "PositiveBigIntegerField"

158-168: Add a no-op update test to prevent accidental timestamp bumps.

Recommend adding a test ensuring update_data(save=True) does not call save() when values are unchanged (protects stale-chunk detection).

def test_update_data_noop_does_not_save(monkeypatch):
    from apps.ai.models.context import Context as ContextModel
    mock_entity = Mock()
    mock_entity.pk = 1
    mock_ct = Mock(spec=ContentType)
    mock_ct.get_source_expressions = Mock(return_value=[])

    # Existing instance with given values
    ctx = create_model_mock(ContextModel)
    ctx.content = "same"
    ctx.source = "src"

    with patch("apps.ai.models.context.ContentType.objects.get_for_model", return_value=mock_ct), \
         patch("apps.ai.models.context.Context.objects.get_or_create", return_value=(ctx, False)) as goc, \
         patch.object(ContextModel, "save") as mock_save:
        result = Context.update_data("same", mock_entity, source="src", save=True)
        goc.assert_called_once()
        mock_save.assert_not_called()
        assert result is ctx
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6c353d1 and 506ad46.

📒 Files selected for processing (3)
  • backend/apps/ai/models/context.py (1 hunks)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
  • backend/tests/apps/common/open_ai_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/tests/apps/common/open_ai_test.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-16T12:07:38.124Z
Learnt from: Dishant1804
PR: OWASP/Nest#1891
File: backend/apps/ai/models/context.py:44-66
Timestamp: 2025-08-16T12:07:38.124Z
Learning: In the Context.update_data method, the Context object should always be returned (never None), even when no changes are made to the content or source fields. This ensures consistent behavior for callers.

Applied to files:

  • backend/apps/ai/models/context.py
  • backend/tests/apps/ai/models/context_test.py
📚 Learning: 2025-07-28T14:51:14.736Z
Learnt from: adithya-naik
PR: OWASP/Nest#1894
File: frontend/src/components/TopContributorsList.tsx:74-74
Timestamp: 2025-07-28T14:51:14.736Z
Learning: In the OWASP/Nest project, the maintainer adithya-naik prefers not to create separate components for code that's only used in two specific cases, following the YAGNI principle to avoid over-engineering when the duplication is limited and manageable.

Applied to files:

  • backend/apps/ai/models/context.py
🧬 Code graph analysis (2)
backend/apps/ai/models/context.py (3)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/common/utils.py (1)
  • truncate (164-176)
backend/apps/ai/models/chunk.py (2)
  • Meta (15-18)
  • update_data (45-76)
backend/tests/apps/ai/models/context_test.py (3)
backend/apps/ai/models/context.py (2)
  • Context (15-61)
  • update_data (39-61)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
backend/apps/common/utils.py (1)
  • truncate (164-176)

Comment on lines +142 to +168
@patch("apps.ai.models.context.Context.objects.get")
def test_update_data_existing_context(self, mock_get):
mock_context = create_model_mock(Context)
mock_get.return_value = mock_context

content = "Test"
mock_content_object = Mock()
mock_content_object.pk = 1

with patch(
"apps.ai.models.context.ContentType.objects.get_for_model"
) as mock_get_for_model:
mock_content_type = Mock()
mock_content_type.get_source_expressions = Mock(return_value=[])
mock_get_for_model.return_value = mock_content_type

result = Context.update_data(content, mock_content_object, source="src", save=True)

mock_get_for_model.assert_called_once_with(mock_content_object)
mock_get.assert_called_once_with(
entity_type=mock_content_type,
entity_id=1,
)
assert result == mock_context
assert mock_context.content == content
assert mock_context.source == "src"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Align existing-context test with get_or_create path.

If update_data switches to get_or_create, this test should patch that and simulate created=False.

-    @patch("apps.ai.models.context.Context.objects.get")
-    def test_update_data_existing_context(self, mock_get):
+    @patch("apps.ai.models.context.Context.objects.get_or_create")
+    def test_update_data_existing_context(self, mock_get_or_create):
         mock_context = create_model_mock(Context)
-        mock_get.return_value = mock_context
+        mock_get_or_create.return_value = (mock_context, False)
@@
-            mock_get_for_model.assert_called_once_with(mock_content_object)
-            mock_get.assert_called_once_with(
-                entity_type=mock_content_type,
-                entity_id=1,
-            )
+            mock_get_for_model.assert_called_once_with(mock_content_object)
+            mock_get_or_create.assert_called_once_with(
+                entity_type=mock_content_type,
+                entity_id=1,
+                defaults={"content": content, "source": "src"},
+            )
             assert result == mock_context
             assert mock_context.content == content
             assert mock_context.source == "src"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@patch("apps.ai.models.context.Context.objects.get")
def test_update_data_existing_context(self, mock_get):
mock_context = create_model_mock(Context)
mock_get.return_value = mock_context
content = "Test"
mock_content_object = Mock()
mock_content_object.pk = 1
with patch(
"apps.ai.models.context.ContentType.objects.get_for_model"
) as mock_get_for_model:
mock_content_type = Mock()
mock_content_type.get_source_expressions = Mock(return_value=[])
mock_get_for_model.return_value = mock_content_type
result = Context.update_data(content, mock_content_object, source="src", save=True)
mock_get_for_model.assert_called_once_with(mock_content_object)
mock_get.assert_called_once_with(
entity_type=mock_content_type,
entity_id=1,
)
assert result == mock_context
assert mock_context.content == content
assert mock_context.source == "src"
@patch("apps.ai.models.context.Context.objects.get_or_create")
def test_update_data_existing_context(self, mock_get_or_create):
mock_context = create_model_mock(Context)
mock_get_or_create.return_value = (mock_context, False)
content = "Test"
mock_content_object = Mock()
mock_content_object.pk = 1
with patch(
"apps.ai.models.context.ContentType.objects.get_for_model"
) as mock_get_for_model:
mock_content_type = Mock()
mock_content_type.get_source_expressions = Mock(return_value=[])
mock_get_for_model.return_value = mock_content_type
result = Context.update_data(content, mock_content_object, source="src", save=True)
mock_get_for_model.assert_called_once_with(mock_content_object)
mock_get_or_create.assert_called_once_with(
entity_type=mock_content_type,
entity_id=1,
defaults={"content": content, "source": "src"},
)
assert result == mock_context
assert mock_context.content == content
assert mock_context.source == "src"
🤖 Prompt for AI Agents
In backend/tests/apps/ai/models/context_test.py around lines 142 to 168, the
test currently patches Context.objects.get but update_data now uses
get_or_create; change the test to patch Context.objects.get_or_create instead,
have it return (mock_context, False) to simulate existing context, and assert
get_for_model and get_or_create were called with the correct args
(entity_type=mock_content_type, entity_id=1); then verify the returned object is
mock_context, its content and source fields were updated, and that save was
invoked when save=True.

Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
backend/tests/apps/ai/models/context_test.py (2)

298-337: Align “new context + save=True” test with a get_or_create-based implementation.

If Context.update_data is (or will be) implemented with get_or_create, patching Context.__init__ and objects.get becomes brittle. Prefer patching objects.get_or_create and asserting defaults carry content and source. This also models the concurrency-safe path.

Proposed rewrite:

-@patch("apps.ai.models.context.Context.objects.get")
-@patch("apps.ai.models.context.Context.__init__")
-def test_update_data_new_context_with_save(self, mock_init, mock_get):
-    """Test update_data creating a new context with save=True."""
-    from apps.ai.models.context import Context as ContextModel
-
-    mock_get.side_effect = ContextModel.DoesNotExist
-    mock_init.return_value = None  # __init__ should return None
+@patch("apps.ai.models.context.Context.objects.get_or_create")
+def test_update_data_new_context_with_save(self, mock_get_or_create):
+    """Test update_data creating a new context with save=True."""
+    from apps.ai.models.context import Context as ContextModel
+    new_ctx = create_model_mock(ContextModel)
+    mock_get_or_create.return_value = (new_ctx, True)
@@
-    with patch(
-        "apps.ai.models.context.ContentType.objects.get_for_model"
-    ) as mock_get_for_model:
+    with patch("apps.ai.models.context.ContentType.objects.get_for_model") as mock_get_for_model:
         mock_content_type = Mock(spec=ContentType)
-        mock_content_type.get_source_expressions = Mock(return_value=[])
         mock_get_for_model.return_value = mock_content_type
@@
-            # Mock the context instance and its save method
-            with patch.object(ContextModel, "save") as mock_save:
-                result = Context.update_data(
-                    content, mock_content_object, source=source, save=True
-                )
-
-                mock_get_for_model.assert_called_once_with(mock_content_object)
-                mock_get.assert_called_once_with(
-                    entity_type=mock_content_type,
-                    entity_id=1,
-                )
-                mock_init.assert_called_once_with(
-                    entity_type=mock_content_type,
-                    entity_id=1,
-                )
-                assert result.content == content
-                assert result.source == source
-                mock_save.assert_called_once()
+            result = Context.update_data(content, mock_content_object, source=source, save=True)
+            mock_get_for_model.assert_called_once_with(mock_content_object)
+            mock_get_or_create.assert_called_once_with(
+                entity_type=mock_content_type,
+                entity_id=1,
+                defaults={"content": content, "source": source},
+            )
+            assert result == new_ctx

To verify which implementation the code currently uses, run:

#!/bin/bash
# Inspect update_data implementation in this branch
rg -nC2 -P 'class\s+Context\b.*?@staticmethod.*?def\s+update_data|get_or_create|objects\.get\(' --type=py apps/ai/models/context.py

338-377: Do the same for the “new context + save=False” variant.

Mirror the get_or_create approach and assert that save is not forced when save=False.

-@patch("apps.ai.models.context.Context.objects.get")
-@patch("apps.ai.models.context.Context.__init__")
-def test_update_data_new_context_without_save(self, mock_init, mock_get):
+@patch("apps.ai.models.context.Context.objects.get_or_create")
+def test_update_data_new_context_without_save(self, mock_get_or_create):
@@
-    mock_get.side_effect = ContextModel.DoesNotExist
-    mock_init.return_value = None  # __init__ should return None
+    new_ctx = create_model_mock(ContextModel)
+    mock_get_or_create.return_value = (new_ctx, True)
@@
-    with patch(
-        "apps.ai.models.context.ContentType.objects.get_for_model"
-    ) as mock_get_for_model:
+    with patch("apps.ai.models.context.ContentType.objects.get_for_model") as mock_get_for_model:
         mock_content_type = Mock(spec=ContentType)
-        mock_content_type.get_source_expressions = Mock(return_value=[])
         mock_get_for_model.return_value = mock_content_type
@@
-            # Mock the context instance and its save method
-            with patch.object(ContextModel, "save") as mock_save:
-                result = Context.update_data(
-                    content, mock_content_object, source=source, save=False
-                )
-
-                mock_get_for_model.assert_called_once_with(mock_content_object)
-                mock_get.assert_called_once_with(
-                    entity_type=mock_content_type,
-                    entity_id=1,
-                )
-                mock_init.assert_called_once_with(
-                    entity_type=mock_content_type,
-                    entity_id=1,
-                )
-                assert result.content == content
-                assert result.source == source
-                mock_save.assert_not_called()
+            result = Context.update_data(content, mock_content_object, source=source, save=False)
+            mock_get_for_model.assert_called_once_with(mock_content_object)
+            mock_get_or_create.assert_called_once_with(
+                entity_type=mock_content_type,
+                entity_id=1,
+                defaults={"content": content, "source": source},
+            )
+            assert result == new_ctx
🧹 Nitpick comments (9)
backend/tests/apps/ai/models/context_test.py (9)

12-19: Trim unused test double surface; tighten mock spec.

create_model_mock sets a chunks relation that isn’t used anywhere in this module. Extra attributes on specs make tests harder to maintain and can hide accidental usage. Also consider autospeccing to catch typos at call sites.

Apply this diff to simplify:

 def create_model_mock(model_class):
-    mock = Mock(spec=model_class)
+    mock = Mock(spec=model_class)
     mock._state = Mock()
     mock.pk = 1
     mock.id = 1
-    mock.chunks = Mock()
-    mock.chunks.count.return_value = 0
     return mock

32-38: Assert FK target model explicitly to harden the test.

You already verify on_delete=CASCADE. Add an assertion that the FK actually points to ContentType; this catches accidental regresses to a different model.

 def test_content_type_field_properties(self):
     field = Context._meta.get_field("entity_type")
     assert field.null is False
     assert field.blank is False
     assert hasattr(field, "remote_field")
     assert field.remote_field.on_delete.__name__ == "CASCADE"
+    assert field.remote_field.model is ContentType

49-61: Avoid patching model init; this test only exercises Django internals.

Patching Context.__init__ is brittle and doesn’t validate our code paths. If we keep this test, it should focus on Context.update_data(save=True) calling save(). Otherwise, it’s redundant with the dedicated update_data tests below.

Two options (pick one):

  • Remove the test entirely (preferred for noise reduction), or
  • Rewrite to cover update_data(save=True) instead of Context.save() in isolation.

Removal diff:

-@patch("apps.ai.models.context.Context.save")
-@patch("apps.ai.models.context.Context.__init__")
-def test_context_creation_with_save(self, mock_init, mock_save):
-    mock_init.return_value = None
-
-    content = "Test generated text"
-    source = "test_source"
-
-    context = Context(content=content, source=source)
-    context.save()
-
-    mock_save.assert_called_once()

65-97: Manager call-through tests add little value; consider trimming or upgrading to integration.

Asserting objects.create/filter/get invocations mostly re-tests Django. Either:

  • Convert to behavior-oriented tests that exercise your code paths which use these managers, or
  • Mark as Django DB tests and assert real persistence semantics, or
  • Remove to keep the suite lean.

98-121: Test field-level validation instead of patching full_clean.

Mocking full_clean verifies nothing about validation. You can avoid DB I/O and still assert constraints using the field’s clean. This keeps the test fast and meaningful.

-@patch("apps.ai.models.context.Context.full_clean")
-def test_context_validation(self, mock_full_clean):
-    context = Context()
-    context.content = "Valid text"
-    context.source = "A" * 100
-
-    context.full_clean()
-
-    mock_full_clean.assert_called_once()
+def test_context_validation(self):
+    # Field-level validation should allow max length 100 without DB access.
+    context = Context(content="Valid text", source="A" * 100)
+    Context._meta.get_field("source").clean(context.source, context)
 
-@patch("apps.ai.models.context.Context.full_clean")
-def test_context_validation_source_too_long(self, mock_full_clean):
-    from django.core.exceptions import ValidationError
-
-    mock_full_clean.side_effect = ValidationError("Source too long")
-
-    context = Context()
-    context.content = "Valid text"
-    context.source = "A" * 101
-
-    with pytest.raises(ValidationError) as exc_info:
-        context.full_clean()
-    assert "Source too long" in str(exc_info.value)
+def test_context_validation_source_too_long(self):
+    from django.core.exceptions import ValidationError
+    context = Context(content="Valid text")
+    with pytest.raises(ValidationError) as exc_info:
+        Context._meta.get_field("source").clean("A" * 101, context)
+    # Error wording may vary by Django version; keep assertion resilient.
+    assert "at most" in str(exc_info.value)

127-140: Drop refresh_from_db/delete stubs; they don’t exercise our code.

These tests only assert that patched Django methods were called, which is tautological. If you need them, convert to higher-level behavior where these calls are side effects of our logic.

-@patch("apps.ai.models.context.Context.refresh_from_db")
-def test_context_refresh_from_db(self, mock_refresh):
-    context = Context()
-    context.refresh_from_db()
-
-    mock_refresh.assert_called_once()
-
-@patch("apps.ai.models.context.Context.delete")
-def test_context_delete(self, mock_delete):
-    context = Context()
-    context.delete()
-
-    mock_delete.assert_called_once()

141-167: Also assert save was invoked on existing-context update with save=True.

update_data(..., save=True) should persist changes even for existing contexts. Since mock_context is a spec’d model, you can verify save() was called.

             assert result == mock_context
             assert mock_context.content == content
             assert mock_context.source == "src"
+            mock_context.save.assert_called_once()

141-167: Add a no-op-change test to codify the “always return a Context” guarantee.

Based on the retrieved learnings, update_data should always return a Context, even when content/source don’t change. Add a test to pin this behavior.

Example test (place near the other update_data tests):

def test_update_data_no_changes_returns_context(self):
    mock_context = create_model_mock(Context)
    mock_context.content = "same"
    mock_context.source = "src"

    with patch("apps.ai.models.context.ContentType.objects.get_for_model") as mock_get_for_model, \
         patch("apps.ai.models.context.Context.objects.get") as mock_get:
        mock_ct = Mock(spec=ContentType)
        mock_get_for_model.return_value = mock_ct
        mock_get.return_value = mock_context

        result = Context.update_data("same", Mock(pk=1), source="src", save=False)
        assert result is mock_context

I can push a commit with this (and the other tweaks) if you want me to.

Also applies to: 298-377


168-202: Improve test resilience around truncation helper semantics

The test for Context.__str__ currently asserts a hardcoded mid-word truncation ("50 cha..."), which will break if truncate ever switches to word boundaries or changes its ellipsis. To make this more robust, consider one of the following approaches:

• Use prefix/suffix checks plus a sanity substring check:

assert result.startswith("test_model Test Object: ")
assert result.endswith("...")
assert "This is test content" in result  # sanity check on content presence

• Or, for exact matching, compute the expectation with the same helper:

from apps.common.utils import truncate
expected = f"test_model Test Object: {truncate(context.content, 50)}"
assert result == expected

Note: truncate is imported from apps.common.utils (see backend/apps/common/utils.py at line 164).

File: backend/tests/apps/ai/models/context_test.py: lines 168–202

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 506ad46 and 871d266.

📒 Files selected for processing (1)
  • backend/tests/apps/ai/models/context_test.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-16T12:07:38.124Z
Learnt from: Dishant1804
PR: OWASP/Nest#1891
File: backend/apps/ai/models/context.py:44-66
Timestamp: 2025-08-16T12:07:38.124Z
Learning: In the Context.update_data method, the Context object should always be returned (never None), even when no changes are made to the content or source fields. This ensures consistent behavior for callers.

Applied to files:

  • backend/tests/apps/ai/models/context_test.py
🧬 Code graph analysis (1)
backend/tests/apps/ai/models/context_test.py (2)
backend/apps/ai/models/context.py (2)
  • Context (15-61)
  • update_data (39-61)
backend/apps/common/models.py (1)
  • TimestampedModel (37-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (1)
backend/tests/apps/ai/models/context_test.py (1)

23-31: LGTM: Meta and content field checks are precise.

The db_table/verbose_name assertions and TextField verification are appropriate and stable.

@arkid15r arkid15r added this pull request to the merge queue Aug 21, 2025
Merged via the queue into OWASP:main with commit 18d6f8b Aug 21, 2025
25 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Sep 6, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend AI Application to Support NestBot AI Assistant Contexts
2 participants