Feature/knowledge base #116

amirasaran · 2025-08-22T19:11:35Z

User description

Description

feat: Introduce Knowledge Base feature with plan-based credit tracking and usage history

Add Knowledge Base as a new core feature:
- Implement models, signals, and logic for managing knowledge bases and their documents.
- Integrate KnowledgeBaseProcessor and related tooling.
Extend Plan model to support knowledge base quotas:
- Add fields for max number of knowledge bases, max documents per knowledge base, and retrieval rate limits.
Refactor UsageHistory to support generic credit tracking:
- Use GenericForeignKey to associate usage history with crawl, search, sitemap, and knowledge base document events.
- Enforce UUID primary key for referenced models.
- Update admin and serializer logic for new usage history structure.
Update plan enforcement and validators:
- Modularize validation for crawl, search, sitemap, and knowledge base operations.
- Enforce plan limits for knowledge base creation and document addition.
- Improve error messages and validation feedback.
Add API endpoints and filters for usage history and knowledge base management.
Update signals to automate usage history creation for knowledge base document events.
Update .gitignore for new directories and artifacts.
Add new dependencies for knowledge base and search (elasticsearch, aiohttp, dataclasses-json, etc.).
Remove debugging code and improve formatting across updated files.

This commit introduces the Knowledge Base system, enabling teams to create, manage, and track knowledge bases and their documents with full plan-based credit enforcement and usage history auditing.
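
The plan-limit enforcement described above could be sketched roughly as follows. This is a minimal illustration, not the PR's validator code: the field names are inferred from the migration filenames, and treating `-1` as "unlimited" is an assumption. `PlanLimits`, `PlanLimitExceeded`, and both check functions are hypothetical names.

```python
from dataclasses import dataclass

UNLIMITED = -1  # assumed sentinel meaning "no limit on this quota"


@dataclass
class PlanLimits:
    # Field names are assumptions modelled on the plan migration filenames.
    number_of_knowledge_bases: int
    number_of_each_knowledge_base_documents: int


class PlanLimitExceeded(Exception):
    """Raised when an operation would exceed the team's plan quota."""


def check_can_create_knowledge_base(plan: PlanLimits, existing_kbs: int) -> None:
    """Reject creation once the team already owns `limit` knowledge bases."""
    limit = plan.number_of_knowledge_bases
    if limit != UNLIMITED and existing_kbs >= limit:
        raise PlanLimitExceeded(f"Plan allows at most {limit} knowledge bases.")


def check_can_add_documents(
    plan: PlanLimits, existing_docs: int, new_docs: int = 1
) -> None:
    """Reject an import that would push one knowledge base past its quota."""
    # Gate on the per-knowledge-base document field, not the KB count field.
    limit = plan.number_of_each_knowledge_base_documents
    if limit != UNLIMITED and existing_docs + new_docs > limit:
        raise PlanLimitExceeded(
            f"Plan allows at most {limit} documents per knowledge base."
        )
```

Keeping each quota in its own function makes it harder to gate one limit on the wrong field.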

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

UI Changes

Testing

Test A
Test B

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published

PR Type

Enhancement

Description

• Knowledge Base System: Complete implementation of a new knowledge base feature with document management, vector search, and AI-powered querying capabilities
• LLM Provider Management: Add comprehensive provider configuration system supporting OpenAI and WaterCrawl with model discovery and API key management
• Generic Usage Tracking: Refactor usage history to support multiple content types (crawl, search, sitemap, knowledge base) using GenericForeignKey
• Plan Extensions: Extend subscription plans with knowledge base quotas (max knowledge bases, documents per KB, retrieval rate limits)
• Vector Store Integration: Implement OpenSearch-based vector store with multiple retrieval strategies (similarity, MMR, hybrid search)
• Document Processing Pipeline: Add comprehensive document processing with text splitting, embedding, summarization, and keyword extraction
• Frontend Components: Complete React frontend with knowledge base management, provider configuration, and usage history interfaces
• Admin Interface: Add superuser-only admin panels for managing LLM providers, models, and system configurations
• API Endpoints: Comprehensive REST API for knowledge base operations, document import, querying, and provider management
• Background Tasks: Celery-based async processing for document indexing, vector store operations, and content processing
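
As an illustration of the text-splitting step in the document processing pipeline, here is a minimal fixed-size splitter with overlap. It is only a sketch under assumptions: `split_text`, its parameters, and the character-based strategy are hypothetical, and the PR's configurable splitters may work differently.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> List[str]:
    """Split text into character windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid trailing fragments
        # that are fully contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of embedding some text twice.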

Diagram Walkthrough

flowchart LR
  KB["Knowledge Base"] --> DOC["Documents"]
  DOC --> CHUNK["Chunks"]
  CHUNK --> VS["Vector Store"]
  VS --> SEARCH["Search & Query"]
  
  LLM["LLM Providers"] --> EMB["Embeddings"]
  LLM --> SUM["Summarization"]
  EMB --> VS
  SUM --> DOC
  
  PLAN["Plan Model"] --> QUOTA["KB Quotas"]
  USAGE["Usage History"] --> GFK["Generic FK"]
  GFK --> KB
  GFK --> CRAWL["Crawl Requests"]
  GFK --> SITEMAP["Sitemaps"]
  
  FRONTEND["React Frontend"] --> API["REST API"]
  API --> KB
  API --> LLM
  API --> USAGE
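
The Usage History to Generic FK edge in the diagram can be pictured with a plain-Python analogue. This is not Django's `GenericForeignKey`, only a sketch of the same idea: one record type that points at any billable object through a (content type, UUID) pair. `UsageRecord`, `credits_by_content_type`, and the content type labels are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    """One credit-usage entry pointing at any kind of billable object."""

    content_type: str      # hypothetical label, e.g. "crawl_request"
    content_id: uuid.UUID  # primary key of the referenced object
    credits: int

    def __post_init__(self) -> None:
        # Mirror the "UUID primary key only" rule from the PR description.
        if not isinstance(self.content_id, uuid.UUID):
            raise TypeError("content_id must be a uuid.UUID")


def credits_by_content_type(records: Iterable[UsageRecord]) -> Dict[str, int]:
    """Sum consumed credits per content type."""
    totals: Dict[str, int] = {}
    for record in records:
        totals[record.content_type] = totals.get(record.content_type, 0) + record.credits
    return totals
```

Because every referenced model shares the same key type, a single `content_id` column can serve crawl, search, sitemap, and knowledge base document events alike.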

File Walkthrough

Relevant files

New feature

14 files

knowledgeBase.ts `Knowledge Base API Service Implementation` frontend/src/services/api/knowledgeBase.ts • Add comprehensive API service for knowledge base operations including CRUD operations, document management, and querying • Implement methods for importing documents from URLs, crawl results, and files with upload progress tracking • Add support for context-aware enhancement, chunk retrieval, and retry indexing functionality	+128/-0
provider.ts `Admin Provider Management API Service` frontend/src/services/api/admin/provider.ts • Add admin API service for managing provider configurations, LLM models, and embedding models • Implement CRUD operations with pagination support for all provider-related entities • Add provider synchronization and configuration testing endpoints	+87/-0
knowledge.ts `Knowledge Base TypeScript Type Definitions` frontend/src/types/knowledge.ts • Define comprehensive TypeScript interfaces for knowledge base entities including status enums and form data types • Add interfaces for documents, chunks, context-aware enhancement, and import operations • Include default values and utility functions for chunk size calculations	+104/-0
provider.ts `Provider Configuration API Service` frontend/src/services/api/provider.ts • Add API service for provider configuration management with CRUD operations • Implement provider listing, configuration testing, and model management endpoints • Support both paginated and non-paginated provider configuration retrieval	+66/-0
provider.ts `Provider and Model Type Definitions` frontend/src/types/provider.ts • Define TypeScript interfaces for providers, models, embeddings, and configurations • Add enums for option requirements and form data structures • Include comprehensive type definitions for LLM and embedding model properties	+74/-0
provider.ts `Admin Provider Type Definitions` frontend/src/types/admin/provider.ts • Add admin-specific TypeScript interfaces for provider management • Define visibility level enums and request/response types for LLM and embedding models • Include comprehensive admin provider configuration interfaces	+71/-0
usage_history.ts `Usage History API Service` frontend/src/services/api/usage_history.ts • Add API service for retrieving usage history with pagination and filtering support • Implement filtering by team API key and content type parameters	+21/-0
usage_history.ts `Usage History Type Definitions` frontend/src/types/usage_history.ts • Define TypeScript interfaces for usage history tracking across different content types • Add content type enum for crawl requests, sitemaps, searches, and knowledge base documents • Include team API key summary interface for usage attribution	+24/-0
vectore_store.py `OpenSearch Vector Store Implementation` backend/knowledge_base/tools/vectore_store.py • Implement comprehensive OpenSearch vector store with pluggable retrieval strategies • Add support for similarity search, MMR search, and hybrid retrieval methods • Include automatic index creation, document management, and keyword-based scoring	+734/-0
views.py `Knowledge Base API ViewSets` backend/knowledge_base/views.py • Add comprehensive ViewSets for knowledge bases, documents, and chunks with full CRUD operations • Implement document import from URLs, crawl results, files, and context-aware enhancement • Add query functionality with rate limiting and plan validation integration	+467/-0
retrieval_strategies.py `Vector Store Retrieval Strategies` backend/knowledge_base/tools/retrieval_strategies.py • Implement pluggable retrieval strategies for different search approaches (dense, content, keyword) • Add BM25-optimized queries with hybrid search support and keyword boosting • Include comprehensive OpenSearch index mapping generation with similarity metrics	+451/-0
serializers.py `Knowledge Base API Serializers` backend/knowledge_base/serializers.py • Add comprehensive serializers for knowledge base entities with validation • Implement form data serializers for document import from various sources • Include context-aware enhancement and query serializers with plan validation integration	+337/-0
factories.py `Knowledge Base Component Factories` backend/knowledge_base/factories.py • Implement factory pattern for creating knowledge base components (embedders, vector stores, summarizers) • Add support for multiple providers (OpenAI, WaterCrawl) and file format converters • Include configurable text splitters and keyword extractors with knowledge base integration	+249/-0
serializers.py `LLM Provider and Model Serializers` backend/llm/serializers.py • Add serializers for LLM models, embedding models, and provider configurations • Implement API key encryption/decryption and provider configuration testing • Include comprehensive validation for provider settings and model parameters	+150/-0

Enhancement

49 files

utils.ts `CSS Class Name Utility Function` frontend/src/lib/utils.ts • Add `classnames` utility function for conditional CSS class generation • Implement object-based class name filtering and joining functionality	+7/-0
subscription.ts `Knowledge Base Subscription Fields` frontend/src/types/subscription.ts • Extend `CurrentSubscription` interface with knowledge base quota fields • Add properties for number of knowledge bases, documents per knowledge base, and retrieval rate limits	+3/-0
user.ts `User Profile Superuser Field` frontend/src/types/user.ts • Add `is_superuser` boolean field to the `Profile` interface • Extend user profile with administrative privilege indicator	+1/-0
validators.py `Modular Plan Validation System` backend/plan/validators.py • Refactor plan validation into modular mixins for different request types • Add comprehensive knowledge base validation for creation and document limits • Implement generic credit validation with daily and total credit checking	+215/-90
services.py `Generic Usage Tracking and Knowledge Base Quotas` backend/plan/services.py • Extend plan services with knowledge base quota support and generic usage tracking • Refactor usage history to support multiple content types via GenericForeignKey • Add credit calculation and validation for knowledge base operations	+116/-87
views.py `Add admin API views for LLM management` backend/llm/admin_api/views.py • Add comprehensive admin API views for LLM provider configurations, models, and embeddings • Implement CRUD operations with OpenAPI documentation and proper permissions • Include custom actions for syncing models/embeddings and testing configurations	+230/-0
services.py `Implement knowledge base core services` backend/knowledge_base/services.py • Implement core knowledge base service classes for managing knowledge bases and documents • Add methods for adding URLs, files, crawl results and processing documents • Include document indexing, vector store operations and content processing logic	+173/-0
views.py `Add LLM provider configuration API views` backend/llm/views.py • Add team-specific provider configuration API endpoints with full CRUD operations • Implement provider testing, model listing, and configuration validation • Include comprehensive OpenAPI documentation and proper authentication	+178/-0
models.py `Define knowledge base data models` backend/knowledge_base/models.py • Define core knowledge base models: `KnowledgeBase`, `KnowledgeBaseDocument`, `KnowledgeBaseChunk` • Add fields for chunking configuration, embedding models, and summarization settings • Include status tracking and metadata fields with proper relationships	+171/-0
processor.py `Add knowledge base processing engine` backend/knowledge_base/tools/processor.py • Implement main knowledge base processor for text splitting, embedding, and vector storage • Add methods for document persistence, search operations, and vector store management • Include factory pattern integration for various processing components	+177/-0
0001_initial.py `Initial LLM database schema migration` backend/llm/migrations/0001_initial.py • Create initial database schema for LLM models, provider configs, and embedding models • Define relationships between models and teams with proper constraints • Add visibility levels and temperature configuration fields	+75/-0
serializers.py `Add admin API serializers for LLM management` backend/llm/admin_api/serializers.py • Add serializers for admin API with validation and encryption handling • Implement provider configuration testing and API key encryption • Include comprehensive field validation and error handling	+137/-0
models.py `Define LLM and provider configuration models` backend/llm/models.py • Define LLM model classes: `LLMModel`, `ProviderConfig`, `EmbeddingModel` • Add provider configuration with team relationships and global/team-specific logic • Include temperature settings, visibility levels, and model metadata	+126/-0
0001_initial.py `Initial knowledge base database schema` backend/knowledge_base/migrations/0001_initial.py • Create initial database schema for knowledge base models • Define tables for knowledge bases, documents, and chunks with proper indexing • Add status tracking and configuration fields for processing	+72/-0
tasks.py `Add knowledge base background tasks` backend/knowledge_base/tasks.py • Implement Celery tasks for knowledge base operations: creation, deletion, crawling • Add document processing pipeline with error handling and status updates • Include vector store initialization and OpenSearch configuration	+126/-0
models.py `Extend plan model for knowledge base support` backend/plan/models.py • Extend `Plan` model with knowledge base quotas and rate limiting fields • Refactor `UsageHistory` to use generic foreign keys for flexible content association • Add UUID validation for referenced models and team API key tracking	+44/-20
services.py `Add LLM provider service implementations` backend/llm/services.py • Implement provider service classes for configuration management and testing • Add OpenAI provider validation and model temperature handling • Include team-specific provider configuration retrieval logic	+108/-0
summarizers.py `Add document summarization tools` backend/knowledge_base/tools/summarizers.py • Implement LLM-based summarizers with standard and context-aware variants • Add context enhancement service for improving user-provided goals • Include prompt templates and temperature configuration	+93/-0
0006_auto_20250807_1344.py `Migrate usage history to generic relationships` backend/plan/migrations/0006_auto_20250807_1344.py • Migrate existing usage history foreign key relationships to generic foreign keys • Populate `content_type` and `content_id` fields from existing data • Include reverse migration for rollback capability	+87/-0
admin.py `Add knowledge base Django admin interface` backend/knowledge_base/admin.py • Add Django admin interface for knowledge base models • Configure fieldsets, search fields, and list displays for better management • Include proper field organization and readonly configurations	+105/-0
providers.py `Add LLM provider implementations` backend/llm/providers.py • Implement provider classes for OpenAI and WaterCrawl with model discovery • Add temperature configuration logic and embedding model definitions • Include client initialization and API interaction methods	+99/-0
keyword_extractors.py `Add keyword extraction tools` backend/knowledge_base/tools/keyword_extractors.py • Implement keyword extraction using Jieba and LLM-based approaches • Add configurable keyword count and filtering logic • Include Pydantic schema for structured LLM output parsing	+93/-0
factories.py `Add LLM factory classes` backend/llm/factories.py • Implement factory classes for creating chat models and providers • Add support for OpenAI and WaterCrawl provider configurations • Include temperature validation and API key decryption	+89/-0
0002_initial.py `Add knowledge base model relationships` backend/knowledge_base/migrations/0002_initial.py • Add foreign key relationships between knowledge base and LLM models • Create proper constraints and indexes for model relationships • Include team associations and provider configuration links	+57/-0
signals.py `Update usage tracking signals for generic approach` backend/plan/signals.py • Refactor signal handlers to use generic usage history service methods • Add knowledge base document usage tracking with credit calculation • Update method names for consistency across different request types	+28/-7
file_to_markdown.py `Add file format conversion tools` backend/knowledge_base/tools/file_to_markdown.py • Implement file-to-markdown converters for various formats (HTML, DOCX, CSV) • Add base converter class with storage integration • Include PyPandoc integration for document format conversion	+76/-0
helpers.py `Add content cleaning utilities` backend/knowledge_base/helpers.py • Implement noise removal utility for cleaning markdown content • Add methods for removing SVG, base64 images, HTML tags, and fixing relative URLs • Include URL parsing and absolute path conversion logic	+56/-0
views.py `Add usage history API endpoint` backend/plan/views.py • Add usage history API endpoint with filtering and team-based access • Include proper queryset definitions for existing viewsets • Add comprehensive OpenAPI documentation for new endpoints	+31/-2
models.py `Define agent system data models` backend/agent/models.py • Define agent system models: `Agent`, `Tool`, `Conversation`, `Message` • Add relationships with LLM models, provider configs, and teams • Include configuration fields for agent behavior and tool integration	+74/-0
interfaces.py `Define knowledge base component interfaces` backend/knowledge_base/interfaces.py • Define abstract base classes for knowledge base components • Add interfaces for summarizers, keyword extractors, and file converters • Include proper inheritance structure and method signatures	+69/-0
throttle.py `Add team-based rate throttling` backend/plan/throttle.py • Implement team-based throttling for knowledge base operations • Add configurable rate limiting based on team plan settings • Include cache key generation and rate limit enforcement	+52/-0
admin.py `Add LLM Django admin interface` backend/llm/admin.py • Add Django admin interface for LLM models and configurations • Configure list displays, search fields, and fieldset organization • Include proper field grouping and readonly configurations	+57/-0
serializers.py `Extend plan serializers for knowledge base` backend/plan/serializers.py • Extend team plan serializer with knowledge base quota fields • Add usage history serializer with content type and API key information • Include proper field serialization and relationship handling	+32/-1
services.py `Add admin services for provider management` backend/llm/admin_api/services.py • Implement admin service for provider configuration management • Add automatic model and embedding synchronization logic • Include provider factory integration for model discovery	+45/-0
storage.py `Add knowledge base file storage service` backend/knowledge_base/tools/storage.py • Implement storage service for knowledge base file management • Add file path generation and storage abstraction • Include unique ID generation and file extension handling	+46/-0
0005_usagehistory_content_id_usagehistory_content_type_and_more.py `Add generic foreign key fields to usage history` backend/plan/migrations/0005_usagehistory_content_id_usagehistory_content_type_and_more.py • Add generic foreign key fields to usage history model • Include content type and content ID fields for flexible associations • Add team API key relationship for tracking usage attribution	+31/-0
filters.py `Add usage history filtering capabilities` backend/plan/filters.py • Implement usage history filtering by content type and API key • Add validation for content type format and proper error handling • Include support for filtering across different model types	+38/-0
0004_plan_number_of_each_knowledge_base_documents_and_more.py `Add knowledge base quotas to plan model` backend/plan/migrations/0004_plan_number_of_each_knowledge_base_documents_and_more.py • Add knowledge base quota fields to plan model • Include number of knowledge bases and documents per knowledge base limits • Set default values for new plan configuration fields	+23/-0
serializers.py `Add validation to spider option serializers` backend/core/serializers.py • Add minimum value validation to spider option fields • Ensure `max_depth` and `page_limit` have minimum value of 1 • Improve input validation for crawling parameters	+2/-2
0008_plan_knowledge_base_retrival_rate_limit.py `Add knowledge base retrieval rate limiting` backend/plan/migrations/0008_plan_knowledge_base_retrival_rate_limit.py • Add rate limiting field to plan model for knowledge base retrieval • Include DRF-style rate string format for flexible rate configuration • Set default value to None for optional rate limiting	+18/-0
0007_remove_usagehistory_crawl_request_and_more.py `Remove deprecated usage history foreign keys` backend/plan/migrations/0007_remove_usagehistory_crawl_request_and_more.py • Remove old foreign key fields from usage history model • Clean up deprecated crawl_request, search_request, and sitemap_request fields • Complete migration to generic foreign key approach	+25/-0
admin.py `Update plan admin for knowledge base features` backend/plan/admin.py • Add knowledge base quota fields to plan admin interface • Update usage history admin display to show generic content • Include new fields in plan fieldset organization	+4/-1
decorators.py `Add API key context tracking in authentication decorator` backend/user/decorators.py • Import and call `set_application_context_api_key` function to store API key in application context • Add context tracking for API key usage during authentication	+2/-0
0002_alter_providerconfig_api_key.py `Expand API key field size in ProviderConfig model` backend/llm/migrations/0002_alter_providerconfig_api_key.py • Change `api_key` field type from CharField to TextField in ProviderConfig model • Allow for longer API keys storage	+18/-0
application_context.py `Implement application context for API key tracking` backend/common/application_context.py • Create application context management using Django's Local storage • Add functions to set, get, and clear API key context	+15/-0
permissions.py `Add superuser permission class` backend/user/permissions.py • Add `IsSuperUser` permission class for superuser-only access	+5/-0
serializers.py `Include superuser status in user serialization` backend/user/serializers.py • Add `is_superuser` field to user serializer fields list	+1/-0
markdown.css `Add markdown styling with syntax highlighting support` frontend/src/styles/markdown.css • Add comprehensive CSS styling for markdown content with syntax highlighting • Include both light and dark theme support for code blocks and markdown elements	+175/-0
SelectCrawlResultsPage.tsx `Add crawl results selection page for knowledge base import` frontend/src/pages/dashboard/knowledge-base/SelectCrawlResultsPage.tsx • Create comprehensive page for selecting crawl results to import into knowledge base • Implement pagination, bulk selection, and import functionality • Add breadcrumb navigation and loading states	+392/-0

Bug fix

2 files

views.py `Fix viewset queryset definitions and cleanup` backend/core/views.py • Add missing `queryset` attributes to existing viewsets for consistency • Remove debugging print statement from proxy server testing • Import additional models for proper type hints	+12/-2
views.py `Fix user viewset queryset definitions` backend/user/views.py • Add missing `queryset` attributes to existing viewsets for consistency • Import additional models for proper type hints • Maintain existing functionality while fixing queryset definitions	+5/-1

Configuration changes

10 files

consts.py `Add LLM configuration constants` backend/llm/consts.py • Define LLM provider constants, choices, and configuration options • Add visibility levels, truncation options, and provider information • Include structured provider configuration with required/optional fields	+48/-0
consts.py `Add knowledge base configuration constants` backend/knowledge_base/consts.py • Define knowledge base status constants and document source types • Add summarizer type choices and processing status options • Include comprehensive choice definitions for model fields	+43/-0
urls.py `Add knowledge base URL routing` backend/knowledge_base/urls.py • Define URL routing for knowledge base API endpoints • Add nested routing for documents and chunks within knowledge bases • Include proper UUID parameter handling in URL patterns	+23/-0
urls.py `Add usage history URL routing` backend/plan/urls.py • Add usage history endpoint to plan URL routing • Include proper router registration for new viewset • Import required view classes for URL configuration	+7/-1
urls.py `Add LLM admin API URL routing` backend/llm/admin_api/urls.py • Define URL routing for LLM admin API endpoints • Add router registration for provider configs, models, and embeddings • Include proper basename configuration for API endpoints	+19/-0
urls.py `Add knowledge base and LLM URL routing` backend/watercrawl/urls.py • Add URL patterns for knowledge base and LLM endpoints • Include admin API routes for LLM management	+10/-0
apps.py `Add knowledge base Django app configuration` backend/knowledge_base/apps.py • Create Django app configuration for knowledge base module • Import signals module in ready method	+9/-0
apps.py `Add LLM Django app configuration` backend/llm/apps.py • Create Django app configuration for LLM module	+6/-0
apps.py `Add agent Django app configuration` backend/agent/apps.py • Create Django app configuration for agent module	+6/-0
.env.example `Add OpenSearch configuration to environment template` docker/.env.example • Add OpenSearch configuration settings with password and dashboard port • Fix trailing whitespace formatting	+9/-2

Tests

4 files

test.py `Add noise removal test script` backend/test.py • Add test script for noise removal functionality • Include sample text processing and output verification • Demonstrate usage of `NoiseRemover` helper class	+40/-0
tests.py `Add knowledge base test file placeholder` backend/knowledge_base/tests.py • Create empty test file placeholder for knowledge base module	+1/-0
tests.py `Add LLM test file placeholder` backend/llm/tests.py • Create empty test file placeholder for LLM module	+1/-0
tests.py `Add agent test file placeholder` backend/agent/tests.py • Create empty test file placeholder for agent module	+1/-0

Miscellaneous

2 files

admin.py `Add agent admin file placeholder` backend/agent/admin.py • Create empty admin file placeholder for agent module	+1/-0
views.py `Add agent views file placeholder` backend/agent/views.py • Create empty views file placeholder for agent module	+1/-0

Dependencies

1 files

pnpm-lock.yaml `Add markdown rendering dependencies to package lock` frontend/pnpm-lock.yaml • Add dependencies for markdown rendering: `@tailwindcss/typography`, `react-markdown`, `rehype-highlight`, `rehype-raw`, `remark-gfm` • Include all related transitive dependencies and type definitions	+956/-4

Additional files

86 files

.env.example	+5/-0
__init__.py	[link]
__init__.py	[link]
__init__.py	[link]
__init__.py	[link]
__init__.py	[link]
signals.py	+15/-0
__init__.py	[link]
__init__.py	[link]
interfaces.py	+20/-0
__init__.py	[link]
urls.py	+13/-0
utils.py	+7/-0
pyproject.toml	+13/-2
settings.py	+6/-0
.env.local	+9/-0
README.md	+32/-1
docker-compose.local.yml	+30/-1
docker-compose.yml	+38/-11
python.md	+106/-20
package.json	+5/-0
App.tsx	+80/-8
AdminCard.tsx	+58/-0
TeamSelector.tsx	+9/-9
PageOptionsForm.tsx	+1/-1
EnhanceContextModal.tsx	+113/-0
KnowledgeBaseApiDocumentation.tsx	+248/-0
KnowledgeBasePricingInfo.tsx	+152/-0
KnowledgeBaseQueryForm.tsx	+233/-0
KnowledgeBaseQueryResult.tsx	+178/-0
SearchForm.tsx	+0/-1
ProviderConfigForm.tsx	+347/-0
ProviderConfigList.tsx	+208/-0
ProviderConfigSettings.tsx	+179/-0
Breadcrumbs.tsx	+3/-9
Card.tsx	+82/-0
MarkdownRenderer.tsx	+26/-0
Modal.tsx	+85/-0
OptionCard.tsx	+71/-0
Slider.tsx	+127/-0
StatusBadge.tsx	+44/-8
UsageLimitBox.tsx	+57/-0
WithBreadcrumbs.tsx	+0/-44
SitemapApiDocumentation.tsx	+6/-9
index.d.ts	+0/-24
BreadcrumbContext.tsx	+40/-0
AdminLayout.tsx	+238/-0
DashboardLayout.tsx	+34/-6
AdminDashboard.tsx	+43/-0
ManageLLMProvidersPage.tsx	+197/-0
ManageProxiesPage.tsx	+165/-0
ProviderConfigDetailPage.tsx	+251/-0
ApiKeysPage.tsx	+9/-0
CrawlLogsPage.tsx	+11/-2
CrawlPage.tsx	+8/-0
CrawlRequestDetailPage.tsx	+10/-0
DashboardPage.tsx	+17/-5
ProfilePage.tsx	+11/-0
SearchLogsPage.tsx	+9/-0
SearchPage.tsx	+9/-1
SearchRequestDetailPage.tsx	+10/-1
SettingsPage.tsx	+27/-2
SitemapLogsPage.tsx	+10/-0
SitemapPage.tsx	+10/-2
SitemapRequestDetailPage.tsx	+10/-0
UsageHistoryPage.tsx	+470/-0
UsagePage.tsx	+9/-0
BatchUrlImportPage.tsx	+138/-0
ImportOptionsPage.tsx	+224/-0
ImportProgressPage.tsx	+195/-0
KnowledgeBaseDetailPage.tsx	+480/-0
KnowledgeBaseDocumentDetailPage.tsx	+295/-0
KnowledgeBaseEditPage.tsx	+469/-0
KnowledgeBaseNewPage.tsx	+815/-0
KnowledgeBasePage.tsx	+227/-0
KnowledgeBaseQueryPage.tsx	+58/-0
ManualEntryPage.tsx	+156/-0
NewCrawlPage.tsx	+127/-0
NewSitemapPage.tsx	+187/-0
SelectCrawlPage.tsx	+223/-0
SelectSitemapPage.tsx	+217/-0
UploadDocumentsPage.tsx	+193/-0
UrlSelectorPage.tsx	+584/-0
breadcrumbs.ts	+0/-105
classNames.ts	+0/-3
tailwind.config.mjs	+2/-1

… UI, reset migrations

- Improved security by updating default OpenSearch password across configuration files (.env.example, .env.local, docker-compose files).
- Refactored factories to centralize OpenSearch client creation and added decryption logic for embedding API keys.
- Updated ProviderConfig model to use TextField for API keys, improving credential handling.
- Enhanced document status service to always clear the error field.
- Added Celery initializer task for OpenSearch pipeline setup on worker startup.
- Deleted all knowledge_base and llm migrations for migration reset and schema changes.
- Increased top_k default/max values for knowledge base queries in serializers and UI forms.
- Improved validation and filetype checking for uploads.
- Major frontend improvements:
- Revamped KnowledgeBase pages with beta notices and clarified document labels.
- Redesigned crawl, sitemap, and crawl results selection pages; switched to table layouts and added pagination.
- Enhanced selection state handling for crawl import, supporting cross-page result selection and total counters.
- Updated API usage patterns and documentation to reflect new query signature.
- Refined feedback messages, added enterprise-only fields, and improved loading/empty states in history and selection pages.
- Removed legacy/commented code and improved code formatting in several backend/frontend files.

BREAKING CHANGE:
- All Django migration files for knowledge_base and llm were deleted.


qodo-code-review · 2025-08-22T19:13:23Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In MMR ranking, embeddings for documents are generated using embed_query on document text each time, which is inefficient and may be semantically wrong (should use embed_documents or stored vectors). Also selection logic uses documents.index(documents[i]) which is redundant; verify correctness and performance for large k.

    )

    # Get embeddings for all documents
    doc_embeddings = []
    for doc in documents:
        doc_embedding = self.embedding.embed_query(doc.page_content)
        doc_embeddings.append(doc_embedding)

    # Convert to numpy arrays
    query_emb = np.array(query_embedding)
    doc_embs = np.array(doc_embeddings)

    # Calculate similarities to query
    query_similarities = np.dot(doc_embs, query_emb) / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    )

    selected = []
    remaining = list(range(len(documents)))

    # Select first document (highest similarity to query)
    best_idx = np.argmax(query_similarities)
    selected.append(remaining.pop(best_idx))

    # Select remaining documents using MMR
    for _ in range(min(k - 1, len(remaining))):
        mmr_scores = []

        for idx in remaining:
            # Relevance score
            relevance = query_similarities[idx]

            # Diversity score (max similarity to already selected)
            if selected:
                selected_embs = doc_embs[
                    [documents.index(documents[i]) for i in selected]
                ]
                current_emb = doc_embs[idx]
                similarities = np.dot(selected_embs, current_emb) / (
                    np.linalg.norm(selected_embs, axis=1)
                    * np.linalg.norm(current_emb)
                )
                max_similarity = np.max(similarities)
            else:
                max_similarity = 0

            # MMR score
            mmr_score = lambda_mult * relevance - (1 - lambda_mult) * max_similarity
            mmr_scores.append(mmr_score)

        # Select document with highest MMR score
        best_idx = np.argmax(mmr_scores)
        selected.append(remaining.pop(best_idx))

    return [documents[i] for i in selected]

Logic Consistency

Knowledge base documents limit check uses number_of_knowledge_bases instead of per-KB documents field when gating (-1 check). This likely should reference number_of_each_knowledge_base_documents for unlimited case; otherwise unlimited KB count could erroneously allow unlimited docs.

    if self.team_plan_service.number_of_knowledge_bases == -1:
        return

    total_number_of_documents = (
        knowledge_base.documents.count() + new_document_count
    )
    if (
        total_number_of_documents
        >= self.team_plan_service.number_of_each_knowledge_base_documents
    ):
        raise PermissionDenied(

Validation Flow

In FillKnowledgeBaseFromCrawlResultsSerializer.validate, attrs["crawl_result_uuids"] is replaced by a queryset via field validator, but later .count() is used and credits validated; ensure downstream code expects a queryset not a list of UUIDs to avoid type mismatches in services using these attrs.

    def validate(self, attrs):
        return PlanLimitValidator(
            team=self.context["team"],
        ).validate_create_knowledge_base_document_from_crawl_results(
            self.context["knowledge_base"], attrs["crawl_result_uuids"].count(), attrs
        )
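To make the "Logic Consistency" point concrete, here is a minimal sketch of a per-knowledge-base document limit check that keys the unlimited case on the per-KB document limit rather than on the knowledge base count. The function name, the `PlanLimitExceeded` exception, and the `UNLIMITED` constant are hypothetical stand-ins, not names from the PR. The sketch also assumes that reaching the limit exactly is allowed (`>`), whereas the quoted code rejects it (`>=`); which is intended is for the author to confirm.

```python
UNLIMITED = -1  # assumption: -1 marks an unlimited plan field, as in the quoted check


class PlanLimitExceeded(Exception):
    """Hypothetical stand-in for the PermissionDenied raised in the PR."""


def check_document_limit(current_count: int, new_count: int, per_kb_limit: int) -> None:
    """Raise if adding new_count documents would exceed the per-KB document limit."""
    # Gate the unlimited case on the per-KB document limit itself,
    # not on the number of knowledge bases the plan allows.
    if per_kb_limit == UNLIMITED:
        return
    total = current_count + new_count
    if total > per_kb_limit:
        raise PlanLimitExceeded(
            f"Plan allows {per_kb_limit} documents per knowledge base, requested total is {total}."
        )
```

With this shape, a plan with unlimited knowledge bases but a finite per-KB document limit is still enforced.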

qodo-code-review · 2025-08-22T19:14:38Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Category: Possible issue
Suggestion: Fix invalid nested router paths
Impact: High

DRF's DefaultRouter does not support nested regex paths in router.register; these patterns won't route as intended. Use a nested router (e.g., drf-nested-routers) or move nested segments into the viewset's lookup and use standard register prefixes.

backend/knowledge_base/urls.py [6-19]

     router = DefaultRouter()
    -router.register(
    -    r"knowledge-bases", views.KnowledgeBaseViewSet, basename="knowledge-base"
    -)
    -router.register(
    -    r"knowledge-bases/(?P<knowledge_base_uuid>[0-9a-fA-F-]{36})/documents",
    -    views.KnowledgeBaseDocumentViewSet,
    -    basename="knowledge-base-document",
    -)
    -router.register(
    -    r"knowledge-bases/(?P<knowledge_base_uuid>[0-9a-fA-F-]{36})/documents/(?P<document_uuid>[0-9a-fA-F-]{36})/chunks",
    -    views.KnowledgeBaseChunkViewSet,
    -    basename="knowledge-base-chunk",
    -)
    +router.register(r"knowledge-bases", views.KnowledgeBaseViewSet, basename="knowledge-base")
    +router.register(r"documents", views.KnowledgeBaseDocumentViewSet, basename="knowledge-base-document")
    +router.register(r"chunks", views.KnowledgeBaseChunkViewSet, basename="knowledge-base-chunk")

Suggestion importance [1-10]: 10

Why: The suggestion correctly identifies that `DefaultRouter` does not support nested regex paths, which would make the defined document and chunk endpoints completely non-functional.
Category: Possible issue
Suggestion: Fix MMR indexing and stability
Impact: High

The MMR implementation recomputes indexes via `documents.index(...)` and can mis-index, and it recomputes norms repeatedly, risking division by zero. Precompute arrays and norms once, use indices directly, and guard zero norms. This fixes incorrect selection and potential crashes for identical/zero vectors.

backend/knowledge_base/tools/vectore_store.py [350-414]

     def _apply_mmr_ranking(
    -    self, documents: List[Document], query: str, k: int, lambda_mult: float
    +    self, documents: List[Document], query: str | List[float], k: int, lambda_mult: float
     ) -> List[Document]:
    -    """Apply MMR ranking to documents."""
         if not documents or len(documents) <= k:
             return documents

    -    # Generate query embedding
    -    query_embedding = (
    -        self.embedding.embed_query(query) if isinstance(query, str) else query
    -    )
    +    # Prepare embeddings
    +    query_embedding = self.embedding.embed_query(query) if isinstance(query, str) else query
    +    doc_embs = np.array([self.embedding.embed_query(doc.page_content) for doc in documents], dtype=float)
    +    query_emb = np.array(query_embedding, dtype=float)

    -    # Get embeddings for all documents
    -    doc_embeddings = []
    -    for doc in documents:
    -        doc_embedding = self.embedding.embed_query(doc.page_content)
    -        doc_embeddings.append(doc_embedding)
    +    # Guard against zero norms
    +    doc_norms = np.linalg.norm(doc_embs, axis=1)
    +    doc_norms[doc_norms == 0] = 1e-12
    +    query_norm = np.linalg.norm(query_emb)
    +    if query_norm == 0:
    +        query_norm = 1e-12

    -    # Convert to numpy arrays
    -    query_emb = np.array(query_embedding)
    -    doc_embs = np.array(doc_embeddings)
    +    # Similarity to query
    +    query_similarities = (doc_embs @ query_emb) / (doc_norms * query_norm)

    -    # Calculate similarities to query
    -    query_similarities = np.dot(doc_embs, query_emb) / (
    -        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    -    )
    +    selected: list[int] = []
    +    remaining: list[int] = list(range(len(documents)))

    -    selected = []
    -    remaining = list(range(len(documents)))
    +    # First pick
    +    first_idx = int(np.argmax(query_similarities))
    +    selected.append(first_idx)
    +    remaining.remove(first_idx)

    -    # Select first document (highest similarity to query)
    -    best_idx = np.argmax(query_similarities)
    -    selected.append(remaining.pop(best_idx))
    +    # Iteratively pick with MMR
    +    for _ in range(min(k - 1, len(remaining))):
    +        max_sim_to_selected = np.zeros(len(remaining))
    +        if selected:
    +            selected_embs = doc_embs[selected]
    +            selected_norms = np.linalg.norm(selected_embs, axis=1)
    +            selected_norms[selected_norms == 0] = 1e-12

    -    # Select remaining documents using MMR
    -    for _ in range(min(k - 1, len(remaining))):
    -        mmr_scores = []
    +            # Compute cosine similarity between each remaining and selected, take max
    +            rem_embs = doc_embs[remaining]
    +            rem_norms = doc_norms[remaining]
    +            sims = (rem_embs @ selected_embs.T) / (rem_norms[:, None] * selected_norms[None, :])
    +            max_sim_to_selected = sims.max(axis=1)

    -        for idx in remaining:
    -            # Relevance score
    -            relevance = query_similarities[idx]
    +        relevance = query_similarities[remaining]
    +        mmr_scores = lambda_mult * relevance - (1 - lambda_mult) * max_sim_to_selected
    +        pick_pos = int(np.argmax(mmr_scores))
    +        pick_idx = remaining[pick_pos]
    +        selected.append(pick_idx)
    +        remaining.pop(pick_pos)

    -            # Diversity score (max similarity to already selected)
    -            if selected:
    -                selected_embs = doc_embs[
    -                    [documents.index(documents[i]) for i in selected]
    -                ]
    -                current_emb = doc_embs[idx]
    -                similarities = np.dot(selected_embs, current_emb) / (
    -                    np.linalg.norm(selected_embs, axis=1)
    -                    * np.linalg.norm(current_emb)
    -                )
    -                max_similarity = np.max(similarities)
    -            else:
    -                max_similarity = 0
    +    return [documents[i] for i in selected[:k]]

    -            # MMR score
    -            mmr_score = lambda_mult * relevance - (1 - lambda_mult) * max_similarity
    -            mmr_scores.append(mmr_score)
    -
    -        # Select document with highest MMR score
    -        best_idx = np.argmax(mmr_scores)
    -        selected.append(remaining.pop(best_idx))
    -
    -    return [documents[i] for i in selected]
    -

Suggestion importance [1-10]: 9

Why: The suggestion correctly identifies and fixes multiple critical issues in the `_apply_mmr_ranking` method, including a potential crash from division by zero and incorrect results with duplicate documents, while also improving performance.
Category: Possible issue
Suggestion: Fix duplicated file extensions in paths
Impact: Medium

The generated path currently duplicates the file extension (e.g., "file.csv.csv"). Either keep the original filename or strip the extension before appending yours. This prevents incorrect paths and broken file retrieval later.

backend/knowledge_base/tools/storage.py [7-22]

     class StorageFile:
    -    uuid: str
         name: str
         path: str

         def __init__(self, unique_id: str, name: str):
             self.unique_id = unique_id
             self.name = name

         @property
         def extension(self):
    -        return self.name.split(".")[-1]
    +        return self.name.rsplit(".", 1)[-1] if "." in self.name else ""
    +
    +    @property
    +    def basename(self):
    +        return self.name.rsplit(".", 1)[0] if "." in self.name else self.name

         def make_path(self, knowledge_base_uuid):
    -        self.path = f"knowledge_base/{knowledge_base_uuid}/{self.unique_id}/{self.name}.{self.extension}"
    +        if self.extension:
    +            filename = f"{self.basename}.{self.extension}"
    +        else:
    +            filename = self.basename
    +        self.path = f"knowledge_base/{knowledge_base_uuid}/{self.unique_id}/{filename}"
             return self

Suggestion importance [1-10]: 8

Why: The suggestion correctly identifies a bug in `make_path` where the file extension is appended to the full filename, resulting in incorrect paths like `file.txt.txt`, which would cause issues with file storage and retrieval.
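The "keep the original filename" option mentioned in this suggestion can be sketched in a few lines. The helper names below are hypothetical and are not part of the PR's `StorageFile` class; the path layout is copied from the quoted code, and `pathlib` is used only to read the extension.

```python
from pathlib import PurePosixPath


def extension_of(name: str) -> str:
    """Return the last extension without the dot, or '' if there is none."""
    return PurePosixPath(name).suffix.lstrip(".")


def make_storage_path(knowledge_base_uuid: str, unique_id: str, name: str) -> str:
    """Build the storage path from the original filename, appending nothing to it."""
    return f"knowledge_base/{knowledge_base_uuid}/{unique_id}/{name}"
```

Since the name already carries its extension, nothing is appended and a path ending in `file.csv.csv` cannot be produced.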
Category: Possible issue
Suggestion: Align stored and indexed chunk content
Impact: Medium

Keep vector-store `page_content` consistent with what you persist in DB. Currently the DB chunk includes the summary, but the vector doc does not, causing retrieval/traceability mismatches. Use the same `chunk.content` for both.

backend/knowledge_base/tools/processor.py [89-113]

     def persist_to_vector_store(self, document: KnowledgeBaseDocument) -> List[str]:
    -    ...
    +    self.remove_from_vector_store(document)
    +    document.chunks.all().delete()
    +    chunks = []
    +    index = 1
    +    summary = ""
    +    if self.summarizer:
    +        summary = self.summarizer.summarize(document.content)
         for chunk_text in self.text_splitter.split_text(document.content):
    +        enriched_text = f"{summary}\n\n{chunk_text}" if summary else chunk_text
             chunk = KnowledgeBaseChunk(
                 document=document,
                 index=index,
    -            content=f"{summary}\n\n{chunk_text}" if summary else chunk_text,
    +            content=enriched_text,
                 keywords=self.keyword_extractor.extract_keywords(chunk_text),
             )
             chunk.save()
             chunk_uuid = str(chunk.uuid)
             chunks.append(
                 Document(
    -                page_content=chunk_text,
    +                page_content=enriched_text,
                     id=chunk_uuid,
                     metadata={
                         "index": index,
                         "title": document.title,
                         "uuid": chunk_uuid,
                         "source": document.source,
                         "knowledge_base_id": str(document.knowledge_base.uuid),
                         "document_id": str(document.uuid),
                         "keywords": chunk.keywords,
                     },
                 )
             )
             index += 1
    +    return self.vector_store.add_documents(chunks)

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance [1-10]: 8

Why: The suggestion correctly identifies a critical inconsistency where the content saved to the database (`KnowledgeBaseChunk`) includes a summary, but the content indexed in the vector store does not, which would lead to retrieval mismatches and degrade search quality.
Category: Possible issue
Suggestion: Ensure tool returns a string
Impact: Medium

`_run` returns `None` despite the type hint `-> str`, which will break tool pipelines expecting a string result. Return a meaningful status or scraped output, and handle exceptions to surface errors instead of failing silently.

backend/agent/tools/scraper.py [13-19]

     class ScrapperTool(BaseTool):
         name = "scrapper"
         description = "scrapper"
         args_schema: Type[BaseModel] = ScraperParameters

         def _run(self, url: str) -> str:
    -        CrawlerService.make_with_urls([url], self.agent.knowledge_base.team).run()
    +        try:
    +            CrawlerService.make_with_urls([url], self.agent.knowledge_base.team).run()
    +            return f"Scrape started for: {url}"
    +        except Exception as e:
    +            return f"Scrape failed for {url}: {e}"

Suggestion importance [1-10]: 8

Why: The suggestion correctly identifies that the `_run` method violates its `-> str` type hint by returning `None`, which would cause a runtime issue in the LangChain tool pipeline.
Category: High-level
Suggestion: Risky OpenSearch integration
Impact: High

The vector store assumes OpenSearch availability and creates indices/pipelines at runtime with broad settings, but there’s no environment/health gating or graceful degradation path. Add a clear abstraction to disable or swap retrieval backends and gate all index/pipeline creation and queries behind connectivity checks with fail-closed behavior to avoid startup/task crashes and partial data corruption.

Examples:

backend/knowledge_base/tasks.py [90-126]

    @shared_task
    def initializer():
        vector_store_type = getattr(settings, "KB_VECTOR_STORE_TYPE", "opensearch")
        if vector_store_type == "opensearch":
            client = VectorStoreFactory.create_opensearch_client()
            if settings.DEBUG:
                # Set low watermark to 99%
                client.cluster.put_settings(
                    body={
                        "persistent": {
    ... (clipped 27 lines)

backend/knowledge_base/tools/vectore_store.py [18-86]

    def __init__(
        self,
        opensearch_client: OpenSearch,
        index_name: str,
        embedding: Embeddings,
        retrieval_strategy: Optional[RetrievalStrategy] = None,
        text_field: str = "text",
        vector_field: str = "vector_field",
        similarity_metric: str = "l2",
    ):
    ... (clipped 59 lines)

Solution Walkthrough:

Before:

    # backend/knowledge_base/tasks.py
    @worker_ready.connect
    def initializer_on_worker_ready(sender, kwargs):
        client = VectorStoreFactory.create_opensearch_client()
        # This will crash the worker if OpenSearch is down
        client.transport.perform_request(
            "PUT", "/_search/pipeline/rrf-pipeline", body={...}
        )

    # backend/knowledge_base/tools/vectore_store.py
    class WaterCrawlOpenSearchVectorStore(VectorStore):
        def __init__(self, opensearch_client, ...):
            self.client = opensearch_client
            # This is called directly and can raise an exception
            self._create_index_if_not_exists()

        def _create_index_if_not_exists(self):
            try:
                if not self.client.indices.exists(...):
                    self.client.indices.create(...)
            except Exception as e:
                logger.error(...)
                raise  # Crashes the caller

After:

    # backend/knowledge_base/health.py
    class OpenSearchHealth:
        @staticmethod
        def is_available():
            try:
                client = VectorStoreFactory.create_opensearch_client()
                return client.ping()
            except Exception:
                return False

    # backend/knowledge_base/tasks.py
    @worker_ready.connect
    def initializer_on_worker_ready(sender, kwargs):
        if not OpenSearchHealth.is_available():
            logger.warning("OpenSearch is not available. Skipping pipeline creation.")
            return
        # ... create pipeline ...

    # backend/knowledge_base/tools/vectore_store.py
    class WaterCrawlOpenSearchVectorStore(VectorStore):
        def __init__(self, opensearch_client, ...):
            self.client = opensearch_client
            self.is_healthy = self.client.ping()
            if self.is_healthy:
                self._create_index_if_not_exists()
            else:
                logger.error("OpenSearch client is not healthy.")

        def similarity_search(self, query, ...):
            if not self.is_healthy:
                # Gracefully fail instead of crashing
                raise ServiceUnavailable("Vector store is currently unavailable.")
            # ... existing logic ...

Suggestion importance [1-10]: 9

Why: This suggestion correctly identifies a critical architectural flaw where the system's stability is tightly coupled to OpenSearch's availability, which can cause startup and runtime failures.
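The gating idea in this suggestion can be tried out without an OpenSearch client by passing the health probe in as a callable. This is only a sketch under that assumption: `is_available`, `run_if_available`, and `VectorStoreUnavailable` are hypothetical names, and in the PR the probe would presumably wrap something like the client's `ping()`.

```python
import logging
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class VectorStoreUnavailable(RuntimeError):
    """Hypothetical error for callers that cannot proceed without the backend."""


def is_available(ping: Callable[[], bool]) -> bool:
    """Treat any exception from the probe as 'backend unavailable'."""
    try:
        return bool(ping())
    except Exception:
        return False


def run_if_available(
    ping: Callable[[], bool],
    action: Callable[[], T],
    *,
    required: bool = False,
) -> Optional[T]:
    """Run action only when the probe succeeds.

    Optional work (for example pipeline setup at worker startup) is skipped
    with a warning; required work (for example a query) fails closed.
    """
    if not is_available(ping):
        if required:
            raise VectorStoreUnavailable("Vector store is currently unavailable.")
        logger.warning("Vector store is not available; skipping.")
        return None
    return action()
```

Startup tasks would call this with `required=False` so a worker still boots when the backend is down, while query paths would pass `required=True` and surface a clear error.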
Category: General
Suggestion: Let browser set multipart header
Impact: Medium

The endpoint returns 204 with no body; the function `await`s but returns `void`, which is fine, yet the explicit multipart header can break browser boundary setting. Remove the manual `Content-Type` to let the browser set correct boundaries, and return the response status to allow caller to detect completion.

frontend/src/services/api/knowledgeBase.ts [87-107]

     async importFromFiles(
       knowledgeBaseUuid: string,
       files: File[],
       onUploadProgress: (progressEvent: any) => void
    -) {
    +): Promise<number> {
       const formData = new FormData();
       files.forEach(file => {
         formData.append('files', file);
       });

    -  await api.post(
    +  const resp = await api.post(
         `/api/v1/knowledge-base/knowledge-bases/${knowledgeBaseUuid}/documents/from-files/`,
         formData,
         {
    -      headers: {
    -        'Content-Type': 'multipart/form-data',
    -      },
           onUploadProgress,
         }
       );
    +  return resp.status;
     },

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance [1-10]: 7

Why: The suggestion correctly points out that manually setting the `Content-Type` for `multipart/form-data` is problematic and should be removed, which is a valid and important fix for file uploads.
Category: General
Suggestion: Enforce status check before adding files
Impact: Medium

Validate the knowledge base status before accepting files to avoid ingesting into inactive/deleted bases. Early-reject when `can_add_documents()` is false to prevent inconsistent state and wasted processing.

backend/knowledge_base/services.py [87-103]

    -class KnowledgeBaseService:
    -    ...
    -    def add_files(
    -        self, files: List[TemporaryUploadedFile]
    -    ) -> List[KnowledgeBaseDocument]:
    -        documents = []
    +def add_files(
    +    self, files: List[TemporaryUploadedFile]
    +) -> List[KnowledgeBaseDocument]:
    +    if not self.can_add_documents():
    +        raise ValueError("Cannot add documents to a non-active knowledge base.")
    +    documents = []
    +    storage_service = KnowledgeBaseStorageService.from_knowledge_base(self.knowledge_base)
    +    for file in files:
    +        storage_file = storage_service.save_file(file)
    +        documents.append(
    +            self.make_document(
    +                title=storage_file.name,
    +                source=storage_file.path,
    +                source_type=consts.DOCUMENT_SOURCE_TYPE_FILE,
    +            ),
    +        )
    +    return documents

    -        for file in files:
    -            storage_file = KnowledgeBaseStorageService.from_knowledge_base(
    -                self.knowledge_base,
    -            ).save_file(file)
    -            documents.append(
    -                self.make_document(
    -                    title=storage_file.name,
    -                    source=storage_file.path,
    -                    source_type=consts.DOCUMENT_SOURCE_TYPE_FILE,
    -                ),
    -            )
    -        return documents
    -

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance [1-10]: 7

Why: The suggestion correctly points out that `add_files` lacks a status check, which could lead to adding documents to an inactive knowledge base. Adding the `can_add_documents()` check improves the robustness and correctness of the business logic.
Category: General
Suggestion: Remove misleading field name
Impact: Low

The filter declares `field_name="content_type_filter"` but filtering is implemented via a custom method that targets `content_type__app_label`/`model`. The mismatched `field_name` is misleading and can cause double filtering; set `field_name` to `None` (or remove it) to rely solely on the method-driven filtering.

backend/plan/filters.py [8-19]

     class UsageHistoryFilter(django_filters.FilterSet):
         content_type = django_filters.ChoiceFilter(
    -        field_name="content_type_filter",
             label=_("Content type"),
             choices=[
                 ("core.crawlrequest", "Crawl request"),
                 ("core.searchrequest", "Search request"),
                 ("core.sitemaprequest", "Sitemap request"),
                 ("knowledge_base.knowledgebasedocument", "Knowledge base document"),
             ],
             method="filter_content_type",
         )

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance [1-10]: 5

Why: The suggestion correctly points out that `field_name` is redundant and misleading when a custom `method` is used for filtering, improving code clarity and preventing potential future bugs.
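The body of `filter_content_type` is not shown in this thread. As an illustration of the method-driven filtering described above, a choice value such as "core.crawlrequest" could be turned into lookup arguments as sketched here; `content_type_lookup` is a hypothetical helper, and the lookup names are taken from the suggestion text rather than from the PR's code.

```python
from typing import Dict


def content_type_lookup(value: str) -> Dict[str, str]:
    """Split an 'app_label.model' choice value into ORM-style lookup kwargs."""
    app_label, sep, model = value.partition(".")
    if not sep or not app_label or not model:
        raise ValueError(f"Expected 'app_label.model', got {value!r}")
    return {
        "content_type__app_label": app_label,
        "content_type__model": model,
    }
```

A filter method built this way would presumably pass the result to `queryset.filter(**lookup)`, which is why a separate `field_name` adds nothing.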
Update


…ttings

- Introduce KNOWLEDGE_BASE_ENABLED flag to conditionally enable Knowledge Base features in backend and frontend.
- Rename and consolidate KB_* environment variables to KNOWLEDGE_BASE_*.
- Feature-gate API endpoints, signals, and background tasks for Knowledge Base using new flag.
- Conditionally render Knowledge Base navigation and routes in frontend based on settings.
- Update .env.example and documentation to reflect new variable names and startup instructions.
- Clean up variable usage for consistency and future maintainability.
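As a rough illustration of the flag described in this commit, a boolean environment variable could be read as follows. How the project actually parses KNOWLEDGE_BASE_ENABLED is not shown here, so `env_flag` and the accepted truthy spellings are assumptions.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings for "enabled"


def env_flag(name: str, default: bool = False, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean feature flag from the environment (or a supplied mapping)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical usage: gate feature wiring on the flag.
KNOWLEDGE_BASE_ENABLED = env_flag("KNOWLEDGE_BASE_ENABLED")
```

URL registration, signal hookup, and task scheduling could then each check the flag before wiring in Knowledge Base components.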


amirasaran added 4 commits August 11, 2025 00:13

fix: resolve conflicts

192cdf8

feat: enhance provider configuration with optional API key and base URL, update temperature handling in models

7221cef

dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Aug 22, 2025

amirasaran changed the title ~~Feature/knowledge base~~ feat: Introduce Knowledge Base feature with plan-based credit tracking and usage history Aug 22, 2025

dosubot bot added the 🧠 feat:workflow label (Smart crawl planning, route building) Aug 22, 2025

qodo-code-review bot changed the title ~~feat: Introduce Knowledge Base feature with plan-based credit tracking and usage history~~ Feature/knowledge base Aug 22, 2025

qodo-code-review bot added the Review effort 4/5 label Aug 22, 2025

amirasaran and others added 19 commits August 22, 2025 21:33

feat: move OpenSearch and Dashboards services to a separate docker-compose file

711653b

feat(knowledge-base): enable SummaryEnhancementRateThrottle for context-aware enhancer

d94d88c

Merge pull request #117 from watercrawl/main

813e657

Update Development branch

Merge pull request #118 from watercrawl/feature/knowledge-base

f39cead

Feature/knowledge base

fix(SelectCrawlPage): handle empty crawl URL to prevent potential rendering issues

8e824d9

Merge pull request #119 from watercrawl/feature/knowledge-base

abe1931

fix(SelectCrawlPage): handle empty crawl URL to prevent potential rendering issues

Merge branch 'main' of github.com:watercrawl/watercrawl into feature/knowledge-base

3898aa0

fix(services): update crawl_type based on URL count

af7db32

Merge pull request #123 from watercrawl/feature/knowledge-base

5de3a69

Feature/knowledge base

chore: merge with main branch

24ad984

chore: merge with main branch

9ca3445

Merge pull request #125 from watercrawl/feature/knowledge-base

4b405f5

Merge with main

merge with main branch

69431a4

Merge pull request #127 from watercrawl/feature/knowledge-base

0b8062a

Feature/knowledge base

Merge pull request #130 from watercrawl/main

a599471

Merge With main

Merge with main

3ba77e7

Merge pull request #133 from watercrawl/main

57c180e

Merge With Main

Merge with main

2bc5b57

amirasaran and others added 30 commits November 4, 2025 00:22

New translations en.json (Persian)

85f4481

New translations django.po (French)

bb35629

New translations django.po (Spanish)

be4cadd

New translations django.po (Arabic)

10abe33

New translations django.po (German)

8f2dfa4

New translations django.po (Italian)

46202ca

New translations django.po (Japanese)

dd08a51

New translations django.po (Portuguese)

d1c1e88

New translations django.po (Chinese Simplified)

e44d157

New translations django.po (English)

ee61f3d

New translations django.po (Persian)

c945c02

New translations en.json (French)

fb27e21

New translations en.json (Spanish)

8e60abd

New translations en.json (Arabic)

98c9ef0

New translations en.json (German)

40021c3

New translations en.json (Italian)

840f996

New translations en.json (Japanese)

6610923

New translations en.json (Portuguese)

5669728

New translations en.json (Chinese Simplified)

3ef9663

New translations en.json (Persian)

7594e21

New translations django.po (French)

10a8a11

New translations django.po (Spanish)

4c3b817

New translations django.po (Arabic)

53de85c

New translations django.po (German)

1212ed9

New translations django.po (Portuguese)

23b4ba1

New translations django.po (Persian)

deaab85

fix conflict

241d6ff

Merge branch 'development' of github.com:watercrawl/watercrawl into development

5a9c68e

Merge pull request #160 from watercrawl/l10n_development

e6b375f

New Crowdin updates

Merge pull request #163 from watercrawl/development

fbfedfd

Update with development branch
