
Implement project-specific configuration overrides functionality #23

@martin-papy

Enhancement: Implement Project-Specific Configuration Overrides

🎯 Overview

The overrides field in project configurations is currently parsed and stored but not actively used in the application logic. This feature should allow projects to override global configuration settings on a per-project basis.

📋 Problem Statement

Currently, all projects use the same global configuration settings (chunking, embedding, file conversion, etc.). The multi-project architecture includes an overrides field in ProjectConfig that is designed to allow project-specific customization, but this functionality is not implemented.

Current behavior:

  • overrides field is parsed from config.yaml
  • Overrides are merged with global config and stored in ProjectConfig.overrides
  • The merged overrides are never applied during processing
  • All projects use identical global settings

Expected behavior:

  • Projects should be able to override global settings like chunk_size, embedding model, etc.
  • Processing pipeline should use project-specific settings when available
  • Fall back to global settings when no project overrides exist

🔧 Technical Details

Current Implementation Status

  • ✅ Parsing: Overrides are correctly parsed in MultiProjectConfigParser._parse_project_config()
  • ✅ Storage: Merged overrides stored in ProjectConfig.overrides field
  • ✅ Validation: ConfigValidator validates overrides structure
  • ❌ Application: Overrides are not applied during document processing
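
One kind of structural check that becomes more useful once overrides are actually applied is flagging override keys that have no counterpart in the global config, since a misspelled key would otherwise be ignored without any warning. The sketch below is only an illustration of that idea: `unknown_override_keys` is a hypothetical helper, and this issue does not describe what ConfigValidator really checks.

```python
from typing import Any, List, Mapping


def unknown_override_keys(
    global_config: Mapping[str, Any],
    overrides: Mapping[str, Any],
    prefix: str = "",
) -> List[str]:
    """Return dotted paths that appear in `overrides` but not in `global_config`.

    Hypothetical helper for illustration; not part of qdrant-loader.
    """
    unknown: List[str] = []
    for key, value in overrides.items():
        path = f"{prefix}{key}"
        if key not in global_config:
            # The override names a setting the global config does not define.
            unknown.append(path)
        elif isinstance(value, Mapping) and isinstance(global_config[key], Mapping):
            # Both sides are nested sections, so compare their keys recursively.
            unknown.extend(
                unknown_override_keys(global_config[key], value, f"{path}.")
            )
    return unknown


global_config = {"chunking": {"chunk_size": 1000, "chunk_overlap": 200}}
typo_overrides = {"chunking": {"chunk_sise": 500}}
problems = unknown_override_keys(global_config, typo_overrides)
```

Here `problems` contains the single path `chunking.chunk_sise`, which a validator could report as a likely typo.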

Files Involved

  • packages/qdrant-loader/src/qdrant_loader/config/models.py - ProjectConfig.overrides field
  • packages/qdrant-loader/src/qdrant_loader/config/parser.py - Override parsing and merging
  • packages/qdrant-loader/src/qdrant_loader/core/project_manager.py - ProjectContext management
  • Processing pipeline components that should use project-specific config

Example Use Case

global:
  chunking:
    chunk_size: 1000
    chunk_overlap: 200
  embedding:
    model: "text-embedding-3-small"

projects:
  technical-docs:
    display_name: "Technical Documentation"
    sources:
      git:
        docs-repo:
          # ... git config
    overrides:
      chunking:
        chunk_size: 2000  # Larger chunks for technical content
        chunk_overlap: 400
      embedding:
        model: "text-embedding-3-large"  # Better model for technical content
  
  marketing-content:
    display_name: "Marketing Content"
    sources:
      confluence:
        marketing-space:
          # ... confluence config
    overrides:
      chunking:
        chunk_size: 500   # Smaller chunks for marketing content
        chunk_overlap: 100

🚀 Proposed Solution

1. Create Configuration Resolution Service

class ProjectConfigResolver:
    """Resolves effective configuration by applying project overrides."""
    
    def get_effective_config(
        self, 
        project_context: ProjectContext, 
        global_config: GlobalConfig
    ) -> GlobalConfig:
        """Apply project overrides to global configuration."""
        if not project_context.config_overrides:
            return global_config
        
        # Deep merge project overrides with global config
        effective_config_dict = self._deep_merge(
            global_config.to_dict(), 
            project_context.config_overrides
        )
        
        # Create new GlobalConfig instance with merged settings
        return GlobalConfig(**effective_config_dict)
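
The resolver above calls `_deep_merge`, which this issue does not define. A minimal sketch of such a helper follows, written as a standalone function and assuming both inputs are plain nested dicts. The name `deep_merge` and the rule that non-dict values (including lists) are replaced wholesale are assumptions, not confirmed behavior.

```python
from copy import deepcopy
from typing import Any, Dict, Mapping


def deep_merge(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `overrides` takes precedence over `base`.

    Nested mappings are merged key by key; any other value replaces the
    base value outright. Neither input is modified.
    """
    merged: Dict[str, Any] = deepcopy(dict(base))
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides hold a nested section: merge instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


global_settings = {
    "chunking": {"chunk_size": 1000, "chunk_overlap": 200},
    "embedding": {"model": "text-embedding-3-small"},
}
marketing_overrides = {"chunking": {"chunk_size": 500, "chunk_overlap": 100}}
effective = deep_merge(global_settings, marketing_overrides)
```

With the marketing-content values from the example use case, `effective` carries the overridden chunking section while the embedding model is inherited from the global settings. Returning a copy rather than mutating `base` keeps one project's overrides from leaking into another's.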

2. Update Processing Components

  • Modify chunking strategies to accept project-specific config
  • Update embedding components to use project-specific models
  • Ensure file conversion uses project-specific settings
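
As a rough illustration of a processing component that receives its settings as an argument instead of reading global state, consider the sketch below. `ChunkingSettings` and `chunk_text` are hypothetical names, and the character-based sliding window only stands in for whatever strategy the real chunkers use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkingSettings:
    """Hypothetical per-project chunking settings (illustration only)."""

    chunk_size: int = 1000
    chunk_overlap: int = 200

    def __post_init__(self) -> None:
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")


def chunk_text(text: str, settings: ChunkingSettings) -> List[str]:
    """Split `text` into overlapping character windows using `settings`."""
    step = settings.chunk_size - settings.chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + settings.chunk_size])
        if start + settings.chunk_size >= len(text):
            # This window already reaches the end of the text.
            break
    return chunks


document = "x" * 5000
global_chunks = chunk_text(document, ChunkingSettings())
technical_chunks = chunk_text(document, ChunkingSettings(2000, 400))
```

For this 5,000-character input, the global defaults yield 6 chunks and the technical-docs settings yield 3, so the same function behaves differently per project without consulting any shared configuration.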

3. Update Project Manager

  • Add method to get effective configuration for a project
  • Ensure ProjectContext includes resolved configuration

4. Update Pipeline Orchestrator

  • Pass project-specific configuration to processing components
  • Ensure proper fallback to global config when no project is specified
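
One way the orchestrator could select settings per project is sketched below, under the assumption that configuration sections are modeled as frozen dataclasses. `Settings`, `ChunkingConfig`, `EmbeddingConfig`, and `settings_for` are all hypothetical names introduced for this example and do not reflect the actual qdrant-loader classes.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ChunkingConfig:
    chunk_size: int = 1000
    chunk_overlap: int = 200


@dataclass(frozen=True)
class EmbeddingConfig:
    model: str = "text-embedding-3-small"


@dataclass(frozen=True)
class Settings:
    chunking: ChunkingConfig = field(default_factory=ChunkingConfig)
    embedding: EmbeddingConfig = field(default_factory=EmbeddingConfig)


def settings_for(
    project_id: Optional[str],
    global_settings: Settings,
    overrides_by_project: Mapping[str, Mapping[str, Mapping[str, Any]]],
) -> Settings:
    """Return the settings a pipeline run should use for `project_id`."""
    if project_id is None:
        # No project specified: fall back to the global settings unchanged.
        return global_settings
    overrides = overrides_by_project.get(project_id, {})
    return replace(
        global_settings,
        chunking=replace(global_settings.chunking, **overrides.get("chunking", {})),
        embedding=replace(global_settings.embedding, **overrides.get("embedding", {})),
    )


overrides_by_project = {
    "technical-docs": {
        "chunking": {"chunk_size": 2000, "chunk_overlap": 400},
        "embedding": {"model": "text-embedding-3-large"},
    },
    "marketing-content": {"chunking": {"chunk_size": 500, "chunk_overlap": 100}},
}
global_settings = Settings()
technical = settings_for("technical-docs", global_settings, overrides_by_project)
marketing = settings_for("marketing-content", global_settings, overrides_by_project)
```

Because `dataclasses.replace` raises `TypeError` for a field name the dataclass does not define, a misspelled override key fails loudly in this design rather than being silently dropped.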

📝 Implementation Tasks

Phase 1: Core Infrastructure

  • Create ProjectConfigResolver service
  • Add get_effective_config() method to ProjectManager
  • Update ProjectContext to include effective configuration
  • Add unit tests for configuration resolution

Phase 2: Component Integration

  • Update chunking strategies to accept project-specific config
  • Modify embedding components for project-specific models
  • Update file conversion to use project-specific settings
  • Update text processing components

Phase 3: Pipeline Integration

  • Modify PipelineOrchestrator to use project-specific config
  • Update connector instantiation to use effective config
  • Ensure proper config propagation through processing pipeline

Phase 4: Testing & Documentation

  • Add integration tests for project-specific overrides
  • Update configuration documentation
  • Add examples to config template
  • Update CLI help text and examples

🧪 Testing Strategy

Unit Tests

  • Configuration resolution with various override scenarios
  • Deep merging of nested configuration objects
  • Fallback behavior when no overrides specified

Integration Tests

  • End-to-end processing with project-specific settings
  • Multiple projects with different configurations
  • Validation of effective settings in processing components

Example Test Cases

def test_project_specific_chunking():
    """Test that project overrides affect chunking behavior."""
    # Project with larger chunk size should produce fewer chunks
    
def test_project_specific_embedding():
    """Test that project uses specified embedding model."""
    # Verify correct model is used for embedding generation

def test_fallback_to_global_config():
    """Test fallback when no project overrides specified."""
    # Should use global settings when overrides empty

📚 Documentation Updates

Configuration Reference

  • Document override syntax and available options
  • Provide examples for common override scenarios
  • Explain inheritance and merging behavior

User Guide

  • Add section on project-specific customization
  • Include best practices for using overrides
  • Troubleshooting guide for configuration issues

🔗 Related Issues

  • Multi-project architecture implementation
  • Configuration validation improvements
  • Performance optimization for different content types

💡 Future Enhancements

  • Runtime configuration updates
  • Configuration profiles/presets
  • Project-specific connector settings
  • Dynamic configuration based on content analysis

Priority: Medium
Effort: Medium (2-3 days)
Impact: High - Enables flexible multi-project configurations

This enhancement will complete the multi-project architecture by making the override functionality fully operational, allowing users to optimize settings for different types of content and use cases.
