Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligent file conversion (PDF/Office/images), and semantic search. Includes MCP server for seamless AI assistant integration.

QDrant Loader

📋 Release Notes v0.7.2 - Latest improvements and bug fixes

A comprehensive toolkit for loading data into the Qdrant vector database, with advanced MCP server support for AI-powered development workflows.

🎯 What is QDrant Loader?

QDrant Loader is a data ingestion and retrieval system that collects content from multiple sources, processes and vectorizes it, then provides intelligent search capabilities through a Model Context Protocol (MCP) server for AI development tools.

Perfect for:

  • 🤖 AI-powered development with Cursor, Windsurf, and other MCP-compatible tools
  • 📚 Knowledge base creation from technical documentation
  • 🔍 Intelligent code assistance with contextual information
  • 🏢 Enterprise content integration from multiple data sources

📦 Packages

This monorepo contains three complementary packages:

🔄 QDrant Loader

Data ingestion and processing engine

Collects and vectorizes content from multiple sources into the QDrant vector database.

Key Features:

  • Multi-source connectors: Git, Confluence (Cloud & Data Center), JIRA (Cloud & Data Center), Public Docs, Local Files (see the sketch after this list)
  • File conversion: PDF, Office docs (Word, Excel, PowerPoint), images, audio, EPUB, ZIP, and more using MarkItDown
  • Smart chunking: Modular chunking strategies with intelligent document processing and hierarchical context
  • Incremental updates: Change detection and efficient synchronization
  • Multi-project support: Organize sources into projects with shared collections
  • Provider-agnostic LLM: OpenAI, Azure OpenAI, Ollama, and custom endpoints with unified configuration
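
Sources of different types can live side by side in one project. A minimal sketch of a mixed project (the confluence connector fields below are illustrative assumptions; check the connector documentation for the exact schema):

projects:
  my-project:
    sources:
      git:
        docs-repo:
          base_url: "https://github.com/your-org/your-repo.git"
          branch: "main"
          file_types: ["*.md", "*.rst"]
      confluence:
        team-space:
          base_url: "https://your-org.atlassian.net/wiki"  # assumption
          space_key: "TEAM"                                # assumption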

βš™οΈ QDrant Loader Core

Core library and LLM abstraction layer

Provides the foundational components and provider-agnostic LLM interface used by other packages.

Key Features:

  • LLM Provider Abstraction: Unified interface for OpenAI, Azure OpenAI, Ollama, and custom endpoints (see the sketch after this list)
  • Configuration Management: Centralized settings and validation for LLM providers
  • Rate Limiting: Built-in rate limiting and request management
  • Error Handling: Robust error handling and retry mechanisms
  • Logging: Structured logging with configurable levels


🔌 QDrant Loader MCP Server

AI development integration layer

Model Context Protocol server providing search capabilities to AI development tools.

Key Features:

  • MCP Protocol 2025-06-18: Latest protocol compliance with dual transport support (stdio + HTTP)
  • Advanced search tools: Semantic search, hierarchy-aware search, attachment discovery, and conflict detection (see the protocol sketch after this list)
  • Cross-document intelligence: Document similarity, clustering, relationship analysis, and knowledge graphs
  • Streaming capabilities: Server-Sent Events (SSE) for real-time search results
  • Production-ready: HTTP transport with security, session management, and health checks
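
Because the server implements standard MCP, any compliant client can drive it. At the protocol level, after the usual initialize handshake, a client requests the tool catalog with a plain JSON-RPC message over stdio (this is generic MCP, not a qdrant-loader-specific API; the exact tool names the server returns are not shown here):

{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}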

🚀 Quick Start

Installation

# Install both packages
pip install qdrant-loader qdrant-loader-mcp-server

# Or install individually
pip install qdrant-loader          # Data ingestion only
pip install qdrant-loader-mcp-server  # MCP server only
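
To sanity-check the install, both packages expose console scripts (assuming your virtual environment's bin directory is on PATH):

# Confirm both CLIs are reachable
qdrant-loader --help
mcp-qdrant-loader --help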

5-Minute Setup

  1. Create a workspace

    mkdir my-workspace && cd my-workspace
  2. Initialize workspace with templates

    qdrant-loader init --workspace .
  3. Configure your environment (edit .env); if you don't have a Qdrant instance running yet, see the note after these steps

    # Qdrant connection
    QDRANT_URL=http://localhost:6333
    QDRANT_COLLECTION_NAME=my_docs
    
    # LLM provider (new unified configuration)
    OPENAI_API_KEY=your_openai_key
    LLM_PROVIDER=openai
    LLM_BASE_URL=https://api.openai.com/v1
    LLM_EMBEDDING_MODEL=text-embedding-3-small
    LLM_CHAT_MODEL=gpt-4o-mini
  4. Configure data sources (edit config.yaml)

    global:
      qdrant:
        url: "http://localhost:6333"
        collection_name: "my_docs"
      llm:
        provider: "openai"
        base_url: "https://api.openai.com/v1"
        api_key: "${OPENAI_API_KEY}"
        models:
          embeddings: "text-embedding-3-small"
          chat: "gpt-4o-mini"
        embeddings:
          vector_size: 1536
    
    projects:
      my-project:
        project_id: "my-project"
        sources:
          git:
            docs-repo:
              base_url: "https://github.com/your-org/your-repo.git"
              branch: "main"
              file_types: ["*.md", "*.rst"]
  5. Load your data

    qdrant-loader ingest --workspace .
  6. Start the MCP server

    mcp-qdrant-loader --env /path/to/your/.env
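
The steps above assume a Qdrant instance is reachable at http://localhost:6333. If you don't have one yet, the quickest route is the official Docker image:

# Start a local Qdrant instance (data persisted to ./qdrant_storage)
docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant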

🔧 Integration with Cursor

Add to your Cursor settings (.cursor/mcp.json):

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "/path/to/venv/bin/mcp-qdrant-loader",
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "QDRANT_COLLECTION_NAME": "my_docs",
        "OPENAI_API_KEY": "your_key"
      }
    }
  }
}

Alternative: Use a configuration file (recommended for complex setups):

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "/path/to/venv/bin/mcp-qdrant-loader",
      "args": ["--config", "/path/to/your/config.yaml", "--env", "/path/to/your/.env"]
    }
  }
}

Example queries in Cursor:

  • "Find documentation about authentication in our API"
  • "Show me examples of error handling patterns"
  • "What are the deployment requirements for this service?"
  • "Find all attachments related to database schema"

📚 Documentation

🚀 Getting Started

👥 User Guides

⚠️ Migration Guide (v0.7.1+)

LLM Configuration Migration Required

  • New unified configuration: global.llm.* replaces legacy global.embedding.* and file_conversion.markitdown.*
  • Provider-agnostic: Now supports OpenAI, Azure OpenAI, Ollama, and custom endpoints
  • Legacy support: Old configuration still works but shows deprecation warnings
  • Action required: Update your config.yaml to use the new syntax (see the examples above and the sketch after this list)
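
As a rough before/after sketch (the legacy key names are illustrative assumptions; consult the migration guide for the exact mapping):

# Before (deprecated) - assumed legacy shape
global:
  embedding:
    model: "text-embedding-3-small"

# After (unified LLM configuration)
global:
  llm:
    provider: "openai"
    models:
      embeddings: "text-embedding-3-small"
      chat: "gpt-4o-mini"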

Migration Resources

πŸ› οΈ Developer Resources

🤝 Contributing

We welcome contributions! See our Contributing Guide for:

  • Development environment setup
  • Code style and standards
  • Pull request process

Quick Development Setup

# Clone and setup
git clone https://github.com/martin-papy/qdrant-loader.git
cd qdrant-loader
python -m venv venv
source venv/bin/activate

# Install packages in development mode
pip install -e ".[dev]"
pip install -e "packages/qdrant-loader-core[dev,openai,ollama]"
pip install -e "packages/qdrant-loader[dev]"
pip install -e "packages/qdrant-loader-mcp-server[dev]"

📄 License

This project is licensed under the GNU GPLv3 - see the LICENSE file for details.


Ready to get started? Check out our Quick Start Guide or browse the complete documentation.
