DBMS extension for multimodal query processing and optimization.
Explore the docs »

Landing Page | Report Bug | Request Feature

Table of Contents

  1. About The Project
  2. Features
  3. Getting Started
  4. Usage
  5. Roadmap
  6. Feedback and Issues
  7. License
  8. Team

📜 About The Project

Flock is a DuckDB extension that integrates analytics with semantic analysis through declarative SQL queries. It lets users work with both structured and unstructured data, combining OLAP workflows with LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation) pipelines.

To cite the project:

@article{10.14778/3750601.3750685,
  author  = {Dorbani, Anas and Yasser, Sunny and Lin, Jimmy and Mhedhbi, Amine},
  title   = {Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB},
  journal = {Proc. VLDB Endow.},
  year    = {2025},
  volume  = {18},
  number  = {12},
  doi     = {10.14778/3750601.3750685},
  url     = {https://doi.org/10.14778/3750601.3750685}
}

🔝 back to top

🔥 Features

  • Declarative SQL Interface: Perform text generation, classification, summarization, filtering, and embedding generation using SQL queries.
  • Multi-Provider Support: Integrate with OpenAI, Azure, Ollama, and Anthropic/Claude.
  • End-to-End RAG Pipelines: Enable retrieval and augmentation workflows for enhanced analytics.
  • Map and Reduce Functions: Intuitive APIs for combining semantic tasks and data analytics directly in DuckDB.
  • Multimodal Analytics: First-class support for text, images, and audio (via transcription) directly in SQL.
  • LLM Observability: Built-in metrics tracking for tokens, latency, and call counts across Flock LLM functions.
  • Browser & WASM Support: Run Flock-powered DuckDB workloads in the browser via DuckDB-WASM.

✨ Key Highlights (v0.4.0 and later)

  • Anthropic/Claude Provider: Use Claude models as a fourth provider, alongside OpenAI, Azure, and Ollama, with full support for structured output and image analysis.
  • WASM Support: Compile Flock as a DuckDB-WASM loadable extension to run in the browser, enabling client-side analytics and demos without server infrastructure.
  • LLM Metrics Tracking: Track token usage, API latency, and execution time through dedicated functions like flock_get_metrics() for better cost and performance monitoring.
  • Audio Transcription: Send audio inputs to OpenAI or Azure and obtain text transcripts using the same context_columns abstraction (with type: 'audio').
  • DuckDB v1.4.4: Upgraded to DuckDB 1.4.4, inheriting the latest performance and stability improvements.
  • Architecture Improvements: Centralized bind data and RAII-based storage guards reduce duplication and improve robustness across scalar and aggregate functions.
  • Developer Experience: Interactive build scripts, improved extension CI tooling, and GitHub Copilot agent instructions streamline local development and contributions.
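
As a minimal sketch of the LLM Metrics Tracking highlight above: the only name taken from this README is `flock_get_metrics()`. Calling it with no arguments in a plain `SELECT`, and what the result looks like, are assumptions here; check the documentation for the actual signature and output.

```sql
-- Run one or more Flock LLM functions first (for example llm_complete),
-- then inspect the metrics collected so far.
-- Assumption: flock_get_metrics() takes no arguments; its result format is not shown here.
SELECT flock_get_metrics();
```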

🔝 back to top

🚀 Getting Started

📝 Prerequisites

  1. DuckDB: Version 1.4.4 or later. Install it from the official DuckDB installation guide.
  2. Supported Providers: Ensure you have credentials or API keys for at least one of the supported providers:
    • OpenAI
    • Azure
    • Ollama
    • Anthropic/Claude
  3. Supported OS:
    • Linux
    • macOS
    • Windows

🔝 back to top

⚙️ Installation

Flock can be installed in two ways:

Option 1: Install from Community Extension (Recommended)

Flock is a Community Extension available directly from DuckDB's community catalog.

  1. Install the extension:
    INSTALL flock FROM community;
  2. Load the extension:
    LOAD flock;
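
To confirm that the extension was installed and loaded, one option is to query DuckDB's built-in `duckdb_extensions()` table function. This sketch assumes the extension is registered under the name `flock`, as in the `INSTALL` statement above.

```sql
-- List the install and load status of the Flock extension.
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'flock';
```

If both `installed` and `loaded` are `true`, the Flock functions should be available in the current session.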

Option 2: Build from Source

If you want to build Flock from source or contribute to the project, you can use our automated build script:

  1. Clone the repository with submodules:

    git clone --recursive https://github.com/dais-polymtl/flock.git
    cd flock

    Or if you've already cloned without submodules:

    git submodule update --init --recursive
  2. Run the build-and-run script:

    ./scripts/build_and_run.sh

    This interactive script will guide you through:

    • Checking prerequisites (CMake, build tools, compilers)
    • Setting up vcpkg (dependency manager)
    • Building the project (Debug or Release mode)
    • Running DuckDB with the Flock extension

    The script will automatically detect your system configuration and use the appropriate build tools (Ninja or Make).

  3. The script will launch DuckDB with the Flock extension loaded and ready to use. See the documentation for usage examples.

Requirements for building from source:

  • CMake (3.5 or later)
  • C++ compiler (GCC, Clang, or MSVC)
  • Build system (Ninja or Make)
  • Git
  • Python 3 (optional, for integration tests)

🔝 back to top

💻 Usage

🔧 Example Query

Using Flock, you can run semantic analysis tasks directly in DuckDB. For example:

-- Generate a description for each product name, using a registered model and prompt.
SELECT llm_complete(
           {'model_name': 'summarizer'},
           {'prompt_name': 'description-generation', 'context_columns': [{'data': product_name}]}
       ) AS product_description
FROM UNNEST(['Wireless Headphones', 'Gaming Laptop', 'Smart Watch']) AS t(product_name);
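
The query above refers to a model named `summarizer` and a prompt named `description-generation`, so both need to exist before it runs. The following is a sketch of one possible setup, assuming OpenAI as the provider and assuming Flock's `CREATE SECRET`, `CREATE MODEL`, and `CREATE PROMPT` statements take the forms shown. The API key, the model id `gpt-4o-mini`, and the prompt text are placeholders rather than values from this README; verify the exact syntax against the documentation.

```sql
-- Assumption: OpenAI is the provider; replace the placeholder with a real API key.
CREATE SECRET (TYPE OPENAI, API_KEY 'your-api-key');

-- Register the model alias used by the example query.
-- 'gpt-4o-mini' is a placeholder model id.
CREATE MODEL('summarizer', 'gpt-4o-mini', 'openai');

-- Register the prompt used by the example query. The prompt text is illustrative only.
CREATE PROMPT('description-generation', 'Write a one-sentence product description for the given product name.');
```

Keeping model and prompt definitions separate from the query means the same `SELECT` can be rerun against a different provider or prompt wording without being edited.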

Explore more usage examples in the documentation.

If you are a contributor or want to work on Flock itself, see the dedicated Developer Guide for build, testing, and contribution details.

🔝 back to top

🛣️ Roadmap

Our roadmap outlines upcoming features and improvements. Stay updated by checking out our detailed plan.

🔝 back to top

🛠️ Feedback and Issues

We value your feedback! To report an issue or suggest a new feature, use the Report Bug and Request Feature links at the top of this page.

For contributing code or other contributions, please refer to our dedicated Contribution Guidelines.

🔝 back to top

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

🔝 back to top

✨ Team

This project is under active development by the Data & AI Systems Laboratory (DAIS Lab) at Polytechnique Montréal.

🔝 back to top