Extend the package with the falkordblite capabilities #163

@gkorland

Design: Embedded FalkorDB Support in falkordb-py

Goal

Extend falkordb-py so users can optionally run an embedded FalkorDB instance (no external server required), installed via:

pip install falkordb          # remote-only (default, no binaries)
pip install falkordb[lite]    # includes embedded redis-server + falkordb.so

The embedded mode reuses the existing FalkorDB / Graph / AsyncGraph classes — users get the same API surface regardless of connection mode.


Architecture Overview

falkordb-py (existing)
├── falkordb/
│   ├── __init__.py
│   ├── falkordb.py          # FalkorDB class (remote connections)
│   ├── graph.py             # Graph class
│   ├── asyncio/             # Async variants
│   └── ...
│
│   # ── NEW ──
│   ├── lite/                # Embedded server management (lazy-loaded)
│   │   ├── __init__.py
│   │   ├── server.py        # EmbeddedServer: manages redis-server lifecycle
│   │   ├── config.py        # Redis config generation
│   │   └── binaries.py      # Binary resolution (finds redis-server + falkordb.so)
│   │
│   └── falkordb.py          # Modified: add embedded= param to constructor
│
├── pyproject.toml           # Modified: add [lite] optional dependency
└── ...

Key Principle: New falkordb-bin Package for Binaries

A new package falkordb-bin on PyPI ships only the precompiled redis-server and falkordb.so binaries (per-platform wheels). The falkordb[lite] optional extra declares falkordb-bin as a dependency. All orchestration logic (starting/stopping the server, config generation, etc.) lives inside falkordb-py in the falkordb.lite subpackage.

The existing falkordblite package is left as-is and eventually deprecated — existing users of falkordblite are unaffected and can migrate at their own pace.

This gives us clean separation:

  • falkordb-bin on PyPI = new package, platform-specific binaries only (built via CI, one wheel per OS/arch)
  • falkordb on PyPI = pure Python client + optional embedded orchestration code
  • falkordb[lite] = both together
  • falkordblite on PyPI = legacy standalone package (deprecated, eventually archived)

User-Facing API

Constructor Parameter (unified API)

Embedded mode is activated via the constructor, alongside all the standard connection kwargs. No separate factory method — one constructor, one class.

from falkordb import FalkorDB

# Remote (existing, unchanged)
db = FalkorDB(host='localhost', port=6379)

# Embedded — zero config (ephemeral)
db = FalkorDB(embedded=True)

# Embedded — with persistence
db = FalkorDB(embedded=True, db_path='/tmp/my_graph.db')

# Embedded — custom config + standard kwargs
db = FalkorDB(
    embedded=True,
    db_path='/tmp/my_graph.db',
    embedded_config={'maxmemory': '1gb'},
    max_connections=32,
    socket_timeout=30,
    encoding='utf-8',
)

# Usage is identical from here on
g = db.select_graph('social')
g.query('CREATE (n:Person {name: "Alice"}) RETURN n')

# Cleanup (stops the embedded server)
db.close()

When embedded=True:

  • The constructor starts a local redis-server with falkordb.so loaded and connects to it over a Unix socket
  • All standard kwargs (socket_timeout, encoding, encoding_errors, retry_on_error, etc.) are applied to the connection to the embedded server
  • Remote-specific kwargs (host, port, ssl_*, etc.) are ignored
  • A connection pool is created automatically for parallel query support

When embedded=False (default):

  • Behavior is identical to today — no change whatsoever

Async Support

import asyncio

from falkordb.asyncio import FalkorDB as AsyncFalkorDB

async def main():
    # Async embedded
    db = AsyncFalkorDB(embedded=True, db_path='/tmp/async_graph.db')
    g = db.select_graph('social')
    result = await g.query('MATCH (n) RETURN n')
    await db.close()

asyncio.run(main())

Context Manager

from falkordb import FalkorDB

with FalkorDB(embedded=True, db_path='/tmp/my_graph.db') as db:
    g = db.select_graph('social')
    g.query('CREATE (n:Person {name: "Alice"}) RETURN n')
# Server automatically stopped on exit

Implementation Plan

Phase 1: New Binary Package (falkordb-bin on PyPI)

Create a new repo and PyPI package falkordb-bin that contains only the precompiled binaries and a minimal Python API to locate them. This is a completely separate package from the existing falkordblite.

New repo: FalkorDB/falkordb-bin

falkordb_bin/__init__.py:

import os
import sys

def get_bin_dir():
    """Return path to directory containing redis-server and falkordb.so/dylib"""
    return os.path.join(os.path.dirname(__file__), 'bin')

def get_redis_server():
    """Return path to redis-server binary"""
    name = 'redis-server.exe' if sys.platform == 'win32' else 'redis-server'
    return os.path.join(get_bin_dir(), name)

def get_falkordb_module():
    """Return path to falkordb module (falkordb.so on Linux, falkordb.dylib on macOS)"""
    if sys.platform == 'darwin':
        name = 'falkordb.dylib'
    else:
        name = 'falkordb.so'
    return os.path.join(get_bin_dir(), name)

The CI builds platform-specific wheels containing:

falkordb_bin/
├── __init__.py
└── bin/
    ├── redis-server
    └── falkordb.so    (or falkordb.dylib on macOS)

pyproject.toml for falkordb-bin:

[project]
name = "falkordb-bin"
version = "1.2.0"  # Mirrors falkordb client version
description = "Precompiled redis-server and FalkorDB module binaries"
requires-python = ">=3.8"

[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"

CI/CD: The build pipeline (GitHub Actions) would:

  1. Build redis-server from source (there are no official Redis binaries to download)
  2. Download pre-built falkordb.so/falkordb.dylib from FalkorDB GitHub releases
  3. Package into platform wheels: falkordb_bin-1.2.0-py3-none-manylinux_2_17_x86_64.whl, ...-macosx_11_0_arm64.whl, etc.
  4. Publish to PyPI

The existing falkordblite build scripts (setup.py, build_scripts/) can be adapted for the Redis compilation step. The FalkorDB binary download replaces the current from-source build, simplifying the pipeline significantly.
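
To make the wheel naming in step 3 concrete, here is a minimal sketch of how a build script might assemble the file name. The helper name wheel_filename is hypothetical, and the py3-none tag is an assumption (the binaries do not depend on the Python ABI); neither comes from the existing build scripts.

```python
def wheel_filename(dist: str, version: str, platform_tag: str) -> str:
    """Build a name of the form {dist}-{version}-{python}-{abi}-{platform}.whl."""
    # Wheel file names use underscores in the distribution name.
    normalized = dist.replace("-", "_")
    # Assumption: py3-none, because the payload is not tied to a Python ABI.
    return f"{normalized}-{version}-py3-none-{platform_tag}.whl"


# The two platform tags mentioned in this design.
for tag in ("manylinux_2_17_x86_64", "macosx_11_0_arm64"):
    print(wheel_filename("falkordb-bin", "1.2.0", tag))
```

Only the platform tag varies between wheels, which is what lets pip pick the right binary for the host.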

Platform matrix (initial):

| Platform | Redis | FalkorDB |
|---|---|---|
| Linux x86_64 | Build from source | Download from GH releases |
| Linux aarch64 | Build from source | Download from GH releases |
| macOS x86_64 | Build from source | Download from GH releases |
| macOS arm64 | Build from source | Download from GH releases |

Windows (follow-up task):

Windows support will be added in a subsequent phase using:

  • redis-windows for Redis
  • falkordb-rs-next-gen for FalkorDB

This requires separate work to validate compatibility and will be tracked as a separate task.

Phase 2: Orchestration in falkordb-py

2a. pyproject.toml Changes

[tool.poetry.extras]
lite = ["falkordb-bin"]

[tool.poetry.dependencies]
# ... existing deps ...
falkordb-bin = { version = ">=1.0.0,<2.0.0", optional = true }

Or if using standard [project] table:

[project.optional-dependencies]
lite = ["falkordb-bin>=1.0.0,<2.0.0"]

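Note: falkordb-bin uses the same versioning as falkordb: for example, falkordb 1.2.0 and falkordb-bin 1.2.0 are released together and known-compatible. The declared range only restricts falkordb-bin to the same major version, so it guards against mixing across a breaking major release; it does not force the two packages to the exact same version.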
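
The difference between "inside the declared range" and "released together" can be sketched as follows. The helper names are hypothetical and the parser handles plain MAJOR.MINOR.PATCH strings only; a real check would more likely rely on the packaging library.

```python
def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    return tuple(int(part) for part in text.split("."))


def in_declared_range(bin_version: str, low: str = "1.0.0", high: str = "2.0.0") -> bool:
    """True if low <= bin_version < high, mirroring '>=1.0.0,<2.0.0'."""
    return parse_version(low) <= parse_version(bin_version) < parse_version(high)


def is_matched_release(client_version: str, bin_version: str) -> bool:
    """True only when both packages carry the exact same version."""
    return parse_version(client_version) == parse_version(bin_version)


# falkordb 1.2.0 with falkordb-bin 1.0.0: allowed by the range, but not a matched pair.
print(in_declared_range("1.0.0"), is_matched_release("1.2.0", "1.0.0"))
```

If exact pairing matters, the embedded startup path could compare the two versions at runtime and warn on a mismatch.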

2b. falkordb/lite/__init__.py

"""
Embedded FalkorDB support.

This module is only usable when the 'lite' extra is installed:
    pip install falkordb[lite]
"""

2c. falkordb/lite/binaries.py

"""Binary resolution — finds redis-server and falkordb.so from falkordb-bin package."""

from pathlib import Path


class BinaryNotFoundError(Exception):
    """Raised when embedded binaries are not installed."""
    pass


def _require_bin():
    """Check that falkordb-bin is installed, raise helpful error if not."""
    try:
        import falkordb_bin
        return falkordb_bin
    except ImportError:
        raise BinaryNotFoundError(
            "Embedded FalkorDB requires the 'lite' extra. "
            "Install with: pip install falkordb[lite]"
        )


def get_redis_server_path() -> Path:
    """Resolve path to redis-server binary."""
    falkordb_bin = _require_bin()
    path = Path(falkordb_bin.get_redis_server())
    if not path.exists():
        raise BinaryNotFoundError(f"redis-server not found at {path}")
    return path


def get_falkordb_module_path() -> Path:
    """Resolve path to falkordb.so module."""
    falkordb_bin = _require_bin()
    path = Path(falkordb_bin.get_falkordb_module())
    if not path.exists():
        raise BinaryNotFoundError(f"falkordb.so not found at {path}")
    return path

2d. falkordb/lite/config.py

"""Redis configuration generation for embedded mode."""

from __future__ import annotations  # keeps `str | None` hints valid on Python < 3.10

import os
from pathlib import Path


DEFAULT_CONFIG = {
    'bind': '127.0.0.1',
    'port': '0',                    # 0 = TCP listener disabled; clients use the Unix socket
    'save': '',                     # Disable RDB by default for ephemeral
    'appendonly': 'no',
    'protected-mode': 'yes',
    'loglevel': 'warning',
    'databases': '16',
}

PERSISTENT_OVERRIDES = {
    'save': '900 1 300 10 60 10000',  # RDB snapshots
    'appendonly': 'yes',
    'appendfsync': 'everysec',
}


def generate_config(
    falkordb_module_path: Path,
    db_path: str | None = None,
    unix_socket_path: str | None = None,
    user_config: dict | None = None,
) -> str:
    """Generate redis.conf content for embedded mode."""

    config = dict(DEFAULT_CONFIG)

    # If persistence requested, set dir and enable AOF/RDB
    if db_path:
        db_dir = os.path.dirname(os.path.abspath(db_path))
        db_file = os.path.basename(db_path)
        os.makedirs(db_dir, exist_ok=True)
        config['dir'] = db_dir
        config['dbfilename'] = db_file
        config.update(PERSISTENT_OVERRIDES)

    # Unix socket for local communication (preferred over TCP)
    if unix_socket_path:
        config['unixsocket'] = unix_socket_path
        config['unixsocketperm'] = '700'
        config['port'] = '0'  # Disable TCP when using socket

    # Load FalkorDB module
    config['loadmodule'] = str(falkordb_module_path)

    # Apply user overrides
    if user_config:
        config.update(user_config)

    # Render. Empty values are written as "" so that `save ""` parses correctly.
    lines = [f'{k} {v}' if v != '' else f'{k} ""' for k, v in config.items()]
    return '\n'.join(lines) + '\n'
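
As a worked example of the rendering step, the sketch below shows the redis.conf text produced for a small ephemeral configuration. The function name render_redis_conf and the socket path are illustrative assumptions; the quoting of empty values reflects that redis.conf expects an explicit `save ""` to disable snapshots.

```python
def render_redis_conf(config: dict) -> str:
    """Render a flat dict as redis.conf lines, one 'key value' pair per line."""
    lines = []
    for key, value in config.items():
        # An empty string must be written as "" or the directive has no argument.
        rendered = '""' if value == "" else str(value)
        lines.append(f"{key} {rendered}")
    return "\n".join(lines) + "\n"


ephemeral = {
    "port": "0",          # no TCP listener
    "save": "",           # snapshots disabled
    "appendonly": "no",
    "unixsocket": "/tmp/falkordb_example/falkordb.sock",  # hypothetical path
}
print(render_redis_conf(ephemeral), end="")
```

Values that contain spaces (for instance a module path under a directory with a space in its name) would also need quoting; this sketch does not handle that case.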

2e. falkordb/lite/server.py

"""Embedded redis-server lifecycle management."""

from __future__ import annotations  # keeps `str | None` hints valid on Python < 3.10

import atexit
import os
import subprocess
import tempfile
import time
from pathlib import Path

import redis

from .binaries import get_redis_server_path, get_falkordb_module_path
from .config import generate_config


class EmbeddedServerError(Exception):
    pass


class EmbeddedServer:
    """Manages a local redis-server + FalkorDB process."""

    def __init__(
        self,
        db_path: str | None = None,
        config: dict | None = None,
        startup_timeout: float = 10.0,
    ):
        self._process: subprocess.Popen | None = None
        self._tmpdir = tempfile.mkdtemp(prefix='falkordb_')
        self._socket_path = os.path.join(self._tmpdir, 'falkordb.sock')
        self._config_path = os.path.join(self._tmpdir, 'redis.conf')
        self._db_path = db_path
        self._startup_timeout = startup_timeout

        # Resolve binaries
        self._redis_server = get_redis_server_path()
        self._falkordb_module = get_falkordb_module_path()

        # Generate config
        config_content = generate_config(
            falkordb_module_path=self._falkordb_module,
            db_path=db_path,
            unix_socket_path=self._socket_path,
            user_config=config,
        )
        Path(self._config_path).write_text(config_content)

        # Start server
        self._start()

        # Ensure cleanup on exit
        atexit.register(self.stop)

    def _start(self):
        """Start the redis-server process and wait for it to be ready."""
        self._process = subprocess.Popen(
            [str(self._redis_server), self._config_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )

        # Wait for socket to appear and server to respond
        deadline = time.monotonic() + self._startup_timeout
        while time.monotonic() < deadline:
            if self._process.poll() is not None:
                stderr = self._process.stderr.read().decode()
                raise EmbeddedServerError(
                    f"redis-server exited with code {self._process.returncode}: {stderr}"
                )
            if os.path.exists(self._socket_path):
                try:
                    r = redis.Redis(unix_socket_path=self._socket_path)
                    r.ping()
                    r.close()
                    return
                except redis.ConnectionError:
                    pass
            time.sleep(0.05)

        self.stop()
        raise EmbeddedServerError(
            f"redis-server did not start within {self._startup_timeout}s"
        )

    @property
    def unix_socket_path(self) -> str:
        return self._socket_path

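    def stop(self):
        """Gracefully shut down the embedded server and remove its temp files."""
        if self._process and self._process.poll() is None:
            try:
                r = redis.Redis(unix_socket_path=self._socket_path)
                # Ephemeral instances (no db_path) skip the final save
                r.shutdown(nosave=not bool(self._db_path))
                r.close()
            except Exception:
                # Fall back to SIGTERM if SHUTDOWN could not be sent
                self._process.terminate()
            try:
                self._process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self._process.kill()
        self._process = None

        # Best-effort removal of the socket, generated config, and temp dir
        for path in (self._socket_path, self._config_path):
            try:
                os.remove(path)
            except OSError:
                pass
        try:
            os.rmdir(self._tmpdir)
        except OSError:
            pass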

    def __del__(self):
        self.stop()
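
The readiness loop in _start follows a general poll-until-deadline pattern. A minimal standalone sketch of that pattern is shown here; the helper name wait_until is hypothetical and is not part of the proposed module.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 0.05) -> bool:
    """Poll predicate until it returns True or timeout seconds have elapsed."""
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


# Stand-in for "socket exists and PING succeeds": becomes ready on the third check.
attempts = {"count": 0}


def ready_on_third_try() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


became_ready = wait_until(ready_on_third_try, timeout=1.0, interval=0.01)
print(became_ready, attempts["count"])
```

Returning a boolean rather than raising leaves the caller free to stop the process and build an error message that includes the captured stderr.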

2f. Changes to falkordb/falkordb.py

Add embedded parameter to the constructor, with connection pooling for parallel queries:

class FalkorDB:
    """FalkorDB client — connects to remote or embedded server."""

    def __init__(
        self,
        host="localhost",
        port=6379,
        password=None,
        socket_timeout=None,
        socket_connect_timeout=None,
        socket_keepalive=None,
        socket_keepalive_options=None,
        connection_pool=None,
        unix_socket_path=None,
        encoding="utf-8",
        encoding_errors="strict",
        retry_on_error=None,
        ssl=False,
        # ... other existing params ...
        #
        # ── NEW embedded params ──
        embedded=False,
        db_path=None,
        embedded_config=None,
        max_connections=16,
        startup_timeout=10.0,
    ):
        self._embedded_server = None

        if embedded:
            # Lazy import — only loaded when embedded=True
            from .lite.server import EmbeddedServer

            server = EmbeddedServer(
                db_path=db_path,
                config=embedded_config,
                startup_timeout=startup_timeout,
            )
            self._embedded_server = server

            # Override connection to use Unix socket with pooling
            unix_socket_path = server.unix_socket_path
            connection_pool = redis.ConnectionPool(
                connection_class=redis.UnixDomainSocketConnection,
                path=unix_socket_path,
                max_connections=max_connections,
                decode_responses=True,
                socket_timeout=socket_timeout,
                socket_connect_timeout=socket_connect_timeout,
                encoding=encoding,
                encoding_errors=encoding_errors,
                retry_on_error=retry_on_error or [],
            )

        # ... existing __init__ logic using connection_pool or host/port ...

    def close(self):
        """Close the connection. If embedded, also stops the server."""
        if self._embedded_server:
            self._embedded_server.stop()
            self._embedded_server = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    def __del__(self):
        self.close()

Key points:

  • When embedded=True, the constructor starts an EmbeddedServer, creates a Unix socket ConnectionPool, and passes it through to the existing Redis connection logic
  • All standard kwargs (socket_timeout, encoding, etc.) are forwarded to the pool
  • Remote-specific kwargs (host, port, ssl_*) are silently ignored when embedded
  • Connection pool with configurable max_connections (default 16) enables parallel queries out of the box
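
One way to picture the kwarg handling described in the key points is a small filter that keeps the standard options and drops the remote-only ones. The function name, the REMOTE_ONLY tuple, and the choice to drop None values are assumptions for illustration; the actual constructor may organize this differently.

```python
# Assumption: an illustrative subset of remote-only options.
REMOTE_ONLY = ("host", "port", "ssl")


def embedded_pool_kwargs(socket_path: str, max_connections: int = 16, **kwargs) -> dict:
    """Return kwargs for a Unix-socket connection pool in embedded mode."""
    pool_kwargs = {
        key: value
        for key, value in kwargs.items()
        # Drop host/port/ssl and any ssl_* option; skip unset (None) values.
        if key not in REMOTE_ONLY and not key.startswith("ssl_") and value is not None
    }
    pool_kwargs["path"] = socket_path
    pool_kwargs["max_connections"] = max_connections
    return pool_kwargs


print(embedded_pool_kwargs(
    "/tmp/falkordb_example/falkordb.sock",  # hypothetical socket path
    host="db.example.com",
    port=6379,
    ssl_certfile="cert.pem",
    socket_timeout=30,
    encoding="utf-8",
))
```

Silently dropping host or port matches the stated design; emitting a warning when they are passed together with embedded=True would be a stricter alternative.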

2g. Async Variant in falkordb/asyncio/falkordb.py

Same constructor approach — embedded=True spins up the server and creates an async connection pool:

import redis.asyncio

class FalkorDB:
    # ... existing async FalkorDB ...

    def __init__(
        self,
        # ... existing params ...
        embedded=False,
        db_path=None,
        embedded_config=None,
        max_connections=16,
        startup_timeout=10.0,
    ):
        self._embedded_server = None

        if embedded:
            from ..lite.server import EmbeddedServer

            # Server start is synchronous (subprocess), but fast
            server = EmbeddedServer(
                db_path=db_path,
                config=embedded_config,
                startup_timeout=startup_timeout,
            )
            self._embedded_server = server

            # Create async connection pool over Unix socket
            connection_pool = redis.asyncio.BlockingConnectionPool(
                connection_class=redis.asyncio.UnixDomainSocketConnection,
                path=server.unix_socket_path,
                max_connections=max_connections,
                timeout=None,
                decode_responses=True,
            )
            # Pass pool to existing init logic...

    async def close(self):
        """Close async connection and stop embedded server if applicable."""
        # Close the connection pool
        if hasattr(self, 'connection') and self.connection:
            await self.connection.aclose()
        if self._embedded_server:
            self._embedded_server.stop()
            self._embedded_server = None

    async def __aenter__(self):
        return self

    async def __aexit__(self, *args):
        await self.close()
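
Because the server start is synchronous, it blocks the event loop for as long as startup takes. If that becomes a concern, the blocking call can be moved to a worker thread. The sketch below uses a stand-in function instead of EmbeddedServer and is only an illustration of the technique; it would also require an awaitable setup step, which the constructor-only API above does not have.

```python
import asyncio
import time


def start_server_blocking() -> str:
    """Stand-in for EmbeddedServer(...): blocks briefly, then returns a socket path."""
    time.sleep(0.1)
    return "/tmp/falkordb_example/falkordb.sock"  # hypothetical path


async def start_server_async() -> str:
    # Run the blocking startup in the default thread pool executor.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, start_server_blocking)


async def main():
    ticks = 0

    async def heartbeat():
        # Keeps counting while the "server" starts in a worker thread.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(heartbeat())
    path = await start_server_async()
    task.cancel()
    return path, ticks


path, ticks = asyncio.run(main())
print(path, ticks >= 1)
```

The heartbeat task making progress during startup shows that the event loop stays responsive.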

falkordblite Deprecation Strategy

The existing falkordblite package (full redislite fork) is left untouched and deprecated gradually. Existing users are unaffected.

Current State (stays as-is)

falkordblite (on PyPI) — full standalone package, no changes
├── redislite/           # Full redislite fork
│   ├── __init__.py      # Redis class, server management
│   ├── falkordb_client.py  # FalkorDB class (reimplements graph client)
│   ├── bin/
│   │   ├── redis-server
│   │   └── falkordb.so
│   └── ...
└── setup.py             # Builds redis from source

New Package (separate repo)

falkordb-bin (on PyPI) — new binary-only package
├── falkordb_bin/
│   ├── __init__.py      # get_redis_server(), get_falkordb_module()
│   └── bin/
│       ├── redis-server
│       └── falkordb.so
└── pyproject.toml       # Platform-specific wheel builds

Deprecation Timeline

  1. Now: Create falkordb-bin as a new repo & PyPI package. Ship binaries only.
  2. Now: Add falkordb[lite] extra to falkordb-py, depending on falkordb-bin.
  3. Now: Implement falkordb.lite subpackage in falkordb-py.
  4. Next release of falkordblite: Add deprecation warning on import:
    # In falkordblite's __init__.py or redislite/__init__.py
    import warnings
    warnings.warn(
        "falkordblite is deprecated. Use 'pip install falkordb[lite]' instead. "
        "See https://github.com/FalkorDB/falkordb-py#embedded-mode for migration guide.",
        DeprecationWarning,
        stacklevel=2,
    )
  5. 6-12 months later: Archive the falkordblite repo. The PyPI package remains installable but unmaintained.
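
A test for step 4 could capture the warning as sketched below. The function emit_deprecation is a stand-in for the import-time code in falkordblite, not the real module. Note that Python hides DeprecationWarning by default unless it is triggered by code running in __main__, so many users will only see it with warnings enabled (for example under pytest or with -W default).

```python
import warnings


def emit_deprecation():
    """Stand-in for the warning falkordblite would raise at import time."""
    warnings.warn(
        "falkordblite is deprecated. Use 'pip install falkordb[lite]' instead.",
        DeprecationWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    # Make sure the warning is recorded regardless of the default filters.
    warnings.simplefilter("always")
    emit_deprecation()

messages = [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]
print(messages)
```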

Migration Guide for falkordblite Users

# BEFORE (falkordblite)
from redislite.falkordb_client import FalkorDB
db = FalkorDB('/tmp/falkordb.db')
g = db.select_graph('social')
g.query('CREATE (n:Person {name: "Alice"}) RETURN n')

# AFTER (falkordb[lite])
from falkordb import FalkorDB
db = FalkorDB(embedded=True, db_path='/tmp/falkordb.db')
g = db.select_graph('social')
g.query('CREATE (n:Person {name: "Alice"}) RETURN n')

The API is intentionally almost identical — users change the import and constructor, everything else stays the same.


Comparison: Before and After

| Scenario | Before | After |
|---|---|---|
| Remote connection | pip install falkordb; FalkorDB(host=...) | Same, unchanged |
| Embedded (current) | pip install falkordblite; from redislite.falkordb_client import FalkorDB | pip install falkordb[lite]; FalkorDB(embedded=True) |
| Embedded + remote in same app | Two different packages, two different APIs | One package, same FalkorDB class |
| Switching from embedded → remote | Rewrite imports + constructor | Remove embedded=True, add host=... |
| Existing falkordblite users | Works as-is | Still works (deprecation warning only); migrate when ready |

Key Design Decisions

1. Unix Socket vs TCP for Embedded

Decision: Unix domain socket (default)

  • Faster than TCP loopback (~30% lower latency)
  • No port conflicts
  • File-permission-based security (only creating user can access)
  • Matches what redislite/falkordblite already does
  • Falls back to TCP 127.0.0.1 on Windows (no Unix sockets)

2. Ephemeral vs Persistent by Default

Decision: Ephemeral by default, opt-in persistence

FalkorDB(embedded=True)                                   # ephemeral (no RDB/AOF)
FalkorDB(embedded=True, db_path='/tmp/my.db')             # persistent

3. Connection Pooling in Embedded Mode

Decision: Always use a connection pool, configurable size

Even with a Unix socket, a connection pool is needed to support parallel queries (e.g. from multiple threads or concurrent graph operations). The embedded constructor creates a ConnectionPool / BlockingConnectionPool with max_connections=16 by default, configurable via kwarg.

# Default pool (16 connections)
db = FalkorDB(embedded=True)

# Custom pool size
db = FalkorDB(embedded=True, max_connections=32)

4. Lazy Import of falkordb-bin

Decision: Import only when embedded=True is passed

The falkordb_bin package is never imported at module load time. This means:

  • import falkordb works without falkordb-bin installed
  • The import error with a helpful message only happens when you construct with embedded=True
  • Zero overhead for remote-only users

5. Server Lifecycle

  • The embedded server is tied to the FalkorDB instance
  • close() / context manager / garbage collection all stop the server
  • atexit handler ensures cleanup on interpreter exit
  • Multiple embedded instances are supported (each gets its own server + socket)
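
One caveat with the lifecycle bullets: registering a bound method with atexit keeps a strong reference to the object, so garbage collection alone would not stop the server before interpreter exit. An alternative worth considering is weakref.finalize, which holds no strong reference and also runs at exit by default. The sketch below uses a stand-in class, not the proposed EmbeddedServer.

```python
import gc
import weakref


class FakeServer:
    """Stand-in for EmbeddedServer; records which instances were stopped."""

    stopped = []

    def __init__(self, name: str):
        self.name = name
        # The callback must not reference self, or the object could never be collected.
        self._finalizer = weakref.finalize(self, FakeServer._cleanup, name)

    @staticmethod
    def _cleanup(name: str) -> None:
        FakeServer.stopped.append(name)

    def stop(self) -> None:
        # Calling a finalizer runs the callback at most once.
        self._finalizer()


server = FakeServer("a")
server.stop()
server.stop()          # second call is a no-op

other = FakeServer("b")
del other              # no explicit stop(); cleanup runs on collection
gc.collect()

print(FakeServer.stopped)
```

Explicit close() and the context manager remain the reliable paths; finalizers are a safety net.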

Testing Strategy

tests/
├── test_embedded.py         # Requires falkordb-bin installed
│   ├── test_ephemeral_basic
│   ├── test_persistent_roundtrip
│   ├── test_context_manager
│   ├── test_close_stops_server
│   ├── test_multiple_instances
│   └── test_custom_config
├── test_embedded_async.py   # Async embedded tests
├── test_missing_extra.py    # Verifies helpful error when falkordb-bin not installed
└── ...existing tests...

CI matrix:

  • All platforms: Run existing remote tests (no change)
  • Linux/macOS: Additionally run embedded tests with pip install .[lite]
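
To keep one test suite usable on both CI legs, the embedded tests can skip themselves when falkordb-bin is absent. This is a minimal sketch using unittest; the helper has_module and the test class are hypothetical, and get_redis_server() is the API proposed for falkordb-bin earlier in this design.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


HAS_BIN = has_module("falkordb_bin")


@unittest.skipUnless(HAS_BIN, "requires the 'lite' extra: pip install falkordb[lite]")
class EmbeddedBinaryTests(unittest.TestCase):
    def test_redis_server_binary_exists(self):
        import os

        import falkordb_bin  # only imported when the extra is installed

        self.assertTrue(os.path.exists(falkordb_bin.get_redis_server()))


print("embedded tests enabled:", HAS_BIN)
```

With pytest, pytest.importorskip("falkordb_bin") at module level achieves the same effect in one line.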

File Changes Summary

falkordb-py repo

| File | Change |
|---|---|
| pyproject.toml | Add [project.optional-dependencies] lite = ["falkordb-bin>=1.0,<2.0"] |
| falkordb/falkordb.py | Add embedded=, db_path=, embedded_config=, max_connections= params, close(), context manager |
| falkordb/asyncio/falkordb.py | Add same embedded params, async close(), __aenter__/__aexit__ |
| falkordb/lite/__init__.py | New: subpackage init |
| falkordb/lite/server.py | New: EmbeddedServer class |
| falkordb/lite/config.py | New: Redis config generation |
| falkordb/lite/binaries.py | New: binary resolution from falkordb-bin |
| tests/test_embedded.py | New: embedded mode tests |
| README.md | Add embedded usage examples |

falkordb-bin repo (NEW)

| File | Description |
|---|---|
| falkordb_bin/__init__.py | get_redis_server(), get_falkordb_module() API |
| falkordb_bin/bin/redis-server | Precompiled binary (per-platform) |
| falkordb_bin/bin/falkordb.so | Precompiled binary (per-platform) |
| pyproject.toml | Package metadata, platform wheel config |
| .github/workflows/build.yml | CI: compile binaries, build wheels, publish to PyPI |

falkordblite repo (existing, minimal changes)

| File | Change |
|---|---|
| redislite/__init__.py | Add DeprecationWarning pointing to falkordb[lite] |
| README.md | Add deprecation notice and migration guide |

Resolved Decisions

| # | Question | Decision |
|---|---|---|
| 1 | Windows support | Required, but done as a follow-up task. Will use redis-windows for Redis and falkordb-rs-next-gen for FalkorDB on Windows. |
| 2 | Binary versioning | Pinned to major version range (e.g. >=1.0.0,<2.0.0). Both packages bump major together on breaking protocol changes. |
| 3 | Connection pooling | Yes, always. Embedded mode creates a ConnectionPool with max_connections=16 over Unix socket to support parallel queries. |
| 4 | API style | Constructor parameter (embedded=True). All standard kwargs (socket_timeout, encoding, etc.) are accepted alongside embedded-specific ones. No separate factory method. |
| 5 | Build pipeline | Hybrid. Download pre-built FalkorDB binaries from GitHub releases. Build Redis from source (no official binaries available). |
| 6 | Package name | falkordb-bin. Clear, concise, communicates "just the binaries". |
| 7 | max_connections | Configurable via kwarg, default 16. Constructor accepts max_connections= passed through to the pool. |
| 8 | Version alignment | falkordb-bin mirrors falkordb versioning, e.g. falkordb 1.2.0 and falkordb-bin 1.2.0 are released together and known-compatible. Simplifies the compatibility story; users don't need a matrix. |
