StreamDB

A lightweight, thread-safe embedded key-value database implemented in C, using a reverse trie data structure for efficient suffix-based searches.

Features
Quick Start
Building
Design Specification
API Reference
Performance Characteristics
Limitations
License

Features

Feature	Description
Thread-safe	All operations protected by recursive mutex
Suffix search	O(m + k) lookup for keys ending with a given suffix
Binary keys	Supports arbitrary byte sequences, not just strings
Auto-persistence	Background thread periodically flushes to disk
Cross-platform	Windows, Linux, macOS, BSD
Zero dependencies	Pure C11 with standard library only
Memory-only mode	Optional in-memory operation without disk I/O
Atomic writes	Crash-safe persistence via temp file + rename

Quick Start

#include <streamdb.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Initialize with file persistence, 5-second auto-flush
    StreamDB* db = streamdb_init("mydb.dat", 5000);
    
    // Insert key-value pairs
    streamdb_insert(db, (unsigned char*)"user:alice", 10, "Alice Smith", 12);
    streamdb_insert(db, (unsigned char*)"user:bob", 8, "Bob Jones", 10);
    
    // Retrieve (returns a copy - must free)
    size_t size;
    char* value = streamdb_get(db, (unsigned char*)"user:alice", 10, &size);
    if (value) {
        printf("Found: %s\n", value);
        free(value);
    }
    
    // Suffix search - find all keys ending with "alice"
    StreamDBResult* results = streamdb_suffix_search(db, 
        (unsigned char*)"alice", 5);
    for (StreamDBResult* r = results; r; r = r->next) {
        printf("Key: %.*s\n", (int)r->key_len, (char*)r->key);
    }
    streamdb_free_results(results);
    
    // Cleanup (auto-flushes if dirty)
    streamdb_free(db);
    return 0;
}

Compile and run:

gcc -o example example.c -lstreamdb -lpthread
./example

Building

Prerequisites

C11-compatible compiler (GCC 4.9+, Clang 3.1+, MSVC 2015+)
POSIX threads (pthreads) on Unix-like systems
CMake 3.10+ (optional) or GNU Make

Using Make

make              # Build static and shared libraries
make test         # Build and run test suite
make debug        # Build with AddressSanitizer + run tests
make install      # Install to /usr/local (or PREFIX=/path)

Using CMake

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
ctest             # Run tests
sudo cmake --install .

CMake Options

Option	Default	Description
`STREAMDB_BUILD_SHARED`	ON	Build shared library (.so/.dylib/.dll)
`STREAMDB_BUILD_STATIC`	ON	Build static library (.a/.lib)
`STREAMDB_BUILD_TESTS`	ON	Build test suite
`STREAMDB_INSTALL`	ON	Generate install targets

Using Nix

nix build              # Build default package
nix develop            # Enter development shell
nix flake check        # Run tests

Design Specification

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                         StreamDB                                │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Public API │  │   Wrapper   │  │   Platform Abstraction  │  │
│  │             │  │    API      │  │   (mutex, threads, UUID)│  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │               │
│         └────────────────┼──────────────────────┘               │
│                          │                                      │
│  ┌───────────────────────▼───────────────────────────────────┐  │
│  │                    Core Engine                            │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐    │  │
│  │  │ Reverse Trie│  │  Serializer │  │  Auto-Flush     │    │  │
│  │  │   (Data)    │  │ (Persistence│  │   Thread        │    │  │
│  │  └─────────────┘  └─────────────┘  └─────────────────┘    │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Components:

Public API (streamdb.h) — Core database operations
Wrapper API (streamdb_wrapper.h) — Convenience functions (documents, batch ops)
Platform Abstraction — Portable mutex, threads, condition variables, UUID
Reverse Trie — Primary data structure for storage and suffix search
Serializer — Binary persistence with versioned format
Auto-Flush Thread — Background persistence with graceful shutdown

Data Structure: Reverse Trie

StreamDB uses a reverse trie (suffix trie) as its primary data structure. Keys are inserted in reverse byte order, enabling efficient suffix-based lookups.

Why Reverse Trie?

Operation	Hash Table	B-Tree	Trie	Reverse Trie
Exact lookup	O(1) avg	O(log n)	O(m)	O(m)
Prefix search	O(n)	O(log n + k)	O(m + k)	O(n)
Suffix search	O(n)	O(n)	O(n)	O(m + k)
Range scan	O(n)	O(log n + k)	O(n)	O(n)
Insert	O(1) avg	O(log n)	O(m)	O(m)
Delete	O(1) avg	O(log n)	O(m)	O(m)

Where: n = total keys, m = key length, k = matching results

Use cases optimized for suffix search:

File extension lookups (.txt, .json)
Domain name resolution (*.example.com)
Log analysis by timestamp suffix
Reverse DNS lookups

Trie Node Structure

typedef struct TrieNode {
    struct TrieNode* children[256];  // One slot per byte value
    void* value;                      // Stored value (if terminal)
    size_t value_size;                // Size of value in bytes
    int is_end;                       // Terminal node flag
} TrieNode;

Insertion Example

Inserting keys "cat", "car", "card":

Original keys:     Reversed for insertion:
  cat                tac
  car                rac  
  card               drac

Trie structure (reversed):

         [root]
         /    \
       [t]    [r]
        |      |
       [a]    [a]
        |      |
       [c]*   [c]*
               |
              [d]*

(* = terminal node with value)

Suffix search for "ar" (reversed: "ra"):

Traverse: root → [r] → [a]
Collect all terminals in subtree: "car", "card"

Memory Layout

Database Structure

struct StreamDB {
    // Data
    TrieNode* root;           // Root of reverse trie (never NULL)
    size_t total_size;        // Sum of all value sizes
    size_t key_count;         // Number of stored keys
    size_t node_count;        // Number of allocated nodes
    
    // Synchronization
    mutex_t mutex;            // Recursive mutex for all operations
    condvar_t shutdown_cv;    // Condition variable for shutdown
    
    // Persistence
    char* file_path;          // Path to data file (NULL = memory-only)
    int is_file_backend;      // Quick check for persistence mode
    int dirty;                // Unflushed modifications exist
    int running;              // Auto-flush thread active
    int shutdown_requested;   // Graceful shutdown flag
    thread_t auto_thread;     // Background flush thread handle
    int auto_flush_interval_ms;
    int thread_started;       // Thread creation succeeded
};

Memory Overhead

Component	Size (64-bit)	Notes
TrieNode	2,064 bytes	256 pointers + value ptr + metadata
StreamDB	~128 bytes	Fixed overhead per database
Per key	~(depth × 2KB)	Shared nodes amortize cost

Memory optimization strategies:

Nodes are shared across keys with common suffixes
Values are stored by reference (single allocation)
Node count tracked for monitoring

Threading Model

Concurrency Guarantees

All public functions are thread-safe
Recursive mutex allows nested calls within callbacks
Copy-on-read semantics — streamdb_get() returns independent copy
Atomic persistence — temp file + rename pattern

Lock Hierarchy

┌─────────────────────────────────────────────┐
│              db->mutex                      │
│  ┌───────────────────────────────────────┐  │
│  │  All trie operations                  │  │
│  │  All metadata updates                 │  │
│  │  Serialization (during flush)         │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

Single mutex design chosen for:

Simplicity and correctness
Low contention expected (embedded use case)
Avoiding deadlock complexity

Auto-Flush Thread Lifecycle

streamdb_init()
      │
      ▼
┌─────────────────┐
│ Thread Created  │◄─────────────────────────┐
└────────┬────────┘                          │
         │                                   │
         ▼                                   │
┌─────────────────┐    timeout    ┌──────────┴────────┐
│ Wait on condvar │──────────────►│ Check dirty flag  │
└────────┬────────┘               └──────────┬────────┘
         │                                   │
         │ signal                            │ if dirty
         ▼                                   ▼
┌─────────────────┐               ┌─────────────────────┐
│ shutdown_requested?             │ Flush to disk       │
└────────┬────────┘               └──────────┬──────────┘
         │                                   │
         │ yes                               │
         ▼                                   │
┌─────────────────┐                          │
│  Thread Exit    │◄─────────────────────────┘
└─────────────────┘

streamdb_shutdown() ──► Sets shutdown_requested + signals condvar
streamdb_free() ──────► Calls shutdown + joins thread + final flush

Graceful shutdown benefits:

Immediate wakeup via condition variable (vs. sleeping full interval)
Final flush ensures no data loss
Clean thread termination

Persistence Format

File Structure (v2)

┌──────────────────────────────────────────┐
│  Magic Number: "STDB" (4 bytes)          │
├──────────────────────────────────────────┤
│  Version: int32 = 2 (4 bytes)            │
├──────────────────────────────────────────┤
│  Serialized Trie (recursive structure)   │
│  ┌────────────────────────────────────┐  │
│  │  Node:                             │  │
│  │    is_end: int32                   │  │
│  │    value_size: size_t              │  │
│  │    value: [value_size bytes]       │  │
│  │    child_count: int32              │  │
│  │    children: [                     │  │
│  │      byte_index: uint8             │  │
│  │      child_node: Node (recursive)  │  │
│  │    ] × child_count                 │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘

Serialization Algorithm

Iterative depth-first traversal using explicit stack (avoids stack overflow):

// Pseudocode for serialization
stack = [(root, WRITE_HEADER)]
while stack not empty:
    (node, state) = stack.pop()
    
    if state == WRITE_HEADER:
        write(node.is_end, node.value_size)
        if node.is_end: write(node.value)
        write(child_count)
        stack.push((node, 0))  // Start iterating children
        
    else:  // state = next child index to process
        for i in range(state, 256):
            if node.children[i]:
                write(i)  // Child byte index
                stack.push((node, i+1))  // Resume here later
                stack.push((node.children[i], WRITE_HEADER))
                break

Atomic Write Strategy

1. Write to: {file_path}.{pid}.tmp
2. Sync to disk (implicit on fclose)
3. Delete existing: {file_path}
4. Rename: {file_path}.{pid}.tmp → {file_path}

Properties:

Crash during write leaves original intact
PID in temp name prevents multi-process collision
Rename is atomic on POSIX systems

Error Handling

Status Codes

typedef enum {
    STREAMDB_OK = 0,            // Success
    STREAMDB_ERROR = -1,        // General/unknown error
    STREAMDB_NOT_FOUND = -2,    // Key does not exist
    STREAMDB_INVALID_ARG = -3,  // NULL pointer or invalid parameter
    STREAMDB_NO_MEMORY = -4,    // malloc() failed
    STREAMDB_IO_ERROR = -5,     // File operation failed
    STREAMDB_NOT_SUPPORTED = -6 // Operation not applicable
} StreamDBStatus;

Error Handling Philosophy

Never crash — All NULL inputs handled gracefully
Explicit status codes — No errno dependency
Partial failure handling — Batch operations report first error
Human-readable messages — streamdb_strerror() for logging

API Reference

Initialization

Function	Description
`streamdb_init(path, interval_ms)`	Create database with optional file persistence
`streamdb_init_with_config(path, config)`	Create with extended configuration
`streamdb_free(db)`	Close database and free resources
`streamdb_shutdown(db)`	Signal background thread to stop

Core Operations

Function	Description
`streamdb_insert(db, key, key_len, value, value_size)`	Insert or update
`streamdb_get(db, key, key_len, &size)`	Retrieve copy of value
`streamdb_exists(db, key, key_len)`	Check if key exists
`streamdb_delete(db, key, key_len)`	Remove key-value pair

Search & Iteration

Function	Description
`streamdb_suffix_search(db, suffix, len)`	Find keys ending with suffix
`streamdb_foreach(db, callback, user_data)`	Iterate all entries
`streamdb_free_results(results)`	Free search result list

Persistence

Function	Description
`streamdb_flush(db)`	Force write to disk
`streamdb_get_stats(db, &stats)`	Get database statistics

Utilities

Function	Description
`streamdb_strerror(status)`	Get error message
`streamdb_version()`	Get library version string

Performance Characteristics

Time Complexity

Operation	Average	Worst Case	Notes
Insert	O(m)	O(m)	m = key length
Get	O(m)	O(m)	Plus O(v) copy, v = value size
Delete	O(m)	O(m)
Exists	O(m)	O(m)	No value copy
Suffix search	O(m + k)	O(m + n)	k = matches, n = total keys
Flush	O(N)	O(N)	N = total nodes

Space Complexity

Metric	Complexity	Notes
Node overhead	O(256 × ptr_size)	~2KB per node on 64-bit
Total nodes	O(Σ key_lengths)	Upper bound, sharing reduces this
Value storage	O(Σ value_sizes)	Exact, no overhead

Benchmarks (Indicative)

Tested on: AMD Ryzen 7 5800X, 32GB RAM, NVMe SSD, GCC 11 -O2

Operation	Throughput	Latency (p99)
Insert (100B value)	850K ops/sec	12 µs
Get (100B value)	1.2M ops/sec	8 µs
Suffix search (10 matches)	450K ops/sec	25 µs
Flush (10K keys)	180 ops/sec	18 ms

Limitations

Current Limitations

Limitation	Value	Rationale
Max key length	1,024 bytes	Stack allocation in search
Max value size	~2GB	`size_t` on 32-bit systems
Concurrent writers	Serialized	Single mutex design
Transactions	Not supported	Simplicity
Compression	Not supported	Future enhancement

Known Trade-offs

Memory usage — 2KB per node is high for sparse key distributions
Suffix-only search — Prefix search requires full scan
Single-file persistence — No WAL, potential data loss window
No range queries — Trie structure doesn't support ordered iteration

Future Enhancements

Adaptive radix trie (reduce node memory)
Write-ahead logging (reduce data loss window)
Read-write lock (improve read concurrency)
Value compression (LZ4/Zstd)
Memory-mapped I/O option
Prefix search via separate forward trie

License

GNU Lesser General Public License v2.1 (LGPL-2.1)

This library is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 2.1 of the License, or (at your option)
any later version.

Contributing

Fork the repository
Create a feature branch
Ensure all tests pass: make test
Run with sanitizers: make debug
Submit a pull request

Code style:

C11 standard
4-space indentation
snake_case for functions and variables
UPPER_CASE for constants and macros

Full Changelog: https://github.com/ALH477/DeMoD-StreamDB/commits/C

StreamDB v2 C