High-Performance Multi-Model Database with Native AI/LLM Integration
"ThemisDB keeps its own llamas." – Run LLaMA, Mistral, Phi-3 directly in your database, no API calls needed.
- 📝 Grammar-Constrained Generation - EBNF/GBNF support for guaranteed valid JSON/XML/CSV outputs (95-99% reliability vs 60-70%)
- Built-in grammars: JSON, XML, CSV, ReAct Agent
- Thread-safe grammar cache with LRU eviction
- Zero post-processing required
- 🔭 RoPE Scaling - Extended context window from 4K → 32K tokens (8x increase)
- Linear, NTK-aware, YaRN scaling methods
- Process entire research papers and codebases
- 🖼️ Vision Support - Multi-modal LLMs with CLIP-based image encoding
- LLaVA integration for image analysis
- Single and multiple image support
- Thread-safe VisionEncoder class
- ⚡ Flash Attention - CUDA kernels for 15-25% speedup, 30% memory reduction
- Optimized attention mechanism
- Backward pass for training support
- Multi-compute capability support
- 🎯 Speculative Decoding - 2-3x faster inference with draft+target models
- 🔄 Continuous Batching - 2x+ throughput with dynamic request batching
- 🔥 Hot Spare Management - Automatic failover with health monitoring
- 📊 Enhanced Prometheus Metrics - LLM inference, cache performance, response tracking
- 🔄 WAL Replication via gRPC - Distributed inter-shard replication
- 🎮 Multi-GPU LoRA Support - Distributed LoRA adapters across GPUs
- 🐘 PostgreSQL Protocol - COPY, prepared statements, transaction support
- 31 new test suites with comprehensive coverage
- 11 new performance benchmarks
- 17 new documentation guides
- 938 files changed (+113,762 lines, -45,154 lines)
📚 Quick Links:
- Grammar-Constrained Generation
- RoPE Scaling Guide
- Vision Support Quick Start
- Flash Attention Implementation
- Complete Changelog
- 🔧 Critical Fix - Resolved server hang in RAID cluster mode at "Adaptive Index Manager initialized"
- 🎯 Root Cause - AdaptiveIndexManager MVCC coordination across 2 CFs before Sharding Manager initialization
- ✅ Solution - Conditional Column Family opening when `THEMIS_ENABLE_SHARDING=true` is detected
- 🔧 Port Mapping Fix - Corrected docker-compose HTTP mappings from `808X:8080` to `808X:8765`
- 🧪 RAID Testing - 2-hour endurance test suite with monitoring dashboard
- 📊 Verification - All 9 RAID shards (RAID 0/1/5) operational with 0% error rate
- 📦 Build Optimization - Docker build context reduced from 3GB to 85MB (97% reduction)
Files Modified: src/storage/rocksdb_wrapper.cpp, src/server/http_server.cpp, docker/compose/docker-compose-sharding.yml, Dockerfile.themis-server, .dockerignore
Tools Added: scripts/raid_endurance_test.py, scripts/monitor_raid_test.ps1
- 🛡️ Critical Security Fixes - Addressed 7 critical and 8 medium severity issues in RocksDB wrapper (100% segfault risk elimination)
- 🐳 Docker Security - Upgraded to Ubuntu 24.04 LTS with 80%+ CVE reduction
- 🔄 Update Checker Security - Secure token handling, HTTPS-only, thread-safe implementation
- 🔐 Binary Authenticity - Cryptographic manifest signing architecture (RSA-4096, SHA-256)
- 🔧 Memory Safety - Fixed use-after-free vulnerabilities, memory leaks, and resource management
- ✅ Verification - CodeQL passed, comprehensive audit reports, all critical issues resolved
Quick Links: Security Summary (EN) | Sicherheitszusammenfassung (DE) | Security Audit | Security Policy
- 🗣️ Natural Language Voice Interaction - Similar to Alexa/Siri, powered by Whisper.cpp + Piper TTS + llama.cpp
- 📞 Phone Call Recording - Automatic transcription and secure storage with revision control
- 📝 Meeting Protocol Generation - AI-powered meeting minutes, key points, and action items
- 🎯 Speaker Diarization - Identify and label different speakers in recordings
- 🔒 Revision-Safe Storage - All recordings stored securely in ThemisDB with full audit trails
- 🌐 Multi-Language Support - 100+ languages for STT, multiple voices for TTS
Quick Links: Voice Assistant Guide (EN) | Sprachassistent (DE)
- 🎯 Git Flow Implementation - `main` as production release branch, `develop` as integration branch
- 📚 Comprehensive Documentation - Bilingual guides (DE/EN), visual workflows, quick reference cards
- 🚀 Branch-Based CI/CD - Fast builds on `develop` (~5-10 min), production builds on `main` (~30-40 min)
- 🔄 Migration Guide - Step-by-step transition for existing contributors
- 🛡️ Branch Protection Setup - GitHub configuration with automated scripts
Quick Links: Branching Strategy (DE) | Branching Strategy (EN) | Quick Reference | Documentation Hub
- 🌐 HTTP/2 with Server Push for CDC/Changefeed with ~0ms latency
- 📡 WebSocket support with CDC streaming for real-time communication
- 📬 MQTT broker with WebSocket transport and monitoring
- ⚡ HTTP/3 base implementation with QUIC protocol
- 🐘 PostgreSQL Wire Protocol for BI tool compatibility
- 🔌 MCP Server (Model Context Protocol) support
- 🖼️ Image Analysis AI Plugin Architecture running parallel with LLM
- 🔧 Multi-backend support: llama.cpp Vision, ONNX Runtime, OpenCV DNN, OpenVINO, ncnn
- 🛠️ Plugin interfaces: `IImageAnalysisBackend`, `ImageAnalysisManager`
- 🧪 Comprehensive unit tests (15+ test cases) and benchmarks
- 📄 Added `ATTRIBUTIONS.md` documenting 15+ core dependencies
- 🏆 Documented ThemisDB's 12 unique innovations
- ✅ Clear attribution for all major dependencies
🧠 Native LLM Integration with llama.cpp (Optional)
"ThemisDB keeps its own llamas." – Run AI/LLM workloads directly in your database - no external API costs!
[!NOTE] LLM integration is an optional feature that requires building with `-DTHEMIS_ENABLE_LLM=ON` and a local clone of llama.cpp.
LLM Features (When Enabled)
| Feature | Description | Status |
|---|---|---|
| 🧠 Embedded LLM Engine | llama.cpp integration for LLaMA/Mistral/Phi-3 (1B-70B params) | ✅ |
| 📝 Grammar Constraints | EBNF/GBNF for guaranteed valid JSON/XML/CSV outputs | ✅ v1.4.0-alpha |
| 🔭 RoPE Scaling | Extended context window 4K → 32K tokens (8x increase) | ✅ v1.4.0-alpha |
| 🖼️ Vision Support | Multi-modal LLMs with CLIP image encoding (LLaVA) | ✅ v1.4.0-alpha |
| ⚡ Flash Attention | CUDA kernels: 15-25% speedup, 30% memory reduction | ✅ v1.4.0-alpha |
| 🎯 Speculative Decoding | 2-3x faster inference with draft+target models | ✅ v1.4.0-alpha |
| 🔄 Continuous Batching | 2x+ throughput with dynamic request batching | ✅ v1.4.0-alpha |
| 🎙️ Voice Assistant | STT/TTS/LLM for phone calls, meetings, voice commands (Enterprise) | ✅ |
| 🖼️ Image Analysis AI | Multi-backend plugins (llama.cpp Vision, ONNX CLIP, OpenCV DNN) | ✅ |
| ⚡ GPU Acceleration | NVIDIA CUDA support with significant speedup | ✅ |
| 💾 PagedAttention | Advanced memory management | ✅ |
| 🔧 Quantization | Q4_K_M, Q5_K_M, Q8_0 for efficient memory usage | ✅ |
| 📊 Monitoring | Grafana dashboards with metrics and alerts | ✅ |
| 🔌 Plugin Architecture | Extensible LLM and image analysis backends | ✅ |
| 🌐 Distributed RPC | Inter-shard communication for distributed LLM ops | ✅ |
[!TIP] GPU acceleration provides significant speedup over CPU with PagedAttention memory savings.
- ⚡ Significant speedup with GPU acceleration vs CPU
- 💾 Memory savings with PagedAttention and prefix caching
- 🚀 Kernel fusion for additional performance gains
- ✅ Comprehensive test coverage with unit tests
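
To make the RoPE Scaling row in the feature table above concrete, here is a minimal sketch of linear position scaling, one of the three listed methods. It illustrates the general technique only and is not ThemisDB's or llama.cpp's implementation; the function names, the head dimension of 128, and the base of 10000 are assumptions chosen for the example.

```python
def rope_angles(position: float, head_dim: int = 128, base: float = 10000.0) -> list[float]:
    """Rotation angle of each frequency pair at a given token position."""
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def linear_scaled_angles(
    position: int,
    trained_ctx: int = 4096,    # context length the model was trained on (4K)
    target_ctx: int = 32768,    # extended context length (32K)
    head_dim: int = 128,        # assumed head dimension, for illustration
    base: float = 10000.0,      # assumed RoPE base, for illustration
) -> list[float]:
    """Linear scaling: divide positions by target/trained so that the
    extended range maps back onto the position range seen in training."""
    factor = target_ctx / trained_ctx  # 32768 / 4096 = 8.0
    return rope_angles(position / factor, head_dim, base)


if __name__ == "__main__":
    print("scale factor:", 32768 / 4096)
    # The last token of the 32K window is rotated like a position inside the 4K range.
    print("effective position of token 32767:", linear_scaled_angles(32767)[0])
```

The trade-off of the linear method is that neighbouring tokens end up closer together in angle space, which is the motivation for the NTK-aware and YaRN variants.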
📚 Documentation:
- 🧠 LLM Complete Setup Guide (DE) - Complete guide to LLM setup and inference
- 🎯 Overview - System architecture and design
ThemisDB is a production-ready multi-model database that combines relational, graph, vector, and document models in a single system with full ACID transaction support. It is built on RocksDB and includes advanced security and compliance features.
Available Editions
| Edition | License | Features | Use Case |
|---|---|---|---|
| 🔹 Minimal | Open Source (MIT) | Core database only - no LLM, GPU, sharding, advanced protocols | Embedded systems, IoT, edge devices, fast builds |
| 🆓 Community | Open Source (MIT) | Full-featured single-node database with all core capabilities | Development, startups, single-server deployments |
| 🔒 Enterprise | Commercial | + Horizontal scaling, advanced analytics, HA/replication, and more | Large-scale production deployments |
→ See Minimal Edition Details | → See Enterprise Edition Details
Database Capabilities
| Feature | Description | Community | Enterprise |
|---|---|---|---|
🚀 Quick Start
# Pull and run the latest version
docker pull themisdb/themisdb:latest
# Run with Docker
docker run -d \
--name themis \
-p 8080:8080 \
-p 18765:18765 \
-p 4318:4318 \
-v themis_data:/data \
themisdb/themisdb:latest
# Or use Docker Compose
docker compose up -d
# Verify installation
curl http://localhost:8080/health
[!TIP] Use Docker Compose for production deployments with proper configuration.
| Port | Protocol | Description |
|---|---|---|
| 8080 | HTTP/1.1 | REST API, GraphQL |
| 18765 | Binary | Wire Protocol, gRPC |
| 4318 | HTTP | OpenTelemetry/Prometheus |
[!NOTE] Complete Port Reference: See docs/deployment/PORT_REFERENCE.md for all ports including optional protocols (MQTT, PostgreSQL Wire, MCP).
- ✅ Image Analysis - Multi-backend AI plugins (v1.3.0+)
- ✅ GNN Embeddings - Graph Neural Network support
🌐 Modern Protocols
| Protocol | Status | Description |
|---|---|---|
| HTTP/1.1 | ✅ | REST API, GraphQL |
| HTTP/2 | ✅ | Server Push for CDC |
| HTTP/3 | 🚧 | QUIC (experimental) |
| WebSocket | ✅ | Bidirectional streaming |
| gRPC | ✅ | Binary RPC |
| MQTT | ✅ | IoT messaging |
| PostgreSQL Wire | ✅ | BI tool compatibility |
| MCP | ✅ | Model Context Protocol |
| SSE | ✅ | Server-Sent Events |
📚 Transparency & Attribution
ThemisDB is built on proven open-source foundations with clear attribution:
- ✅ Transparent Attribution - Clear documentation of all dependencies
- ✅ Innovation Documentation - ThemisDB's unique contributions vs third-party features
- ✅ License Compliance - Full license information for all components
- 🔒 ACID Transactions - Full snapshot isolation with MVCC
- 🔍 Multi-Model - Relational, Graph, Vector, Document in one database
- 🚀 High Performance - 45K writes/s, 120K reads/s, GPU-accelerated vector search
- 🛡️ Security - TLS 1.3, RBAC, field-level encryption, audit logging (Enterprise: HSM integration)
- 📊 Analytics - Time-series, aggregations (Enterprise: OLAP, CEP, materialized views)
- 🌐 Distribution - Single-node optimized (Enterprise: horizontal sharding, replication, Kubernetes)
- 🧠 AI-Ready - Hybrid search (RAG), embedding cache, FAISS integration, optional LLM engine with llama.cpp (v1.3.0+), image analysis AI plugins (v1.3.0+), voice assistant with STT/TTS (v1.4.0+, Enterprise)
- 🌐 Modern Protocols - HTTP/1.1, GraphQL, SSE, gRPC (v1.3.0), HTTP/2 with Server Push ✅, WebSocket ✅, MQTT ✅, HTTP/3 🚧, PostgreSQL Wire ✅, MCP ✅
- 📚 Transparent Attribution - Clear documentation of third-party dependencies vs ThemisDB innovations (see ATTRIBUTIONS.md)
- 🖼️ Image Analysis - Multi-backend AI plugin architecture (llama.cpp Vision, ONNX CLIP, OpenCV DNN)
# Pull and run the latest stable version
docker pull themisdb/themisdb:latest
docker run -d \
-p 8080:8080 \
-p 18765:18765 \
-p 4318:4318 \
-v themis_data:/data \
themisdb/themisdb:latest
# Or use nightly builds (latest development version)
docker pull themisdb/server:nightly
docker run -d -p 8080:8080 -p 18765:18765 -v themis_data:/data themisdb/server:nightly
# Or use Docker Compose
docker compose up -d
# Check health
curl http://localhost:8080/health
Default Ports:
- `8080` - HTTP/REST API, GraphQL
- `18765` - Binary Wire Protocol, gRPC
- `4318` - OpenTelemetry/Prometheus metrics
📖 Complete Port Reference: See docs/deployment/PORT_REFERENCE.md for all ports including optional protocols (MQTT, PostgreSQL Wire, MCP).
Release Builds: Automated release builds with edition support (community/enterprise/hyperscaler) are available on DockerHub. Each release includes automatically generated changelogs and enterprise security artifacts (SBOM, signatures, SLSA provenance). See Release Build Documentation and Release Changelogs for details.
# Clone repository
git clone https://github.com/makr-code/ThemisDB.git
cd ThemisDB
# Setup and build (Linux/macOS)
./scripts/setup.sh
./scripts/build.sh
# Setup and build (Windows)
.\scripts\setup.ps1
.\scripts\build.ps1
# Start server
./build/themis_server --config config.yaml
[!TIP] Configuration: See `config/config.yaml` for the complete configuration reference with all available options, or check the Configuration Guide | Tuning Guide (DE) for detailed tuning recommendations.
Optional Protocol Support (Security: Opt-In by Default):
# Enable HTTP/2 with Server Push (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_HTTP2=ON
# Enable WebSocket with CDC (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_WEBSOCKET=ON
# Enable MQTT broker (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_MQTT=ON
# Enable PostgreSQL Wire Protocol (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_POSTGRES_WIRE=ON
# Enable MCP for LLM integration (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_MCP=ON
# Enable HTTP/3 (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_HTTP3=ON
# Default build only includes HTTP/1.1, GraphQL, SSE, gRPC (minimal attack surface)
See Protocol Documentation for details.
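
The commands above enable one protocol at a time. As a sketch of how several opt-in flags could be combined into a single configure step, the helper below assembles the `cmake` argument list. The flag names come from the commands above; the short protocol keys, the function name, and the assumption that the flags can be combined freely in one invocation are illustrative and not confirmed by the ThemisDB build documentation.

```python
import shlex

# Flag names as shown in the commands above; the dictionary keys are labels for this sketch only.
PROTOCOL_FLAGS = {
    "http2": "THEMIS_ENABLE_HTTP2",
    "websocket": "THEMIS_ENABLE_WEBSOCKET",
    "mqtt": "THEMIS_ENABLE_MQTT",
    "postgres": "THEMIS_ENABLE_POSTGRES_WIRE",
    "mcp": "THEMIS_ENABLE_MCP",
    "http3": "THEMIS_ENABLE_HTTP3",
}


def cmake_configure_args(protocols: list[str], build_dir: str = "build",
                         source_dir: str = ".") -> list[str]:
    """Build the cmake configure command for the requested opt-in protocols."""
    unknown = [p for p in protocols if p not in PROTOCOL_FLAGS]
    if unknown:
        raise ValueError(f"unknown protocol(s): {', '.join(unknown)}")
    args = ["cmake", "-B", build_dir, "-S", source_dir]
    # An empty selection leaves the default build untouched (no extra -D flags).
    args += [f"-D{PROTOCOL_FLAGS[p]}=ON" for p in protocols]
    return args


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(cmake_configure_args(["http2", "websocket"])))
```

Rejecting unknown keys instead of silently ignoring them keeps a typo from producing a build that lacks the protocol you expected.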
# OPTIONAL: For LLM support, a local clone of llama.cpp is required
if (!(Test-Path "llama.cpp")) {
git clone https://github.com/ggerganov/llama.cpp.git llama.cpp
}
# MSVC release build with LLM support
powershell -File scripts/build-themis-server-llm.ps1
# Sanity check
./build-msvc/bin/themis_server.exe --help
Notes:
- LLM support is optional and requires `-DTHEMIS_ENABLE_LLM=ON` at build time
- `llama.cpp/` lives as a local clone in the project root and is excluded via `.gitignore` and `.dockerignore` (it is neither committed nor copied into Docker)
- The build script sets Visual Studio 2022 (`-G "Visual Studio 17 2022"`) and `-A x64`, includes the vcpkg toolchain, and fixes MSVC-specific `char8_t` errors on the `llama` target
→ Comprehensive Build Documentation | Build variants, platforms, troubleshooting
Linux (Debian/Ubuntu):
wget https://github.com/makr-code/ThemisDB/releases/latest/download/themisdb_1.3.0-1_amd64.deb
sudo apt install ./themisdb_1.3.0-1_amd64.deb
sudo systemctl start themisdb
macOS (Homebrew):
brew install themisdb
brew services start themisdb
Windows (Chocolatey):
choco install themisdb
# 1. Check server health
curl http://localhost:8765/health
# 2. Create an entity
curl -X PUT http://localhost:8765/entities/users:alice \
-H "Content-Type: application/json" \
-d '{"blob":"{\"name\":\"Alice\",\"age\":30,\"city\":\"Berlin\"}"}'
# 3. Create an index
curl -X POST http://localhost:8765/index/create \
-H "Content-Type: application/json" \
-d '{"table":"users","column":"city"}'
# 4. Query by index
curl -X POST http://localhost:8765/query \
-H "Content-Type: application/json" \
-d '{"table":"users","predicates":[{"column":"city","value":"Berlin"}],"return":"entities"}'
# 5. View metrics
curl http://localhost:8765/metrics
ThemisDB uses a unified storage architecture with specialized projection layers:
┌─────────────────────────────────────────────────────────┐
│ Query Layer (AQL) │
│ SQL-like • Graph Traversals • Vector Search • Analytics│
├─────────────────────────────────────────────────────────┤
│ Projection Layers │
│ Secondary Indices • Graph Adjacency • HNSW Vector │
├─────────────────────────────────────────────────────────┤
│ Canonical Storage (Base Entity) │
│ RocksDB LSM-Tree • MVCC Transactions │
└─────────────────────────────────────────────────────────┘
Core Components:
- Storage Engine: RocksDB TransactionDB with LSM-Tree
- Transaction Manager: MVCC with snapshot isolation
- Query Engine: Advanced Query Language (AQL) with graph/vector support
- Index Manager: Automatic maintenance of secondary, graph, and vector indexes
- Security: TLS 1.3, RBAC, field encryption, audit logging
- Observability: Prometheus metrics, OpenTelemetry tracing
→ Full Architecture Documentation
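
The layering above can be illustrated with a toy model that replays the REST quick tour (put an entity, create an index, query by index) entirely in memory. This is a sketch of the idea of canonical storage plus a derived secondary-index projection, not ThemisDB's storage code; the class `ToyStore` and its methods are hypothetical, and it assumes indexed values are scalars.

```python
import json
from collections import defaultdict


class ToyStore:
    """In-memory stand-in: canonical entity storage plus one projection layer."""

    def __init__(self) -> None:
        self.entities: dict[str, dict] = {}                  # canonical: key -> entity
        self.indexed: set[tuple[str, str]] = set()           # (table, column) pairs
        self.index: dict[tuple, set[str]] = defaultdict(set)  # (table, column, value) -> keys

    def create_index(self, table: str, column: str) -> None:
        """Register an index and backfill it from canonical storage."""
        self.indexed.add((table, column))
        for key, entity in self.entities.items():
            if key.split(":", 1)[0] == table and column in entity:
                self.index[(table, column, entity[column])].add(key)

    def put(self, key: str, blob: str) -> None:
        """Store an entity ("table:id" key, JSON string blob) and update projections."""
        table = key.split(":", 1)[0]
        new, old = json.loads(blob), self.entities.get(key, {})
        for idx_table, column in self.indexed:
            if idx_table != table:
                continue
            if column in old:   # drop the stale index entry first
                self.index[(table, column, old[column])].discard(key)
            if column in new:
                self.index[(table, column, new[column])].add(key)
        self.entities[key] = new

    def query(self, table: str, column: str, value) -> list[dict]:
        """Answer an equality predicate from the index, then read canonical entities."""
        if (table, column) not in self.indexed:
            raise KeyError(f"no index on {table}.{column}")
        keys = sorted(self.index.get((table, column, value), set()))
        return [self.entities[k] for k in keys]


if __name__ == "__main__":
    store = ToyStore()
    store.put("users:alice", '{"name":"Alice","age":30,"city":"Berlin"}')
    store.create_index("users", "city")
    print(store.query("users", "city", "Berlin"))
```

In the real system the entity write and the index update would be committed in one transaction; the toy version simply performs both inside `put`.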
- Relational: SQL-like queries with secondary indexes
- Graph: BFS, Dijkstra, A* traversals with path constraints
- Vector: HNSW and FAISS for similarity search (GPU-accelerated)
- Document: JSON storage with flexible schema
- Time-Series: Gorilla compression, continuous aggregates
- Full ACID guarantees with snapshot isolation
- Write-write conflict detection
- Atomic updates across all index types
- Session-based and direct API
- CEP Engine: Complex Event Processing with pattern matching
- OLAP: CUBE, ROLLUP, window functions
- Time-Series: Compression, retention policies, aggregates
- Hybrid Search: BM25 + vector for RAG workflows
- TLS 1.3 with mTLS support
- Role-Based Access Control (RBAC)
- Field-level encryption
- Audit logging with SIEM integration
- Certificate pinning for HSM/TSA
- Secrets management (HashiCorp Vault)
- Horizontal sharding with consistent hashing
- Leader-follower and multi-master replication
- RAID-like redundancy (MIRROR, STRIPE, PARITY)
- Kubernetes operator with CRDs
- Auto-rebalancing and cloud deployment
- 10 backend options, including CUDA, Vulkan, HIP, OpenCL, DirectX, OneAPI, and ZLUDA
- 10-50x speedup for vector search
- Automatic platform detection and fallback
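
The distribution bullets above mention horizontal sharding with consistent hashing. The sketch below shows the general technique with a hash ring and virtual nodes; it is not ThemisDB's sharding code. The class name, the SHA-256 based hash, and the 64 virtual nodes per shard are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring (SHA-256 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: a key belongs to the first shard point at or after its hash."""

    def __init__(self, shards: list[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._points: list[tuple[int, str]] = []  # sorted (ring position, shard name)
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        # Several virtual nodes per shard smooth out the key distribution.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no shards")
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap around the ring


if __name__ == "__main__":
    ring = HashRing(["shard-0", "shard-1", "shard-2"])
    keys = [f"users:{i}" for i in range(1000)]
    before = {k: ring.shard_for(k) for k in keys}
    ring.add_shard("shard-3")
    moved = [k for k in keys if ring.shard_for(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding a shard")
```

The property that matters for rebalancing is that adding a shard only moves keys onto the new shard; keys never shuffle between the existing ones.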
Full Documentation: https://makr-code.github.io/ThemisDB/
Getting Started
Core Concepts
Features
Operations
Development
- 🔨 Build Guide
- 🌿 Branching Strategy | EN
- 🤝 Contributing
- 📖 API Reference
- 📦 Client SDKs
Enterprise & Strategy
- 🏛️ CMS Strategy Paper (DE) - ThemisDB for content management in government and enterprise
- 💼 Enterprise Edition - Enterprise features and licensing
- 📊 Governance - Data governance and policies
[!NOTE] Full Documentation: https://makr-code.github.io/ThemisDB/
Production-Ready Features
- ✅ ACID transactions with MVCC
- ✅ Multi-model support (relational, graph, vector, document)
- ✅ Horizontal sharding and replication
- ✅ GPU acceleration (10 backends)
- ✅ Enterprise security features
- ✅ Client SDKs (7 languages)
- ✅ Kubernetes operator
- ✅ Native LLM integration (optional)
- ✅ Modern protocol support (HTTP/2, WebSocket, gRPC, MQTT, PostgreSQL Wire, MCP)
- 🚧 Query Optimizer - Advanced query optimization and execution plans
- 🚧 Multi-Datacenter - Cross-region deployment support
- 🚧 Advanced ML/GNN - Enhanced machine learning features
- 🚧 Production Hardening - Additional stability and performance improvements
- 📋 Modular Architecture - Split monolithic core into 11 focused libraries
- 📋 Real-Time Views - Materialized views with automatic updates
- 📋 Cross-Region Replication - Global data distribution
- 📋 Advanced Compliance - SOC 2, HIPAA certification
- 📋 Cloud-Native Optimizations - Enhanced cloud provider integrations
📚 Detailed Planning:
Test Environment: Release build, Windows x64, 20 cores @ 3696 MHz
| Operation | Throughput | Latency (avg) | Notes |
|---|---|---|---|
| 📝 Entity PUT | 45,000 ops/s | 0.02 ms | Write throughput |
| 📖 Entity GET | 120,000 ops/s | 0.008 ms | Read throughput |
| 🔍 Indexed Query | 3.4M queries/s | 0.29 μs | AQL WHERE clause |
| 🕸️ Graph Traverse | 9.56M ops/s | 0.105 μs | BFS (depth=3) |
| 🎯 Vector Search (RGB) | 59.7M queries/s | 0.017 μs | Simple 3D vectors |
| 📊 Vector Insert (384D) | 411k vectors/s | 2.44 μs | Typical embeddings |
| 🧠 RAG Search (Top-50) | 7.17M queries/s | 0.14 μs | LLM retrieval |
[!IMPORTANT] Performance Disclaimer: Benchmarks represent optimal conditions. Actual performance varies based on:
- Hardware configuration (CPU, RAM, storage)
- Data size and complexity
- Concurrent workload patterns
- Build configuration and optimizations
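
When reading the benchmark table, it helps to note that each "Latency (avg)" value is close to the reciprocal of the throughput in the same row. That suggests the column reports amortized time per operation rather than end-to-end request latency, although the source does not state how it was measured. The short check below uses only the figures from the table; the function name is illustrative.

```python
def per_op_time_us(ops_per_second: float) -> float:
    """Amortized time per operation in microseconds, i.e. 1 / throughput."""
    return 1_000_000.0 / ops_per_second


# operation: (throughput in ops/s, latency reported in the table, converted to µs)
REPORTED = {
    "Entity PUT": (45_000, 20.0),            # 0.02 ms
    "Entity GET": (120_000, 8.0),            # 0.008 ms
    "Indexed Query": (3_400_000, 0.29),
    "Graph Traverse": (9_560_000, 0.105),
    "Vector Search (RGB)": (59_700_000, 0.017),
    "Vector Insert (384D)": (411_000, 2.44),
    "RAG Search (Top-50)": (7_170_000, 0.14),
}

if __name__ == "__main__":
    for name, (throughput, reported_us) in REPORTED.items():
        derived = per_op_time_us(throughput)
        print(f"{name:22s} 1/throughput = {derived:8.3f} µs   reported = {reported_us:g} µs")
```

If the column is indeed derived this way, a single client request will see higher latency than the table shows once network, parsing, and queueing time are included.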
📊 Detailed Analysis:
- Complete Benchmark Results
🤝 Community & Support
| Resource | Description | Link |
|---|---|---|
| 📚 Documentation | Complete guides and API reference | Docs Site |
| 🐛 Issues | Report bugs or request features | GitHub Issues |
| 💬 Discussions | Community Q&A and discussions | GitHub Discussions |
| 🤝 Contributing | How to contribute to ThemisDB | Contributing Guide |
| 🌿 Branching Strategy | Git Flow workflow (main = release, develop = dev) | Branching Strategy |
| 🔒 Security | Responsible disclosure policy | Security Policy |
License Information
ThemisDB Community Edition is released under the MIT License.
- ✅ Free to use, modify, and distribute
- ✅ Commercial use allowed
- ✅ Full feature set for single-node deployments
ThemisDB Enterprise Edition features (horizontal sharding, advanced analytics, HA/replication, etc.) are available under a commercial license.
Enterprise Inquiries: [email protected]
ThemisDB builds upon and is inspired by these excellent projects:
Inspirations & Foundations
| Project | Influence | Area |
|---|---|---|
| ArangoDB | Multi-model architecture | Design Philosophy |
| CozoDB | Hybrid relational-graph-vector | Data Models |
| Azure Cosmos DB | Multi-model with unified API | API Design |
| RocksDB | High-performance LSM-Tree storage | Storage Engine |
| FAISS | Efficient similarity search | Vector Search |
[!NOTE] For a complete list of third-party libraries and detailed feature attributions, see ATTRIBUTIONS.md.
Built with ❤️ for the database community