SBDK: Local-First Data & AI Development Tools

Build and test complete data pipelines in 30 seconds. Zero cloud setup, zero Docker, zero cost.

Five production-ready reference implementations demonstrating how to build local-first data and AI tools—from pipeline sandboxes to ML-in-SQL to conversational analytics.


The Problem We Solve

Traditional data pipeline development is slow and expensive:

  • Setting up a dev environment takes days (Docker, cloud accounts, configuration)
  • Testing requires deploying to cloud infrastructure ($$$)
  • Iteration cycles are painfully slow (push → wait → test → repeat)
  • Breaking production is expensive and stressful

SBDK tools run everything locally:

  • Full dev environment in 30 seconds (1 command)
  • Test everything safely on your laptop (zero cost)
  • Fast iteration cycles (30-second feedback loops)
  • Production patterns validated before deployment

Who Should Use These?

🛠️ Data Engineers

Testing dbt models and data pipelines without cloud infrastructure

Use SBDK.dev to get an instant local DuckDB + dbt + DLT environment, test transformations, and iterate fast

🏗️ Platform Engineers

Building data tools and evaluating infrastructure patterns

Study the codebases to see professional CLI architecture, MCP server patterns, exception handling, and testing frameworks

📚 Data Engineering Students

Learning the modern data stack without deployment complexity

Run working examples of dbt transformations, DuckDB queries, Rust extensions, AI integrations—all on your laptop


The 5 Projects

Core Foundation

1. 🏗️ SBDK.dev - Local Pipeline Sandbox

Get a complete data pipeline running in 30 seconds | Python | Active

A local development sandbox giving you DuckDB + dbt + DLT in 1 command. No Docker, no cloud, no configuration.

pip install sbdk-dev
sbdk init my_project && cd my_project
sbdk run  # Data generation → ingestion → transformation
sbdk query "SELECT * FROM orders_daily LIMIT 10"

Solves: Days of environment setup → 30 seconds. Cloud testing costs → zero. Slow iteration → instant feedback.
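
To make the three stages in the `sbdk run` comment concrete, here is a minimal sketch of a generation → ingestion → transformation flow. It is not SBDK's implementation: it uses Python's built-in sqlite3 as a stand-in for DuckDB, plain functions in place of DLT and dbt, and a hypothetical `raw_orders` schema. Only the `orders_daily` table name comes from the example above.

```python
import random
import sqlite3


def generate_orders(n, seed=0):
    """Stage 1: generate synthetic order rows (hypothetical schema)."""
    rng = random.Random(seed)
    return [
        (i, f"2025-01-{rng.randint(1, 5):02d}", round(rng.uniform(5, 200), 2))
        for i in range(1, n + 1)
    ]


def ingest(conn, rows):
    """Stage 2: load the raw rows into a local table."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Stage 3: build a daily rollup, the kind of model dbt would manage."""
    conn.execute(
        """
        CREATE TABLE orders_daily AS
        SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date
        """
    )


def run_pipeline(n=100):
    """Run all three stages against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    ingest(conn, generate_orders(n))
    transform(conn)
    return conn


conn = run_pipeline()
for row in conn.execute("SELECT * FROM orders_daily ORDER BY order_date LIMIT 10"):
    print(row)
```

Because everything lives in one local process, rerunning the whole pipeline after a change is a single function call, which is the feedback loop the sandbox is built around.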

Try SBDK.dev

Extensions & Enhancements

2. 🧠 Mallard (local-inference) - ML in SQL

Run ML models directly in your database—no separate infrastructure | Rust | Archived

DuckDB extension for zero-shot predictions, embeddings, and feature importance. Write SQL, get ML.

-- Run zero-shot classification in SQL
SELECT predict_category(description) as category FROM products;

-- Generate embeddings
SELECT embed_text(content) as vector FROM documents;

Solves: Separate ML infrastructure → All in SQL. Model training complexity → Zero-shot inference. Python overhead → Rust performance.

Explore Mallard

3. 🔍 Semantic Tracer - Lineage Visualization

Understand complex dbt projects with interactive graphs | Rust + TypeScript | Archived

Desktop app visualizing dbt semantic layers. See how your metrics, dimensions, and entities connect.

  • Interactive lineage graphs (React Flow)
  • Direct semantic_models.yml integration
  • Tauri desktop app (fast Rust backend)

Solves: Complex dbt projects → Visual understanding. Scattered docs → Interactive exploration. Cloud tools → Local desktop app.

Explore Semantic Tracer

4. 💬 Local AI Analyst - Conversational Analytics

Ask data questions in natural language—with statistical rigor | Python | Archived

AI analyst that runs real queries first, then explains results. No hallucination—just actual data with confidence intervals.

  • Natural language → SQL → Results → Statistical analysis
  • Execution-first (prevents the AI from making up answers)
  • Automatic significance testing, confidence intervals

Solves: AI hallucination → Execution-first validation. Unreliable insights → Statistical rigor. SQL expertise needed → Natural language queries.
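
The execution-first idea can be sketched in a few lines: run the query, then build the explanation only from the rows that came back. This is an illustration under assumptions, not the Local AI Analyst code. The `orders` table, the `answer` and `mean_with_ci` helpers, and the normal-approximation interval are all hypothetical choices, and sqlite3 stands in for the real database.

```python
import sqlite3
import statistics
from math import sqrt


def mean_with_ci(values, z=1.96):
    """Return (mean, low, high) using a normal-approximation 95% interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations for an interval")
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / sqrt(n)
    return mean, mean - half_width, mean + half_width


def answer(conn, sql):
    """Execute the query first, then describe only what it returned."""
    values = [row[0] for row in conn.execute(sql)]
    mean, low, high = mean_with_ci(values)
    return f"n={len(values)}, mean={mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(v,) for v in (10, 12, 14, 16, 18)])
print(answer(conn, "SELECT amount FROM orders"))
```

A language model would sit in front of `answer` to turn a question into SQL and behind it to phrase the result, but the numbers themselves always come from the executed query. For small samples a t-based interval would be more appropriate than the fixed z value used here.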

Explore Local AI Analyst

5. 🔌 knowDB - AI Assistant Integration

Query your data through Claude Desktop or ChatGPT | Python | Archived

MCP server connecting local data to AI assistants. Ask questions in Claude Desktop, get real query results.

  • MCP (Model Context Protocol) server implementation
  • Works with Claude Desktop, ChatGPT Desktop, any MCP client
  • Auto-sync dbt semantic layer

Solves: Separate tools for data/AI → Unified interface. Complex queries → Natural language. Context switching → Query from chat.
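
As a rough sketch of what an MCP-style data server does, the handler below answers JSON-RPC 2.0 requests shaped like MCP's `tools/list` and `tools/call` methods. It is not the knowDB implementation: the `run_query` tool name, the `orders` table, and the SELECT-only guard are hypothetical, sqlite3 stands in for the real database, and a real server would use an MCP SDK and also handle initialization and transport.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 35.5)])

TOOLS = [{
    "name": "run_query",  # hypothetical tool name
    "description": "Run a read-only SQL query against the local database",
    "inputSchema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}]


def text_result(text, is_error=False):
    """Wrap a string in a tool result with a single text content item."""
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def handle(request):
    """Dispatch one JSON-RPC 2.0 request dict and return the response dict."""
    method = request.get("method")
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call" and params.get("name") == "run_query":
        sql = params.get("arguments", {}).get("sql", "")
        if not sql.lstrip().lower().startswith("select"):
            result = text_result("only SELECT is allowed", is_error=True)
        else:
            rows = conn.execute(sql).fetchall()
            result = text_result(json.dumps(rows))
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "run_query",
                      "arguments": {"sql": "SELECT COUNT(*) FROM orders"}}}
print(json.dumps(handle(request)))
```

The assistant never sees the database directly. It only sees the tool description and the text the tool returns, which is why real query results, rather than guesses, end up in the chat.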

Explore knowDB

Documentation Hub

6. 🌐 sbdk.dev - This Website

Central hub with architecture guides and getting started | Next.js | Active

Visit sbdk.dev | View Source


What You Get From These Projects

Complete working code (not tutorials):

  • ✅ Run everything locally—no Docker, no cloud accounts
  • ✅ See how DLT, dbt, DuckDB, Rust, and MCP actually fit together
  • ✅ Production patterns you can adapt (CLI architecture, exception handling, testing)
  • ✅ MIT licensed—fork and use however you want

Technologies & patterns demonstrated:

  • Local-first data pipelines: DuckDB + dbt + DLT running on your laptop
  • Professional CLI design: Typer + Rich + Pydantic with exception hierarchies
  • Rust database extensions: High-performance DuckDB extensions
  • MCP server patterns: Connect data tools to AI assistants
  • Desktop apps with Tauri: Rust backend + React frontend
  • Statistical rigor: Execution-first AI to prevent hallucination
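
One way to read "exception hierarchies" in a CLI is a single base error with subclasses that each map to an exit code. The sketch below uses only the standard library, and every class name and exit code in it is hypothetical rather than taken from the SBDK codebase. In a Typer application the same pattern would wrap the command callbacks.

```python
import sys


class SBDKError(Exception):
    """Base class for expected, user-facing failures (hypothetical name)."""
    exit_code = 1


class ConfigError(SBDKError):
    """The project configuration is missing or invalid."""
    exit_code = 2


class PipelineError(SBDKError):
    """A pipeline stage failed while running."""
    exit_code = 3


def main(command):
    """Run a command callable and translate known errors into exit codes."""
    try:
        command()
    except SBDKError as exc:
        # Expected failures get a short message and a specific exit code;
        # unexpected exceptions still surface with a full traceback.
        print(f"error: {exc}", file=sys.stderr)
        return exc.exit_code
    return 0


def broken_command():
    raise ConfigError("sbdk config file not found")


print(main(broken_command))
```

Catching only the base class keeps the top-level handler short while letting each subclass decide how the process should exit.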

🚀 Getting Started

Quick Start with SBDK.dev

git clone https://github.com/sbdk-dev/sbdk-dev
cd sbdk-dev
pip install -e .
sbdk init my-project

Pick Your Project

Learn the Patterns

All projects include complete documentation, real-world examples, and comprehensive test coverage—perfect for learning modern data engineering and local-first development.


Why Archived?

The archived projects are complete, stable reference implementations, not active products. They're archived because they're done: production-quality code demonstrating proven patterns.

Perfect for:

  • Forking and adapting for your own projects
  • Learning from real, working code (not tutorials)
  • Understanding how modern data tools fit together


📚 Learn More

→ Visit sbdk.dev for architecture diagrams, use cases, and getting started guides

→ Browse all repositories to explore individual projects


MIT Licensed • Open Source • Archived Nov 2025 as reference implementations