Commit d1ee8d2

overhaul, mvp
Signed-off-by: allen-munsch <james.a.munsch@gmail.com>
1 parent 56e8716 commit d1ee8d2

53 files changed: +3717 / -1180 lines changed

Makefile

Lines changed: 26 additions & 0 deletions

```diff
@@ -130,6 +130,32 @@ shell:
 iex:
 	@docker compose exec coordinator bin/semantic_fabric remote
 
+# Local Development Commands (using asdf)
+.PHONY: local-deps local-compile local-test local-run local-format local-iex
+local-deps:
+	@echo "Fetching local dependencies..."
+	@mix deps.get
+
+local-compile:
+	@echo "Compiling local project..."
+	@mix compile
+
+local-test:
+	@echo "Running local test suite..."
+	@mix test
+
+local-run:
+	@echo "Running local application..."
+	@mix run --no-halt
+
+local-format:
+	@echo "Formatting local code..."
+	@mix format
+
+local-iex:
+	@echo "Starting local IEx console..."
+	@iex -S mix
+
 # Utility commands
 .PHONY: update-deps
 update-deps:
```

README.md

Lines changed: 152 additions & 40 deletions

````diff
@@ -1,79 +1,191 @@
 # MosaicDB
 
-Distributed semantic search built on SQLite shards ( a sketch )
+### A Distributed, Federated Semantic Search Engine Built on SQLite Shards
 
-## Setup
+MosaicDB is an experimental distributed query engine performing **hybrid vector + metadata search** across many **immutable SQLite shard files**. Each shard contains:
 
-```bash
-make build
-make up
-```
+* Document text or metadata
+* Vector embeddings (`sqlite-vss`)
+* PageRank or other ranking signals
+
+Elixir acts as the **coordinator and control plane**, orchestrating fan-out queries, retries, merges, caching, and ranking.
+
+---
+
+# Features
+
+* Federated search across multiple SQLite shards
+* Vector similarity search using `sqlite-vss`
+* Metadata-aware filtering
+* PageRank-based reranking
+* LRU embedding cache
+* Distributed coordinator architecture
+* HTTP API for search
+* Metrics via Prometheus/Grafana
+
+MosaicDB combines **SQLite simplicity with Erlang/Elixir scale**. Each node is a lightweight SQLite database capable of storing both vector embeddings and structured metadata. Distributed across multiple nodes, MosaicDB provides fault-tolerant, scalable storage **without the overhead of managed clusters**.
+
+---
+
+# Feature Comparison
+
+| Feature | PostgreSQL | Pinecone | Weaviate | MosaicDB (SQLite nodes) |
+| --------------- | ----------------------------- | ------------- | ------------- | --------------------------------- |
+| SQL support | Yes | No | No | Yes, native SQLite queries |
+| Vector search | Extensions needed (pgvector) | Yes | Yes | Yes, exact or approximate |
+| Distribution | Manual (sharding/replication) | Managed | Managed | Built-in via Elixir/Erlang |
+| Fault tolerance | Manual / HA setups | Cloud-managed | Cloud-managed | Erlang/Elixir supervision trees |
+| Lightweight | Moderate | No | No | Each node is a single SQLite file |
+| Edge-ready | No | No | No | Yes, nodes are self-contained |
+
+**Developer Pitch:**
+MosaicDB gives developers a **lightweight, distributed vector + relational database** where each node is just a SQLite file. Fully SQL-capable, fault-tolerant via Erlang/Elixir, and easy to deploy at the edge — you get vector search + relational queries in one place, without complex cluster management or cloud lock-in. It’s **SQLite simplicity with Erlang reliability**.
+
+---
+
+# Why Elixir?
 
-Done. Go to http://localhost/health
+MosaicDB uses Elixir for its coordination layer because it naturally fits **federated query execution**:
 
-## What It Does
+### Concurrency for fan-out search
 
-Takes queries, searches across SQLite database shards, returns ranked results.
+Each shard query runs as an isolated BEAM process—no thread pools, no shared state, no locks.
 
-Each shard = immutable SQLite file with documents, vectors, and PageRank scores.
+### Supervisor-based fault tolerance
 
-This is a system for performing semantic search across many distributed chunks of data (shards) using embedding-based vector search (via sqlite-vss) and then ranking the results.
+Shard errors, timeouts, or node failures are isolated and automatically recovered.
 
-All in a federated/distributed way, inspired by scalable distributed systems like Riak.
+### Predictable under load
 
-## Services
+The BEAM scheduler ensures slow shards do not block others.
 
-- **Coordinator** (4040): Routes queries
-- **Nginx** (80): Load balancer
-- **Redis** (6379): Cache
-- **Prometheus** (9090): Metrics
-- **Grafana** (3000): Dashboards
+### Built-in distribution
+
+Elixir nodes auto-discover and form a cluster, enabling multi-node coordination without external registries.
+
+### Clean pipeline composition
+
+Query planning, merging, and reranking are expressed using functional pipelines and pattern matching.
+
+### Observability
+
+LiveDashboard, telemetry, and introspection tools simplify distributed debugging.
+
+**In short:** Elixir is the resilient, concurrent **control plane** around fast SQLite shards.
+
+---
+
+# Quick Start
+
+## Build and run
+
+```bash
+make build
+make up
+```
 
-## API
+Check health:
 
 ```bash
-# Health
 curl http://localhost/health
+```
 
-# Search (placeholder)
-curl -X POST http://localhost/api/search \
--d '{"query":"test"}' \
--H "Content-Type: application/json"
+---
+
+# API
+
+### Health
+
+```bash
+curl http://localhost/health
 ```
 
-## Commands
+### Search (placeholder API)
 
 ```bash
-make up # Start
-make down # Stop
-make logs # Logs
-make restart # Restart
+curl -X POST http://localhost/api/search \
+-H "Content-Type: application/json" \
+-d '{"query": "test"}'
 ```
 
-## Local Dev
+---
+
+# Components
+
+| Service | Port | Description |
+| ----------- | ---- | -------------------------- |
+| Coordinator | 4040 | Elixir-based query router |
+| Nginx | 80 | Load balancer / entrypoint |
+| Redis | 6379 | Metadata + embedding cache |
+| Prometheus | 9090 | Metrics |
+| Grafana | 3000 | Dashboards |
+
+---
+
+# Development
+
+Install dependencies:
 
 ```bash
 mix deps.get
+```
+
+Run the system:
+
+```bash
 mix run --no-halt
 ```
 
-## Architecture
+---
+
+# Basic Architecture
+
+```
+Client Query
+
+Nginx
+
+Coordinator (Elixir)
+┌────┴─────────────┐
+│ fan-out async RPC│
+└────┬─────────────┘
+Many SQLite Shards
+
+Vector + metadata search
+
+Coordinator merges + ranks
+
+Response
+```
+
+---
+
+# Scaling
 
+To scale horizontally, edit `docker-compose.yml` and increase coordinator workers:
+
+```yaml
+scale: 4
 ```
-Query → Nginx → Coordinator → Shards (SQLite files)
-
-Cache (Redis)
+
+Then:
+
+```bash
+make restart
 ```
 
-## Scaling
+Elixir nodes will auto-discover each other (via libcluster) and share load.
+
+---
 
-Edit `docker-compose.yml`, add more workers, restart.
+# Documentation
 
-## Docs
+* `docs/ARCHITECTURE.md` — data flow, shard layout, search pipeline
+* `docs/DEPLOYMENT_GUIDE.md` — running MosaicDB in production
+* `docs/SHARD_FORMAT.md` — SQLite schema, embeddings, PageRank structure
 
-- `docs/ARCHITECTURE.md` - How it works
-- `docs/DEPLOYMENT_GUIDE.md` - Production setup
+---
 
-## License
+# License
 
 MIT
````
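The README's "Why Elixir?" section says each shard query runs in its own BEAM process and that slow or failing shards do not block the others. A minimal sketch of that fan-out, assuming shards can be queried through a caller-supplied function, could use `Task.Supervisor.async_stream_nolink/4` so that a timed-out or crashed shard comes back as an `{:exit, reason}` tuple instead of taking down the caller. `FanOutSketch`, `fake_query`, and the shard names are hypothetical and are not part of this commit.

```elixir
defmodule FanOutSketch do
  # Hypothetical sketch: query every shard concurrently (one supervised task
  # per shard) and skip shards that crash or exceed the timeout.
  def search(task_sup, shards, query, query_fun, timeout_ms \\ 200) do
    task_sup
    |> Task.Supervisor.async_stream_nolink(
      shards,
      fn shard -> query_fun.(shard, query) end,
      max_concurrency: max(length(shards), 1),
      timeout: timeout_ms,
      # Kill only the slow task; the stream keeps going.
      on_timeout: :kill_task,
      ordered: false
    )
    |> Enum.flat_map(fn
      {:ok, hits} -> hits
      # {:exit, :timeout} for slow shards, {:exit, reason} for crashed ones.
      {:exit, _reason} -> []
    end)
  end
end

{:ok, sup} = Task.Supervisor.start_link()

# Hypothetical stand-in for a per-shard sqlite-vss lookup.
fake_query = fn
  "shard_a", _q ->
    [%{id: "a1", score: 0.9}]

  "shard_b", _q ->
    # Simulates a slow shard; it is killed after the 200 ms timeout.
    Process.sleep(1_000)
    [%{id: "b1", score: 0.8}]

  "shard_c", _q ->
    # Simulates a corrupt shard; the crash is logged and skipped.
    raise "corrupt shard file"
end

hits = FanOutSketch.search(sup, ["shard_a", "shard_b", "shard_c"], "test", fake_query)
```

Only `shard_a`'s hit survives here. The `nolink` variant is the important design choice: with plain `Task.async_stream/3` the tasks are linked to the caller, so a raise inside one shard query would also crash the coordinating process.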

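The README lists PageRank-based reranking and says the coordinator merges and ranks shard results with functional pipelines. One way to express that, assuming each hit carries an `id`, a vector `similarity`, and a `pagerank` value already normalized to [0, 1], is a linear blend `alpha * similarity + (1 - alpha) * pagerank`. The weighting, field names, deduplication step, and the `RerankSketch` module are illustrative assumptions, not the project's actual scoring.

```elixir
defmodule RerankSketch do
  # Hypothetical scoring: blend vector similarity with PageRank.
  # Both signals are assumed to be pre-normalized to [0, 1].
  def merge_and_rank(shard_hits, alpha \\ 0.8, k \\ 10) do
    shard_hits
    # One list of hits per shard -> one flat list.
    |> List.flatten()
    # If shards overlap (e.g. replicas), keep each document's best match.
    |> Enum.group_by(& &1.id)
    |> Enum.map(fn {_id, hits} -> Enum.max_by(hits, & &1.similarity) end)
    |> Enum.map(fn hit ->
      Map.put(hit, :score, alpha * hit.similarity + (1 - alpha) * hit.pagerank)
    end)
    |> Enum.sort_by(& &1.score, :desc)
    |> Enum.take(k)
  end
end

shard_hits = [
  # shard 1
  [
    %{id: "doc1", similarity: 0.92, pagerank: 0.10},
    %{id: "doc2", similarity: 0.85, pagerank: 0.90}
  ],
  # shard 2 also holds a weaker match for doc1
  [
    %{id: "doc3", similarity: 0.70, pagerank: 0.50},
    %{id: "doc1", similarity: 0.60, pagerank: 0.10}
  ]
]

ranked_ids = shard_hits |> RerankSketch.merge_and_rank(0.5) |> Enum.map(& &1.id)
```

With `alpha = 0.5`, `doc1` has the highest raw similarity but a low PageRank, so it falls to last place: the blended scores are 0.875 for `doc2`, 0.60 for `doc3`, and 0.51 for `doc1`. Setting `alpha` to `1.0` reduces the ranking to pure vector similarity.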
config/config.exs

Lines changed: 5 additions & 1 deletion

```diff
@@ -1,3 +1,7 @@
 import Config
+
+config :nx, :default_backend, EXLA.Backend
+config :bumblebee, :default_backend, EXLA.Backend
+
 config :logger, level: :info
-import_config "#{config_env()}.exs"
+import_config "#{config_env()}.exs"
```
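The README's Scaling section says coordinator replicas discover each other through libcluster, but no topology appears in this diff. The following is a hedged sketch of one for `config/config.exs`. It assumes libcluster is a dependency, that every replica starts as a long-named node `semantic_fabric@<container-ip>` with a shared cookie, and that the Compose service is called `coordinator`, which would make Docker's embedded DNS return one address per replica. `Cluster.Strategy.DNSPoll` and `Cluster.Supervisor` are libcluster's own modules. The topology name, basename, polling interval, and `SemanticFabric.ClusterSupervisor` are assumptions.

```elixir
# config/config.exs (hypothetical addition; not part of this commit)
import Config

config :libcluster,
  topologies: [
    coordinators: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        # Re-resolve the service name every 5 s to pick up added/removed replicas.
        polling_interval: 5_000,
        # Compose's embedded DNS answers with one A record per `coordinator` replica.
        query: "coordinator",
        # Peers are contacted as :"semantic_fabric@<ip>".
        node_basename: "semantic_fabric"
      ]
    ]
  ]

# In the application's supervision tree (module name is hypothetical):
#
#   topologies = Application.get_env(:libcluster, :topologies, [])
#   children = [
#     {Cluster.Supervisor, [topologies, [name: SemanticFabric.ClusterSupervisor]]}
#   ]
```

DNS polling is a natural fit for `scale: N` because no node list has to be maintained by hand. Replicas that are added or removed show up on the next poll.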

0 commit comments

Comments
 (0)
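The README advertises an LRU embedding cache, and this commit configures Nx and Bumblebee to use the EXLA backend. That suggests query embeddings may be computed in-process and are worth memoizing. Below is a minimal pure-functional LRU sketch built only from the standard library. A map holds `key => {value, tick}`, and a `:gb_trees` keyed by tick locates the least recently used entry. `EmbeddingLRU` and `fake_embed` are hypothetical. The project's real cache may instead live in Redis, which the README lists as the embedding cache service.

```elixir
defmodule EmbeddingLRU do
  # Hypothetical in-process LRU cache for query embeddings.
  # `entries` maps key => {value, tick}; `order` is a :gb_trees keyed by tick,
  # so its smallest key always points at the least recently used entry.
  defstruct capacity: 1, tick: 0, entries: %{}, order: :gb_trees.empty()

  def new(capacity) when is_integer(capacity) and capacity > 0,
    do: %__MODULE__{capacity: capacity}

  # Returns {value, updated_cache}; `compute` runs only on a miss.
  def fetch(%__MODULE__{} = cache, key, compute) do
    case Map.fetch(cache.entries, key) do
      {:ok, {value, old_tick}} ->
        # Hit: drop the old recency slot and re-insert as most recent.
        cache = %{cache | order: :gb_trees.delete(old_tick, cache.order)}
        {value, put(cache, key, value)}

      :error ->
        value = compute.(key)
        {value, cache |> put(key, value) |> evict()}
    end
  end

  defp put(cache, key, value) do
    tick = cache.tick + 1

    %{
      cache
      | tick: tick,
        entries: Map.put(cache.entries, key, {value, tick}),
        order: :gb_trees.insert(tick, key, cache.order)
    }
  end

  # Evict the entry with the smallest tick once capacity is exceeded.
  defp evict(cache) do
    if map_size(cache.entries) > cache.capacity do
      {_oldest_tick, key, order} = :gb_trees.take_smallest(cache.order)
      %{cache | entries: Map.delete(cache.entries, key), order: order}
    else
      cache
    end
  end
end

# Hypothetical stand-in for a real embedding model call.
fake_embed = fn text -> [String.length(text) * 1.0] end

cache = EmbeddingLRU.new(2)
{_, cache} = EmbeddingLRU.fetch(cache, "alpha", fake_embed)
{_, cache} = EmbeddingLRU.fetch(cache, "beta", fake_embed)
# Hit: "alpha" becomes the most recently used entry.
{_, cache} = EmbeddingLRU.fetch(cache, "alpha", fake_embed)
# Miss that exceeds capacity: "beta" is now the LRU entry and is evicted.
{_, cache} = EmbeddingLRU.fetch(cache, "gamma", fake_embed)

kept = cache.entries |> Map.keys() |> Enum.sort()
```

After the fourth call, only `alpha` and `gamma` remain: the hit on `alpha` refreshed its tick, which left `beta` as the oldest entry. Each operation costs O(log n) in the tree plus a map update. A cache shared across many coordinator processes would more likely sit in an ETS table or behind a GenServer.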