diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 9 additions & 9 deletions
diff --git a/NOTEBOOKS.md b/NOTEBOOKS.md
Lines changed: 9 additions & 9 deletions
diff --git a/README.md b/README.md
Lines changed: 19 additions & 19 deletions
diff --git a/doc/anomaly-detection.md b/doc/anomaly-detection.md
Lines changed: 10 additions & 10 deletions
diff --git a/doc/architecture.md b/doc/architecture.md
Lines changed: 7 additions & 7 deletions
diff --git a/doc/benchmarks.md b/doc/benchmarks.md
Lines changed: 1 addition & 1 deletion
diff --git a/doc/dataset.md b/doc/dataset.md
Lines changed: 2 additions & 2 deletions
@@ -149,8 +149,8 @@ User → stratum.api/q
 ```
 
 **Data representations:**
-- `long[]` / `double[]` — heap arrays (JVM GC managed)
-- `PersistentColumnIndex` — chunked B-tree with per-chunk statistics and zone maps
+- `long[]` / `double[]` - heap arrays (JVM GC managed)
+- `PersistentColumnIndex` - chunked B-tree with per-chunk statistics and zone maps
 - `String[]` → dictionary-encoded `long[]` for group-by and LIKE
 
 ## Important Constraints
@@ -203,10 +203,10 @@ clj -M:release:test
 ## Technical Documentation
 
 See `doc/` for in-depth documentation:
-- [Architecture](doc/architecture.md) — System overview, module map, walkthrough
-- [SIMD Internals](doc/simd-internals.md) — Java Vector API patterns, JIT lessons
-- [Query Engine](doc/query-engine.md) — Dispatch logic, expressions, optimization
-- [Storage and Indices](doc/storage-and-indices.md) — Chunks, CoW, zone maps
-- [Benchmarks](doc/benchmarks.md) — Methodology, results, reproducing
-- [SQL Interface](doc/sql-interface.md) — PgWire server, SQL translation
-- [Anomaly Detection](doc/anomaly-detection.md) — Isolation forest training, scoring, online rotation
+- [Architecture](doc/architecture.md) - System overview, module map, walkthrough
+- [SIMD Internals](doc/simd-internals.md) - Java Vector API patterns, JIT lessons
+- [Query Engine](doc/query-engine.md) - Dispatch logic, expressions, optimization
+- [Storage and Indices](doc/storage-and-indices.md) - Chunks, CoW, zone maps
+- [Benchmarks](doc/benchmarks.md) - Methodology, results, reproducing
+- [SQL Interface](doc/sql-interface.md) - PgWire server, SQL translation
+- [Anomaly Detection](doc/anomaly-detection.md) - Isolation forest training, scoring, online rotation
@@ -10,7 +10,7 @@ Interactive notebooks for exploring Stratum's features.
 ;; In a running REPL (clj -M:repl):
 (require '[scicloj.clay.v2.api :as clay])
 
-;; Render to HTML — opens in browser at http://localhost:1971/
+;; Render to HTML - opens in browser at http://localhost:1971/
 (clay/make! {:source-path "notebooks/stratum_intro.clj" :show true})
 
 ;; Generate Quarto document
@@ -38,17 +38,17 @@ clj -M:dev -i notebooks/test_persistence.clj
 
 Introduction for Clojure data science practitioners:
 
-- **Column maps & SQL** — DSL query maps and SQL strings, same engine
-- **Tablecloth interop** — Pass `tc/dataset` directly, zero copy
-- **Fused SIMD execution** — Why it's fast, live timing on 1M rows
-- **Zone map pruning** — Range queries skip irrelevant chunks automatically
-- **Persistence** — `st/sync!`, `st/fork`, `st/load`, time-travel by commit UUID
-- **Statistics** — STDDEV, VARIANCE, CORR natively in a single pass
-- **Hash joins** — INNER, LEFT, RIGHT, FULL
+- **Column maps & SQL** - DSL query maps and SQL strings, same engine
+- **Tablecloth interop** - Pass `tc/dataset` directly, zero copy
+- **Fused SIMD execution** - Why it's fast, live timing on 1M rows
+- **Zone map pruning** - Range queries skip irrelevant chunks automatically
+- **Persistence** - `st/sync!`, `st/fork`, `st/load`, time-travel by commit UUID
+- **Statistics** - STDDEV, VARIANCE, CORR natively in a single pass
+- **Hash joins** - INNER, LEFT, RIGHT, FULL
 
 ### `datahike_integration.clj`
 
-Datahike + Stratum — entity queries alongside OLAP analytics, auto-sync
+Datahike + Stratum - entity queries alongside OLAP analytics, auto-sync
 via `d/listen!`, Yggdrasil composite for atomic snapshots, SQL via PgWire.
 
 ## Writing Notebooks
 
@@ -9,14 +9,14 @@
 
 SIMD-accelerated SQL engine for the JVM. Every table is a branchable, copy-on-write value.
 
-Stratum is a columnar analytics engine that combines the performance of fused SIMD execution with the semantics of immutable data. Tables are persistent values — fork one in O(1), modify it independently, persist snapshots to named branches, and time-travel to any previous commit. It's the same model as Clojure's persistent collections and git's object store, applied to analytical data.
+Stratum is a columnar analytics engine that combines the performance of fused SIMD execution with the semantics of immutable data. Tables are persistent values - fork one in O(1), modify it independently, persist snapshots to named branches, and time-travel to any previous commit. It's the same model as Clojure's persistent collections and git's object store, applied to analytical data.
 
 ## 30-Second Demo
 
 Start a PostgreSQL-compatible server and query CSV/Parquet files directly:
 
 ```bash
-# Standalone JAR — no Clojure needed, just Java 21+
+# Standalone JAR - no Clojure needed, just Java 21+
 java --add-modules jdk.incubator.vector -jar stratum-standalone.jar --demo
 
 # Or with your own data
@@ -50,7 +50,7 @@ clj -M:server --demo
 
 ## Performance
 
-Stratum's architecture — fused SIMD execution over copy-on-write columnar data — delivers strong analytical performance.
+Stratum's architecture - fused SIMD execution over copy-on-write columnar data - delivers strong analytical performance.
 
 Single-threaded comparison vs DuckDB v1.4.4 (JDBC in-process) on 10M rows, 8-core Intel Lunar Lake. Full results in [doc/benchmarks.md](doc/benchmarks.md).
 
@@ -134,7 +134,7 @@ clj -M:olap cb            # ClickBench tier only
 
 ## Snapshots and Branching
 
-Every Stratum dataset is a copy-on-write value. Fork one in O(1) to create an isolated branch — modifications only touch the changed chunks, everything else is structurally shared. Persist snapshots to named branches, load them back, or time-travel to any previous commit.
+Every Stratum dataset is a copy-on-write value. Fork one in O(1) to create an isolated branch - modifications only touch the changed chunks, everything else is structurally shared. Persist snapshots to named branches, load them back, or time-travel to any previous commit.
 
 ```clojure
 (require '[stratum.api :as st])
@@ -144,7 +144,7 @@ Every Stratum dataset is a copy-on-write value. Fork one in O(1) to create an is
                           :qty   (long-array [1 2 3])}
                          {:name "orders"}))
 
-;; O(1) fork — structural sharing, independent mutations
+;; O(1) fork - structural sharing, independent mutations
 (def experiment (st/fork ds))
 
 ;; Persist to storage
@@ -161,9 +161,9 @@ Every Stratum dataset is a copy-on-write value. Fork one in O(1) to create an is
 
 **DML**: SELECT, INSERT, UPDATE, DELETE, UPSERT (INSERT ON CONFLICT), UPDATE FROM (joined updates), CREATE TABLE, DROP TABLE
 
-**Joins**: INNER, LEFT, RIGHT, FULL — single and multi-column keys
+**Joins**: INNER, LEFT, RIGHT, FULL - single and multi-column keys
 
-**Window functions**: ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, SUM/AVG/COUNT/MIN/MAX OVER — with PARTITION BY, ORDER BY, and frame clauses
+**Window functions**: ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, SUM/AVG/COUNT/MIN/MAX OVER - with PARTITION BY, ORDER BY, and frame clauses
 
 **Subqueries and composition**: CTEs (WITH), correlated and uncorrelated subqueries, IN/NOT IN/EXISTS, set operations (UNION, INTERSECT, EXCEPT)
 
@@ -201,15 +201,15 @@ Bidirectional support: query `tech.ml.dataset` datasets directly with the Stratu
 
 ## Query DSL Reference
 
-> **Note:** The DSL is still a work in progress. SQL strings are the more complete interface — use the DSL when you want to compose queries programmatically or pass in Clojure data directly without a SQL layer.
+> **Note:** The DSL is still a work in progress. SQL strings are the more complete interface - use the DSL when you want to compose queries programmatically or pass in Clojure data directly without a SQL layer.
 
-The DSL is intentionally flat. Every clause resolves column names by keyword lookup against a single merged map: `:from` establishes the base columns, `:join` merges in the dimension table's columns, and all subsequent clauses (`:where`, `:agg`, `:group`, `:select`, `:having`, `:order`) reference any column by its keyword. This makes it straightforward to build queries from Clojure data — no quoting, no SQL string interpolation, just maps and vectors. Composition (the DSL equivalent of SQL CTEs/subqueries) is done with Clojure `let`/`def` — see [Column Scoping and Composition](doc/query-engine.md#column-scoping-and-composition) for details.
+The DSL is intentionally flat. Every clause resolves column names by keyword lookup against a single merged map: `:from` establishes the base columns, `:join` merges in the dimension table's columns, and all subsequent clauses (`:where`, `:agg`, `:group`, `:select`, `:having`, `:order`) reference any column by its keyword. This makes it straightforward to build queries from Clojure data - no quoting, no SQL string interpolation, just maps and vectors. Composition (the DSL equivalent of SQL CTEs/subqueries) is done with Clojure `let`/`def` - see [Column Scoping and Composition](doc/query-engine.md#column-scoping-and-composition) for details.
 
 ```clojure
 ;; Full query map
 {:from    {:col1 data1 :col2 data2}     ;; Column data (arrays, indices, or encoded)
  :join    [{:with {:k data}             ;; Dimension table columns
-            :on   [:= :col1 :k]        ;; :col1 from :from, :k from :with — both visible after join
+            :on   [:= :col1 :k]        ;; :col1 from :from, :k from :with - both visible after join
             :type :inner}]
  :where   [[:< :col1 100] [:like :name "%foo%"]]             ;; Predicates
  :select  [:col1 [:as [:* :col2 100] :pct]]                  ;; Projection
@@ -234,12 +234,12 @@ The DSL is intentionally flat. Every clause resolves column names by keyword loo
 
 ## Ecosystem
 
-Stratum is part of the [Replikativ](https://github.com/replikativ) ecosystem — a set of composable, immutable data systems:
+Stratum is part of the [Replikativ](https://github.com/replikativ) ecosystem - a set of composable, immutable data systems:
 
-- **[Datahike](https://github.com/replikativ/datahike)** — immutable graph database with Datalog queries
-- **[Yggdrasil](https://github.com/replikativ/yggdrasil)** — branching protocol for multi-system snapshots
-- **[Scriptum](https://github.com/replikativ/scriptum)** — full-text search
-- **[Proximum](https://github.com/replikativ/proximum)** — vector search
+- **[Datahike](https://github.com/replikativ/datahike)** - immutable graph database with Datalog queries
+- **[Yggdrasil](https://github.com/replikativ/yggdrasil)** - branching protocol for multi-system snapshots
+- **[Scriptum](https://github.com/replikativ/scriptum)** - full-text search
+- **[Proximum](https://github.com/replikativ/proximum)** - vector search
 
 All share copy-on-write semantics and can be branched together via Yggdrasil.
 
@@ -268,8 +268,8 @@ User → stratum.api/q
 ```
 
 **Data representations:**
-- `long[]` / `double[]` — heap arrays for raw columnar data
-- `PersistentColumnIndex` — chunked B-tree with per-chunk statistics and zone maps
+- `long[]` / `double[]` - heap arrays for raw columnar data
+- `PersistentColumnIndex` - chunked B-tree with per-chunk statistics and zone maps
 - `String[]` → dictionary-encoded `long[]` for group-by and LIKE
 
 ## Installation
@@ -326,9 +326,9 @@ javac --add-modules jdk.incubator.vector -d target/classes \
 # Restart REPL (JVM can't reload classes)
 ```
 
-## Commercial Support
+## Work with us
 
-Need SIMD-accelerated analytics in your JVM stack? We offer integration support, custom development, and commercial licensing. Contact [contact@datahike.io](mailto:contact@datahike.io) or visit [datahike.io](https://datahike.io/about).
+If you are taking Stratum into production, we can help with integration, custom development, and support contracts. Contact [contact@datahike.io](mailto:contact@datahike.io) or visit [datahike.io](https://datahike.io/about).
 
 ## License
 
 
@@ -47,11 +47,11 @@ Train an isolation forest on columnar data.
 ```
 
 **Parameters:**
-- `:from` — map of keyword to `double[]` or `long[]` columns (required)
-- `:n-trees` — number of isolation trees (default 100)
-- `:sample-size` — rows subsampled per tree (default 256). Controls tree depth: `ceil(log2(sample-size))`
-- `:seed` — random seed for reproducibility
-- `:contamination` — expected fraction of anomalies in training data. When set, computes a score threshold automatically from the training score distribution (percentile at `1 - contamination`)
+- `:from` - map of keyword to `double[]` or `long[]` columns (required)
+- `:n-trees` - number of isolation trees (default 100)
+- `:sample-size` - rows subsampled per tree (default 256). Controls tree depth: `ceil(log2(sample-size))`
+- `:seed` - random seed for reproducibility
+- `:contamination` - expected fraction of anomalies in training data. When set, computes a score threshold automatically from the training score distribution (percentile at `1 - contamination`)
 
 **Returns** a model map containing the flat forest array, metadata, and (if contamination was set) the threshold, training score min/max.
 
@@ -96,8 +96,8 @@ Prediction confidence based on tree agreement.
 
 Returns `double[]` in `[0, 1]` where `1.0` means all trees fully agree on the point's isolation depth. Uses the coefficient of variation (CV) of per-tree path lengths: `confidence = 1 / (1 + CV)`.
 
-- **High confidence** (>0.8): Trees agree — the prediction is reliable
-- **Low confidence** (<0.5): Trees disagree — the point is in an ambiguous region
+- **High confidence** (>0.8): Trees agree - the prediction is reliable
+- **Low confidence** (<0.5): Trees disagree - the point is in an ambiguous region
 
 ### `iforest-rotate`
 
@@ -222,6 +222,6 @@ All inputs are validated against malli schemas (`stratum.specification`):
 
 ## Related Documentation
 
-- [Query Engine](query-engine.md) — Using anomaly scores in queries
-- [SQL Interface](sql-interface.md) — SQL anomaly functions
-- [Architecture](architecture.md) — System overview
+- [Query Engine](query-engine.md) - Using anomaly scores in queries
+- [SQL Interface](sql-interface.md) - SQL anomaly functions
+- [Architecture](architecture.md) - System overview
@@ -65,7 +65,7 @@ A persistent sorted set (PSS) tree of `ChunkEntry` records, each containing:
 - **PersistentColChunk**: CoW wrapper around a `long[]` or `double[]` (8192 elements default)
 - **ChunkStats**: per-chunk count, sum, sum-of-squares, min, max
 
-Indices support O(1) fork via structural sharing and copy-on-write on mutation. The query engine can stream over chunks without materializing the full array (64KB per chunk fits L2 cache). When persisted, the PSS tree is stored in konserve and lazy-loaded on demand — opening a billion-row index costs nothing until chunks are actually accessed.
+Indices support O(1) fork via structural sharing and copy-on-write on mutation. The query engine can stream over chunks without materializing the full array (64KB per chunk fits L2 cache). When persisted, the PSS tree is stored in konserve and lazy-loaded on demand - opening a billion-row index costs nothing until chunks are actually accessed.
 
 ### Dictionary-Encoded Strings
 
@@ -127,9 +127,9 @@ Total time: ~4ms single-threaded, ~1ms multi-threaded (6M rows).
 
 ## Related Documentation
 
-- [SIMD Internals](simd-internals.md) — Java Vector API patterns, fused filter+aggregate, morsel-driven parallelism
-- [Query Engine](query-engine.md) — Dispatch logic, expression evaluation, optimization
-- [Storage and Indices](storage-and-indices.md) — Chunks, CoW semantics, zone maps, Konserve
-- [Benchmarks](benchmarks.md) — Methodology, results, reproducing
-- [SQL Interface](sql-interface.md) — PgWire server, SQL translation, supported subset
-- [Anomaly Detection](anomaly-detection.md) — Isolation forest training, scoring, online rotation
+- [SIMD Internals](simd-internals.md) - Java Vector API patterns, fused filter+aggregate, morsel-driven parallelism
+- [Query Engine](query-engine.md) - Dispatch logic, expression evaluation, optimization
+- [Storage and Indices](storage-and-indices.md) - Chunks, CoW semantics, zone maps, Konserve
+- [Benchmarks](benchmarks.md) - Methodology, results, reproducing
+- [SQL Interface](sql-interface.md) - PgWire server, SQL translation, supported subset
+- [Anomaly Detection](anomaly-detection.md) - Isolation forest training, scoring, online rotation
@@ -29,7 +29,7 @@ Standard decision-support queries on TPC-H lineitem data (6M rows from CSV).
 | B1 | TPC-H Q6: filter + SUM(price*discount) | **12.9ms** | 7.3ms | 27.9ms | 5.4ms | **2.2x** |
 | B2 | TPC-H Q1: GROUP BY + 7 aggregates | **74.6ms** | 23.4ms | 92.5ms | 16.8ms | **1.2x** |
 | B3 | SSB Q1.1: filter + SUM(price*discount) | **12.9ms** | 4.8ms | 28.3ms | 5.7ms | **2.2x** |
-| B4 | COUNT(*) no filter | **0.1ms** | — | 0.4ms | 0.3ms | **4.0x** |
+| B4 | COUNT(*) no filter | **0.1ms** | - | 0.4ms | 0.3ms | **4.0x** |
 | B5 | Filtered COUNT (NEQ predicate) | **3.1ms** | 1.7ms | 12.2ms | 2.9ms | **4.0x** |
 | B6 | Low-cardinality GROUP BY + COUNT | **16.9ms** | 7.3ms | 24.0ms | 4.6ms | **1.4x** |
 | SSB-Q1.2 | Tighter filter + SUM(price*discount) | **12.5ms** | 4.8ms | 23.3ms | 4.5ms | **1.9x** |
 
@@ -69,14 +69,14 @@ Only index-backed columns support persistence (`st/sync!`) and O(1) forking (`st
 Like Clojure collections, mutations require transient mode:
 
 ```clojure
-;; CORRECT — transient → mutate → persistent
+;; CORRECT - transient → mutate → persistent
 (-> ds
     dataset/ds-transient
     (dataset/ds-set! :price 0 99.0)
     (dataset/ds-append! {:price 40.0 :qty 4})
     dataset/ds-persistent!)
 
-;; WRONG — will throw IllegalStateException
+;; WRONG - will throw IllegalStateException
 (dataset/ds-set! ds :price 0 99.0)
 ```