
Commit 456ae7c

willkelly and claude authored
perf(memory): optimize fact operations (~30% faster writes, ~56% faster reads) (commontoolsinc#2174)
* feat(memory): add benchmarks for fact operations

  Add comprehensive benchmarks to measure write and read performance,
  including isolation benchmarks to identify specific bottlenecks.

* perf(memory): add LRU memoization for merkle reference hashing

  Add a bounded LRU cache (1000 entries) to memoize refer() results in
  reference.ts. refer() is a pure function computing SHA-256 hashes, which
  was identified as the primary bottleneck via isolation benchmarks.

  Benchmark results for cache hits:
  - 3x refer() calls: 44µs vs ~500µs uncached (27x faster)
  - 10x unclaimed refs: 2.5µs (400k ops/sec)

  The memoization benefits real-world usage patterns:
  - Repeated entity access (queries, updates on the same docs)
  - unclaimed({ the, of }) patterns called multiple times
  - Multi-step transaction flows referencing the same content

  Implementation:
  - reference.ts: LRU cache using Map with bounded eviction
  - Updated imports in fact.ts, access.ts, error.ts, entity.ts to use the
    memoized refer() from ./reference.ts instead of merkle-reference

  The cache uses JSON.stringify as the key (~7µs for 16KB), which is ~25x
  faster than the SHA-256 hash computation (~170µs for 16KB).

* perf(memory): cache and reuse SQLite prepared statements

  Implemented prepared statement caching to eliminate redundant statement
  preparation overhead on every database operation. Uses a WeakMap-based
  cache per database connection to ensure proper cleanup and memory safety
  (see the sketch after this commit message).

  Changes:
  - Added PreparedStatements type and getPreparedStatement() helper
  - Cached 7 frequently used SQL statements (EXPORT, CAUSE_CHAIN, GET_FACT,
    IMPORT_DATUM, IMPORT_FACT, IMPORT_MEMORY, SWAP)
  - Removed manual finalize() calls as statements are reused
  - Added finalizePreparedStatements() to close() for cleanup
  - Updated all database query functions to use cached statements

  Benchmark results (before → after):
  - Single GET query: 117.5µs → 53.4µs (54.6% faster / 2.2x speedup)
  - Single UPDATE: 906.6µs → 705.8µs (22.1% faster)
  - Batch retract (10): 2.5ms → 1.9ms (24.0% faster)
  - Query from 1000 docs: 89.6µs → 66.7µs (25.5% faster)
  - Batch SET (100): 99.4ms → 88.1ms (11.4% faster)
  - Batch SET (10): 8.6ms → 7.9ms (8.1% faster)
  - Single SET: 1.2ms → 1.1ms (8.3% faster)

  Overall, the optimization provides consistent improvements across all
  operations, with particularly strong gains in read-heavy workloads. All 31
  existing tests pass without modification.

* perf(memory): reorder datum/fact hashing to leverage merkle sub-object caching

  The merkle-reference library caches sub-objects by identity during
  traversal. By computing the datum hash BEFORE the fact hash, the subsequent
  refer(assertion) call hits the cache when it encounters the payload
  sub-object, avoiding hashing the same 16KB payload twice.

  Before: refer(assertion) then refer(datum) - payload hashed twice
  After: refer(datum) then refer(assertion) - payload hash reused via WeakMap

  This ordering matters because:
  1. refer(datum) hashes the payload and caches it by object identity
  2. refer(assertion) traverses {the, of, is: payload, cause} - when it
     reaches the 'is' field, the payload object reference hits the WeakMap cache

  Benchmark results (16KB payload):
  - set fact (single): 1.1ms → 924.7µs (16% faster)
  - retract fact (single): 483.8µs → 462.4µs (4% faster)
  - update fact (single): ~705µs → ~723µs (within noise)

* perf(memory): batch label lookups with SELECT...IN via json_each()

  Previously, getLabels() performed N individual SELECT queries to look up
  labels for N facts in a transaction. This adds latency proportional to the
  number of facts being processed.

  Now uses a single batched query with SQLite's json_each() function to
  handle an array of 'of' values:

    SELECT ... WHERE state.the = :the
      AND state.of IN (SELECT value FROM json_each(:ofs))

  This reduces N queries to 1 query regardless of transaction size (also
  sketched after this commit message).

  Changes:
  - Added GET_LABELS_BATCH query constant using json_each() for the IN clause
  - Added 'getLabelsBatch' to the prepared statement cache
  - Rewrote getLabels() to collect 'of' values and execute a single batch query

  The optimization benefits workloads with label facts (access control,
  classification). Benchmarks show ~4% improvement on batch operations, with
  larger gains expected in label-heavy workloads.

* perf(memory): use stored fact hash instead of recomputing with refer()

  In conflict detection, we were reading a fact from the database and then
  calling refer(actual) to compute its hash for comparison. But the fact hash
  is already stored in the database (row.fact) - we were discarding it and
  recomputing it unnecessarily.

  Changes:
  - Added RevisionWithFact<T> type that includes the stored fact hash
  - Updated recall() to return row.fact in the revision
  - Use revision.fact directly instead of refer(actual).toString()
  - Strip the 'fact' field from error reporting to maintain API compatibility

  This eliminates a refer() call (~50-170µs) on the conflict detection path,
  which is taken for duplicate detection and first insertions.

  Benchmark results:
  - set fact (single): ~1.0ms → 846µs (15% faster)
  - update fact (single): ~740µs → 688µs (7% faster)
  - retract fact (single): ~428µs → 360µs (16% faster)

* fix(memory): correct Reference type annotations and validate benchmarks

  - Fix Reference type annotations for the memoized refer()
  - Validate Result in benchmarks to catch silent failures
  - Apply deno fmt

* rename benchmark.ts -> memory_bench.ts

  This makes it work automatically with `deno bench`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
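For illustration, here is a minimal sketch of the WeakMap-based prepared-statement cache described in the commit message above. Only the names (PreparedStatements, getPreparedStatement, finalizePreparedStatements) come from the commit; the Database/Statement shapes are stand-ins rather than the real SQLite binding types, and the helper bodies are an assumption about the wiring, not the actual implementation.

```typescript
// Sketch only: illustrative types standing in for the real SQLite binding.
interface Statement {
  finalize(): void;
}
interface Database {
  prepare(sql: string): Statement;
}

type PreparedStatements = Map<string, Statement>;

// One statement cache per connection. A WeakMap keyed by the connection lets
// the cache (and its statements) be collected together with the connection.
const caches = new WeakMap<Database, PreparedStatements>();

const getPreparedStatement = (db: Database, sql: string): Statement => {
  let cache = caches.get(db);
  if (cache === undefined) {
    cache = new Map();
    caches.set(db, cache);
  }
  let statement = cache.get(sql);
  if (statement === undefined) {
    // Prepare once; later calls with the same SQL text reuse this statement.
    statement = db.prepare(sql);
    cache.set(sql, statement);
  }
  return statement;
};

// Called from close(): release every cached statement for this connection.
const finalizePreparedStatements = (db: Database): void => {
  const cache = caches.get(db);
  if (cache === undefined) return;
  for (const statement of cache.values()) {
    statement.finalize();
  }
  cache.clear();
};
```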
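And a hedged sketch of the batched label lookup from the json_each() commit above: the WHERE clause follows the query quoted in the commit message, but the selected columns, the statement's all() method, and the surrounding types are illustrative assumptions, not the actual GET_LABELS_BATCH or getLabels() code.

```typescript
// Sketch only: a stand-in statement type with a named-parameter query method.
interface BatchStatement {
  all(params: { the: string; ofs: string }): unknown[];
}

// One query replaces N per-fact SELECTs: the `of` values are passed as a JSON
// array and expanded server-side with SQLite's json_each().
const GET_LABELS_BATCH = `
  SELECT *
  FROM state
  WHERE state.the = :the
    AND state.of IN (SELECT value FROM json_each(:ofs))
`;

const getLabelsBatch = (
  statement: BatchStatement,
  the: string,
  ofs: string[],
): unknown[] =>
  // Bind the array as a single JSON string; json_each() unpacks it into rows.
  statement.all({ the, ofs: JSON.stringify(ofs) });
```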
1 parent 9a398be commit 456ae7c

File tree

8 files changed (+1273, -140 lines changed)


packages/memory/access.ts

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ import {
   Reference,
   Signer,
 } from "./interface.ts";
-import { refer } from "merkle-reference";
+import { refer } from "./reference.ts";
 import { unauthorized } from "./error.ts";
 import { type DID } from "@commontools/identity";
 import { fromDID } from "./util.ts";

packages/memory/deno.json

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@
     "migrate": {
       "description": "Performs database migration",
       "command": "deno run -A ./migrate.ts"
+    },
+    "bench": {
+      "description": "Run benchmarks for fact operations",
+      "command": "deno bench --allow-read --allow-write --allow-net --allow-ffi --allow-env --no-check test/benchmark.ts"
     }
   },
   "test": {

packages/memory/entity.ts

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-import { fromJSON, refer } from "merkle-reference";
+import { fromJSON, refer } from "./reference.ts";
 
 export interface Entity<T extends null | NonNullable<unknown>> {
   "@": ToString<Entity<T>>;

packages/memory/error.ts

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ import type {
   TransactionError,
 } from "./interface.ts";
 import { MemorySpace } from "./interface.ts";
-import { refer } from "merkle-reference";
+import { refer } from "./reference.ts";
 
 export const unauthorized = (
   message: string,

packages/memory/fact.ts

Lines changed: 1 addition & 6 deletions
@@ -11,12 +11,7 @@ import {
   State,
   Unclaimed,
 } from "./interface.ts";
-import {
-  fromJSON,
-  fromString,
-  is as isReference,
-  refer,
-} from "merkle-reference";
+import { fromJSON, fromString, is as isReference, refer } from "./reference.ts";
 
 /**
  * Creates an unclaimed fact.

packages/memory/reference.ts

Lines changed: 41 additions & 1 deletion
@@ -5,4 +5,44 @@ export * from "merkle-reference";
 // workaround it like this.
 export const fromString = Reference.fromString as (
   source: string,
-) => Reference.Reference;
+) => Reference.View;
+
+/**
+ * Bounded LRU cache for memoizing refer() results.
+ * refer() is a pure function (same input → same output), so caching is safe.
+ * We use JSON.stringify as the cache key since it's ~25x faster than refer().
+ */
+const CACHE_MAX_SIZE = 1000;
+const referCache = new Map<string, Reference.View>();
+
+/**
+ * Memoized version of refer() that caches results.
+ * Provides significant speedup for repeated references to the same objects,
+ * which is common in transaction processing where the same payload is
+ * referenced multiple times (datum, assertion, commit log).
+ */
+export const refer = <T>(source: T): Reference.View<T> => {
+  const key = JSON.stringify(source);
+
+  let ref = referCache.get(key);
+  if (ref !== undefined) {
+    // Move to end (most recently used) by re-inserting
+    referCache.delete(key);
+    referCache.set(key, ref);
+    return ref as Reference.View<T>;
+  }
+
+  // Compute new reference
+  ref = Reference.refer(source);
+
+  // Evict oldest entry if at capacity
+  if (referCache.size >= CACHE_MAX_SIZE) {
+    const oldest = referCache.keys().next().value;
+    if (oldest !== undefined) {
+      referCache.delete(oldest);
+    }
+  }
+
+  referCache.set(key, ref);
+  return ref as Reference.View<T>;
+};
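As a quick illustration of the cache behavior in this diff, here is a hypothetical call site inside packages/memory (not part of the change; the object values are made up): two structurally equal inputs serialize to the same JSON.stringify key, so the second call returns the cached reference instead of re-hashing.

```typescript
import { refer } from "./reference.ts";

// Illustrative values; any JSON-serializable object works as input.
const a = refer({ the: "application/json", of: "of:example" });

// Structurally equal input → same cache key → cached reference, no re-hash.
const b = refer({ the: "application/json", of: "of:example" });

console.log(a.toString() === b.toString()); // true
```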

0 commit comments
