
Commit 456ae7c

willkelly and claude authored
perf(memory): optimize fact operations (~30% faster writes, ~56% faster reads) (commontoolsinc#2174)
* feat(memory): add benchmarks for fact operations

  Add comprehensive benchmarks to measure write and read performance,
  including isolation benchmarks to identify specific bottlenecks.

* perf(memory): add LRU memoization for merkle reference hashing

  Add a bounded LRU cache (1000 entries) to memoize refer() results in
  reference.ts. refer() is a pure function computing SHA-256 hashes, which
  was identified as the primary bottleneck via isolation benchmarks.

  Benchmark results for cache hits:
  - 3x refer() calls: 44µs vs ~500µs uncached (27x faster)
  - 10x unclaimed refs: 2.5µs (400k ops/sec)

  The memoization benefits real-world usage patterns:
  - Repeated entity access (queries, updates on the same docs)
  - unclaimed({ the, of }) patterns called multiple times
  - Multi-step transaction flows referencing the same content

  Implementation:
  - reference.ts: LRU cache using Map with bounded eviction
  - Updated imports in fact.ts, access.ts, error.ts, entity.ts to use the
    memoized refer() from ./reference.ts instead of merkle-reference

  The cache uses JSON.stringify as the key (~7µs for 16KB), which is ~25x
  faster than the SHA-256 hash computation (~170µs for 16KB).

* perf(memory): cache and reuse SQLite prepared statements

  Implemented prepared statement caching to eliminate redundant statement
  preparation overhead on every database operation. Uses a WeakMap-based
  cache per database connection to ensure proper cleanup and memory safety
  (see the sketch after this commit message).

  Changes:
  - Added PreparedStatements type and getPreparedStatement() helper
  - Cached 7 frequently used SQL statements (EXPORT, CAUSE_CHAIN, GET_FACT,
    IMPORT_DATUM, IMPORT_FACT, IMPORT_MEMORY, SWAP)
  - Removed manual finalize() calls as statements are reused
  - Added finalizePreparedStatements() to close() for cleanup
  - Updated all database query functions to use cached statements

  Benchmark results (before → after):
  - Single GET query: 117.5µs → 53.4µs (54.6% faster / 2.2x speedup)
  - Single UPDATE: 906.6µs → 705.8µs (22.1% faster)
  - Batch retract (10): 2.5ms → 1.9ms (24.0% faster)
  - Query from 1000 docs: 89.6µs → 66.7µs (25.5% faster)
  - Batch SET (100): 99.4ms → 88.1ms (11.4% faster)
  - Batch SET (10): 8.6ms → 7.9ms (8.1% faster)
  - Single SET: 1.2ms → 1.1ms (8.3% faster)

  Overall, the optimization provides consistent improvements across all
  operations, with particularly strong gains in read-heavy workloads. All 31
  existing tests pass without modification.

* perf(memory): reorder datum/fact hashing to leverage merkle sub-object caching

  The merkle-reference library caches sub-objects by identity during
  traversal. By computing the datum hash BEFORE the fact hash, the subsequent
  refer(assertion) call hits the cache when it encounters the payload
  sub-object, avoiding hashing the same 16KB payload twice.

  Before: refer(assertion) then refer(datum) - payload hashed twice
  After: refer(datum) then refer(assertion) - payload hash reused via WeakMap

  This ordering matters because:
  1. refer(datum) hashes the payload and caches it by object identity
  2. refer(assertion) traverses {the, of, is: payload, cause} - when it
     reaches the 'is' field, the payload object reference hits the WeakMap cache

  Benchmark results (16KB payload):
  - set fact (single): 1.1ms → 924.7µs (16% faster)
  - retract fact (single): 483.8µs → 462.4µs (4% faster)
  - update fact (single): ~705µs → ~723µs (within noise)

* perf(memory): batch label lookups with SELECT...IN via json_each()

  Previously, getLabels() performed N individual SELECT queries to look up
  labels for N facts in a transaction. This adds latency proportional to the
  number of facts being processed.

  Now uses a single batched query with SQLite's json_each() function to
  handle an array of 'of' values:

    SELECT ... WHERE state.the = :the
      AND state.of IN (SELECT value FROM json_each(:ofs))

  This reduces N queries to 1 query regardless of transaction size (also
  sketched after this commit message).

  Changes:
  - Added GET_LABELS_BATCH query constant using json_each() for the IN clause
  - Added 'getLabelsBatch' to the prepared statement cache
  - Rewrote getLabels() to collect 'of' values and execute a single batch query

  The optimization benefits workloads with label facts (access control,
  classification). Benchmarks show ~4% improvement on batch operations, with
  larger gains expected in label-heavy workloads.

* perf(memory): use stored fact hash instead of recomputing with refer()

  In conflict detection, we were reading a fact from the database and then
  calling refer(actual) to compute its hash for comparison. But the fact hash
  is already stored in the database (row.fact) - we were discarding it and
  recomputing it unnecessarily.

  Changes:
  - Added RevisionWithFact<T> type that includes the stored fact hash
  - Updated recall() to return row.fact in the revision
  - Use revision.fact directly instead of refer(actual).toString()
  - Strip the 'fact' field from error reporting to maintain API compatibility

  This eliminates a refer() call (~50-170µs) on the conflict detection path,
  which is taken for duplicate detection and first insertions.

  Benchmark results:
  - set fact (single): ~1.0ms → 846µs (15% faster)
  - update fact (single): ~740µs → 688µs (7% faster)
  - retract fact (single): ~428µs → 360µs (16% faster)

* fix(memory): correct Reference type annotations and validate benchmarks

  - Fix Reference type annotations for the memoized refer()
  - Validate Result in benchmarks to catch silent failures
  - Apply deno fmt

* rename benchmark.ts -> memory_bench.ts

  This makes it work automatically with `deno bench`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
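For illustration, here is a minimal sketch of the WeakMap-based prepared-statement cache described in the commit message above. Only the names (PreparedStatements, getPreparedStatement, finalizePreparedStatements) come from the commit; the Database/Statement shapes are stand-ins rather than the real SQLite binding types, and the helper bodies are an assumption about the wiring, not the actual implementation.

```typescript
// Sketch only: illustrative types standing in for the real SQLite binding.
interface Statement {
  finalize(): void;
}
interface Database {
  prepare(sql: string): Statement;
}

type PreparedStatements = Map<string, Statement>;

// One statement cache per connection. A WeakMap keyed by the connection lets
// the cache (and its statements) be collected together with the connection.
const caches = new WeakMap<Database, PreparedStatements>();

const getPreparedStatement = (db: Database, sql: string): Statement => {
  let cache = caches.get(db);
  if (cache === undefined) {
    cache = new Map();
    caches.set(db, cache);
  }
  let statement = cache.get(sql);
  if (statement === undefined) {
    // Prepare once; later calls with the same SQL text reuse this statement.
    statement = db.prepare(sql);
    cache.set(sql, statement);
  }
  return statement;
};

// Called from close(): release every cached statement for this connection.
const finalizePreparedStatements = (db: Database): void => {
  const cache = caches.get(db);
  if (cache === undefined) return;
  for (const statement of cache.values()) {
    statement.finalize();
  }
  cache.clear();
};
```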
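And a hedged sketch of the batched label lookup from the json_each() commit above: the WHERE clause follows the query quoted in the commit message, but the selected columns, the statement's all() method, and the surrounding types are illustrative assumptions, not the actual GET_LABELS_BATCH or getLabels() code.

```typescript
// Sketch only: a stand-in statement type with a named-parameter query method.
interface BatchStatement {
  all(params: { the: string; ofs: string }): unknown[];
}

// One query replaces N per-fact SELECTs: the `of` values are passed as a JSON
// array and expanded server-side with SQLite's json_each().
const GET_LABELS_BATCH = `
  SELECT *
  FROM state
  WHERE state.the = :the
    AND state.of IN (SELECT value FROM json_each(:ofs))
`;

const getLabelsBatch = (
  statement: BatchStatement,
  the: string,
  ofs: string[],
): unknown[] =>
  // Bind the array as a single JSON string; json_each() unpacks it into rows.
  statement.all({ the, ofs: JSON.stringify(ofs) });
```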
1 parent 9a398be commit 456ae7c

File tree

8 files changed (+1273, -140 lines changed)


packages/memory/access.ts

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ import {
   Reference,
   Signer,
 } from "./interface.ts";
-import { refer } from "merkle-reference";
+import { refer } from "./reference.ts";
 import { unauthorized } from "./error.ts";
 import { type DID } from "@commontools/identity";
 import { fromDID } from "./util.ts";

packages/memory/deno.json

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@
     "migrate": {
       "description": "Performs database migration",
       "command": "deno run -A ./migrate.ts"
+    },
+    "bench": {
+      "description": "Run benchmarks for fact operations",
+      "command": "deno bench --allow-read --allow-write --allow-net --allow-ffi --allow-env --no-check test/benchmark.ts"
     }
   },
   "test": {

packages/memory/entity.ts

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-import { fromJSON, refer } from "merkle-reference";
+import { fromJSON, refer } from "./reference.ts";
 
 export interface Entity<T extends null | NonNullable<unknown>> {
   "@": ToString<Entity<T>>;

packages/memory/error.ts

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ import type {
   TransactionError,
 } from "./interface.ts";
 import { MemorySpace } from "./interface.ts";
-import { refer } from "merkle-reference";
+import { refer } from "./reference.ts";
 
 export const unauthorized = (
   message: string,

packages/memory/fact.ts

Lines changed: 1 addition & 6 deletions
@@ -11,12 +11,7 @@ import {
   State,
   Unclaimed,
 } from "./interface.ts";
-import {
-  fromJSON,
-  fromString,
-  is as isReference,
-  refer,
-} from "merkle-reference";
+import { fromJSON, fromString, is as isReference, refer } from "./reference.ts";
 
 /**
  * Creates an unclaimed fact.

packages/memory/reference.ts

Lines changed: 41 additions & 1 deletion
@@ -5,4 +5,44 @@ export * from "merkle-reference";
 // workaround it like this.
 export const fromString = Reference.fromString as (
   source: string,
-) => Reference.Reference;
+) => Reference.View;
+
+/**
+ * Bounded LRU cache for memoizing refer() results.
+ * refer() is a pure function (same input → same output), so caching is safe.
+ * We use JSON.stringify as the cache key since it's ~25x faster than refer().
+ */
+const CACHE_MAX_SIZE = 1000;
+const referCache = new Map<string, Reference.View>();
+
+/**
+ * Memoized version of refer() that caches results.
+ * Provides significant speedup for repeated references to the same objects,
+ * which is common in transaction processing where the same payload is
+ * referenced multiple times (datum, assertion, commit log).
+ */
+export const refer = <T>(source: T): Reference.View<T> => {
+  const key = JSON.stringify(source);
+
+  let ref = referCache.get(key);
+  if (ref !== undefined) {
+    // Move to end (most recently used) by re-inserting
+    referCache.delete(key);
+    referCache.set(key, ref);
+    return ref as Reference.View<T>;
+  }
+
+  // Compute new reference
+  ref = Reference.refer(source);
+
+  // Evict oldest entry if at capacity
+  if (referCache.size >= CACHE_MAX_SIZE) {
+    const oldest = referCache.keys().next().value;
+    if (oldest !== undefined) {
+      referCache.delete(oldest);
+    }
+  }
+
+  referCache.set(key, ref);
+  return ref as Reference.View<T>;
+};
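As a quick illustration of the cache behavior in this diff, here is a hypothetical call site inside packages/memory (not part of the change; the object values are made up): two structurally equal inputs serialize to the same JSON.stringify key, so the second call returns the cached reference instead of re-hashing.

```typescript
import { refer } from "./reference.ts";

// Illustrative values; any JSON-serializable object works as input.
const a = refer({ the: "application/json", of: "of:example" });

// Structurally equal input → same cache key → cached reference, no re-hash.
const b = refer({ the: "application/json", of: "of:example" });

console.log(a.toString() === b.toString()); // true
```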

0 commit comments
