
perf: Implement single storage and retrieval of schemas#2892

Closed
josephschorr wants to merge 22 commits into authzed:main from josephschorr:unified-schema-hash

Conversation

@josephschorr
Member

Summary: Unified Schema Storage Implementation

Overview

This branch implements a fundamental architectural change to SpiceDB's schema storage system, transitioning from namespace-based schema storage to a unified single-storage model with hash-based caching. The implementation provides a gradual migration path supporting four operational modes, ensuring zero-downtime deployment.

Core Architecture Changes

Before: Schemas were stored as individual namespace and caveat definitions across multiple rows. Each read required querying and reconstructing the complete schema from these distributed components. Caching used complex revision-based ranges with checkpoint tracking, watchers, and time-based cleanup.

After: Complete schemas are stored as single versioned entities with content-addressable retrieval via SHA256 hashes. The new hash-based LRU cache provides O(1) lookups with single-flight deduplication, eliminating the complexity of revision tracking.

Key Components

1. StoredSchema Protocol Buffer

New proto definition (core.StoredSchema) for versioned schema storage:

  • Version field for format evolution
  • V1 format containing schema text, SHA256 hash, and timestamp
  • Support for both legacy and unified storage modes
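The fields above can be pictured with a small Go sketch. The real definition is the core.StoredSchema protobuf message; these Go type and field names are illustrative only:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// StoredSchemaV1 mirrors the described V1 format: schema text, its SHA256
// hash, and a write timestamp. Illustrative stand-in for the proto message.
type StoredSchemaV1 struct {
	SchemaText string
	SHA256Hash string
	WrittenAt  time.Time
}

// StoredSchema carries a version field so the serialized format can evolve.
type StoredSchema struct {
	Version uint32
	V1      *StoredSchemaV1
}

// hashSchema computes the content address used for hash-based cache lookups.
func hashSchema(schemaText string) string {
	sum := sha256.Sum256([]byte(schemaText))
	return hex.EncodeToString(sum[:])
}

func main() {
	text := "definition user {}"
	stored := StoredSchema{
		Version: 1,
		V1: &StoredSchemaV1{
			SchemaText: text,
			SHA256Hash: hashSchema(text),
			WrittenAt:  time.Now(),
		},
	}
	fmt.Println(len(stored.V1.SHA256Hash)) // 64: SHA256 digest as hex
}
```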

2. Database Schema

Two new tables added across all SQL datastores (PostgreSQL, MySQL, CockroachDB, Spanner):

schema_revision: Lightweight metadata table for fast change detection

  • name (always 'current'): index key
  • hash: SHA256 of current schema
  • created_xid/timestamp: revision tracking

schema: Content storage with chunking support

  • name: schema identifier
  • chunk_index: sequential chunk number
  • chunk_data: data segment (1MB default)
  • Handles schemas exceeding database column size limits

3. Migration Modes

Four operational modes enable gradual rollout:

  • read-legacy-write-legacy: Default mode, no migration impact
  • read-legacy-write-both: Begin migration, dual-write to populate new tables
  • read-new-write-both: Switch reads to new storage, maintain dual-write
  • read-new-write-new: Complete migration, unified storage only

Configuration via --datastore-experimental-schema-mode flag. Cache size is configurable in code via WithSchemaCacheOptions() (default: 100 entries).
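The routing each mode implies can be sketched as follows. Only the four flag values come from the PR; the helper names are hypothetical:

```go
package main

import "fmt"

// SchemaMode values match the --datastore-experimental-schema-mode flag.
type SchemaMode string

const (
	ReadLegacyWriteLegacy SchemaMode = "read-legacy-write-legacy"
	ReadLegacyWriteBoth   SchemaMode = "read-legacy-write-both"
	ReadNewWriteBoth      SchemaMode = "read-new-write-both"
	ReadNewWriteNew       SchemaMode = "read-new-write-new"
)

// readsNew reports whether reads are served from the unified schema tables.
func readsNew(m SchemaMode) bool {
	return m == ReadNewWriteBoth || m == ReadNewWriteNew
}

// writesLegacy reports whether the legacy namespace/caveat rows are written.
func writesLegacy(m SchemaMode) bool {
	return m != ReadNewWriteNew
}

// writesNew reports whether the unified schema tables are written.
func writesNew(m SchemaMode) bool {
	return m != ReadLegacyWriteLegacy
}

func main() {
	for _, m := range []SchemaMode{ReadLegacyWriteLegacy, ReadLegacyWriteBoth, ReadNewWriteBoth, ReadNewWriteNew} {
		fmt.Printf("%s: readsNew=%t writesLegacy=%t writesNew=%t\n",
			m, readsNew(m), writesLegacy(m), writesNew(m))
	}
}
```

The dual-write modes are what make the rollout reversible: both storages stay populated until reads are fully switched over.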

4. Hash-Based Cache (HashCache)

Replaced complex RevisionedCache with simplified LRU implementation:

Design:

  • Key: Schema hash (string)
  • Value: Cached *core.StoredSchema
  • Size: Configurable, default 100 entries
  • Eviction: LRU with single-flight deduplication
  • Thread safety: Built-in concurrent access protection

Performance characteristics:

  • Cache hit: O(1) hash map lookup
  • Cache miss: DB load with deduplication
  • Overhead: 9-62µs per HeadRevision() call
  • Sentinel optimization: O(1) map lookup for bypass checks

5. Schema Hash Integration

All datastore methods updated to return schema hash atomically with revision:

HeadRevision() → (Revision, SchemaHash, error)
OptimizedRevision() → (Revision, SchemaHash, error)
SnapshotReader(rev, hash) → Reader with hash for cache lookups

SQL datastores load hash via COALESCE subquery selecting current schema_revision. Memory datastore stores hash with each snapshot.

@github-actions github-actions bot added area/cli Affects the command line area/api v1 Affects the v1 API area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) area/dispatch Affects dispatching of requests labels Feb 10, 2026
@codecov

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 57.92313% with 1248 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.81%. Comparing base (cd8f1ce) to head (bbbf8b0).
⚠️ Report is 27 commits behind head on main.

Files with missing lines Patch % Lines
internal/datastore/schema/singlestore.go 26.20% 244 Missing and 4 partials ⚠️
internal/datastore/mysql/schema_chunker.go 24.04% 92 Missing and 6 partials ⚠️
internal/datastore/crdb/schema_chunker.go 27.11% 74 Missing and 4 partials ⚠️
internal/datastore/spanner/readwrite.go 58.23% 52 Missing and 14 partials ⚠️
internal/datastore/mysql/readwrite.go 52.42% 46 Missing and 13 partials ⚠️
internal/datastore/postgres/readwrite.go 52.42% 46 Missing and 13 partials ⚠️
internal/datastore/postgres/schema_chunker.go 30.96% 54 Missing and 4 partials ⚠️
internal/datastore/crdb/readwrite.go 50.51% 40 Missing and 9 partials ⚠️
internal/datastore/common/sqlschema.go 61.67% 33 Missing and 13 partials ⚠️
internal/datastore/memdb/memdb.go 65.05% 33 Missing and 10 partials ⚠️
... and 38 more

❌ Your project status has failed because the head coverage (73.81%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2892      +/-   ##
==========================================
- Coverage   74.61%   73.81%   -0.79%     
==========================================
  Files         484      491       +7     
  Lines       58538    61053    +2515     
==========================================
+ Hits        43672    45061    +1389     
- Misses      11800    12798     +998     
- Partials     3066     3194     +128     


@josephschorr josephschorr force-pushed the unified-schema-hash branch 4 times, most recently from baaee98 to e28cc5f Compare February 12, 2026 17:24
josephschorr and others added 19 commits February 12, 2026 12:29
Add StoredSchema message to support storing complete schema definitions
as a single versioned entity in datastores. The schema includes:
- Version field for future schema format evolution
- V1 format with schema text, hash, and timestamp
- Support for both namespace-based and unified storage modes

This enables datastores to store and retrieve schemas atomically,
providing better consistency and performance for schema operations.

Add configuration options for schema storage mode with four modes:
- ReadLegacyWriteLegacy: use legacy namespace/caveat tables (default)
- ReadLegacyWriteBoth: write to both legacy and new schema tables
- ReadNewWriteBoth: read from new schema table, write to both
- ReadNewWriteNew: fully use new schema table

Also add schema cache configuration options:
- MaximumCacheMemoryBytes: memory limit for schema caching
- QuantizationWindow: time window for cache entry consolidation

These options enable gradual migration from legacy to unified schema
storage without downtime.

Add interfaces for reading and writing schemas in datastores:
- SingleStoreSchemaReader/Writer: for new unified schema storage
- SchemaReader/Writer: mode-aware adapters that delegate to either
  legacy namespace/caveat operations or unified schema operations

The adapter pattern (schemaadapter.NewSchemaWriter) enables smooth
migration by routing operations based on the configured schema mode.
Datastores implement SingleStore interfaces, and the adapter handles
the complexity of dual-write scenarios and mode switching.

Add context helpers for tracking schema mode across operations.

Add migrations for new schema_revision and schema_chunks tables:
- schema_revision: stores current schema hash and timestamp for fast
  schema change detection
- schema_chunks: stores schema content in chunks to work around
  database column size limits

Tables added to:
- PostgreSQL (migration 0024)
- MySQL (migration 0011)
- CockroachDB (migration 0010)
- Spanner (migration 0012)

Each datastore uses appropriate column types and indexing strategies
optimized for their storage engines.

Add SQLSchemaReaderWriter to handle storage of large schemas by:
- Chunking schemas into configurable chunk sizes (default 1MB)
- Storing chunks in schema_chunks table with sequential indexing
- Reassembling chunks on read with validation

Add database-specific chunk executors for:
- PostgreSQL: uses pgx transactions
- MySQL: uses sqlx transactions
- CockroachDB: uses pgx with CRDB-specific features
- Spanner: uses Cloud Spanner mutations and queries

The chunking approach handles schemas larger than database column
size limits while maintaining transactional consistency.
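The chunk/reassemble cycle can be sketched as follows. The helper names are hypothetical; only the 1MB default comes from the commit:

```go
package main

import "fmt"

// defaultChunkSize mirrors the 1MB default described in the commit.
const defaultChunkSize = 1 << 20

// chunkBytes splits serialized schema bytes into sequentially indexed
// chunks of at most size bytes (the slice index plays the role of the
// chunk_index column).
func chunkBytes(data []byte, size int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if n > len(data) {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// reassemble concatenates chunks back into the original bytes on read.
func reassemble(chunks [][]byte) []byte {
	var out []byte
	for _, c := range chunks {
		out = append(out, c...)
	}
	return out
}

func main() {
	data := []byte("definition user {}")
	chunks := chunkBytes(data, 5) // tiny chunk size for the demo
	fmt.Println(len(chunks), string(reassemble(chunks)))
}
```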
Implement SingleStoreSchemaHashWatcher for each datastore to detect
schema changes in real-time:

- PostgreSQL: polls schema_revision table at configurable intervals
- MySQL: polls schema_revision table at configurable intervals
- CockroachDB: uses CHANGEFEED with resolved timestamps for checkpoints
- Spanner: polls schema_revision table at configurable intervals

Watchers invoke callbacks with (schemaHash, revision) on changes,
enabling caches and watches to stay synchronized with schema updates.
Polling is optimized to query only the lightweight schema_revision
table rather than the full schema_chunks table.

Wire up unified schema support in all datastores:

- Add SchemaWriter() method to ReadWriteTransaction returning mode-aware
  schema writers that handle legacy/new/both read and write patterns
- Add WithTimestampRevision interface to revisions for time-based cache
  cleanup (implemented by HLCRevision and TimestampRevision)
- Initialize SQLSchemaReaderWriter with schema hash watchers in
  PostgreSQL, MySQL datastores
- Add SchemaHashWatcherForTesting() interfaces for test access
- Update legacy caveat/namespace write operations to also update
  schema_revision table with computed schema hash
- Add support for reading schemas at specific revisions where supported

Each datastore now supports all four schema modes (ReadLegacyWriteLegacy,
ReadLegacyWriteBoth, ReadNewWriteBoth, ReadNewWriteNew) for gradual
migration without downtime.

Add RevisionedSchemaCache to cache schemas by revision with safety
guarantees:

- Lock-free reads using atomic snapshots for high performance
- Async writes via background goroutine to avoid blocking
- Checkpoint-based validity for all ranges (current and historical)
- Ranges track startRevision and checkpointRevision (no endRevision)
- Ranges are only valid from start up to and including checkpoint
- Separate currentRange and historicalRanges list
- Time-based cleanup for WithTimestampRevision types using quantization
  window (deletes ranges older than 2x window)
- Memory-based eviction (keeps minimum 2 ranges)
- Schema hash watcher integration for automatic cache updates

Safety design: Both current and historical ranges trust only their
checkpoint revision. This prevents returning cached data for revisions
that may have had undetected schema changes between watcher checkpoints,
preferring safe cache misses over incorrect data.

Update all datastore proxy implementations to support SchemaReader and
SchemaWriter interfaces:

- observable: track schema read/write operations in metrics
- counting: count schema operations
- schemacaching: cache schema reads (both legacy and unified)
- relationshipintegrity: pass through schema operations
- checkingreplicated: verify schema consistency across replicas
- strictreplicated: enforce schema replication
- indexcheck: validate schema integrity

Proxies delegate to underlying datastore's schema methods while adding
their layer-specific functionality (caching, metrics, validation, etc.).

Add UnifiedSchemaTest to verify all four schema modes across all
datastores:

- Tests schema write and read operations in all modes
- Verifies correct routing between legacy and unified storage
- Tests schema versioning and revision tracking
- Validates concurrent schema updates
- Tests schema caching behavior
- Verifies gradual migration path works correctly

Test suite runs for:
- PostgreSQL (all modes)
- MySQL (all modes)
- CockroachDB (all modes)
- Spanner (all modes)

Each test creates multiple schema versions and verifies they can be
read back at the correct revisions, ensuring proper isolation and
consistency guarantees.

Replace the complex revision-based cache with a simpler hash-based LRU
cache for schema storage. This improves performance and simplifies the
caching logic significantly.

Key changes:
- Replace RevisionedCache with HashCache using schema hash as key
- Remove complex checkpoint tracking, watchers, and time-based cleanup
- Add schema hash to HeadRevision() and OptimizedRevision() returns
- Update all datastores to load schema hash atomically with revision
- Add migrations to populate schema/schema_revision tables from existing data
- Add NoSchemaHash sentinel for transaction reads and testing
- Include schema_hash in dispatch ResolverMeta for distributed caching
- Add Prometheus metrics for cache hits/misses
- Remove unused revision-based cache files (revisionedcache.go, rangelist.go)

Performance improvements:
- O(1) hash lookup vs O(log n) binary search
- Single-flight deduplication prevents duplicate schema loads
- Configurable LRU size (default: 100 entries)
- Overhead: 9-62µs per HeadRevision call (benchmarked on PostgreSQL)

Schema hash loading per datastore:
- PostgreSQL/MySQL: COALESCE subquery for current hash
- CockroachDB: Same pattern with bytea type
- Spanner: Similar with bytes type
- Memory: Hash stored with each snapshot

Migration details:
- Each datastore migration reads legacy namespaces/caveats
- Generates canonical schema (alphabetically sorted)
- Computes SHA256 hash and stores in schema_revision table
- Stores serialized schema in chunked format (1MB chunks)

Add support for testing with both legacy and new unified schema modes
in the steelthread test framework. Each test now runs with both modes
to ensure functional equivalence and enable performance comparison.

Changes:
- Modified TestNonMemdbSteelThreads to run with LegacySchema and NewSchema modes
- Added BenchmarkSteelThreadSchemaMode for performance comparison
- Memdb tests remain unchanged (schema mode not applicable)

Benchmark Results (PostgreSQL on Apple M1 Max):

Overall Performance:
- Legacy schema: 10.119s
- New schema: 8.701s
- Improvement: 14.0% faster

Top Performance Gains:
- Cursored reads by subject (page_size=10): 38.1% faster
- Cursored lookup resources (page_size=5): 33.7% faster
- Cursored reads by subject (page_size=2): 31.3% faster
- Bulk checks with traits: 28.3% faster
- Lookup subjects for document: 27.4% faster
- Cursored reads by resource (page_size=2): 25.9% faster

Performance by Operation Type:
- Cursored read operations: 15-38% faster
- Bulk check operations: 12-28% faster
- Lookup subject operations: 10-27% faster
- Indirect lookup operations: 4-20% faster
- Schema write operations: 4-10% faster

Technical Improvements:
- O(1) hash lookup vs O(log n) binary search
- LRU cache with singleflight deduplication
- HeadRevision overhead: 9-62µs per call
- Unified schema storage reduces read paths

All tests pass with identical results in both modes, confirming
functional equivalence while delivering significant performance
improvements.

The datastore proxy layers (schemacaching, observable, counting) were
hardcoded to always return LegacySchemaReaderAdapter, regardless of the
configured schema mode. This forced all schema reads through legacy
methods even when the datastore was configured for unified schema mode
(SchemaModeReadNewWriteNew), resulting in:

- 16.71% of memory allocations (1234.74MB) in legacy schema methods
- Lost opportunity for hash-based schema cache benefits
- Bypassed O(1) unified schema lookups in favor of O(log n) legacy reads

Fix:
All proxy SchemaReader() methods now:
1. Query the underlying reader to determine its configured schema mode
2. For unified schema mode: pass through directly (hash cache is efficient)
3. For legacy mode: wrap with LegacySchemaReaderAdapter (maintains caching)

This ensures the proxy layers preserve caching behavior while respecting
the datastore's schema mode configuration.

Impact:
- Eliminates 17% memory overhead when using unified schema mode
- Enables full benefits of hash-based schema caching (O(1) lookups)
- Expected to improve new schema mode performance from 14% to 22-27% faster
  than legacy mode

Test updates:
- Added helper functions for consistent mock setup
- Updated all test mocks to expect SchemaReader() calls
- All schemacaching proxy tests pass

- Add NewSQLByteChunker function that returns errors instead of panicking
- Update sqlschema to use NewSQLByteChunker in error-returning functions
- Fix test mocks to include SchemaHash parameter in datastore method calls
- Handle Prometheus metrics AlreadyRegisteredError in test environments
- Pre-allocate slice capacity in spanner readwrite
- Add nolint directive for safe SQL formatting in migrations
- Explicitly ignore error returns in benchmark and panic tests

Spanner does not allow mixing mutations (BufferWrite) and DML statements
in the same transaction. The migration framework's WriteVersion method uses
DML, so all upTx callbacks must also use DML to avoid conflicts.

Converted two migrations to use DML:
- add-metadata-and-counters: Changed BufferWrite to INSERT DML
- populate-schema-tables: Changed BufferWrite to INSERT DML with loop

This fixes the 'Previously received a different request with this seqno' error.

This change ensures that the legacy schema hash is ONLY written when
in "write to both" modes (SchemaModeReadLegacyWriteBoth or
SchemaModeReadNewWriteBoth), not in "write to legacy only" mode
(SchemaModeReadLegacyWriteLegacy).

Key changes:

1. Removed writeLegacySchemaHash() from individual legacy operations
   (LegacyWriteNamespaces, LegacyWriteCaveats, etc.) across all
   datastores to prevent write-write conflicts on schema_revision
   table in CRDB's serializable isolation.

2. Added WriteLegacySchemaHashFromDefinitions implementation to all
   datastores (CRDB, Postgres, MySQL, Spanner, MemDB) to support
   writing the hash from buffered definitions without requiring reads.

3. Moved hash-writing logic from LegacySchemaWriterAdapter.WriteSchema()
   to dualSchemaWriter.WriteSchema() to ensure the hash is only written
   when using the dual writer (i.e., in "write to both" modes).

This fixes the CRDB test failure in TestCaveatSnapshotReads where
write conflicts were occurring due to concurrent schema_revision
updates, while ensuring the hash is still written correctly for
unified schema testing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

… schemas

MemDB supports both legacy and unified schema storage, but was only
using LegacySchemaWriterAdapter which after our changes no longer writes
the legacy schema hash. This caused TestUnifiedSchemaHashWatch to fail
with a timeout waiting for hash updates.

Changes:
1. Updated SchemaWriter() to return a dualSchemaWriter that writes to
   both legacy and unified storage (SchemaModeReadNewWriteBoth)

2. Refactored schema writing methods to avoid deadlocks:
   - Added writeStoredSchemaNoLock() that uses the transaction's
     db handle without acquiring additional locks
   - Added writeSchemaHashNoLock() for transaction-scoped hash writes
   - Added writeLegacySchemaHashFromDefinitionsNoLock() for legacy hash

3. Updated WriteStoredSchema() and WriteLegacySchemaHashFromDefinitions()
   to use the transaction's db handle via txSource()

This ensures MemDB writes to both storages and properly updates the
schema hash for watchers, fixing the unified schema hash watch test.

The SingleStoreSchemaHashWatcher interface and all its implementations
were only used for testing and not in production code. This commit
removes:

- SingleStoreSchemaHashWatcher interface and SchemaWatchCallback type
- All datastore schemawatch.go implementation files (postgres, mysql,
  crdb, spanner, memdb)
- SchemaHashWatcherForTesting methods from all datastores and proxies
- UnifiedSchemaHashWatchTest test function

Changes made:
- Inlined readSchemaHash logic into SchemaHashReaderForTesting methods
- Added missing SchemaModeForTesting to observable proxy and validating
  datastore wrapper
- Fixed test files to pass schema hash parameter to SnapshotReader
- Fixed linter issues (perfsprint, testifylint)

All datastore tests pass (postgres, mysql, crdb).

We therefore need to read them ourselves, manually, in the new schema writer
Comment on lines +10 to +15
schemaCacheLookups = promauto.NewCounterVec(prometheus.CounterOpts{
Namespace: "spicedb",
Subsystem: "datastore",
Name: "schema_cache_lookups_total",
Help: "total number of schema cache Get() calls",
}, []string{"result"}) // result: "hit" or "miss"
Contributor
@miparnisari miparnisari Feb 13, 2026

can we keep this consistent with the existing metrics?

descCacheHitsTotal = prometheus.NewDesc(
stringz.Join("_", promNamespace, promSubsystem, "hits_total"),
"Number of cache hits",
[]string{"cache"},
nil,
)
descCacheMissesTotal = prometheus.NewDesc(
stringz.Join("_", promNamespace, promSubsystem, "misses_total"),
"Number of cache misses",
[]string{"cache"},
nil,
)

so, as a consumer, i will see the same metric but with a new label cache=schema

# HELP spicedb_cache_hits_total Number of cache hits
# TYPE spicedb_cache_hits_total counter
spicedb_cache_hits_total{cache="cluster_dispatch"} 0
spicedb_cache_hits_total{cache="dispatch"} 0
spicedb_cache_hits_total{cache="namespace"} 0
# HELP spicedb_cache_misses_total Number of cache misses
# TYPE spicedb_cache_misses_total counter
spicedb_cache_misses_total{cache="cluster_dispatch"} 0
spicedb_cache_misses_total{cache="dispatch"} 0
spicedb_cache_misses_total{cache="namespace"} 0

Member Author

Changed

if err != nil {
return shared.RewriteErrorWithoutConfig(ctx, err)
}
// If cursor has empty schema hash (legacy cursor), skip cache
Contributor

  1. "skip cache"? there is no "cache" mentioned in this file. is this leaking a detail from one of the subsequent method calls?
  2. can we not move this if to inside decodeCursor()?

Member Author

Moved

perfinsights.SetInContext(ctx, perfinsights.NoLabels)

atRevision, _, err := consistency.RevisionFromContext(ctx)
atRevision, _, _, err := consistency.RevisionAndSchemaHashFromContext(ctx)
Contributor

why change this if you're going to ignore the returned value?

Member Author

Because we need it in many other places?


handle.(*revisionHandle).revision = revision
// Note: For some consistency modes (AtLeastAsFresh, AtExactSnapshot), schema hash may be empty.
// This will cause cache lookups to fail, which is expected until schema hash is properly threaded
Contributor
@miparnisari miparnisari Feb 13, 2026

"expected until schema hash is properly threaded through these code paths"

is this a follow-up PR?

Member Author

Removed; it was a WIP comment

Comment on lines +680 to +688
SnapshotReader(Revision, SchemaHash) Reader

// OptimizedRevision gets a revision that will likely already be replicated
// and will likely be shared amongst many queries.
OptimizedRevision(ctx context.Context) (Revision, error)
OptimizedRevision(ctx context.Context) (Revision, SchemaHash, error)

// HeadRevision gets a revision that is guaranteed to be at least as fresh as
// right now.
HeadRevision(ctx context.Context) (Revision, error)
HeadRevision(ctx context.Context) (Revision, SchemaHash, error)
Contributor

godoc updates please. it's not obvious why SnapshotReader needs a SchemaHash

Member Author

Added but it has to mention the cache now

"fmt"
"sync/atomic"

lru "github.com/hashicorp/golang-lru/v2"
Contributor
@miparnisari miparnisari Feb 13, 2026

we have three cache implementations and you brought another one? 😓


Member Author

I didn't want to wire through the complexities of one of the existing cache libs; we don't really need it. Otherwise, we'd have to actually wire it through into the hashcache.go, which has... issues. I can do it, if you like, but it will make the datastore configuration that much more complicated

Subject: subject,
CaveatContext: caveatContext,
AtRevision: devContext.Revision,
SchemaHash: datastore.NoSchemaHashForTesting, // DevContext uses memdb which doesn't support hashing
Contributor

this seems so wrong.. why does the caller of ComputeCheck have to know what the underlying datastore is to pass the right parameter?

Member Author

Because it's being called from a development environment? Otherwise, we'd have to load it from somewhere

Member Author

I changed it to its own bypass, but the bypass is correct

}

reader := devContext.Datastore.SnapshotReader(devContext.Revision)
// DevContext uses memdb which doesn't support schema hashing
Contributor
@miparnisari miparnisari Feb 13, 2026

this seems so wrong.. why does the caller of SnapshotReader have to know what the underlying datastore is to pass the right parameter?

(this is why i asked you to update the godoc of SnapshotReader)

Member Author

Because if the underlying datastore is memdb, doing the extra work to load the hash makes zero sense and devcontext is always using memdb.

Contributor
@miparnisari miparnisari left a comment

  • can you add instructions to this PR on how to test this by hand, perhaps with docker-compose up?

Contributor
@tstirrat15 tstirrat15 left a comment

See comments

}
errMsg := err.Error()
// Check both the error message and wrapped errors
return strings.Contains(errMsg, "no chunks found") || strings.Contains(errMsg, "failed to reassemble chunks")
Contributor

Is there a way to use types rather than string matching here? Or are we working across types and that's why this approach was taken?

Member Author

Sure

const defaultMaxCacheEntries = 10

// bypassSentinels contains all schema hash values that intentionally bypass the cache.
// These are special sentinel values used in specific contexts where caching is not appropriate.
Contributor

Is the idea that bypassing the cache prevents the cache from being populated and having entries potentially evicted because of new entries?

Member Author

No, it's about not having access to the schema hash and therefore skipping the cache when it provides no value.

// bypassSentinels contains all schema hash values that intentionally bypass the cache.
// These are special sentinel values used in specific contexts where caching is not appropriate.
// Using a map for O(1) lookup instead of slice iteration.
var bypassSentinels = map[datastore.SchemaHash]bool{
Contributor

mapz.NewSet?

Member Author

Not really needed for a quick check; changed to struct{} though to save a few bytes

// The LRU cache itself is thread-safe, so no locks are needed for cache operations.
// The atomic latest entry provides lock-free fast-path for the common case.
type SchemaHashCache struct {
cache *lru.Cache[string, *core.StoredSchema] // Thread-safe LRU
Contributor

Is there a reason that we're not using the same cache(s) that we use for other purposes?

Member Author

It requires wiring it through the entire datastore init system, for a tiny LRU cache with max 10 entries. If you like, I can do so, but it's a lot of code change for something quite small

Contributor

It requires wiring it through the entire datastore init system

That being an artifact of our current cache interface? It surprises me that a handle on a cache would have anything to do with the datastore system.

func NewSchemaHashCache(opts options.SchemaCacheOptions) (*SchemaHashCache, error) {
maxEntries := int(opts.MaximumCacheEntries)
if maxEntries == 0 {
maxEntries = defaultMaxCacheEntries
Contributor

10 seems reasonable. Is this something that we should benchmark in future work to figure out whether a different value works better? Or is this based on reasoning about defaults around default quantization intervals etc?

Member Author

It's simply a guess: unless someone is changing schemas very, very rapidly, the normal number of entries should be 0 in the LRU (1 current entry, 0 LRU ones)

Comment on lines +118 to +120
if err := rwt.writeCaveat(tx, caveats); err != nil {
return err
}
Contributor

Why this rewrite?

Member Author

Same thing: I was adding logs afterwards for testing, but then removed them when I was done

r.db.RLock()
defer r.db.RUnlock()

tx := r.db.db.Txn(false)
Contributor

db.db?

Member Author

Changed to mdb.db

Comment on lines +79 to +81
const (
ctxKeyTransactionID contextKey = "mysql_transaction_id"
)
Contributor

Use ctxkey here

return nil, err
}

// Hash-based cache doesn't need watchers or warming
Contributor

Does this contradict line 344?

Member Author

Yes, it does. Watching is still correct, but I removed it

created_xid xid8 NOT NULL DEFAULT (pg_current_xact_id()),
deleted_xid xid8 NOT NULL DEFAULT ('9223372036854775807'),
CONSTRAINT pk_schema PRIMARY KEY (name, chunk_index, created_xid));`,
`CREATE INDEX ix_schema_gc ON schema (deleted_xid DESC) WHERE deleted_xid < '9223372036854775807'::xid8;`,
Contributor

Curious - does an index help when there's one (or an otherwise small number of) row? Or is it actually faster to let it table scan?

Member Author

The WHERE deleted_xid clause makes this a partial index, which keeps GC lookups fast if there are a lot of schema changes in a short window

Contributor

Oh, yeah, that makes sense - it was more of a general question. If we're using an index for GC, would it make sense to also have an index for the content lookups, or does it not make sense because the number of rows is small (1?) and it'd just be overhead? I don't know enough about datastores to know how indices behave when the data are small

}

// Parse the experimental schema mode string if provided
if opts.experimentalSchemaModeString != "" {
Contributor

@miparnisari miparnisari Feb 17, 2026

If you run go run ./cmd/spicedb/main.go serve --grpc-preshared-key foobar --datastore-experimental-schema-mode=read-legacy-write-both, by the time execution gets here the value of opts.experimentalSchemaModeString is somehow read-legacy-write-legacy. And when the server prints its config, it shows "ExperimentalSchemaMode":0 (why a number and not a string?).

In conclusion: the flag isn't being passed down properly, and the printed version of it makes for a poor dev experience.

}

// Create schema reader/writer
schemaReaderWriter, err := common.NewSQLSchemaReaderWriter[uint64, postgresRevision](BaseSchemaChunkerConfig, config.schemaCacheOptions)
Contributor

Putting this call here means we are creating a new cache object for every replica that the server is connected to.

Why did you choose this design and not a proxy datastore object that holds the cache and falls back to the actual datastore impl when an entry isn't there?

Member Author

I didn't want any proxies to be able to interact directly with the cache. The current proxy-based design suffers from a problem: it is easy to forget to add the proxy, and it makes it hard or impossible for the datastores themselves to warm and update the cache. This was a bigger problem when I was using a watch-based update mechanism, but I suppose I could move to a proxy-based design now
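For comparison, the proxy-based alternative the reviewer describes would look roughly like this; the interface and names are illustrative stand-ins, not the PR's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// SchemaReader is a stand-in for the datastore's schema-read surface.
type SchemaReader interface {
	ReadSchema(hash string) (string, error)
}

// cachingProxy wraps any SchemaReader: hits are served from the cache,
// misses fall through to the underlying datastore and are stored.
// The downside noted above: every call site must remember to wrap the
// datastore, and the datastore itself cannot warm this cache.
type cachingProxy struct {
	mu       sync.Mutex
	cache    map[string]string
	delegate SchemaReader
}

func newCachingProxy(d SchemaReader) *cachingProxy {
	return &cachingProxy{cache: map[string]string{}, delegate: d}
}

func (p *cachingProxy) ReadSchema(hash string) (string, error) {
	p.mu.Lock()
	if s, ok := p.cache[hash]; ok {
		p.mu.Unlock()
		return s, nil
	}
	p.mu.Unlock()

	s, err := p.delegate.ReadSchema(hash)
	if err != nil {
		return "", err
	}
	p.mu.Lock()
	p.cache[hash] = s
	p.mu.Unlock()
	return s, nil
}

// fakeDatastore counts how many reads reach the real datastore.
type fakeDatastore struct{ calls int }

func (f *fakeDatastore) ReadSchema(hash string) (string, error) {
	f.calls++
	return "definition user {}", nil
}

func main() {
	ds := &fakeDatastore{}
	p := newCachingProxy(ds)
	p.ReadSchema("abc")
	p.ReadSchema("abc") // second call is served from the cache
	fmt.Println(ds.calls) // prints: 1
}
```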

@josephschorr
Member Author

Closing in favor of #2924

@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2026

Labels

area/api v1: Affects the v1 API
area/cli: Affects the command line
area/datastore: Affects the storage system
area/dispatch: Affects dispatching of requests
area/tooling: Affects the dev or user toolchain (e.g. tests, ci, build tools)
