feat: add unified schema storage with ReadStoredSchema/WriteStoredSchema#2924

Open
josephschorr wants to merge 1 commit into authzed:main from josephschorr:unified-schema-hash-v2

Conversation

@josephschorr
Member

@josephschorr josephschorr commented Feb 25, 2026

Introduce the foundation for unified schema storage, where the entire compiled schema (definitions + schema text + hash) is stored as a single serialized proto blob, chunked across rows for SQL datastores.

Datastore interface changes:

  • Add ReadStoredSchema(ctx, SchemaHash) to Reader interface
  • Add WriteStoredSchema(ctx, *StoredSchema) to ReadWriteTransaction
  • Add SchemaHash type with sentinel values for cache bypass

DataLayer interface changes:

  • SnapshotReader now takes (Revision, SchemaHash) to thread cache keys
  • OptimizedRevision/HeadRevision return SchemaHash alongside Revision
  • Add SchemaMode configuration for migration path (legacy → dual → new)
  • Add storedSchemaReaderAdapter for reading from StoredSchema protos
  • Add writeSchemaViaStoredSchema for building and writing StoredSchema
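
To make the legacy → dual → new migration path concrete, here is a minimal sketch of how such a mode could gate reads and writes. Only `ReadsFromNew` appears in the code under review; the four mode names and the `WritesToLegacy`/`WritesToNew` helpers are assumptions made for illustration, not the PR's actual API.

```go
package main

import "fmt"

// SchemaMode is a hypothetical stand-in for the PR's migration setting.
type SchemaMode int

const (
	ReadLegacyWriteLegacy SchemaMode = iota // legacy only
	ReadLegacyWriteBoth                     // dual write, still reading the old tables
	ReadNewWriteBoth                        // dual write, reading the unified blob
	ReadNewWriteNew                         // new only
)

// ReadsFromNew reports whether schema reads should use the unified storage.
func (m SchemaMode) ReadsFromNew() bool {
	return m == ReadNewWriteBoth || m == ReadNewWriteNew
}

// WritesToLegacy reports whether writes must still update the legacy tables.
func (m SchemaMode) WritesToLegacy() bool { return m != ReadNewWriteNew }

// WritesToNew reports whether writes must update the unified storage.
func (m SchemaMode) WritesToNew() bool { return m != ReadLegacyWriteLegacy }

func main() {
	modes := []SchemaMode{ReadLegacyWriteLegacy, ReadLegacyWriteBoth, ReadNewWriteBoth, ReadNewWriteNew}
	for _, m := range modes {
		fmt.Printf("mode=%d readsNew=%t writesLegacy=%t writesNew=%t\n",
			m, m.ReadsFromNew(), m.WritesToLegacy(), m.WritesToNew())
	}
}
```

Under these assumptions, each step of the rollout changes only one of the read or write sides, so an operator could roll back a single step without losing schema writes.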

Storage implementation:

  • Add SQLByteChunker generic chunked blob storage for SQL datastores
  • Add SQLSingleStoreSchemaReaderWriter for read/write of StoredSchema
  • Add per-datastore ReadStoredSchema/WriteStoredSchema implementations for postgres, crdb, mysql, spanner, and memdb
  • Add schema table migrations for all SQL datastores
  • Add populate migrations to backfill from legacy namespace/caveat tables
  • Rename MySQL schema table to stored_schema (schema is a reserved word)
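
As a rough illustration of the chunking idea (one serialized blob split across rows), a minimal sketch follows. The function names and the chunk size are hypothetical and are not taken from SQLByteChunker; each returned chunk is assumed to map to one row keyed by its index.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkBytes splits data into consecutive chunks of at most chunkSize bytes.
// Each chunk would be stored as one row, keyed by its index in the slice.
func chunkBytes(data []byte, chunkSize int) ([][]byte, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", chunkSize)
	}
	chunks := make([][]byte, 0, (len(data)+chunkSize-1)/chunkSize)
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, nil
}

// reassemble concatenates chunks in index order to recover the original blob.
func reassemble(chunks [][]byte) []byte {
	return bytes.Join(chunks, nil)
}

func main() {
	blob := []byte("definition user {}\ndefinition document { relation viewer: user }")
	chunks, err := chunkBytes(blob, 16)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(chunks), bytes.Equal(reassemble(chunks), blob))
}
```

A real implementation would also need to write all chunks in one transaction and read them back ordered by index; the sketch covers only the split and join.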

Caching:

  • Add SchemaHashCache with LRU + singleflight for schema-by-hash lookups
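
For readers unfamiliar with the LRU + singleflight combination, here is a self-contained sketch of the pattern. It is not the PR's SchemaHashCache: the types are hypothetical, values are plain strings instead of StoredSchema protos, and the request coalescing is hand-rolled with the standard library rather than using golang.org/x/sync/singleflight.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is one cached schema keyed by its hash.
type entry struct {
	hash  string
	value string
}

// call tracks one in-flight load so concurrent callers can share its result.
type call struct {
	wg    sync.WaitGroup
	value string
	err   error
}

// hashCache is a toy LRU cache with request coalescing per hash.
type hashCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // hash -> element in order
	inflight map[string]*call         // hash -> load in progress
}

func newHashCache(capacity int) *hashCache {
	return &hashCache{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
		inflight: map[string]*call{},
	}
}

// getOrLoad returns the cached value for hash, or runs load at most once per
// hash at a time, even when many goroutines ask for it concurrently.
func (c *hashCache) getOrLoad(hash string, load func() (string, error)) (string, error) {
	c.mu.Lock()
	if el, ok := c.items[hash]; ok {
		c.order.MoveToFront(el)
		v := el.Value.(*entry).value
		c.mu.Unlock()
		return v, nil
	}
	if pending, ok := c.inflight[hash]; ok {
		c.mu.Unlock()
		pending.wg.Wait() // another goroutine is already loading this hash
		return pending.value, pending.err
	}
	cl := &call{}
	cl.wg.Add(1)
	c.inflight[hash] = cl
	c.mu.Unlock()

	cl.value, cl.err = load() // runs outside the lock

	c.mu.Lock()
	delete(c.inflight, hash)
	if cl.err == nil {
		c.items[hash] = c.order.PushFront(&entry{hash: hash, value: cl.value})
		if c.order.Len() > c.capacity {
			oldest := c.order.Back() // evict the least recently used entry
			c.order.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).hash)
		}
	}
	c.mu.Unlock()
	cl.wg.Done()
	return cl.value, cl.err
}

func main() {
	cache := newHashCache(2)
	loads := 0
	load := func() (string, error) { loads++; return "definition user {}", nil }
	for i := 0; i < 3; i++ {
		if _, err := cache.getOrLoad("hash1", load); err != nil {
			panic(err)
		}
	}
	fmt.Println("loads:", loads)
}
```

Keying by schema hash means an entry never goes stale, so the only invalidation needed in this sketch is LRU eviction. Failed loads are deliberately not cached, so a later call retries.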

Proxy/middleware updates:

  • Add ReadStoredSchema/WriteStoredSchema pass-through to all datastore proxies (observable, counting, readonly, singleflight, indexcheck, strictreplicated, checkingreplicated, relationshipintegrity, hashcache)
  • Update consistency middleware for new HeadRevision/OptimizedRevision signatures

Proto changes:

  • Add StoredSchema message to core.proto with V1StoredSchema containing schema_text, schema_hash, namespace_definitions, caveat_definitions

@github-actions github-actions bot added area/api v1 Affects the v1 API area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) area/dispatch Affects dispatching of requests labels Feb 25, 2026
@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch from 61106a9 to 2a945a4 Compare February 25, 2026 20:38
@codecov

codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 70.74236% with 335 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.92%. Comparing base (b554261) to head (0ccb2b1).

Files with missing lines | Patch % | Lines
internal/datastore/spanner/schema_chunker.go | 48.72% | 36 Missing and 4 partials ⚠️
pkg/datalayer/impl.go | 55.96% | 32 Missing and 5 partials ⚠️
internal/datastore/mysql/schema_chunker.go | 50.82% | 24 Missing and 6 partials ⚠️
pkg/datalayer/hashcache.go | 73.69% | 17 Missing and 8 partials ⚠️
internal/datastore/crdb/schema_chunker.go | 55.56% | 20 Missing and 4 partials ⚠️
internal/datastore/postgres/schema_chunker.go | 52.00% | 20 Missing and 4 partials ⚠️
internal/datastore/memdb/storedschema.go | 51.17% | 14 Missing and 7 partials ⚠️
internal/datastore/proxy/proxy_test/mock.go | 0.00% | 17 Missing ⚠️
pkg/datalayer/schema_adapter.go | 90.29% | 12 Missing and 5 partials ⚠️
internal/datastore/common/sqlschema.go | 82.76% | 7 Missing and 3 partials ⚠️
... and 19 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2924      +/-   ##
==========================================
+ Coverage   74.83%   74.92%   +0.10%     
==========================================
  Files         497      509      +12     
  Lines       60621    61508     +887     
==========================================
+ Hits        45359    46081     +722     
- Misses      12103    12224     +121     
- Partials     3159     3203      +44     

@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch 4 times, most recently from 4ca4824 to 5b0d403 Compare February 27, 2026 17:47
@josephschorr josephschorr marked this pull request as ready for review February 27, 2026 17:57
@josephschorr josephschorr requested a review from a team as a code owner February 27, 2026 17:57
@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch 3 times, most recently from fa119e6 to b1b272e Compare March 3, 2026 16:38
return definitions[i].GetName() < definitions[j].GetName()
})

schemaText, _, err := generator.GenerateSchema(definitions)
Contributor

i can't find a single place where the _ return value is actually used. for a follow up PR: remove it

Member Author

Follow up

Contributor

@miparnisari miparnisari left a comment

what happened to the flags?

@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch from b1b272e to 3ab0a24 Compare March 4, 2026 19:51
@josephschorr
Member Author

Updated

@github-actions github-actions bot added area/cli Affects the command line area/schema Affects the Schema Language labels Mar 4, 2026
@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch from 3ab0a24 to b98f4dd Compare March 4, 2026 20:14
@miparnisari
Contributor

Things that are missing, IMO:

  • update the existing flags, any deprecations?
  • observability? e.g. add spans around the new chunking handlers and around the cache layer. Maybe also span attributes, e.g. SchemaReadFromCache=true/false

@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch 3 times, most recently from a6c4fce to 479c1d5 Compare March 11, 2026 17:38
@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch from 479c1d5 to 53eb317 Compare March 12, 2026 18:06
@josephschorr josephschorr force-pushed the unified-schema-hash-v2 branch from 53eb317 to 0ccb2b1 Compare March 12, 2026 19:58
return "", err
}

// Sort the definitions by name for deterministic output
Contributor

@miparnisari miparnisari Mar 13, 2026

please update the godoc of SchemaText (and the interface's godoc too) to say that the definitions and caveats will be sorted in the response.
i'm not convinced this holds in the new storage implementation: i don't see where the sorting happens in the code path that uses WriteSchemaViaStoredSchema

Comment on lines +481 to +484
// WriteSchemaViaStoredSchema builds a StoredSchema proto and writes it via WriteStoredSchema.
// If cache is nil, a no-op cache is used.
func WriteSchemaViaStoredSchema(ctx context.Context, rwt datastore.ReadWriteTransaction,
definitions []datastore.SchemaDefinition, schemaString string, cache storedSchemaCache,
Contributor

@miparnisari miparnisari Mar 13, 2026

this function isn't doing any sorting. please amend the godoc to state the expectations around order. for example, must the caller sort definitions by name? must schemaString be sorted too? lastly, is the caller satisfying these prerequisites?
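
One way to make the ordering contract explicit is sketched below: either sort a copy defensively, or check the precondition and let the godoc state it. The `definition` type is a hypothetical stand-in for datastore.SchemaDefinition, and both helper names are assumptions for illustration rather than code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// definition is a hypothetical stand-in for datastore.SchemaDefinition.
type definition struct{ name string }

func (d definition) GetName() string { return d.name }

// sortedByName returns a copy of defs ordered by name, leaving the caller's
// slice untouched.
func sortedByName(defs []definition) []definition {
	out := make([]definition, len(defs))
	copy(out, defs)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].GetName() < out[j].GetName()
	})
	return out
}

// isSortedByName reports whether defs already satisfies the ordering
// precondition, which a writer could assert before hashing or storing.
func isSortedByName(defs []definition) bool {
	return sort.SliceIsSorted(defs, func(i, j int) bool {
		return defs[i].GetName() < defs[j].GetName()
	})
}

func main() {
	defs := []definition{{"user"}, {"document"}, {"group"}}
	fmt.Println(isSortedByName(defs), isSortedByName(sortedByName(defs)))
}
```

Sorting inside the writer costs an extra copy but removes the burden from every caller; checking only the precondition keeps the writer cheap but relies on the godoc being read.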

return nil, nil
}

return schema, nil
Contributor

you are missing a unit test that hits this line. Please add:

func TestSchemaHashCache_SlowPathCacheHit(t *testing.T) {
	shc := newSchemaHashCache(newTestSchemaCache())

	schema1 := makeTestSchema("definition v1 {}")
	schema2 := makeTestSchema("definition v2 {}")

	// Set hash1, then hash2 — latest now points to hash2
	err := shc.Set(SchemaHash("hash1"), schema1)
	require.NoError(t, err)
	err = shc.Set(SchemaHash("hash2"), schema2)
	require.NoError(t, err)

	shc.cache.Wait()

	// Get hash1 — latest is hash2, so fast path misses,
	// but backing cache should have it (slow path hit, line 95)
	retrieved, err := shc.get(SchemaHash("hash1"))
	require.NoError(t, err)
	require.NotNil(t, retrieved)
	require.Equal(t, "definition v1 {}", retrieved.Get().GetV1().SchemaText)
}

Comment on lines +76 to +77
txn := &fakeTransaction{}
executor := &fakeExecutor{transaction: txn}
Contributor

@miparnisari miparnisari Mar 13, 2026

I don't understand these unit tests. run them by hand and look at coverage: they cover zero lines of the mysql chunker code. you are testing a fakeTransaction and a fakeExecutor. shouldn't they be using mysqlTransactionAwareTransaction and mysqlRevisionAwareExecutor?

spanner/schema_chunker_test.go has the same issue

return fmt.Errorf("failed to generate canonical schema: %w", err)
}

// Compute SHA256 hash
Contributor

@miparnisari miparnisari Mar 13, 2026

here and in every other similar migration, use ComputeSchemaHash. also, this call appears twice: generator.GenerateSchema(allDefs)

actually, since ComputeSchemaHash generates the schema text as well, it could return that text so you can use it directly.

finally: please amend the godoc of generator.GenerateSchema and generator.GenerateSchemaWithCaveatTypeSet to say that they do NOT do any sorting and that callers are expected to sort
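
A minimal sketch of the suggestion above, rendering the canonical text once and returning it together with its hash. `renderSchema` is a toy stand-in for generator.GenerateSchema, definitions are plain strings sorted lexically rather than by name, and the hex encoding of the SHA-256 digest is an assumption; none of this reflects the real ComputeSchemaHash signature.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// renderSchema is a toy stand-in for generator.GenerateSchema: it joins the
// definitions in the order given and does no sorting of its own.
func renderSchema(defs []string) string {
	return strings.Join(defs, "\n\n")
}

// canonicalTextAndHash sorts a copy of defs, renders the text exactly once,
// and returns both the text and the hex-encoded SHA-256 of that text.
func canonicalTextAndHash(defs []string) (text string, hash string) {
	sorted := append([]string(nil), defs...)
	sort.Strings(sorted)
	text = renderSchema(sorted)
	sum := sha256.Sum256([]byte(text))
	return text, hex.EncodeToString(sum[:])
}

func main() {
	_, h1 := canonicalTextAndHash([]string{"definition user {}", "definition document {}"})
	_, h2 := canonicalTextAndHash([]string{"definition document {}", "definition user {}"})
	fmt.Println(h1 == h2)
}
```

Returning the text alongside the hash lets a migration write both without a second render, and sorting in one place keeps the hash independent of input order.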


func (r *revisionedReader) ReadSchema(ctx context.Context) (SchemaReader, error) {
	if r.schemaMode.ReadsFromNew() {
		return newStoredSchemaReaderAdapter(r.reader, r.schemaHash, r.rev, r.cache)
Contributor

@miparnisari miparnisari Mar 13, 2026

pass the ctx to newStoredSchemaReaderAdapter and use it in place of context.Background() in this call: ctx, span := tracer.Start(context.Background(), "ReadStoredSchema")

otherwise the span for ReadStoredSchema won't have the right parent:

(screenshot of the trace omitted)

(i also don't know why 1 call to CheckPermission requires 4 schema reads... i am using read-new-write-new)

@authzed authzed deleted a comment from miparnisari Mar 13, 2026

Labels

area/api v1 Affects the v1 API area/cli Affects the command line area/datastore Affects the storage system area/dispatch Affects dispatching of requests area/schema Affects the Schema Language area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)
