feat: add unified schema storage with ReadStoredSchema/WriteStoredSchema#2924
feat: add unified schema storage with ReadStoredSchema/WriteStoredSchema#2924josephschorr wants to merge 1 commit intoauthzed:mainfrom
Conversation
61106a9 to
2a945a4
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2924 +/- ##
==========================================
+ Coverage 74.83% 74.92% +0.10%
==========================================
Files 497 509 +12
Lines 60621 61508 +887
==========================================
+ Hits 45359 46081 +722
- Misses 12103 12224 +121
- Partials 3159 3203 +44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4ca4824 to
5b0d403
Compare
fa119e6 to
b1b272e
Compare
| return definitions[i].GetName() < definitions[j].GetName() | ||
| }) | ||
|
|
||
| schemaText, _, err := generator.GenerateSchema(definitions) |
There was a problem hiding this comment.
i can't find a single place where the _ return value is actually used. for a follow up PR: remove it
internal/datastore/crdb/migrations/zz_migration.0011_populate_schema_tables.go
Show resolved
Hide resolved
miparnisari
left a comment
There was a problem hiding this comment.
what happened to the flags?
b1b272e to
3ab0a24
Compare
|
Updated |
3ab0a24 to
b98f4dd
Compare
|
Things that are missing, IMO:
|
a6c4fce to
479c1d5
Compare
479c1d5 to
53eb317
Compare
Introduce the foundation for unified schema storage, where the entire compiled schema (definitions + schema text + hash) is stored as a single serialized proto blob, chunked across rows for SQL datastores. Datastore interface changes: - Add ReadStoredSchema(ctx, SchemaHash) to Reader interface - Add WriteStoredSchema(ctx, *StoredSchema) to ReadWriteTransaction - Add SchemaHash type with sentinel values for cache bypass DataLayer interface changes: - SnapshotReader now takes (Revision, SchemaHash) to thread cache keys - OptimizedRevision/HeadRevision return SchemaHash alongside Revision - Add SchemaMode configuration for migration path (legacy → dual → new) - Add storedSchemaReaderAdapter for reading from StoredSchema protos - Add writeSchemaViaStoredSchema for building and writing StoredSchema Storage implementation: - Add SQLByteChunker generic chunked blob storage for SQL datastores - Add SQLSingleStoreSchemaReaderWriter for read/write of StoredSchema - Add per-datastore ReadStoredSchema/WriteStoredSchema implementations for postgres, crdb, mysql, spanner, and memdb - Add schema table migrations for all SQL datastores - Add populate migrations to backfill from legacy namespace/caveat tables - Rename MySQL schema table to stored_schema (schema is a reserved word) Caching: - Add SchemaHashCache with LRU + singleflight for schema-by-hash lookups Proxy/middleware updates: - Add ReadStoredSchema/WriteStoredSchema pass-through to all datastore proxies (observable, counting, readonly, singleflight, indexcheck, strictreplicated, checkingreplicated, relationshipintegrity, hashcache) - Update consistency middleware for new HeadRevision/OptimizedRevision signatures Proto changes: - Add StoredSchema message to core.proto with V1StoredSchema containing schema_text, schema_hash, namespace_definitions, caveat_definitions
53eb317 to
0ccb2b1
Compare
| return "", err | ||
| } | ||
|
|
||
| // Sort the definitions by name for deterministic output |
There was a problem hiding this comment.
please update the godoc of SchemaText (and of the interface's godoc too) to say that the defs and caveats will be sorted in the response.
i'm not convinced that in the new storage implementation this is the case. i don't see where the sorting is happening in the code path that uses WriteSchemaViaStoredSchema
| // WriteSchemaViaStoredSchema builds a StoredSchema proto and writes it via WriteStoredSchema. | ||
| // If cache is nil, a no-op cache is used. | ||
| func WriteSchemaViaStoredSchema(ctx context.Context, rwt datastore.ReadWriteTransaction, | ||
| definitions []datastore.SchemaDefinition, schemaString string, cache storedSchemaCache, |
There was a problem hiding this comment.
this function isn't doing any sorting. please amend the godoc to say what are the expectations around order. for example, must the caller sort definitions by name? must schemaString be sorted too? lastly, is the caller satisfying these pre-requisites?
| return nil, nil | ||
| } | ||
|
|
||
| return schema, nil |
There was a problem hiding this comment.
you are missing a unit test that hits this line. Please add:
func TestSchemaHashCache_SlowPathCacheHit(t *testing.T) {
shc := newSchemaHashCache(newTestSchemaCache())
schema1 := makeTestSchema("definition v1 {}")
schema2 := makeTestSchema("definition v2 {}")
// Set hash1, then hash2 — latest now points to hash2
err := shc.Set(SchemaHash("hash1"), schema1)
require.NoError(t, err)
err = shc.Set(SchemaHash("hash2"), schema2)
require.NoError(t, err)
shc.cache.Wait()
// Get hash1 — latest is hash2, so fast path misses,
// but backing cache should have it (slow path hit, line 95)
retrieved, err := shc.get(SchemaHash("hash1"))
require.NoError(t, err)
require.NotNil(t, retrieved)
require.Equal(t, "definition v1 {}", retrieved.Get().GetV1().SchemaText)
}| txn := &fakeTransaction{} | ||
| executor := &fakeExecutor{transaction: txn} |
There was a problem hiding this comment.
I don't understand these unit tests. run them by hand and look at coverage: they cover zero lines of the mysql chunker code. you are testing a fakeTx and a fakeExecutor. shouldn't be using mysqlTransactionAwareTransaction and mysqlRevisionAwareExecutor?
spanner/schema_chunker_test.go has the same issue
| return fmt.Errorf("failed to generate canonical schema: %w", err) | ||
| } | ||
|
|
||
| // Compute SHA256 hash |
There was a problem hiding this comment.
here and in every other similar migration, use ComputeSchemaHash. also, this line appears twice generator.GenerateSchema(allDefs)
actually, since ComputeSchemaHash is generating the schema text as well, you could return it and use it directly.
finally: please amend the godoc of generator.GenerateSchema and generator.GenerateSchemaWithCaveatTypeSet to say that it does NOT do any sorting and callers are expected to do that
|
|
||
| func (r *revisionedReader) ReadSchema(ctx context.Context) (SchemaReader, error) { | ||
| if r.schemaMode.ReadsFromNew() { | ||
| return newStoredSchemaReaderAdapter(r.reader, r.schemaHash, r.rev, r.cache) |
There was a problem hiding this comment.
pass the ctx to newStoredSchemaReaderAdapter and then use it in here: ctx, span := tracer.Start(context.Background(), "ReadStoredSchema")
otherwise the span for ReadStoredSchema won't have the right parent:
(i also don't know why 1 call to CheckPermission requires 4 schema reads... i am using read-new-write-new)
Introduce the foundation for unified schema storage, where the entire compiled schema (definitions + schema text + hash) is stored as a single serialized proto blob, chunked across rows for SQL datastores.
Datastore interface changes:
DataLayer interface changes:
Storage implementation:
Caching:
Proxy/middleware updates:
Proto changes: