fix(test): regenerate corrupted db fixtures and fix grpc server startup race #437
Merged
Use a persistent database (`SyncMode::Durable`) instead of in-memory (`SyncMode::UtterlyNoSync`) for generating test DB snapshots, and properly stop the node before archiving.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gRPC server was spawning `serve_with_shutdown` on a background task and returning immediately, creating a race condition where clients could attempt to connect before the server had bound its port. This was masked by the `MDBX_CORRUPTED` error but became visible once the database fixtures were fixed.

Bind the TCP listener eagerly before spawning the server task, then pass the already-bound listener via `serve_with_incoming_shutdown`. This guarantees the server is accepting connections when `start()` returns and also correctly resolves port 0 to the actual assigned port.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `katana-grpc` tests in CI have been consistently failing. The root cause turned out to be two separate bugs, one masking the other.

Bug 1: Corrupted database fixtures

The `generate_migration_db` binary that produces the `spawn_and_move.tar.gz` and `simple.tar.gz` fixture archives was creating the database via `Db::in_memory()`, which uses `SyncMode::UtterlyNoSync`, a mode that does not guarantee committed data is flushed to disk. The binary then archived the database files while the MDBX environment was still open and the node still running, capturing an inconsistent on-disk state. This manifested as an `MDBX_CORRUPTED` error that prevented the node from starting.

The fix switches `generate_migration_db` to use a persistent database directory with `SyncMode::Durable`, stops the node before archiving, and excludes the non-portable `mdbx.lck` lock file from archives. The fixture archives have been regenerated on Linux x86_64 inside the `ghcr.io/dojoengine/katana-dev:latest` Docker container.

Two regression tests (`open_spawn_and_move_db_fixture`, `open_simple_db_fixture`) verify the fixtures can be opened without corruption.

Bug 2: gRPC server startup race condition

Once the database corruption was fixed, the gRPC tests still failed with `Connection refused`. The gRPC server's `start()` method was spawning `serve_with_shutdown` on a background task and returning immediately, before the TCP port was bound. The test's `GrpcClient::connect()` fired before the server was listening. This was always a latent bug, but it was masked by the `MDBX_CORRUPTED` error, which prevented the node from starting at all.

The fix binds the `TcpListener` eagerly before spawning the server task, then passes it via `serve_with_incoming_shutdown`. This guarantees the server is accepting connections when `start()` returns and correctly resolves port 0 to the actual assigned port, matching how the RPC server already works.
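
To illustrate the bind-before-spawn ordering described above, here is a minimal sketch using only the Rust standard library. It is not the katana or tonic code: `ServerHandle` and this `start` function are hypothetical names, and a one-shot echo thread stands in for the gRPC server task. The point it shows is that once `bind` has returned, the OS already queues incoming connections, so a client that connects right after `start()` returns is not refused, and `local_addr()` reports the real port when port 0 was requested.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread::{self, JoinHandle};

/// Hypothetical handle returned by `start`; not a katana type.
pub struct ServerHandle {
    /// Address actually bound, with port 0 already resolved by the OS.
    pub addr: SocketAddr,
    /// Background thread that serves exactly one connection.
    pub worker: JoinHandle<io::Result<()>>,
}

/// Binds the listener eagerly, then hands it to a background thread.
/// Because binding happens before this function returns, callers can
/// connect immediately without racing the server thread.
pub fn start(requested: &str) -> io::Result<ServerHandle> {
    // Bind on the caller's thread so any bind error surfaces here.
    let listener = TcpListener::bind(requested)?;
    // Resolve the real address (matters when the requested port is 0).
    let addr = listener.local_addr()?;

    // Move the already-bound listener into the worker. In the PR, the
    // analogous step passes the listener to `serve_with_incoming_shutdown`.
    let worker = thread::spawn(move || -> io::Result<()> {
        let (mut stream, _peer) = listener.accept()?;
        let mut buf = [0u8; 4];
        stream.read_exact(&mut buf)?;
        stream.write_all(&buf)?; // echo the four bytes back
        Ok(())
    });

    Ok(ServerHandle { addr, worker })
}

fn main() -> io::Result<()> {
    let server = start("127.0.0.1:0")?;
    let addr = server.addr;

    // Connect straight away; the listener is already bound.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"ping")?;
    let mut reply = [0u8; 4];
    client.read_exact(&mut reply)?;
    assert_eq!(&reply, b"ping");

    server.worker.join().expect("server thread panicked")?;
    println!("echoed via {addr}");
    Ok(())
}
```

If the `TcpListener::bind` call were moved inside the spawned closure instead, `start` could return before the socket exists, which is the ordering that produced `Connection refused` in the tests.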