clusterd: Unified Timely runtime prototype #34713

antiguru · 2026-01-15T13:37:23Z

Summary

This PR implements a prototype for unifying the compute and storage Timely runtimes in clusterd, as described in the design document.

Key changes:

New unified worker architecture: Single step_or_park loop coordinating both compute and storage domains
Multiplexed command/response types: ClusterCommand and ClusterResponse enums that wrap compute and storage messages
ClusterSpec implementation: UnifiedConfig implements the cluster trait for the unified runtime
Public API changes: Made necessary types/methods public in compute and storage crates
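
As a rough illustration of the multiplexed types listed above, the following sketch wraps per-domain commands in a single enum and splits them again on the worker side. `ComputeCommand`, `StorageCommand`, their variants, and the `demux` helper are simplified stand-ins assumed for this example; they are not the definitions from the PR.

```rust
// Hypothetical, simplified stand-ins for the real compute and storage
// protocol types; the variants are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ComputeCommand {
    CreateDataflow(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum StorageCommand {
    RunIngestion(String),
}

/// A unified command that carries either a compute or a storage message.
#[derive(Debug, Clone, PartialEq)]
pub enum ClusterCommand {
    Compute(ComputeCommand),
    Storage(StorageCommand),
}

impl From<ComputeCommand> for ClusterCommand {
    fn from(c: ComputeCommand) -> Self {
        ClusterCommand::Compute(c)
    }
}

impl From<StorageCommand> for ClusterCommand {
    fn from(c: StorageCommand) -> Self {
        ClusterCommand::Storage(c)
    }
}

/// Splits a batch of unified commands by domain, preserving the relative
/// order of commands within each domain.
pub fn demux(commands: Vec<ClusterCommand>) -> (Vec<ComputeCommand>, Vec<StorageCommand>) {
    let mut compute = Vec::new();
    let mut storage = Vec::new();
    for command in commands {
        match command {
            ClusterCommand::Compute(c) => compute.push(c),
            ClusterCommand::Storage(s) => storage.push(s),
        }
    }
    (compute, storage)
}
```

A `ClusterResponse` enum would presumably mirror this shape in the opposite direction.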

Files added:

src/clusterd/src/unified_config.rs - Configuration and multiplexed types
src/clusterd/src/unified_worker.rs - Unified worker implementation

Current limitations (prototype):

Storage internal command handling is stubbed (logs commands but doesn't render dataflows)
No unified introspection/logging infrastructure yet
Uses separate internal channels rather than GRPC multiplexing

Next steps:

Refactor storage Worker::handle_internal_storage_command to be callable externally
Wire up unified worker to be usable via clusterd flag
Add unified introspection infrastructure

Test plan

cargo check -p mz-clusterd passes
Integration testing with actual compute/storage workloads (future work)

🤖 Generated with Claude Code

This design document proposes unifying the compute and storage Timely runtimes in clusterd into a single runtime.

Key aspects:
- Single step_or_park loop coordinating both domains
- Unified introspection via shared logging infrastructure
- Compute-style command distribution for external commands
- Separate controller connections initially, with optional multiplexing

Co-Authored-By: Claude Opus 4.5 <[email protected]>

This commit implements a prototype for running both compute and storage dataflows within a single Timely runtime, as described in the design doc.

Key components:
- UnifiedConfig: Configuration holding shared resources (persist clients, txns context, tracing) plus compute and storage specific configuration
- ClusterCommand/ClusterResponse: Multiplexed command/response enums
- UnifiedWorker: Merged event loop coordinating both domains through a single step_or_park call
- Channel adapter: Demultiplexes unified commands to compute and storage

Changes to make types accessible:
- Made compute modules public: command_channel, compute_state, metrics
- Made storage setup_command_sequencer public
- Made compute WorkerMetrics.timely_step_duration_seconds public
- Made compute ActiveComputeState methods public (report_metrics, check_expiration)
- Added Receiver::is_empty() to compute command_channel

Limitations (noted in code):
- Storage internal command handling is stubbed; full implementation requires refactoring storage Worker::handle_internal_storage_command
- No unified introspection/logging infrastructure yet
- Uses separate channels rather than GRPC multiplexing

Co-Authored-By: Claude Opus 4.5 <[email protected]>
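
The merged event loop can be pictured with a small standard-library sketch: each iteration drains pending commands for both domains without blocking and only suggests parking when neither domain had work. `UnifiedWorkerSketch`, `step_once`, the string-typed commands, and the 10 ms park hint are assumptions for illustration. The sketch does not call into Timely; in the real worker the returned hint would presumably feed a single `step_or_park` call.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

/// Hypothetical stand-in for the unified worker; commands are plain strings.
pub struct UnifiedWorkerSketch {
    compute_rx: Receiver<String>,
    storage_rx: Receiver<String>,
    pub compute_handled: Vec<String>,
    pub storage_handled: Vec<String>,
}

impl UnifiedWorkerSketch {
    pub fn new(compute_rx: Receiver<String>, storage_rx: Receiver<String>) -> Self {
        Self {
            compute_rx,
            storage_rx,
            compute_handled: Vec::new(),
            storage_handled: Vec::new(),
        }
    }

    /// Drains a channel without blocking and returns how many commands were seen.
    fn drain(rx: &Receiver<String>, out: &mut Vec<String>) -> usize {
        let mut handled = 0;
        while let Ok(command) = rx.try_recv() {
            out.push(command);
            handled += 1;
        }
        handled
    }

    /// Runs one loop iteration over both domains.
    ///
    /// Returns `None` if any command was handled (step again immediately),
    /// or `Some(duration)` as a park hint when both domains were idle.
    pub fn step_once(&mut self) -> Option<Duration> {
        let handled = Self::drain(&self.compute_rx, &mut self.compute_handled)
            + Self::drain(&self.storage_rx, &mut self.storage_handled);
        if handled == 0 {
            Some(Duration::from_millis(10))
        } else {
            None
        }
    }
}
```

The point of the single loop is that one idle domain never parks the thread while the other still has work queued.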

Refactors the storage internal command handling logic from a Worker method into a free function that can be called from external code. This enables the unified worker to properly handle storage internal commands including dataflow rendering.

Changes:
- Add public handle_internal_storage_command<A: Allocate>() function that takes (timely_worker, storage_state, command) as separate parameters
- Update Worker::handle_internal_storage_command to delegate to the new free function
- Re-export handle_internal_storage_command from mz_storage
- Update unified_worker.rs to use the new function instead of stubs

The unified worker can now render storage dataflows (ingestions, sinks, oneshot ingestions) through the shared Timely runtime.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
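
The shape of this refactoring, a method that delegates to a free function taking its state as separate parameters, might look like the sketch below. `TimelyWorkerStub`, `StorageState`, and `InternalCommandSketch` are hypothetical placeholders; the real function is generic over Timely's `Allocate` and operates on the actual storage types.

```rust
/// Hypothetical placeholder for the Timely worker handle.
#[derive(Debug, Default)]
pub struct TimelyWorkerStub {
    pub dataflows_installed: usize,
}

/// Hypothetical placeholder for the storage worker state.
#[derive(Debug, Default)]
pub struct StorageState {
    pub rendered: Vec<String>,
}

/// Hypothetical internal command; the variants are assumptions for this sketch.
#[derive(Debug)]
pub enum InternalCommandSketch {
    RunIngestion(String),
    Drop(String),
}

/// Free function: callable by any code that owns a worker handle and a
/// storage state, not only by `Worker`.
pub fn handle_internal_storage_command(
    timely_worker: &mut TimelyWorkerStub,
    storage_state: &mut StorageState,
    command: InternalCommandSketch,
) {
    match command {
        InternalCommandSketch::RunIngestion(id) => {
            // Stand-in for rendering a dataflow on the shared runtime.
            timely_worker.dataflows_installed += 1;
            storage_state.rendered.push(id);
        }
        InternalCommandSketch::Drop(id) => {
            storage_state.rendered.retain(|existing| existing != &id);
        }
    }
}

pub struct Worker {
    pub timely_worker: TimelyWorkerStub,
    pub storage_state: StorageState,
}

impl Worker {
    pub fn new() -> Self {
        Self {
            timely_worker: TimelyWorkerStub::default(),
            storage_state: StorageState::default(),
        }
    }

    /// The method keeps its signature and simply delegates.
    pub fn handle_internal_storage_command(&mut self, command: InternalCommandSketch) {
        handle_internal_storage_command(&mut self.timely_worker, &mut self.storage_state, command)
    }
}
```

Passing the worker handle and state as separate borrows is what lets a caller outside `Worker` reuse the logic without constructing a `Worker` of its own.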

This commit adds the ability to run clusterd with a unified Timely runtime for both compute and storage via the new --unified-runtime flag.

Key changes:
- Add serve() function to unified_config that creates client builders for both compute and storage that share the same underlying cluster
- Add ComputeAdapter and StorageAdapter to translate between the specific protocols and the unified ClusterCommand/ClusterResponse
- Implement Partitionable for the unified command/response pair
- Add --unified-runtime CLI flag to clusterd
- Update run() to conditionally use unified or separate runtimes

When --unified-runtime is enabled, both compute and storage controllers connect to the same underlying Timely cluster, reducing resource usage by eliminating one of the two Timely runtimes.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
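
One way to read the adapter layer: each adapter wraps its protocol's outgoing commands into the unified type and accepts only its own domain's responses on the way back. The sketch below assumes string payloads and hypothetical method names (`wrap`, `unwrap_response`); the adapters in the PR translate between the real protocol types and may be structured differently.

```rust
// Simplified unified types with string payloads (an assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum ClusterCommand {
    Compute(String),
    Storage(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum ClusterResponse {
    Compute(String),
    Storage(String),
}

/// Hypothetical compute-side adapter.
pub struct ComputeAdapter;

impl ComputeAdapter {
    /// Wraps a compute command for the shared cluster.
    pub fn wrap(&self, command: String) -> ClusterCommand {
        ClusterCommand::Compute(command)
    }

    /// Returns the payload only if the response belongs to compute.
    pub fn unwrap_response(&self, response: ClusterResponse) -> Option<String> {
        match response {
            ClusterResponse::Compute(r) => Some(r),
            ClusterResponse::Storage(_) => None,
        }
    }
}

/// Hypothetical storage-side adapter, symmetric to the compute one.
pub struct StorageAdapter;

impl StorageAdapter {
    /// Wraps a storage command for the shared cluster.
    pub fn wrap(&self, command: String) -> ClusterCommand {
        ClusterCommand::Storage(command)
    }

    /// Returns the payload only if the response belongs to storage.
    pub fn unwrap_response(&self, response: ClusterResponse) -> Option<String> {
        match response {
            ClusterResponse::Storage(r) => Some(r),
            ClusterResponse::Compute(_) => None,
        }
    }
}
```

With this split, each controller keeps speaking its own protocol while the cluster behind the adapters sees only the unified types.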

This commit wires up the unified runtime feature flag through the orchestrator to clusterd.

Key changes:
- Add ENABLE_UNIFIED_RUNTIME dyncfg in controller-types/dyncfgs.rs
- Pass --unified-runtime flag to clusterd when the dyncfg is enabled
- The flag is read at replica provision time and passed to all managed cluster replicas

To enable the unified runtime:
ALTER SYSTEM SET enable_unified_runtime = true;

Note: This requires restarting cluster replicas to take effect.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
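
The provision-time behavior described here can be sketched as a pure function that appends the flag when the setting is on. `clusterd_args` and its parameters are hypothetical names for this illustration; only the `--unified-runtime` flag itself comes from the commit message.

```rust
/// Builds the clusterd argument list at replica provision time.
///
/// `base` holds whatever arguments the orchestrator would pass anyway, and
/// `enable_unified_runtime` stands in for the value of the dyncfg as read
/// when the replica is provisioned.
pub fn clusterd_args(base: &[&str], enable_unified_runtime: bool) -> Vec<String> {
    let mut args: Vec<String> = base.iter().map(|arg| arg.to_string()).collect();
    if enable_unified_runtime {
        args.push("--unified-runtime".to_string());
    }
    args
}
```

Because the value is read once at provision time, a replica that is already running keeps its original arguments, which matches the note that replicas must be restarted for the setting to take effect.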

Add test-unified-runtime workflow to test/cluster/mzcompose.py that verifies the unified Timely runtime feature works correctly.

The test:
- Enables the enable_unified_runtime dyncfg
- Creates a managed cluster (provisioned with --unified-runtime)
- Tests compute workloads: tables, materialized views, indexes
- Tests storage workloads: Postgres source with ongoing replication
- Tests compute on storage data: MVs reading from Postgres source
- Tests joins between tables and Postgres source data
- Verifies ongoing replication updates propagate correctly

Co-Authored-By: Claude Opus 4.5 <[email protected]>

antiguru and others added 6 commits January 15, 2026 13:44
