Design: replica capacity #31304

Open

antiguru wants to merge 1 commit into MaterializeInc:main from antiguru:design_replica_capacity

Member

antiguru commented Feb 6, 2025 •


Design for a replica capacity measurement.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.


          Initial replica capacity design

9b3e136

Signed-off-by: Moritz Hoffmann <mh@materialize.com>

mgree reviewed

Contributor

mgree left a comment

Looks good to me. As a document, maybe a touch more abstract than we need it? If we're treating disk as elastic memory, do we really mean any resource other than memory? Can CPU starvation cause a cluster to not come up at all, or just not within a timeframe we're happy with?

doc/developer/design/20250206_replica_capacity.md

+              -->
+              A core problem of using Materialize today is that users cannot rely on a configuration that works today to continue working in the future.
+              We give indications about a workload's status, but it is easy to ignore them, or drift into an unsupported configuration by workload changes or unintended DDL operations.

Contributor

mgree Feb 12, 2025

Suggested change

      
- We give indications about a workload's status, but it is easy to ignore them, or drift into an unsupported configuration by workload changes or unintended DDL operations.
+ We give indications about a workload's status, but it is easy to ignore them, misunderstand them, or drift into an unsupported configuration by workload changes or unintended DDL operations.

doc/developer/design/20250206_replica_capacity.md

+              At the moment, we present detailed metrics, such as memory, CPU and disk utilization, for replicas, with the hope that the metrics successfully characterize the health of a replica, and allow users to make scaling decisions.
+              While the metrics are useful from an operational perspective, they are not suitable to predict the future behavior of a replica.
+              Specifically, they are a bad predictor for whether a replica can successfully restart, since the resource utilization during restart can be different from steady-state.

Contributor

mgree Feb 12, 2025

Suggested change

      
- Specifically, they are a bad predictor for whether a replica can successfully restart, since the resource utilization during restart can be different from steady-state.
+ Specifically, memory/disk utilization on their own are a bad/easy-to-misunderstand predictor for whether a replica can successfully restart, since the resource utilization during restart is almost certainly different from steady-state.

doc/developer/design/20250206_replica_capacity.md

+              In the absence of advance analysis of query plans, it's better to look at observed metrics.
+              We know the steady-state memory utilization and ignore other signals like the size of its inputs.
+              The required resources are within a factor of two from the steady-state utilization.

Contributor

mgree Feb 12, 2025

Suggested change

      
- The required resources are within a factor of two from the steady-state utilization.
+ The memory resources required to restart and hydrate a replica are within a factor of two from the steady-state utilization.

doc/developer/design/20250206_replica_capacity.md

+              ### Materialized views
+              Materialized views suffer from the same problem as indexes, but they do not necessarily maintain their output in memory.
+              The resource requirements can be approximated as within a factor of two of its steady-state plus the output size.

Contributor

mgree Feb 12, 2025

Suggested change

      
- The resource requirements can be approximated as within a factor of two of its steady-state plus the output size.
+ The memory resource requirements to restart and hydrate can be approximated as within a factor of two of its steady-state plus the output size.
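
For readers following the thread, the two per-object estimates discussed so far can be written down as a minimal sketch. This is not code from the PR. It assumes the factor of two is used as an upper bound, and it reads the materialized-view estimate as `2 * steady_state + output_size` (the wording would also allow `2 * (steady_state + output_size)`). The names `ObjectStats`, `index_restart_bound`, and `materialized_view_restart_bound` are hypothetical, and all quantities are memory in bytes.

```python
from dataclasses import dataclass

# Assumption: the "factor of two" hypothesis is applied as an upper bound.
HYDRATION_FACTOR = 2


@dataclass
class ObjectStats:
    """Observed memory figures for one object (hypothetical structure)."""
    steady_state_bytes: int
    output_bytes: int = 0  # only meaningful for materialized views


def index_restart_bound(stats: ObjectStats) -> int:
    """Upper bound on memory needed to restart and hydrate an index."""
    return HYDRATION_FACTOR * stats.steady_state_bytes


def materialized_view_restart_bound(stats: ObjectStats) -> int:
    """Upper bound for a materialized view: twice steady state, plus output size.

    Assumption: the output size is added once and is not doubled.
    """
    return HYDRATION_FACTOR * stats.steady_state_bytes + stats.output_bytes
```

Under this reading, an index with 4 GiB of steady-state memory is budgeted 8 GiB for restart, and a materialized view with the same steady state and a 1 GiB output is budgeted 9 GiB.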

doc/developer/design/20250206_replica_capacity.md


		### Sources and sinks

		@antiguru cannot say much about non-compute objects. :/

Contributor

mgree Feb 12, 2025

One hopes that the memory required is linear in the snapshot size?

doc/developer/design/20250206_replica_capacity.md


		### Combining resource requirements

		Once we know the resource requirements of each object we can estimate the resources required to successfully restart a replica.

Contributor

mgree Feb 12, 2025

You say "resource" but do you really mean anything other than memory?
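
One possible reading of "combining" is sketched below. It assumes that memory is the only resource being modeled (the question raised in this comment) and that all objects on a replica hydrate concurrently, so their per-object bounds simply add up. Both are assumptions rather than statements from the design doc, and `replica_can_restart` with its parameters is a hypothetical name.

```python
from typing import Iterable, Tuple


def replica_can_restart(
    object_bounds_bytes: Iterable[int],
    replica_memory_bytes: int,
) -> Tuple[bool, int]:
    """Return (fits, required_bytes) for a replica.

    Assumption: every object hydrates at the same time, so the restart
    requirement is the sum of the per-object memory bounds.
    """
    required = sum(object_bounds_bytes)
    return required <= replica_memory_bytes, required
```

For example, per-object bounds of 8, 9, and 2 GiB add up to 19 GiB, which would not fit a replica with 16 GiB of memory under this model.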

doc/developer/design/20250206_replica_capacity.md

+              like to skip or delay it.
+              -->
+              Before implementing any of this idea in code, we can validate the hypotheses by observing how Materialize behaves in production.

Contributor

mgree Feb 12, 2025

Are these questions we can answer using historical data?

doc/developer/design/20250206_replica_capacity.md

+              -->
+              * Should we have a finer-grained model that can take sequential hydration into account?
+                While possible, it would require us to exercise more control over what gets hydrated in which order, which I think is a separate problem.

Contributor

mgree Feb 12, 2025

Yes: rehydration is a capacity scheduling problem.
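
To illustrate why hydration order matters for capacity, here is a small sketch that compares peak memory when everything hydrates at once against hydrating one object at a time. The model is an assumption made for illustration: each object needs `hydration` bytes while it hydrates and falls back to `steady` bytes afterwards. The function names and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Each object is (steady_bytes, hydration_bytes); a hypothetical model.
Obj = Tuple[int, int]


def peak_concurrent(objects: List[Obj]) -> int:
    """Peak memory if all objects hydrate at the same time."""
    return sum(hydration for _, hydration in objects)


def peak_sequential(objects: List[Obj]) -> int:
    """Peak memory if objects hydrate one after another, in list order.

    Already-hydrated objects stay resident at their steady-state size
    while the next object hydrates.
    """
    resident = 0
    peak = 0
    for steady, hydration in objects:
        peak = max(peak, resident + hydration)
        resident += steady
    return peak
```

With objects `[(4, 8), (2, 4), (1, 2)]`, the concurrent peak is 14 units while the sequential peak is 8, which is the kind of gap a finer-grained model would capture at the cost of controlling hydration order.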
