Problem: a customer confirmed that CD convergence is still slow for large node counts -- see #816.

Solutions: in November 2025, after our last round of improvements, we noted two solution strategies to look into once necessary:
> Any kind of sharding on a per-clique level may be super useful. And/or server-side applies.
We went ahead with server-side apply (SSA) in #822. Subsequently, we measured that SSA-based convergence actually performs and scales worse than the pre-SSA convergence method. We then started to explore per-clique sharding in #826; initial measurements suggest that it yields the desired scaling behavior. A selection of measurement results is shown in the plot below.
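For context, here is a minimal sketch (in Go, against the dynamic Kubernetes client) of the write pattern the SSA-based method relies on: each CD daemon applies only its own entry of the CD's nodes list under its own field manager, and conflict-free merging additionally requires that list to be declared as a map-keyed list (`x-kubernetes-list-type: map`) in the CRD schema. The GVR, namespace, object name, field names, and the use of the status subresource are illustrative assumptions, not the driver's actual code.

```go
// Sketch: a single CD daemon server-side-applies only the nodes-list entry
// it owns. Assumes the CRD declares the list as x-kubernetes-list-type: map
// (keyed by "name"), so entries written by different field managers merge
// without conflicts. GVR, namespace, and field names are assumptions.
package main

import (
	"context"
	"encoding/json"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical GVR for the ComputeDomain CRD.
	gvr := schema.GroupVersionResource{
		Group:    "resource.nvidia.com",
		Version:  "v1beta1",
		Resource: "computedomains",
	}

	// Apply configuration containing only the fields this daemon owns:
	// one element of the nodes list, keyed by this daemon's node name.
	patch := map[string]interface{}{
		"apiVersion": "resource.nvidia.com/v1beta1",
		"kind":       "ComputeDomain",
		"metadata":   map[string]interface{}{"name": "cd-example"},
		"status": map[string]interface{}{
			"nodes": []interface{}{
				map[string]interface{}{
					"name":      "node-42",
					"cliqueID":  "clique-7",
					"ipAddress": "10.0.0.42",
				},
			},
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		log.Fatal(err)
	}

	// One field manager per daemon; Force claims ownership of this entry.
	force := true
	_, err = client.Resource(gvr).Namespace("nvidia-dra-driver").Patch(
		context.TODO(), "cd-example", types.ApplyPatchType, data,
		metav1.PatchOptions{FieldManager: "cd-daemon-node-42", Force: &force},
		"status",
	)
	if err != nil {
		log.Fatal(err)
	}
}
```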
Plot: convergence time over node count for the different convergence methods (log-log axes).
Main conclusions:
- The gray data points correspond to the pre-SSA method -- they fall roughly on a straight line in the log-log plot, i.e. convergence time grows as a power law (polynomially) in the node count.
- The magenta samples confirm that per-clique sharding (as proposed in #826, "Maintain CD daemon info in a new per-CD, per-clique ComputeDomainClique object") can indeed lead to quasi-constant-time scaling behavior (see the hypothetical type sketch after this list).
- The dashed lines represent SSA / SSA-with-fixes; they show that SSA performs worse than the pre-SSA method for anything beyond a handful of nodes (we had tested the SSA patch #822, "CD daemon: use SSA for conflict-free nodes list updates", with just four nodes before merging).
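For illustration, here is a hypothetical sketch of a per-CD, per-clique object along the lines of what #826 proposes (all type and field names are assumptions, not the actual API in that PR). The point of the sharding is that daemons in different cliques write to disjoint objects, so the write rate per object is bounded by clique size rather than by total node count:

```go
// Hypothetical Go types for a per-CD, per-clique object, to illustrate the
// sharding idea behind #826. Field names are assumptions.
package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ComputeDomainClique holds the CD daemon info for all nodes of a single
// clique within a single ComputeDomain. One such object exists per
// (ComputeDomain, clique) pair.
type ComputeDomainClique struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Nodes lists the CD daemons that have announced themselves for this
	// clique. Each daemon only ever updates its own entry, so contention
	// is limited to the (small) set of daemons sharing a clique.
	Nodes []ComputeDomainNode `json:"nodes,omitempty"`
}

// ComputeDomainNode describes a single CD daemon.
type ComputeDomainNode struct {
	Name      string `json:"name"`
	IPAddress string `json:"ipAddress"`
	CliqueID  string `json:"cliqueID"`
}
```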
Caveats:
- The measurements were taken in a simulation environment (#827, "WIP: simulate large node counts: run multiple dummy CD daemons per node"). A real environment may have relevant bottlenecks that the simulation does not cover -- unlikely, but it has to be pointed out.
- We still need to measure larger N -- towards O(10^4). The per-clique sharding technique effectively relies on being able to issue thousands of independent write requests per second to the API server. The API server appears to keep up, but depending on how exactly it is deployed and on other workload in the cluster, there is a natural point of contention.
- Each data point above corresponds to a single measurement. There is of course variance across repetitions, which we did not thoroughly quantify. The main conclusions about the scaling behavior of the different methods are nevertheless likely to be robust. In the future, we'll measure variance through repetitions ("Eine Messung ist keine Messung" -- one measurement is no measurement); a minimal sketch of such a repetition harness follows this list.
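On the repetition caveat: a minimal sketch of how variance could be quantified in the simulation harness, where measureConvergence is a placeholder for whatever the harness actually times (not an existing function):

```go
// Sketch: repeat the same convergence measurement R times and report mean
// and standard deviation, instead of relying on a single sample per node
// count. measureConvergence is a placeholder for the simulation harness.
package main

import (
	"fmt"
	"math"
	"time"
)

// measureConvergence would spin up `nodes` dummy CD daemons and time how
// long it takes until the CD reports all of them as ready. Placeholder only.
func measureConvergence(nodes int) time.Duration {
	return time.Duration(nodes) * time.Millisecond
}

func main() {
	const repetitions = 10
	nodes := 1024

	samples := make([]float64, repetitions)
	sum := 0.0
	for i := range samples {
		samples[i] = measureConvergence(nodes).Seconds()
		sum += samples[i]
	}
	mean := sum / repetitions

	// Sample standard deviation across the repetitions.
	sq := 0.0
	for _, s := range samples {
		sq += (s - mean) * (s - mean)
	}
	stddev := math.Sqrt(sq / (repetitions - 1))

	fmt.Printf("N=%d: mean=%.3fs stddev=%.3fs over %d runs\n",
		nodes, mean, stddev, repetitions)
}
```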
Appendix: the same plot, using linear scales instead of the log-log representation:

Plot: convergence time over node count for the different convergence methods (linear axes).