Commit 446de3d

knave and Code2Life authored
feat: implement autoscaling (#242)
* feat: implement the core auto-scaling functionality
* test: refactor test when processing workloads
* feat: implement LeaderElectionRunnable explicitly and add compile-time check
* feat: aggregate samples into histogram per tflops
* feat: implement metrics provider
* feat: add allocator logic
* refactor: optimize update worker method
* feat: add config parsing
* feat: apply updates to specified target resources
* feat: add auto-scaling switch config parsing and apply, TargetResource support value all
* feat: merge AutoSetLimits and AutoSetRequests into AutoSetResources
* feat: implement adjust allocation
* fix: linter issues
* fix: linter issues
* refactor: support multiple recommenders
* refactor: code organization
* feat: define cron scaler crd
* feat: implement cron scaling
* feat: implement cron scaling
* feat: implement merging recommendations
* feat: implement restoring resources upon cron scaling termination
* fix: properly handle the isScaleUp
* refactor: each recommender is responsible for managing its own annotations
* refactor: remove unused functions and params
* feat: implement scale-down lock
* refactor: improve naming
* fix: scale down issue
* feat: add a recommendation field to the status of the workload and implement
* test: refactor tests
* feat: integrate the autoscaler into the main function
* fix: timestamp field issue
* refactor: inject the metrics provider dependency
* refactor: add namespace to identify the workload and process the namespace field of table
* fix: handle zero resource value properly
* feat: add condition type RecommendationProvided to workload
* refactor: make percentile recommender more testable
* refactor: improve status conditions
* feat: add appliedRecommendedReplicas field to status and refactor
* fix: handle activeCronScalingRule and applied replicas properly
* fix: applied recommended replicas issue
* fix: add vpa package
* fix: linter issues
* fix: linter issues
* test: wrong suite name
* fix: vram peak zero issue and fix query issue caused by time zone
* feat: only recommend but do not apply recommendation if worker has a dedicated GPU
* fix: missing autoScalingConfig
* feat: implement max allowed resources recommendation processor
* fix: linter issue
* fix: only get the max allowed resources when scaling up
* fix: test bug
* feat: add targetResource annotation
* refactor: improve logs
* fix: skipping preemption tests and linter issue
* fix: handle gpu status properly

---------

Co-authored-by: Joey Yang <[email protected]>
1 parent 2d08e62 commit 446de3d

43 files changed, with 5281 additions and 403 deletions

api/v1/schedulingconfigtemplate_types.go

Lines changed: 60 additions & 6 deletions
@@ -86,17 +86,71 @@ type GPUFilter struct {
 }
 
 type AutoScalingConfig struct {
-	// layer 1 vertical auto-scaling, turbo burst to existing GPU cards quickly
-	// VPA-like, aggregate metrics data <1m
-	AutoSetLimits AutoSetLimits `json:"autoSetLimits,omitempty"`
+	// layer 1 adjusting, to match the actual usage in the long run, only for N:M remote vGPU mode
+	// Adjust baseline requests to match the actual usage in longer period, such as 1day - 2weeks
+	AutoSetResources AutoSetResources `json:"autoSetResources,omitempty"`
 
 	// layer 2 horizontal auto-scaling, scale up to more GPU cards if max limits threshold hit
 	// HPA-like, aggregate metrics data 1m-1h (when tf-worker scaled-up, should also trigger client pod's owner[Deployment etc.]'s replica increasing, check if KNative works)
 	AutoSetReplicas AutoSetReplicas `json:"autoSetReplicas,omitempty"`
 
-	// layer 3 adjusting, to match the actual usage in the long run, only for N:M remote vGPU mode, not impl yet
-	// Adjust baseline requests to match the actual usage in longer period, such as 1day - 2weeks
-	AutoSetRequests AutoSetRequests `json:"autoSetRequests,omitempty"`
+	// CronScalingRules defines a list of CronScaling rules used to schedule scaling actions based on cron expressions.
+	CronScalingRules []CronScalingRule `json:"cronScalingRules,omitempty"`
+}
+
+// CronScalingRule defines the rule for scaling resources based on a cron schedule.
+// It allows enabling/disabling the scaler, specifying the time window for scaling,
+// and configuring the desired resources and replicas during the scheduled period.
+type CronScalingRule struct {
+	// Enable specifies whether the cron scaler is enabled.
+	Enable bool `json:"enable,omitempty"`
+	// Name is the identifier for the cron scaler.
+	Name string `json:"name,omitempty"`
+	// Start is the start time for the scaling schedule, in cron format.
+	Start string `json:"start,omitempty"`
+	// End is the end time for the scaling schedule, in cron format.
+	End string `json:"end,omitempty"`
+	// DesiredResources specifies the target resources to scale to during the schedule.
+	DesiredResources Resources `json:"desiredResources,omitempty"`
+	// DesiredReplicas is the target number of replicas during the schedule.
+	DesiredReplicas *int32 `json:"desiredReplicas,omitempty"`
+}
+
+type AutoSetResources struct {
+	Enable bool `json:"enable,omitempty"`
+
+	// Target resource to scale, such as "tflops", "vram", or "all" by default
+	TargetResource string `json:"targetResource,omitempty"`
+
+	// Tflops usage percentile that will be used as a base for tflops target recommendation. Default: 0.9
+	TargetTflopsPercentile string `json:"targettflopspercentile,omitempty"`
+
+	// Tflops usage percentile that will be used for the lower bound on tflops recommendation. Default: 0.5
+	LowerBoundTflopsPercentile string `json:"lowerboundtflopspercentile,omitempty"`
+
+	// Tflops usage percentile that will be used for the upper bound on tflops recommendation. Default: 0.95
+	UpperBoundTflopsPercentile string `json:"upperboundtflopspercentile,omitempty"`
+
+	// Vram usage percentile that will be used as a base for vram target recommendation. Default: 0.9
+	TargetVramPercentile string `json:"targetvrampercentile,omitempty"`
+
+	// Vram usage percentile that will be used for the lower bound on vram recommendation. Default: 0.5
+	LowerBoundVramPercentile string `json:"lowerboundvrampercentile,omitempty"`
+
+	// Vram usage percentile that will be used for the upper bound on vram recommendation. Default: 0.95
+	UpperBoundVramPercentile string `json:"upperboundvrampercentile,omitempty"`
+
+	// Fraction of usage added as the safety margin to the recommended request. Default: 0.15
+	RequestMarginFraction string `json:"requestMarginFraction,omitempty"`
+
+	// The time interval used for computing the confidence multiplier for the lower and upper bound. Default: 24h
+	ConfidenceInterval string `json:"confidenceInterval,omitempty"`
+
+	// How much time back TSDB have to be queried to get historical metrics. Default: 1d
+	HistoryLength string `json:"historyLength,omitempty"`
+
+	// Resolution at which TSDB is queried for historical metrics. Default: 1m
+	HistoryResolution string `json:"historyResolution,omitempty"`
 }
 
 // A typical autoLimits algorithm could be checking every 5m, look back 1 day data,

api/v1/tensorfusionconnection_types.go

Lines changed: 24 additions & 0 deletions
@@ -21,6 +21,13 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 
+type ResourceName string
+
+const (
+	ResourceTflops ResourceName = "tflops"
+	ResourceVram ResourceName = "vram"
+)
+
 type Resource struct {
 	Tflops resource.Quantity `json:"tflops"`
 	Vram resource.Quantity `json:"vram"`
@@ -31,6 +38,23 @@ type Resources struct {
 	Limits Resource `json:"limits"`
 }
 
+func (r Resources) Equal(target *Resources) bool {
+	if target == nil {
+		return false
+	}
+	return r.Requests.Tflops.Equal(target.Requests.Tflops) &&
+		r.Requests.Vram.Equal(target.Requests.Vram) &&
+		r.Limits.Tflops.Equal(target.Limits.Tflops) &&
+		r.Limits.Vram.Equal(target.Limits.Vram)
+}
+
+func (r Resources) IsZero() bool {
+	return r.Requests.Tflops.IsZero() &&
+		r.Requests.Vram.IsZero() &&
+		r.Limits.Tflops.IsZero() &&
+		r.Limits.Vram.IsZero()
+}
+
 // TensorFusionConnectionSpec defines the desired state of TensorFusionConnection.
 type TensorFusionConnectionSpec struct {
 	WorkloadName string `json:"workloadName"`
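
The semantics of the new `Equal` and `IsZero` methods can be exercised without the Kubernetes dependency. In the sketch below, `quantity` is a stand-in for `resource.Quantity` and is an assumption made purely so the example compiles with the standard library; the lowercase `resource` and `resources` types mirror the shapes in the diff. It shows that a nil target never compares equal and that `IsZero` requires all four values to be zero.

```go
package main

import "fmt"

// quantity is a stand-in for resource.Quantity (an assumption for this
// sketch); it only provides the two methods the diff relies on.
type quantity int64

func (q quantity) Equal(o quantity) bool { return q == o }
func (q quantity) IsZero() bool          { return q == 0 }

type resource struct {
	Tflops quantity
	Vram   quantity
}

type resources struct {
	Requests resource
	Limits   resource
}

// Equal follows the logic of Resources.Equal in the diff:
// a nil target is never equal, otherwise all four values must match.
func (r resources) Equal(target *resources) bool {
	if target == nil {
		return false
	}
	return r.Requests.Tflops.Equal(target.Requests.Tflops) &&
		r.Requests.Vram.Equal(target.Requests.Vram) &&
		r.Limits.Tflops.Equal(target.Limits.Tflops) &&
		r.Limits.Vram.Equal(target.Limits.Vram)
}

// IsZero follows the logic of Resources.IsZero in the diff:
// it is true only when every request and limit is zero.
func (r resources) IsZero() bool {
	return r.Requests.Tflops.IsZero() &&
		r.Requests.Vram.IsZero() &&
		r.Limits.Tflops.IsZero() &&
		r.Limits.Vram.IsZero()
}

func main() {
	current := resources{
		Requests: resource{Tflops: 10, Vram: 8},
		Limits:   resource{Tflops: 20, Vram: 16},
	}
	same := current

	fmt.Println(current.Equal(&same)) // true
	fmt.Println(current.Equal(nil))   // false
	fmt.Println(resources{}.IsZero()) // true
	// A single non-zero value is enough to make IsZero false.
	fmt.Println(resources{Limits: resource{Vram: 1}}.IsZero()) // false
}
```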

api/v1/tensorfusionworkload_types.go

Lines changed: 12 additions & 0 deletions
@@ -65,6 +65,18 @@ type TensorFusionWorkloadStatus struct {
 
 	// Hash of the pod template used to create worker pods
 	PodTemplateHash string `json:"podTemplateHash,omitempty"`
+
+	// The most recently GPU resources recommended by the autoscaler
+	// +optional
+	Recommendation *Resources `json:"recommendation,omitempty"`
+
+	// The number of replicas currently applied based on the latest recommendation
+	// +optional
+	AppliedRecommendedReplicas int32 `json:"appliedRecommendedReplicas,omitempty"`
+
+	// The currently active cron scaling rule
+	// +optional
+	ActiveCronScalingRule *CronScalingRule `json:"activeCronScalingRule,omitempty"`
 }
 
 // +kubebuilder:object:root=true
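
All three new status fields use `omitempty`, so with Go's `encoding/json` a nil pointer and a zero `int32` both disappear from the serialized status. The sketch below uses trimmed stand-in types (`workloadStatus` and `cronScalingRule`, which copy only a few fields and their JSON tags from the diff) to show what that looks like. The rule name `night` is a made-up value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cronScalingRule is a trimmed stand-in for CronScalingRule;
// only two of its fields are reproduced here.
type cronScalingRule struct {
	Enable bool   `json:"enable,omitempty"`
	Name   string `json:"name,omitempty"`
}

// workloadStatus is a trimmed stand-in for TensorFusionWorkloadStatus,
// keeping two of the fields added in this commit.
type workloadStatus struct {
	AppliedRecommendedReplicas int32            `json:"appliedRecommendedReplicas,omitempty"`
	ActiveCronScalingRule      *cronScalingRule `json:"activeCronScalingRule,omitempty"`
}

// encode marshals a status and returns the JSON text.
func encode(s workloadStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Zero replicas and no active rule: both keys are omitted.
	fmt.Println(encode(workloadStatus{}))

	// Non-zero replicas and an active rule: both keys are present.
	fmt.Println(encode(workloadStatus{
		AppliedRecommendedReplicas: 2,
		ActiveCronScalingRule:      &cronScalingRule{Enable: true, Name: "night"},
	}))
}
```

One consequence is that a consumer reading the serialized status cannot distinguish "zero replicas applied" from "field never set", since both serialize to an absent key.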

api/v1/workloadprofile_types.go

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ type WorkloadProfileSpec struct {
 	// +optional
 	// AutoScalingConfig configured here will override Pool's schedulingConfig
 	// This field can not be fully supported in annotation, if user want to enable auto-scaling in annotation,
-	// user can set tensor-fusion.ai/auto-limits|requests|replicas: 'true'
+	// user can set tensor-fusion.ai/auto-resources|replicas: 'true'
 	AutoScalingConfig AutoScalingConfig `json:"autoScalingConfig,omitempty"`
 
 	// +optional
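
The updated comment names two annotation switches, read here as `tensor-fusion.ai/auto-resources` and `tensor-fusion.ai/auto-replicas`. How the project actually reads them is not shown in this excerpt; the following is a minimal sketch under that reading, with the hypothetical helper `enabled` treating a missing or unparsable value as disabled.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys as read from the comment in the diff; the expansion of
// "auto-resources|replicas" into two keys is an assumption.
const (
	autoResourcesAnnotation = "tensor-fusion.ai/auto-resources"
	autoReplicasAnnotation  = "tensor-fusion.ai/auto-replicas"
)

// enabled is a hypothetical helper: it reports whether the annotation is
// present and its value parses as a true boolean.
func enabled(annotations map[string]string, key string) bool {
	v, ok := annotations[key]
	if !ok {
		return false
	}
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	podAnnotations := map[string]string{
		autoResourcesAnnotation: "true",
	}
	fmt.Println(enabled(podAnnotations, autoResourcesAnnotation)) // true
	fmt.Println(enabled(podAnnotations, autoReplicasAnnotation))  // false
}
```

Treating unparsable values as disabled keeps a typo in an annotation from switching auto-scaling on by accident.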

api/v1/zz_generated.deepcopy.go

Lines changed: 54 additions & 2 deletions
Generated file; its diff is not rendered by default.
