diff --git a/docs/en/guide/recipes/configure-autoscaling.md b/docs/en/guide/recipes/configure-autoscaling.md
index 13174af..29e3103 100644
--- a/docs/en/guide/recipes/configure-autoscaling.md
+++ b/docs/en/guide/recipes/configure-autoscaling.md
@@ -1,20 +1,30 @@
 # Configure AutoScaling for AI Workloads
 
-## Enable AutoScaling
+## Step 1. Enable AutoScaling
 
-### Simple Configuration with Pod AutoScaling Annotations
+### Add Pod AutoScaling Annotations
+
+> To be used in conjunction with the workload annotations from [Create Workload](/guide/recipes/create-workload#add-pod-annotations).
 
 ```yaml
 # Enable vertical scaling
-  autoResources: true
+  tensor-fusion.ai/auto-resources: 'true'
 # Configure target resource, options: all|tflops|vram, if empty only provides recommendations
-  targetResource: all
+  tensor-fusion.ai/auto-scale-target-resource: all
 # Enable horizontal scaling
-  autoReplicas: true
+  tensor-fusion.ai/auto-replicas: 'true'
 ```
 
 ### Detailed Configuration Using Workload Configuration File
 
+* Vertical Scaling: based on historical GPU resource usage data, using the community VPA Histogram algorithm.
+The VPA estimates consist of Target, LowerBound, and UpperBound, which by default correspond to the P90, P50, and P95 usage levels.
+If the current resource usage falls outside the range between LowerBound and UpperBound, a recommended value is generated.
+
+> [!NOTE] If `enable` is not set to `true`, or if `targetResource` is empty, only resource recommendations are generated; the recommended values are not applied.
+
+* Cron Scaling: based on standard cron expressions; a rule takes effect when `enable` is `true` and the current time is within the `start` and `end` range. Outside this range, resources revert to the values specified when the workload was added. See the [Cron Expression Reference](https://en.wikipedia.org/wiki/Cron).
+
 ```yaml
 autoScalingConfig:
   # Vertical scaling configuration
@@ -60,9 +70,9 @@ autoScalingConfig:
       vram: 5Gi
 ```
 
-## Monitor Scaling Status
+## Step 2. Monitor Scaling Status
 
-### View GPU Resource Recommendations via TensorFusionWorkload Status
+> The workload generates a corresponding `TensorFusionWorkload` resource object, and the fields under `status` reflect the current scaling state in real time.
 
 ```yaml
 status:
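To make the annotation path above concrete, here is a minimal sketch of a Pod that opts into autoscaling. The Pod name, container, and image are hypothetical placeholders; the workload annotations from the Create Workload recipe are assumed to be set alongside these but are omitted here, and only the annotation keys introduced in this patch are taken from the docs.

```yaml
# Hypothetical Pod manifest illustrating the autoscaling annotations above.
# Assumes the workload annotations from the Create Workload recipe are also present (omitted here).
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo                 # placeholder name
  annotations:
    # Enable vertical scaling of GPU resources for this Pod
    tensor-fusion.ai/auto-resources: 'true'
    # Apply recommendations to both TFLOPS and VRAM (all|tflops|vram)
    tensor-fusion.ai/auto-scale-target-resource: all
    # Enable horizontal scaling
    tensor-fusion.ai/auto-replicas: 'true'
spec:
  containers:
    - name: app                        # placeholder container
      image: inference-image:latest    # placeholder image
```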
diff --git a/docs/zh/guide/recipes/configure-autoscaling.md b/docs/zh/guide/recipes/configure-autoscaling.md
index 4dc61c7..4c7c584 100644
--- a/docs/zh/guide/recipes/configure-autoscaling.md
+++ b/docs/zh/guide/recipes/configure-autoscaling.md
@@ -1,20 +1,30 @@
 # 配置AI应用自动扩缩容
 
-## 开启自动扩缩容
+## 步骤 1. 开启自动扩缩容
 
-### 添加Pod自动扩缩容注解进行简单配置
+### 添加Pod自动扩缩容注解
+
+> 需配合工作负载注解使用:[创建工作负载](/zh/guide/recipes/create-workload#添加pod注解)
 
 ```yaml
 # 开启垂直扩缩容
-  autoResources: true
+  tensor-fusion.ai/auto-resources: 'true'
 # 配置目标资源, 可填all|tflops|vram,若为空则只推荐不更新
-  targetResource: all
+  tensor-fusion.ai/auto-scale-target-resource: all
 # 开启水平扩缩容
-  autoReplicas: true
+  tensor-fusion.ai/auto-replicas: 'true'
 ```
 
 ### 使用工作负载配置文件进行详细配置
 
+* 垂直扩缩容: 基于GPU资源历史使用量数据,采用社区VPA Histogram算法实现。
+  VPA算法生成的估算值由Target、LowerBound、UpperBound组成,默认对应P90、P50、P95用量。
+  若当前资源用量在LowerBound和UpperBound范围外,则生成推荐值。
+
+> [!NOTE] 若`enable`不为`true`,或者`targetResource`为空,则只生成资源推荐值,不会实际应用。
+
+* 定时扩缩容: 基于标准cron表达式,当`enable`为`true`且当前时间在`start`和`end`范围内时生效;超出时间范围则恢复至添加工作负载时指定的资源值。参见[Cron表达式参考](https://en.wikipedia.org/wiki/Cron)。
+
 ```yaml
 autoScalingConfig:
   # 垂直扩缩容配置
@@ -23,17 +33,17 @@ autoScalingConfig:
     enable: true
     # 目标资源
     targetResource: all
-    # 计算TFLOPS目标值百分位数, 默认值:0.9
+    # 计算TFLOPS目标值百分位, 默认值:0.9
     targetTflopsPercentile: 0.9
-    # 计算TFLOPS下边界值百分位数,默认值:0.5
+    # 计算TFLOPS下边界值百分位,默认值:0.5
     lowerBoundTflopsPercentile: 0.5
-    # 计算TFLOPS上边界值百分位数,默认值:0.95
+    # 计算TFLOPS上边界值百分位,默认值:0.95
     upperBoundTflopsPercentile: 0.95
-    # 计算VRAM目标值百分位数,默认值:0.9
+    # 计算VRAM目标值百分位,默认值:0.9
     targetVramPercentile: 0.9
-    # 计算VRAM下边界值百分位数,默认值:0.5
+    # 计算VRAM下边界值百分位,默认值:0.5
     lowerBoundVramPercentile: 0.5
-    # 计算VRAM上边界值百分位数,默认值:0.95
+    # 计算VRAM上边界值百分位,默认值:0.95
     upperBoundVramPercentile: 0.95
     # 请求估算值扩大系数 默认值:0.15
     requestMarginFraction: 0.15
@@ -43,7 +53,7 @@ autoScalingConfig:
   # 定时扩缩容配置
   cronScalingRules:
     # 是否启用该规则
-    - enable: True
+    - enable: true
      # 规则名称
      name: "test"
      # 规则生效起始时间
@@ -60,9 +70,9 @@ autoScalingConfig:
       vram: 5Gi
 ```
 
-## 观测扩缩容状态
+## 步骤 2. 观测扩缩容状态
 
-### 通过TensorFusionWorkload Status查看GPU资源推荐值
+> 工作负载会生成对应的`TensorFusionWorkload`资源对象,`status`中的字段会实时反映当前扩缩容状态。
 
 ```yaml
 status:
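To illustrate the cron scaling rules described in both files, here is a rough sketch of a `cronScalingRules` entry. The rule name and schedule strings are illustrative only, the `tflops` field name is an assumption, and the exact nesting of resource values under a rule should be checked against the full workload configuration; only `vram: 5Gi` appears as context in the hunks above.

```yaml
# A sketch of a cron scaling rule, assuming the fields shown in the patch
# (enable, name, start, end) plus resource values. Schedule strings and the
# tflops figure are illustrative; the resource field nesting is an assumption.
autoScalingConfig:
  cronScalingRules:
    # Scale up on weekdays between 09:00 and 18:00; outside this window the
    # workload reverts to the resources it was created with.
    - enable: true
      name: "business-hours"
      start: "0 9 * * 1-5"    # standard cron: 09:00, Monday through Friday
      end: "0 18 * * 1-5"     # standard cron: 18:00, Monday through Friday
      tflops: 20              # assumed field name for the compute target
      vram: 5Gi
```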