24 changes: 17 additions & 7 deletions docs/en/guide/recipes/configure-autoscaling.md
@@ -1,20 +1,30 @@
# Configure AutoScaling for AI Workloads

## Enable AutoScaling
## Step 1. Enable AutoScaling

### Simple Configuration with Pod AutoScaling Annotations
### Add Pod AutoScaling Annotations

> To be used in conjunction with the workload annotations described in [Create Workload](/guide/recipes/create-workload#add-pod-annotations)

```yaml
# Enable vertical scaling
autoResources: true
tensor-fusion.ai/auto-resources: 'true'
# Configure the target resource, options: all|tflops|vram; if empty, only recommendations are provided
targetResource: all
tensor-fusion.ai/auto-scale-target-resource: all
# Enable horizontal scaling
autoReplicas: true
tensor-fusion.ai/auto-replicas: 'true'
```
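
For context, a minimal sketch of a Pod carrying these annotations might look like the following; the name, container, and image are placeholders, and the workload annotations from [Create Workload](/guide/recipes/create-workload#add-pod-annotations) still need to be added alongside them:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo                              # placeholder name
  annotations:
    # ...workload annotations from Create Workload go here as well...
    tensor-fusion.ai/auto-resources: 'true'
    tensor-fusion.ai/auto-scale-target-resource: 'all'
    tensor-fusion.ai/auto-replicas: 'true'
spec:
  containers:
    - name: app                                     # placeholder container
      image: my-inference-image:latest              # placeholder image
```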

### Detailed Configuration Using the Workload Configuration File

* Vertical Scaling: Uses the community VPA histogram algorithm on historical GPU resource usage data.
The estimates produced by the VPA algorithm consist of Target, LowerBound, and UpperBound, corresponding by default to the P90, P50, and P95 usage levels.
If current resource usage falls outside the range between LowerBound and UpperBound, a recommended value is generated (a numeric illustration follows the configuration example below).

> [!NOTE] If `enable` is not set to `true`, or `targetResource` is empty, only resource recommendations are generated; the recommended values are not actually applied.

* Cron Scaling: Based on standard cron expressions; scaling takes effect when `enable` is `true` and the current time falls within the `start` and `end` window. Outside this window, resources revert to the values specified when the workload was added. [Cron Expression Reference](https://en.wikipedia.org/wiki/Cron)

```yaml
autoScalingConfig:
# Vertical scaling configuration
@@ -60,9 +70,9 @@ autoScalingConfig:
vram: 5Gi
```
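
To make the bounds logic above concrete, here is a purely illustrative sketch with invented numbers; the exact recommended value depends on the VPA estimator, but the description above suggests it is the Target estimate:

```yaml
# Illustration only, not configuration fields:
# histogram of observed VRAM usage  ->  LowerBound (P50): 2Gi
#                                       Target     (P90): 4Gi
#                                       UpperBound (P95): 5Gi
# current usage 6Gi -> above UpperBound -> a recommendation (~4Gi) is generated
# current usage 1Gi -> below LowerBound -> a recommendation is generated as well
# current usage 3Gi -> within the bounds -> no change is recommended
```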

## Monitor Scaling Status
## Step 2. Monitor Scaling Status

### View GPU Resource Recommendations via TensorFusionWorkload Status
> The workload generates a corresponding `TensorFusionWorkload` resource object, and the fields in `Status` reflect the current scaling status in real time.

```yaml
status:
  # … (remaining status fields collapsed in this diff)
```

38 changes: 24 additions & 14 deletions docs/zh/guide/recipes/configure-autoscaling.md
@@ -1,20 +1,30 @@
# Configure AutoScaling for AI Workloads

## Enable AutoScaling
## Step 1. Enable AutoScaling

### Simple Configuration with Pod AutoScaling Annotations
### Add Pod AutoScaling Annotations

> To be used in conjunction with the workload annotations described in [Create Workload](/zh/guide/recipes/create-workload#添加pod注解)

```yaml
# Enable vertical scaling
autoResources: true
tensor-fusion.ai/auto-resources: 'true'
# Configure the target resource, options: all|tflops|vram; if empty, only recommendations are provided and nothing is updated
targetResource: all
tensor-fusion.ai/auto-scale-target-resource: all
# Enable horizontal scaling
autoReplicas: true
tensor-fusion.ai/auto-replicas: 'true'
```
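
For Deployment-managed workloads, the same annotations would go on the Pod template rather than the Deployment itself; the sketch below uses placeholder names and images and still needs the workload annotations from [Create Workload](/zh/guide/recipes/create-workload#添加pod注解):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-demo                  # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-demo
  template:
    metadata:
      labels:
        app: inference-demo
      annotations:
        # ...workload annotations from Create Workload go here as well...
        tensor-fusion.ai/auto-resources: 'true'
        tensor-fusion.ai/auto-scale-target-resource: 'all'
        tensor-fusion.ai/auto-replicas: 'true'
    spec:
      containers:
        - name: app                               # placeholder container
          image: my-inference-image:latest        # placeholder image
```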

### Detailed Configuration Using the Workload Configuration File

* Vertical Scaling: Implemented with the community VPA histogram algorithm based on historical GPU resource usage data.
The estimates produced by the VPA algorithm consist of Target, LowerBound, and UpperBound, corresponding by default to the P90, P50, and P95 usage levels.
If current resource usage falls outside the range between LowerBound and UpperBound, a recommended value is generated.

> [!NOTE] If `enable` is not set to `true`, or `targetResource` is empty, only resource recommendations are generated; the recommended values are not actually applied.

* Cron Scaling: Based on standard cron expressions; scaling takes effect when `enable` is `true` and the current time falls within the `start` and `end` window. Outside this window, resources revert to the values specified when the workload was added. [Cron Expression Reference](https://en.wikipedia.org/wiki/Cron)

```yaml
autoScalingConfig:
# Vertical scaling configuration
@@ -23,17 +33,17 @@ autoScalingConfig:
enable: true
# Target resource
targetResource: all
# Percentile number used to compute the TFLOPS target value, default: 0.9
# Percentile used to compute the TFLOPS target value, default: 0.9
targetTflopsPercentile: 0.9
# Percentile number used to compute the TFLOPS lower bound, default: 0.5
# Percentile used to compute the TFLOPS lower bound, default: 0.5
lowerBoundTflopsPercentile: 0.5
# Percentile number used to compute the TFLOPS upper bound, default: 0.95
# Percentile used to compute the TFLOPS upper bound, default: 0.95
upperBoundTflopsPercentile: 0.95
# Percentile number used to compute the VRAM target value, default: 0.9
# Percentile used to compute the VRAM target value, default: 0.9
targetVramPercentile: 0.9
# Percentile number used to compute the VRAM lower bound, default: 0.5
# Percentile used to compute the VRAM lower bound, default: 0.5
lowerBoundVramPercentile: 0.5
# Percentile number used to compute the VRAM upper bound, default: 0.95
# Percentile used to compute the VRAM upper bound, default: 0.95
upperBoundVramPercentile: 0.95
# Margin fraction added to the request estimate, default: 0.15
requestMarginFraction: 0.15
@@ -43,7 +53,7 @@ autoScalingConfig:
# Cron scaling configuration
cronScalingRules:
# Whether this rule is enabled
- enable: True
- enable: true
# Rule name
name: "test"
# Start time at which the rule takes effect
@@ -60,9 +70,9 @@ autoScalingConfig:
vram: 5Gi
```
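
To isolate just the cron scaling fields shown above, here is a minimal sketch of a single rule; it assumes `start` and `end` take standard cron expressions as the description suggests, the rule name and schedule are illustrative, and the collapsed resource-override fields from the excerpt are intentionally omitted rather than guessed:

```yaml
cronScalingRules:
  # Scale during weekday business hours; outside this window the workload
  # reverts to the resources specified when it was added.
  - enable: true
    name: "business-hours"       # illustrative rule name
    start: "0 8 * * 1-5"         # weekdays at 08:00
    end: "0 20 * * 1-5"          # weekdays at 20:00
```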

## Monitor Scaling Status
## Step 2. Monitor Scaling Status

### View GPU Resource Recommendations via TensorFusionWorkload Status
> The workload generates a corresponding `TensorFusionWorkload` resource object; the fields under `Status` reflect the current scaling status in real time.

```yaml
status:
  # … (remaining status fields collapsed in this diff)
```