Commit 7f0dadc

Kyrie336 and Lei Guo authored
Support Metax sGPU Qos Policy (#1123)
Signed-off-by: Lei Guo <[email protected]>
Co-authored-by: Lei Guo <[email protected]>
1 parent a90081d commit 7f0dadc

37 files changed: 873 additions and 86 deletions

docs/metax-support.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

docs/metax-support_cn.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

examples/metax/gpu/binpack.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

examples/metax/gpu/default_use.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

examples/metax/gpu/spread.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

examples/metax/sgpu/allocate_exclusive.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

examples/metax/sgpu/allocate_specific_gpu.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-pod
+  annotations:
+    metax-tech.com/sgpu-qos-policy: "best-effort" # allocate specific qos sgpu
+spec:
+  containers:
+    - name: ubuntu-container
+      image: ubuntu:22.04
+      imagePullPolicy: IfNotPresent
+      command: ["sleep","infinity"]
+      resources:
+        limits:
+          metax-tech.com/sgpu: 1 # requesting 1 GPU
+          metax-tech.com/vcore: 60 # each GPU use 60% of total compute cores
+          metax-tech.com/vmemory: 4 # each GPU require 4 GiB device memory

examples/metax/sgpu/allocate_vmemory_MiB.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:

examples/metax/sgpu/default_use.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ metadata:
 spec:
   containers:
     - name: ubuntu-container
-      image: cr.metax-tech.com/public-ai-release/c500/colossalai:2.24.0.5-py38-ubuntu20.04-amd64
+      image: ubuntu:22.04
       imagePullPolicy: IfNotPresent
       command: ["sleep","infinity"]
       resources:
