README.md: 6 additions and 11 deletions
```diff
@@ -21,16 +21,11 @@
 
 # Overview
 
-xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help
-Cloud developers to orchestrate training jobs on accelerators such as TPUs and
-GPUs on GKE. xpk handles the "multihost pods" of TPUs, GPUs (HGX H100) and CPUs
-(n2-standard-32) as first class citizens.
+XPK (Accelerated Processing Kit, pronounced x-p-k) is a command line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK generates preconfigured, training-optimized clusters and allows easy workload scheduling without any Kubernetes expertise.
 
-xpk decouples provisioning capacity from running jobs. There are two structures:
-clusters (provisioned VMs) and workloads (training jobs). Clusters represent the
-physical resources you have available. Workloads represent training jobs -- at
-any time some of these will be completed, others will be running and some will
-be queued, waiting for cluster resources to become available.
+XPK is recommended for quick creation of GKE clusters for proofs of concepts and testing.
+
+XPK decouples provisioning capacity from running jobs. There are two structures: clusters (provisioned VMs) and workloads (training jobs). Clusters represent the physical resources you have available. Workloads represent training jobs -- at any time some of these will be completed, others will be running and some will be queued, waiting for cluster resources to become available.
 
 The ideal workflow starts by provisioning the clusters for all of the ML
 hardware you have reserved. Then, without re-provisioning, submit jobs as
```
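The cluster/workload split described in the README text can be illustrated with a toy queueing model. This is a minimal Python sketch under stated assumptions, not XPK's implementation: the `Cluster` and `Workload` classes, the chip counts, and the strict first-in, first-out policy are all hypothetical. Capacity is fixed when the cluster is created; submitting a workload never re-provisions anything, it only queues the job until enough capacity is free.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Workload:
    # Hypothetical training job: how many chips it needs and how long it runs.
    name: str
    chips: int
    duration: int
    remaining: int = 0


@dataclass
class Cluster:
    # Hypothetical cluster: capacity is provisioned once and never changes.
    name: str
    capacity: int
    queued: deque = field(default_factory=deque)
    running: list = field(default_factory=list)
    completed: list = field(default_factory=list)

    def free(self) -> int:
        # Chips not currently held by running workloads.
        return self.capacity - sum(w.chips for w in self.running)

    def submit(self, workload: Workload) -> None:
        # Submitting does not re-provision; the job joins the queue.
        self.queued.append(workload)
        self._schedule()

    def _schedule(self) -> None:
        # Strict FIFO (an assumption): start jobs while the queue head fits.
        while self.queued and self.queued[0].chips <= self.free():
            w = self.queued.popleft()
            w.remaining = w.duration
            self.running.append(w)

    def tick(self) -> None:
        # Advance one time step; finished jobs return their chips to the pool.
        for w in self.running:
            w.remaining -= 1
        done = [w for w in self.running if w.remaining <= 0]
        self.running = [w for w in self.running if w.remaining > 0]
        self.completed.extend(done)
        self._schedule()

    def status(self) -> dict:
        return {
            "completed": [w.name for w in self.completed],
            "running": [w.name for w in self.running],
            "queued": [w.name for w in self.queued],
        }


cluster = Cluster("demo-cluster", capacity=8)
cluster.submit(Workload("job-a", chips=4, duration=1))
cluster.submit(Workload("job-b", chips=4, duration=3))
cluster.submit(Workload("job-c", chips=8, duration=1))
cluster.tick()
print(cluster.status())
# {'completed': ['job-a'], 'running': ['job-b'], 'queued': ['job-c']}
```

After one step the model shows all three states the README mentions at once: one workload completed, one running, and one queued waiting for capacity. When `job-b` finishes, its chips return to the cluster and `job-c` starts without any new VMs being created.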
```diff
@@ -41,7 +36,7 @@ return the hardware back to the shared pool when they complete, developers can
 achieve better use of finite hardware resources. And automated tests can run
 overnight while resources tend to be underutilized.
 
-xpk supports the following TPU types:
+XPK supports the following TPU types:
 * v4
 * v5e
 * v5p
```
```diff
@@ -57,7 +52,7 @@ and the following GPU types:
 and the following CPU types:
 * n2-standard-32
 
-xpk also supports [Google Cloud Storage solutions](#storage):
+XPK also supports [Google Cloud Storage solutions](#storage):
```