Commit 64a5990

Update README.md (Overview- what XPK is, when to use it) (#468)
Changed the "Overview" section - what XPK is, when to use it. It is now aligned with the positioning described in official google docs: https://cloud.google.com/ai-hypercomputer/docs/create/gke-ai-hypercompute#cluster-creation-options
1 parent e4d6749 commit 64a5990

1 file changed (+6, -11 lines)

README.md

Lines changed: 6 additions & 11 deletions
@@ -21,16 +21,11 @@
 
 # Overview
 
-xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help
-Cloud developers to orchestrate training jobs on accelerators such as TPUs and
-GPUs on GKE. xpk handles the "multihost pods" of TPUs, GPUs (HGX H100) and CPUs
-(n2-standard-32) as first class citizens.
+XPK (Accelerated Processing Kit, pronounced x-p-k) is a command line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK generates preconfigured, training-optimized clusters and allows easy workload scheduling without any Kubernetes expertise.
 
-xpk decouples provisioning capacity from running jobs. There are two structures:
-clusters (provisioned VMs) and workloads (training jobs). Clusters represent the
-physical resources you have available. Workloads represent training jobs -- at
-any time some of these will be completed, others will be running and some will
-be queued, waiting for cluster resources to become available.
+XPK is recommended for quick creation of GKE clusters for proofs of concepts and testing.
+
+XPK decouples provisioning capacity from running jobs. There are two structures: clusters (provisioned VMs) and workloads (training jobs). Clusters represent the physical resources you have available. Workloads represent training jobs -- at any time some of these will be completed, others will be running and some will be queued, waiting for cluster resources to become available.
 
 The ideal workflow starts by provisioning the clusters for all of the ML
 hardware you have reserved. Then, without re-provisioning, submit jobs as
@@ -41,7 +36,7 @@ return the hardware back to the shared pool when they complete, developers can
 achieve better use of finite hardware resources. And automated tests can run
 overnight while resources tend to be underutilized.
 
-xpk supports the following TPU types:
+XPK supports the following TPU types:
 * v4
 * v5e
 * v5p
@@ -57,7 +52,7 @@ and the following GPU types:
 and the following CPU types:
 * n2-standard-32
 
-xpk also supports [Google Cloud Storage solutions](#storage):
+XPK also supports [Google Cloud Storage solutions](#storage):
 * [Cloud Storage FUSE](#fuse)
 * [Filestore](#filestore)
 * [Parallelstore](#parallelstore)
