
Commit b80acea

Reword
1 parent 43fa093 commit b80acea

1 file changed: +14 -37 lines changed
pathwaysutils/experimental/shared_pathways_service/README.md

Lines changed: 14 additions & 37 deletions
@@ -11,6 +11,7 @@ service that manages scheduling and error handling.
 1. You have a GKE cluster with atleast 1 slice of `v6e-4` or `v6e-8`. Note that the Shared Pathways Service supports
 single-host Trillium slices only, this support will be extended soon.
 
+<a name="pw-service-yaml"></a>
 2. Start the Shared Pathways Service by using [pw-service-example.yaml](yamls/pw-service-example.yaml).
 Make sure to modify the following values to deploy the Pathways pods:
 
@@ -35,53 +36,29 @@ $ gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION --pro
 # Check the status of RM and Worker pods.
 $ kubectl get pods
 
-# Sample expected output
+# Sample expected output (1 Head pod and 1 or more Worker pods)
 NAME READY STATUS RESTARTS AGE
-pathways-cluster-pathways-head-0-0-zzmn2 2/2 Running 0 3m49s
-pathways-cluster-worker-0-0-bdzq4 1/1 Running 0 3m36s
-pathways-cluster-worker-1-0-km2rf 1/1 Running 0 3m36s
+pathways-cluster-pathways-head-0-0-zzmn2 2/2 Running 0 3m49s # HEAD POD
+pathways-cluster-worker-0-0-bdzq4 1/1 Running 0 3m36s # WORKER 0
+pathways-cluster-worker-1-0-km2rf 1/1 Running 0 3m36s # WORKER 1
 ```
 
-You can also verify the pod status by looking at the project logs. Look for the below substring for the respective pod
-type.
+You can also verify the pod status by running below commands or by checking the project logs (Detailed instructions
+for the logs are <a href="https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/troubleshooting-pathways#health_monitoring" target="_blank">here</a>).
 
-(Detailed instructions are <a href="https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/troubleshooting-pathways#health_monitoring" target="_blank">here</a>)
-
-```
-# Set the environment variables
-$ HEAD_POD_NAME=pathways-cluster-pathways-head-0-0-zzmn2
-$ WORKER0_POD_NAME=pathways-cluster-worker-0-0-bdzq4
-$ WORKER1_POD_NAME=pathways-cluster-worker-1-0-km2rf
 ```
+# e.g., pathways-cluster
+$ JOBSET_NAME=<your-jobset-name> # same as you used in [pw-service-example.yaml](#pw-service-yaml)
 
-- RM
-```
-$ kubectl logs $HEAD_POD_NAME --container pathways-rm
-...
-I1208 20:10:04.992524 ...] Pathways Server serving on [::]:29001
-...
-I1208 20:10:23.848070 ...] *** 2/2 Pathways Slices Now Ready
-```
+# e.g., pathways-cluster-pathways-head-0-0-zzmn2
+$ HEAD_POD_NAME=$(kubectl get pods --selector=jobset.sigs.k8s.io/jobset-name=${JOBSET_NAME} -o jsonpath='{.items[?(@.status.phase=="Running")].metadata.name}' | sed 's/ /\n/g' | grep head)
 
-- Worker
-```
-$ kubectl logs $WORKER0_POD_NAME --container pathways-worker
-...
-I1208 20:10:23.838022 ...] Pathways Server serving on [::]:29005
-...
-I1208 20:10:25.249167 ...] MegaScale transport initialized.
-I1208 20:10:25.249172 ...] MegaScale transport init succeeded.
-
-$ kubectl logs $WORKER1_POD_NAME --container pathways-worker
-...
-I1208 20:10:23.579361 ...] Pathways Server serving on [::]:29005
-I1208 20:10:24.994411 ...] MegaScale transport initialized.
-I1208 20:10:24.994416 ...] MegaScale transport init succeeded.
-...
+# e.g., pathways-cluster-worker-0-0-bdzq4
+$ WORKER0_POD_NAME=$(kubectl get pods --selector=jobset.sigs.k8s.io/jobset-name=${JOBSET_NAME} -o jsonpath='{.items[?(@.status.phase=="Running")].metadata.name}' | sed 's/ /\n/g' | grep 'worker-0-0-')
 ```
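
Reviewer note: the added commands split the space-separated jsonpath output with `sed 's/ /\n/g'`, which depends on the sed implementation treating `\n` in the replacement as a newline (GNU sed does; BSD/macOS sed may not). A minimal sketch of the same selection using `tr` is shown below. The `pick_pod` helper and the hard-coded `PODS` list are hypothetical stand-ins for the `kubectl get pods ... -o jsonpath=...` output and are not part of the README.

```shell
# Hypothetical helper (not in the README): return the first pod name that
# matches a pattern from a space-separated list of pod names.
pick_pod() {
  # $1: space-separated pod names, $2: grep pattern
  printf '%s\n' "$1" | tr ' ' '\n' | grep -- "$2" | head -n 1
}

# Stand-in for the jsonpath output; names taken from the sample output above.
PODS="pathways-cluster-pathways-head-0-0-zzmn2 pathways-cluster-worker-0-0-bdzq4 pathways-cluster-worker-1-0-km2rf"

HEAD_POD_NAME=$(pick_pod "$PODS" 'head')
WORKER0_POD_NAME=$(pick_pod "$PODS" 'worker-0-0-')

echo "$HEAD_POD_NAME"
echo "$WORKER0_POD_NAME"
```

Taking only the first match with `head -n 1` is a design choice of this sketch: it keeps each variable to a single pod name even if more than one running pod matches the pattern.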
 
 <a name="find-pw-service"></a>
-4. Find the address of the Pathways service.
+4. Find the address of the Pathways service from the logs. We check the worker pod logs in the below command.
 ```
 $ kubectl logs $WORKER0_POD_NAME --container pathways-worker | grep "\-\-resource_manager_address"
 I1208 20:10:18.148825 ...] argv[2]: '--resource_manager_address=pathways-cluster-pathways-head-0-0.pathways-cluster:29001'
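
Reviewer note: the step above leaves the address embedded in a log line. As a hedged sketch, the value could be pulled out with `sed` and split into host and port with POSIX parameter expansion. `LOG_LINE` here is the sample line from the diff, used in place of live `kubectl logs` output; the variable names `PATHWAYS_ADDRESS`, `PATHWAYS_HOST`, and `PATHWAYS_PORT` are hypothetical and do not appear in the README.

```shell
# Sample log line copied from the diff, standing in for live `kubectl logs` output.
LOG_LINE="I1208 20:10:18.148825 ...] argv[2]: '--resource_manager_address=pathways-cluster-pathways-head-0-0.pathways-cluster:29001'"

# Keep only the text between the '=' after the flag name and the closing quote.
PATHWAYS_ADDRESS=$(printf '%s\n' "$LOG_LINE" \
  | sed -n "s/.*--resource_manager_address=\([^']*\)'.*/\1/p")

# Split host and port on the last ':' (hypothetical variable names).
PATHWAYS_HOST=${PATHWAYS_ADDRESS%:*}
PATHWAYS_PORT=${PATHWAYS_ADDRESS##*:}

echo "$PATHWAYS_ADDRESS"
echo "$PATHWAYS_HOST"
echo "$PATHWAYS_PORT"
```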
