# Shared Pathways Service

The Shared Pathways Service speeds up developer iteration by providing a
persistent, multi-tenant TPU environment. It decouples service creation from
the development loop: JAX clients connect on demand from a familiar local
environment (such as a laptop or cloud VM) to a long-running Pathways service
that manages scheduling and error handling.

## Requirements

Make sure that your GKE cluster is running the Pathways Resource Manager and
Worker pods. To confirm the status of these pods, follow the health-monitoring
steps in the
<a href="https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/troubleshooting-pathways#health_monitoring" target="_blank">Pathways troubleshooting guide</a>.
If you haven't started the Pathways pods yet, you can deploy them with
[pw-service-example.yaml](yamls/pw-service-example.yaml). Before deploying,
update the following values in that file:

- A unique JobSet name for the cluster's Pathways pods
- Cloud Storage (GCS) bucket path
- TPU type and topology
- Number of slices

Each of these fields is marked with a trailing comment in the YAML file.
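
If you want to script the pod status check, the following is a minimal sketch that parses the output of `kubectl get pods -o json` and lists any pods not in the `Running` phase. It assumes `kubectl` is already configured for your cluster and that JobSet labels its pods with `jobset.sigs.k8s.io/jobset-name`; the helper names are hypothetical and not part of `pathwaysutils`. A `Running` phase is only a coarse signal, so the troubleshooting guide linked above is still the reference for health monitoring.

```python
import json
import subprocess


def fetch_pods_json(jobset_name: str) -> str:
    """Return `kubectl get pods -o json` output for one JobSet's pods.

    Assumes kubectl points at your GKE cluster and that pods carry the
    jobset.sigs.k8s.io/jobset-name label (an assumption, not from this guide).
    """
    result = subprocess.run(
        [
            "kubectl", "get", "pods",
            "-l", f"jobset.sigs.k8s.io/jobset-name={jobset_name}",
            "-o", "json",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def pod_phases(pods_json: str) -> dict[str, str]:
    """Map each pod name to its phase (Pending, Running, Failed, ...)."""
    data = json.loads(pods_json)
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in data.get("items", [])
    }


def pods_not_running(pods_json: str) -> list[str]:
    """Return the sorted names of pods whose phase is not Running."""
    return sorted(
        name for name, phase in pod_phases(pods_json).items() if phase != "Running"
    )
```

For example, `pods_not_running(fetch_pods_json("my-jobset"))` (with `my-jobset` standing in for your JobSet name) should return an empty list once all pods have reached the `Running` phase.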

## Instructions

1. Clone `pathwaysutils`:

   ```shell
   git clone https://github.com/AI-Hypercomputer/pathways-utils.git
   ```

2. Install `portpicker`:

   ```shell
   pip install portpicker
   ```

3. Import `isc_pathways` and move your workload inside a
   `with isc_pathways.connect(...)` block. For a complete example, see
   [run_connect_example.py](run_connect_example.py). A condensed version:

   ```python
   from pathwaysutils.experimental.shared_pathways_service import isc_pathways

   # Replace these placeholder values with your own cluster details.
   with isc_pathways.connect(
       cluster="my-cluster",
       project="my-project",
       region="region",
       gcs_bucket="gs://user-bucket",
       # Address (host:port) of the Pathways head service.
       pathways_service="pathways-cluster-pathways-head-0-0.pathways-cluster:29001",
       expected_tpu_instances={"tpuv6e:2x2": 2},
   ) as tm:
       # Your JAX workload runs inside the `with` block.
       import jax.numpy as jnp
       import pathwaysutils
       import pprint

       pathwaysutils.initialize()
       orig_matrix = jnp.zeros(5)
       ...
   ```
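
The `expected_tpu_instances` argument appears to map a `<TPU type>:<topology>` key to a number of slices, which would make `{"tpuv6e:2x2": 2}` two v6e slices with a 2x2 topology. Under that reading (an assumption, not confirmed by this guide), the sketch below computes the total number of TPU chips requested, using the fact that a slice's chip count is the product of its topology dimensions. The helper names are hypothetical.

```python
from math import prod


def parse_tpu_key(key: str) -> tuple[str, tuple[int, ...]]:
    """Split a key such as "tpuv6e:2x2" into ("tpuv6e", (2, 2))."""
    tpu_type, _, topology = key.partition(":")
    if not tpu_type or not topology:
        raise ValueError(f"expected '<type>:<topology>', got {key!r}")
    return tpu_type, tuple(int(dim) for dim in topology.split("x"))


def total_chips(expected_tpu_instances: dict[str, int]) -> int:
    """Total chips requested, assuming each value is a slice count."""
    total = 0
    for key, num_slices in expected_tpu_instances.items():
        _, dims = parse_tpu_key(key)
        # Chips per slice = product of the topology dimensions (2x2 -> 4).
        total += prod(dims) * num_slices
    return total
```

With the example's `{"tpuv6e:2x2": 2}`, this gives 2 * 4 = 8 chips. Presumably these numbers should line up with the TPU type, topology, and number of slices configured in your Pathways YAML.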

The `connect()` block deploys a proxy pod dedicated to your client and
connects your local runtime to that proxy pod through port forwarding.
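
Port forwarding requires a free local port. As a rough illustration of the general mechanism only (not the `isc_pathways` implementation), the sketch below asks the OS for an unused local port and builds a `kubectl port-forward` command for a pod. The pod name `my-proxy-pod` and remote port `29000` are hypothetical placeholders. `portpicker` (installed in step 2) provides `pick_unused_port()` for the port-picking step; the sketch uses the standard library so it runs without extra dependencies.

```python
import socket


def pick_free_local_port() -> int:
    """Let the OS assign an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        # The port is released when the socket closes, so another process
        # could take it before kubectl binds it.
        return sock.getsockname()[1]


def port_forward_command(pod_name: str, local_port: int, remote_port: int) -> list[str]:
    """Build a `kubectl port-forward` command mapping local_port to remote_port."""
    return [
        "kubectl", "port-forward", f"pod/{pod_name}", f"{local_port}:{remote_port}",
    ]


# Hypothetical values: substitute your proxy pod's name and listening port.
local_port = pick_free_local_port()
cmd = port_forward_command("my-proxy-pod", local_port, 29000)
print(" ".join(cmd))
```

Running the printed command (for example with `subprocess.Popen(cmd)`) would make the pod's port reachable at `localhost:<local_port>` for as long as the process stays alive.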