| 1 | +# AGENT PLAYBOOK |
| 2 | + |
| 3 | +This repository builds PJRT artifacts for the OpenXLA project. |
| 4 | +When CI breaks because upstream changes invalidate our pinned commit or patches, follow this checklist to repair it. |
| 5 | + |
| 6 | +## 1. Workspace basics |
| 7 | +- Use SSH for GitHub access (`git@github.com:zml/pjrt-artifacts.git`). |
| 8 | +- The reusable build workflow lives in `.github/workflows/_build.yaml`; its default commit pins must always point at a revision verified locally. |
| 9 | +- Custom patches live under `openxla/patches/{upstream,rocm}` and are applied before every build. |
| 10 | + |
| 11 | +## 2. Checking and fixing nightly builds |
| 12 | +Nightly runs are defined in `.github/workflows/nightly.yaml`. Always read that file first—its matrix specifies the authoritative OpenXLA (`XLA_COMMIT`) and ROCm (`ROCM_XLA_COMMIT`) SHAs that nightly builds expect. When fixing nightly failures, validate both repos at those exact SHAs unless the task explicitly asks to move them forward. |
| 13 | +1. Confirm the listed SHAs build: |
| 14 | + - Clone `openxla/xla` at `XLA_COMMIT` and apply `openxla/patches/upstream/*.patch`. |
| 15 | + - Clone `ROCm/xla` at `ROCM_XLA_COMMIT` and apply `openxla/patches/rocm/*.patch` when debugging ROCm nightly failures. |
| 16 | +2. Only update the commit values in `_build.yaml`/`nightly.yaml` after verifying that both the patched upstream checkout and the `bazel` commands succeed. |
| 17 | +3. When you do update them, always write the exact SHA (not branch names) and mirror those SHAs in `.github/workflows/_build.yaml`'s `XLA_COMMIT`/`ROCM_XLA_COMMIT` so the reusable `_build` workflow matches nightly. |
| 18 | + |
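Step 3 above (exact SHAs, mirrored in both workflow files) can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper names `is_full_sha`, `pin_value`, and `pins_match` are hypothetical, and it assumes the pins appear on lines shaped like `XLA_COMMIT: <value>` (optionally quoted, optionally inside a matrix list item), which may not match the real layout of the workflow files.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the `KEY: value` line layout is an assumption.

# Succeeds only for a full 40-character lowercase hex SHA (rejects branch names).
is_full_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;
  esac
  [ "${#1}" -eq 40 ]
}

# Prints the first value assigned to key $1 in file $2,
# without surrounding quotes or a trailing comment.
pin_value() {
  sed -n "s|^[[:space:]-]*$1:[[:space:]]*[\"']\{0,1\}\([^\"'[:space:]#]*\).*|\1|p" "$2" | head -n 1
}

# Succeeds when key $1 is a full SHA and identical in files $2 and $3.
# The body runs in a subshell so its variables do not leak.
pins_match() (
  a="$(pin_value "$1" "$2")"
  b="$(pin_value "$1" "$3")"
  is_full_sha "$a" && [ "$a" = "$b" ]
)
```

For example, `pins_match XLA_COMMIT .github/workflows/_build.yaml .github/workflows/nightly.yaml || echo "XLA_COMMIT pins differ or are not full SHAs"` would flag a mismatch before a PR is opened.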
| 19 | +## 3. Updating to a new OpenXLA commit |
| 20 | +1. **Clone upstream via SSH** |
| 21 | + ```bash |
| 22 | + git clone git@github.com:openxla/xla.git openxla-work |
| 23 | + cd openxla-work |
| 24 | + git checkout <target commit or main> |
| 25 | + ``` |
| 26 | +2. **Apply upstream patches** |
| 27 | + ```bash |
| 28 | + git apply ../openxla/patches/upstream/*.patch |
| 29 | + ``` |
| 30 | + - If any patch fails, edit it inside `openxla/patches/upstream/…` so that it applies cleanly to the new commit (e.g., update context or add missing loads). |
| 31 | + - Keep patches minimal; they must stay in sync with the repo copy committed later. |
| 32 | +3. **Run a quick Bazel validation** |
| 33 | + ```bash |
| 34 | + bazel --batch --output_base="$(pwd)/.bazel_output" query //xla/pjrt/c:pjrt_c_api_cpu_plugin |
| 35 | + ``` |
| 36 | + - `--output_base` inside the repo keeps permissions simple. |
| 37 | + - If Bazel refuses to start (port permissions or missing deps), adjust the patches or add local `.bazelrc` overrides until the query succeeds. |
| 38 | +4. **Record extra tweaks** |
| 39 | + - When a patch introduces Bazel rules or macros that the patched file does not already load (e.g., `http_archive`), ensure the patch itself contains the required `load()` statements so future checkouts do not need manual edits. |
| 40 | + |
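When `git apply` is given the whole glob at once, it can be hard to tell which patch no longer applies. As a minimal sketch (the function name `apply_patches` is hypothetical), the loop below dry-runs each patch with `git apply --check` before applying it, and stops at the first one that fails so it can be edited. It relies on the shell expanding the glob in sorted order, which matches sequentially numbered patch files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run from the root of the upstream checkout.
# $1 is the directory holding the *.patch files.
apply_patches() {
  patch_dir="$1"
  for p in "$patch_dir"/*.patch; do
    # Skip the literal pattern when the directory has no patches.
    [ -e "$p" ] || continue
    # Dry run first so a failing patch leaves the tree untouched.
    if git apply --check "$p"; then
      git apply "$p"
    else
      echo "FAILED: $p" >&2
      return 1
    fi
  done
}
```

From inside `openxla-work`, this would be called as `apply_patches ../openxla/patches/upstream`. Patches before the failing one stay applied, so reset the checkout (for example with `git checkout -- .`) before retrying.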
| 41 | +## 4. Updating pjrt-artifacts |
| 42 | +1. **Edit `.github/workflows/_build.yaml`** |
| 43 | + - Set `XLA_COMMIT` (and `ROCM_XLA_COMMIT` if needed) to the validated SHA. |
| 44 | + - Update the inline comment with the actual commit date (grab via `git show -s --format=%ci <sha>`). |
| 45 | +2. **Update patches** |
| 46 | + - Copy any edits from the temporary checkout back into the corresponding patch files under `openxla/patches/...`. |
| 47 | + - Keep diffs ASCII and deterministic; number patches sequentially. |
| 48 | +3. **Document testing** |
| 49 | + - Always re-run the Bazel query after editing patches to confirm they still apply. |
| 50 | + - If CUDA or ROCm build steps are required, mention those configs in the PR body even if they were not run locally. |
| 51 | + |
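Updating the pin and its date comment together avoids the two drifting apart. The sketch below is illustrative only: `set_pin` is a hypothetical helper, and it assumes the pin sits on a single line of the form `KEY: <sha> # <date>`, which may differ from the actual layout of `_build.yaml`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite the line for key $1 in file $4 so that it
# reads `KEY: <sha> # <date>`, keeping the original indentation.
# Assumes the date contains no `|`, `&`, or backslash characters.
set_pin() {
  key="$1"; sha="$2"; date="$3"; file="$4"
  tmp_file="$(mktemp)"
  sed "s|^\([[:space:]-]*$key:[[:space:]]*\).*|\1$sha # $date|" "$file" > "$tmp_file" &&
    cat "$tmp_file" > "$file"   # overwrite in place to keep file permissions
  rm -f "$tmp_file"
}
```

A possible invocation, with the date taken from the upstream checkout as described above: `set_pin XLA_COMMIT "$sha" "$(git -C openxla-work show -s --format=%ci "$sha")" .github/workflows/_build.yaml`. Review the resulting diff by hand, since any line starting with the key would be rewritten.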
| 52 | +## 5. Commit & PR etiquette |
| 53 | +1. Stage relevant files only (workflow + updated patches). |
| 54 | +2. Commit with attribution, e.g., `Fix OpenXLA commit date comment (by ChatGPT)`. |
| 55 | +3. Push the branch (`git push origin HEAD`) and create a PR using `gh pr create --base master --head <branch>`. |
| 56 | +4. PR body template: |
| 57 | + ``` |
| 58 | + ## Summary |
| 59 | + - describe workflow/patch changes |
| 60 | + |
| 61 | + ## Testing |
| 62 | + - bazel --batch --output_base="$(pwd)/.bazel_output" query //xla/pjrt/c:pjrt_c_api_cpu_plugin |
| 63 | + ``` |
| 64 | + |
| 65 | +## 6. Troubleshooting notes |
| 66 | +- **Bazel output permissions:** If `/var/tmp` is read-only, override `--output_base`. |
| 67 | +- **Missing symbols in patches:** Make sure required `load()` statements are added; otherwise Bazel commands will fail with “name 'http_archive' is not defined.” |
| 68 | +- **SSH requirement:** The environment often rewrites HTTPS URLs to SSH, so always prefer `git@github.com:` URIs to avoid cloning failures. |
| 69 | + |
| 70 | +Keep this file current whenever the workflow changes. Future agents should be able to follow these steps end-to-end without extra context. |