Commit 95ced8a

Fix OpenXLA workflow commit (#53)
* Update default OpenXLA revision (by ChatGPT)
* Fix OpenXLA commit date comment (by ChatGPT)
* Doc agent flow + fix ROCm patch (by ChatGPT)
* Sync ROCm pin + doc note (by ChatGPT)
1 parent b136f54 commit 95ced8a

4 files changed

+85
-18
lines changed


.github/workflows/_build.yaml

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ on:
         type: string

 env:
-  XLA_COMMIT: ${{ inputs.xla_commit || 'f238b48769d2ab8d62eeb09b5d31a972dfa4841a' }} # main from 2025-12-12
-  ROCM_XLA_COMMIT: ${{ inputs.rocm_xla_commit || '06402b44669c52956732678772104dcb85c53806' }} # rocm-jaxlib-v0.8.0
+  XLA_COMMIT: ${{ inputs.xla_commit || '9caad7b3520548142ccd6a2d528a06be6c474de1' }} # main from 2025-12-23
+  ROCM_XLA_COMMIT: ${{ inputs.rocm_xla_commit || '85e2c6cc8ed98027e8542e5d331d3f8b3836f557' }} # rocm-jaxlib-v0.8.0 from 2025-12-23
   TF_ROCM_AMDGPU_TARGETS: "gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1030,gfx1100"

 jobs:

AGENTS.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# AGENT PLAYBOOK

This repository builds PJRT artifacts for the OpenXLA project.
When CI breaks because upstream changes invalidate our pinned commit or patches, follow this checklist to repair it.

## 1. Workspace basics
- Use SSH for GitHub access (`git@github.com:zml/pjrt-artifacts.git`).
- Workflow lives in `.github/workflows/_build.yaml`; default commit pins must always track a revision verified locally.
- Custom patches live under `openxla/patches/{upstream,rocm}` and are applied before every build.

## 2. Checking and fixing nightly builds
Nightly runs are defined in `.github/workflows/nightly.yaml`. Always read that file first: its matrix specifies the authoritative OpenXLA (`XLA_COMMIT`) and ROCm (`ROCM_XLA_COMMIT`) SHAs that nightly builds expect. When fixing nightly failures, validate both repos at those exact SHAs unless the task explicitly asks to move them forward.
1. Confirm the listed SHAs build:
   - Clone `openxla/xla` at `XLA_COMMIT` and apply `openxla/patches/upstream/*.patch`.
   - Clone `ROCm/xla` at `ROCM_XLA_COMMIT` and apply `openxla/patches/rocm/*.patch` when debugging ROCm nightly failures.
2. Only update the commit values in `_build.yaml`/`nightly.yaml` after verifying that both the upstream checkout and the `bazel` commands succeed.
3. When you do update them, always write the exact SHA (not branch names) and mirror those SHAs in `.github/workflows/_build.yaml`'s `XLA_COMMIT`/`ROCM_XLA_COMMIT` so the reusable `_build` workflow matches nightly.
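
The "exact SHA, not branch names" rule in step 3 can be checked mechanically before a pin is committed. The following is a minimal sketch under stated assumptions: `is_full_sha` is a hypothetical helper name, not something this repository ships, and it only accepts the lowercase 40-character form that `git` prints.

```bash
# Hypothetical helper (not part of this repository): succeed only when the
# argument is a full 40-character lowercase hexadecimal commit SHA.
is_full_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex character (e.g. "main")
  esac
  [ "${#1}" -eq 40 ]           # abbreviated SHAs are rejected as well
}
```

For instance, `is_full_sha 9caad7b3520548142ccd6a2d528a06be6c474de1` succeeds, while `is_full_sha main` and `is_full_sha 95ced8a` both fail, so a branch name or a short SHA would be caught before it reaches `_build.yaml` or `nightly.yaml`.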

## 3. Updating to a new OpenXLA commit
1. **Clone upstream via SSH**
   ```bash
   git clone git@github.com:openxla/xla.git openxla-work
   cd openxla-work
   git checkout <target commit or main>
   ```
2. **Apply upstream patches**
   ```bash
   git apply ../openxla/patches/upstream/*.patch
   ```
   - If any patch fails, edit it inside `openxla/patches/upstream/…` so that it applies cleanly to the new commit (e.g., update context or add missing loads).
   - Keep patches minimal; they must stay in sync with the repo copy committed later.
3. **Run a quick Bazel validation**
   ```bash
   bazel --batch --output_base="$(pwd)/.bazel_output" query //xla/pjrt/c:pjrt_c_api_cpu_plugin
   ```
   - `--output_base` inside the repo keeps permissions simple.
   - If Bazel refuses to start (port permissions or missing deps), adjust the patches or add Bazelrc overrides locally until the query succeeds.
4. **Record extra tweaks**
   - When a patch adds new Bazel macros (e.g., `http_archive`), ensure the patch itself contains any required loads so future checkouts do not need manual edits.
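
When a patch in step 2 fails, it helps to know which one. A possible sketch, assuming a hypothetical helper named `apply_patches` (not part of this repository): it dry-runs each patch with `git apply --check` and stops at the first one that no longer applies, leaving earlier patches applied.

```bash
# Hypothetical helper (not part of this repository): apply every patch in a
# directory to the current checkout, in glob (lexical) order, stopping at the
# first one that does not apply cleanly so it can be refreshed by hand.
apply_patches() {
  patch_dir=$1
  for patch in "$patch_dir"/*.patch; do
    [ -e "$patch" ] || continue          # directory holds no patches
    # --check is a dry run: it reports conflicts without touching any file.
    if ! git apply --check "$patch"; then
      echo "needs refresh: $patch" >&2
      return 1
    fi
    git apply "$patch"
  done
}
```

From inside `openxla-work`, this would be called as `apply_patches ../openxla/patches/upstream`. Because each patch is checked against the tree left by the previous ones, the reported file is the first patch that needs its context updated.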

## 4. Updating pjrt-artifacts
1. **Edit `.github/workflows/_build.yaml`**
   - Set `XLA_COMMIT` (and ROCm if needed) to the validated SHA.
   - Update the inline comment with the actual commit date (grab via `git show -s --format=%ci <sha>`).
2. **Update patches**
   - Copy any edits from the temporary checkout back into the corresponding patch files under `openxla/patches/...`.
   - Keep diffs ASCII and deterministic; number patches sequentially.
3. **Document testing**
   - Always re-run the Bazel query after editing patches to confirm they still apply.
   - If build steps are required (cuda/rocm), mention the configs in the PR body even if not run locally.
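
The pin line and its date comment from step 1 can be generated together so the two never drift apart. This is a sketch only: `format_pin` is a hypothetical helper, and the output layout is inferred from the `XLA_COMMIT`/`ROCM_XLA_COMMIT` lines in `_build.yaml` as they appear in this commit.

```bash
# Hypothetical helper (not part of this repository): print a pin line for
# _build.yaml from a variable name, a SHA, a ref label, and the committer date
# as reported by `git show -s --format=%ci <sha>`.
format_pin() {
  var=$1
  sha=$2
  label=$3
  commit_date=$4
  # The workflow input is assumed to be the lowercase variable name
  # (XLA_COMMIT -> xla_commit), as in the current _build.yaml.
  input=$(printf '%s' "$var" | tr '[:upper:]' '[:lower:]')
  # %ci prints "YYYY-MM-DD HH:MM:SS +ZZZZ"; keep only the date part.
  day=${commit_date%% *}
  printf "  %s: \${{ inputs.%s || '%s' }} # %s from %s\n" \
    "$var" "$input" "$sha" "$label" "$day"
}
```

Inside the validated checkout, `format_pin XLA_COMMIT "$sha" main "$(git show -s --format=%ci "$sha")"` prints a line ready to paste under `env:`. The same call with `ROCM_XLA_COMMIT` and the label `rocm-jaxlib-v0.8.0` would cover the ROCm pin.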

## 5. Commit & PR etiquette
1. Stage relevant files only (workflow + updated patches).
2. Commit with attribution, e.g., `Fix OpenXLA commit date comment (by ChatGPT)`.
3. Push the branch (`git push origin HEAD`) and create a PR using `gh pr create --base master --head <branch>`.
4. PR body template:
   ```
   ## Summary
   - describe workflow/patch changes

   ## Testing
   - bazel --batch --output_base="$(pwd)/.bazel_output" query //xla/pjrt/c:pjrt_c_api_cpu_plugin
   ```

## 6. Troubleshooting notes
- **Bazel output permissions:** If `/var/tmp` is read-only, override `--output_base`.
- **Missing symbols in patches:** Make sure required `load()` statements are added; otherwise Bazel commands will fail with “name 'http_archive' is not defined.”
- **SSH requirement:** The environment often rewrites HTTPS URLs to SSH, so always prefer `git@github.com:` URIs to avoid cloning failures.
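
The "missing symbols" failure above can often be spotted before running Bazel. A rough sketch, assuming a hypothetical helper named `patch_loads_http_archive` (not part of this repository): it is a text heuristic over the patch file only, so a patch that relies on a `load()` already present upstream outside the patch context would be flagged even though it is fine.

```bash
# Hypothetical heuristic (not part of this repository): fail when a patch adds
# an http_archive(...) call but carries no load() for http_archive on any
# added or context line. Treat a failure as a prompt to look, not as proof.
patch_loads_http_archive() {
  patch=$1
  # No added line calls http_archive: nothing to verify.
  grep -q '^+[[:space:]]*http_archive(' "$patch" || return 0
  # Accept a load() on an added (+) or context (space) line of the patch.
  grep -q '^[+ ][[:space:]]*load(.*"http_archive"' "$patch"
}
```

Running it over `openxla/patches/upstream/*.patch` in a loop would list the patches worth a second look. Calls to `tf_http_archive` are deliberately not matched, since that macro is loaded from a different file.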

Keep this file current whenever the workflow changes. Future agents should be able to follow these steps end-to-end without extra context.

openxla/patches/rocm/0003-Fixed-issue-with-passing-arch_name-in-CreateTritonPi.patch

Lines changed: 2 additions & 12 deletions
@@ -5,22 +5,13 @@ Subject: [PATCH] Fixed issue with passing arch_name in CreateTritonPipeline
  for ROCm.

 ---
- xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
+ xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc b/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
 index d2ef805ad4..666ce37933 100644
 --- a/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
 +++ b/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
-@@ -66,7 +66,7 @@ static void MakeTTGIR(mlir::OpPassManager* pm,
- pm->addPass(mt::gpu::createTritonGPURemoveLayoutConversions());
- pm->addPass(mt::gpu::createTritonGPUOptimizeThreadLocality());
- // TODO ROCm Pass rocm_cc.gfx_version() after fixing issue with fmfa
-- pm->addPass(mlir::createTritonAMDGPUAccelerateMatmul({arch_name}));
-+ pm->addPass(mlir::createTritonAMDGPUAccelerateMatmul({rocm_cc.gfx_version()}));
- pm->addPass(mt::gpu::createTritonGPURemoveLayoutConversions());
- // TODO ROCm Check if we want to compare MI100 and greater
- pm->addPass(mlir::createTritonAMDGPUOptimizeEpilogue());
 @@ -109,7 +109,7 @@ static void MakeTTGIR(mlir::OpPassManager* pm,
  if (/*use_buffer_ops=*/false) { // Not enabled by default.
  pm->addPass(mlir::createTritonAMDGPUCanonicalizePointers());
@@ -32,4 +23,3 @@ index d2ef805ad4..666ce37933 100644
  pm->addPass(mlir::createCanonicalizerPass());
 --
 2.43.0
-

openxla/patches/upstream/0002-Use-hermetic-cc-toolchain-for-Linux-CPU-use-glibc-2..patch

Lines changed: 11 additions & 4 deletions
@@ -4,14 +4,22 @@ Date: Wed, 5 Nov 2025 14:45:51 +0100
 Subject: [PATCH 2/3] Use hermetic cc toolchain for Linux CPU (use glibc 2.31)

 ---
- WORKSPACE | 22 ++++++++++++++++++++++
- 1 file changed, 22 insertions(+)
+ WORKSPACE | 23 +++++++++++++++++++++++
+ 1 file changed, 23 insertions(+)

 diff --git a/WORKSPACE b/WORKSPACE
 index 8d7f1384d8..8cd8df86a0 100644
 --- a/WORKSPACE
 +++ b/WORKSPACE
-@@ -33,6 +33,28 @@ register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64")
+@@ -2,6 +2,7 @@
+ workspace(name = "xla")
+
+ load("//third_party:repo.bzl", "tf_http_archive", "tf_mirror_urls")
++load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+
+ # Initialize toolchains for ML projects.
+ #
+@@ -33,6 +34,28 @@ register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64")

  register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64_cuda")

@@ -42,4 +50,3 @@ index 8d7f1384d8..8cd8df86a0 100644
  # The cascade of load() statements and xla_workspace?() calls works around the
 --
 2.50.1 (Apple Git-155)
-
