Fix OpenXLA workflow commit (#53)

cerisier · web-flow · commit 95ced8ade3a1 · 2026-01-06T13:15:47.000+01:00
* Update default OpenXLA revision (by ChatGPT)

* Fix OpenXLA commit date comment (by ChatGPT)

* Doc agent flow + fix ROCm patch (by ChatGPT)

* Sync ROCm pin + doc note (by ChatGPT)
diff --git a/.github/workflows/_build.yaml b/.github/workflows/_build.yaml
@@ -11,8 +11,8 @@ on:
         type: string
 
 env:
-  XLA_COMMIT: ${{ inputs.xla_commit || 'f238b48769d2ab8d62eeb09b5d31a972dfa4841a' }} # main from 2025-12-12
-  ROCM_XLA_COMMIT: ${{ inputs.rocm_xla_commit || '06402b44669c52956732678772104dcb85c53806' }} # rocm-jaxlib-v0.8.0
+  XLA_COMMIT: ${{ inputs.xla_commit || '9caad7b3520548142ccd6a2d528a06be6c474de1' }} # main from 2025-12-23
+  ROCM_XLA_COMMIT: ${{ inputs.rocm_xla_commit || '85e2c6cc8ed98027e8542e5d331d3f8b3836f557' }} # rocm-jaxlib-v0.8.0 from 2025-12-23
   TF_ROCM_AMDGPU_TARGETS: "gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1030,gfx1100"
 
 jobs:
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,70 @@
+# AGENT PLAYBOOK
+
+This repository builds PJRT artifacts for the OpenXLA project.  
+When CI breaks because upstream changes invalidate our pinned commit or patches, follow this checklist to repair it.
+
+## 1. Workspace basics
+- Use SSH for GitHub access (`git@github.com:zml/pjrt-artifacts.git`).
+- Workflow lives in `.github/workflows/_build.yaml`; default commit pins must always track a revision verified locally.
+- Custom patches live under `openxla/patches/{upstream,rocm}` and are applied before every build.
+
+## 2. Checking and fixing nightly builds
+Nightly runs are defined in `.github/workflows/nightly.yaml`. Always read that file first: its matrix specifies the authoritative OpenXLA (`XLA_COMMIT`) and ROCm (`ROCM_XLA_COMMIT`) SHAs that nightly builds expect. When fixing nightly failures, validate both repos at those exact SHAs unless the task explicitly asks to move them forward.
+1. Confirm the listed SHAs build:
+   - Clone `openxla/xla` at `XLA_COMMIT` and apply `openxla/patches/upstream/*.patch`.
+   - Clone `ROCm/xla` at `ROCM_XLA_COMMIT` and apply `openxla/patches/rocm/*.patch` when debugging ROCm nightly failures.
+2. Only update the commit values in `_build.yaml`/`nightly.yaml` after verifying that both the upstream checkout and the `bazel` commands succeed.
+3. When you do update them, always write the exact SHA (not branch names) and mirror those SHAs in `.github/workflows/_build.yaml`'s `XLA_COMMIT`/`ROCM_XLA_COMMIT` so the reusable `_build` workflow matches nightly.
+
+## 3. Updating to a new OpenXLA commit
+1. **Clone upstream via SSH**
+   ```bash
+   git clone git@github.com:openxla/xla.git openxla-work
+   cd openxla-work
+   git checkout <target commit or main>
+   ```
+2. **Apply upstream patches**
+   ```bash
+   git apply ../openxla/patches/upstream/*.patch
+   ```
+   - If any patch fails, edit it inside `openxla/patches/upstream/…` so that it applies cleanly to the new commit (e.g., update context or add missing loads).  
+   - Keep patches minimal; they must stay in sync with the repo copy committed later.
+3. **Run a quick Bazel validation**
+   ```bash
+   bazel --batch --output_base="$(pwd)/.bazel_output" query //xla/pjrt/c:pjrt_c_api_cpu_plugin
+   ```
+   - `--output_base` inside the repo keeps permissions simple.
+   - If Bazel refuses to start (port permissions or missing deps), adjust the patches or add `.bazelrc` overrides locally until the query succeeds.
+4. **Record extra tweaks**
+   - When a patch adds new Bazel macros (e.g., `http_archive`), ensure the patch itself contains any required loads so future checkouts do not need manual edits.
+
+## 4. Updating pjrt-artifacts
+1. **Edit `.github/workflows/_build.yaml`**
+   - Set `XLA_COMMIT` (and `ROCM_XLA_COMMIT` if needed) to the validated SHA.
+   - Update the inline comment with the actual commit date (grab via `git show -s --format=%ci <sha>`).
+2. **Update patches**
+   - Copy any edits from the temporary checkout back into the corresponding patch files under `openxla/patches/...`.
+   - Keep diffs ASCII and deterministic; number patches sequentially.
+3. **Document testing**
+   - Always re-run the Bazel query after editing patches to confirm they still apply.
+   - If build steps are required (cuda/rocm), mention the configs in the PR body even if not run locally.
+
+## 5. Commit & PR etiquette
+1. Stage relevant files only (workflow + updated patches).
+2. Commit with attribution, e.g., `Fix OpenXLA commit date comment (by ChatGPT)`.
+3. Push the branch (`git push origin HEAD`) and create a PR using `gh pr create --base master --head <branch>`.
+4. PR body template:
+   ```
+   ## Summary
+   - describe workflow/patch changes
+
+   ## Testing
+   - bazel --batch --output_base="$(pwd)/.bazel_output" query //xla/pjrt/c:pjrt_c_api_cpu_plugin
+   ```
+
+## 6. Troubleshooting notes
+- **Bazel output permissions:** If `/var/tmp` is read-only, override `--output_base`.
+- **Missing symbols in patches:** Make sure required `load()` statements are added; otherwise Bazel commands will fail with “name 'http_archive' is not defined.”
+- **SSH requirement:** The environment often rewrites HTTPS URLs to SSH, so always prefer `git@github.com:` URIs to avoid cloning failures.
+
+Keep this file current whenever the workflow changes. Future agents should be able to follow these steps end-to-end without extra context.
diff --git a/openxla/patches/rocm/0003-Fixed-issue-with-passing-arch_name-in-CreateTritonPi.patch b/openxla/patches/rocm/0003-Fixed-issue-with-passing-arch_name-in-CreateTritonPi.patch
@@ -5,22 +5,13 @@ Subject: [PATCH] Fixed issue with passing arch_name in CreateTritonPipeline
  for ROCm.
 
 ---
- xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
+ xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc b/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
 index d2ef805ad4..666ce37933 100644
 --- a/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
 +++ b/xla/backends/gpu/codegen/triton/compilation_pipeline_rocm.cc
-@@ -66,7 +66,7 @@ static void MakeTTGIR(mlir::OpPassManager* pm,
-   pm->addPass(mt::gpu::createTritonGPURemoveLayoutConversions());
-   pm->addPass(mt::gpu::createTritonGPUOptimizeThreadLocality());
-   // TODO ROCm Pass rocm_cc.gfx_version() after fixing issue with fmfa
--  pm->addPass(mlir::createTritonAMDGPUAccelerateMatmul({arch_name}));
-+  pm->addPass(mlir::createTritonAMDGPUAccelerateMatmul({rocm_cc.gfx_version()}));
-   pm->addPass(mt::gpu::createTritonGPURemoveLayoutConversions());
-   // TODO ROCm Check if we want to compare MI100 and greater
-   pm->addPass(mlir::createTritonAMDGPUOptimizeEpilogue());
 @@ -109,7 +109,7 @@ static void MakeTTGIR(mlir::OpPassManager* pm,
    if (/*use_buffer_ops=*/false) {  // Not enabled by default.
      pm->addPass(mlir::createTritonAMDGPUCanonicalizePointers());
@@ -32,4 +23,3 @@ index d2ef805ad4..666ce37933 100644
    pm->addPass(mlir::createCanonicalizerPass());
 -- 
 2.43.0
-
diff --git a/openxla/patches/upstream/0002-Use-hermetic-cc-toolchain-for-Linux-CPU-use-glibc-2..patch b/openxla/patches/upstream/0002-Use-hermetic-cc-toolchain-for-Linux-CPU-use-glibc-2..patch
@@ -4,14 +4,22 @@ Date: Wed, 5 Nov 2025 14:45:51 +0100
 Subject: [PATCH 2/3] Use hermetic cc toolchain for Linux CPU (use glibc 2.31)
 
 ---
- WORKSPACE | 22 ++++++++++++++++++++++
- 1 file changed, 22 insertions(+)
+ WORKSPACE | 23 +++++++++++++++++++++++
+ 1 file changed, 23 insertions(+)
 
 diff --git a/WORKSPACE b/WORKSPACE
 index 8d7f1384d8..8cd8df86a0 100644
 --- a/WORKSPACE
 +++ b/WORKSPACE
-@@ -33,6 +33,28 @@ register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64")
+@@ -2,6 +2,7 @@
+ workspace(name = "xla")
+ 
+ load("//third_party:repo.bzl", "tf_http_archive", "tf_mirror_urls")
++load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+ 
+ # Initialize toolchains for ML projects.
+ #
+@@ -33,6 +34,28 @@ register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64")
  
  register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64_cuda")
  
@@ -42,4 +50,3 @@ index 8d7f1384d8..8cd8df86a0 100644
  # The cascade of load() statements and xla_workspace?() calls works around the
 -- 
 2.50.1 (Apple Git-155)
-