Commit 3862b87

Create PyTorch commit pin. (#9654)
This PR creates a PyTorch pin.

**Key Changes:**

- Removed `get-torch-commit` (and, consequently, `torch-commit` parameters) from GitHub actions files
- Modified the `setup.yml` action by:
    1. Checking out PyTorch/XLA
    2. Retrieving the contents of `.torch_commit`
    3. Checking out PyTorch using the contents of the retrieved PyTorch commit
    4. Moving PyTorch/XLA inside the PyTorch directory
- Added a `.torch_commit` file pointing to the commit just before #9651 started happening

This should prevent our CI from breaking due to breaking changes in PyTorch, as we have experienced recently (e.g. #9653 and #9651). From now on, in theory, we should only see our CI breaking because of PyTorch changes whenever we update this pin.
1 parent 3240166 commit 3862b87

11 files changed, with 125 additions and 76 deletions

.github/ci.md

Lines changed: 28 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -3,22 +3,22 @@
33
PyTorch and PyTorch/XLA use CI to lint, build, and test each PR that is
44
submitted. All CI tests should succeed before the PR is merged into master.
55
PyTorch CI pins PyTorch/XLA to a specific commit. On the other hand, PyTorch/XLA
6-
CI pulls PyTorch from master unless a pin is manually provided. This README will
7-
go through the reasons of these pins, how to pin a PyTorch/XLA PR to an upstream
8-
PyTorch PR, and how to coordinate a merge for breaking PyTorch changes.
6+
CI pulls PyTorch from `.torch_commit` unless a pin is manually provided. This
7+
README will go through the reasons of these pins, how to pin a PyTorch/XLA PR
8+
to an upstream PyTorch PR, and how to coordinate a merge for breaking PyTorch
9+
changes.
910

1011
## Usage
1112

12-
### Pinning PyTorch PR in PyTorch/XLA PR
13+
### Temporarily Pinning PyTorch PR in PyTorch/XLA PR
1314

1415
Sometimes a PyTorch/XLA PR needs to be pinned to a specific PyTorch PR to test
15-
new features, fix breaking changes, etc. Since PyTorch/XLA CI pulls from PyTorch
16-
master by default, we need to manually provide a PyTorch pin. In a PyTorch/XLA
17-
PR, PyTorch can be manually pinned by creating a `.torch_pin` file at the root
18-
of the repository. The `.torch_pin` should have the corresponding PyTorch PR
19-
number prefixed by "#". Take a look at [example
20-
here](https://github.com/pytorch/xla/pull/7313). Before the PyTorch/XLA PR gets
21-
merged, the `.torch_pin` must be deleted.
16+
new features, fix breaking changes, etc. In a PyTorch/XLA PR, PyTorch can be
17+
manually pinned by creating a `.torch_pin` file at the root of the repository.
18+
The `.torch_pin` should have the corresponding PyTorch PR number prefixed by
19+
"#". Take a look at [example here](https://github.com/pytorch/xla/pull/7313).
20+
Before the PyTorch/XLA PR gets merged, the `.torch_pin` must be deleted and
21+
`.torch_commit` updated.
2222

2323
### Coordinating merges for breaking PyTorch PRs
2424

@@ -35,10 +35,11 @@ fail. Steps for fixing and merging such breaking PyTorch change is as following:
3535
PyTorch PR to pin the PyTorch/XLA to the commit hash created in step 1 by
3636
updating `pytorch/.github/ci_commit_pins/xla.txt`.
3737
1. Once CI tests are green on both ends, merge PyTorch PR.
38-
1. Remove the `.torch_pin` in PyTorch/XLA PR and merge. To be noted, `git commit
39-
--amend` should be avoided in this step as PyTorch CI will keep using the
40-
commit hash created in step 1 until other PRs update that manually or the
41-
nightly buildbot updates that automatically.
38+
1. Remove the `.torch_pin` in PyTorch/XLA PR and update the `.torch_commit` to
39+
the hash of the merged PyTorch PR. To be noted, `git commit --amend` should
40+
be avoided in this step as PyTorch CI will keep using the commit hash
41+
created in step 1 until other PRs update that manually or the nightly
42+
buildbot updates that automatically.
4243
1. Finally, don't delete your branch until 2 days later. See step 4 for
4344
explanations.
4445

@@ -47,6 +48,18 @@ fail. Steps for fixing and merging such breaking PyTorch change is as following:
4748
The `build_and_test.yml` workflow runs tests on the TPU in addition to CPU.
4849
The set of tests run on the TPU is defined in `test/tpu/run_tests.sh`.
4950

51+
## Update the PyTorch Commit Pin
52+
53+
In order to reduce development burden of PyTorch/XLA, starting from #9654, we
54+
started pinning PyTorch using the `.torch_commit` file. This should reduce the
55+
number of times a PyTorch PR breaks our most recent commits. However, this also
56+
requires maintenance, i.e. someone has to keep updating the PyTorch commit so
57+
as to make sure it's always supporting (almost) the latest PyTorch versions.
58+
59+
Updating the PyTorch commit pin is, theoretically, simple. You just have to run
60+
`scripts/update_deps.py --pytorch` and open a PR. In practice, you may
61+
encounter a few compilation errors, or even segmentation faults.
62+
5063
## CI Environment
5164

5265
Before the CI in this repository runs, we build a base dev image. These are the
@@ -152,13 +165,6 @@ good" commit to prevent accidental changes from PyTorch/XLA to break PyTorch CI
152165
without warning. PyTorch has hundreds of commits each week, and this pin ensures
153166
that PyTorch/XLA as a downstream package does not cause failures in PyTorch CI.
154167

155-
#### Why does PyTorch/XLA CI pull from PyTorch master?
156-
157-
[PyTorch/XLA CI pulls PyTorch from master][pull-pytorch-master] unless a PyTorch
158-
pin is manually provided. PyTorch/XLA is a downstream package to PyTorch, and
159-
pulling from master ensures that PyTorch/XLA will stay up-to-date and works with
160-
the latest PyTorch changes.
161-
162168
#### TPU CI is broken
163169

164170
If the TPU CI won't run, try to debug using the following steps:

.github/workflows/_build_torch_xla.yml

Lines changed: 0 additions & 6 deletions
@@ -6,10 +6,6 @@ on:
66
required: true
77
type: string
88
description: Base image for builds
9-
torch-commit:
10-
required: true
11-
type: string
12-
description: torch-commit
139
runner:
1410
required: false
1511
type: string
@@ -53,8 +49,6 @@ jobs:
5349
- name: Setup
5450
if: inputs.has_code_changes == 'true'
5551
uses: ./.actions/.github/workflows/setup
56-
with:
57-
torch-commit: ${{ inputs.torch-commit }}
5852
- name: Build
5953
if: inputs.has_code_changes == 'true'
6054
shell: bash

.github/workflows/_test.yml

Lines changed: 0 additions & 17 deletions
@@ -23,10 +23,6 @@ on:
2323
description: |
2424
Set the maximum (in minutes) how long the workflow should take to finish
2525
timeout-minutes:
26-
torch-commit:
27-
required: true
28-
type: string
29-
description: torch-commit
3026
has_code_changes:
3127
required: false
3228
type: string
@@ -89,7 +85,6 @@ jobs:
8985
if: inputs.has_code_changes == 'true'
9086
uses: ./.actions/.github/workflows/setup
9187
with:
92-
torch-commit: ${{ inputs.torch-commit }}
9388
wheels-artifact: torch-xla-wheels
9489
- name: Fetch CPP test binaries
9590
if: inputs.has_code_changes == 'true' && matrix.run_cpp_tests
@@ -112,18 +107,6 @@ jobs:
112107
pip install fsspec
113108
pip install rich
114109
pip install flax
115-
- name: Checkout PyTorch Repo
116-
if: inputs.has_code_changes == 'true'
117-
uses: actions/checkout@v4
118-
with:
119-
repository: pytorch/pytorch
120-
path: pytorch
121-
ref: ${{ inputs.torch-commit }}
122-
- name: Checkout PyTorch/XLA Repo
123-
if: inputs.has_code_changes == 'true'
124-
uses: actions/checkout@v4
125-
with:
126-
path: pytorch/xla
127110
- name: Extra CI deps
128111
if: inputs.has_code_changes == 'true'
129112
shell: bash

.github/workflows/_tpu_ci.yml

Lines changed: 0 additions & 5 deletions
@@ -2,10 +2,6 @@ name: TPU Integration Test
22
on:
33
workflow_call:
44
inputs:
5-
torch-commit:
6-
required: false
7-
type: string
8-
description: torch-commit
95
timeout-minutes:
106
required: false
117
type: number
@@ -42,7 +38,6 @@ jobs:
4238
if: inputs.has_code_changes == 'true'
4339
uses: ./.actions/.github/workflows/setup
4440
with:
45-
torch-commit: ${{ inputs.torch-commit }}
4641
wheels-artifact: torch-xla-wheels
4742

4843
- name: Install test dependencies

.github/workflows/build_and_test.yml

Lines changed: 4 additions & 13 deletions
@@ -26,29 +26,21 @@ jobs:
2626
base_sha: ${{ github.event_name == 'pull_request' && github.event.pull_request.base.sha || github.event.before }}
2727
head_sha: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
2828

29-
get-torch-commit:
29+
report-no-code-changes:
3030
needs: [check_code_changes]
3131
runs-on: ubuntu-24.04
32-
outputs:
33-
torch_commit: ${{ steps.commit.outputs.torch_commit }}
3432
steps:
35-
- name: Get latest torch commit
36-
id: commit
37-
if: needs.check_code_changes.outputs.has_code_changes == 'true'
38-
run: |
39-
echo "torch_commit=$(git ls-remote https://github.com/pytorch/pytorch.git HEAD | awk '{print $1}')" >> "$GITHUB_OUTPUT"
4033
- name: Report no code changes
41-
if: needs.check_code_changes.outputs.has_code_changes == 'false'
4234
run: |
4335
echo "No code changes were detected that require running the full test suite."
36+
if: needs.check_code_changes.outputs.has_code_changes == 'false'
4437

4538
build-torch-xla:
4639
name: "Build PyTorch/XLA"
4740
uses: ./.github/workflows/_build_torch_xla.yml
48-
needs: [check_code_changes, get-torch-commit]
41+
needs: [check_code_changes]
4942
with:
5043
dev-image: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.12_tpuvm
51-
torch-commit: ${{needs.get-torch-commit.outputs.torch_commit}}
5244
timeout-minutes: 45 # Takes ~20m as of 2025/5/30.
5345
has_code_changes: ${{ needs.check_code_changes.outputs.has_code_changes }}
5446
runner: linux.24xlarge
@@ -58,13 +50,12 @@ jobs:
5850
test-python-cpu:
5951
name: "CPU tests"
6052
uses: ./.github/workflows/_test.yml
61-
needs: [build-torch-xla, check_code_changes, get-torch-commit]
53+
needs: [build-torch-xla, check_code_changes]
6254
with:
6355
dev-image: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.12_tpuvm
6456
timeout-minutes: 45 # Takes ~26m as of 2025/5/30.
6557
collect-coverage: false
6658
runner: linux.24xlarge
67-
torch-commit: ${{needs.get-torch-commit.outputs.torch_commit}}
6859
has_code_changes: ${{ needs.check_code_changes.outputs.has_code_changes }}
6960
secrets:
7061
gcloud-service-key: ${{ secrets.GCLOUD_SERVICE_KEY }}

.github/workflows/setup/action.yml

Lines changed: 17 additions & 6 deletions
@@ -1,8 +1,5 @@
11
name: Set up PyTorch/XLA
22
inputs:
3-
torch-commit:
4-
type: string
5-
description: PyTorch commit to check out, if provided
63
wheels-artifact:
74
type: string
85
description: |
@@ -16,6 +13,7 @@ runs:
1613
run: |
1714
ls -la
1815
rm -rvf ${GITHUB_WORKSPACE}/*
16+
1917
- name: Setup gcloud
2018
shell: bash
2119
run: |
@@ -25,24 +23,37 @@ runs:
2523
# reason composite actions don't support secrets.
2624
# https://docs.github.com/en/actions/using-workflows/avoiding-duplication
2725
if: ${{ env.GCLOUD_SERVICE_KEY }}
26+
2827
- name: Checkout PyTorch Repo
2928
uses: actions/checkout@v4
3029
with:
3130
repository: pytorch/pytorch
3231
path: pytorch
33-
ref: ${{ inputs.torch-commit }}
34-
submodules: recursive
35-
if: ${{ inputs.torch-commit }}
32+
3633
- name: Checkout PyTorch/XLA Repo
3734
uses: actions/checkout@v4
3835
with:
3936
path: pytorch/xla
37+
38+
# Fetch and checkout to the pinned PyTorch commit.
39+
- name: Checkout to PyTorch Commit Pin
40+
working-directory: pytorch
41+
shell: bash
42+
env:
43+
TORCH_COMMIT_FILE: ".torch_commit"
44+
run: |
45+
COMMIT=$(tail -1 "xla/$TORCH_COMMIT_FILE")
46+
git fetch --no-recurse-submodules origin $COMMIT
47+
git checkout --no-recurse-submodules FETCH_HEAD
48+
git submodule update --init --recursive
49+
4050
- name: Fetch PyTorch/XLA packages
4151
uses: actions/download-artifact@v5
4252
with:
4353
name: ${{ inputs.wheels-artifact }}
4454
path: /tmp/wheels/
4555
if: ${{ inputs.wheels-artifact }}
56+
4657
- name: Install wheels
4758
shell: bash
4859
run: |

.torch_commit

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
# 2025-09-17
2+
928ac57c2ab03f9f79376f9995553eea2e6f4ca8
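
The pin file above holds a dated comment followed by the commit hash, and the scripts touched by this commit read it with `tail -1`. A minimal sketch of a slightly more defensive reader is shown below. The function name `read_torch_commit` and the check for a full 40-character hash are assumptions made for illustration; neither is part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the pinned PyTorch
# commit stored on the last line of a `.torch_commit`-style file.
read_torch_commit() {
  local file="$1"
  local commit

  # Same convention the commit relies on: the hash is the last line.
  commit="$(tail -n 1 "$file")" || return 1

  # Assumption: the pin is a full 40-character lowercase hex hash.
  if ! printf '%s\n' "$commit" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "error: '$file' does not end with a full commit hash" >&2
    return 1
  fi

  printf '%s\n' "$commit"
}
```

From the PyTorch checkout, it could be used as `git checkout "$(read_torch_commit xla/.torch_commit)"`, which would fail early on a malformed pin instead of handing an arbitrary string to `git`.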

CONTRIBUTING.md

Lines changed: 23 additions & 1 deletion
@@ -37,7 +37,7 @@ working with:
3737

3838
Next, we need to clone the forked repos locally so that we can make changes.
3939

40-
On your Linuc machine, decide a directory as your workspace. Make sure that
40+
On your Linux machine, decide a directory as your workspace. Make sure that
4141
this directory and all of its ancestors are publicly readable. Then run
4242
the following commands on this machine:
4343

@@ -58,6 +58,28 @@ git clone --recursive git@github.com:<your-github-user-name>/vision.git
5858
git clone --recursive git@github.com:<your-github-user-name>/pytorch-xla.git pytorch/xla
5959
```
6060

61+
### Pinned PyTorch Version
62+
63+
Since PR #9654, PyTorch/XLA has pinned a PyTorch version. The pinned
64+
commit can be found in the `.torch_commit` file at the root directory. Note that
65+
the pinned PyTorch version guarantees that all PyTorch/XLA tests pass
66+
whenever the underlying PyTorch is compiled at that specific commit. Therefore,
67+
especially for development, it's recommended that PyTorch is compiled at that
68+
specific commit. Otherwise, you might end up with all kinds of errors, from
69+
build errors to segmentation faults. So, make sure to check out that version:
70+
71+
```bash
72+
# Go to PyTorch directory.
73+
cd $WORKSPACE_DIR/pytorch
74+
75+
# Retrieve the PyTorch commit pin inside PyTorch/XLA directory.
76+
# Note: it's located in the last line of `.torch_commit`.
77+
COMMIT=$(tail -1 "xla/.torch_commit")
78+
79+
# Create a branch (optional) and jump at that commit.
80+
git checkout -b pin "$COMMIT"
81+
```
82+
6183
### Setting up Remote Tracking
6284

6385
From time to time, we'll need to bring our forked repos up to date with the

infra/ansible/roles/build_srcs/tasks/main.yaml

Lines changed: 3 additions & 4 deletions
@@ -1,8 +1,8 @@
11
- name: Read PyTorch pin
2-
ansible.builtin.command: cat {{ (src_root, 'pytorch/xla/.torch_pin') | path_join }}
2+
ansible.builtin.shell: |
3+
cat {{ (src_root, 'pytorch/xla/.torch_pin') | path_join }} 2> /dev/null ||
4+
tail -1 {{ (src_root, 'pytorch/xla/.torch_commit') | path_join }}
35
register: torch_pin
4-
# Pin may not exist
5-
ignore_errors: true
66

77
- name: Checkout PyTorch pin
88
# ansible.builtin.git wants to fetch the entire history, so check out the pin manually
@@ -21,7 +21,6 @@
2121
chdir: "{{ (src_root, 'pytorch') | path_join }}"
2222
args:
2323
executable: /bin/bash
24-
when: torch_pin is succeeded
2524

2625
- name: Build PyTorch
2726
ansible.builtin.command:
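
The "Read PyTorch pin" task above prefers a temporary `.torch_pin` and falls back to the last line of `.torch_commit`. The same precedence can be sketched as a standalone shell function. The name `resolve_torch_pin` is hypothetical, and testing for the files' existence (rather than relying on `cat` failing, as the task does) is an assumption made for readability.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the PyTorch pin for a
# PyTorch/XLA checkout, preferring the temporary `.torch_pin` when present.
resolve_torch_pin() {
  local xla_dir="$1"

  if [ -f "$xla_dir/.torch_pin" ]; then
    # Temporary pin, e.g. a PyTorch PR number prefixed by "#".
    cat "$xla_dir/.torch_pin"
  elif [ -f "$xla_dir/.torch_commit" ]; then
    # Long-lived pin: the commit hash is on the last line of the file.
    tail -n 1 "$xla_dir/.torch_commit"
  else
    echo "error: no .torch_pin or .torch_commit in '$xla_dir'" >&2
    return 1
  fi
}
```

Because `.torch_commit` is now checked in, the fallback branch should always produce a value, which is presumably why the diff drops `ignore_errors` and the `when: torch_pin is succeeded` guard.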

scripts/build_developer.sh

Lines changed: 5 additions & 0 deletions
@@ -56,6 +56,11 @@ if [ "$_BUILD_BASE" == "pytorch" ]; then
5656
# Change to the pytorch directory.
5757
cd $_SCRIPT_DIR/../..
5858

59+
TORCH_COMMIT="xla/.torch_commit"
60+
if [ -e "$TORCH_COMMIT" ]; then
61+
git checkout $(tail -1 "$TORCH_COMMIT")
62+
fi
63+
5964
# Remove any leftover old wheels and old installation.
6065
pip uninstall torch -y
6166
python3 setup.py clean
