Commit 342de86

Remove other CUDA usage from PyTorch/XLA repository. (#9618)
This PR removes the remaining CUDA-specific logic from this repository, in line with the CUDA deprecation that started in release 2.8.

**Key Changes:**

- Removed CUDA branches from test scripts (e.g. `.circleci/common.sh`)
- Removed CUDA-related files, such as documentation (e.g. `docs/source/accelerators/gpu.md`)
- Removed mentions of CUDA as a supported PyTorch/XLA accelerator
- Removed CUDA-specific parameters from CI configuration files (e.g. `.github/workflows/_test.yml`)
- Removed CUDA-specific parameters from artifact build configuration files (e.g. `infra/tpu-pytorch-releases/artifacts_builds.tf`)
1 parent e0de097 commit 342de86
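A removal like this can be audited mechanically. The sketch below is hypothetical (the directory and file contents are made-up stand-ins, not the real repository) and shows the `grep -rl` pattern a reviewer might use to list files that still mention CUDA:

```shell
# Build a throwaway tree with one clean file and one file that still
# references CUDA, then list the offenders by name.
tmp=$(mktemp -d)
printf 'PJRT_DEVICE=CPU test/cpp/run_tests.sh\n' > "$tmp/new.sh"
printf 'PJRT_DEVICE=CUDA test/cpp/run_tests.sh\n' > "$tmp/old.sh"
# -r: recurse; -l: print only matching file names.
leftovers=$(grep -rl 'CUDA' "$tmp" | xargs -n1 basename)
echo "$leftovers"
rm -rf "$tmp"
```

Running the same `grep -rl 'CUDA' .` at the repository root after this commit should, by design, return only intentional remnants such as changelog entries.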

File tree

28 files changed: +74 additions, −417 deletions


.circleci/common.sh

Lines changed: 3 additions & 22 deletions
```diff
@@ -158,26 +158,12 @@ function run_torch_xla_cpp_tests() {
   fi
 
   if [ "$USE_COVERAGE" != "0" ]; then
-    if [ -x "$(command -v nvidia-smi)" ]; then
-      PJRT_DEVICE=CUDA test/cpp/run_tests.sh $EXTRA_ARGS -L""
-      cp $XLA_DIR/bazel-out/_coverage/_coverage_report.dat /tmp/cov1.dat
-      PJRT_DEVICE=CUDA test/cpp/run_tests.sh -X early_sync -F AtenXlaTensorTest.TestEarlySyncLiveTensors -L"" $EXTRA_ARGS
-      cp $XLA_DIR/bazel-out/_coverage/_coverage_report.dat /tmp/cov2.dat
-      lcov --add-tracefile /tmp/cov1.dat -a /tmp/cov2.dat -o /tmp/merged.dat
-    else
-      PJRT_DEVICE=CPU test/cpp/run_tests.sh $EXTRA_ARGS -L""
-      cp $XLA_DIR/bazel-out/_coverage/_coverage_report.dat /tmp/merged.dat
-    fi
+    PJRT_DEVICE=CPU test/cpp/run_tests.sh $EXTRA_ARGS -L""
+    cp $XLA_DIR/bazel-out/_coverage/_coverage_report.dat /tmp/merged.dat
     genhtml /tmp/merged.dat -o ~/htmlcov/cpp/cpp_lcov.info
     mv /tmp/merged.dat ~/htmlcov/cpp_lcov.info
   else
-    # Shard GPU testing
-    if [ -x "$(command -v nvidia-smi)" ]; then
-      PJRT_DEVICE=CUDA test/cpp/run_tests.sh $EXTRA_ARGS -L""
-      PJRT_DEVICE=CUDA test/cpp/run_tests.sh -X early_sync -F AtenXlaTensorTest.TestEarlySyncLiveTensors -L"" $EXTRA_ARGS
-    else
-      PJRT_DEVICE=CPU test/cpp/run_tests.sh $EXTRA_ARGS -L""
-    fi
+    PJRT_DEVICE=CPU test/cpp/run_tests.sh $EXTRA_ARGS -L""
   fi
   popd
 }
@@ -196,11 +182,6 @@ function run_torch_xla_tests() {
   RUN_CPP="${RUN_CPP_TESTS:0}"
   RUN_PYTHON="${RUN_PYTHON_TESTS:0}"
 
-  if [ -x "$(command -v nvidia-smi)" ]; then
-    num_devices=$(nvidia-smi --list-gpus | wc -l)
-    echo "Found $num_devices GPU devices..."
-    export GPU_NUM_DEVICES=$num_devices
-  fi
   export PYTORCH_TESTING_DEVICE_ONLY_FOR="xla"
   export CXX_ABI=$(python -c "import torch;print(int(torch._C._GLIBCXX_USE_CXX11_ABI))")
```
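The deleted branches above all keyed off one detection idiom: `[ -x "$(command -v nvidia-smi)" ]`, where `command -v` prints the resolved path of a command if it is on `PATH`, so `-x` tests that the command exists and is executable. A minimal self-contained sketch of that idiom (the absent command name below is deliberately made up):

```shell
# probe: report whether a command exists on PATH, using the same
# `command -v` + `-x` test the removed CUDA branches relied on.
probe() {
  if [ -x "$(command -v "$1")" ]; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

r1=$(probe sh)                                   # present on any POSIX system
r2=$(probe definitely-not-a-real-command-12345)  # absent, so the CPU path runs
echo "$r1"
echo "$r2"
```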

.devcontainer/gpu-internal/devcontainer.json

Lines changed: 0 additions & 30 deletions
This file was deleted.

.github/ISSUE_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -13,5 +13,5 @@ Error messages and stack traces are also helpful.
 
 ## System Info
 
-- reproducible on XLA backend [CPU/TPU/CUDA]:
+- reproducible on XLA backend [CPU/TPU]:
 - torch_xla version:
```

.github/ISSUE_TEMPLATE/bug-report.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -46,7 +46,7 @@ Steps to reproduce the behavior:
 
 ## Environment
 
-- Reproducible on XLA backend [CPU/TPU/CUDA]:
+- Reproducible on XLA backend [CPU/TPU]:
 - torch_xla version:
 
```

.github/ci.md

Lines changed: 12 additions & 25 deletions
```diff
@@ -44,20 +44,20 @@ fail. Steps for fixing and merging such breaking PyTorch change is as following:
 
 ### Running TPU tests on PRs
 
-The `build_and_test.yml` workflow runs tests on the TPU in addition to CPU and
-GPU. The set of tests run on the TPU is defined in `test/tpu/run_tests.sh`.
+The `build_and_test.yml` workflow runs tests on the TPU in addition to CPU.
+The set of tests run on the TPU is defined in `test/tpu/run_tests.sh`.
 
 ## CI Environment
 
 Before the CI in this repository runs, we build a base dev image. These are the
 same images we recommend in our VSCode `.devcontainer` setup and nightly build
-to ensure consistency between environments. We produce variants with and without
-CUDA, configured in `infra/ansible` (build config) and
-`infra/tpu-pytorch-releases/dev_images.tf` (build triggers).
+to ensure consistency between environments. We produce variants configured in
+`infra/ansible` (build config) and `infra/tpu-pytorch-releases/dev_images.tf`
+(build triggers).
 
 The CI runs in two environments:
 
-1. Organization self-hosted runners for CPU and GPU: used for almost every step
+1. Organization self-hosted runners for CPU: used for almost every step
    of the CI. These runners are managed by PyTorch and have access to the shared
    ECR repository.
 1. TPU self-hosted runners: these are managed by us and are only available in
@@ -68,48 +68,35 @@ The CI runs in two environments:
 
 We have two build paths for each CI run:
 
-- `torch_xla`: we build the main package to support both TPU and GPU[^1], along
+- `torch_xla`: we build the main package to support TPU, along
   with a CPU build of `torch` from HEAD. This build step exports the
   `torch-xla-wheels` artifact for downstream use in tests.
   - Some CI tests also require `torchvision`. To reduce flakiness, we compile
     `torchvision` from [`torch`'s CI pin][pytorch-vision-pin].
   - C++ tests are piggybacked onto the same build and uploaded in the
     `cpp-test-bin` artifact.
-- `torch_xla_cuda_plugin`: the XLA CUDA runtime can be built independently of
-  either `torch` or `torch_xla` -- it depends only on our pinned OpenXLA. Thus,
-  this build should be almost entirely cached, unless your PR changes the XLA
-  pin or adds a patch.
 
-Both the main package build and plugin build are configured with ansible at
-`infra/ansible`, although they run in separate stages (`stage=build_srcs` vs
-`stage=build_plugin`). This is the same configuration we use for our nightly and
-release builds.
+The main package build is configured with ansible at `infra/ansible`. This is
+the same configuration we use for our nightly and release builds.
 
-The CPU and GPU test configs are defined in the same file, `_test.yml`. Since
+The CPU test config is defined in the file `_test.yml`. Since
 some of the tests come from the upstream PyTorch repository, we check out
 PyTorch at the same git rev as the `build` step (taken from
 `torch_xla.version.__torch_gitrev__`). The tests are split up into multiple
 groups that run in parallel; the `matrix` section of `_test.yml` corresponds to
 in `.github/scripts/run_tests.sh`.
 
 CPU tests run immediately after the `torch_xla` build completes. This will
-likely be the first test feedback on your commit. GPU tests will launch when
-both the `torch_xla` and `torch_xla_cuda_plugin` complete. GPU compilation is
-much slower due to the number of possible optimizations, and the GPU chips
-themselves are quite outdated, so these tests will take longer to run than the
-CPU tests.
+likely be the first test feedback on your commit.
 
 ![CPU tests launch when `torch_xla` is
 complete](../docs/assets/ci_test_dependency.png)
 
-![GPU tests also depend on CUDA
-plugin](../docs/assets/ci_test_dependency_gpu.png)
-
 For the C++ test groups in either case, the test binaries are pre-built during
 the build phase and packaged in `cpp-test-bin`. This will only be downloaded if
 necessary.
 
-[^1]: Note: both GPU and TPU support require their respective plugins to be
+[^1]: Note: TPU support require its respective plugins to be
     installed. This package will _not_ work on either out of the box.
 
 ### TPU CI
```

.github/scripts/run_tests.sh

Lines changed: 0 additions & 5 deletions
```diff
@@ -77,11 +77,6 @@ PYTORCH_DIR=$1
 XLA_DIR=$2
 USE_COVERAGE="${3:-0}"
 
-if [ -x "$(command -v nvidia-smi)" ]; then
-  num_devices=$(nvidia-smi --list-gpus | wc -l)
-  echo "Found $num_devices GPU devices..."
-  export GPU_NUM_DEVICES=$num_devices
-fi
 export PYTORCH_TESTING_DEVICE_ONLY_FOR="xla"
 export CXX_ABI=$(python -c "import torch;print(int(torch._C._GLIBCXX_USE_CXX11_ABI))")
```
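The removed block counted GPUs by counting the lines `nvidia-smi --list-gpus` prints, one per device. Since `nvidia-smi` is assumed absent on CPU-only runners, the sketch below substitutes a hypothetical stub that emits two fake device lines, just to show the `wc -l` counting idiom:

```shell
# list_gpus: a made-up stand-in for `nvidia-smi --list-gpus`, which prints
# one line per visible GPU. The device names here are illustrative only.
list_gpus() {
  printf 'GPU 0: Tesla V100\nGPU 1: Tesla V100\n'
}

# Same pattern as the deleted code: count output lines to count devices.
num_devices=$(list_gpus | wc -l)
echo "Found $num_devices GPU devices..."
```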

.github/workflows/_test.yml

Lines changed: 16 additions & 37 deletions
```diff
@@ -23,11 +23,6 @@ on:
         description: |
           Set the maximum (in minutes) how long the workflow should take to finish
       timeout-minutes:
-      install-cuda-plugin:
-        required: false
-        type: boolean
-        default: false
-        description: Whether to install CUDA plugin package
       torch-commit:
         required: true
         type: string
@@ -46,7 +41,7 @@ jobs:
     runs-on: ${{ inputs.runner }}
     container:
       image: ${{ inputs.dev-image }}
-      options: "${{ inputs.install-cuda-plugin == true && '--gpus all' || '' }} --shm-size 16g"
+      options: "--shm-size 16g"
     strategy:
       fail-fast: false
       matrix:
@@ -95,9 +90,7 @@ jobs:
         uses: ./.actions/.github/workflows/setup
         with:
           torch-commit: ${{ inputs.torch-commit }}
-          cuda: ${{ inputs.install-cuda-plugin && true || false }}
           wheels-artifact: torch-xla-wheels
-          cuda-plugin-artifact: ${{ inputs.install-cuda-plugin && 'cuda-plugin' || null }}
       - name: Fetch CPP test binaries
         if: inputs.has_code_changes == 'true' && matrix.run_cpp_tests
         uses: actions/download-artifact@v4
@@ -111,9 +104,6 @@ jobs:
         run: |
           chmod +x /tmp/test/bin/*
           ls -l /tmp/test/bin
-      - name: Check GPU
-        if: inputs.has_code_changes == 'true' && inputs.install-cuda-plugin
-        run: nvidia-smi
       - name: Install test deps
         if: inputs.has_code_changes == 'true'
         shell: bash
@@ -164,35 +154,24 @@ jobs:
             exit 0
           fi
           docker cp "${pid}":/home/jenkins/htmlcov "${GITHUB_WORKSPACE}"
-          if [ -n "${GPU_FLAG:-}" ]; then
-            if [ -n "${PYTHON_TEST_NAME}" ]; then
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/gpu_python_coverage_${PYTHON_TEST_NAME}.out
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/gpu_python_coverage_${PYTHON_TEST_NAME}.out
-            fi
-            if [ -n "${CPP_TEST_NAME}" ]; then
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/cpp_lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/gpu_cpp_coverage_${CPP_TEST_NAME}.out
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/cpp_lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/gpu_cpp_coverage_${CPP_TEST_NAME}.out
-            fi
-          else
-            if [ -n "${PYTHON_TEST_NAME}" ]; then
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_python_coverage_${PYTHON_TEST_NAME}.out
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_python_coverage_${PYTHON_TEST_NAME}.out
-            fi
+          if [ -n "${PYTHON_TEST_NAME}" ]; then
+            gsutil cp ${GITHUB_WORKSPACE}/htmlcov/lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_python_coverage_${PYTHON_TEST_NAME}.out
+            gsutil cp ${GITHUB_WORKSPACE}/htmlcov/lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_python_coverage_${PYTHON_TEST_NAME}.out
+          fi
 
-            if [ -n "${CPP_TEST_NAME}" ]; then
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/cpp_lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_cpp_coverage_${CPP_TEST_NAME}.out
-              gsutil cp ${GITHUB_WORKSPACE}/htmlcov/cpp_lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_cpp_coverage_${CPP_TEST_NAME}.out
-            fi
+          if [ -n "${CPP_TEST_NAME}" ]; then
+            gsutil cp ${GITHUB_WORKSPACE}/htmlcov/cpp_lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_cpp_coverage_${CPP_TEST_NAME}.out
+            gsutil cp ${GITHUB_WORKSPACE}/htmlcov/cpp_lcov.info gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/cpu_cpp_coverage_${CPP_TEST_NAME}.out
+          fi
 
-            if [ "${CPP_TEST_NAME}" == "cpp_tests" ]; then
-              ABS_METADATA='{"host": "github", "project": "pytorchxla", "trace_type": "LCOV", "commit_id": '\"${GITHUB_SHA}\"', "ref": "HEAD", "source": "https://github.com/pytorch/xla", "owner": "cloud-tpu-pt-dev", "bug_component": "587012"}'
-              echo $ABS_METADATA > abs_metadata.json
-              gsutil cp abs_metadata.json gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/metadata.json
+          if [ "${CPP_TEST_NAME}" == "cpp_tests" ]; then
+            ABS_METADATA='{"host": "github", "project": "pytorchxla", "trace_type": "LCOV", "commit_id": '\"${GITHUB_SHA}\"', "ref": "HEAD", "source": "https://github.com/pytorch/xla", "owner": "cloud-tpu-pt-dev", "bug_component": "587012"}'
+            echo $ABS_METADATA > abs_metadata.json
+            gsutil cp abs_metadata.json gs://ng3-metrics/ng3-pytorchxla-coverage/absolute/pytorchxla/${CIRCLE_WORKFLOW_ID}/metadata.json
 
-              INC_METADATA='{"host": "github", "project": "pytorchxla", "trace_type": "LCOV", "patchset_num": 1, "change_id": '${CIRCLE_BUILD_NUM}', "owner": "cloud-tpu-pt-dev", "bug_component": "587012"}'
-              echo $INC_METADATA > inc_metadata.json
-              gsutil cp inc_metadata.json gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/metadata.json
-            fi
+            INC_METADATA='{"host": "github", "project": "pytorchxla", "trace_type": "LCOV", "patchset_num": 1, "change_id": '${CIRCLE_BUILD_NUM}', "owner": "cloud-tpu-pt-dev", "bug_component": "587012"}'
+            echo $INC_METADATA > inc_metadata.json
+            gsutil cp inc_metadata.json gs://ng3-metrics/ng3-pytorchxla-coverage/incremental/pytorchxla/${CIRCLE_WORKFLOW_ID}/metadata.json
           fi
         fi
       - name: Report no code changes
         if: inputs.has_code_changes == 'false'
```
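The `ABS_METADATA` assignment in the coverage step builds JSON by alternating single-quoted literals with `\"` escapes, so the interpolated variable lands inside double quotes in the output. A reduced sketch of that quoting trick (the `GITHUB_SHA` value below is a stand-in; in CI it comes from the runner environment):

```shell
GITHUB_SHA="abc123"  # stand-in; GitHub Actions sets this for real jobs

# Single-quoted segments are literal; \" outside quotes emits a literal
# double quote; ${GITHUB_SHA} is expanded between those quotes.
ABS_METADATA='{"host": "github", "commit_id": '\"${GITHUB_SHA}\"'}'
echo "$ABS_METADATA"
```

The result is valid JSON with the commit id as a quoted string value.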

.github/workflows/setup/action.yml

Lines changed: 0 additions & 33 deletions
```diff
@@ -3,20 +3,10 @@ inputs:
   torch-commit:
     type: string
     description: PyTorch commit to check out, if provided
-  cuda:
-    type: boolean
-    description: Whether to set up CUDA library paths
-    default: false
   wheels-artifact:
     type: string
     description: |
       Artifact containing `torch` (cpu) and `torch-xla` wheels to install
-  cuda-plugin-artifact:
-    type: string
-    description: Artifact containing `torch-xla-cuda-plugin` to install
-  cuda-torch-artifact:
-    type: string
-    description: Artifact containing CUDA build of `torch`
 runs:
   using: "composite"
   steps:
@@ -26,12 +16,6 @@ runs:
       run: |
         ls -la
         rm -rvf ${GITHUB_WORKSPACE}/*
-    - name: Setup CUDA environment
-      shell: bash
-      run: |
-        echo "PATH=$PATH:/usr/local/cuda-12.3/bin" >> $GITHUB_ENV
-        echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.3/lib64" >> $GITHUB_ENV
-      if: ${{ inputs.cuda }}
     - name: Setup gcloud
       shell: bash
       run: |
@@ -59,23 +43,6 @@ runs:
         name: ${{ inputs.wheels-artifact }}
         path: /tmp/wheels/
       if: ${{ inputs.wheels-artifact }}
-    - name: Fetch CUDA plugin
-      uses: actions/download-artifact@v4
-      with:
-        name: ${{ inputs.cuda-plugin-artifact }}
-        path: /tmp/wheels/
-      if: ${{ inputs.cuda-plugin-artifact }}
-    - name: Remove CPU `torch` build
-      shell: bash
-      run: |
-        rm -rf /tmp/wheels/torch-*
-      if: ${{ inputs.cuda-torch-artifact }}
-    - name: Fetch CUDA `torch` build
-      uses: actions/download-artifact@v4
-      with:
-        name: ${{ inputs.cuda-torch-artifact }}
-        path: /tmp/wheels/
-      if: ${{ inputs.cuda-torch-artifact }}
     - name: Install wheels
       shell: bash
       run: |
```
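The removed "Setup CUDA environment" step relied on the `$GITHUB_ENV` mechanism: GitHub Actions reads `KEY=value` lines appended to that file and exports them to subsequent steps of the job. The sketch below emulates that mechanism with a plain temp file (the CUDA library path is the one from the deleted step, kept purely for illustration):

```shell
# Emulate $GITHUB_ENV: one step appends KEY=value lines to a file...
env_file=$(mktemp)
echo "LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64" >> "$env_file"

# ...and a "later step" exports every line before running, which is
# roughly what the Actions runner does between steps.
while IFS= read -r line; do
  export "$line"
done < "$env_file"

echo "$LD_LIBRARY_PATH"
rm -f "$env_file"
```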

CONTRIBUTING.md

Lines changed: 0 additions & 4 deletions
````diff
@@ -238,10 +238,6 @@ first time, you may need to build everything again, for example, after a
 python setup.py develop
 ```
 
-### Additional steps for GPU
-
-Please refer to this [guide](https://github.com/pytorch/xla/blob/master/plugins/cuda/README.md).
-
 ## Before Creating a Pull Request
 
 In `pytorch/xla` repo we enforce coding style for both C++ and Python files.
````
