41 changes: 40 additions & 1 deletion .github/workflows/rocm-ci.yml
@@ -39,8 +39,41 @@ concurrency:
   cancel-in-progress: true

 jobs:
+  build_wheels:
+    name: Build TE Wheels
+    runs-on: linux-mi325-8
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Build Wheel Builder Image
+        run: |
+          cd build_tools/wheel_utils
+          docker build -f Dockerfile.rocm.manylinux.x86 \
+            --build-arg ROCM_REPO_URL=https://repo.radeon.com/rocm/rhel8/latest/main/ \
+            -t te-builder .
Collaborator

Does the te-builder image need to be deleted manually? Also, does the Docker cache persist anything between CI runs, or does every run start clean?
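
One way to address the cleanup question, as a minimal sketch: if `linux-mi325-8` is a persistent self-hosted runner (an assumption; the diff does not say), local Docker images would survive between runs and `te-builder` would stay on disk until something removes it. A cleanup step along these lines could be appended to the `build_wheels` job. The step name is hypothetical and not part of this PR.

```yaml
      # Hypothetical cleanup step, not part of this PR.
      # Assumes the runner keeps local Docker images between jobs.
      - name: Remove Wheel Builder Image
        if: always()   # run even if an earlier step failed
        run: |
          # Ignore the error if the image was never built
          docker image rm te-builder || true
```

Whether cached layers survive the removal depends on the builder in use; `docker builder prune` clears the BuildKit cache explicitly if a fully clean build is wanted.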


+      - name: Generate Wheels
+        run: |
+          mkdir -p dist
+          docker run --rm \
+            -v $(pwd)/dist:/wheelhouse \
+            -v ${{ github.workspace }}:/TransformerEngine \
+            -e LOCAL_TREE_BUILD=1 \
+            te-builder
+
+      - name: Upload Wheels
+        uses: actions/upload-artifact@v4
+        with:
+          name: te-wheels
+          path: dist/*
+          retention-days: 5
+
   build_and_test:
     name: Build and Test on GPU
+    needs: build_wheels
     timeout-minutes: 720
     runs-on: linux-mi325-8
     steps:
@@ -160,6 +193,12 @@ jobs:
         run: |
           docker pull ${{ steps.select-image.outputs.image-tag }}

+      - name: Download Wheels
+        uses: actions/download-artifact@v4
+        with:
+          name: te-wheels
+          path: downloaded_wheels
+
       - name: Run Container
         run: |
           docker run -dt \
@@ -218,7 +257,7 @@ jobs:
export NVTE_AITER_PREBUILT_BASE_URL=https://compute-artifactory.amd.com:5000/artifactory/rocm-generic-local/te-ci/aiter-prebuilts
pip install ninja
git config --global --add safe.directory '*'
-pip install --no-build-isolation -v . 2>&1
+pip install /wheelhouse_mount/transformer_engine*.whl --no-build-isolation --force-reinstall 2>&1
Collaborator

Why is --force-reinstall needed? Is TE already installed on this image?
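
A quick way to check the second question, sketched as a diagnostic step: `pip show` exits non-zero when a package is absent, so running it inside the test container before the install would reveal whether the image already ships TE. The step name and the `CONTAINER` value below are placeholders, since the container name is not visible in this diff.

```yaml
      # Hypothetical diagnostic step, not part of this PR.
      - name: Report preinstalled TE
        env:
          CONTAINER: te-runner   # placeholder; use the name given in "Run Container"
        run: |
          # pip show returns a non-zero exit code if the package is not installed
          docker exec "$CONTAINER" bash -c \
            'pip show transformer_engine || echo "transformer_engine is not preinstalled"'
```

If the package turns out to be absent, a plain `pip install` of the wheel should suffice. Note also that `--force-reinstall` reinstalls dependencies as well as the named wheel; combining it with `--no-deps` would limit the reinstall to TE itself.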

EOF
)"
