
Unable to build in Docker for ROCm 7 #121

@spacebat

Hi, I'm trying to build the XLA tarball for use on my machine. The chipset is an AMD Ryzen AI Max+ 395, so I believe I have to use the latest ROCm, which is 7.0.1 at the moment. I'm running Ubuntu 24.04 LTS, Elixir 1.18.4 and Erlang/OTP 28.

To get as far as I have, I've made these changes to XLA 0.9.1:

~/xla/builds$ git diff
diff --git a/builds/Dockerfile b/builds/Dockerfile
index ef618aa..47f19b6 100644
--- a/builds/Dockerfile
+++ b/builds/Dockerfile
@@ -1,6 +1,5 @@
 ARG VARIANT
-# By default we build on Ubuntu 20 to compile against an older version of glibc.
-ARG BASE_IMAGE="hexpm/elixir:1.15.8-erlang-24.3.4.17-ubuntu-focal-20240427"
+ARG BASE_IMAGE="hexpm/elixir:1.18.4-erlang-28.1-ubuntu-noble-20250910"
 
 # Pre-stages for base image variants
 
@@ -41,7 +40,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates
   apt-get install -y rocm-dev rocm-libs && \
   apt-get clean -y && rm -rf /var/lib/apt/lists/*
 
-ENV ROCM_PATH "/opt/rocm-${ROCM_VERSION}.0"
+ENV ROCM_PATH "/opt/rocm-${ROCM_VERSION}"
 
 FROM base-${VARIANT}
 
@@ -73,7 +72,7 @@ ENV USE_BAZEL_VERSION=7.4.1
 # Install Python and the necessary global dependencies
 RUN apt-get update && apt-get install -y python3 python3-pip && \
   ln -s /usr/bin/python3 /usr/bin/python && \
-  python -m pip install --upgrade pip numpy && \
+  python -m pip install --break-system-packages --upgrade numpy && \
   apt-get clean -y && rm -rf /var/lib/apt/lists/*
 
 # Setup project files
diff --git a/builds/build.sh b/builds/build.sh
index 0ef386a..b1eafde 100755
--- a/builds/build.sh
+++ b/builds/build.sh
@@ -44,7 +44,7 @@ case "$target" in
   "rocm")
     docker build -t xla-rocm -f builds/Dockerfile \
       --build-arg VARIANT=rocm \
-      --build-arg ROCM_VERSION=6.0 \
+      --build-arg ROCM_VERSION=7.0.1 \
       --build-arg XLA_TARGET=rocm \
       .
   ;;
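A note on the ROCM_PATH change: the original `/opt/rocm-${ROCM_VERSION}.0` only yields the right directory for a two-component version such as 6.0 (giving /opt/rocm-6.0.0), whereas 7.0.1 already has three components. A minimal sketch that handles both, assuming ROCm installs under /opt/rocm-MAJOR.MINOR.PATCH; `rocm_path_for_version` is a hypothetical helper name, not part of the XLA build scripts:

```shell
#!/bin/sh
# Hypothetical helper: map a ROCM_VERSION build arg to its install prefix.
# Assumption: ROCm installs under /opt/rocm-MAJOR.MINOR.PATCH.
rocm_path_for_version() {
  case "$1" in
    *.*.*.*) echo "unexpected ROCm version: $1" >&2; return 1 ;;
    *.*.*)   printf '/opt/rocm-%s\n' "$1" ;;    # 7.0.1 -> /opt/rocm-7.0.1
    *.*)     printf '/opt/rocm-%s.0\n' "$1" ;;  # 6.0   -> /opt/rocm-6.0.0
    *)       echo "unexpected ROCm version: $1" >&2; return 1 ;;
  esac
}
```

For example, `rocm_path_for_version 6.0` prints /opt/rocm-6.0.0 and `rocm_path_for_version 7.0.1` prints /opt/rocm-7.0.1. Since a Dockerfile ENV instruction can't call a shell function, this would have to run in build.sh or a RUN step rather than in the ENV line itself.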

I've been struggling to get past this error and others like it. It's always the same problem, sometimes in a different package: in some nested Bazel context, gcc seems to be used instead of clang, and it fails on the clang-only flag -Qunused-arguments:

ERROR: /root/.cache/bazel/_bazel_root/77031b6b54d069fa14d9031c964d5f8f/external/com_google_absl/absl/base/BUILD.bazel:53:11: Compiling absl/base/log_severity.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing CppCompile command (from target @@com_google_absl//absl/base:log_severity) external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer ... (remaining 50 arguments skipped)
gcc: error: unrecognized command-line option ‘-Qunused-arguments’
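To confirm that the flag itself is the problem rather than anything package-specific, one can check which compiler drivers accept -Qunused-arguments. This is a minimal POSIX sh sketch; `accepts_flag` is a hypothetical helper that just tries to compile an empty C program with the given flag:

```shell
#!/bin/sh
# Hypothetical helper: succeed if compiler $1 accepts flag $2.
# clang accepts -Qunused-arguments; the error above shows gcc rejecting it.
accepts_flag() {
  cc="$1"
  flag="$2"
  # -x c: read stdin as C source; -c -o /dev/null: compile only, discard output.
  printf 'int main(void) { return 0; }\n' |
    "$cc" "$flag" -x c -c -o /dev/null - >/dev/null 2>&1
}
```

Inside the container, `accepts_flag clang -Qunused-arguments` should succeed and `accepts_flag gcc -Qunused-arguments` should fail, consistent with the gcc error above.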

This is the full build output:

~/xla/builds$ ./build.sh rocm

[3/4] STEP 1/5: FROM hexpm/elixir:1.18.4-erlang-28.1-ubuntu-noble-20250910 AS base-rocm
[3/4] STEP 2/5: ARG ROCM_VERSION
--> Using cache 21e24e0eae92a1421ff0c9e675f893f42d44d4abd1d4ae1efec39ee6cfbfaf6f
--> 21e24e0eae92
[3/4] STEP 3/5: ARG DEBIAN_FRONTEND=noninteractive
--> Using cache fa2602493146f93990bcedfa8e49b15e2d6caaf2399cda5c66b077583f09e048
--> fa2602493146
[3/4] STEP 4/5: RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl gnupg &&   distro="$(. /etc/lsb-release && echo "$DISTRIB_CODENAME")" &&   curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - &&   echo "deb [arch=amd64] https://repo.radeon.com/rocm/apt/${ROCM_VERSION}/ $distro main" | tee /etc/apt/sources.list.d/rocm.list &&   printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' | tee /etc/apt/preferences.d/rocm-pin-600 &&   apt-get update &&   apt-get install -y rocm-dev rocm-libs &&   apt-get clean -y && rm -rf /var/lib/apt/lists/*
--> Using cache eb1ecf97b94fbc780cafec7b776e9718d4fea8e50aa80152e582a3127910975b
--> eb1ecf97b94f
[3/4] STEP 5/5: ENV ROCM_PATH "/opt/rocm-${ROCM_VERSION}"
--> Using cache 6faa8af1d456dcfe49899ef7c7a8f78d5fe4942d7d7dae2e8982f3160898675f
--> 6faa8af1d456
[4/4] STEP 1/18: FROM 6faa8af1d456dcfe49899ef7c7a8f78d5fe4942d7d7dae2e8982f3160898675f
[4/4] STEP 2/18: ENV LC_ALL=C.UTF-8
--> Using cache 2d6cc23ec5ab67be303f3c48c1370ffe2e0e6f2898218060e6a73a7a27dded34
--> 2d6cc23ec5ab
[4/4] STEP 3/18: ARG DEBIAN_FRONTEND=noninteractive
--> Using cache c3372a81144a819d69d47ac1b757d856850adef77bc14e933b4386fe8c5c4ae4
--> c3372a81144a
[4/4] STEP 4/18: RUN apt-get update &&   apt-get update && apt-get install -y ca-certificates curl git unzip wget &&   clang_version="18" &&   apt-get install -y wget gnupg software-properties-common lsb-release &&   wget -qO- https://apt.llvm.org/llvm.sh | bash -s -- $clang_version &&   update-alternatives --install /usr/bin/clang clang /usr/bin/clang-$clang_version 100 &&   update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-$clang_version 100 &&   apt-get clean -y && rm -rf /var/lib/apt/lists/*
--> Using cache df0ba7255559c0435c250c204fe4c061234e41486643a973906e2d8777f9c447
--> df0ba7255559
[4/4] STEP 5/18: RUN wget -O bazel "https://github.com/bazelbuild/bazelisk/releases/download/v1.26.0/bazelisk-linux-$(dpkg --print-architecture)" &&   chmod +x bazel &&   mv bazel /usr/local/bin/bazel
--> Using cache 66f60f0af52a7fa16f26edca3d4bd54a1e1671dcda95ac7842d5400ccf2604a6
--> 66f60f0af52a
[4/4] STEP 6/18: ENV USE_BAZEL_VERSION=7.4.1
--> Using cache f709ad0de0c1ded69241a10bf49950a9898f5d40dec005493cadcb6a312c810b
--> f709ad0de0c1
[4/4] STEP 7/18: RUN apt-get update && apt-get install -y python3 python3-pip &&   ln -s /usr/bin/python3 /usr/bin/python &&   python -m pip install --break-system-packages --upgrade numpy &&   apt-get clean -y && rm -rf /var/lib/apt/lists/*
--> Using cache e809dbe4f3f99e5afc1ab6715b3de4cd0ace3811f34ad41b89095b7aeda6b72b
--> e809dbe4f3f9
[4/4] STEP 8/18: WORKDIR /xla
--> Using cache 657b6f64c18e6b2e941b5faec14c71f6ff495bf62db029377d03a38bf216dede
--> 657b6f64c18e
[4/4] STEP 9/18: ARG XLA_TARGET
--> Using cache 6fdf59978609e2d14ddd281fd714fbdb4d4688101f38248112a91d2a75d157dc
--> 6fdf59978609
[4/4] STEP 10/18: ENV XLA_TARGET=${XLA_TARGET}
--> Using cache f3f030b163727a76554f7cc2ff4dde35f3eb829e2cba20135f2f7f53fd1b65fe
--> f3f030b16372
[4/4] STEP 11/18: ENV XLA_CACHE_DIR=/cache
--> Using cache 6a2cd45f69c00a9d837da0932869fdfd6261dfff99e6f06d473048e261e57796
--> 6a2cd45f69c0
[4/4] STEP 12/18: ENV XLA_BUILD=true
--> Using cache 7058122124cc9fdbbe67e362895434bd4d3e602b63d75c8671e59bf35b7a8688
--> 7058122124cc
[4/4] STEP 13/18: COPY mix.exs mix.lock ./
--> Using cache cd64a721a1680f45a1fee53b75bc6ff8e9385310530919aa5ec196bcd570c619
--> cd64a721a168
[4/4] STEP 14/18: RUN mix deps.get
--> Using cache a70a744706d337a60c7e89e5006ef70bb60b81c916ed268535d9224cc8a75720
--> a70a744706d3
[4/4] STEP 15/18: COPY lib lib
--> Using cache ff64515b45c41d46cf1ed3af06c752eb54aac51b1468c857369b7317e68fa2a5
--> ff64515b45c4
[4/4] STEP 16/18: COPY README.md Makefile ./
--> Using cache e57a9d9800f12677857e2f588567a5047153c069e7a3d68396d0e10fce5ab34d
--> e57a9d9800f1
[4/4] STEP 17/18: COPY extension extension
--> Using cache 43f422e50bdfaf34fd3acb8264650f5584872389294dd5853daee8da8d8b1e89
--> 43f422e50bdf
[4/4] STEP 18/18: CMD [ "mix", "compile" ]
--> Using cache 4f13af52135bd862d28d2d0cb0e318634d2078114eebcea775f924db26903ef9
[4/4] COMMIT xla-rocm
--> 4f13af52135b
Successfully tagged localhost/xla-rocm:latest
4f13af52135bd862d28d2d0cb0e318634d2078114eebcea775f924db26903ef9
==> earmark_parser
Compiling 2 files (.xrl)
Compiling 1 file (.yrl)
Compiling 3 files (.erl)
Compiling 46 files (.ex)
    warning: Tuple.append/2 is deprecated. Use insert_at instead
    │
 65 │     tag_tpl |> Tuple.append(Enum.reverse(lines)) |> Tuple.append(@verbatim)
    │                      ~
    │
    └─ lib/earmark_parser/helpers/html_parser.ex:65:22: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:65:59: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:69:39: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:69:88: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:70:39: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:70:76: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:71:40: EarmarkParser.Helpers.HtmlParser._parse_rest/3
    └─ lib/earmark_parser/helpers/html_parser.ex:71:77: EarmarkParser.Helpers.HtmlParser._parse_rest/3

Generated earmark_parser app
==> elixir_make
Compiling 1 file (.ex)
Generated elixir_make app
==> nimble_parsec
Compiling 4 files (.ex)
Generated nimble_parsec app
==> makeup
Compiling 15 files (.ex)
Generated makeup app
==> makeup_elixir
Compiling 6 files (.ex)
Generated makeup_elixir app
==> makeup_erlang
Compiling 4 files (.ex)
Generated makeup_erlang app
==> ex_doc
Compiling 26 files (.ex)
Generated ex_doc app
==> xla
Compiling 5 files (.ex)
Generated xla app
rm -f /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/xla/extension && \
        ln -s "/xla/extension" /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/xla/extension && \
        cd /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20 && \
        bazel build --define "framework_shared_object=false" -c opt   --config=rocm --action_env=HIP_PLATFORM=hcc --action_env=TF_ROCM_AMDGPU_TARGETS="gfx900,gfx906,gfx908,gfx90a,gfx940,gfx941,gfx942,gfx1030,gfx1100,gfx1200,gfx1201" --repo_env=CC=clang --repo_env=CXX=clang++ --copt=-Wno-error=unused-command-line-argument --copt=-Wno-gnu-offsetof-extensions --copt=-Qunused-arguments --copt=-Wno-error=c23-extensions //xla/extension:xla_extension && \
        mkdir -p /cache/0.9.1/build/ && \
        cp -f /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/bazel-bin/xla/extension/xla_extension.tar.gz /cache/0.9.1/build/xla_extension-0.9.1-x86_64-linux-gnu-rocm.tar.gz
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/.bazelrc:
  Inherited 'common' options: --noenable_bzlmod --noincompatible_enable_cc_toolchain_resolution
INFO: Reading rc options for 'build' from /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc:
  Inherited 'common' options: --announce_rc --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --config=short_logs --@rules_python//python/config_settings:precompile=force_disabled
INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:rocm in file /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc: --config=rocm_base
INFO: Found applicable config definition build:rocm_base in file /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc: --copt=-Wno-gnu-offsetof-extensions --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --define=xnn_enable_avxvnniint8=false --define=xnn_enable_avx512fp16=false --repo_env TF_NEED_ROCM=1
INFO: Found applicable config definition build:linux in file /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/tensorflow.bazelrc: --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --experimental_guard_against_concurrent_changes
Computing main repo mapping: 
DEBUG: /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/third_party/py/python_repo.bzl:156:14: 
HERMETIC_PYTHON_VERSION variable was not set correctly, using default version.
Python 3.11 will be used.
To select Python version, either set HERMETIC_PYTHON_VERSION env variable in
your shell:
  export HERMETIC_PYTHON_VERSION=3.12
OR pass it as an argument to bazel command directly or inside your .bazelrc
file:
  --repo_env=HERMETIC_PYTHON_VERSION=3.12
DEBUG: /root/.cache/xla_build/xla-870d90fd098c480fb8a426126bd02047adb2bc20/third_party/py/python_repo.bzl:87:10: 
=============================
Hermetic Python configuration:
Version: "3.11"
Kind: ""
Interpreter: "default" (provided by rules_python)
Requirements_lock label: "@//:requirements_lock_3_11.txt"
=====================================
Loading: 
Loading: 1 packages loaded
Analyzing: target //xla/extension:xla_extension (2 packages loaded, 0 targets configured)
Analyzing: target //xla/extension:xla_extension (2 packages loaded, 0 targets configured)

Analyzing: target //xla/extension:xla_extension (272 packages loaded, 21646 targets configured)

INFO: Analyzed target //xla/extension:xla_extension (283 packages loaded, 37505 targets configured).
[1 / 1] no actions running
ERROR: /root/.cache/bazel/_bazel_root/77031b6b54d069fa14d9031c964d5f8f/external/com_google_absl/absl/base/BUILD.bazel:53:11: Compiling absl/base/log_severity.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing CppCompile command (from target @@com_google_absl//absl/base:log_severity) external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer ... (remaining 50 arguments skipped)
gcc: error: unrecognized command-line option ‘-Qunused-arguments’
Target //xla/extension:xla_extension failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3.900s, Critical Path: 0.22s
INFO: 52 processes: 51 internal, 1 local.
ERROR: Build did NOT complete successfully
make: *** [Makefile:24: /cache/0.9.1/build/xla_extension-0.9.1-x86_64-linux-gnu-rocm.tar.gz] Error 1
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install 'Development Tools'".

Is there an obvious fix for this? If not, perhaps I should try the Torchx route instead?
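In case it helps narrow things down: the log shows that --config=rocm sets --crosstool_top=@local_config_rocm//crosstool:toolchain, so the failing compile goes through the generated crosstool_wrapper_driver_is_not_gcc wrapper, and the gcc error suggests that wrapper calls gcc despite --repo_env=CC=clang. A sketch for checking which host compiler the wrapper uses, assuming (not verified here) that the generated wrapper records it in a variable whose name contains HOST_COMPILER; `show_wrapper_host_compiler` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print lines of the generated ROCm crosstool wrapper
# that mention a host compiler. $1 is the Bazel output base.
# Assumption: the wrapper stores its host compiler in a *HOST_COMPILER* variable.
show_wrapper_host_compiler() {
  wrapper="$1/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc"
  if [ ! -f "$wrapper" ]; then
    echo "wrapper not found: $wrapper" >&2
    return 1
  fi
  grep -n 'HOST_COMPILER' "$wrapper"
}
```

For example, from the XLA checkout inside the container after a failed build: `show_wrapper_host_compiler "$(bazel info output_base)"`.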

Cheers
