
Support for ROCM 6 #82

@jalberto

It seems ROCm 5.6 sort of works, but getting everything running takes far too much back and forth. The new Fedora 40 brings official ROCm support, but only starting with ROCm 6.

I am using this config from #63

Mix.install(
  [
    {:web_driver_client, "~> 0.2.0"},
    {:kino, "~> 0.12.3"},
    {:req, "~> 0.4.14"},
    {:erlexec, "~> 2.0"},
    {:nx, github: "elixir-nx/nx", sparse: "nx", override: true},
    {:exla, github: "elixir-nx/nx", sparse: "exla", override: true}
  ],
  system_env: %{
    "XLA_ARCHIVE_URL" =>
      "https://static.jonatanklosko.com/builds/0.6.0/xla_extension-x86_64-linux-gnu-rocm.tar.gz",
    "ROCM_PATH" => "/usr/lib64/rocm/"
  },
  config: [nx: [default_backend: {EXLA.Backend, client: :host}]]
)

I managed to find every package it asked for (which took a while of back and forth) until I hit this:

18:36:37.767 [warning] The on_load function for module Elixir.EXLA.NIF returned:
{:error,
 {:load_failed,
  ~c"Failed to load NIF library /home/ja/.cache/mix/installs/elixir-1.16.2-erts-14.2.5/f3927a87654a1bf097d7e31b6277a9f8/_build/dev/lib/exla/priv/libexla: 'librocblas.so.3: cannot open shared object file: No such file or directory'"}}

My guess is that xla_extension needs to be built against ROCm 6 (librocblas.so.4). I tried to build it myself, but the build requirements are too far from what the current system provides (gcc versions and so on).
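To check that guess, here is a minimal sketch that lists which librocblas files are actually installed, so the installed soname can be compared with the `librocblas.so.3` the NIF asks for. `RocblasCheck` is a hypothetical helper, not part of EXLA or XLA, and the directories searched (other than `ROCM_PATH` from the config above) are assumptions about where a distro might put the library.

```elixir
defmodule RocblasCheck do
  # Hypothetical helper for illustration only; not an EXLA/XLA API.
  # Returns the sorted, de-duplicated file names matching librocblas.so*
  # in the given directories. Missing directories simply yield no matches.
  def find(dirs) do
    dirs
    |> Enum.flat_map(fn dir -> Path.wildcard(Path.join(dir, "librocblas.so*")) end)
    |> Enum.map(&Path.basename/1)
    |> Enum.uniq()
    |> Enum.sort()
  end
end

# ROCM_PATH matches the value used in the Mix.install config;
# the remaining directories are guesses and may not exist on your system.
rocm_path = System.get_env("ROCM_PATH", "/usr/lib64/rocm/")
dirs = [rocm_path, Path.join(rocm_path, "lib"), "/usr/lib64", "/opt/rocm/lib"]

IO.inspect(RocblasCheck.find(dirs), label: "librocblas files found")
```

If the list contains only a newer soname and no `librocblas.so.3`, that would be consistent with the prebuilt archive having been linked against an older ROCm release than the one installed.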

It would be great if there were official XLA binaries for different ROCm versions, as there are for CUDA.

I understand ROCm support is low priority, but it is a really nice way to get started with AI, as it works well on Linux.
