Multi-backend support (ROCm and MUSA) by yeahdongcn · Pull Request #21 · Dao-AILab/fast-hadamard-transform

yeahdongcn · 2025-10-10T03:03:59Z

This PR was inspired by #11 and extends it to support multiple backends, including ROCm and MUSA.

Build, install, and unit tests all pass on both ROCm and MUSA. The full logs are below.
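For readers unfamiliar with how one build script can target several backends, here is a minimal sketch of backend selection. It is not the code in this PR: the function names `pick_backend` and `probe_environment` and the priority order (MUSA, then ROCm, then CUDA) are assumptions made for illustration. It relies only on `torch.version.hip`, which is `None` on non-ROCm builds, and on whether the `torch_musa` package that appears in the MUSA log is importable.

```python
import importlib.util


def pick_backend(hip_version, has_torch_musa):
    """Map two probe results to a backend name.

    Hypothetical helper: the priority order here is an assumption,
    not something taken from the PR's setup.py.
    """
    if has_torch_musa:
        return "musa"
    if hip_version:
        return "rocm"
    return "cuda"


def probe_environment():
    """Probe the current Python environment without failing if torch is absent."""
    has_torch_musa = importlib.util.find_spec("torch_musa") is not None
    hip_version = None
    if importlib.util.find_spec("torch") is not None:
        import torch

        # torch.version.hip is a version string on ROCm builds, None otherwise.
        hip_version = getattr(torch.version, "hip", None)
    return pick_backend(hip_version, has_torch_musa)


if __name__ == "__main__":
    print("selected backend:", probe_environment())
```

Keeping the decision in a pure function makes it easy to test without any GPU toolchain installed; a build script could then branch on the returned name to choose the compiler and preprocessor defines.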

Testing Done

ROCm 7.0.0 + Torch 2.9.0a0+git7bcbafe

root@4ba3a2f2f1b5:/sgl-workspace/fast-hadamard-transform# python setup.py install


torch.__version__  = 2.9.0a0+git7bcbafe


/sgl-workspace/fast-hadamard-transform/csrc/vendor.h -> /sgl-workspace/fast-hadamard-transform/csrc/vendor_hip.h [skipped, already hipified]
/sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform.h -> /sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform.h [skipped, no changes]
/sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform.cpp -> /sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_hip.cpp [skipped, already hipified]
/sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_common.h -> /sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_common_hip.h [skipped, already hipified]
/sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_special.h -> /sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_special.h [skipped, no changes]
/sgl-workspace/fast-hadamard-transform/csrc/static_switch.h -> /sgl-workspace/fast-hadamard-transform/csrc/static_switch.h [skipped, no changes]
/sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_gpu.cu -> /sgl-workspace/fast-hadamard-transform/csrc/fast_hadamard_transform_gpu.hip [skipped, already hipified]
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0


Total number of replaced kernel launches: 5
/opt/venv/lib/python3.10/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!

        ********************************************************************************
        Please consider removing the following classifiers in favor of a SPDX license expression:

        License :: OSI Approved :: BSD License

        See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
        ********************************************************************************

!!
  self._finalize_license_expression()
running install
/opt/venv/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:90: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
/opt/venv/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:90: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
running bdist_egg
running egg_info
writing fast_hadamard_transform.egg-info/PKG-INFO
writing dependency_links to fast_hadamard_transform.egg-info/dependency_links.txt
writing requirements to fast_hadamard_transform.egg-info/requires.txt
writing top-level names to fast_hadamard_transform.egg-info/top_level.txt
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'fast_hadamard_transform.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying fast_hadamard_transform/__init__.py -> build/lib.linux-x86_64-cpython-310/fast_hadamard_transform
copying fast_hadamard_transform/fast_hadamard_transform_interface.py -> build/lib.linux-x86_64-cpython-310/fast_hadamard_transform
running build_ext
building 'fast_hadamard_transform_cuda' extension
ninja: no work to do.
x86_64-linux-gnu-g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -shared -Wl,-O1 -Wl,-Bsymbolic-functions /sgl-workspace/fast-hadamard-transform/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform_gpu.o /sgl-workspace/fast-hadamard-transform/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform_hip.o -L/opt/venv/lib/python3.10/site-packages/torch/lib -L/opt/rocm/lib -L/opt/rocm/hip/lib -L/usr/lib/x86_64-linux-gnu -lc10 -ltorch -ltorch_cpu -ltorch_python -lamdhip64 -lc10_hip -ltorch_hip -o build/lib.linux-x86_64-cpython-310/fast_hadamard_transform_cuda.cpython-310-x86_64-linux-gnu.so
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/fast_hadamard_transform_cuda.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/fast_hadamard_transform
copying build/lib.linux-x86_64-cpython-310/fast_hadamard_transform/__init__.py -> build/bdist.linux-x86_64/egg/fast_hadamard_transform
copying build/lib.linux-x86_64-cpython-310/fast_hadamard_transform/fast_hadamard_transform_interface.py -> build/bdist.linux-x86_64/egg/fast_hadamard_transform
byte-compiling build/bdist.linux-x86_64/egg/fast_hadamard_transform/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.linux-x86_64/egg/fast_hadamard_transform/fast_hadamard_transform_interface.py to fast_hadamard_transform_interface.cpython-310.pyc
creating stub loader for fast_hadamard_transform_cuda.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/fast_hadamard_transform_cuda.py to fast_hadamard_transform_cuda.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.fast_hadamard_transform_cuda.cpython-310: module references __file__
creating 'dist/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg
removing '/opt/venv/lib/python3.10/site-packages/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg' (and everything under it)
creating /opt/venv/lib/python3.10/site-packages/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg
Extracting fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg to /opt/venv/lib/python3.10/site-packages
Adding fast-hadamard-transform 1.0.4.post1 to easy-install.pth file

Installed /opt/venv/lib/python3.10/site-packages/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg
Processing dependencies for fast-hadamard-transform==1.0.4.post1
Searching for ninja==1.13.0
Best match: ninja 1.13.0
Adding ninja 1.13.0 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for packaging==25.0
Best match: packaging 25.0
Adding packaging 25.0 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for torch==2.9.0a0+git7bcbafe
Best match: torch 2.9.0a0+git7bcbafe
Adding torch 2.9.0a0+git7bcbafe to easy-install.pth file
Installing torchfrtrace script to /opt/venv/bin
Installing torchrun script to /opt/venv/bin

Using /opt/venv/lib/python3.10/site-packages
Searching for fsspec==2025.3.0
Best match: fsspec 2025.3.0
Adding fsspec 2025.3.0 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for jinja2==3.1.6
Best match: jinja2 3.1.6
Adding jinja2 3.1.6 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for networkx==2.8.8
Best match: networkx 2.8.8
Adding networkx 2.8.8 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for sympy==1.13.3
Best match: sympy 1.13.3
Adding sympy 1.13.3 to easy-install.pth file
Installing isympy script to /opt/venv/bin

Using /opt/venv/lib/python3.10/site-packages
Searching for typing-extensions==4.14.1
Best match: typing-extensions 4.14.1
Adding typing-extensions 4.14.1 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for filelock==3.19.1
Best match: filelock 3.19.1
Adding filelock 3.19.1 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for MarkupSafe==3.0.2
Best match: MarkupSafe 3.0.2
Adding MarkupSafe 3.0.2 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Searching for mpmath==1.3.0
Best match: mpmath 1.3.0
Adding mpmath 1.3.0 to easy-install.pth file

Using /opt/venv/lib/python3.10/site-packages
Finished processing dependencies for fast-hadamard-transform==1.0.4.post1
root@4ba3a2f2f1b5:/sgl-workspace/fast-hadamard-transform# cd tests
root@4ba3a2f2f1b5:/sgl-workspace/fast-hadamard-transform/tests# pytest test_fast_hadamard_transform.py
=================================================================== test session starts ====================================================================
platform linux -- Python 3.10.12, pytest-8.4.1, pluggy-1.6.0
rootdir: /sgl-workspace/fast-hadamard-transform
plugins: langsmith-0.4.31, anyio-4.10.0, asyncio-1.1.0, subtests-0.13.1, xdoctest-1.1.0, flakefinder-1.1.0, hypothesis-5.35.1, shard-0.1.2, xdist-3.3.1, cpp-2.3.0, rerunfailures-14.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 51 items
Running 51 items in this shard

test_fast_hadamard_transform.py ...................................................                                                                  [100%]

============================================================== 51 passed in 71.87s (0:01:11) ===============================================================
root@4ba3a2f2f1b5:/sgl-workspace/fast-hadamard-transform/tests#
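
As background for the test run above, the following is a rough pure-Python sketch of the kind of check a Hadamard-transform test can perform: compare a fast butterfly implementation against a dense matrix product. It is not the repository's test code. The names `fwht`, `sylvester_hadamard`, and `dense_transform` are hypothetical, and the sketch assumes the Sylvester (natural) ordering and a power-of-two input length.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester ordering).

    Assumes len(x) is a power of two; returns a new list.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a


def sylvester_hadamard(n):
    """Dense n x n Hadamard matrix built by the Sylvester recursion."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def dense_transform(x):
    """Reference result: multiply x by the dense Hadamard matrix."""
    H = sylvester_hadamard(len(x))
    return [sum(h * v for h, v in zip(row, x)) for row in H]


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print(fwht(x), dense_transform(x))
```

The dense product costs O(n^2) operations while the butterfly costs O(n log n), which is why the dense form is useful only as a reference.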

MUSA 4.3.0 + Torch 2.5.0

root@worker3218:/ws# python setup.py install
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]


torch.__version__  = 2.5.0


running install
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
running bdist_egg
running egg_info
writing fast_hadamard_transform.egg-info/PKG-INFO
writing dependency_links to fast_hadamard_transform.egg-info/dependency_links.txt
writing requirements to fast_hadamard_transform.egg-info/requires.txt
writing top-level names to fast_hadamard_transform.egg-info/top_level.txt
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'fast_hadamard_transform.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying fast_hadamard_transform/fast_hadamard_transform_interface.py -> build/lib.linux-x86_64-cpython-310/fast_hadamard_transform
copying fast_hadamard_transform/__init__.py -> build/lib.linux-x86_64-cpython-310/fast_hadamard_transform
running build_ext
building 'fast_hadamard_transform_cuda' extension
Emitting ninja build file /ws/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Using envvar MAX_JOBS (128) as the number of workers...
[1/2] /usr/local/musa/bin/mcc -x musa -MMD -MF /ws/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform.o.d -I/ws -I/usr/local/musa/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/aten/src -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/torch_musa_codegen -I/usr/local/lib/python3.10/dist-packages -I/usr/local/musa/include -I/usr/include/python3.10 -c -c /ws/csrc/fast_hadamard_transform.cpp -o /ws/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform.o -fPIC -O3 -fPIC -std=c++17 -x musa -mtgpu --cuda-gpu-arch=mp_31 -fno-strict-aliasing -ffast-math -Od3 -fmusa-flush-denormals-to-zero -DUSE_MUSA=1 --offload-arch=mp_31 -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=fast_hadamard_transform_cuda -D_GLIBCXX_USE_CXX11_ABI=1
[2/2] /usr/local/musa/bin/mcc -x musa -MMD -MF /ws/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform_gpu.o.d -I/ws -I/usr/local/musa/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/aten/src -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/torch_musa_codegen -I/usr/local/lib/python3.10/dist-packages -I/usr/local/musa/include -I/usr/include/python3.10 -c -c /ws/csrc/fast_hadamard_transform_gpu.cu -o /ws/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform_gpu.o -fPIC -O3 -fPIC -std=c++17 -x musa -mtgpu --cuda-gpu-arch=mp_31 -fno-strict-aliasing -ffast-math -Od3 -fmusa-flush-denormals-to-zero -DUSE_MUSA=1 --offload-arch=mp_31 -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=fast_hadamard_transform_cuda -D_GLIBCXX_USE_CXX11_ABI=1
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/InlineDeviceGuard.h:8:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:114:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:123:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:133:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:182:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:197:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:231:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Layout.h:3:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Backend.h:143:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Backend.h:282:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:5:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Layout.h:76:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:6:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/MemoryFormat.h:61:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymIntArrayRef.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymInt.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymBool.h:3:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:35:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:38:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:41:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:47:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:50:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:53:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:57:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:67:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:77:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:83:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:86:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:89:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:92:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:95:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:98:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:101:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:104:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:107:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:110:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:113:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:116:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:119:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:122:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:125:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:128:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:134:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:139:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:144:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:149:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:154:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:159:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:162:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:165:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:168:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:171:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:174:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:177:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:180:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:183:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:201:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:204:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:207:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:210:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:4:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/ScalarType.h:343:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/ScalarType.h:486:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/ScalarType.h:526:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:6:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/TensorOptions.h:724:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/TensorOptions.h:769:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Storage.h:6:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/StorageImpl.h:9:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyObjectSlot.h:125:3: warning: non-void function does not return a value in all control paths [-Wreturn-type]
  }
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/Generator.h:18:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/GeneratorImpl.h:8:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/TensorImpl.h:1057:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:10:
/usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSACachingAllocator.h:214:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSACachingAllocator.h:223:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
62 warnings generated when compiling for mp_31.
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/InlineDeviceGuard.h:8:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:114:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:123:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:133:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:182:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:197:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/DeviceGuardImplInterface.h:231:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Layout.h:3:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Backend.h:143:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Backend.h:282:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:5:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Layout.h:76:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:6:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/MemoryFormat.h:61:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:5:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/GPUTrace.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyInterpreter.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymIntArrayRef.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymInt.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymBool.h:3:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:35:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:38:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:41:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:47:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:50:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:53:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:57:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:67:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:77:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:83:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:86:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:89:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:92:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:95:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:98:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:101:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:104:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:107:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:110:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:113:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:116:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:119:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:122:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:125:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:128:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:134:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:139:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:144:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:149:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:154:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:159:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:162:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:165:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:168:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:171:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:174:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:177:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:180:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:183:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:201:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:204:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:207:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/SymNodeImpl.h:210:3: warning: non-void function does not return a value [-Wreturn-type]
  };
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:4:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/ScalarType.h:343:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/ScalarType.h:486:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/ScalarType.h:526:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:6:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/TensorOptions.h:724:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/TensorOptions.h:769:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/Storage.h:6:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/StorageImpl.h:9:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/impl/PyObjectSlot.h:125:3: warning: non-void function does not return a value in all control paths [-Wreturn-type]
  }
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/aten/utils/Utils.h:4:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/Dispatch.h:3:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/DeprecatedTypeProperties.h:9:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/ATen/core/Generator.h:18:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/GeneratorImpl.h:8:
/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/c10/core/TensorImpl.h:1057:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
In file included from /ws/csrc/fast_hadamard_transform_gpu.cu:15:
In file included from /ws/csrc/vendor.h:24:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSAGuard.h:7:
In file included from /usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/GuardImpl.h:10:
/usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSACachingAllocator.h:214:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
/usr/local/lib/python3.10/dist-packages/torch_musa/csrc/core/MUSACachingAllocator.h:223:3: warning: non-void function does not return a value [-Wreturn-type]
  }
  ^
62 warnings generated when compiling for host.
x86_64-linux-gnu-g++ -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 /ws/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform.o /ws/build/temp.linux-x86_64-cpython-310/csrc/fast_hadamard_transform_gpu.o -L/usr/local/lib/python3.10/dist-packages/torch/lib -L/usr/local/lib/python3.10/dist-packages/torch_musa/lib -L/usr/local/musa/lib -L/usr/lib/x86_64-linux-gnu -lc10 -ltorch -ltorch_cpu -ltorch_python -lmusa_python -o build/lib.linux-x86_64-cpython-310/fast_hadamard_transform_cuda.cpython-310-x86_64-linux-gnu.so -Wl,-rpath,$ORIGIN/lib -Wl,-rpath,/usr/local/lib/python3.10/dist-packages/torch/lib -Wl,-rpath,/usr/local/lib/python3.10/dist-packages/torch_musa/lib
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/fast_hadamard_transform_cuda.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/fast_hadamard_transform
copying build/lib.linux-x86_64-cpython-310/fast_hadamard_transform/fast_hadamard_transform_interface.py -> build/bdist.linux-x86_64/egg/fast_hadamard_transform
copying build/lib.linux-x86_64-cpython-310/fast_hadamard_transform/__init__.py -> build/bdist.linux-x86_64/egg/fast_hadamard_transform
byte-compiling build/bdist.linux-x86_64/egg/fast_hadamard_transform/fast_hadamard_transform_interface.py to fast_hadamard_transform_interface.cpython-310.pyc
byte-compiling build/bdist.linux-x86_64/egg/fast_hadamard_transform/__init__.py to __init__.cpython-310.pyc
creating stub loader for fast_hadamard_transform_cuda.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/fast_hadamard_transform_cuda.py to fast_hadamard_transform_cuda.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying fast_hadamard_transform.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.fast_hadamard_transform_cuda.cpython-310: module references __file__
creating 'dist/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg
removing '/usr/local/lib/python3.10/dist-packages/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg' (and everything under it)
creating /usr/local/lib/python3.10/dist-packages/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg
Extracting fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg to /usr/local/lib/python3.10/dist-packages
Adding fast-hadamard-transform 1.0.4.post1 to easy-install.pth file

Installed /usr/local/lib/python3.10/dist-packages/fast_hadamard_transform-1.0.4.post1-py3.10-linux-x86_64.egg
Processing dependencies for fast-hadamard-transform==1.0.4.post1
Searching for ninja==1.11.1
Best match: ninja 1.11.1
Adding ninja 1.11.1 to easy-install.pth file
Installing ninja script to /usr/local/bin

Using /usr/local/lib/python3.10/dist-packages
Searching for packaging==24.2
Best match: packaging 24.2
Adding packaging 24.2 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for torch==2.5.0
Best match: torch 2.5.0
Adding torch 2.5.0 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /usr/local/bin
Installing convert-onnx-to-caffe2 script to /usr/local/bin
Installing torchfrtrace script to /usr/local/bin
Installing torchrun script to /usr/local/bin

Using /usr/local/lib/python3.10/dist-packages
Searching for sympy==1.13.1
Best match: sympy 1.13.1
Adding sympy 1.13.1 to easy-install.pth file
Installing isympy script to /usr/local/bin

Using /usr/local/lib/python3.10/dist-packages
Searching for fsspec==2025.9.0
Best match: fsspec 2025.9.0
Adding fsspec 2025.9.0 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for jinja2==3.1.6
Best match: jinja2 3.1.6
Adding jinja2 3.1.6 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for networkx==3.4.2
Best match: networkx 3.4.2
Adding networkx 3.4.2 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for typing-extensions==4.15.0
Best match: typing-extensions 4.15.0
Adding typing-extensions 4.15.0 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for filelock==3.19.1
Best match: filelock 3.19.1
Adding filelock 3.19.1 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for mpmath==1.3.0
Best match: mpmath 1.3.0
Adding mpmath 1.3.0 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Searching for MarkupSafe==3.0.2
Best match: MarkupSafe 3.0.2
Adding MarkupSafe 3.0.2 to easy-install.pth file

Using /usr/local/lib/python3.10/dist-packages
Finished processing dependencies for fast-hadamard-transform==1.0.4.post1
root@worker3218:/ws# cd tests/
root@worker3218:/ws/tests# pytest test_fast_hadamard_transform.py 
================================================================= test session starts ==================================================================
platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.6.0
rootdir: /ws
plugins: hypothesis-6.140.2, anyio-4.10.0
collected 51 items                                                                                                                                     

test_fast_hadamard_transform.py ...................................................                                                              [100%]

=================================================================== warnings summary ===================================================================
../../usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:61
  /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================= 51 passed, 1 warning in 82.26s (0:01:22) =======================================================
root@worker3218:/ws/tests#

fanshao123456 · 2025-10-14T12:15:42Z

How long does the compilation usually take?

yeahdongcn · 2025-10-14T13:31:39Z

> How long does the compilation usually take?

Tested on a GPU Droplet from AMD Developer Cloud.

root@7615ee52f135:/sgl-workspace/fast-hadamard-transform# time python setup.py install
...
real    2m0.123s
user    2m19.199s
sys     0m3.304s

fanshao123456 · 2025-10-14T13:37:27Z

> > How long does the compilation usually take?
>
> Tested on a GPU Droplet from AMD Developer Cloud.
> root@7615ee52f135:/sgl-workspace/fast-hadamard-transform# time python setup.py install
> ...
> real    2m0.123s
> user    2m19.199s
> sys     0m3.304s

Why does it get stuck during compilation? I'm running it on a Hygon GPU, like this:
python3 setup.py install
torch.__version__  = 2.5.1
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/vendor.h -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/vendor_hip.h [skipped, already hipified]
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform.h -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform.h [skipped, no changes]
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform.cpp -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_hip.cpp [skipped, already hipified]
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_common.h -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_common_hip.h [skipped, already hipified]
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_special.h -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_special.h [skipped, no changes]
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/static_switch.h -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/static_switch.h [skipped, no changes]
/workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_gpu.cu -> /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/csrc/fast_hadamard_transform_gpu.hip [skipped, already hipified]
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0
Total number of replaced kernel launches: 5
each_nvcc_Input_output: -O3 -O3
each_nvcc_Input_output: -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_OPERATORS__
each_nvcc_Input_output: -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF_CONVERSIONS__
each_nvcc_Input_output: -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_OPERATORS__
each_nvcc_Input_output: -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__
each_nvcc_Input_output: -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__
each_nvcc_Input_output: -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__
each_nvcc_Input_output: -DUSE_ROCM=1 -DUSE_ROCM=1
input_nvcc_args: ['-O3', '-U__CUDA_NO_HALF_OPERATORS__', '-U__CUDA_NO_HALF_CONVERSIONS__', '-U__CUDA_NO_BFLOAT16_OPERATORS__', '-U__CUDA_NO_BFLOAT16_CONVERSIONS__', '-U__CUDA_NO_BFLOAT162_OPERATORS__', '-U__CUDA_NO_BFLOAT162_CONVERSIONS__', '-DUSE_ROCM=1']
output_nvcc_args: ['-O3', '-U__CUDA_NO_HALF_OPERATORS__', '-U__CUDA_NO_HALF_CONVERSIONS__', '-U__CUDA_NO_BFLOAT16_OPERATORS__', '-U__CUDA_NO_BFLOAT16_CONVERSIONS__', '-U__CUDA_NO_BFLOAT162_OPERATORS__', '-U__CUDA_NO_BFLOAT162_CONVERSIONS__', '-DUSE_ROCM=1']
/usr/local/lib/python3.10/dist-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!
        ********************************************************************************
        Please consider removing the following classifiers in favor of a SPDX license expression:
        License :: OSI Approved :: BSD License
        See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
        ********************************************************************************
!!
  self._finalize_license_expression()
running install
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:90: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
        ********************************************************************************
        Please avoid running `setup.py` directly.
        Instead, use pypa/build, pypa/installer or other standards-based tools.
        By 2025-Oct-31, you need to update your project and remove deprecated calls
        or your builds will no longer be supported.
        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************
!!
  self.initialize_options()
running build
running build_py
running build_ext
building 'fast_hadamard_transform_cuda' extension
Emitting ninja build file /workspace/whl/20251013/fast-hadamard-transform-xd-multi-platform/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Using envvar MAX_JOBS (1) as the number of workers

yeahdongcn · 2025-10-14T13:45:52Z

@fanshao123456 This should be related to your dev env setup.
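
When narrowing down an environment issue like this, one quick check is which GPU backend the installed PyTorch build targets. The sketch below reads `torch.version.hip` and `torch.version.cuda`, which PyTorch sets to a version string or `None`. The helper name `detect_backend` is hypothetical and not part of this PR, and MUSA detection is not covered here.

```python
from types import SimpleNamespace


def detect_backend(torch_module):
    """Classify a torch-like module as 'rocm', 'cuda', or 'unknown'.

    Hypothetical helper: it only inspects the `version.hip` and
    `version.cuda` attributes and does not touch any device.
    """
    version = getattr(torch_module, "version", None)
    if getattr(version, "hip", None):
        return "rocm"
    if getattr(version, "cuda", None):
        return "cuda"
    return "unknown"


try:
    import torch
except ImportError:
    # Stand-in object so the sketch still runs without PyTorch installed.
    torch = SimpleNamespace(
        __version__="(torch not installed)",
        version=SimpleNamespace(hip=None, cuda=None),
    )

print(torch.__version__, detect_backend(torch))
```

If the reported backend does not match the toolchain used to build the extension, the build environment is the first thing to look at.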

fanshao123456 · 2025-10-14T13:48:50Z

fast-hadamard-transform

So do I need to modify the environment, like Torch or something else? I noticed that fast-hadamard-transform takes a 3D input. Can it be replaced with the 2D hadamard-transform?
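
On the 3D versus 2D question, a minimal pure-Python sketch may help. It assumes the transform is applied independently to each row along the last dimension, so a `(batch, seq, dim)` input can be flattened to `(batch * seq, dim)`, transformed, and reshaped back. This is a reference illustration only, not the library's GPU kernel, and the names `fwht`, `transform_3d`, and `transform_via_2d` are hypothetical.

```python
def fwht(row):
    """Unnormalized fast Walsh-Hadamard transform of one row.

    The row length must be a power of two.
    """
    n = len(row)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(row)
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def transform_3d(x):
    """Transform the last axis of a nested list shaped (batch, seq, dim)."""
    return [[fwht(row) for row in mat] for mat in x]


def transform_via_2d(x):
    """Same result, computed by flattening to (batch * seq, dim) first."""
    batch, seq = len(x), len(x[0])
    flat = [row for mat in x for row in mat]
    flat_out = [fwht(row) for row in flat]
    # Restore the original (batch, seq, dim) nesting.
    return [flat_out[b * seq:(b + 1) * seq] for b in range(batch)]


x = [[[1, 0, 0, 0], [1, 1, 1, 1]],
     [[1, 2, 3, 4], [0, 1, 0, 1]]]
print(transform_3d(x) == transform_via_2d(x))
```

Under that assumption the two paths give identical results, so the choice between a 2D and a 3D call is a matter of reshaping rather than of the transform itself.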

hadipourh · 2025-11-27T14:11:47Z

I created a new library supporting multiple backends:
https://github.com/hadipourh/fwht

yiakwy-xpu-ml-framework-team · 2025-11-30T11:11:32Z

@yeahdongcn Have you verified this on any AMD devices?

yeahdongcn · 2025-11-30T11:38:40Z

> @yeahdongcn Have you verified this on any AMD devices?

Yes, see 👆. Verified on MI300X.

yeahdongcn force-pushed the xd/multi-platform branch 4 times, most recently from 1546fb1 to 52e8ff1 on October 11, 2025 07:44

yeahdongcn changed the title ~~Multi-backend support~~ Multi-backend Support (ROCm and MUSA) Oct 11, 2025

yeahdongcn changed the title ~~Multi-backend Support (ROCm and MUSA)~~ Multi-backend support (ROCm and MUSA) Oct 11, 2025

Multi-backend support

da3c2f4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

yeahdongcn force-pushed the xd/multi-platform branch from 52e8ff1 to da3c2f4 on November 28, 2025 02:27

yeahdongcn wants to merge 1 commit into Dao-AILab:master from yeahdongcn:xd/multi-platform

4 participants

Testing Done

ROCm 7.0.0 + Torch 2.9.0a0+git7bcbafe

MUSA 4.3.0 + Torch 2.5.0
