@yeahdongcn
Collaborator

As @slaren suggested in #8383, it is beneficial to organize vendor-specific headers separately. This PR creates a new vendors directory and adds cuda.h, hip.h, and musa.h for the three supported vendors.
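The header split described above can be sketched roughly as follows. This is a single-file illustration, not the code from this PR: each branch stands in for an `#include` of the matching file in the vendors directory. `GGML_USE_MUSA` appears in the build flags quoted later in this thread; `GGML_USE_HIPBLAS`, `GPU_VENDOR_NAME`, and `gpu_vendor_header()` are assumptions made for the example. Because a MUSA build defines both `GGML_USE_MUSA` and `GGML_USE_CUDA`, the sketch checks the vendor-specific macros first and treats CUDA as the fallback.

```cpp
// Sketch only: a single-file stand-in for the per-vendor headers.
// In a real tree each branch would be an #include of a vendor header;
// here the branches are inlined so the example compiles on its own.
#include <string>

#if defined(GGML_USE_HIPBLAS)      // assumed macro name for HIP builds
// would correspond to vendors/hip.h
#define GPU_VENDOR_NAME "hip"
#elif defined(GGML_USE_MUSA)       // macro seen in the MUSA build flags
// would correspond to vendors/musa.h
#define GPU_VENDOR_NAME "musa"
#else
// would correspond to vendors/cuda.h (fallback when no other vendor is selected)
#define GPU_VENDOR_NAME "cuda"
#endif

// Hypothetical helper (not part of ggml): returns the relative path of the
// vendor header that this build configuration would select.
inline std::string gpu_vendor_header() {
    return std::string("vendors/") + GPU_VENDOR_NAME + ".h";
}
```

Compiled with no extra defines, `gpu_vendor_header()` yields the CUDA path; adding `-DGGML_USE_MUSA` switches it to the MUSA one. Shared code then only needs this one selection point.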

Testing done

  • make GGML_MUSA=1 -> passed

@github-actions bot added the "Nvidia GPU" label (issues specific to Nvidia GPUs) Jul 29, 2024
@slaren slaren merged commit 439b3fc into ggml-org:master Jul 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Aug 2, 2024
@m828

m828 commented Aug 27, 2024

Hello, I used the make GGML_MUSA=1 command to compile and got an error:

I ccache found, compilation results will be cached. Disable with GGML_NO_CCACHE.
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_MUSA -DGGML_USE_OPENMP -I/usr/lib/llvm-10/include/openmp -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -I/usr/local/musa/include -std=c11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -march=native -mtune=native -fopenmp -Wunreachable-code-break -Wunreachable-code-return -Wdouble-promotion
I CXXFLAGS: -std=c++11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -fopenmp -march=native -mtune=native -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_MUSA -DGGML_USE_OPENMP -I/usr/lib/llvm-10/include/openmp -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -I/usr/local/musa/include
I NVCCFLAGS: -std=c++11 -O3 -g -x musa -mtgpu --cuda-gpu-arch=mp_22 -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -L/usr/lib/llvm-10/lib -lmusa -lmublas -lmusart -lpthread -ldl -lrt -L/usr/local/musa/lib -L/usr/lib64
I CC: clang version 14.0.0 ([email protected]:mthreads/mtcc.git 228d4651d8fcb8511ca196a5740eef83326ce1cb)
I CXX: clang version 14.0.0 ([email protected]:mthreads/mtcc.git 228d4651d8fcb8511ca196a5740eef83326ce1cb)
I NVCC: InstalledDir: /usr/local/musa/bin

/usr/bin/ccache mcc -std=c++11 -O3 -g -x musa -mtgpu --cuda-gpu-arch=mp_22 -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_MUSA -DGGML_USE_OPENMP -I/usr/lib/llvm-10/include/openmp -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -I/usr/local/musa/include -c ggml/src/ggml-cuda/mmvq.cu -o ggml/src/ggml-cuda/mmvq.o
clang-14: warning: argument unused during compilation: '-arch=native' [-Wunused-command-line-argument]
In file included from ggml/src/ggml-cuda/mmvq.cu:1:
In file included from ggml/src/ggml-cuda/mmvq.cuh:1:
ggml/src/ggml-cuda/common.cuh:164:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
ggml/src/ggml-cuda/common.cuh:268:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
In file included from <built-in>:1:
In file included from /usr/local/musa-2.0.0/lib/clang/14.0.0/include/__clang_musa_runtime_wrapper.h:169:
/usr/local/musa-2.0.0/lib/clang/14.0.0/include/__clang_musa_device_functions.h:2293:11: error: couldn't allocate output register for constraint 'r'
asm("vsub4.s32.s32.s32.sat %0,%1,%2,%3;"
^
/usr/local/musa-2.0.0/lib/clang/14.0.0/include/__clang_musa_device_functions.h:2293:11: error: couldn't allocate output register for constraint 'r'
/usr/local/musa-2.0.0/lib/clang/14.0.0/include/__clang_musa_device_functions.h:2293:11: error: couldn't allocate output register for constraint 'r'
/usr/local/musa-2.0.0/lib/clang/14.0.0/include/__clang_musa_device_functions.h:2293:11: error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
error: couldn't allocate output register for constraint 'r'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
2 warnings and 20 errors generated when compiling for mp_22.
make: *** [Makefile:736: ggml/src/ggml-cuda/mmvq.o] Error 1

What is the reason?

@yeahdongcn
Collaborator Author

What is the reason?

Which version of MUSA Toolkits are you using?
Feel free to contact me through WeChat: yeahdongcn

@m828

m828 commented Aug 27, 2024

What is the reason?

Which version of MUSA Toolkits are you using? Feel free to contact me through WeChat: yeahdongcn

OK, I've added you on WeChat.

