Description
What are the problems? (screenshots or detailed error messages)
When testing llama2_7b with offline_inference, the following error is reported:
"""
[LLMCUDA][pmx/rms_norm_kernel.cc:84] |-DataFormat: NDARRAY
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:29] Entry LlmCudaKernel: [/layers.0/w
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:36] Input [input]:
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] TensorName: [/layers.0/attention
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-Data: 0x1120000000
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-DimCount: 2
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |- Dim[0]: 6 Pads: [0, 0]
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |- Dim[1]: 4096 Pads: [0, 0]
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-DeviceType: cuda
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-DataType: FLOAT16
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-DataFormat: NDARRAY
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:38] Input [weight]:
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] TensorName: [layers.0.attention.
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |-Data: 0x7992ea000000
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |-DimCount: 2
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |- Dim[0]: 12288 Pads: [0, 0]
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |- Dim[1]: 4096 Pads: [0, 0]
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |-DeviceType: cuda
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |-DataType: FLOAT16
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:39] |-DataFormat: NDARRAY
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:45] in_features: 4096
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:46] out_features: 12288
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:47] bias_term: 0
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:48] gather_output: 0
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:53] Output [output]:
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] TensorName: [/layers.0/wqkv/Colu
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |-Data: 0x1120018000
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |-DimCount: 2
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |- Dim[0]: 6 Pads: [0, 0]
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |- Dim[1]: 12288 Pads: [0, 0]
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |-DeviceType: cuda
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |-DataType: FLOAT16
[LLMCUDA][pmx/column_parallel_linear_kernel.cc:54] |-DataFormat: NDARRAY
[ERROR][2024-10-25 09:29:01.319][gemm.cu:125] cublasLt failed: an unsupported value
[ERROR][2024-10-25 09:29:01.319][kernel.cc:169] DoExecute kernel [/layers.0/wqkv/Col
[ERROR][2024-10-25 09:29:01.319][sequential_scheduler.cc:130] exec kernel[/layers.0/ailed: device runtime error
[ERROR][2024-10-25 09:29:01.319][runtime_impl.cc:315] Run() failed: device runtime e
[ERROR][2024-10-25 09:29:01.320][utils.h:52] ParallelExecute task[0] failed
[ERROR][2024-10-25 09:29:01.320][llama_worker.cc:778] ParallelExecute(RunModelTask)
[DEBUG][2024-10-25 09:29:01.320][llama_worker.cc:759] Step: 0 ----------------------
"""
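As a quick sanity check on the log above, the tensor shapes of the failing `wqkv` GEMM can be verified by hand. The sketch below assumes the weight is stored as [out_features, in_features], which is what `in_features: 4096` and `out_features: 12288` in the log imply; the helper name `linear_output_shape` is hypothetical and is not part of ppl.llm.serving.

```python
# Shapes are copied from the log above; the helper name is hypothetical.
def linear_output_shape(input_shape, weight_shape):
    """Return the output shape of y = x @ W^T for a 2-D input and weight."""
    m, k = input_shape
    n, k_w = weight_shape
    if k != k_w:
        raise ValueError(f"in_features mismatch: {k} vs {k_w}")
    return (m, n)

input_shape = (6, 4096)       # Input [input]: Dim[0], Dim[1]
weight_shape = (12288, 4096)  # Input [weight]: Dim[0], Dim[1]
output_shape = linear_output_shape(input_shape, weight_shape)
print(output_shape)  # (6, 12288), same as Output [output] in the log
```

The computed shape matches the logged output tensor, so the shapes themselves look consistent and the cublasLt failure probably has another cause.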
What are the types of GPU/CPU you are using?
A40
NVIDIA-SMI 555.42.02
Driver Version: 555.42.02
CUDA Version: 12.5
What's the operating system ppl.llm.serving runs on?
Ubuntu 22.04
What's the compiler and its version?
Which version (commit id or tag) of ppl.llm.serving is used?
master branch, commit id: 3abe5d2
What are the commands used to build ppl.llm.serving?
./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"
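One thing worth ruling out is an architecture mismatch between the build and the GPU. The sketch below parses the architecture list from the build command and checks it against the device. It assumes the A40 reports compute capability 8.6 (written as 86 in CMake); `parse_arch_list` is a hypothetical helper for illustration only.

```python
def parse_arch_list(flag_value):
    """Split a CMake-style architecture list such as "'80;86;87'" into ints."""
    cleaned = flag_value.strip().strip("'\"")
    return [int(a) for a in cleaned.split(";") if a]

# Value passed to -DPPLNN_CUDA_ARCHITECTURES in the build command.
build_archs = parse_arch_list("'80;86;87'")
device_arch = 86  # assumption: A40 is compute capability 8.6

print(device_arch in build_archs)  # True
```

Under that assumption the build already targets the A40, so the architecture flags are unlikely to explain the failure.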
What are the execution commands?
./ppl-build/offline_inference src/models/llama/conf/llama_7b_config_example.json
minimal code snippets for reproducing these problems(if necessary)
models and inputs for reproducing these problems (send them to [email protected] if necessary)
The model is llama2_7b from Hugging Face.