Error for llama-13B on V100 #21

@yisongsong

An error was encountered while executing client_qps_measure.

Platform: llama-13B on 2 V100 GPUs

[INFO][2023-09-13 03:35:21.764][llama_server.cc:539] max_tokens: 75630
[INFO][2023-09-13 03:35:21.827][llama_server.cc:484] VOCAB_SIZE: 32000; BOS ID: 1; EOS ID: 2; PAD ID: -1
[INFO][2023-09-13 03:35:21.827][llama_server.cc:606] End init nccl, cuda engine, kv cache, kv scale manager
[INFO][2023-09-13 03:35:21.827][llama_server.cc:626] Init llama worker successed
[INFO][2023-09-13 03:35:21.827][llama_worker.cc:1043] waiting for request ...
[INFO][2023-09-13 03:35:21.829][llama_server.cc:639] listening on [0.0.0.0:23333]
[ERROR][2023-09-13 03:35:34.009][gemm.cu:113] cublasLt failed: the requested functionality is not supported
[ERROR][2023-09-13 03:35:34.009][kernel.cc:176] DoExecute kernel [/layers.0/wqkv/ColumnParallelLinear] failed: device runtime error
[ERROR][2023-09-13 03:35:34.009][gemm.cu:113] cublasLt failed: the requested functionality is not supported
[ERROR][2023-09-13 03:35:34.009][sequential_scheduler.cc:130] exec kernel[/layers.0/wqkv/ColumnParallelLinear] of type[pmx:ColumnParallelLinear:1] failed: device runtime error
[ERROR][2023-09-13 03:35:34.009][kernel.cc:176] DoExecute kernel [/layers.0/wqkv/ColumnParallelLinear] failed: device runtime error
[ERROR][2023-09-13 03:35:34.009][runtime_impl.cc:333] Run() failed: device runtime error
[ERROR][2023-09-13 03:35:34.009][sequential_scheduler.cc:130] exec kernel[/layers.0/wqkv/ColumnParallelLinear] of type[pmx:ColumnParallelLinear:1] failed: device runtime error
[ERROR][2023-09-13 03:35:34.009][runtime_impl.cc:333] Run() failed: device runtime error
[ERROR][2023-09-13 03:35:34.009][llm_cuda_device.cc:276] cudaStreamSynchronize failed: 700, an illegal memory access was encountered
[ERROR][2023-09-13 03:35:34.009][llm_cuda_device.cc:276] cudaStreamSynchronize failed: 700, an illegal memory access was encountered
[ERROR][2023-09-13 03:35:34.009][runtime_impl.cc:316] sync device[llm_cuda] failed: internal error
[ERROR][2023-09-13 03:35:34.009][runtime_impl.cc:316] sync device[llm_cuda] failed: internal error
[ERROR][2023-09-13 03:35:34.009][llama_worker.cc:922] ParallelExecute(RunModelTask) failed.
[INFO][2023-09-13 03:35:34.009][llama_worker.cc:1043] waiting for request ...
[ERROR][2023-09-13 03:35:34.010][llm_cuda_device.cc:112] cudaMemcpyAsync failed: 700, an illegal memory access was encountered
[ERROR][2023-09-13 03:35:34.010][llm_cuda_device.cc:112] cudaMemcpyAsync failed: 700, an illegal memory access was encountered
[ERROR][2023-09-13 03:35:34.010][llama_worker.cc:724] set token_ids [token_ids] failed: other error
[ERROR][2023-09-13 03:35:34.010][llama_worker.cc:724] set token_ids [token_ids] failed: other error
[ERROR][2023-09-13 03:35:34.010][llama_worker.cc:910] ParallelExecute(SetInputTask) failed.
[INFO][2023-09-13 03:35:34.010][llama_worker.cc:1043] waiting for request ...
[INFO][2023-09-13 03:35:34.010][llama_worker.cc:1043] waiting for request ...
[INFO][2023-09-13 03:35:34.011][llama_worker.cc:1043] waiting for request ...
[INFO][2023-09-13 03:35:34.011][llama_worker.cc:1043] waiting for request ...
[INFO][2023-09-13 03:35:34.011][llama_worker.cc:1043] waiting for request ...
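Since both workers write to the same output, the log above is noisy, and the earliest ERROR line is probably the most useful one to look at first. Below is a minimal sketch for pulling that line out of a log like this one. It assumes every record follows the `[LEVEL][timestamp][file:line] message` layout seen above; the function name `first_error` and the `sample` string are only illustrative and are not part of the server or client.

```python
import re

# Assumed record layout, taken from the log above:
# [ERROR][2023-09-13 03:35:34.009][gemm.cu:113] cublasLt failed: ...
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\[(?P<time>[^\]]+)\]\[(?P<src>[^\]]+)\]\s*(?P<msg>.*)$"
)


def first_error(log_text):
    """Return (time, source, message) of the first ERROR record, or None."""
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "ERROR":
            return m.group("time"), m.group("src"), m.group("msg")
    return None


# Three records copied from the log above, used only as sample input.
sample = """\
[INFO][2023-09-13 03:35:21.829][llama_server.cc:639] listening on [0.0.0.0:23333]
[ERROR][2023-09-13 03:35:34.009][gemm.cu:113] cublasLt failed: the requested functionality is not supported
[ERROR][2023-09-13 03:35:34.009][kernel.cc:176] DoExecute kernel [/layers.0/wqkv/ColumnParallelLinear] failed: device runtime error
"""

print(first_error(sample))
```

Run against the full log, this would point at the `gemm.cu:113` record, which appears before any of the `700, an illegal memory access` messages.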
