Open
Description
What are the problems? (screenshots or detailed error messages)
I need to benchmark the time to first token (TTFT) of Llama 2 7B with OpenPPL, using fixed-length inputs and outputs. However, I cannot find a script for generating a custom dataset. Could you provide one?
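To make the request concrete, here is a minimal sketch of the kind of generator I have in mind. It is not taken from ppl.llm.serving: the function names, the JSONL layout, and the field names (`id`, `input_ids`, `max_new_tokens`) are all hypothetical and would need to be adapted to whatever format the benchmark client actually reads. It draws random token ids so that every prompt has exactly the same length, and it assumes the Llama 2 vocabulary size of 32000 with ids 0 to 2 reserved for the special tokens.

```python
import json
import random

# Llama 2 tokenizer vocabulary size; ids 0, 1, 2 are <unk>, <s>, </s>.
LLAMA2_VOCAB_SIZE = 32000
NUM_RESERVED_IDS = 3


def make_static_dataset(num_samples, input_len, output_len, seed=0,
                        vocab_size=LLAMA2_VOCAB_SIZE,
                        num_reserved=NUM_RESERVED_IDS):
    """Build samples whose prompts all contain exactly `input_len` token ids.

    The record layout is a hypothetical one chosen for illustration only.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for i in range(num_samples):
        # Skip the reserved special-token ids so no prompt contains EOS.
        token_ids = [rng.randrange(num_reserved, vocab_size)
                     for _ in range(input_len)]
        samples.append({
            "id": i,
            "input_ids": token_ids,
            "max_new_tokens": output_len,
        })
    return samples


def write_jsonl(samples, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")
```

For example, `write_jsonl(make_static_dataset(8, 128, 64), "static_dataset.jsonl")` would produce eight prompts of 128 tokens each with a 64-token generation budget. Note that a fixed `max_new_tokens` only caps the output length; getting exactly that many output tokens also depends on the server not stopping early at EOS.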
What are the types of GPU/CPU you are using?
RTX 4090 / A100 40GB
What's the operating system ppl.llm.serving runs on?
Ubuntu 22.04
What's the compiler and its version?
GCC 11.4
Which version(commit id or tag) of ppl.llm.serving is used?
What are the commands used to build ppl.llm.serving?
./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87;89'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87;89'"
What are the execution commands?
minimal code snippets for reproducing these problems(if necessary)
models and inputs for reproducing these problems (send them to [email protected] if necessary)