
Conversation

@bachelor-dou (Contributor) commented Apr 17, 2025

llama-bench is an excellent model benchmarking tool. This PR adds an n_graph_splits (graph partition count) field to the llama-bench output. This value gives a more direct way to assess how well a hardware backend supports a model: fewer graph splits typically mean the backend covers more of the model's operators, which helps developers identify and improve gaps in model support.

Here is a comparison of the results with -o json.

Before

[
  {
    "build_commit": "12b17501",
    "build_number": 5145,
    "cpu_info": "CPU",
    "gpu_info": "Ascend910B3",
    "backends": "CANN",
    "model_filename": "/home/models/qwen2.5:0.5b-instruct-fp16",
    "model_type": "qwen2 1B F16",
    "model_size": 988208640,
    "model_n_params": 494032768,
    "n_batch": 2048,
    "n_ubatch": 512,
    "n_threads": 192,
    "cpu_mask": "0x0",
    "cpu_strict": false,
    "poll": 50,
    "type_k": "f16",
    "type_v": "f16",
    "n_gpu_layers": 32,
    "split_mode": "layer",
    "main_gpu": 0,
    "no_kv_offload": false,
    "flash_attn": false,
    "tensor_split": "0.00",
    "use_mmap": true,
    "embeddings": false,
    "n_prompt": 0,
    "n_gen": 128,
    "test_time": "2025-04-17T08:51:41Z",
    "avg_ns": 2099229836,
    "stddev_ns": 28969817,
    "avg_ts": 60.984063,
    "stddev_ts": 0.844204,
    "samples_ns": [ 2067742567, 2068888383, 2113888137, 2130481700, 2115148394 ],
    "samples_ts": [ 61.9033, 61.869, 60.5519, 60.0803, 60.5158 ]
  }
]

After

[
  {
    "build_commit": "12790280",
    "build_number": 5155,
    "cpu_info": "CPU",
    "gpu_info": "Ascend910B3",
    "backends": "CANN",
    "model_filename": "/home/models/qwen2.5:0.5b-instruct-fp16",
    "n_graph_splits": 2,
    "model_type": "qwen2 1B F16",
    "model_size": 988208640,
    "model_n_params": 494032768,
    "n_batch": 2048,
    "n_ubatch": 512,
    "n_threads": 192,
    "cpu_mask": "0x0",
    "cpu_strict": false,
    "poll": 50,
    "type_k": "f16",
    "type_v": "f16",
    "n_gpu_layers": 32,
    "split_mode": "layer",
    "main_gpu": 0,
    "no_kv_offload": false,
    "flash_attn": false,
    "tensor_split": "0.00",
    "use_mmap": true,
    "embeddings": false,
    "n_prompt": 0,
    "n_gen": 128,
    "test_time": "2025-04-18T02:21:20Z",
    "avg_ns": 2988958856,
    "stddev_ns": 86828048,
    "avg_ts": 42.853434,
    "stddev_ts": 1.254731,
    "samples_ns": [ 2890490934, 2900927646, 3036466484, 3038520785, 3078388432 ],
    "samples_ts": [ 44.2831, 44.1238, 42.1543, 42.1258, 41.5802 ]
  }
]
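
For reference, the split count itself comes from the ggml backend scheduler. The snippet below is an illustration only, not this PR's actual diff: ggml_backend_sched_get_n_splits() is the existing ggml API that reports the number, while the wrapper struct and accessor name merely mirror the llama_context::get_graph_splits() that this PR adds internally.

// Illustrative sketch: inside llama.cpp, each context owns a backend scheduler,
// and ggml already exposes the split count for it.
#include "ggml-backend.h"

struct llama_context_sketch {        // stand-in for the real llama_context internals
    ggml_backend_sched_t sched;      // created when the context is initialized

    int get_graph_splits() const {
        // number of partitions the most recently computed graph was split into
        return ggml_backend_sched_get_n_splits(sched);
    }
};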

@bachelor-dou changed the title from "[CANN] Add the n_graph_splits metric to the performance test output paramete…" to "[CANN] Add the n_graph_splits performance metric to llama-bench." on Apr 17, 2025
@hipudding added the "Ascend NPU" label (issues specific to Ascend NPUs) on Apr 17, 2025
@bachelor-dou marked this pull request as ready for review on April 17, 2025 09:36
Collaborator reviewed the following code:

 */
static std::string get_modelfile_name(const std::string & path_str) {
    namespace fs = std::filesystem;
    fs::path path = path_str;

Suggested change:
-    fs::path path = path_str;
+    std::filesystem::path path = path_str;

Collaborator reviewed the following code:

 * @return Full name of the model.
 */
static std::string get_modelfile_name(const std::string & path_str) {
    namespace fs = std::filesystem;

Suggested change:
-    namespace fs = std::filesystem;

@bachelor-dou (Contributor Author) replied: done
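
Taken together, the two suggestions amount to something like the helper below. The body is a guess at the helper's intent (return only the file name portion of the model path), not the PR's exact code.

#include <filesystem>
#include <string>

// Return only the file name portion of the model path, with the reviewers'
// suggestions applied: no `namespace fs` alias, std::filesystem used directly.
static std::string get_modelfile_name(const std::string & path_str) {
    std::filesystem::path path = path_str;
    return path.filename().string();
}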

Collaborator reviewed the following code:

    gpu_info(get_gpu_info()) {

-       model_filename = inst.model;
+       model_filename = get_modelfile_name(inst.model);

Add a parameter to choose between outputting the full path and just the file name?

@bachelor-dou (Contributor Author) replied: done

@hipudding (Collaborator) commented Apr 17, 2025

It would be better to also add n_gpus to show how many GPUs are used for this test.

Edit: ignore this comment; gpu_info already shows the GPU count.

@slaren self-requested a review on April 17, 2025 12:16
@bachelor-dou force-pushed the master branch 2 times, most recently from 1279028 to 8e06ed1, on April 18, 2025 06:25
@bachelor-dou (Contributor Author) commented:

@slaren I'm confused about why this PR fails so many CI checks in the Windows environments. If you have a solution for this, please let me know, thanks. The errors basically all look like this:

C:\Windows\system32\cmd.exe /C "cd . && C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE --target=arm64-pc-windows-msvc -nostartfiles -nostdlib -march=armv8.7-a -fvectorize -ffp-model=fast -fno-finite-math-only -Wno-format -Wno-unused-variable -Wno-unused-function -Wno-gnu-zero-variadic-macro-arguments -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Xlinker /subsystem:console   -fuse-ld=lld-link examples/llama-bench/CMakeFiles/llama-bench.dir/Release/llama-bench.cpp.obj -o bin\Release\llama-bench.exe -Xlinker /MANIFEST:EMBED -Xlinker /implib:examples\llama-bench\Release\llama-bench.lib -Xlinker /pdb:bin\Release\llama-bench.pdb -Xlinker /version:0.0   common/Release/common.lib  src/Release/llama.lib  ggml/src/Release/ggml.lib  ggml/src/Release/ggml-cpu.lib  ggml/src/Release/ggml-base.lib  D:/a/_temp/libcurl/lib/libcurl.dll.a  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames  && cd ."
lld-link: error: undefined symbol: public: int __cdecl llama_context::get_graph_splits(void) const

>>> referenced by examples/llama-bench/CMakeFiles/llama-bench.dir/Release/llama-bench.cpp.obj:(public: __cdecl test::test(struct cmd_params_instance const &, struct llama_model const *, struct llama_context const *))

clang++: error: linker command failed with exit code 1 (use -v to see invocation)

@hipudding removed the "Ascend NPU" label on Apr 21, 2025
@slaren (Member) commented Apr 21, 2025

On Windows, DLL symbols are not exported by default; you need to use LLAMA_API on all functions that are intended to be used by applications. That's why you get linker errors on Windows. However, the solution is not to export this one either: llama-context.h is an internal, private header of llama.cpp and cannot be used by applications in this way.

This information could perhaps be exposed in a public function added to llama.h, but I am not convinced that this is a good idea either. Generally we need to avoid exposing too many implementation details in the public API, because they can change at any moment. If the goal is to help debug performance issues, keep in mind that llama.cpp already prints the number of splits every time a context is created, and llama-bench will also print this information when run with -v.
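
For context, the export macro in llama.h works roughly like the simplified sketch below (see llama.h for the real definition): when llama.cpp is built as a DLL on Windows, only functions declared with LLAMA_API are exported, which is why llama-bench fails to link against an internal llama_context method. The accessor declaration at the end is hypothetical, not part of the public API.

// Simplified sketch of the LLAMA_API export macro.
#ifdef LLAMA_SHARED
#    if defined(_WIN32) && !defined(__MINGW32__)
#        ifdef LLAMA_BUILD
#            define LLAMA_API __declspec(dllexport)   // building the llama DLL: export the symbol
#        else
#            define LLAMA_API __declspec(dllimport)   // consuming the DLL (e.g. llama-bench): import it
#        endif
#    else
#        define LLAMA_API __attribute__ ((visibility ("default")))
#    endif
#else
#    define LLAMA_API                                  // static build: no decoration needed
#endif

// A hypothetical public accessor would have to be declared in llama.h like this
// to be usable (and exported) from applications such as llama-bench:
// LLAMA_API int32_t llama_n_graph_splits(const struct llama_context * ctx);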

@bachelor-dou (Contributor Author) replied to @slaren's comment above:

You are right. Some background: the reason we added n_graph_splits is that we benchmark large batches of models (or specific models) through scripts and generate tables from the results. Including this parameter in the output makes it much easier to see which models still have issues with hardware backend support; otherwise we would have to extract the split count manually.
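
As an illustration of that workflow, the sketch below reads several llama-bench -o json result files and tabulates n_graph_splits per model. The file-per-model layout and the use of the single-header nlohmann/json library are assumptions for the example, not part of this PR.

#include <nlohmann/json.hpp>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char ** argv) {
    std::cout << "model_filename\tn_graph_splits\tavg_ts\n";
    for (int i = 1; i < argc; ++i) {                 // one `llama-bench -o json` output file per model
        std::ifstream in(argv[i]);
        nlohmann::json results = nlohmann::json::parse(in);
        for (const auto & r : results) {             // llama-bench emits an array of test results
            std::cout << r.value("model_filename", std::string("?")) << '\t'
                      << r.value("n_graph_splits", -1) << '\t'
                      << r.value("avg_ts", 0.0) << '\n';
        }
    }
    return 0;
}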

@hipudding closed this on Apr 23, 2025