
Conversation

@bachelor-dou (Contributor) commented Apr 17, 2025

llama-bench is an excellent model benchmarking tool. This PR adds an n_graph_splits (graph partition count) field to the llama-bench output. This value gives a more direct way to assess how well a hardware backend supports a model: fewer graph splits typically mean the backend covers more of the model's operators, which helps developers identify and improve gaps in model support.

Here is a comparison of the results with -o json.

Before

[
  {
    "build_commit": "12b17501",
    "build_number": 5145,
    "cpu_info": "CPU",
    "gpu_info": "Ascend910B3",
    "backends": "CANN",
    "model_filename": "/home/models/qwen2.5:0.5b-instruct-fp16",
    "model_type": "qwen2 1B F16",
    "model_size": 988208640,
    "model_n_params": 494032768,
    "n_batch": 2048,
    "n_ubatch": 512,
    "n_threads": 192,
    "cpu_mask": "0x0",
    "cpu_strict": false,
    "poll": 50,
    "type_k": "f16",
    "type_v": "f16",
    "n_gpu_layers": 32,
    "split_mode": "layer",
    "main_gpu": 0,
    "no_kv_offload": false,
    "flash_attn": false,
    "tensor_split": "0.00",
    "use_mmap": true,
    "embeddings": false,
    "n_prompt": 0,
    "n_gen": 128,
    "test_time": "2025-04-17T08:51:41Z",
    "avg_ns": 2099229836,
    "stddev_ns": 28969817,
    "avg_ts": 60.984063,
    "stddev_ts": 0.844204,
    "samples_ns": [ 2067742567, 2068888383, 2113888137, 2130481700, 2115148394 ],
    "samples_ts": [ 61.9033, 61.869, 60.5519, 60.0803, 60.5158 ]
  }
]

After

[
  {
    "build_commit": "12790280",
    "build_number": 5155,
    "cpu_info": "CPU",
    "gpu_info": "Ascend910B3",
    "backends": "CANN",
    "model_filename": "/home/models/qwen2.5:0.5b-instruct-fp16",
    "n_graph_splits": 2,
    "model_type": "qwen2 1B F16",
    "model_size": 988208640,
    "model_n_params": 494032768,
    "n_batch": 2048,
    "n_ubatch": 512,
    "n_threads": 192,
    "cpu_mask": "0x0",
    "cpu_strict": false,
    "poll": 50,
    "type_k": "f16",
    "type_v": "f16",
    "n_gpu_layers": 32,
    "split_mode": "layer",
    "main_gpu": 0,
    "no_kv_offload": false,
    "flash_attn": false,
    "tensor_split": "0.00",
    "use_mmap": true,
    "embeddings": false,
    "n_prompt": 0,
    "n_gen": 128,
    "test_time": "2025-04-18T02:21:20Z",
    "avg_ns": 2988958856,
    "stddev_ns": 86828048,
    "avg_ts": 42.853434,
    "stddev_ts": 1.254731,
    "samples_ns": [ 2890490934, 2900927646, 3036466484, 3038520785, 3078388432 ],
    "samples_ts": [ 44.2831, 44.1238, 42.1543, 42.1258, 41.5802 ]
  }
]
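
For reference, the split count itself comes from the ggml backend scheduler. The snippet below is an illustration only, not this PR's actual diff: ggml_backend_sched_get_n_splits() is the existing ggml API that reports the number, while the wrapper struct and accessor name merely mirror the llama_context::get_graph_splits() that this PR adds internally.

// Illustrative sketch: inside llama.cpp, each context owns a backend scheduler,
// and ggml already exposes the split count for it.
#include "ggml-backend.h"

struct llama_context_sketch {        // stand-in for the real llama_context internals
    ggml_backend_sched_t sched;      // created when the context is initialized

    int get_graph_splits() const {
        // number of partitions the most recently computed graph was split into
        return ggml_backend_sched_get_n_splits(sched);
    }
};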

@bachelor-dou changed the title from "[CANN] Add the n_graph_splits metric to the performance test output paramete…" to "[CANN] Add the n_graph_splits performance metric to llama-bench." on Apr 17, 2025
@hipudding added the "Ascend NPU" label (issues specific to Ascend NPUs) on Apr 17, 2025
@bachelor-dou marked this pull request as ready for review on April 17, 2025 09:36
Collaborator reviewed the following code:

 */
static std::string get_modelfile_name(const std::string & path_str) {
    namespace fs = std::filesystem;
    fs::path path = path_str;

Suggested change:
-    fs::path path = path_str;
+    std::filesystem::path path = path_str;

Collaborator reviewed the following code:

 * @return Full name of the model.
 */
static std::string get_modelfile_name(const std::string & path_str) {
    namespace fs = std::filesystem;

Suggested change:
-    namespace fs = std::filesystem;

@bachelor-dou (Contributor Author) replied: done
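
Taken together, the two suggestions amount to something like the helper below. The body is a guess at the helper's intent (return only the file name portion of the model path), not the PR's exact code.

#include <filesystem>
#include <string>

// Return only the file name portion of the model path, with the reviewers'
// suggestions applied: no `namespace fs` alias, std::filesystem used directly.
static std::string get_modelfile_name(const std::string & path_str) {
    std::filesystem::path path = path_str;
    return path.filename().string();
}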

Collaborator reviewed the following code:

    gpu_info(get_gpu_info()) {

-       model_filename = inst.model;
+       model_filename = get_modelfile_name(inst.model);

Add a parameter to choose between outputting the full path and just the file name?

@bachelor-dou (Contributor Author) replied: done

@hipudding (Collaborator) commented Apr 17, 2025

It would be better to also add n_gpus to show how many GPUs are used for this test.

Edit: ignore this comment; gpu_info already shows the GPU count.

@slaren self-requested a review on April 17, 2025 12:16
@bachelor-dou force-pushed the master branch 2 times, most recently from 1279028 to 8e06ed1, on April 18, 2025 06:25
@bachelor-dou (Contributor Author) commented:

@slaren I'm confused about why this PR fails so many CI checks in the Windows environments. If you have a solution for this, please let me know, thanks. The errors basically all look like this:

C:\Windows\system32\cmd.exe /C "cd . && C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE --target=arm64-pc-windows-msvc -nostartfiles -nostdlib -march=armv8.7-a -fvectorize -ffp-model=fast -fno-finite-math-only -Wno-format -Wno-unused-variable -Wno-unused-function -Wno-gnu-zero-variadic-macro-arguments -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Xlinker /subsystem:console   -fuse-ld=lld-link examples/llama-bench/CMakeFiles/llama-bench.dir/Release/llama-bench.cpp.obj -o bin\Release\llama-bench.exe -Xlinker /MANIFEST:EMBED -Xlinker /implib:examples\llama-bench\Release\llama-bench.lib -Xlinker /pdb:bin\Release\llama-bench.pdb -Xlinker /version:0.0   common/Release/common.lib  src/Release/llama.lib  ggml/src/Release/ggml.lib  ggml/src/Release/ggml-cpu.lib  ggml/src/Release/ggml-base.lib  D:/a/_temp/libcurl/lib/libcurl.dll.a  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames  && cd ."
lld-link: error: undefined symbol: public: int __cdecl llama_context::get_graph_splits(void) const

>>> referenced by examples/llama-bench/CMakeFiles/llama-bench.dir/Release/llama-bench.cpp.obj:(public: __cdecl test::test(struct cmd_params_instance const &, struct llama_model const *, struct llama_context const *))

clang++: error: linker command failed with exit code 1 (use -v to see invocation)

@hipudding removed the "Ascend NPU" label on Apr 21, 2025
@slaren (Member) commented Apr 21, 2025

On Windows, DLL symbols are not exported by default; you need to use LLAMA_API on all functions that are intended to be used by applications. That's why you get linker errors on Windows. However, the solution is not to export this one either: llama-context.h is an internal, private header of llama.cpp and cannot be used by applications in this way.

This information could perhaps be exposed in a public function added to llama.h, but I am not convinced that this is a good idea either. Generally we need to avoid exposing too many implementation details in the public API, because they can change at any moment. If the goal is to help debug performance issues, keep in mind that llama.cpp already prints the number of splits every time a context is created, and llama-bench will also print this information when run with -v.
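
For context, the export macro in llama.h works roughly like the simplified sketch below (see llama.h for the real definition): when llama.cpp is built as a DLL on Windows, only functions declared with LLAMA_API are exported, which is why llama-bench fails to link against an internal llama_context method. The accessor declaration at the end is hypothetical, not part of the public API.

// Simplified sketch of the LLAMA_API export macro.
#ifdef LLAMA_SHARED
#    if defined(_WIN32) && !defined(__MINGW32__)
#        ifdef LLAMA_BUILD
#            define LLAMA_API __declspec(dllexport)   // building the llama DLL: export the symbol
#        else
#            define LLAMA_API __declspec(dllimport)   // consuming the DLL (e.g. llama-bench): import it
#        endif
#    else
#        define LLAMA_API __attribute__ ((visibility ("default")))
#    endif
#else
#    define LLAMA_API                                  // static build: no decoration needed
#endif

// A hypothetical public accessor would have to be declared in llama.h like this
// to be usable (and exported) from applications such as llama-bench:
// LLAMA_API int32_t llama_n_graph_splits(const struct llama_context * ctx);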

@bachelor-dou (Contributor Author) replied to @slaren's comment above:

You are right. Some background: the reason we added n_graph_splits is that we benchmark large batches of models (or specific models) through scripts and generate tables from the results. Including this parameter in the output makes it much easier to see which models still have issues with hardware backend support; otherwise we would have to extract the split count manually.
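
As an illustration of that workflow, the sketch below reads several llama-bench -o json result files and tabulates n_graph_splits per model. The file-per-model layout and the use of the single-header nlohmann/json library are assumptions for the example, not part of this PR.

#include <nlohmann/json.hpp>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char ** argv) {
    std::cout << "model_filename\tn_graph_splits\tavg_ts\n";
    for (int i = 1; i < argc; ++i) {                 // one `llama-bench -o json` output file per model
        std::ifstream in(argv[i]);
        nlohmann::json results = nlohmann::json::parse(in);
        for (const auto & r : results) {             // llama-bench emits an array of test results
            std::cout << r.value("model_filename", std::string("?")) << '\t'
                      << r.value("n_graph_splits", -1) << '\t'
                      << r.value("avg_ts", 0.0) << '\n';
        }
    }
    return 0;
}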

@hipudding closed this on Apr 23, 2025