Conversation

@walidbr walidbr commented Sep 7, 2025

This uses Perfetto in-process profiling and will produce a Perfetto binary trace at the end of the inference. This is very useful for visualising how the backend handles the inference.

Build:
cmake -S . -B build-vk -DCMAKE_BUILD_TYPE=RelWithDebInfo -DGGML_VULKAN=ON
cmake --build build-vk -j8

Run:
GGML_VK_PERF_SILENT=1 GGML_VK_PERF_LOGGER=1 LLAMA_PERFETTO_TRACE=./out.pftrace build-vk/bin/llama-cli -m model.gguf
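
The resulting ./out.pftrace file can then be loaded into the Perfetto UI (https://ui.perfetto.dev) for inspection; this viewing step is not part of the patch, just the usual way Perfetto traces are inspected.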

Test:
Tested on an M4 Mac.

In detail, this patch does the following (a usage sketch follows the list):

  1. Includes the `LlamaPerfetto.h` header, which contains the definitions for the Perfetto-related functions and variables used in this example.
  2. Calls `llama_perfetto_start()` to start tracing at the beginning of the conversation.
  3. Calls `llama_perfetto_stop_flush()` to stop tracing and flush the trace after each generation.
  4. Calls `llama_perfetto_trace_begin_with_text()` to begin a Perfetto event with a text description of the current evaluation.
  5. Calls `llama_perfetto_trace_end()` to end that event after each evaluation.
  6. Calls `llama_perfetto_counter_tokens_per_s()` to update the Perfetto tokens-per-second counter during idle periods.
  7. Calls `llama_perfetto_emit_gpu_timeline()` to emit GPU timeline slices into Perfetto.
  8. Calls `llama_perfetto_print_gpu_stats()` to print GPU statistics during idle periods.
  9. Calls `llama_perfetto_flush_dump_stats()` to flush and dump the Perfetto trace stats to a file during idle periods.
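
For context, here is a minimal sketch of how these hooks could be wired around a generation loop. Only the function names come from this patch; the signatures, arguments, and call sites shown are assumptions for illustration, not the actual implementation.

```cpp
// Sketch only: function names are from this patch, but the signatures and
// call sites below are assumed for illustration.
#include "LlamaPerfetto.h"

static void generate_with_tracing(/* model, context, prompt, ... */) {
    llama_perfetto_start();  // start tracing at the beginning of the conversation

    bool done = false;
    while (!done) {
        // Wrap each evaluation in a named Perfetto slice.
        llama_perfetto_trace_begin_with_text("eval: current batch");
        // ... run the evaluation for the current batch (e.g. llama_decode) ...
        llama_perfetto_trace_end();

        // During idle periods between evaluations:
        llama_perfetto_counter_tokens_per_s(/* current tokens/s */);  // update the t/s counter
        llama_perfetto_emit_gpu_timeline();   // emit GPU timeline slices into Perfetto
        llama_perfetto_print_gpu_stats();     // print GPU statistics
        llama_perfetto_flush_dump_stats();    // flush and dump trace stats to a file

        // ... decide whether generation is finished ...
    }

    llama_perfetto_stop_flush();  // stop tracing and flush the trace to LLAMA_PERFETTO_TRACE
}
```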


@walidbr walidbr requested a review from 0cc4m as a code owner September 7, 2025 02:06
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend), examples, devops (improvements to build systems and github actions), and ggml (changes relating to the ggml tensor library for machine learning) labels Sep 7, 2025