This patch uses Perfetto's in-process tracing and writes a binary Perfetto trace
file when inference finishes. The trace is useful for visualising how the backend
handles inference.
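The overall lifecycle (record events in memory while inference runs, then write them out once at the end) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: `InMemoryTrace` and its methods are hypothetical names, it is not the Perfetto SDK API, and it writes a plain-text event list rather than Perfetto's binary protobuf format.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical in-process recorder: NOT the Perfetto SDK and NOT the
// Perfetto binary format. It only illustrates "buffer in memory, write at the end".
struct TraceEvent {
    char        phase;  // 'B' = slice begin, 'E' = slice end
    std::string name;   // empty for 'E'
    int64_t     ts_us;  // microseconds since start()
};

class InMemoryTrace {
public:
    // Begin a session: clear old events and reset the time origin.
    void start() {
        events_.clear();
        open_   = 0;
        t0_     = std::chrono::steady_clock::now();
        active_ = true;
    }

    void begin(const std::string & name) {
        if (!active_) return;  // events outside a session are dropped
        ++open_;
        record('B', name);
    }

    void end() {
        if (!active_) return;
        assert(open_ > 0 && "end() without a matching begin()");
        if (open_ == 0) return;  // ignore an unbalanced end() in release builds
        --open_;
        record('E', "");
    }

    // Stop the session and write one "phase ts_us name" line per event.
    // Returns the number of events written.
    std::size_t stop_and_flush(const std::string & path) {
        active_ = false;
        std::ofstream out(path);
        for (const auto & e : events_) {
            out << e.phase << ' ' << e.ts_us << ' ' << e.name << '\n';
        }
        return events_.size();
    }

    std::size_t size() const { return events_.size(); }

private:
    void record(char phase, const std::string & name) {
        const auto    now = std::chrono::steady_clock::now();
        const int64_t us  =
            std::chrono::duration_cast<std::chrono::microseconds>(now - t0_).count();
        events_.push_back({phase, name, us});
    }

    bool                                  active_ = false;
    int                                   open_   = 0;
    std::chrono::steady_clock::time_point t0_{};
    std::vector<TraceEvent>               events_;
};
```

In this sketch, `start()` runs when a conversation begins, each evaluation is bracketed with `begin()`/`end()`, and `stop_and_flush()` runs when generation finishes. Buffering in memory keeps file I/O out of the timed region, at the cost of memory proportional to the number of recorded events.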
Build:
cmake -S . -B build-vk -DCMAKE_BUILD_TYPE=RelWithDebInfo -DGGML_VULKAN=ON
cmake --build build-vk -j8
Run:
GGML_VK_PERF_SILENT=1 GGML_VK_PERF_LOGGER=1 LLAMA_PERFETTO_TRACE=./out.pftrace build-vk/bin/llama-cli -m model.gguf
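The Run command configures tracing entirely through environment variables. Below is a sketch of how a program might read them. The interpretation is an assumption, not taken from the patch: an unset or empty `LLAMA_PERFETTO_TRACE` is treated as "tracing disabled" and otherwise as the output path, and a flag counts as enabled when it is set to a non-empty value that does not start with `0`. `TraceSettings`, `env_flag` and `read_trace_settings` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical settings derived from the environment; the real patch may differ.
struct TraceSettings {
    bool        enabled = false;  // true when LLAMA_PERFETTO_TRACE names a file
    std::string output_path;      // e.g. "./out.pftrace"
    bool        vk_perf_logger = false;
    bool        vk_perf_silent = false;
};

// A flag is "on" when it is set, non-empty, and does not start with '0'.
static bool env_flag(const char * name) {
    const char * v = std::getenv(name);
    return v != nullptr && v[0] != '\0' && v[0] != '0';
}

TraceSettings read_trace_settings() {
    TraceSettings s;
    const char * path = std::getenv("LLAMA_PERFETTO_TRACE");
    if (path != nullptr && path[0] != '\0') {
        s.enabled     = true;
        s.output_path = path;
    }
    s.vk_perf_logger = env_flag("GGML_VK_PERF_LOGGER");
    s.vk_perf_silent = env_flag("GGML_VK_PERF_SILENT");
    return s;
}
```

Reading the settings once at startup keeps the hot path free of `getenv` calls.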
Test:
Tested on an M4 Mac.
In detail, this patch does the following:
1. Includes the `LlamaPerfetto.h` header, which provides the Perfetto-related functions and variables used by this patch.
2. Calls `llama_perfetto_start()` at the beginning of the conversation to start tracing.
3. Calls `llama_perfetto_stop_flush()` after each generation to stop tracing and flush the trace.
4. Calls `llama_perfetto_trace_begin_with_text()` to begin a Perfetto event with a text description of the current evaluation.
5. Calls `llama_perfetto_trace_end()` after each evaluation to end that event.
6. Calls `llama_perfetto_counter_tokens_per_s()` during idle periods to update the Perfetto tokens-per-second counter.
7. Calls `llama_perfetto_emit_gpu_timeline()` to emit GPU timeline slices into the Perfetto trace.
8. Calls `llama_perfetto_print_gpu_stats()` during idle periods to print GPU statistics.
9. Calls `llama_perfetto_flush_dump_stats()` during idle periods to flush and dump the Perfetto trace stats to a file.
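Items 4 to 6 can be illustrated with a small sketch. `Recorder`, `ScopedSlice` and `tokens_per_second` are hypothetical stand-ins, not the patch's functions: the recorder only appends strings to a log, and the RAII guard is one possible way to guarantee that every begin call is paired with an end call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical recorder standing in for the patch's begin/end/counter calls.
// It only appends human-readable entries to an in-memory log.
struct Recorder {
    std::vector<std::string> log;

    void begin(const std::string & text) { log.push_back("B " + text); }
    void end() { log.push_back("E"); }
    void counter(const std::string & name, double value) {
        log.push_back("C " + name + " " + std::to_string(value));
    }
};

// RAII guard: the slice ends when the guard leaves scope, so an early
// return or exception inside an evaluation cannot leave the slice open.
class ScopedSlice {
public:
    ScopedSlice(Recorder & r, const std::string & text) : r_(r) { r_.begin(text); }
    ~ScopedSlice() { r_.end(); }
    ScopedSlice(const ScopedSlice &) = delete;
    ScopedSlice & operator=(const ScopedSlice &) = delete;

private:
    Recorder & r_;
};

// Throughput over one window: tokens * 1e6 / microseconds.
// Returns 0 for an empty or invalid window instead of dividing by zero.
double tokens_per_second(int64_t n_tokens, int64_t elapsed_us) {
    assert(n_tokens >= 0);
    if (elapsed_us <= 0) return 0.0;
    return static_cast<double>(n_tokens) * 1e6 / static_cast<double>(elapsed_us);
}
```

For example, `tokens_per_second(32, 1280000)` is 25.0 (32 tokens in 1.28 s).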
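Items 7 and 8 involve GPU timing data. A common way to place GPU work on a CPU trace timeline, assumed here rather than confirmed from the patch, is to take raw timestamp-query ticks, scale them by the device's tick period (Vulkan exposes this as `VkPhysicalDeviceLimits::timestampPeriod`, in nanoseconds per tick), and shift them by one calibration pair of GPU ticks and CPU time sampled at about the same moment. `GpuSample`, `Slice`, `gpu_samples_to_slices` and `total_ns_per_op` are hypothetical names; the per-op totals show one plausible shape for printed GPU statistics.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One GPU measurement, e.g. two timestamp queries written around a dispatch.
struct GpuSample {
    std::string op;           // e.g. "MUL_MAT"
    uint64_t    begin_ticks;  // raw timestamp-query values
    uint64_t    end_ticks;
};

// A slice expressed on the CPU clock, ready to be emitted to a trace track.
struct Slice {
    std::string name;
    int64_t     start_ns;
    int64_t     dur_ns;
};

// period_ns:              nanoseconds per GPU tick
// ref_ticks / ref_cpu_ns: GPU tick value and CPU time sampled at about the same instant
std::vector<Slice> gpu_samples_to_slices(const std::vector<GpuSample> & samples,
                                         double period_ns,
                                         uint64_t ref_ticks,
                                         int64_t ref_cpu_ns) {
    std::vector<Slice> out;
    out.reserve(samples.size());
    for (const auto & s : samples) {
        assert(s.end_ticks >= s.begin_ticks);
        // Signed distance from the calibration point; samples may precede it.
        const int64_t  offset_ticks = static_cast<int64_t>(s.begin_ticks - ref_ticks);
        const uint64_t len_ticks    = s.end_ticks - s.begin_ticks;
        const int64_t  start = ref_cpu_ns + static_cast<int64_t>(
            std::llround(static_cast<double>(offset_ticks) * period_ns));
        const int64_t  dur = static_cast<int64_t>(
            std::llround(static_cast<double>(len_ticks) * period_ns));
        out.push_back({s.op, start, dur});
    }
    return out;
}

// Aggregate GPU time per op name, e.g. for a summary printed while idle.
std::map<std::string, int64_t> total_ns_per_op(const std::vector<Slice> & slices) {
    std::map<std::string, int64_t> totals;
    for (const auto & sl : slices) {
        totals[sl.name] += sl.dur_ns;
    }
    return totals;
}
```

A single calibration pair ignores drift between the GPU and CPU clocks; over long sessions it would need to be re-sampled (Vulkan's `VK_EXT_calibrated_timestamps` extension exists for sampling both clocks together).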