Introduce Graph Profiler #9659
@max-krasnyansky I am using the graph-profiler branch, but I'm unsure how to trigger profiling and retrieve the details. Any docs, commands, or references would be appreciated. Thanks.
Sorry for the delay. Here is how to build (arm64-ubuntu)
And here is how to run
This will get you the output I included in the PR
Hi, I am also trying to find out how to profile properly with llama.cpp. In my case, I would like to measure performance beyond the node level. For example, I would like to know the aggregated time of all nodes generated by
I think a good approach could be that for each
For that, I'm thinking it might make sense to insert dummy graph nodes that record profiling data.
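A minimal sketch of the dummy-node idea, under stated assumptions: every type and function below (`toy_node`, `toy_region`, `toy_aggregate`, the `TOY_OP_MARK` op) is hypothetical and not part of ggml, and per-node durations are passed in as plain numbers instead of being measured. A marker node does no work; it only names the region that the following compute nodes belong to, so their times can be summed per region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical op kinds: ordinary compute nodes plus a marker that does no work.
enum toy_op { TOY_OP_COMPUTE, TOY_OP_MARK };

struct toy_node {
    enum toy_op op;
    const char *label;   // TOY_OP_MARK only: name of the region that starts here
    uint64_t    cost_us; // TOY_OP_COMPUTE only: duration of this node (simulated)
};

struct toy_region {
    const char *label;
    uint64_t    total_us;
};

// Walk the nodes in execution order and add each compute node's time to the
// region opened by the most recent marker. Returns the number of regions found.
static int toy_aggregate(const struct toy_node *nodes, int n_nodes,
                         struct toy_region *regions, int max_regions) {
    int n_regions = 0;
    int cur = -1; // index of the active region, -1 before the first marker

    for (int i = 0; i < n_nodes; i++) {
        if (nodes[i].op == TOY_OP_MARK) {
            // Reuse the region if this label was seen before (e.g. once per layer).
            cur = -1;
            for (int r = 0; r < n_regions; r++) {
                if (strcmp(regions[r].label, nodes[i].label) == 0) { cur = r; break; }
            }
            if (cur < 0) {
                assert(n_regions < max_regions);
                cur = n_regions++;
                regions[cur].label    = nodes[i].label;
                regions[cur].total_us = 0;
            }
        } else if (cur >= 0) {
            regions[cur].total_us += nodes[i].cost_us;
        }
    }
    return n_regions;
}
```

Because markers with the same label share one region, a label that repeats in every layer yields a single aggregated total, which is the kind of beyond-node-level number asked for earlier in the thread.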
Hi, I was looking for a tool exactly like this to dump the actual graph operators during execution. Very useful. One piece of feedback: I needed the following change to compile in my environment (Windows 11 x64 + VS2022).
diff --git a/ggml/src/ggml-cpu/CMakeLists.txt b/ggml/src/ggml-cpu/CMakeLists.txt
index 2cc42d4b0..acfa79fff 100644
--- a/ggml/src/ggml-cpu/CMakeLists.txt
+++ b/ggml/src/ggml-cpu/CMakeLists.txt
@@ -583,6 +583,10 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
list(APPEND GGML_CPU_SOURCES ${GGML_KLEIDIAI_SOURCES})
endif()
+ if (GGML_GRAPH_PROFILER)
+ target_link_libraries(${GGML_CPU_NAME} PRIVATE ggml-base)
+ endif()
+
message(STATUS "Adding CPU backend variant ${GGML_CPU_NAME}: ${ARCH_FLAGS} ${ARCH_DEFINITIONS}")
target_sources(${GGML_CPU_NAME} PRIVATE ${GGML_CPU_SOURCES})
target_compile_options(${GGML_CPU_NAME} PRIVATE ${ARCH_FLAGS})
diff --git a/ggml/src/ggml-profile.h b/ggml/src/ggml-profile.h
index 3f8fecc08..f63f019ce 100644
--- a/ggml/src/ggml-profile.h
+++ b/ggml/src/ggml-profile.h
@@ -77,11 +77,11 @@ static inline void ggml_graph_profile_event(const struct ggml_cgraph *cg, enum g
#else
-void ggml_graph_profile_init(struct ggml_cgraph *cg, int n_threads);
-void ggml_graph_profile_start(struct ggml_cgraph *cg, int n_threads);
-void ggml_graph_profile_finish(struct ggml_cgraph *cg, int n_threads);
-void ggml_graph_profile_free(struct ggml_cgraph *cg);
-void ggml_graph_profile_event(const struct ggml_cgraph *cg, enum ggml_profile_event e, int node_n, int ith);
+GGML_API void ggml_graph_profile_init(struct ggml_cgraph *cg, int n_threads);
+GGML_API void ggml_graph_profile_start(struct ggml_cgraph *cg, int n_threads);
+GGML_API void ggml_graph_profile_finish(struct ggml_cgraph *cg, int n_threads);
+GGML_API void ggml_graph_profile_free(struct ggml_cgraph *cg);
+GGML_API void ggml_graph_profile_event(const struct ggml_cgraph *cg, enum ggml_profile_event e, int node_n, int ith);
#endif // GGML_GRAPH_PROFILER
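The comment does not say why the patch is needed. One plausible reading is that, in a shared-library build with MSVC, functions are only visible outside their DLL when they are explicitly exported, so the CPU backend cannot link against profiler functions in ggml-base unless they carry an export macro. The sketch below shows the general export-macro pattern with hypothetical names (`TOY_API`, `TOY_SHARED`, `TOY_BUILD`, `toy_profile_slots`); it illustrates the technique and is not a copy of ggml's actual `GGML_API` definition.

```c
// Hypothetical export macro: dllexport while building the library as a DLL,
// dllimport for consumers, and plain extern for static or non-Windows builds.
#if defined(TOY_SHARED) && defined(_WIN32)
#    ifdef TOY_BUILD
#        define TOY_API __declspec(dllexport) extern
#    else
#        define TOY_API __declspec(dllimport) extern
#    endif
#else
#    define TOY_API extern
#endif

// Declaration as it would appear in a header shared by library and consumers.
TOY_API int toy_profile_slots(int n_nodes, int n_threads);

// Definition inside the library: one timing slot per (node, thread) pair.
int toy_profile_slots(int n_nodes, int n_threads) {
    return n_nodes * n_threads;
}
```

With neither `TOY_SHARED` nor `_WIN32` defined, the macro collapses to `extern`, so the same header works unchanged for static builds.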
Here is an attempt at reintroducing the original whole-graph profiler (LLAMA_PERF) with some additional features.
It is not ready to merge into master, but it is useful for profiling different models (on CPU).
Features:
Known issues:
ggml_init_param.graph_profile or it'll be moved into the backend params
If there is interest, it should be easy to extend to other backends, where they could update per-node/per-thread ggml_profile_timing data (they'd have to collect it on the accelerator and then export it into this common format).
See original PR #9647 for additional details.
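To make the "per-node/per-thread" layout concrete, here is a rough sketch of a timing table that any backend could fill. It assumes a flat array indexed by node and thread, mirroring the `(node_n, ith)` arguments of `ggml_graph_profile_event` in the diff above. All names here (`toy_profile`, `toy_timing`, `toy_event`, and the functions) are hypothetical, the event set is a guess, and the real `ggml_profile_timing` layout may differ. Timestamps are passed in by the caller, which is roughly what an accelerator backend would do when exporting times it collected on the device.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical events recorded for each (node, thread) pair.
enum toy_event { TOY_EV_START, TOY_EV_END, TOY_EV_COUNT };

struct toy_timing {
    uint64_t nsec[TOY_EV_COUNT]; // timestamp of each event, in nanoseconds
};

struct toy_profile {
    int n_nodes;
    int n_threads;
    struct toy_timing *slots;    // n_nodes * n_threads entries, zero-initialized
};

static struct toy_profile *toy_profile_init(int n_nodes, int n_threads) {
    struct toy_profile *p = malloc(sizeof(*p));
    if (!p) return NULL;
    p->n_nodes   = n_nodes;
    p->n_threads = n_threads;
    p->slots     = calloc((size_t) n_nodes * (size_t) n_threads, sizeof(*p->slots));
    if (!p->slots) { free(p); return NULL; }
    return p;
}

// Store a timestamp for one event of one node on one thread.
static void toy_profile_event(struct toy_profile *p, enum toy_event e,
                              int node_n, int ith, uint64_t now_nsec) {
    assert(node_n >= 0 && node_n < p->n_nodes);
    assert(ith >= 0 && ith < p->n_threads);
    p->slots[(size_t) node_n * p->n_threads + ith].nsec[e] = now_nsec;
}

// Wall time of a node, taken as the slowest thread's END - START.
static uint64_t toy_profile_node_nsec(const struct toy_profile *p, int node_n) {
    uint64_t worst = 0;
    for (int ith = 0; ith < p->n_threads; ith++) {
        const struct toy_timing *t = &p->slots[(size_t) node_n * p->n_threads + ith];
        uint64_t d = t->nsec[TOY_EV_END] - t->nsec[TOY_EV_START];
        if (d > worst) worst = d;
    }
    return worst;
}

static void toy_profile_free(struct toy_profile *p) {
    if (p) { free(p->slots); free(p); }
}
```

Taking the maximum over threads is one design choice among several; summing per-thread durations would instead report total CPU time spent on the node.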
Example of the terminal output
Same example in rendered Markdown