@IMbackK IMbackK commented Aug 5, 2025

This adds a convenience option, GGML_HIP_EXPORT_METRICS, that adds -Rpass-analysis=kernel-resource-usage --save-temps to the HIP compiler's flags.

This causes the hip compiler to save the assembly of the compiled kernels and output resource usage to stdout:

fattn-vec-f32.cuh:34:1: remark: Function Name: _ZL22flash_attn_vec_ext_f32ILi128ELi8EL9ggml_type1ELS0_1ELb1EEvPKcS2_S2_S2_PKiPfP15HIP_vector_typeIfLj2EEffffjfiiiiiiiiiiiiiliiliiiiil [-Rpass-analysis=kernel-resource-usage]
   34 |                             const int32_t nb31, const int32_t nb32, const int64_t nb33) {
      | ^
fattn-vec-f32.cuh:34:1: remark:     SGPRs: 92 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     VGPRs: 63 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     AGPRs: 64 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     ScratchSize [bytes/lane]: 12 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     Dynamic Stack: False [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     Occupancy [waves/SIMD]: 4 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     SGPRs Spill: 0 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     VGPRs Spill: 14 [-Rpass-analysis=kernel-resource-usage]
fattn-vec-f32.cuh:34:1: remark:     LDS Size [bytes/block]: 10240 [-Rpass-analysis=kernel-resource-usage]

While the same information can just as well be gathered with rocprof, and these flags can just as well be added by setting them directly when invoking cmake (which is what I do at the moment), I would find the inclusion of the above option convenient.
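
The remarks in the sample output are line-oriented, so they can be collected per kernel with a short script. The following is a minimal sketch in Python, assuming the remark lines keep the exact shape shown in the sample; `parse_resource_usage` and `REMARK_RE` are hypothetical names for illustration and are not part of llama.cpp or ROCm.

```python
import re

# Matches remark lines of the shape shown in the sample output, e.g.
#   file.cuh:34:1: remark:     VGPRs: 63 [-Rpass-analysis=kernel-resource-usage]
# Assumption: the file path contains no colon or whitespace.
REMARK_RE = re.compile(
    r"^(?P<loc>[^:\s]+:\d+:\d+): remark:\s+(?P<key>[^:]+): (?P<value>.+?)"
    r" \[-Rpass-analysis=kernel-resource-usage\]\s*$"
)


def parse_resource_usage(text):
    """Return one dict per kernel found in the compiler output.

    A "Function Name" remark starts a new kernel record; the remarks that
    follow it are attached to that record. Numeric values become ints,
    anything else (e.g. "False") stays a string.
    """
    kernels = []
    current = None
    for line in text.splitlines():
        match = REMARK_RE.match(line)
        if match is None:
            continue  # source snippets, carets, unrelated diagnostics
        key = match.group("key").strip()
        value = match.group("value").strip()
        if key == "Function Name":
            current = {"Function Name": value, "Location": match.group("loc")}
            kernels.append(current)
        elif current is not None:
            current[key] = int(value) if value.isdigit() else value
    return kernels


def kernels_with_spills(kernels):
    """Keep only kernels that report a nonzero SGPR or VGPR spill count."""
    return [
        k for k in kernels
        if k.get("SGPRs Spill", 0) > 0 or k.get("VGPRs Spill", 0) > 0
    ]
```

Feeding the captured build log to `parse_resource_usage` and then to `kernels_with_spills` would single out kernels such as the one in the sample, which reports a VGPR spill count of 14.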

@github-actions github-actions bot added the "Nvidia GPU" and "ggml" labels Aug 5, 2025
@IMbackK IMbackK merged commit 7ad67ba into ggml-org:master Aug 7, 2025
116 of 129 checks passed