UPSTREAM PR #17479: vulkan: Implement GGML_OP_CUMSUM #312
Performance Analysis Summary - PR #312

Overview
PR #312 implements […]

Performance Impact Assessment
No Impact on Inference Performance: The implementation adds a new operation ([…]
Power Consumption Analysis: Binary-level analysis shows no changes to inference-related binaries: […]
The 56.96% overall power consumption reduction observed in the version comparison reflects architectural reorganization in other binaries ([…]

Key Findings
Functional Additions: The PR introduces cumulative sum computation using Vulkan subgroup arithmetic primitives. The implementation processes one row per workgroup with 128 threads, utilizing […]
Code Structure: Refactored […]
Hardware Requirements: Operation requires […]
Integration Points: Changes integrate into existing Vulkan backend infrastructure: pipeline registration, operation dispatch, graph builder, and test harness. No modifications to public API or core GGML operations.
Performance Characteristics: The implementation exhibits linear scaling with row width (n_cols) due to iteration loops processing 128 elements per iteration. Inter-iteration dependency via […]
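The chunked structure described in the summary (128 elements per iteration, with a dependency carried between iterations) can be sketched on the CPU as follows. This is only an illustrative model, not the shader from the PR: the function name `cumsum_row_chunked`, the `carry` variable, and the serial `accumulate` call standing in for the subgroup scan are all assumptions made for this example.

```python
from itertools import accumulate

BLOCK = 128  # threads per workgroup, as stated in the summary


def cumsum_row_chunked(row, block=BLOCK):
    """Inclusive prefix sum of one row, processed in chunks of `block` elements.

    Hypothetical CPU model: each loop iteration corresponds to one pass of the
    workgroup over up to `block` elements of the row.
    """
    out = []
    carry = 0  # total of all previous chunks (assumed name, not from the PR)
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        # On a GPU this step would be a parallel scan across the workgroup;
        # here it is computed serially.
        scanned = list(accumulate(chunk))
        # Every element of this chunk is offset by the total of earlier chunks.
        out.extend(carry + v for v in scanned)
        # The next iteration depends on this value, so iterations are sequential.
        carry += scanned[-1]
    return out
```

The number of loop iterations is ceil(n_cols / 128), which is where the linear scaling with row width comes from in this model.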
92ef8cd to 7dd50b8
Mirrored from ggml-org/llama.cpp#17479
Uses one row per workgroup, similar to sum_rows. Depending on how large real-world use cases are, it may be possible to make it faster.
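As a rough illustration of why the same per-row decomposition suits both operations, the sketch below gives plain reference semantics for a row-wise cumulative sum next to a row sum. The Python functions `cumsum_rows` and `sum_rows` are hypothetical stand-ins written for this example, not the ggml API; each list element models the work one workgroup would do for its row.

```python
from itertools import accumulate


def cumsum_rows(matrix):
    """Inclusive cumulative sum along each row; rows are independent."""
    return [list(accumulate(row)) for row in matrix]


def sum_rows(matrix):
    """Sum of each row, reducing every row to a single value."""
    return [sum(row) for row in matrix]
```

Because no row depends on any other, rows can be assigned to workgroups one-to-one. For a non-empty row, the last element of its cumulative sum equals its row sum, which is one way to cross-check the two operations.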