benchmarks/README.md
37 additions & 4 deletions
@@ -6,8 +6,10 @@ This directory contains benchmark tracking for the DeviceSparseArrays.jl package
6
6
7
7
- `Project.toml`: Dependencies for running benchmarks
8
8
- `runbenchmarks.jl`: Main script that runs all benchmarks
9
+
- `benchmark_utils.jl`: Utility functions for benchmarking (synchronization helpers)
9
10
- `vector_benchmarks.jl`: Benchmarks for sparse vector operations
10
11
- `matrix_benchmarks.jl`: Benchmarks for sparse matrix operations
12
+
- `conversion_benchmarks.jl`: Benchmarks for format conversion operations
11
13
12
14
## Benchmarks Tracked
13
15
@@ -23,6 +25,11 @@ All matrix operations are benchmarked for CSC, CSR, and COO formats to compare t
23
25
- **Matrix-Vector Multiplication**: `mul!(y, A, x)` for sparse matrix A and dense vectors x, y
24
26
- **Matrix-Matrix Multiplication**: `mul!(C, A, B)` for sparse matrix A and dense matrix B
25
27
- **Three-argument dot**: `dot(x, A, y)` for sparse matrix A and dense vectors x, y
28
+
- **Sparse + Dense Addition**: `A + B` for sparse matrix A and dense matrix B
29
+
30
+
### Format Conversions
31
+
- **CSC ↔ COO**: Conversions between Compressed Sparse Column and Coordinate formats
32
+
- **CSR ↔ COO**: Conversions between Compressed Sparse Row and Coordinate formats
26
33
27
34
## Array Types
28
35
@@ -88,20 +95,46 @@ To add new benchmarks:
88
95
SUITE[group_name] = BenchmarkGroup()
89
96
end
90
97
91
-
SUITE[group_name]["Test Case [$array_type_name]"] =
92
-
@benchmarkable operation($adapted_data)
98
+
# IMPORTANT: Wrap operations with synchronization for accurate GPU timing
99
+
SUITE[group_name]["Test Case [$array_type_name]"] = @benchmarkable begin
100
+
operation($adapted_data)
101
+
_synchronize_backend($adapted_data)
102
+
end
93
103
94
104
return nothing
95
105
end
96
106
```
97
107
3. Call your function in `runbenchmarks.jl` for each array type
98
108
4. Test locally with `make benchmark`
99
109
110
+
## GPU Synchronization
111
+
112
+
All benchmarks include backend synchronization to ensure accurate timing on GPU backends. GPU operations are often asynchronous, meaning they may return before the computation completes. Without synchronization, benchmarks would underestimate the actual execution time.
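The underestimation can be illustrated without a GPU. The sketch below is only an analogy and is not part of the benchmark suite: a Julia `Task` stands in for an asynchronous kernel launch and `wait` stands in for backend synchronization. The names `fake_kernel_launch`, `launch_only`, and `launch_and_sync` are hypothetical.

```julia
# Stand-in for an asynchronous GPU kernel launch: returns a Task immediately
# while the "work" (a 0.3 s sleep) continues in the background.
fake_kernel_launch() = @async sleep(0.3)

# Warm up so compilation time does not dominate the first measurement.
wait(fake_kernel_launch())

# Timing only the launch: the call returns before the work has finished.
launch_only = @elapsed fake_kernel_launch()

# Timing launch plus synchronization: `wait` blocks until the work is done.
launch_and_sync = @elapsed wait(fake_kernel_launch())

println("launch only:     ", round(launch_only; digits = 4), " s")
println("launch and sync: ", round(launch_and_sync; digits = 4), " s")
```

The first measurement should come out far smaller than the second, which is the same distortion an unsynchronized GPU benchmark would show.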
113
+
114
+
The `_synchronize_backend(arr)` helper function:
115
+
- Calls `KernelAbstractions.synchronize(get_backend(arr))` for arrays supporting KernelAbstractions
116
+
- Is a no-op for CPU arrays and arrays without KernelAbstractions support
117
+
- Safely handles any array type, even those without `get_backend` defined
118
+
119
+
This approach works for:
120
+
- **CPU arrays**: No synchronization needed (no-op)
121
+
- **GPU arrays with KernelAbstractions**: Proper synchronization
122
+
- **Other array types**: Gracefully degrades to no-op
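One possible shape for such a helper is sketched below. It is an assumption for illustration, not the code in `benchmark_utils.jl`, which is not shown in this diff. The name `_synchronize_backend_sketch` is hypothetical, and the `try`/`catch` fallback is just one way to get the "gracefully degrades to no-op" behaviour described above.

```julia
using KernelAbstractions

# Hypothetical stand-in for the helper described above; the real
# `_synchronize_backend` in `benchmark_utils.jl` may be written differently.
function _synchronize_backend_sketch(arr)
    try
        # Look up the backend that owns `arr` and block until its queued work finishes.
        backend = KernelAbstractions.get_backend(arr)
        KernelAbstractions.synchronize(backend)
    catch
        # Assumption: any failure (for example, no `get_backend` method for
        # this type) means there is nothing to synchronize.
    end
    return nothing
end

x = zeros(Float32, 8)
_synchronize_backend_sketch(x)       # plain CPU Array: effectively a no-op
_synchronize_backend_sketch("text")  # no `get_backend` method: falls back to a no-op
```

Catching every exception keeps the sketch short, but it would also hide genuine synchronization errors. A stricter variant could catch only `MethodError`.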
123
+
124
+
All benchmarks follow the pattern:
125
+
```julia
126
+
@benchmarkable begin
127
+
my_operation(...)
128
+
_synchronize_backend($some_array)
129
+
end
130
+
```
131
+
100
132
## Notes
101
133
102
-
- Benchmarks use `BLAS.set_num_threads(1)` to ensure consistent results