@@ -25,14 +25,14 @@ objects. `Func` needs to have the following signature:
Note that the return type of the key `type_t` needs to be one of the following
: `[float, uint32_t, int32_t, double, uint64_t, int64_t]`. `object_qsort` has a
- space complexity of `O(N)`. Specifically, it requires `arrsize*(sizeof(type_t)`
- + `sizeof(uint32_t))` additional space. It allocates two `std::vectors`: one
- for storing all the keys and another storing the indexes of the object array.
- For performance reasons, we support `object_qsort` only when the array size
- is less than or equal to `UINT32_MAX`. An example usage of `object_qsort`
- is provided in the [examples](#Sort-an-array-of-Points-using-object_qsort)
- section. Refer to [section](#Performance-of-object_qsort) to get a sense
- of how fast this is relative to `std::sort`.
+ space complexity of `O(N)`. Specifically, it requires
+ `arrsize * sizeof(type_t)` bytes to store a vector with all the keys and an
+ additional `arrsize * sizeof(uint32_t)` bytes to store the indexes of the
+ object array. For performance reasons, we support `object_qsort` only when the
+ array size is less than or equal to `UINT32_MAX`. An example usage of
+ `object_qsort` is provided in the
+ [examples](#Sort-an-array-of-Points-using-object_qsort) section. Refer to the
+ [performance section](#Performance-of-object_qsort) to get a sense of how fast
+ this is relative to `std::sort`.
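The space accounting above can be illustrated with a short C++ sketch. It is a hypothetical stand-in, not the library's code: the name `object_sort_sketch` is made up, and `std::sort` over an index array replaces the AVX-512 key-value sort that `object_qsort` relies on. It allocates exactly the two buffers described (one `type_t` key and one `uint32_t` index per object) and then moves the objects into key order by following the cycles of the index permutation; whether `object_qsort` applies the permutation this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch of the memory layout described above, NOT the
// x86-simd-sort implementation: std::sort over an index array stands in
// for the AVX-512 key-value sort.
template <typename T, typename Func>
void object_sort_sketch(T *arr, uint32_t arrsize, Func key_func)
{
    using type_t = std::decay_t<decltype(key_func(arr[0]))>;

    // arrsize * sizeof(type_t) bytes: each key is computed exactly once.
    std::vector<type_t> keys(arrsize);
    for (uint32_t i = 0; i < arrsize; ++i) {
        keys[i] = key_func(arr[i]);
    }

    // arrsize * sizeof(uint32_t) bytes: indexes into the object array,
    // sorted by their keys (stand-in for a key-value sort).
    std::vector<uint32_t> index(arrsize);
    std::iota(index.begin(), index.end(), uint32_t{0});
    std::sort(index.begin(), index.end(),
              [&](uint32_t a, uint32_t b) { return keys[a] < keys[b]; });

    // Apply the permutation in place by following its cycles, so the
    // objects are never copied into a second array of T.
    for (uint32_t i = 0; i < arrsize; ++i) {
        if (index[i] == i) continue;  // already in place or handled
        T tmp = std::move(arr[i]);
        uint32_t dst = i;
        uint32_t src = index[dst];
        while (src != i) {
            arr[dst] = std::move(arr[src]);
            index[dst] = dst;  // mark this slot as done
            dst = src;
            src = index[dst];
        }
        arr[dst] = std::move(tmp);
        index[dst] = dst;
    }

    // Debug-only check: the objects are now ordered by their keys.
    assert(std::is_sorted(arr, arr + arrsize, [&](const T &a, const T &b) {
        return key_func(a) < key_func(b);
    }));
}
```

With `type_t = double` and one million objects, the two vectors take 8,000,000 + 4,000,000 = 12,000,000 bytes (about 12 MB), independent of `sizeof(T)`. In this sketch, computing every key once up front means an expensive key function is evaluated N times rather than on every comparison.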

## Sort an array of built-in integers and floats
```cpp
@@ -143,23 +143,29 @@ array. You can read details of all the implementations
[here](https://github.com/intel/x86-simd-sort/blob/main/src/README.md).

## Performance comparison on AVX-512: `object_qsort` v/s `std::sort`
- `object_qsort` relies on key-value sort which is currently accelerated only on
- AVX-512 (we plan to add AVX2 version soon). Benchmarks added in
- [bench-objsort.hpp](./benchmarks/bench-objsort.hpp) measures performance of
- `object_qsort` relative to `std::sort` when sorting an array of `struct Point
- {double x, y, z;}` and `struct Point {float x, y, x;}` for various metrics:
+ Performance of `object_qsort` can vary significantly depending on the
+ definition of the custom class, and we highly recommend benchmarking before
+ using it. For the sake of illustration, we provide a few examples in
+ [./benchmarks/bench-objsort.hpp](./benchmarks/bench-objsort.hpp) which measure
+ the performance of `object_qsort` relative to `std::sort` when sorting an array
+ of points in Cartesian coordinates represented by the classes
+ `struct Point {double x, y, z;}` and `struct Point {float x, y, z;}`. We sort
+ these points based on several different metrics:

+ sort by coordinate `x`
+ sort by Manhattan distance (relative to origin): `abs(x) + abs(y) + abs(z)`
+ sort by Euclidean distance (relative to origin): `sqrt(x*x + y*y + z*z)`
+ sort by Chebyshev distance (relative to origin): `max(x, y, z)`
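The metrics above can be sketched as key functions for `struct Point {double x, y, z;}`, each returning `double` (one of the supported `type_t` choices). The names `key_x`, `key_manhattan`, `key_euclidean`, `key_chebyshev` and `sort_by_key` are hypothetical and are not taken from `bench-objsort.hpp`. The Chebyshev key here uses absolute values, which gives the same result as `max(x, y, z)` whenever all coordinates are non-negative. `sort_by_key` is a plain `std::sort` baseline that evaluates the key twice per comparison.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical key functions for the listed metrics (illustrative names).
struct Point {
    double x, y, z;
};

double key_x(const Point &p) { return p.x; }

// Manhattan (taxicab) distance from the origin.
double key_manhattan(const Point &p)
{
    return std::abs(p.x) + std::abs(p.y) + std::abs(p.z);
}

// Euclidean distance from the origin.
double key_euclidean(const Point &p)
{
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}

// Chebyshev distance from the origin; equals max(x, y, z) when all
// coordinates are non-negative.
double key_chebyshev(const Point &p)
{
    return std::max({std::abs(p.x), std::abs(p.y), std::abs(p.z)});
}

// std::sort baseline: the key is recomputed on every comparison.
template <typename Func>
void sort_by_key(std::vector<Point> &pts, Func key)
{
    std::sort(pts.begin(), pts.end(),
              [&](const Point &a, const Point &b) { return key(a) < key(b); });
}
```

With this baseline the cost of `sqrt` in `key_euclidean` is paid on every comparison, which may help explain why more complicated metrics show larger speedups for an approach that computes each key only once.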

- The data was collected on a processor with AVX-512 and is shown in the plot
- below. For the simplest of cases where we want to sort an array of struct by
- one of its members, `object_qsort` can be up-to 5x faster for 32-bit data type
- and about 4x for 64-bit data type. It tends to do better when the metric to
- sort by gets more complicated. Sorting by Euclidean distance can be up-to 10x
- faster.
+ The performance data can be collected by building the benchmark suite and
+ running `./builddir/benchexe --benchmark_filter='.*obj.*'`. The plot below was
+ collected on a processor with AVX-512 because `object_qsort` is currently
+ accelerated only on AVX-512 (we plan to add the AVX2 version soon). For the
+ simplest case, where we sort an array of structs by one of its members,
+ `object_qsort` can be up to 5x faster for a 32-bit data type and about 4x
+ faster for a 64-bit data type. It tends to do even better when the metric to
+ sort by gets more complicated: sorting by Euclidean distance can be up to 10x
+ faster.