Commit 4d7c10a

author
Raghuveer Devulapalli
committed
Update README with objsort
1 parent 6362001 commit 4d7c10a

2 files changed

+82
-14
lines changed

README.md

Lines changed: 82 additions & 14 deletions
@@ -1,32 +1,57 @@
11
# x86-simd-sort
22

33
C++ template library for high performance SIMD based sorting routines for
4-
16-bit, 32-bit and 64-bit data types. The sorting routines are accelerated
5-
using AVX-512/AVX2 when available. The library auto picks the best version
6-
depending on the processor it is run on. If you are looking for the AVX-512 or
7-
AVX2 specific implementations, please see
8-
[README](https://github.com/intel/x86-simd-sort/blob/main/src/README.md) file under
9-
`src/` directory. The following routines are currently supported:
4+
built-in integers and floats (16-bit, 32-bit and 64-bit data types) and custom
5+
defined C++ objects. The sorting routines are accelerated using AVX-512/AVX2
6+
when available. The library auto picks the best version depending on the
7+
processor it is run on. If you are looking for the AVX-512 or AVX2 specific
8+
implementations, please see
9+
[README](https://github.com/intel/x86-simd-sort/blob/main/src/README.md) file
10+
under `src/` directory. The following routines are currently supported:
11+
12+
## Sort an array of custom defined class objects (uses `O(N)` space)
13+
``` cpp
14+
template <typename T, typename Func>
15+
void x86simdsort::object_qsort(T *arr, uint32_t arrsize, Func key_func)
16+
```
17+
`T` is any user defined struct or class and `arr` is a pointer to the first
18+
element in the array of objects of type `T`. `Func` is a lambda function that
19+
computes the `key` value for each object which is the metric used to sort the
20+
objects. `Func` needs to have the following signature:
1021
22+
```cpp
23+
[] (T obj) -> type_t { type_t key; /* compute key for obj */ return key; }
24+
```
1125

12-
### Sort routines on arrays
26+
Note that the return type of the key `type_t` needs to be one of the following
27+
: `[float, uint32_t, int32_t, double, uint64_t, int64_t]`. `object_qsort` has a
28+
space complexity of `O(N)`. Specifically, it requires
29+
`arrsize * (sizeof(type_t) + sizeof(uint32_t))` bytes of additional space. It allocates two `std::vector`s: one
30+
for storing all the keys and another storing the indexes of the object array.
31+
For performance reasons, we support `object_qsort` only when the array size
32+
is less than or equal to `UINT32_MAX`. An example usage of `object_qsort`
33+
is provided in the [examples](#Sort-an-array-of-Points-using-object_qsort)
34+
section. Refer to [section](#Performance-of-object_qsort) to get a sense
35+
of how fast this is relative to `std::sort`.
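
The key-plus-index approach described above can be illustrated with a minimal sketch. It assumes nothing about the library's internals beyond what the text states: `object_sort_sketch` is a hypothetical name, `std::sort` stands in for the SIMD key-value sort, and the final permutation uses a temporary copy for simplicity (extra space the library itself may not need).

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration, NOT part of x86-simd-sort.
template <typename T, typename Func>
void object_sort_sketch(T *arr, uint32_t arrsize, Func key_func)
{
    using KeyT = decltype(key_func(arr[0]));

    // Side buffer 1: one key per object.
    std::vector<KeyT> keys(arrsize);
    for (uint32_t i = 0; i < arrsize; ++i) {
        keys[i] = key_func(arr[i]);
    }

    // Side buffer 2: indexes 0..arrsize-1 into the object array.
    std::vector<uint32_t> index(arrsize);
    std::iota(index.begin(), index.end(), 0u);

    // Stand-in for the SIMD key-value sort: order the indexes by key.
    std::sort(index.begin(), index.end(),
              [&keys](uint32_t a, uint32_t b) { return keys[a] < keys[b]; });

    // Apply the permutation through a temporary copy (assumption made for
    // brevity; an in-place permutation would avoid this buffer).
    std::vector<T> sorted;
    sorted.reserve(arrsize);
    for (uint32_t i : index) {
        sorted.push_back(std::move(arr[i]));
    }
    std::move(sorted.begin(), sorted.end(), arr);
}

struct Point {
    double x, y, z;
};
```

Computing each key once up front is what makes an expensive key function cheap to sort by: a comparison sort calling the key function inside its comparator would evaluate it `O(N log N)` times.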
36+
37+
## Sort an array of built-in integers and floats
1338
```cpp
14-
x86simdsort::qsort(T* arr, size_t size, bool hasnan);
15-
x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan);
16-
x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan);
39+
void x86simdsort::qsort(T* arr, size_t size, bool hasnan);
40+
void x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan);
41+
void x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan);
1742
```
1843
Supported datatypes: `T` $\in$ `[_Float16, uint16_t, int16_t, float, uint32_t,
1944
int32_t, double, uint64_t, int64_t]`
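
As a rough guide to what the three routines compute, the sketch below expresses `qselect` and `partial_qsort` in terms of standard-library algorithms. This assumes they mirror `std::nth_element` and `std::partial_sort` respectively, which this README does not spell out; the `*_reference` names are hypothetical, and `hasnan` handling is omitted.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical reference helpers, NOT part of x86-simd-sort.

// Assumed qselect semantics: arr[k] ends up holding the element that would
// be there if the array were sorted; smaller elements precede it.
template <typename T>
void qselect_reference(T *arr, std::size_t k, std::size_t size)
{
    std::nth_element(arr, arr + k, arr + size);
}

// Assumed partial_qsort semantics: the first k positions hold the k
// smallest elements in ascending order; the rest are in unspecified order.
template <typename T>
void partial_qsort_reference(T *arr, std::size_t k, std::size_t size)
{
    std::partial_sort(arr, arr + k, arr + size);
}
```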
2045
21-
### Key-value sort routines on pairs of arrays
46+
## Key-value sort routines on pairs of arrays
2247
```cpp
23-
x86simdsort::keyvalue_qsort(T1* key, T2* val, size_t size, bool hasnan);
48+
void x86simdsort::keyvalue_qsort(T1* key, T2* val, size_t size, bool hasnan);
2449
```
2550
Supported datatypes: `T1`, `T2` $\in$ `[float, uint32_t, int32_t, double,
2651
uint64_t, int64_t]` Note that keyvalue sort is not yet supported for 16-bit
2752
data types.
2853

29-
### Arg sort routines on arrays
54+
## Arg sort routines on arrays
3055
```cpp
3156
std::vector<size_t> arg = x86simdsort::argsort(T* arr, size_t size, bool hasnan);
3257
std::vector<size_t> arg = x86simdsort::argselect(T* arr, size_t k, size_t size, bool hasnan);
@@ -55,16 +80,38 @@ can configure meson to build them both by using `-Dbuild_tests=true` and
5580

5681
## Example usage
5782

83+
#### Sort an array of floats
84+
5885
```cpp
5986
#include "x86simdsort.h"
6087

6188
int main() {
6289
std::vector<float> arr(1000); // 1000 elements; arr{1000} would create a single element
63-
x86simdsort::qsort(arr, 1000, true);
90+
x86simdsort::qsort(arr.data(), 1000, true);
6491
return 0;
6592
}
6693
```
6794

95+
#### Sort an array of Points using object_qsort
96+
```cpp
97+
#include "x86simdsort.h"
98+
#include <cmath>
99+
100+
struct Point {
101+
double x, y, z;
102+
};
103+
104+
int main() {
105+
std::vector<Point> arr{1000};
106+
// Sort an array of Points by its x value:
107+
x86simdsort::object_qsort(arr.data(), 1000, [](Point p) { return p.x; });
108+
// Sort an array of Points by its distance from origin:
109+
x86simdsort::object_qsort(arr.data(), 1000, [](Point p) {
110+
return sqrt(p.x*p.x+p.y*p.y+p.z*p.z);
111+
});
112+
return 0;
113+
}
114+
```
68115
69116
## Details
70117
@@ -95,6 +142,27 @@ argselect) will not use the SIMD based algorithms if they detect NAN's in the
95142
array. You can read details of all the implementations
96143
[here](https://github.com/intel/x86-simd-sort/blob/main/src/README.md).
97144
145+
## Performance comparison on AVX-512: `object_qsort` v/s `std::sort`
146+
`object_qsort` relies on key-value sort which is currently accelerated only on
147+
AVX-512 (we plan to add an AVX2 version soon). Benchmarks added in
148+
[bench-objsort.hpp](./benchmarks/bench-objsort.hpp) measure the performance of
149+
`object_qsort` relative to `std::sort` when sorting an array of `struct Point
150+
{double x, y, z;}` and `struct Point {float x, y, z;}` for various metrics:
151+
152+
+ sort by coordinate `x`
153+
+ sort by Manhattan distance (relative to origin): `abs(x) + abs(y) + abs(z)`
154+
+ sort by Euclidean distance (relative to origin): `sqrt(x*x + y*y + z*z)`
155+
+ sort by Chebyshev distance (relative to origin): `max(abs(x), abs(y), abs(z))`
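
For illustration, the four metrics listed above could be written as key functions along the following lines. This is a sketch, not the benchmark's code: the lambda names are hypothetical, and the Chebyshev key takes absolute values, following the standard definition of that distance.

```cpp
#include <algorithm>
#include <cmath>

struct Point {
    double x, y, z;
};

// Hypothetical key functions; each returns double, one of the key types
// that object_qsort accepts.
auto key_x = [](Point p) { return p.x; };

auto key_manhattan = [](Point p) {
    return std::abs(p.x) + std::abs(p.y) + std::abs(p.z);
};

auto key_euclidean = [](Point p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
};

auto key_chebyshev = [](Point p) {
    return std::max({std::abs(p.x), std::abs(p.y), std::abs(p.z)});
};
```

Going by the signature shown earlier in this README, any of these could be passed as the third argument, for example `x86simdsort::object_qsort(arr.data(), n, key_euclidean)`.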
156+
157+
The data was collected on a processor with AVX-512 and is shown in the plot
158+
below. For the simplest of cases where we want to sort an array of struct by
159+
one of its members, `object_qsort` can be up to 5x faster for 32-bit data type
160+
and about 4x for 64-bit data type. It tends to do better when the metric to
161+
sort by gets more complicated. Sorting by Euclidean distance can be up to 10x
162+
faster.
163+
164+
![alt text](./benchmarks/object_qsort-perf.jpg?raw=true)
165+
98166
## Downstream projects using x86-simd-sort
99167
100168
- NumPy uses this as a [submodule](https://github.com/numpy/numpy/pull/22315) to accelerate `np.sort, np.argsort, np.partition and np.argpartition`.

benchmarks/object_qsort-perf.jpg

152 KB