Commit de56279
vulkan: Optimize argsort (#15354)
- Launch an appropriate number of invocations (next larger power of two).
32 invocations is common and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs descending option when
doing the final stores to memory.
- Copy the values into shared memory, which makes them slightly cheaper to access.

1 parent 65349f2 commit de56279
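The points in the commit message can be illustrated with a small CPU model. This is only a sketch under stated assumptions, not the shader from this commit: it assumes the sort is a bitonic network run by one workgroup per row, and the names `next_pow2` and `argsort_row` are hypothetical. Each iteration of the `col` loop stands in for one invocation, and the end of each inner pass is where a barrier would sit on the GPU.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: smallest power of two >= n (returns 1 for n == 0).
static uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// CPU model of a single-workgroup bitonic argsort over one row (assumed design).
static std::vector<int> argsort_row(const std::vector<float> & row, bool descending) {
    const int ncols = (int) row.size();
    // Launch size: next power of two, rather than a fixed large block.
    const int n = (int) next_pow2((uint32_t) ncols);
    // Bounds checks are only needed when the row does not fill the block.
    const bool needs_bounds_check = n != ncols;

    std::vector<int>   idx(n);        // stands in for the shared index array
    std::vector<float> val(n, 0.0f);  // stands in for the shared copy of the values
    for (int col = 0; col < n; ++col) {
        idx[col] = col;
        if (col < ncols) {
            val[col] = row[col];
        }
    }

    for (int k = 2; k <= n; k *= 2) {
        for (int j = k / 2; j > 0; j /= 2) {
            for (int col = 0; col < n; ++col) {
                const int ixj = col ^ j;
                if (ixj <= col) {
                    continue;  // each pair is handled by its lower-numbered member
                }
                // Select the pair order up front instead of branching on direction.
                const int lo = (col & k) == 0 ? col : ixj;
                const int hi = (col & k) == 0 ? ixj : col;
                const int a = idx[lo];
                const int b = idx[hi];
                // Padding indices compare as larger than any real element.
                const bool a_oob = needs_bounds_check && a >= ncols;
                const bool b_oob = needs_bounds_check && b >= ncols;
                if (a_oob || (!b_oob && val[a] > val[b])) {
                    std::swap(idx[lo], idx[hi]);
                }
            }
            // A workgroup barrier would go here on the GPU.
        }
    }

    // The network always sorts ascending; the requested order is applied
    // only when writing the result.
    std::vector<int> out(ncols);
    for (int col = 0; col < ncols; ++col) {
        out[descending ? ncols - 1 - col : col] = idx[col];
    }
    return out;
}
```

For the row `{3, 1, 2, 5, 4}` this model pads to 8 slots, and `argsort_row` returns `{1, 2, 0, 4, 3}` for ascending order and `{3, 4, 0, 2, 1}` for descending order. Sorting in one direction and reversing at the store keeps the order option out of the comparison loop, which is presumably why the commit reports no branches inside the main loop.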
File tree
3 files changed, +51 -42 lines changed
- ggml/src/ggml-vulkan
- vulkan-shaders
- tests