Commit e4709ec
An even better solution to carryStep. The optimizer was having trouble generating v_bfe_i32 instructions.
Used the approved amdgcn_builtin. Saved 30 bytes of assembly code. (Yes, I know that is not terribly important. I'm hoping
that by reducing the complexity of carryFused the optimizer won't go bonkers and generate poor code quite as often.)
The key to getting reduced code is creating sequences where both nBits and 32-nBits need to be generated.
1 parent 8285e9f commit e4709ec
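The diff body for this commit is not reproduced here, so the following is only a minimal sketch in plain C of the kind of operation the message describes, not the author's code. It assumes that carryStep splits a signed 32-bit word into a sign-extended low field of nBits bits plus a carry, which is what a signed bitfield extract such as v_bfe_i32 computes at offset 0. The names `low_bits_signed`, `carry_split`, and `CarrySplit` are hypothetical. The shift form shows where the quantity 32-nBits comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: x == carry * 2^nBits + low. */
typedef struct { int32_t low; int32_t carry; } CarrySplit;

/* Sign-extend the low nBits of x (signed bitfield extract at offset 0).
   Assumes two's complement and an arithmetic right shift on signed
   values, which mainstream compilers provide. */
static int32_t low_bits_signed(int32_t x, unsigned nBits) {
    assert(nBits >= 1 && nBits <= 31);
    unsigned shift = 32u - nBits;            /* the "32-nBits" quantity */
    return (int32_t)((uint32_t)x << shift) >> shift;
}

/* Split x into the signed low field and the carry into the next word.
   The subtraction is done in 64 bits so it cannot overflow. */
static CarrySplit carry_split(int32_t x, unsigned nBits) {
    CarrySplit r;
    r.low = low_bits_signed(x, nBits);
    r.carry = (int32_t)(((int64_t)x - r.low) / ((int64_t)1 << nBits));
    return r;
}
```

For example, with nBits = 3 the value 13 splits into a low field of -3 and a carry of 2, since 2 * 8 - 3 = 13. On AMD GPUs a compiler can lower the two shifts by 32-nBits to a single bitfield-extract instruction; per the message, the commit uses a builtin so that this does not depend on the optimizer.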
1 file changed: 7 additions, 3 deletions.
Hunk 1 (original lines 15-21, new lines 15-26): 5 lines added at new lines 18-20 and 22-23.
Hunk 2 (original lines 69-77, new lines 74-81): original lines 72-74 replaced by new lines 77-78.