Commit 8f7d319
authored
Perf: Optimize Davidson by fusing operators, offloading CPU computation to GPU, and reducing memory transfers (#6493)
* Perf: Optimize Diago_DavSubspace with GPU operators by adding and fusing custom kernels.
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: reduce memory allocation and copy in Diago_DavSubspace::diag_zhegvx
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: Replace loop-based 2D copy and memset with memcpy_2d_op, memset_2d_op
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Perf: use warp reduce instead of shared memory for better efficiency
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
* Fix compilation error
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.1 parent ec3f0a6 commit 8f7d319
File tree
15 files changed
+768
-190
lines changed- source
- source_base
- kernels
- cuda
- rocm
- module_device
- cuda
- source_hsolver
- kernels
- cuda
- rocm
15 files changed
+768
-190
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
136 | 144 | | |
137 | 145 | | |
138 | 146 | | |
| |||
147 | 155 | | |
148 | 156 | | |
149 | 157 | | |
150 | | - | |
| 158 | + | |
151 | 159 | | |
152 | 160 | | |
153 | 161 | | |
| |||
438 | 446 | | |
439 | 447 | | |
440 | 448 | | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
441 | 479 | | |
442 | 480 | | |
443 | 481 | | |
444 | 482 | | |
445 | 483 | | |
446 | 484 | | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
447 | 489 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
119 | 119 | | |
120 | 120 | | |
121 | 121 | | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
122 | 144 | | |
123 | 145 | | |
124 | 146 | | |
125 | 147 | | |
126 | 148 | | |
127 | 149 | | |
| 150 | + | |
128 | 151 | | |
129 | 152 | | |
130 | 153 | | |
| |||
133 | 156 | | |
134 | 157 | | |
135 | 158 | | |
| 159 | + | |
| 160 | + | |
136 | 161 | | |
137 | 162 | | |
138 | 163 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
104 | 104 | | |
105 | 105 | | |
106 | 106 | | |
107 | | - | |
| 107 | + | |
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
| |||
298 | 298 | | |
299 | 299 | | |
300 | 300 | | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
301 | 326 | | |
302 | 327 | | |
303 | 328 | | |
| |||
314 | 339 | | |
315 | 340 | | |
316 | 341 | | |
317 | | - | |
| 342 | + | |
318 | 343 | | |
319 | 344 | | |
320 | 345 | | |
| |||
393 | 418 | | |
394 | 419 | | |
395 | 420 | | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
396 | 432 | | |
397 | 433 | | |
398 | 434 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
145 | 145 | | |
146 | 146 | | |
147 | 147 | | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
148 | 157 | | |
149 | 158 | | |
150 | 159 | | |
| |||
159 | 168 | | |
160 | 169 | | |
161 | 170 | | |
162 | | - | |
| 171 | + | |
163 | 172 | | |
164 | 173 | | |
165 | 174 | | |
| |||
437 | 446 | | |
438 | 447 | | |
439 | 448 | | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
440 | 458 | | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
441 | 481 | | |
442 | 482 | | |
443 | 483 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
85 | 85 | | |
86 | 86 | | |
87 | 87 | | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
88 | 98 | | |
89 | 99 | | |
90 | 100 | | |
| |||
112 | 122 | | |
113 | 123 | | |
114 | 124 | | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
115 | 161 | | |
116 | 162 | | |
117 | 163 | | |
| |||
196 | 242 | | |
197 | 243 | | |
198 | 244 | | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
199 | 251 | | |
200 | 252 | | |
201 | 253 | | |
| |||
212 | 264 | | |
213 | 265 | | |
214 | 266 | | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
215 | 283 | | |
216 | 284 | | |
217 | 285 | | |
| |||
0 commit comments