Commit de5c576
Adding occupancy tuning for CUDA architectures (kokkos#6788)
* Merging occupancy tuning changes from David Poliakoff.
Note: This is a re-commit of a branch that somehow became polluted when
I rebased on develop. I started over with the 5 changed files.
The old Kokkos fork/branch from:
davidp git@github.com:DavidPoliakoff/kokkos.git (fetch)
was merged with current Kokkos develop, and tested with ArborX to
confirm that autotuning occupancy for the DBSCAN benchmark worked.
In tests on a system with V100, the original benchmark when iterated
600 times took 119.064 seconds to run. During the tuning process
(using simulated annealing), the runtime was 108.014 seconds.
When using cached results, the runtime was 109.058 seconds. The
converged occupancy value was 70. Here are the cached results
from APEX autotuning:
Input_1:
  name: kokkos.kernel_name
  id: 1
  info.type: string
  info.category: categorical
  info.valueQuantity: unbounded
  info.candidates: unbounded
  num_bins: 0
Input_2:
  name: kokkos.kernel_type
  id: 2
  info.type: string
  info.category: categorical
  info.valueQuantity: set
  info.candidates: [parallel_for,parallel_reduce,parallel_scan,parallel_copy]
Output_3:
  name: ArborX::Experimental::HalfTraversal
  id: 3
  info.type: int64
  info.category: ratio
  info.valueQuantity: range
  info.candidates:
    lower: 5
    upper: 100
    step: 5
    open upper: 0
    open lower: 0
Context_0:
  Name: "[2:parallel_for,1:ArborX::Experimental::HalfTraversal,tree_node:default]"
  Converged: true
  Results:
    NumVars: 1
    id: 3
    value: 70
In manual experiments, the ArborX team determined that the optimal
occupancy for this example was between 40 and 90, which gave a 10%
improvement over the baseline default of 100. See arborx/ArborX#815
for details.
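
To put the timings above next to that 10% figure, the relative improvement can be computed directly. This is a minimal sketch: the helper and constant names are illustrative and not part of Kokkos, APEX, or ArborX, and the numbers are the V100 timings quoted in this message.

```cpp
#include <cassert>

// Relative runtime reduction of `tuned_s` versus `baseline_s`, in percent.
inline double percent_improvement(double baseline_s, double tuned_s) {
  assert(baseline_s > 0.0);
  return 100.0 * (baseline_s - tuned_s) / baseline_s;
}

// Timings reported above for 600 iterations of the DBSCAN benchmark on a V100.
constexpr double kBaselineSeconds = 119.064;  // original (untuned) benchmark
constexpr double kTuningSeconds   = 108.014;  // while simulated annealing searched
constexpr double kCachedSeconds   = 109.058;  // replaying the cached value of 70
```

With these inputs, `percent_improvement(kBaselineSeconds, kTuningSeconds)` is roughly 9.3 and `percent_improvement(kBaselineSeconds, kCachedSeconds)` is roughly 8.4, both slightly under the manually measured 10%.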
One deviation from the branch that David had written: the occupancy
range is [5-100], with a step size of 5. The original implementation
in Kokkos used [1-100] with a step size of 1.
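
The effect of the coarser step on the search space can be sketched as follows. The function below only mirrors the lower/upper/step fields of the cached tuning output; its name is hypothetical and it is not the Kokkos tuner implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Enumerate the closed range [lower, upper] in increments of `step`.
inline std::vector<std::int64_t> occupancy_candidates(std::int64_t lower,
                                                      std::int64_t upper,
                                                      std::int64_t step) {
  assert(step > 0 && lower <= upper);
  std::vector<std::int64_t> values;
  for (std::int64_t v = lower; v <= upper; v += step) values.push_back(v);
  return values;
}
```

`occupancy_candidates(5, 100, 5)` yields 20 values, against 100 for `occupancy_candidates(1, 100, 1)`, so the tuner has a fifth as many settings to explore. The converged value of 70 lies on the coarse grid.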
* Fixing formatting check, not sure how those reverted
* Fixing problems with recursive Impl namespace, MDRange Reduce tuning and OpenMP Reduce tuning. Now trying to fix Team tuning...
* removing comments that failed format check
* Removing commented code
* Final code fixes, likely to be some formatting fixes needed.
* Expected formatting changes
* Yet another formatting fix...
* Removing default operators and copy constructors that aren't needed
* Update core/src/impl/Kokkos_Profiling.hpp
Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
* Fixing formatting check
* Clang-format complained about a newline
* Update Kokkos_Profiling.hpp
Minor fix to prevent incrementing the context id index when not calling `begin_context()`. Ideally this should be refactored so that `begin_context()` increments the id and returns it, and `end_context()` is the only location that decrements the context id index.
* Unify [begin|end]_parallel_* APIs
* Merge more functionality
* Update TestViewMapping_a test
* Remove Reducers_d from MSVC tests
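
The refactoring suggested in the Kokkos_Profiling.hpp note above (begin increments and returns the id, end is the only place that decrements) could look roughly like this. It is a sketch under assumptions: the class and its members are hypothetical and do not appear in Kokkos_Profiling.hpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tracker for nested tuning contexts (illustrative names only).
class ContextIdTracker {
 public:
  // Open a new context: advance the index and hand the new id to the caller.
  std::size_t begin_context() { return ++depth_; }

  // Close the innermost context; the only place the index is decremented.
  void end_context(std::size_t id) {
    assert(id == depth_ && "contexts must be closed innermost-first");
    --depth_;
  }

  // Id of the innermost open context, or 0 when none is open.
  std::size_t current() const { return depth_; }

 private:
  std::size_t depth_ = 0;
};
```

Because the id only changes inside these two calls, a code path that skips `begin_context()` cannot advance the index by accident.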
---------
Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
1 parent ef560bf commit de5c576
File tree
8 files changed, +389 -127 lines changed
- core
- src
- impl
- traits
- unit_test