[STF] stackable stf resources #2674

Open
caugonnet wants to merge 631 commits into NVIDIA:main from caugonnet:stackable_ctx_data

Conversation

@caugonnet
Contributor

@caugonnet caugonnet commented Oct 31, 2024

Description

This introduces helper methods that improve how we nest contexts, to better leverage CUDA Graphs.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 31, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟩 CI finished in 23m 04s: Pass: 100%/54 | Total: 4h 36m | Avg: 5m 06s | Max: 17m 44s | Hits: 89%/224
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 36m | Avg: 5m 06s | Max: 17m 44s | Hits: 89%/224

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 20m | Avg:  5m 13s | Max: 17m 44s | Hits:  89%/224   
      🟩 arm64              Pass: 100%/4   | Total: 15m 16s | Avg:  3m 49s | Max:  4m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 40m | Avg:  5m 16s | Max: 17m 32s | Hits:  89%/112   
      🟩 12.5               Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
      🟩 12.6               Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 17m 44s | Hits:  89%/112   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 40m | Avg:  5m 16s | Max: 17m 32s | Hits:  89%/112   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 17m 44s | Hits:  89%/112   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 36m | Avg:  5m 06s | Max: 17m 44s | Hits:  89%/224   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  4m 02s
      🟩 Clang10            Pass: 100%/2   | Total:  6m 59s | Avg:  3m 29s | Max:  3m 41s
      🟩 Clang11            Pass: 100%/4   | Total: 13m 28s | Avg:  3m 22s | Max:  3m 34s
      🟩 Clang12            Pass: 100%/4   | Total: 13m 16s | Avg:  3m 19s | Max:  3m 27s
      🟩 Clang13            Pass: 100%/4   | Total: 13m 11s | Avg:  3m 17s | Max:  3m 23s
      🟩 Clang14            Pass: 100%/4   | Total: 27m 40s | Avg:  6m 55s | Max: 17m 27s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 37s
      🟩 Clang16            Pass: 100%/4   | Total: 13m 48s | Avg:  3m 27s | Max:  3m 48s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 19s | Avg:  3m 39s | Max:  3m 42s
      🟩 Clang18            Pass: 100%/2   | Total: 19m 21s | Avg:  9m 40s | Max: 15m 57s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 29s | Avg:  3m 44s | Max:  3m 56s
      🟩 GCC10              Pass: 100%/4   | Total: 15m 40s | Avg:  3m 55s | Max:  4m 13s
      🟩 GCC11              Pass: 100%/4   | Total: 14m 55s | Avg:  3m 43s | Max:  3m 53s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 07m | Avg:  9m 34s | Max: 17m 44s
      🟩 GCC13              Pass: 100%/3   | Total: 12m 09s | Avg:  4m 03s | Max:  4m 54s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 04s | Avg: 10m 04s | Max: 10m 04s | Hits:  89%/112   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 06s | Avg: 10m 06s | Max: 10m 06s | Hits:  89%/112   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 09m | Avg:  4m 18s | Max: 17m 27s
      🟩 GCC                Pass: 100%/20  | Total:  1h 57m | Avg:  5m 51s | Max: 17m 44s
      🟩 MSVC               Pass: 100%/2   | Total: 20m 10s | Avg: 10m 05s | Max: 10m 06s | Hits:  89%/224   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 36m | Avg:  5m 06s | Max: 17m 44s | Hits:  89%/224   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 10m | Avg:  3m 53s | Max: 10m 06s | Hits:  89%/224   
      🟩 Test               Pass: 100%/5   | Total:  1h 25m | Avg: 17m 04s | Max: 17m 44s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 90a                Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 12m | Avg:  4m 33s | Max: 17m 32s
      🟩 20                 Pass: 100%/25  | Total:  2h 24m | Avg:  5m 45s | Max: 17m 44s | Hits:  89%/224   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

github-actions bot commented Nov 4, 2024

🟩 CI finished in 1h 12m: Pass: 100%/54 | Total: 4h 28m | Avg: 4m 58s | Max: 23m 23s | Hits: 89%/224
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 28m | Avg: 4m 58s | Max: 23m 23s | Hits: 89%/224

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 15m | Avg:  5m 06s | Max: 23m 23s | Hits:  89%/224   
      🟩 arm64              Pass: 100%/4   | Total: 13m 38s | Avg:  3m 24s | Max:  4m 29s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 35m | Avg:  5m 02s | Max: 22m 16s | Hits:  89%/112   
      🟩 12.5               Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
      🟩 12.6               Pass: 100%/33  | Total:  2h 43m | Avg:  4m 56s | Max: 23m 23s | Hits:  89%/112   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 35m | Avg:  5m 02s | Max: 22m 16s | Hits:  89%/112   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 43m | Avg:  4m 56s | Max: 23m 23s | Hits:  89%/112   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 28m | Avg:  4m 58s | Max: 23m 23s | Hits:  89%/224   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  6m 47s | Avg:  3m 23s | Max:  3m 29s
      🟩 Clang10            Pass: 100%/2   | Total:  7m 03s | Avg:  3m 31s | Max:  3m 42s
      🟩 Clang11            Pass: 100%/4   | Total: 12m 35s | Avg:  3m 08s | Max:  3m 26s
      🟩 Clang12            Pass: 100%/4   | Total: 12m 22s | Avg:  3m 05s | Max:  3m 12s
      🟩 Clang13            Pass: 100%/4   | Total: 12m 54s | Avg:  3m 13s | Max:  3m 25s
      🟩 Clang14            Pass: 100%/4   | Total: 27m 30s | Avg:  6m 52s | Max: 17m 45s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 52s | Avg:  3m 26s | Max:  3m 37s
      🟩 Clang16            Pass: 100%/4   | Total: 13m 29s | Avg:  3m 22s | Max:  3m 40s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 10s | Avg:  3m 35s | Max:  3m 38s
      🟩 Clang18            Pass: 100%/2   | Total: 20m 58s | Avg: 10m 29s | Max: 17m 48s
      🟩 GCC9               Pass: 100%/2   | Total:  6m 10s | Avg:  3m 05s | Max:  3m 17s
      🟩 GCC10              Pass: 100%/4   | Total: 12m 43s | Avg:  3m 10s | Max:  3m 21s
      🟩 GCC11              Pass: 100%/4   | Total: 12m 17s | Avg:  3m 04s | Max:  3m 12s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 16m | Avg: 10m 58s | Max: 23m 23s
      🟩 GCC13              Pass: 100%/3   | Total: 10m 06s | Avg:  3m 22s | Max:  4m 29s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 56s | Avg:  6m 56s | Max:  6m 56s | Hits:  89%/112   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  6m 30s | Avg:  6m 30s | Max:  6m 30s | Hits:  89%/112   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 07m | Avg:  4m 15s | Max: 17m 48s
      🟩 GCC                Pass: 100%/20  | Total:  1h 58m | Avg:  5m 54s | Max: 23m 23s
      🟩 MSVC               Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max:  6m 56s | Hits:  89%/224   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 28m | Avg:  4m 58s | Max: 23m 23s | Hits:  89%/224   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  2h 49m | Avg:  3m 27s | Max:  6m 56s | Hits:  89%/224   
      🟩 Test               Pass: 100%/5   | Total:  1h 39m | Avg: 19m 56s | Max: 23m 23s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
      🟩 90a                Pass: 100%/1   | Total:  2m 45s | Avg:  2m 45s | Max:  2m 45s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 14m | Avg:  4m 39s | Max: 23m 23s
      🟩 20                 Pass: 100%/25  | Total:  2h 13m | Avg:  5m 21s | Max: 18m 29s | Hits:  89%/224   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

@caugonnet caugonnet added the stf (Sequential Task Flow programming model) label Nov 7, 2024
@caugonnet caugonnet changed the title from "stackable stf resources" to "[STF] stackable stf resources" Jan 14, 2025
@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟩 CI finished in 40m 53s: Pass: 100%/20 | Total: 3h 17m | Avg: 9m 53s | Max: 24m 35s | Hits: 582%/312
  • 🟩 cudax: Pass: 100%/20 | Total: 3h 17m | Avg: 9m 53s | Max: 24m 35s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  2h 45m | Avg: 10m 19s | Max: 24m 35s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 32m 37s | Avg:  8m 09s | Max:  8m 52s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 33s | Avg: 11m 33s | Max: 11m 33s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
      🟩 12.6               Pass: 100%/17  | Total:  2h 55m | Avg: 10m 18s | Max: 24m 35s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 33s | Avg: 11m 33s | Max: 11m 33s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  2h 55m | Avg: 10m 18s | Max: 24m 35s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  3h 17m | Avg:  9m 53s | Max: 24m 35s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  9m 26s | Avg:  9m 26s | Max:  9m 26s
      🟩 Clang15            Pass: 100%/1   | Total:  9m 50s | Avg:  9m 50s | Max:  9m 50s
      🟩 Clang16            Pass: 100%/1   | Total:  9m 15s | Avg:  9m 15s | Max:  9m 15s
      🟩 Clang17            Pass: 100%/1   | Total:  9m 56s | Avg:  9m 56s | Max:  9m 56s
      🟩 Clang18            Pass: 100%/4   | Total: 41m 39s | Avg: 10m 24s | Max: 16m 15s
      🟩 GCC10              Pass: 100%/1   | Total:  9m 42s | Avg:  9m 42s | Max:  9m 42s
      🟩 GCC11              Pass: 100%/1   | Total:  9m 13s | Avg:  9m 13s | Max:  9m 13s
      🟩 GCC12              Pass: 100%/2   | Total: 34m 07s | Avg: 17m 03s | Max: 24m 35s
      🟩 GCC13              Pass: 100%/4   | Total: 30m 57s | Avg:  7m 44s | Max:  8m 52s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 33s | Avg: 11m 33s | Max: 11m 33s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 06s | Avg: 11m 06s | Max: 11m 06s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 20m | Avg: 10m 00s | Max: 16m 15s
      🟩 GCC                Pass: 100%/8   | Total:  1h 23m | Avg: 10m 29s | Max: 24m 35s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 39s | Avg: 11m 19s | Max: 11m 33s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  3h 17m | Avg:  9m 53s | Max: 24m 35s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  2h 36m | Avg:  8m 43s | Max: 11m 33s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 40m 50s | Avg: 20m 25s | Max: 24m 35s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  6m 57s | Avg:  6m 57s | Max:  6m 57s
      🟩 90a                Pass: 100%/1   | Total:  7m 24s | Avg:  7m 24s | Max:  7m 24s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 28m 13s | Avg:  7m 03s | Max:  7m 50s
      🟩 20                 Pass: 100%/16  | Total:  2h 49m | Avg: 10m 35s | Max: 24m 35s | Hits: 582%/312   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟩 CI finished in 42m 04s: Pass: 100%/20 | Total: 4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312
  • 🟩 cudax: Pass: 100%/20 | Total: 4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  3h 22m | Avg: 12m 39s | Max: 22m 12s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 44m 46s | Avg: 11m 11s | Max: 11m 49s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 50s | Avg: 10m 50s | Max: 10m 50s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
      🟩 12.6               Pass: 100%/17  | Total:  3h 44m | Avg: 13m 13s | Max: 22m 12s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 50s | Avg: 10m 50s | Max: 10m 50s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  3h 44m | Avg: 13m 13s | Max: 22m 12s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 12m 12s | Avg: 12m 12s | Max: 12m 12s
      🟩 Clang15            Pass: 100%/1   | Total: 13m 19s | Avg: 13m 19s | Max: 13m 19s
      🟩 Clang16            Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s
      🟩 Clang17            Pass: 100%/1   | Total: 13m 12s | Avg: 13m 12s | Max: 13m 12s
      🟩 Clang18            Pass: 100%/4   | Total: 54m 39s | Avg: 13m 39s | Max: 18m 11s
      🟩 GCC10              Pass: 100%/1   | Total: 13m 50s | Avg: 13m 50s | Max: 13m 50s
      🟩 GCC11              Pass: 100%/1   | Total: 14m 10s | Avg: 14m 10s | Max: 14m 10s
      🟩 GCC12              Pass: 100%/2   | Total: 36m 46s | Avg: 18m 23s | Max: 22m 12s
      🟩 GCC13              Pass: 100%/4   | Total: 42m 15s | Avg: 10m 33s | Max: 11m 49s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 50s | Avg: 10m 50s | Max: 10m 50s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 18s | Avg: 11m 18s | Max: 11m 18s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 46m | Avg: 13m 18s | Max: 18m 11s
      🟩 GCC                Pass: 100%/8   | Total:  1h 47m | Avg: 13m 22s | Max: 22m 12s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 08s | Avg: 11m 04s | Max: 11m 18s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  3h 26m | Avg: 11m 29s | Max: 14m 34s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 40m 23s | Avg: 20m 11s | Max: 22m 12s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s
      🟩 90a                Pass: 100%/1   | Total: 10m 48s | Avg: 10m 48s | Max: 10m 48s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 36m 10s | Avg:  9m 02s | Max: 10m 45s
      🟩 20                 Pass: 100%/16  | Total:  3h 31m | Avg: 13m 11s | Max: 22m 12s | Hits: 582%/312   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

* @brief This class defines a context that behaves as a context which can have nested subcontexts (implemented as local
* CUDA graphs)
*/
class stackable_ctx
Contributor Author

We need an == operator too.


ctx.pop();
}

Contributor Author

TODO: check results.

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟨 CI finished in 39m 08s: Pass: 85%/20 | Total: 4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522
  • 🟨 cudax: Pass: 85%/20 | Total: 4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  81%/16  | Total:  3h 21m | Avg: 12m 34s | Max: 17m 58s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 49m 07s | Avg: 12m 16s | Max: 13m 08s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
      🔍 12.6               Pass:  82%/17  | Total:  3h 46m | Avg: 13m 20s | Max: 17m 58s | Hits: 388%/261   
    🔍 cudacxx: nvcc12.6 🔍
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
      🔍 nvcc12.6           Pass:  82%/17  | Total:  3h 46m | Avg: 13m 20s | Max: 17m 58s | Hits: 388%/261   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/4   | Total: 41m 03s | Avg: 10m 15s | Max: 11m 56s
      🔍 20                 Pass:  81%/16  | Total:  3h 29m | Avg: 13m 04s | Max: 17m 58s | Hits: 388%/522   
    🟨 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
      🟩 Clang15            Pass: 100%/1   | Total: 13m 43s | Avg: 13m 43s | Max: 13m 43s
      🟩 Clang16            Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
      🟩 Clang17            Pass: 100%/1   | Total: 15m 37s | Avg: 15m 37s | Max: 15m 37s
      🟨 Clang18            Pass:  75%/4   | Total: 57m 22s | Avg: 14m 20s | Max: 17m 25s
      🟥 GCC10              Pass:   0%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 GCC11              Pass: 100%/1   | Total: 14m 26s | Avg: 14m 26s | Max: 14m 26s
      🟨 GCC12              Pass:  50%/2   | Total: 34m 39s | Avg: 17m 19s | Max: 17m 58s
      🟩 GCC13              Pass: 100%/4   | Total: 46m 27s | Avg: 11m 36s | Max: 13m 08s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 07s | Avg: 12m 07s | Max: 12m 07s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
    🟨 cxx_family
      🟨 Clang              Pass:  87%/8   | Total:  1h 54m | Avg: 14m 20s | Max: 17m 25s
      🟨 GCC                Pass:  75%/8   | Total:  1h 39m | Avg: 12m 28s | Max: 17m 58s
      🟩 MSVC               Pass: 100%/2   | Total: 21m 13s | Avg: 10m 36s | Max: 12m 07s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  85%/20  | Total:  4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522   
    🟨 gpu
      🟨 v100               Pass:  85%/20  | Total:  4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522   
    🟨 jobs
      🟨 Build              Pass:  94%/18  | Total:  3h 34m | Avg: 11m 56s | Max: 16m 41s | Hits: 388%/522   
      🟥 Test               Pass:   0%/2   | Total: 35m 23s | Avg: 17m 41s | Max: 17m 58s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
      🟩 90a                Pass: 100%/1   | Total: 11m 14s | Avg: 11m 14s | Max: 11m 14s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

@caugonnet
Contributor Author

/ok to test 69d714b

Replay all five transformation steps on the CPU after finalize() and
EXPECT each element matches, ensuring the graph_scope RAII test
verifies correctness rather than just running without error.

Made-with: Cursor

Both operations must abort when called inside a nested context (i.e.
after push/graph_scope). The tests use the same SIGABRT handler
pattern as the existing stackable error checks.

Made-with: Cursor
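
The abort checks described in the commit message above can be illustrated outside of STF. The sketch below is not the SIGABRT handler pattern used by the PR's tests: it assumes a POSIX system and swaps in a fork/waitpid check instead, and the helper names `aborts` and `self_check` are hypothetical. It only reports whether a callable terminates its process with SIGABRT.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of CUDASTF): runs `op` in a forked child
// process and returns true only if that child was killed by SIGABRT.
bool aborts(const std::function<void()>& op)
{
  const pid_t pid = fork();
  if (pid < 0)
  {
    return false; // fork failed, nothing was checked
  }
  if (pid == 0)
  {
    op();                // expected to call std::abort() on misuse
    _exit(EXIT_SUCCESS); // reached only when `op` returned normally
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
  {
    return false;
  }
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}

// Example usage: an aborting callable is detected, a no-op is not.
inline void self_check()
{
  assert(aborts([] { std::abort(); }));
  assert(!aborts([] {}));
}
```

Running the operation in a child process keeps the parent test alive after the abort, which is one reason to prefer it when several failure cases are checked in a single executable. It does not work on Windows, where a handler-based approach is still needed.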
Verifies that read-only data is auto-pushed as read in nested graph
scopes and that the original host buffer is not modified after
finalize, confirming write-back is skipped for read-only data.

Made-with: Cursor

Computes sqrt of 1..1024 via Newton's Babylonian method, iterating
until max |change| < 1e-12. Demonstrates while_graph_scope with
reduce-based convergence checking in ~80 lines, as a simpler
introduction than the existing Jacobi examples.

Made-with: Cursor
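
For readers unfamiliar with the numerical method named in the commit message above, here is a host-only sketch of the same iteration. It does not use while_graph_scope or any STF API: a plain loop stands in for the graph scope, and a running maximum stands in for the reduce-based convergence check. The function names, the starting guess x = a, and the cap of 100 sweeps are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: the inputs 1, 2, ..., 1024 as doubles.
std::vector<double> inputs_1_to_1024()
{
  std::vector<double> a(1024);
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    a[i] = static_cast<double>(i + 1);
  }
  return a;
}

// Hypothetical helper: applies the Babylonian update x <- (x + a / x) / 2 to
// every element until the largest absolute change in a sweep is below `tol`.
// Returns the number of sweeps performed (at most `max_sweeps`).
int babylonian_sqrt(const std::vector<double>& a, std::vector<double>& x,
                    double tol = 1e-12, int max_sweeps = 100)
{
  x = a; // starting guess (an assumption of this sketch)
  for (int sweep = 1; sweep <= max_sweeps; ++sweep)
  {
    double max_change = 0.0; // plays the role of the max-reduction
    for (std::size_t i = 0; i < a.size(); ++i)
    {
      assert(a[i] > 0.0); // the update divides by x, so inputs must be positive
      const double next = 0.5 * (x[i] + a[i] / x[i]);
      max_change = std::max(max_change, std::fabs(next - x[i]));
      x[i] = next;
    }
    if (max_change < tol)
    {
      return sweep; // converged
    }
  }
  return max_sweeps;
}
```

Because the update converges quadratically near the root, the loop ends well before the sweep cap for these inputs.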
@caugonnet
Contributor Author

/ok to test f7c3c19

wait() returns a value and requires a copyable scalar type; slice<int>
causes an incomplete-type compilation error. Switch to scalar_view<int>
which is the intended usage pattern for wait().

Made-with: Cursor

Move graph_scope_guard and while_graph_scope_guard out of the
stackable_ctx class into standalone definitions in stackable_ctx.cuh,
consistent with repeat_graph_scope_guard which was already standalone.

The nested-name syntax (stackable_ctx::graph_scope_guard) is preserved
via forward declarations inside the class. Factory methods are now
defined out-of-line after the guard classes.

Reduces stackable_ctx_impl.cuh from 1654 to 1441 lines.

Made-with: Cursor
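
The refactoring described in the commit message above relies on a standard C++ technique: a nested class can be forward-declared inside its enclosing class and defined later, outside the class body, while keeping its qualified name. The sketch below shows that technique with hypothetical names (`context`, `scope_guard`, `make_scope`); it is not the code from stackable_ctx.cuh, and the depth counter is only there to make the RAII behaviour observable.

```cpp
#include <cassert>

// Hypothetical stand-in for a context that tracks its nesting depth.
class context
{
public:
  class scope_guard;        // forward declaration keeps the name context::scope_guard
  scope_guard make_scope(); // factory declared here, defined once the guard is complete

  void push() { ++depth_; }
  void pop()
  {
    assert(depth_ > 0);
    --depth_;
  }
  int depth() const { return depth_; }

private:
  int depth_ = 0;
};

// Standalone definition of the nested class, outside the enclosing class body.
class context::scope_guard
{
public:
  explicit scope_guard(context& ctx) : ctx_(ctx), active_(true) { ctx_.push(); }

  // Move-only: the moved-from guard must not pop a second time.
  scope_guard(scope_guard&& other) noexcept : ctx_(other.ctx_), active_(other.active_)
  {
    other.active_ = false;
  }

  ~scope_guard()
  {
    if (active_)
    {
      ctx_.pop();
    }
  }

private:
  context& ctx_;
  bool active_;
};

// Out-of-line factory: the return type is complete at this point.
inline context::scope_guard context::make_scope()
{
  return scope_guard(*this);
}
```

The ordering matters: any code that creates or destroys a guard by value has to appear after the guard's full definition, otherwise the compiler reports an incomplete type.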
The previous commit placed guard definitions after the UNITTESTED_FILE
section, causing incomplete-type errors in the inline unit tests.
Move all guard definitions (graph_scope_guard, while_graph_scope_guard,
repeat_graph_scope_guard) before the #ifdef UNITTESTED_FILE block.

Made-with: Cursor
@caugonnet
Contributor Author

/ok to test 7f486e5

@caugonnet caugonnet marked this pull request as ready for review March 20, 2026 08:31
@caugonnet caugonnet requested review from a team as code owners March 20, 2026 08:31
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Mar 20, 2026

@caugonnet
Contributor Author

/ok to test 5c8f6f0

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 16m 14s: Pass: 100%/48 | Total: 4h 39m | Max: 14m 37s | Hits: 99%/26011

See results here.

caugonnet and others added 2 commits March 20, 2026 18:54
This STF-specific dot file cleanup utility doesn't belong in the
benchmarks directory. It demangles and simplifies CUDA STF template
names in dot graph output, so it belongs alongside the STF code.

Made-with: Cursor

Labels

stf Sequential Task Flow programming model

Projects

Status: In Review

5 participants