Conversation

gonzalobg
Contributor

  • Refactor all kernels into a generic "parallel for" algorithm (sketched below)
    • Supports grid-stride and block-stride loops, configurable with a model flag
    • Handles devices of different sizes via the occupancy APIs
  • Refactor the memory allocation APIs
  • Print more GPU details, in particular the theoretical peak bandwidth in GB/s of the current device, queried via the NVML library (which ships with the CUDA Toolkit and is therefore always available); see the second sketch below
  • Fix 2 bugs:
    • Print the "order" used to run the benchmarks (e.g. classic vs isolated)
    • Fix a division-by-zero bug in the solution checking
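
The PR's actual implementation isn't shown in this thread; below is a minimal sketch of what such a generic parallel-for could look like in CUDA. The `StrideMode` flag, the `parallel_for` wrapper, and the fixed block size of 256 are illustrative assumptions, not the PR's real API; `cudaOccupancyMaxActiveBlocksPerMultiprocessor` is a real CUDA runtime call.

```cuda
#include <cuda_runtime.h>

// Illustrative stride policy; the PR's actual flag and names may differ.
enum class StrideMode { Grid, Block };

template <StrideMode Mode, typename F>
__global__ void parallel_for_kernel(size_t n, F f) {
  if constexpr (Mode == StrideMode::Grid) {
    // Grid-stride: each thread strides over the whole index space,
    // so any grid size covers any problem size.
    for (size_t i = size_t(blockIdx.x) * blockDim.x + threadIdx.x; i < n;
         i += size_t(gridDim.x) * blockDim.x) {
      f(i);
    }
  } else {
    // Block-stride: each block owns a contiguous chunk of the index
    // space and its threads stride over that chunk.
    size_t chunk = (n + gridDim.x - 1) / gridDim.x;
    size_t begin = size_t(blockIdx.x) * chunk;
    size_t end   = min(begin + chunk, n);
    for (size_t i = begin + threadIdx.x; i < end; i += blockDim.x) {
      f(i);
    }
  }
}

template <StrideMode Mode, typename F>
void parallel_for(size_t n, F f) {
  int block = 256;  // illustrative fixed block size
  // Size the grid from the occupancy API so the same launch saturates
  // devices with different SM counts.
  int blocks_per_sm = 0;
  cudaOccupancyMaxActiveBlocksPerMultiprocessor(
      &blocks_per_sm, parallel_for_kernel<Mode, F>, block, 0);
  int dev = 0;
  cudaGetDevice(&dev);
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, dev);
  int grid = blocks_per_sm * prop.multiProcessorCount;
  parallel_for_kernel<Mode><<<grid, block>>>(n, f);
}
```

Under these assumptions, a triad-style kernel would be invoked as `parallel_for<StrideMode::Grid>(n, [=] __device__ (size_t i) { c[i] = a[i] + scalar * b[i]; });`, built with nvcc's `--extended-lambda` flag. Deriving the grid size from the occupancy query is what lets one launch configuration handle devices of different sizes.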
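
Likewise, a hedged sketch of querying the theoretical peak bandwidth via NVML. `nvmlDeviceGetMaxClockInfo` and `nvmlDeviceGetMemoryBusWidth` are real NVML calls; the data-rate factor of 2 is an assumption, since the true multiplier depends on the memory technology.

```cuda
#include <cstdio>
#include <nvml.h>

// Build with: nvcc peak_bw.cpp -lnvml
// (NVML headers and libraries ship with the CUDA Toolkit.)
int main() {
  if (nvmlInit() != NVML_SUCCESS) return 1;
  nvmlDevice_t dev;
  if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

  unsigned int mem_clock_mhz = 0, bus_width_bits = 0;
  nvmlDeviceGetMaxClockInfo(dev, NVML_CLOCK_MEM, &mem_clock_mhz);
  nvmlDeviceGetMemoryBusWidth(dev, &bus_width_bits);

  // Peak BW = memory clock x bus width x data-rate factor. The factor
  // of 2 assumes double-data-rate memory; the real multiplier depends
  // on the memory technology (GDDR6X, HBM, ...), so this is approximate.
  double gbps = mem_clock_mhz * 1e6 * (bus_width_bits / 8.0) * 2.0 / 1e9;
  printf("Theoretical peak bandwidth: %.1f GB/s\n", gbps);

  nvmlShutdown();
  return 0;
}
```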

@gonzalobg
Contributor Author

This was passing. It seems like this and other PRs are failing spuriously due to some cache issue. @tom91136 @tomdeakin

@gonzalobg
Contributor Author

Closing in favor of #202.

@gonzalobg closed this Jun 5, 2024