This repository was archived by the owner on Mar 25, 2025. It is now read-only.
[LLVM][GPU] Added CUDADriver to execute benchmark on GPU (#829)
- Added `CUDADriver` to compile the LLVM IR string generated by `CodegenLLVMVisitor` to a PTX string and then execute it using the CUDA API
- Ability to select the GPU architecture to compile for, and to set the proper architecture based on the GPU that is going to be used
- Link `libdevice` math library with the GPU LLVM module
- Handles kernel and wrapper function attributes properly for GPU execution (the wrapper function is `kernel` and the kernel attribute is `device`)
- Small fixes in the InstanceStruct declaration and setup so that the pointer variables, including the shadow variables, are allocated properly
- Adds CI tests that run small benchmarks on CPU and GPU on BB5
- Adds replacement of the `log` math function for SLEEF and libdevice, and of `pow` and `fabs` for libdevice
- Adds GPU execution ability in PyJIT
- Small improvement in the PyJIT benchmark Python script to handle arguments and GPU execution
- Separated benchmark info from the benchmark driver
- Added `hh` and `expsyn` MOD files to the benchmarking tests
**Note**: In order for all unit tests to function correctly when building without linking against libpython, we must set `NMODL_PYLIB` before running CMake!

### Using CUDA backend to run benchmarks

`NMODL` supports generating code and compiling it for execution on an `NVIDIA` GPU via its benchmark infrastructure using the `LLVM` backend. To enable the `CUDA` backend to compile and execute the GPU code, set the following `CMake` flag when building `NMODL`:

```
-DNMODL_ENABLE_LLVM_CUDA=ON
```
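In context, a full configure step might look like the sketch below; the build directory name and source path are illustrative assumptions, not the project's canonical setup.

```shell
# Illustrative out-of-source CMake configure with the CUDA backend enabled.
# Paths and any extra flags are assumptions for this example.
mkdir -p build && cd build
cmake .. -DNMODL_ENABLE_LLVM_CUDA=ON
```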

To find the needed `CUDA` libraries (`cudart` and `nvrtc`), the CUDA Toolkit must be installed on your system. You can install it from the [CUDA Toolkit website](https://developer.nvidia.com/cuda-downloads) or by installing the `CUDA` Spack package and loading the corresponding module.
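The Spack route could look like the following sketch; package and module names vary between sites, so treat these exact commands as assumptions.

```shell
# One possible way to provide the CUDA Toolkit via Spack
# (site-specific package/module names may differ).
spack install cuda
spack load cuda        # or: module load cuda
nvcc --version         # verify the toolkit compiler is on the PATH
```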

Then, given a supported MOD file, you can execute the benchmark on a supported NVIDIA GPU by running the following command:

The above command executes the benchmark on a GPU with compute architecture `sm_80` and links the generated code with the `libdevice` optimized math library provided by `NVIDIA`.

Using the above command you can also select the optimization level of the generated code, the instance size of the generated data, the number of repetitions, and the grid and block dimensions for the GPU execution.
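The grid and block dimensions relate to the instance size in the usual CUDA way: with one thread per instance, the grid must contain enough blocks to cover every element. A minimal sketch of that calculation (the variable names are illustrative, not NMODL option names):

```shell
# Compute a 1-D grid size that covers every element of the instance data:
# one thread per instance, rounded up to a whole number of blocks.
instance_size=100000
block_dim=256
grid_dim=$(( (instance_size + block_dim - 1) / block_dim ))
echo "grid=${grid_dim} block=${block_dim}"   # prints grid=391 block=256
```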

**Note**: In order for the CUDA backend to compile and execute the generated code on the GPU, the installed CUDA Toolkit must have the same version as the `CUDA` version installed by the NVIDIA driver on the system that will run the benchmark.
You can find the `CUDA` version provided by the driver by running the following command:

```
nvidia-smi
```

and noting the `CUDA Version` stated there. For example, if the `CUDA Version` reported by `nvidia-smi` is CUDA 11.4, you need to install `CUDA Toolkit 11.4.*` to be able to compile and execute the GPU code.
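To pull just that version number out of the `nvidia-smi` banner, a small `sed` filter works; the sample line below is illustrative output, not captured from a real system.

```shell
# Sample header line as printed by nvidia-smi (illustrative values).
line="| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
# Extract the X.Y version that follows "CUDA Version:".
cuda_version=$(printf '%s\n' "$line" | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p')
echo "$cuda_version"   # prints 11.4
```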