Open
Description
Bug description
When a string is passed as a device function argument, compilation time explodes (from roughly 2 s to roughly 2 min in the reproducer below).
Steps to reproduce bug
- Prepare a clean Python environment
- Execute pip install numba-cuda
- The code below demonstrates the behaviour:
import sys

import numpy as np
from numba import cuda


# Slow variant: the device function compares against a string argument.
@cuda.jit(device=True, forceinline=True)
def load_slow(gmem, value):
    if value == "1":
        gmem[cuda.threadIdx.x] = 1.0
    else:
        gmem[cuda.threadIdx.x] = 0.0


# Quick variant: identical logic, but the argument is a float.
@cuda.jit(device=True, forceinline=True)
def load(gmem, value):
    if value == 1.0:
        gmem[cuda.threadIdx.x] = 1.0
    else:
        gmem[cuda.threadIdx.x] = 0.0


def get_kernel(use_slow):
    if use_slow:
        @cuda.jit
        def slow(buff):
            load_slow(buff, "1")
        return slow

    @cuda.jit
    def quick(buff):
        load(buff, 1.0)
    return quick


def main():
    # First command-line argument selects the variant; default is "slow".
    arg = sys.argv[1] if len(sys.argv) > 1 else "slow"
    kernel = get_kernel(arg == "slow")
    threads = 32
    buff = np.zeros((threads,), dtype=np.float32)
    buff_d = cuda.to_device(buff)
    # The first launch triggers JIT compilation of the kernel.
    kernel[1, threads](buff_d)
    cuda.synchronize()
    buff_result = buff_d.copy_to_host()
    # Every thread should have written 1.0.
    assert np.linalg.norm(buff_result - np.ones((threads,), dtype=np.float32)) == 0.0
    print("finished")


if __name__ == "__main__":
    main()
Expected behavior
Compilation time should be comparable for both variants. Instead, using string arguments slows down compilation significantly (~2.5 s vs ~2 min):
time python .test.dir.c/slow.py quick
real 0m2.536s
user 0m2.442s
sys 0m0.166s
time python .test.dir.c/slow.py slow
real 1m57.233s
user 1m31.162s
sys 0m26.162s
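The `time` figures above include interpreter start-up and imports. To isolate the compilation cost inside the process, one option is to time the first kernel launch (which includes JIT compilation) against a second launch (which should reuse the compiled kernel). The helper below is a minimal sketch of that idea; `time_first_and_second_call` and `FakeKernel` are hypothetical names, and `FakeKernel` is only a stand-in so that the snippet runs without a GPU.

```python
import time


def time_first_and_second_call(fn, *args):
    """Return (first_call_seconds, second_call_seconds) for fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    start = time.perf_counter()
    fn(*args)
    second = time.perf_counter() - start
    return first, second


class FakeKernel:
    """Stand-in for a JIT-compiled kernel: slow on the first call only."""

    def __init__(self, compile_seconds):
        self.compile_seconds = compile_seconds
        self.compiled = False

    def __call__(self):
        if not self.compiled:
            time.sleep(self.compile_seconds)  # simulated compilation
            self.compiled = True


first, second = time_first_and_second_call(FakeKernel(0.05))
print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```

With the reproducer, the callable would presumably wrap both the launch and the synchronization, for example `lambda: (kernel[1, threads](buff_d), cuda.synchronize())`, so that the asynchronous launch is fully accounted for.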
Environment details:
- Environment location: wsl local machine
- Method of numba-cuda install: pip install
- Python env:
numba-cuda==0.24.0
numba==0.63.1
Python 3.11.14
Additional context
This issue does not appear in numba-cuda==0.22. It appears that a redirect (in the error module) added in numba-cuda==0.22.1 triggers this behaviour.
Command log:
$ python3.11 -m venv .slow
$ source .slow/bin/activate
$ pip cache purge
$ rm -rf ~/.nv/ComputeCache/
$ find . -type d -name __pycache__ -exec rm -rf {} +
$ find . -name "*.pyc" -delete
$ rm -rf ~/.cache/numba
$ pip install numba-cuda
$ time python .test.dir.c/slow.py quick
real 0m2.536s
user 0m2.442s
sys 0m0.166s
$ time python .test.dir.c/slow.py slow
real 1m57.233s
user 1m31.162s
sys 0m26.162s
$ pip install numba-cuda==0.22
$ time python .test.dir.c/slow.py slow
real 0m3.242s
user 0m3.095s
sys 0m0.274s
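To put the three `real` timings from the command log side by side, the sketch below converts them to seconds and computes the slowdown ratios. `parse_real_time` is a hypothetical helper that handles only the `XmY.YYYs` form seen in this log.

```python
import re


def parse_real_time(text):
    """Convert a `time` value such as '1m57.233s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the command log above.
quick_latest = parse_real_time("0m2.536s")   # quick variant, latest numba-cuda
slow_latest = parse_real_time("1m57.233s")   # slow variant, latest numba-cuda
slow_022 = parse_real_time("0m3.242s")       # slow variant, numba-cuda==0.22

print(f"slow vs quick (latest): {slow_latest / quick_latest:.1f}x")
print(f"slow (latest) vs slow (0.22): {slow_latest / slow_022:.1f}x")
```

On these numbers the string variant is roughly 46 times slower than the float variant on the latest release, and roughly 36 times slower than the same string variant on 0.22.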
Labels
bug (Something isn't working)