[BUG] Slow compile times for string arguments in device functions (0.22 vs 0.24) #755

@Jlisowskyy

Bug description
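When a string is passed as an argument to a device function, compilation time explodes: about 2 minutes with numba-cuda 0.24.0, compared with about 2.5 seconds for the same code using a float argument.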

Steps to reproduce bug

  1. Prepare a clean Python environment.
  2. Run pip install numba-cuda.
  3. Run the script below, once with the argument quick and once with slow:
import numpy as np
from numba import cuda
import sys

# Slow variant: the device function compares its argument against a string.
@cuda.jit(device=True, forceinline=True)
def load_slow(gmem, value):
    if value == "1":
        gmem[cuda.threadIdx.x] = 1.0
    else:
        gmem[cuda.threadIdx.x] = 0.0

# Quick variant: identical logic, but the argument is a float.
@cuda.jit(device=True, forceinline=True)
def load(gmem, value):
    if value == 1.0:
        gmem[cuda.threadIdx.x] = 1.0
    else:
        gmem[cuda.threadIdx.x] = 0.0

def get_kernel(use_slow):
    if use_slow:
        @cuda.jit
        def slow(buff):
            load_slow(buff, "1")
            
        return slow
        
    @cuda.jit
    def quick(buff):
        load(buff, 1.0)
        
    return quick

def main():
    arg = sys.argv[1] if len(sys.argv) > 1 else "slow"
    kernel = get_kernel(arg == "slow")
    
    threads = 32

    buff = np.zeros((threads,), dtype=np.float32)
    buff_d = cuda.to_device(buff)

    kernel[1, threads](buff_d)
    cuda.synchronize()
    
    buff_result = buff_d.copy_to_host()
    
    assert np.linalg.norm(buff_result - np.ones((threads,), dtype=np.float32)) == 0.0
    print("finished")

if __name__ == "__main__":
    main()
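
To separate JIT compilation from interpreter start-up and import time, the first launch can be timed on its own. The helper below is a generic sketch and not part of numba-cuda; `time_first_and_later_calls` is a hypothetical name, and it assumes the callable blocks until the work is finished (for a kernel, that means including `cuda.synchronize()` in the callable).

```python
import time


def time_first_and_later_calls(fn, repeats=3):
    """Return (first_call_seconds, best_later_call_seconds) for fn().

    With a JIT-compiled kernel, the first call includes compilation,
    while later calls reuse the already compiled code.
    """
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, min(later)
```

In the reproducer this could be called from `main()` as `time_first_and_later_calls(lambda: (kernel[1, threads](buff_d), cuda.synchronize()))`. A large gap between the two numbers would point at compilation rather than kernel execution.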

Expected behavior
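Compilation with a string argument should take about as long as with a float argument. Instead, using a string argument slows down compilation significantly (~2.5 s vs ~2 min):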

time python .test.dir.c/slow.py quick

real    0m2.536s
user    0m2.442s
sys     0m0.166s

time python .test.dir.c/slow.py slow

real    1m57.233s
user    1m31.162s
sys     0m26.162s
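
The size of the regression can be worked out from the two `real` values above. The snippet below is a small sketch that converts the bash `time` format to seconds and computes the ratio; `parse_real_time` is a hypothetical helper name and it assumes the `NmS.SSSs` format shown in this report.

```python
import re


def parse_real_time(text):
    """Convert a bash `time` value such as '1m57.233s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


quick = parse_real_time("0m2.536s")   # float argument
slow = parse_real_time("1m57.233s")   # string argument
slowdown = slow / quick
print(f"quick={quick:.3f}s slow={slow:.3f}s slowdown={slowdown:.1f}x")
# prints: quick=2.536s slow=117.233s slowdown=46.2x
```

Since the quick run also includes interpreter start-up and imports, the slowdown of the compilation step itself is likely larger than this end-to-end ratio.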

Environment details:

  • Environment location: WSL on a local machine
  • Method of numba-cuda install: pip install
  • Python env:
numba-cuda==0.24.0
numba==0.63.1
Python 3.11.14

Additional context
This issue does not appear in numba-cuda==0.22. It appears that a redirect (in the error module) added in numba-cuda==0.22.1 triggers this behaviour.
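
One way to check the suspicion about the error module is to profile the first kernel launch and see which Python frames dominate the cumulative time. The helper below is a minimal sketch using only the standard library; `profile_call` is a hypothetical name, and it only sees Python-level frames, so time spent inside native compiler libraries would show up under whichever Python function called them.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=15):
    """Run fn() under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

In the reproducer, `print(profile_call(lambda: kernel[1, threads](buff_d)))` would profile the compiling launch. If frames from the error-handling code sit near the top of the report in 0.24.0 but not in 0.22, that would support the hypothesis.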

Command log:

$ python3.11 -m venv .slow
$ source .slow/bin/activate
$ pip cache purge
$ rm -rf ~/.nv/ComputeCache/
$ find . -type d -name __pycache__ -exec rm -rf {} +
$ find . -name "*.pyc" -delete
$ rm -rf ~/.cache/numba
$ pip install numba-cuda
$ time python .test.dir.c/slow.py quick

real    0m2.536s
user    0m2.442s
sys     0m0.166s
$ time python .test.dir.c/slow.py slow

real    1m57.233s
user    1m31.162s
sys     0m26.162s
$ pip install numba-cuda==0.22
$ time python .test.dir.c/slow.py slow

real    0m3.242s
user    0m3.095s
sys     0m0.274s

Labels: bug
