[BUG] Integer overflow in device array when numpy version >= 2.0 for T4 GPUs #623

**Describe the bug**
It seems that the `MemoryPointer` object relies on the auto-casting behaviour of old numpy versions to promote int32 to int64 on demand. On newer numpy versions this results in semi-deterministic errors with large GPUs, at least in some cases. I can reproduce the error on T4 GPUs with driver version 535.161.07, which is what GitLab runners use.

https://github.com/NVIDIA/numba-cuda/blob/1dd1e7a05888d1c7f3d07e2eb2376ad721c0c7dd/numba_cuda/numba/cuda/cudadrv/driver.py#L1839

This is with `numpy==2.2.6` and `numba_cuda==0.21.3`.
Example pytest traceback:
```
self = <numba.cuda.cudadrv.driver.AutoFreePointer object at 0x7f4590b57f70>
start = np.int32(0), stop = np.int32(1152)
    def view(self, start, stop=None):
        if stop is None:
            size = self.size - start
        else:
            size = stop - start
    
        # Handle NULL/empty memory buffer
        if not self.device_pointer_value:
            if size != 0:
                raise RuntimeError("non-empty slice into empty slice")
            view = self  # new view is just a reference to self
        # Handle normal case
        else:
>           base = self.device_pointer_value + start
E           OverflowError: Python integer 139939146685440 out of bounds for int32
local_installation_linux/numba_cuda/numba/cuda/cudadrv/driver.py:1852: OverflowError
```

**Steps/Code to reproduce bug**
It happens in my GitLab CI when something much like the following is done:

```
import numba.cuda as cuda
import numpy as np
arr = np.arange(50 * 50 * 100, dtype=np.float32).reshape(50, 50, 100)
arr2 = cuda.to_device(arr)
```

The problem goes away when I downgrade numpy below 2.0. I can't reproduce it on e.g. an L40S with 580.95.05 drivers.

**Expected behavior**
There should be no reliance on auto-casting of typed integers in the code. Wrapping start in int64 would probably fix the issue?
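
To make the suggestion concrete, here is a minimal sketch of what I mean. `view_bounds` is a hypothetical stand-in for the arithmetic in `view()`, not the actual numba-cuda code, and it assumes `stop` is always given. It normalizes every operand to a plain Python `int` before adding, so the result no longer depends on numpy's promotion rules:

```python
import operator

import numpy as np


def view_bounds(device_pointer_value, start, stop):
    """Hypothetical stand-in for the base/size arithmetic in view()."""
    # operator.index() converts np.int32/np.int64 scalars to Python ints
    # (and rejects floats), so the additions below use arbitrary-precision ints.
    start = operator.index(start)
    stop = operator.index(stop)
    base = operator.index(device_pointer_value) + start
    size = stop - start
    return base, size


# Values taken from the traceback above.
base, size = view_bounds(139939146685440, np.int32(0), np.int32(1152))
print(base, size)  # 139939146685440 1152
```

Casting `start` to `np.int64` would probably work too; plain `int` just avoids depending on any fixed width.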

**Environment details (please complete the following information):**
 - Environment location: GitLab enterprise GPU runner (VM with T4 GPU)
 - Method of numba-cuda install: pip

**Additional context**
It's a bit hard to deterministically reproduce since I don't know where `device_pointer_value` comes from, but it should be easy enough to make sure this doesn't happen at all.
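
The arithmetic itself can be checked without a GPU. The sketch below assumes the only ingredients are a Python `int` pointer value and an `np.int32` offset, as in the traceback. It does not explain where the `np.int32` comes from in my CI. `naive_add_overflows` is a hypothetical helper, not part of numba-cuda. As far as I understand, numpy 2.0 adopted the NEP 50 promotion rules, under which the Python int is converted to int32 instead of the result being promoted based on its value:

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2147483647
NUMPY_MAJOR = int(np.__version__.split(".")[0])


def naive_add_overflows(pointer_value, start=np.int32(0)):
    """Return True if `pointer_value + start` raises OverflowError."""
    try:
        pointer_value + start
    except OverflowError:
        return True
    return False


# The last value is the pointer reported in the traceback.
for pointer_value in (INT32_MAX, INT32_MAX + 1, 139939146685440):
    print(NUMPY_MAJOR, pointer_value, naive_add_overflows(pointer_value))
```

If this assumption holds, any pointer value above `INT32_MAX` should fail on numpy 2.x and pass on numpy 1.x, which would at least make a regression test deterministic.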

Here's an example of a recent CI failure of mine:

https://gitlab.com/liebi-group/software/mumott/-/jobs/12288859623
