[WIP] Follow-on from #612: try to align with NumPy #613

gmarkall wants to merge 1 commit into NVIDIA:main
Conversation
Aims to align more closely with NumPy contiguity logic. The example in the commit message from NVIDIA#612 still runs correctly with this change. I think this needs a little more consideration for now.
/ok to test
```python
if not self.dims:
    return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

# All 0-size arrays are considered contiguous, even if they are multidimensional
```
It feels like this special case is only needed because of the (erroneous) special case immediately following on line 286. Once the "If this is a broadcast array then it is not contiguous" case is gone, I think this special case is no longer needed.
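The NumPy-style rules under discussion (any 0-size array is both C- and F-contiguous; size-1 dimensions have their strides ignored) can be sketched in a few lines. This is a hedged illustration of the logic only, not the PR's actual `_compute_layout()` implementation:

```python
def compute_flags(shape, strides, itemsize):
    """Sketch of NumPy-style contiguity: size-1 dims are ignored when
    checking strides, and any 0-size array counts as both C- and
    F-contiguous."""
    flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
    if 0 in shape:
        # All 0-size arrays are considered contiguous, even multidimensional ones
        return flags
    # C order: expected stride accumulates from the innermost (last) dimension
    expected = itemsize
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1:
            if stride != expected:
                flags["C_CONTIGUOUS"] = False
            expected *= size
    # F order: expected stride accumulates from the outermost (first) dimension
    expected = itemsize
    for size, stride in zip(shape, strides):
        if size != 1:
            if stride != expected:
                flags["F_CONTIGUOUS"] = False
            expected *= size
    return flags

# A C-contiguous (2, 3) array of 8-byte items has strides (24, 8);
# the F-contiguous equivalent has strides (8, 16).
print(compute_flags((2, 3), (24, 8), 8))
print(compute_flags((2, 3), (8, 16), 8))
```

Note how, under these rules, the broadcast shape (4, 2, 3) with strides (0, 8, 16) is neither C- nor F-contiguous (the size-4 dimension with stride 0 is not ignored), while the 0-size case short-circuits to both flags being true, with no broadcast-specific special case required.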
Greptile Overview

Greptile Summary: Refactored array contiguity detection to align more closely with NumPy's implementation, building on the fix from #612.

Confidence Score: 4/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant DeviceNDArray
    participant Array
    Note over User,Array: Array Type Inference Flow
    User->>DeviceNDArray: Create device array
    DeviceNDArray->>DeviceNDArray: Access _numba_type_ property
    DeviceNDArray->>DeviceNDArray: Check flags["C_CONTIGUOUS"]
    alt C_CONTIGUOUS is True
        DeviceNDArray->>DeviceNDArray: Set layout = "C"
    else F_CONTIGUOUS is True
        DeviceNDArray->>DeviceNDArray: Set layout = "F"
    else Neither contiguous
        DeviceNDArray->>DeviceNDArray: Set layout = "A"
    end
    DeviceNDArray-->>User: Return Array type with layout
    Note over User,Array: Array Layout Computation (dummyarray)
    User->>Array: Create Array with dims
    Array->>Array: Call _compute_layout()
    Array->>Array: Initialize flags as {C: True, F: True}
    alt No dims (records)
        Array-->>Array: Return both True
    else Has dims
        loop For each dim (reversed for C)
            alt dim.size == 0 (empty array)
                Array-->>Array: Return both True (early exit)
            else dim.size != 1
                alt dim.stride != expected_stride
                    Array->>Array: Set C_CONTIGUOUS = False
                end
            end
        end
        loop For each dim (forward for F)
            alt dim.size != 1
                alt dim.stride != expected_stride
                    Array->>Array: Set F_CONTIGUOUS = False
                end
            end
        end
    end
    Array-->>User: Return computed flags
    Note over User,Array: Array Slicing/Indexing
    User->>Array: Call __getitem__ with indices
    Array->>Array: Process indices and create new dims
    Array->>Array: Create new Array from dims
    alt newshape is not empty
        Array->>Array: Call reshape(*newshape, order="A")
        Note right of Array: order="A" preserves<br/>memory layout
        Array-->>User: Return reshaped array
    else newshape is empty
        Array-->>User: Return Element
    end
```
```python
# See the NumPy reshape documentation of the order argument for a
# description of the behaviour this is following:
# https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
return arr.reshape(*newshape, order="A")[0]
```
style: verify that the default order in reshape() (line 390) being "C" doesn't conflict with this "A" order usage; confirm all test cases pass, especially for Fortran-contiguous arrays
All test cases do pass for me locally (CI is still pending). With the other changes in this PR but without this one, test_devicearray_broadcast_host_copy() would fail.
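For reference, NumPy's `order` semantics for reshape can be observed directly on the host; this is a small standard-NumPy illustration, nothing PR-specific:

```python
import numpy as np

# For an F-contiguous array, reshape with order="A" reads the elements in
# Fortran (column-major) order, while the default order="C" reads them
# row-major, which forces a reordering of the data.
f_arr = np.asfortranarray(np.arange(6).reshape(2, 3))
print(f_arr.flags["F_CONTIGUOUS"])   # True
print(f_arr.reshape(6, order="A"))   # [0 3 1 4 2 5]
print(f_arr.reshape(6, order="C"))   # [0 1 2 3 4 5]
```

This is why passing order="A" matters once broadcast F-order data is correctly recognised as F-contiguous: it lets the reshape follow the array's own memory layout rather than forcing C order.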
```python
# See the NumPy reshape documentation of the order argument for a
# description of the behaviour this is following:
# https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
return arr.reshape(*newshape, order="A")[0]
```
With the correct computation of the contiguity of broadcast arrays, I think the reshape(*newshape) call becomes "wrong", because the default order is "C". Previously we would not have considered a broadcast Fortran-order array as contiguous, and therefore would not have transposed its indices (or data); now we do recognise broadcast Fortran-order data as F-contiguous, so we need to give the reshape the freedom to read / write it in F order.
Without this change, the following:
```python
# Derived from test_devicearray_broadcast_host_copy
import numpy as np
from numba import cuda

# Set up a broadcasted array as per the test case
broadsize = 4
coreshape = (2, 3)
coresize = np.prod(coreshape)
core_f = np.arange(coresize).reshape(coreshape, order="F")
dim = 0
newindex = (slice(None),) * dim + (np.newaxis,)
broadshape = coreshape[:dim] + (broadsize,) + coreshape[dim:]
broad_f = np.broadcast_to(core_f[newindex], broadshape)
dbroad_f = cuda.to_device(broad_f)

# Set up the index with which to slice the array
core_index = tuple([0, slice(None, None, None), slice(None, None, None)])

# For info, display the original arrays' shape and strides
print("NumPy shape and strides - original array:")
print(broad_f.shape)
print(broad_f.strides)
print("Device array shape and strides - original array:")
print(dbroad_f.shape)
print(dbroad_f.strides)

# Slice the NumPy and device arrays
sliced_broad_f = broad_f[core_index]
sliced_dbroad_f = dbroad_f[core_index]

# For info, display the sliced arrays' shape and strides
print("NumPy shape and strides - sliced array:")
print(sliced_broad_f.shape)
print(sliced_broad_f.strides)
print("Device array shape and strides - sliced array:")
print(sliced_dbroad_f.shape)
print(sliced_dbroad_f.strides)

# Demonstrate that the device array did not correctly preserve the strides
assert sliced_broad_f.strides == sliced_dbroad_f.strides
```

would fail with:
```
NumPy shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
Device array shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
NumPy shape and strides - sliced array:
(2, 3)
(8, 16)
Device array shape and strides - sliced array:
(2, 3)
(24, 8)
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/issues/612/repro.py", line 48, in <module>
    assert sliced_broad_f.strides == sliced_dbroad_f.strides
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
when it should have printed:
```
NumPy shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
Device array shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
NumPy shape and strides - sliced array:
(2, 3)
(8, 16)
Device array shape and strides - sliced array:
(2, 3)
(8, 16)
```
A simpler reproducer that just invokes the core "dummy array" logic looks like:
```python
from numba.cuda.cudadrv.dummyarray import Array

offset = 0
shape = (4, 2, 3)
strides = (0, 8, 16)
itemsize = 8
arr = Array.from_desc(offset, shape, strides, itemsize)
index = tuple([0, slice(None, None, None), slice(None, None, None)])
sliced = arr[index]
print(sliced.shape)
print(sliced.strides)
```

and prints
```
(2, 3)
(8, 16)
```
(correct F-order strides) with this fix, and
```
(2, 3)
(24, 8)
```
(valid C-order strides, but we need the F-order strides here) without it.
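As a cross-check, the same construction on the host with plain NumPy (no CUDA required) shows the strides the dummy-array logic is expected to reproduce:

```python
import numpy as np

# Host-side analogue of the reproducer above: slicing away the broadcast
# dimension of an F-order array preserves the F-order strides.
core = np.arange(6, dtype=np.int64).reshape((2, 3), order="F")
broad = np.broadcast_to(core[np.newaxis], (4, 2, 3))
print(broad.strides)                 # (0, 8, 16)
sliced = broad[0]
print(sliced.strides)                # (8, 16)
print(sliced.flags["F_CONTIGUOUS"])  # True
```

NumPy reports the sliced view as F-contiguous with strides (8, 16), matching the expected output above.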
cc @kaeun97 for thoughts / input.