[WIP] Follow-on from #612: try to align with NumPy #613

gmarkall wants to merge 1 commit into NVIDIA:main
Conversation
Aims to align more closely with NumPy contiguity logic. The example in the commit message from NVIDIA#612 still runs correctly with this change. I think this needs a little more consideration for now.
/ok to test
```python
if not self.dims:
    return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

# All 0-size arrays are considered contiguous, even if they are multidimensional
```
It feels like this special case is only needed because of the (erroneous) special case immediately following on line 286. Once the "If this is a broadcast array then it is not contiguous" case is gone, I think this special case is no longer needed.
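The NumPy-style rules under discussion (any 0-size array is both C- and F-contiguous; size-1 dimensions have their strides ignored) can be sketched in a few lines. This is a hedged illustration of the logic only, not the PR's actual `_compute_layout()` implementation:

```python
def compute_flags(shape, strides, itemsize):
    """Sketch of NumPy-style contiguity: size-1 dims are ignored when
    checking strides, and any 0-size array counts as both C- and
    F-contiguous."""
    flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
    if 0 in shape:
        # All 0-size arrays are considered contiguous, even multidimensional ones
        return flags
    # C order: expected stride accumulates from the innermost (last) dimension
    expected = itemsize
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1:
            if stride != expected:
                flags["C_CONTIGUOUS"] = False
            expected *= size
    # F order: expected stride accumulates from the outermost (first) dimension
    expected = itemsize
    for size, stride in zip(shape, strides):
        if size != 1:
            if stride != expected:
                flags["F_CONTIGUOUS"] = False
            expected *= size
    return flags

# A C-contiguous (2, 3) array of 8-byte items has strides (24, 8);
# the F-contiguous equivalent has strides (8, 16).
print(compute_flags((2, 3), (24, 8), 8))
print(compute_flags((2, 3), (8, 16), 8))
```

Note how, under these rules, the broadcast shape (4, 2, 3) with strides (0, 8, 16) is neither C- nor F-contiguous (the size-4 dimension with stride 0 is not ignored), while the 0-size case short-circuits to both flags being true, with no broadcast-specific special case required.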
Greptile Overview

Greptile Summary: Refactored array contiguity detection to align more closely with NumPy's implementation, building on the fix from #612.

Confidence Score: 4/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant DeviceNDArray
    participant Array
    Note over User,Array: Array Type Inference Flow
    User->>DeviceNDArray: Create device array
    DeviceNDArray->>DeviceNDArray: Access _numba_type_ property
    DeviceNDArray->>DeviceNDArray: Check flags["C_CONTIGUOUS"]
    alt C_CONTIGUOUS is True
        DeviceNDArray->>DeviceNDArray: Set layout = "C"
    else F_CONTIGUOUS is True
        DeviceNDArray->>DeviceNDArray: Set layout = "F"
    else Neither contiguous
        DeviceNDArray->>DeviceNDArray: Set layout = "A"
    end
    DeviceNDArray-->>User: Return Array type with layout
    Note over User,Array: Array Layout Computation (dummyarray)
    User->>Array: Create Array with dims
    Array->>Array: Call _compute_layout()
    Array->>Array: Initialize flags as {C: True, F: True}
    alt No dims (records)
        Array-->>Array: Return both True
    else Has dims
        loop For each dim (reversed for C)
            alt dim.size == 0 (empty array)
                Array-->>Array: Return both True (early exit)
            else dim.size != 1
                alt dim.stride != expected_stride
                    Array->>Array: Set C_CONTIGUOUS = False
                end
            end
        end
        loop For each dim (forward for F)
            alt dim.size != 1
                alt dim.stride != expected_stride
                    Array->>Array: Set F_CONTIGUOUS = False
                end
            end
        end
    end
    Array-->>User: Return computed flags
    Note over User,Array: Array Slicing/Indexing
    User->>Array: Call __getitem__ with indices
    Array->>Array: Process indices and create new dims
    Array->>Array: Create new Array from dims
    alt newshape is not empty
        Array->>Array: Call reshape(*newshape, order="A")
        Note right of Array: order="A" preserves<br/>memory layout
        Array-->>User: Return reshaped array
    else newshape is empty
        Array-->>User: Return Element
    end
```
```python
# See the NumPy reshape documentation of the order argument for a
# description of the behaviour this is following:
# https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
return arr.reshape(*newshape, order="A")[0]
```
style: verify that the default order in reshape() (line 390) being "C" doesn't conflict with this "A" order usage; confirm all test cases pass, especially for Fortran-contiguous arrays
All test cases do pass for me locally (CI is still pending). With the other changes in this PR but without this one, test_devicearray_broadcast_host_copy() would fail.
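For reference, NumPy's `order` semantics for reshape can be observed directly on the host; this is a small standard-NumPy illustration, nothing PR-specific:

```python
import numpy as np

# For an F-contiguous array, reshape with order="A" reads the elements in
# Fortran (column-major) order, while the default order="C" reads them
# row-major, which forces a reordering of the data.
f_arr = np.asfortranarray(np.arange(6).reshape(2, 3))
print(f_arr.flags["F_CONTIGUOUS"])   # True
print(f_arr.reshape(6, order="A"))   # [0 3 1 4 2 5]
print(f_arr.reshape(6, order="C"))   # [0 1 2 3 4 5]
```

This is why passing order="A" matters once broadcast F-order data is correctly recognised as F-contiguous: it lets the reshape follow the array's own memory layout rather than forcing C order.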
```python
# See the NumPy reshape documentation of the order argument for a
# description of the behaviour this is following:
# https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
return arr.reshape(*newshape, order="A")[0]
```
With the correct computation of the contiguity of broadcast arrays, I think the reshape(*newshape) call becomes "wrong", because the default order is "C". Previously we would not have considered a broadcast Fortran-order array as contiguous, and therefore would not have transposed its indices (or data); now we do recognise broadcast Fortran-order data as F-contiguous, so we need to give the reshape the freedom to read / write it in F order.
Without this change, the following:
```python
# Derived from test_devicearray_broadcast_host_copy
import numpy as np
from numba import cuda

# Set up a broadcasted array as per the test case
broadsize = 4
coreshape = (2, 3)
coresize = np.prod(coreshape)
core_f = np.arange(coresize).reshape(coreshape, order="F")
dim = 0
newindex = (slice(None),) * dim + (np.newaxis,)
broadshape = coreshape[:dim] + (broadsize,) + coreshape[dim:]
broad_f = np.broadcast_to(core_f[newindex], broadshape)
dbroad_f = cuda.to_device(broad_f)

# Set up the index with which to slice the array
core_index = tuple([0, slice(None, None, None), slice(None, None, None)])

# For info, display the original arrays' shape and strides
print("NumPy shape and strides - original array:")
print(broad_f.shape)
print(broad_f.strides)
print("Device array shape and strides - original array:")
print(dbroad_f.shape)
print(dbroad_f.strides)

# Slice the NumPy and device arrays
sliced_broad_f = broad_f[core_index]
sliced_dbroad_f = dbroad_f[core_index]

# For info, display the sliced arrays' shape and strides
print("NumPy shape and strides - sliced array:")
print(sliced_broad_f.shape)
print(sliced_broad_f.strides)
print("Device array shape and strides - sliced array:")
print(sliced_dbroad_f.shape)
print(sliced_dbroad_f.strides)

# Demonstrate that the device array did not correctly preserve the strides
assert sliced_broad_f.strides == sliced_dbroad_f.strides
```

would fail with:
```
NumPy shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
Device array shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
NumPy shape and strides - sliced array:
(2, 3)
(8, 16)
Device array shape and strides - sliced array:
(2, 3)
(24, 8)
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/issues/612/repro.py", line 48, in <module>
    assert sliced_broad_f.strides == sliced_dbroad_f.strides
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
when it should have printed:
```
NumPy shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
Device array shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
NumPy shape and strides - sliced array:
(2, 3)
(8, 16)
Device array shape and strides - sliced array:
(2, 3)
(8, 16)
```
A simpler reproducer that just invokes the core "dummy array" logic looks like:
```python
from numba.cuda.cudadrv.dummyarray import Array

offset = 0
shape = (4, 2, 3)
strides = (0, 8, 16)
itemsize = 8
arr = Array.from_desc(offset, shape, strides, itemsize)
index = tuple([0, slice(None, None, None), slice(None, None, None)])
sliced = arr[index]
print(sliced.shape)
print(sliced.strides)
```

and prints
```
(2, 3)
(8, 16)
```
(correct F-order strides) with this fix, and
```
(2, 3)
(24, 8)
```
(valid C-order strides, but we need the F-order strides here) without it.
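As a cross-check, the same construction on the host with plain NumPy (no CUDA required) shows the strides the dummy-array logic is expected to reproduce:

```python
import numpy as np

# Host-side analogue of the reproducer above: slicing away the broadcast
# dimension of an F-order array preserves the F-order strides.
core = np.arange(6, dtype=np.int64).reshape((2, 3), order="F")
broad = np.broadcast_to(core[np.newaxis], (4, 2, 3))
print(broad.strides)                 # (0, 8, 16)
sliced = broad[0]
print(sliced.strides)                # (8, 16)
print(sliced.flags["F_CONTIGUOUS"])  # True
```

NumPy reports the sliced view as F-contiguous with strides (8, 16), matching the expected output above.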
cc @kaeun97 for thoughts / input.