Use size-based MultiplicativeInverse to speed up sequential access of ReshapedArray (#43518)
This performance difference was found when working on #42736.
Currently, our `ReshapedArray` uses a stride-based `MultiplicativeInverse`
to speed up the index transformation.
For example, for `a::AbstractArray{T,3}` and `b = vec(a)`, the index
transformation is equivalent to:
```julia
offset = i - 1 # b[i]
d1, r1 = divrem(offset, stride(a, 3)) # stride(a, 3) = size(a, 1) * size(a, 2)
d2, r2 = divrem(r1, stride(a, 2)) # stride(a, 2) = size(a, 1)
CartesianIndex(r2 + 1, d2 + 1, d1 + 1) # a has one-based axes
```
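The stride-based mapping above can be checked directly with plain `divrem` (no `MultiplicativeInverse`), here on a hypothetical `3 × 4 × 5` array:

```julia
# Minimal check of the stride-based index transformation, assuming a
# column-major 3-d array `a` of size (3, 4, 5) and its vectorized view `b`.
a = reshape(1:60, 3, 4, 5)
b = vec(a)
for i in eachindex(b)
    offset = i - 1
    d1, r1 = divrem(offset, size(a, 1) * size(a, 2))  # stride(a, 3)
    d2, r2 = divrem(r1, size(a, 1))                   # stride(a, 2)
    I = CartesianIndex(r2 + 1, d2 + 1, d1 + 1)
    @assert a[I] == b[i]  # the recovered Cartesian index hits the same element
end
```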
(Each `stride` here is replaced with a `MultiplicativeInverse` to
accelerate `divrem`.)
This PR replaces the above machinery with:
```julia
offset = i - 1
d1, r1 = divrem(offset, size(a, 1))
d2, r2 = divrem(d1, size(a, 2))
CartesianIndex(r1 + 1, r2 + 1, d2 + 1)
```
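The two mappings are interchangeable; a quick sketch (on an assumed `(3, 4, 5)` size tuple) confirms they recover the same Cartesian index for every linear index:

```julia
# Both transformations should agree for all linear indices of a (3, 4, 5) array.
sz = (3, 4, 5)
for i in 1:prod(sz)
    offset = i - 1
    # stride-based: divide by the largest stride first
    d1, r1 = divrem(offset, sz[1] * sz[2])
    d2, r2 = divrem(r1, sz[1])
    I_stride = CartesianIndex(r2 + 1, d2 + 1, d1 + 1)
    # size-based: peel off one dimension size at a time
    e1, s1 = divrem(offset, sz[1])
    e2, s2 = divrem(e1, sz[2])
    I_size = CartesianIndex(s1 + 1, s2 + 1, e2 + 1)
    @assert I_stride == I_size
end
```

The size-based form also has a nicer dependency chain for sequential access: each quotient feeds the next `divrem` directly, instead of rethreading the remainder.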
For random access the two schemes should have the same computational
cost, but for sequential access, like `sum(b)`, the `size`-based
transformation appears faster.
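For context, `MultiplicativeInverse` turns an integer division by a fixed divisor into a multiply-and-shift. A hedged sketch of the internal API (`Base.multiplicativeinverse` is an undocumented Base internal, so its surface may vary across Julia versions):

```julia
# Assumption: Base.multiplicativeinverse and div(::Int, ::MultiplicativeInverse)
# are available, as in base/multinverses.jl.
mi = Base.multiplicativeinverse(12)  # precompute the inverse of the divisor 12
d = div(1000, mi)                    # multiply + shift instead of hardware idiv
@assert d == div(1000, 12)
r = 1000 - d * mi.divisor            # remainder recovered from the quotient
@assert r == rem(1000, 12)
```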
To avoid an IO bottleneck, use `reshape(::CartesianIndices, x...)` for
the benchmark:
```julia
f(x) = let r = 0
    for i in eachindex(x)
        @inbounds r |= +(x[i].I...)
    end
    r
end
a = CartesianIndices((99,100,101));
@btime f(vec($a)); # 2.766 ms --> 2.591 ms
@btime f(reshape($a, 990, 1010)); # 3.412 ms --> 2.626 ms
@btime f(reshape($a, 33, 300, 101)); # 3.422 ms --> 2.342 ms
```
I haven't looked into the reason for this performance difference.
Besides the acceleration, this also makes it possible to reuse the
`MultiplicativeInverse` in some cases (like #42736).
So I think it might be useful?
---------
Co-authored-by: Andy Dienes <[email protected]>