-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Unwrap contiguous views in copyto!
#56657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9bbe697 to
47bac98
Compare
|
@LilithHafner I wonder if you might have any insight into why the additional recompilation happens, leading to the test failure? I suspect the recursive The failing tests appear to be Test Failed at /cache/build/tester-amdci5-8/julialang/julia-master/julia-b418cc25b7/share/julia/test/sorting.jl:986
Expression: 2 >= #= /cache/build/tester-amdci5-8/julialang/julia-master/julia-b418cc25b7/share/julia/test/sorting.jl:986 =# @allocations(sortperm(v, rev = true))
Evaluated: 2 >= 5
Test Failed at /cache/build/tester-amdci5-8/julialang/julia-master/julia-b418cc25b7/share/julia/test/sorting.jl:987
Expression: 2 >= #= /cache/build/tester-amdci5-8/julialang/julia-master/julia-b418cc25b7/share/julia/test/sorting.jl:987 =# @allocations(sortperm(v, rev = false))
Evaluated: 2 >= 5 |
|
Ah, I likely added those tests. It looks like this both adds an extra layer of indirection to unwrap views, which makes sense to me and explains the performance improvement in the OP; and also some other stuff around bounds checking and LinearIndexing that I havn't groked yet. |
|
Something reproducible is going on here: julia> @b rand(10) sortperm(_, rev=true)
44.313 ns (2 allocs: 144 bytes) # master
418.945 ns (5 allocs: 272 bytes) # PR |
|
According to profiling, most of the runtime is constructing a namedtuple on line 1894: _sortperm(A; alg, order=ord(lt, by, true, order), scratch, dims...)I think this is due to sorting performance depending on compiler heuristics which means you can go ahead and merge this and loosen the sorting allocation tests if you want to. This diff should fix the sorting test: --- a/base/sort.jl
+++ b/base/sort.jl
@@ -1891,12 +1891,12 @@ function sortperm(A::AbstractArray;
scratch::Union{Vector{<:Integer}, Nothing}=nothing,
dims...) #to optionally specify dims argument
if rev === true
- _sortperm(A; alg, order=ord(lt, by, true, order), scratch, dims...)
+ _sortperm(A, alg, ord(lt, by, true, order), scratch, dims)
else
- _sortperm(A; alg, order=ord(lt, by, nothing, order), scratch, dims...)
+ _sortperm(A, alg, ord(lt, by, nothing, order), scratch, dims)
end
end
-function _sortperm(A::AbstractArray; alg, order, scratch, dims...)
+function _sortperm(A::AbstractArray, alg, order, scratch, dims)
if order === Forward && isa(A,Vector) && eltype(A)<:Integer
n = length(A)
if n > 1julia> @b rand(10) sortperm(_, rev=true)
44.313 ns (2 allocs: 144 bytes) # master
418.945 ns (5 allocs: 272 bytes) # PR
61.263 ns (2 allocs: 144 bytes) # PR with the diffA takeaway for this PR is that it increases stress on the compiler in some cases which, in certain use cases, can cause performance regressions. That can be said about any change, though, so I think it's not blocking. |
|
Thanks for looking into this! |
|
Sorry sorting is being so finicky here. #56661 should at least make the types in errors more readable. |
vtjnash
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I really have the ability to review this
0945185 to
f571b27
Compare
f571b27 to
0a8333d
Compare
e02bbe3 to
0e22216
Compare
|
Looks like removing the bounds-checking changes has resolved the failures in the sorting tests. These weren't essential anyway. |
|
I still see the sorting regression, strange that tests now pass |
This reverts commit 0e22216.
d4cc9f3 to
9017e0e
Compare
|
Oddly, I don't see the regressions in allocations (which might explain why the tests pass now), although there appears to be a performance regression. julia> @b rand(10) sortperm(_, rev=true)
106.121 ns (2 allocs: 144 bytes)
julia> versioninfo()
Julia Version 1.13.0-DEV.216
Commit 9017e0ec7f (2025-03-13 10:19 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
WORD_SIZE: 64
LLVM: libLLVM-19.1.7 (ORCJIT, skylake)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_EDITOR = sublvs nightly: julia> @b rand(10) sortperm(_, rev=true)
72.554 ns (2 allocs: 144 bytes)
julia> versioninfo()
Julia Version 1.13.0-DEV.207
Commit 96eb8762cab (2025-03-12 18:04 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
WORD_SIZE: 64
LLVM: libLLVM-19.1.7 (ORCJIT, skylake)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_EDITOR = subl |
In
copyto!(A, B)where both the arrays use linear indexing, if one or both the arrays are contiguous linear views, we may forward thecopyto!to the parents. This ensures that we hit optimizedcopyto!implementations for the parent, if any. Similarly, for the 5-argument version. In fact, in this PR, we call the 5-argument version within the 2-argumentcopyto!if both the arrays use linear indexing.An example of a performance improvement:
In this case, we dispatch to the optimized
copyto!(::Array, ::Integer, ::Array, ::Integer, ::Integer)method by unwrapping the view. This method usesmemmove, which is faster than the fallback implementation that uses loops.