Skip to content

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Apr 30, 2020

In my cursory spot-tests, it appears as though we no longer need to force inlining the whole way through to the innermost loops of broadcast. This is huge, and in my naive understanding I think it'll greatly improve codegen times and sizes. It'll also allow for embedding more alternative loop designs as they no longer need to be inlined into the same function body — solving my reservations in #30973.

I've kept the preparatory @inlines for now, just removing them on the actual implementation.

Making this an early PR just to allow a Nanosoldier run.

@mbauman mbauman added the broadcast Applying a function over a collection label Apr 30, 2020
@mbauman
Copy link
Member Author

mbauman commented Apr 30, 2020

@nanosoldier runbenchmarks(ALL, vs = ":master")

@mbauman
Copy link
Member Author

mbauman commented Apr 30, 2020

On master:

2 $ time make -C test broadcast
    JULIA test/broadcast
Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
broadcast  (1) |        started at 2020-04-30T16:43:19.276
broadcast  (1) |    47.61 |   1.28 |  2.7 |    5773.20 |   478.94

Test Summary: | Pass  Total
  Overall     |  439    439
    SUCCESS

real	0m51.426s
user	0m50.828s
sys	0m0.784s

On this PR:

$ time make -C test broadcast
    JULIA test/broadcast
Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
broadcast  (1) |        started at 2020-04-30T16:43:02.881
broadcast  (1) |    37.41 |   1.06 |  2.8 |    4296.83 |   386.69

Test Summary: | Pass  Total
  Overall     |  439    439
    SUCCESS

real	0m41.086s
user	0m40.656s
sys	0m0.649s

@@ -1102,15 +1102,15 @@ struct BitMaskedBitArray{N,M}
mask::BitArray{M}
BitMaskedBitArray{N,M}(parent, mask) where {N,M} = new(parent, mask)
end
@inline function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}
@inline function BitMaskedBitArray(parent::BitArray{N}, mask::BitArray{M}) where {N,M}

This one was overzealous — we should keep the force @inline here for the@boundscheck. It currently still inlines, but I think it's good practice to keep this here in case something else in here gets big enough to lose the auto-inline.

@StefanKarpinski
Copy link
Member

Beautiful!

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy
Copy link
Member

timholy commented May 1, 2020

Very impressive gain!

@StefanKarpinski
Copy link
Member

Making the inlining cost model more accurate by making reality match the model FTW.

@mbauman
Copy link
Member Author

mbauman commented May 1, 2020

Well that turned out exceptionally well! I wasn't expecting any run-time performance gains but was fearful of losses. Looks like there are just a handful of extra allocations in a few cases... but it's in exchange for a 20%+ compile time speedup in the unit tests. I'll see if anything can be done about those allocations, but I think the fact that this will allow for future runtime improvements (like #30973) will make the broadcast superusers more than pleased.

@mbauman
Copy link
Member Author

mbauman commented May 5, 2020

Ok, the regressions are limited to broadcast expressions that use literal pows (like x.^2) or types (like round.(Int, x)). These end up using Refs inside tuples — like (Ref(^), x, Ref(Val(2))) and (Ref{Type{Int}}(Int), x). The problem is that these Refs are properly mutable and thus must be allocated because an inner (non-inlined) function could pull them out and store them somewhere. In short, this exasperates the need for a true immutable scalar type for broadcast; #18379.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented May 5, 2020

Maybe ImmutableRef would be a good name for it? The need for immutability here also informs the whole &x syntax: for the ccall usage we specifically want that to create a mutable box; for the broadcast usage we want it to create an immutable box. Good thing we didn't pick yet!

@vtjnash
Copy link
Member

vtjnash commented May 5, 2020

The main confusion with that might be that it's not really a Ref unless it's mutable (as that's the Julia keyword for pinning something to the heap to make it reference-able). It might more accurately be Scalarize, but I think Some might fit here too?

@mbauman
Copy link
Member Author

mbauman commented May 5, 2020

Oh interesting. Some is a new idea. It's exactly the struct we need and a half-decent (short!) word, but it doesn't have the getindex/length/size methods we need. Before settling on Ref I had once called this a struct ZeroDimArray{T} <: AbstractArray{T,0} in the broadcast internals. Scalar (or some variation thereof) sure seems to be the crowd favorite word here, but gosh do I find it confusing. Imagine adding this Scalar into the mix in this FAQ, for example. Singleton is another option that was recently proposed.

I reviewed all the issues/discourse threads I could find to look for other alternatives, but the above is just about the universe of what I found (Nullable anyone? hah). All the typical verbiage tends have mutable associations (e.g., Box).

@mbauman
Copy link
Member Author

mbauman commented May 5, 2020

What about using a lazy Fill a la FillArrays? Inspired by #33206, which gives:

julia> show(reshape([1], ()))
fill(1)

It actually fits linguistically, too — you can imagine the Fill()'ed argument as filling in across the broadcasted shape. The downside is that it's not as simple anymore — it covers far more than just a zero-dimensional array — and coordinating with FillArrays makes this more complicated yet.

@mbauman
Copy link
Member Author

mbauman commented May 5, 2020

Back to the issue at hand, I suppose this points towards another form of performance regression that this would introduce — it'd block constant propagation in the same vein that Ref blocked the constant propagation in the motivating example in #35591.

@vtjnash
Copy link
Member

vtjnash commented May 5, 2020

but it doesn't have the methods we need

Mostly, neither did Ref until it got co-opted for this use. And Some is the new name for the Nullable container. The Singleton doesn't seem quite right (it's not the only item in the set)

@tkf
Copy link
Member

tkf commented May 6, 2020

Having x[] for Some would be nice also if we have more Union{Nothing,Some}-based API like #34821 and #35699 (comment).

x = get(dict, key)
if x !== nothing
    f(x[])  # nicer than `f(something(x))`?
else
    ...
end

@rfourquet
Copy link
Member

I just proposed somewhere else

what about Only? only is already used in a container context with only one element, and there would be the property only(Only(x)) === x.

But Some and an hypothetical Only seem redundant. Anyway, only dropping an idea here.

@rfourquet rfourquet changed the title Remove inline annations from broadcast kernels Remove inline annotations from broadcast kernels May 6, 2020
@mbauman mbauman force-pushed the mb/outline-broadcast branch from 18b6944 to b688db5 Compare January 27, 2021 19:44
@vtjnash
Copy link
Member

vtjnash commented Apr 12, 2021

bump?

@mbauman mbauman force-pushed the mb/outline-broadcast branch from b688db5 to 5b8bba6 Compare April 12, 2021 18:54
@mbauman
Copy link
Member Author

mbauman commented Apr 12, 2021

I've rebased, but in doing so I was reminded that the blocker here was performance of Ref-ified scalars. The path forward here is through #39184

vtjnash added a commit that referenced this pull request Nov 23, 2021
N5N3 pushed a commit to N5N3/julia that referenced this pull request Nov 30, 2021
This removes the dependence on inlining for performance, so we also
remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
N5N3 pushed a commit to N5N3/julia that referenced this pull request Dec 1, 2021
This removes the dependence on inlining for performance, so we also
remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
N5N3 pushed a commit to N5N3/julia that referenced this pull request Dec 1, 2021
This removes the dependence on inlining for performance, so we also
remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants