
Commit 40eae8e

Tweak docs and remove debugging code
1 parent ca1cf8a commit 40eae8e

2 files changed: 12 additions, 7 deletions

src/constructors.jl

Lines changed: 6 additions & 2 deletions
@@ -111,8 +111,12 @@ One can find some circumstances where `inline=true` is faster, and other circums
 where `inline=false` is faster, so the best setting may require experimentation.
 
 `unroll` is an integer that specifies the loop unrolling factor, or a
-tuple `(4, 2)` signaling that the generated code should unroll more than
-one loop.
+tuple `(u₁, u₂) = (4, 2)` signaling that the generated code should unroll more than
+one loop. `u₁` is the unrolling factor for the innermost loop and `u₂` for the next-innermost loop,
+but it applies to the loop ordering that will be chosen by LoopVectorization,
+*not* the order in `body`.
+`uᵢ=0` (the default) indicates that LoopVectorization should pick its own value,
+and `uᵢ=-1` disables unrolling for the corresponding loop.
 """
 macro avx(q)
     q = macroexpand(__module__, q)
src/reconstruct_loopset.jl

Lines changed: 6 additions & 5 deletions
@@ -418,12 +418,13 @@ function _avx_loopset(OPSsv, ARFsv, AMsv, LPSYMsv, LBsv, @nospecialize(vargs))
         AMsv, LPSYMsv, LBsv, vargs
     )
 end
-const _body_ = Ref{Any}(nothing)
 """
-    _avx_!(ut, ops, arf, am, lpsym, lb, vargs...)
+    _avx_!(unroll, ops, arf, am, lpsym, lb, vargs...)
 
 Execute an `@avx` block. The block's code is represented via the arguments:
-- `ut` is `Val((U,T))`, where `U` is the unrolling factor and `T` ?has something to do with tiling?
+- `unroll` is `Val((u₁,u₂))` and specifies the loop unrolling factor(s).
+  These values may be supplied manually via the `unroll` keyword
+  of [`@avx`](@ref).
 - `ops` is `Tuple{mod1, sym1, op1, mod2, sym2, op2...}` encoding the operations of the loop.
   `mod` and `sym` encode the module and symbol of the called function; `op` is an [`OperationStruct`](@ref)
   encoding the details of the operation.
@@ -436,8 +437,8 @@ Execute an `@avx` block. The block's code is represented via the arguments:
   `StaticLowerUnitRange(1)` because the lower bound of the iterator can be determined to be 1.
 - `vargs...` holds the encoded pointers of all the arrays (see `VectorizationBase`'s various pointer types).
 """
-@generated function _avx_!(::Val{UT}, ::Type{OPS}, ::Type{ARF}, ::Type{AM}, ::Type{LPSYM}, lb::LB, vargs...) where {UT, OPS, ARF, AM, LPSYM, LB}
+@generated function _avx_!(::Val{UNROLL}, ::Type{OPS}, ::Type{ARF}, ::Type{AM}, ::Type{LPSYM}, lb::LB, vargs...) where {UNROLL, OPS, ARF, AM, LPSYM, LB}
     1 + 1 # Irrelevant line you can comment out/in to force recompilation...
     ls = _avx_loopset(OPS.parameters, ARF.parameters, AM.parameters, LPSYM.parameters, LB.parameters, vargs)
-    return _body_[] = copy(avx_body(ls, UT))
+    avx_body(ls, UNROLL)
 end
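
The pattern in this hunk, a `Val` type parameter read at code-generation time and handed to a helper that returns the function body, can be sketched in isolation. The example below is a minimal sketch under stated assumptions: `unrolled_sum` and `unrolled_sum_expr` are hypothetical names playing the roles of `_avx_!` and `avx_body`, and the generated code is far simpler than anything LoopVectorization emits.

```julia
# Hypothetical helper, standing in for the role `avx_body` plays above:
# it builds the expression that becomes the body of the generated function.
function unrolled_sum_expr(U)
    U isa Int && U >= 1 || throw(ArgumentError("unroll factor must be a positive Int"))
    accs  = [Symbol(:acc_, k) for k in 1:U]                 # acc_1, ..., acc_U
    init  = [:($(accs[k]) = zero(eltype(x))) for k in 1:U]  # one accumulator per lane
    step  = [:($(accs[k]) += x[i + $(k - 1)]) for k in 1:U] # U loads per iteration
    total = Expr(:call, :+, accs...)                        # acc_1 + ... + acc_U
    return quote
        $(init...)
        n = length(x)
        i = 1
        @inbounds while i + $(U - 1) <= n   # main loop, unrolled U times
            $(step...)
            i += $U
        end
        @inbounds while i <= n              # remainder loop
            $(accs[1]) += x[i]
            i += 1
        end
        return $total
    end
end

# The unroll factor arrives as a type parameter via `Val`, so it is an
# ordinary compile-time value inside the generator.
@generated function unrolled_sum(::Val{U}, x::Vector{T}) where {U, T}
    unrolled_sum_expr(U)
end
```

Each distinct `Val(U)` produces a separately compiled method body, so `unrolled_sum(Val(4), x)` and `unrolled_sum(Val(2), x)` run different specialized code. Returning the expression directly, as the commit now does with `avx_body(ls, UNROLL)`, also keeps the generator free of side effects such as writing to a global `Ref`.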
