Commit be2af94

safe=true and add a docstring, closes #434 (#435)
1 parent 9d8fb66 commit be2af94

4 files changed

+13
-7
lines changed

src/condense_loopset.jl

Lines changed: 2 additions & 0 deletions
@@ -914,6 +914,8 @@ function can_turbo(f::F, ::Val{NARGS})::Bool where {F,NARGS}
   promoted_op = Base.promote_op(f, ntuple(RetVec2Int(), Val(NARGS))...)
   return promoted_op !== Union{}
 end
+can_turbo(::typeof(vfmaddsub), ::Val{3}) = true
+can_turbo(::typeof(vfmsubadd), ::Val{3}) = true
 
 """
     check_turbo_safe(ls::LoopSet)
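
The two added methods bypass the generic `promote_op` probe for `vfmaddsub` and `vfmsubadd`, presumably because the probe would otherwise reject them. A minimal sketch of that dispatch pattern in plain Julia follows. The names `ProbeArg`, `supports_probe`, `accepts_probe`, `rejects_probe`, and `special_op` are hypothetical stand-ins for illustration, not LoopVectorization's definitions.

```julia
# Hypothetical stand-in for the vector type used to probe a function.
struct ProbeArg end

# Generic check: ask inference for the return type of `f` applied to N probe
# arguments. `Union{}` means no applicable method was found.
function supports_probe(f::F, ::Val{N})::Bool where {F,N}
    return Base.promote_op(f, ntuple(_ -> ProbeArg, Val(N))...) !== Union{}
end

# A function that accepts the probe type.
accepts_probe(::ProbeArg, ::ProbeArg) = ProbeArg()

# A function with no method for the probe type.
rejects_probe(::Int) = 0

# A function the generic probe would reject, marked as supported by a more
# specific method (the same shape as the two methods added in this commit).
special_op(::Int, ::Int, ::Int) = 0
supports_probe(::typeof(special_op), ::Val{3}) = true
```

Because the `typeof(special_op)` method is more specific than the generic one, dispatch selects it and the `promote_op` probe never runs for that function.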

src/constructors.jl

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -154,7 +154,7 @@ function process_args(
   v::Int8 = zero(Int8),
   threads::Int = 1,
   warncheckarg::Int = 1,
-  safe::Bool = false,
+  safe::Bool = true,
 )
   for arg ∈ args
     inline, check_empty, u₁, u₂, v, threads, warncheckarg, safe =
@@ -295,6 +295,14 @@ The integer's value indicates the number of threads to use.
 It is clamped to be between `1` and `min(Threads.nthreads(),LoopVectorization.num_cores())`.
 `false` is equivalent to `1`, and `true` is equivalent to `min(Threads.nthreads(),LoopVectorization.num_cores())`.
 
+`safe` (defaults to `true`) will cause `@turbo` to fall back to `@inbounds @fastmath` if `can_turbo` returns false for any of the functions called in the loop. You can disable the associated warning with `warn_check_args=false`.
+
+Setting the keyword argument `warn_check_args=true` (e.g. `@turbo warn_check_args=true for ...`) in a loop or
+broadcast statement will cause it to warn once if `LoopVectorization.check_args` fails and the fallback
+loop is executed instead of the LoopVectorization-optimized loop.
+Setting it to an integer > 0 will warn that many times, while setting it to a negative integer will warn
+an unlimited amount of times. The default is `warn_check_args = 1`. Failure means that there may have been an array with unsupported type, unsupported element types, or (if `safe=true`) a function for which `can_turbo` returned `false`.
+
 `inline` is a Boolean. When `true`, `body` will be directly inlined
 into the function (via a forced-inlining call to `_turbo_!`).
 When `false`, it wont force inlining of the call to `_turbo_!` instead, letting Julia's own inlining engine
@@ -324,12 +332,6 @@ and `@fastmath` is generated. Note that `VectorizationBase` provides functions s
 ignore `@fastmath`, preserving IEEE semantics both within `@turbo` and `@fastmath`.
 `check_args` currently returns false for some wrapper types like `LinearAlgebra.UpperTriangular`, requiring you to
 use their `parent`. Triangular loops aren't yet supported.
-
-Setting the keyword argument `warn_check_args=true` (e.g. `@turbo warn_check_args=true for ...`) in a loop or
-broadcast statement will cause it to warn once if `LoopVectorization.check_args` fails and the fallback
-loop is executed instead of the LoopVectorization-optimized loop.
-Setting it to an integer > 0 will warn that many times, while setting it to a negative integer will warn
-an unlimited amount of times. The default is `warn_check_args = 0`.
 """
 macro turbo(args...)
   turbo_macro(__module__, __source__, last(args), Base.front(args)...)
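
The warning-count semantics described in the docstring (`true` warns once, a positive integer warns that many times, a negative integer warns without limit) can be sketched with a small counter. This is only an illustration of the documented behavior in plain Julia. `WarnBudget` and `should_warn!` are hypothetical names, and treating `false` as "never warn" is an assumption. It is not how LoopVectorization implements the option.

```julia
# Hypothetical helper illustrating the documented semantics only.
mutable struct WarnBudget
    remaining::Int
end

# Assumption: `true` maps to one warning and `false` to none.
WarnBudget(flag::Bool) = WarnBudget(flag ? 1 : 0)

# Return `true` if a warning should be emitted now, consuming one unit of a
# positive budget. A negative budget is never decremented, so it never runs out.
function should_warn!(b::WarnBudget)::Bool
    b.remaining == 0 && return false
    if b.remaining > 0
        b.remaining -= 1
    end
    return true
end
```

A caller would check `should_warn!(budget)` each time the fallback loop is taken and emit the warning only when it returns `true`.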

test/offsetarrays.jl

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
 using LoopVectorization, ArrayInterface, OffsetArrays, Test
+using LoopVectorization: StaticInt
 # T = Float64; r = -1:1;
 # T = Float32; r = -1:1;

test/shuffleloadstores.jl

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
+using LoopVectorization: vpermilps177, vmovshdup, vfmsubadd, vfmaddsub, vmovsldup
 function dot_simd(a::AbstractVector, b::AbstractVector)
   s = zero(eltype(a))
   @fastmath @inbounds @simd for i ∈ eachindex(a)
