Disable `CUDNN_SOFTMAX_FAST` or use a separate math mode variable for `softmax` #506

Now that https://github.com/FluxML/NNlib.jl/pull/455 is merged, I want to point out that `CUDNN_SOFTMAX_FAST` can easily cause problems for attention operations. When masking, we usually set the masked values to `-Inf` or to a very large negative value such as `-1e9`. But if we enable `CUDA.math_mode!(CUDA.FAST_MATH)` to accelerate `gemm`, `softmax` ends up producing many `NaN`s.

MWE:

```
julia> using CUDA, Flux
                                                                                                                       
julia> x = CUDA.randn(Float32, 512, 10); fill!(x, -1f3);

julia> CUDA.math_mode!(CUDA.FAST_MATH)

julia> softmax(x)
512×10 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:                                                                     
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
   ⋮                        ⋮
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
```
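
A CPU-only sketch of what is probably going on, assuming the fast algorithm evaluates `exp(x) / sum(exp(x))` directly without first subtracting the maximum (my understanding of the cuDNN fast mode, not something verified against its source). The helper names `naive_softmax` and `stable_softmax` are made up for illustration and are not part of NNlib or CUDA.jl:

```julia
# Hypothetical helper: exponentiate the raw inputs, with no max subtraction.
naive_softmax(x) = exp.(x) ./ sum(exp.(x))

# Hypothetical helper: shift by the maximum first, so the largest term is exp(0) = 1
# and the denominator can never underflow to zero.
function stable_softmax(x)
    e = exp.(x .- maximum(x))
    return e ./ sum(e)
end

x = fill(-1f3, 4)            # same fill value as the MWE, but a plain CPU vector

naive  = naive_softmax(x)    # exp(-1000f0) underflows to 0f0, so every entry is 0/0
stable = stable_softmax(x)   # all inputs equal, so the result is uniform

println(naive)
println(stable)
```

With the max subtraction, the all-masked column degrades to a uniform distribution instead of `NaN`, which is why the accurate mode is the safer default for masked attention.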

