     simdgroup_load(data::MtlDeviceArray{T}, matrix_origin=(1, 1))
 
 Loads data from device or threadgroup memory into an 8x8 SIMD-group matrix
-and returns it. `T` must be either `Float16` or `Float32`.
+and returns it. `T` must be either `Float16`, `Float32`, or `BFloat16`.
 
 # Arguments
 - `matrix_origin::NTuple{2, Int64}=(1, 1)`: origin in the source memory to load from.
@@ -65,7 +65,7 @@ and returns it. `T` must be either `Float16` or `Float32`.
     simdgroup_store(src, dest::MtlDeviceArray{T}, matrix_origin=(1, 1))
 
 Stores data from an 8x8 SIMD-group matrix into device or threadgroup memory.
-`T` must be either `Float16` or `Float32`.
+`T` must be either `Float16`, `Float32`, or `BFloat16`.
 
 # Arguments
 - `matrix_origin::NTuple{2, Int64}=(1, 1)`: origin in the destination memory to store to.
@@ -119,7 +119,7 @@ The value for delta must be the same for all threads in the SIMD-group. This fun
 doesn’t modify the upper delta lanes of data because it doesn’t wrap values around
 the SIMD-group.
 
-T must be one of the following: Float32, Float16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
+T must be one of the following: Float32, Float16, BFloat16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
 """
 simd_shuffle_down
 
@@ -132,6 +132,6 @@ lane ID minus delta.
 The value of delta must be the same for all threads in a SIMD-group. This function doesn’t
 modify the lower delta lanes of data because it doesn’t wrap values around the SIMD-group.
 
-T must be one of the following: Float32, Float16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
+T must be one of the following: Float32, Float16, BFloat16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
 """
 simd_shuffle_up
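
The "doesn't wrap values around" wording in the two shuffle docstrings can be illustrated with a small CPU-only sketch. This is not Metal.jl code and does not touch the GPU: `emulate_shuffle_down` and `emulate_shuffle_up` are hypothetical helpers written for this illustration, and the 8-element vector is a toy stand-in for a SIMD-group, with `lanes[i]` playing the role of the value held by lane `i - 1`.

```julia
# CPU-only emulation of the documented lane semantics (hypothetical helpers,
# not part of Metal.jl). `lanes[i]` stands in for the value held by lane i-1.

# Each lane reads from the lane `delta` positions above it. The upper `delta`
# lanes have no source lane, so they keep their own value (no wrap-around).
function emulate_shuffle_down(lanes::Vector, delta::Integer)
    n = length(lanes)
    return [i + delta <= n ? lanes[i + delta] : lanes[i] for i in 1:n]
end

# Each lane reads from the lane `delta` positions below it. The lower `delta`
# lanes have no source lane, so they keep their own value (no wrap-around).
function emulate_shuffle_up(lanes::Vector, delta::Integer)
    n = length(lanes)
    return [i - delta >= 1 ? lanes[i - delta] : lanes[i] for i in 1:n]
end

lanes = Float32.(0:7)                  # toy 8-lane group; assumption for brevity
down = emulate_shuffle_down(lanes, 2)  # last two entries stay unchanged
up = emulate_shuffle_up(lanes, 2)      # first two entries stay unchanged
```

In this toy model the last two entries of `down` and the first two entries of `up` equal the corresponding entries of `lanes`, which matches the "upper delta lanes" and "lower delta lanes" statements in the docstrings.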