Commit 21b86bb

Fix typo (#463)
temsor -> tensor
1 parent b4424f8 commit 21b86bb

1 file changed: +1 -1 lines changed

docs/src/examples/multithreading.md

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@ Meaning we could fit three 886x886 matrices in our L2 cache by splitting them up
 
 Aside from the fact that LoopVectorization did much better than OpenBLAS--Julia's default library--over this size range, LoopVectorization's major advantage that it should perform similarly well for a wide variety of comparable operations and not just GEMM (GEneral Matrix-Matrix multiplication) specifically. GEMM has long been a motivating benchmark, as it's one of the best optimized routines available to compare against and get a sense of how well you're doing vs hand-tuned limits optimized in assembly.
 
-Because it is so well optimized, a standard trick for implementing more general optimized routines is to convert them into GEMM calls. For example, this is commonly done for temsor operations (see, e.g., [TensorOperations.jl](https://github.com/Jutho/TensorOperations.jl)) as well as for convolutions, e.g. in [NNlib](https://github.com/FluxML/NNlib.jl/blob/ca82fb23928c7ee7d08afb722718cf93be13f81c/src/impl/conv_im2col.jl#L25)'s `conv_im2col!`, their default optimized convolution function.
+Because it is so well optimized, a standard trick for implementing more general optimized routines is to convert them into GEMM calls. For example, this is commonly done for tensor operations (see, e.g., [TensorOperations.jl](https://github.com/Jutho/TensorOperations.jl)) as well as for convolutions, e.g. in [NNlib](https://github.com/FluxML/NNlib.jl/blob/ca82fb23928c7ee7d08afb722718cf93be13f81c/src/impl/conv_im2col.jl#L25)'s `conv_im2col!`, their default optimized convolution function.
 
 Lets take a look at convolutions as our next example. We create a batch of a hundred 256x256 images with 3 input channels, and convolve them with a 5x5 kernel producing 6 output channels.
 ```julia
