Commit 58a2864

(partially) Update readme
1 parent 0bcbbf7 commit 58a2864

File tree

1 file changed: +46 additions, -0 deletions

README.md

# VectorizedReduction

## Installation

```julia
using Pkg
Pkg.add("VectorizedReduction")
```

## Usage

This library provides "vectorized" (with or without multithreading) versions of the following functions:

1. `mapreduce` and common derived functions: `reduce`, `sum`, `prod`, `minimum`, `maximum`, `extrema`, `count`
2. `any`, `all` (listed separately to emphasize theoretical considerations on applicability)
3. `findmin`, `findmax`, `argmin`, `argmax`
4. `logsumexp`, `softmax`, `logsoftmax`

The naming convention is as follows: a vectorized (single-threaded) version is prefixed by `v`, and a vectorized, multithreaded version is prefixed by `vt`.
There is a single exception to this rule: vectorized (single-threaded) versions of the functions listed in 1. are prefixed by `vv` in order to avoid name collisions with LoopVectorization and VectorizedStatistics.
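
To make the convention concrete, a brief sketch (the specific exported names below are inferred from the rule above and should be treated as assumptions, not a definitive export list):

```julia
using VectorizedReduction

A = rand(4, 5, 6)

vvsum(A)        # vectorized, single-threaded `sum` (`vv` prefix, per the exception above)
vtsum(A)        # vectorized, multithreaded `sum`
vany(isnan, A)  # vectorized `any` (plain `v` prefix)
vtlogsumexp(A)  # vectorized, multithreaded `logsumexp`
```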

## Motivation

When writing numerical code, one may wish to perform a reduction, perhaps across multiple dimensions, as the most natural means of expressing the relevant mathematical operation.
For readers well-acquainted with LoopVectorization.jl, the thought immediately comes to mind: writing out the loops will inevitably yield a large performance gain. Thus, in that never-ending pursuit of fast code, we write the loops, but this produces specific code, tailored to the dimensions of the problem.
Instead, we might have liked to write generic code, parameterizing our function with an index set of dimensions. This package attempts to resolve this perpetual dilemma using metaprogramming. The hope is that, the next time one asks whether it is worthwhile to write the loops (gaining performance, losing genericity) or whether the Base implementation will do, one can confidently reach for one of the "vectorized" versions provided by this package.

Now, before diving into other interesting topics, a few salient comments. Foremost, this package is not intended as a universal replacement for Base implementations of the same functions. The scope of functions provided by this package is limited to numerical code, and is subject to all restrictions imposed by LoopVectorization (the user is encouraged to familiarize themselves with said package).
Moreover, there exist limitations of `vfindmin` and friends which are not yet resolved (a workaround is provided, but not without a performance cost). That being said, the user may find useful the expanded capabilities of `vfindmin`/`vfindmax` which are not yet part of Base (an aspect the author hopes to rectify); a sketch of these follows. The user is encouraged to read the additional commentary below, but it is not strictly necessary.
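
As a hint of the expanded capabilities alluded to above, a hypothetical sketch: locating the extremum of a transformed array over a subset of dimensions, with the exact signature assumed for illustration.

```julia
using VectorizedReduction

A = rand(4, 5, 6)

# Base offers `findmax(A; dims=2)`; the expanded form sketched here (signature
# assumed) would locate the maximum of `abs2.(A)` over dims 1 and 3 without
# materializing the transformed array.
val, idx = vfindmax(abs2, A, (1, 3))
```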

## Commentary

### Why provide `dims` directly, rather than as kwargs? (an optional performance enhancement for small arrays)

To define a multi-dimensional mapreduce, one specifies an index set of dimensions, `dims`, over which the reduction will occur. In Base, one passes `dims` as a keyword argument, e.g. `sum(A, dims=(2,4))`. However, this sometimes incurs an overhead of ~20 ns: admittedly small, and in almost all cases negligible, but if one wishes to ensure that such costs are avoided, the interface also supports direct calls which provide `init` and `dims` as positional arguments.
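
A sketch of the two calling styles; the positional argument order shown is an assumption for illustration, so consult the package's method signatures for the definitive form.

```julia
using VectorizedReduction

A = rand(4, 5, 6, 7)

# Keyword form, mirroring `Base.sum`:
s1 = vvsum(A, dims=(2, 4))

# Positional form (argument order assumed for illustration): supplying the
# reduction dimensions directly sidesteps the keyword-handling overhead.
s2 = vvsum(identity, A, (2, 4))
```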

### When to consider replacing `any`/`all` with vectorized versions

Assumption: `x` is inherently unordered, such that the probability of a "success" is independent of the index (i.e. invariant under a random permutation of the effective vector over which the reduction occurs).
This is a very reasonable assumption for certain types of data, e.g. Monte Carlo simulations. However, for cases in which there is some inherent order (e.g. a solution to an ODE, or, very simply, `map(exp, 0.0:0.1:10.0)`), the analysis below does not hold. If inherent order exists, then the probability of success depends on said ordering; the caller must then exercise judgment based on where the first success might land (if at all). As this is inevitably problem-specific, I am unable to offer general advice.

For inherently unordered data:
Define `p` as the probability of success, i.e. the probability of `true` w.r.t. `any` and the probability of `false` w.r.t. `all`.
The probability that all elements must be evaluated, i.e. that zero successes occur, is
```math
\Pr(X = 0) = \binom{n}{0} p^0 (1 - p)^{n-0} = (1 - p)^n
```
Define a linearized cost model, with `t` the time required to evaluate one element and `n` the length of the vector. Denote by `t_0` the non-vectorized evaluation time per element, and by `t_v` the vectorized evaluation time per element. A crude estimate for the expected cost of the short-circuiting (non-vectorized) call is therefore
```math
t_0 n (1 - p)^n
```
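
To make the model concrete, a minimal sketch that plugs illustrative (not measured) per-element times into the expressions above and compares the short-circuiting estimate against a vectorized scan that always touches all `n` elements:

```julia
# Illustrative numbers only; t0 and tv are assumptions, not benchmarks.
t0 = 1.0   # non-vectorized time per element (ns)
tv = 0.25  # vectorized time per element (ns)
n  = 10_000

expected_base(p) = t0 * n * (1 - p)^n  # crude estimate from the model above
cost_vectorized  = tv * n              # vectorized version evaluates every element

for p in (0.0, 1e-5, 1e-4, 1e-3)
    println("p = $p: short-circuit ≈ $(round(expected_base(p), digits=2)) ns, ",
            "vectorized ≈ $cost_vectorized ns")
end
```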
