Add use case to docs, update README

pszufe · jpsamaroo · commit 586b986c7804 · 2023-08-15T10:44:09.000-05:00
diff --git a/README.md b/README.md
@@ -19,30 +19,54 @@ At the core of Dagger.jl is a scheduler heavily inspired by [Dask](https://docs.
 
 ## Installation
 
-Dagger.jl can be installed using the Julia package manager. Enter the Pkg REPL mode by typing "]" in the Julia REPL and then run:
+Dagger.jl can be installed using the Julia package manager. Enter the Pkg REPL
+mode by typing "]" in the Julia REPL and then run:
 
 ```julia
 pkg> add Dagger
 ```
-Or, equivalently, via the Pkg API:
+
+Or, equivalently, install Dagger via the Pkg API:
+
 ```julia
 julia> import Pkg; Pkg.add("Dagger")
 ```
 
 ## Usage
 
-Once installed, the `Dagger` package can by used like so
+Once installed, the `Dagger` package can be loaded with `using Dagger`, or if
+you want to use Dagger for distributed computing, it can be loaded as:
 
 ```julia
-using Distributed; addprocs() # get us some workers
+using Distributed; addprocs() # Add one Julia worker per CPU core
 using Dagger
+```
+
+You can run the following example to see how Dagger exposes easy parallelism:
+
+```julia
+# This runs first:
+a = Dagger.@spawn rand(100, 100)
 
-# do some stuff in parallel!
-a = Dagger.@spawn 1+3
-b = Dagger.@spawn rand(a, 4)
-c = Dagger.@spawn sum(b)
-fetch(c) # some number!
+# These run in parallel:
+b = Dagger.@spawn sum(a)
+c = Dagger.@spawn prod(a)
+
+# Finally, this runs:
+wait(Dagger.@spawn println("b: ", b, ", c: ", c))
 ```
+
+## Use Cases
+
+Dagger can support a variety of use cases that benefit from easy, automatic
+parallelism, such as:
+
+- [Parallelizing Nested Loops](https://juliaparallel.github.io/Dagger.jl/dev/use-cases/parallel-nested-loops/)
+
+This isn't an exhaustive list of the use cases that Dagger supports. There are
+more examples in the docs, and more use case examples are welcome (just file
+an issue or PR).
+
 ## Contributing Guide
 [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
 [![GitHub issues](https://img.shields.io/github/issues/JuliaParallel/Dagger.jl)](https://github.com/JuliaParallel/Dagger.jl/issues)
diff --git a/docs/make.jl b/docs/make.jl
@@ -13,6 +13,9 @@ makedocs(;
     ),
     pages = [
         "Home" => "index.md",
+        "Use Cases" => [
+            "Parallel Nested Loops" => "use-cases/parallel-nested-loops.md",
+        ],
         "Task Spawning" => "task-spawning.md",
         "Data Management" => "data-management.md",
         "Distributed Arrays" => "darray.md",
diff --git a/docs/src/use-cases/parallel-nested-loops.md b/docs/src/use-cases/parallel-nested-loops.md
@@ -0,0 +1,86 @@
+# Use Case: Parallel Nested Loops
+
+One of the many applications of Dagger is as a drop-in replacement for nested
+multi-threaded loops that would otherwise be written with
+`Threads.@threads`.
+
+Consider a simplified scenario where you want to calculate the maximum mean
+values of random samples of various lengths that have been generated by several
+distributions provided by the Distributions.jl package. The results should be
+collected into a DataFrame. We have the following function:
+
+```julia
+using Dagger, Random, Distributions, StatsBase, DataFrames
+
+function f(dist, len, reps, σ)
+    v = Vector{Float64}(undef, len) # preallocate once and reuse across reps
+    maximum(mean(rand!(dist, v)) for _ in 1:reps)/σ
+end
+```
+
+Let us consider the following probability distributions for numerical
+experiments, all of which have expected values equal to zero, and the following
+lengths of vectors:
+
+```julia
+dists = [Cosine, Epanechnikov, Laplace, Logistic, Normal, NormalCanon, PGeneralizedGaussian, SkewNormal, SkewedExponentialPower, SymTriangularDist]
+lens = [10, 20, 50, 100, 200, 500]
+```
+
+Using `Threads.@threads`, those experiments could be parallelized as:
+
+```julia
+function experiments_threads(dists, lens, K=1000)
+    res = DataFrame()
+    lck = ReentrantLock()
+    Threads.@threads for T in dists
+        dist = T()
+        σ = std(dist)
+        for L in lens
+            z = f(dist, L, K, σ)
+            Threads.lock(lck) do
+                push!(res, (;T, σ, L, z))
+            end
+        end
+    end
+    res
+end
+```
+
+Note that `push!` on a DataFrame is not a thread-safe operation, and hence we
+need to utilize a locking mechanism in order to avoid two threads appending to
+the DataFrame at the same time.
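+
+The locking pattern can also be seen in isolation, without DataFrames. The
+following is a minimal sketch using only Base; the names `results` and `sq`
+are illustrative and not part of the example above:
+
+```julia
+results = Int[]            # shared container, not safe to mutate concurrently
+lck = ReentrantLock()
+
+Threads.@threads for i in 1:100
+    sq = i^2               # the "expensive" work happens outside the lock
+    lock(lck) do           # only one thread at a time may run this block
+        push!(results, sq)
+    end
+end
+
+length(results)            # every iteration contributed exactly one element
+```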
+
+The same code could be rewritten in Dagger as:
+
+```julia
+function experiments_dagger(dists, lens, K=1000)
+    res = DataFrame()
+    @sync for T in dists
+        dist = T()
+        σ = Dagger.@spawn std(dist)
+        for L in lens
+            z = Dagger.@spawn f(dist, L, K, σ)
+            push!(res, (;T, σ, L, z))
+        end
+    end
+    res.z = fetch.(res.z)
+    res.σ = fetch.(res.σ)
+    res
+end
+```
+
+In this code we have job interdependence. First, we calculate the standard
+deviation `σ`, and then we use that value in the function `f`. Since
+`Dagger.@spawn` yields an `EagerThunk` rather than actual values, we need to
+use the `fetch` function to obtain those values. In this example, the value
+fetching is performed once all computations are completed (note that `@sync`
+preceding the loop forces the loop to wait for all jobs to complete). Also,
+note that contrary to the previous example, we do not need to implement locking
+as we are just pushing the `EagerThunk` results of `Dagger.@spawn` serially
+into the DataFrame (which is fast since `Dagger.@spawn` doesn't block).
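+
+As a small illustration of this behavior, here is a sketch with toy values
+that are not part of the experiment above. A task can be passed directly as an
+argument to another `Dagger.@spawn` call, and `fetch` retrieves the results:
+
+```julia
+using Dagger
+
+a = Dagger.@spawn 1 + 2   # returns immediately with a task handle, not a number
+b = Dagger.@spawn a * 10  # depends on `a`; Dagger passes its result to `*`
+
+fetch(a), fetch(b)        # blocks until both results are available
+```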
+
+The above use case scenario has been tested by running `julia -t 8` (or with
+`JULIA_NUM_THREADS=8` as an environment variable). The `Threads.@threads` code
+takes 1.8 seconds to run, while the Dagger code, which is also one line
+shorter, runs in around 0.9 seconds, resulting in a 2x speedup.