
Commit 586b986

pszufejpsamaroo authored and committed:
Add use case to docs, update README

1 parent 5bbb9d1 · commit 586b986

File tree: 3 files changed (+122 −9 lines)

README.md

Lines changed: 33 additions & 9 deletions

@@ -19,30 +19,54 @@ At the core of Dagger.jl is a scheduler heavily inspired by [Dask](https://docs.
 
 ## Installation
 
-Dagger.jl can be installed using the Julia package manager. Enter the Pkg REPL mode by typing "]" in the Julia REPL and then run:
+Dagger.jl can be installed using the Julia package manager. Enter the Pkg REPL
+mode by typing "]" in the Julia REPL and then run:
 
 ```julia
 pkg> add Dagger
 ```
-Or, equivalently, via the Pkg API:
+
+Or, equivalently, install Dagger via the Pkg API:
+
 ```julia
 julia> import Pkg; Pkg.add("Dagger")
 ```
 
 ## Usage
 
-Once installed, the `Dagger` package can by used like so
+Once installed, the `Dagger` package can be loaded with `using Dagger`, or if
+you want to use Dagger for distributed computing, it can be loaded as:
 
 ```julia
-using Distributed; addprocs() # get us some workers
+using Distributed; addprocs() # Add one Julia worker per CPU core
 using Dagger
+```
+
+You can run the following example to see how Dagger exposes easy parallelism:
+
+```julia
+# This runs first:
+a = Dagger.@spawn rand(100, 100)
 
-# do some stuff in parallel!
-a = Dagger.@spawn 1+3
-b = Dagger.@spawn rand(a, 4)
-c = Dagger.@spawn sum(b)
-fetch(c) # some number!
+# These run in parallel:
+b = Dagger.@spawn sum(a)
+c = Dagger.@spawn prod(a)
+
+# Finally, this runs:
+wait(Dagger.@spawn println("b: ", b, ", c: ", c))
 ```
+
+## Use Cases
+
+Dagger can support a variety of use cases that benefit from easy, automatic
+parallelism, such as:
+
+- [Parallelizing Nested Loops](https://juliaparallel.github.io/Dagger.jl/dev/use-cases/#Parallel-Nested-Loops)
+
+This isn't an exhaustive list of the use cases that Dagger supports. There are
+more examples in the docs, and more use-case examples are welcome (just file
+an issue or PR).
+
 ## Contributing Guide
 [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
 [![GitHub issues](https://img.shields.io/github/issues/JuliaParallel/Dagger.jl)](https://github.com/JuliaParallel/Dagger.jl/issues)

docs/make.jl

Lines changed: 3 additions & 0 deletions

@@ -13,6 +13,9 @@ makedocs(;
     ),
     pages = [
         "Home" => "index.md",
+        "Use Cases" => [
+            "Parallel Nested Loops" => "use-cases/parallel-nested-loops.md",
+        ],
         "Task Spawning" => "task-spawning.md",
         "Data Management" => "data-management.md",
         "Distributed Arrays" => "darray.md",
docs/src/use-cases/parallel-nested-loops.md (new file)

Lines changed: 86 additions & 0 deletions

# Use Case: Parallel Nested Loops

One of the many applications of Dagger is that it can be used as a drop-in
replacement for nested multi-threaded loops that would otherwise be written
with `Threads.@threads`.

Consider a simplified scenario where you want to calculate the maximum mean
values of random samples of various lengths that have been generated by several
distributions provided by the Distributions.jl package. The results should be
collected into a DataFrame. We have the following function:

```julia
using Dagger, Random, Distributions, StatsBase, DataFrames

function f(dist, len, reps, σ)
    v = Vector{Float64}(undef, len) # preallocate to avoid repeated allocations
    maximum(mean(rand!(dist, v)) for _ in 1:reps) / σ
end
```

Let us consider the following probability distributions for the numerical
experiments, all of which have expected values equal to zero, and the following
lengths of vectors:

```julia
dists = [Cosine, Epanechnikov, Laplace, Logistic, Normal, NormalCanon,
         PGeneralizedGaussian, SkewNormal, SkewedExponentialPower,
         SymTriangularDist]
lens = [10, 20, 50, 100, 200, 500]
```
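As a quick sanity check (a sketch assuming Distributions.jl is installed), two of these distributions can be spot-checked at their default parameters:

```julia
using Distributions, StatsBase

# Both are zero-mean at default parameters:
mean(Normal())   # 0.0
std(Laplace())   # √2 ≈ 1.4142135623730951
```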
Using `Threads.@threads`, these experiments could be parallelized as:

```julia
function experiments_threads(dists, lens, K=1000)
    res = DataFrame()
    lck = ReentrantLock()
    Threads.@threads for T in dists
        dist = T()
        σ = std(dist)
        for L in lens
            z = f(dist, L, K, σ)
            lock(lck) do
                push!(res, (; T, σ, L, z))
            end
        end
    end
    res
end
```

Note that `push!` on a `DataFrame` is not a thread-safe operation, so we need
a locking mechanism to prevent two threads from appending to the DataFrame at
the same time.
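The locking pattern used above can be seen in isolation with plain Base threads (a minimal sketch, independent of DataFrames):

```julia
using Base.Threads

results = Int[]
lck = ReentrantLock()
@threads for i in 1:100
    # push! on a shared Vector is not thread-safe either,
    # so every append is serialized through the lock:
    lock(lck) do
        push!(results, i)
    end
end
length(results)  # 100 once all iterations finish
```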
The same code could be rewritten in Dagger as:

```julia
function experiments_dagger(dists, lens, K=1000)
    res = DataFrame()
    @sync for T in dists
        dist = T()
        σ = Dagger.@spawn std(dist)
        for L in lens
            z = Dagger.@spawn f(dist, L, K, σ)
            push!(res, (; T, σ, L, z))
        end
    end
    res.z = fetch.(res.z)
    res.σ = fetch.(res.σ)
    res
end
```

In this code we have job interdependence: first we calculate the standard
deviation `σ`, and then we use that value in the function `f`. Since
`Dagger.@spawn` yields an `EagerThunk` rather than an actual value, we need to
use the `fetch` function to obtain those values. In this example, the fetching
is performed once all computations are completed (note that `@sync` preceding
the loop forces the loop to wait for all jobs to complete). Also note that,
contrary to the previous example, we do not need to implement locking, as we
are pushing the `EagerThunk` results of `Dagger.@spawn` serially into the
DataFrame (which is fast, since `Dagger.@spawn` doesn't block).
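The `fetch`-based dataflow described above can be seen in minimal form (a sketch assuming Dagger is loaded): a spawned task can be passed directly to another `Dagger.@spawn` call, which resolves it to its value before running.

```julia
using Dagger

x = Dagger.@spawn 1 + 2   # returns an EagerThunk, not the number 3
y = Dagger.@spawn 10 * x  # x is resolved to 3 when this task runs
fetch(y)                  # blocks until the result is ready; returns 30
```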
The above use-case scenario has been tested by running `julia -t 8` (or with
`JULIA_NUM_THREADS=8` set as an environment variable). The `Threads.@threads`
code takes 1.8 seconds to run, while the Dagger code, which is also one line
shorter, runs in around 0.9 seconds, resulting in a 2x speedup.
