Suggestion: use n_workers=2 when starting Dask cluster #12

@paigem

Currently in the dask.ipynb Dask tutorial, the cluster is spun up with only one worker. This doesn't raise any errors, but I think it is suboptimal for demonstrating the dask.delayed functionality near the end of the notebook. With only one worker, the following two blocks of code run in the same amount of time (~400ms):

import time

# Define functions (each one sleeps to simulate work)
def inc(x):
    time.sleep(0.1)
    return x + 1

def dec(x):
    time.sleep(0.1)
    return x - 1
    
def add(x, y):
    time.sleep(0.2)
    return x + y 

# Run without using Dask
x = inc(1)
y = dec(2)
z = add(x, y)
z

# Run using Dask
import dask

inc = dask.delayed(inc)
dec = dask.delayed(dec)
add = dask.delayed(add)

x = inc(1)
y = dec(2)
z = add(x, y)
z.compute()

With (at least) two workers, the second block runs in ~300ms: x and y are computed in parallel (which can be seen on the Dask dashboard), so their two 0.1s sleeps overlap before the 0.2s add runs. This shows the user how dask.delayed decreases the time it takes to run this example.
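To sanity-check the timing argument without starting a cluster, here is a rough sketch that uses a standard-library ThreadPoolExecutor as a stand-in for Dask workers. It is not Dask itself, and the helper name timed_run is made up for illustration. The point is only that inc and dec can overlap once a second worker is available, while add always has to wait for both.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    time.sleep(0.1)
    return x + 1


def dec(x):
    time.sleep(0.1)
    return x - 1


def add(x, y):
    time.sleep(0.2)
    return x + y


def timed_run(n_workers):
    """Hypothetical helper: run inc, dec, then add on n_workers threads.

    Returns (result, elapsed_seconds). The thread pool only imitates
    Dask workers; it does not use Dask's scheduler.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # inc and dec are independent, so both are submitted up front.
        x = pool.submit(inc, 1)
        y = pool.submit(dec, 2)
        # add depends on both results, so it starts only after they finish.
        z = pool.submit(add, x.result(), y.result())
        result = z.result()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 2):
        result, seconds = timed_run(n)
        print(f"{n} worker(s): result={result}, elapsed={seconds:.2f}s")
```

On an otherwise idle machine I would expect roughly 0.4s with one worker and roughly 0.3s with two, matching the numbers above, though exact timings will vary.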

So I suggest adding n_workers=2 to the following lines of code near the beginning of the tutorial:

from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=2)

Alternatively, we could spin up a new cluster with 2 workers specifically for the dask.delayed portion of the tutorial. Any preferences from folks, e.g. @rabernat?

Again, this was noticed by @NickMortimer during a workshop at the Dask Summit this week. 🙂
