677 changes: 677 additions & 0 deletions _binder_notebooks/01_local_cluster_monte_carlo_estimate_of_pi.ipynb

Large diffs are not rendered by default.

822 changes: 822 additions & 0 deletions _binder_notebooks/02_slurm_cluster_monte_carlo_estimate_of_pi.ipynb

Large diffs are not rendered by default.

391 changes: 391 additions & 0 deletions _binder_notebooks/03_tuning_adaptive_clusters.ipynb
@@ -0,0 +1,391 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Monte-Carlo Estimate of $\\pi$\n",
"\n",
"We want to estimate the number $\\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods) exploiting that the area of a quarter circle of unit radius is $\\pi/4$ and that hence the probability of any randomly chosen point in a unit square to lie in a unit circle centerd at a corner of the unit square is $\\pi/4$ as well. So for N randomly chosen pairs $(x, y)$ with $x\\in[0, 1)$ and $y\\in[0, 1)$, we count the number $N_{circ}$ of pairs that also satisfy $(x^2 + y^2) < 1$ and estimage $\\pi \\approx 4 \\cdot N_{circ} / N$.\n",
"\n",
"[<img src=\"https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif\" \n",
" width=\"50%\" \n",
" align=top\n",
" alt=\"PI monte-carlo estimate\">](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods)"
]
},
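{
"cell_type": "markdown",
"metadata": {},
"source": [
"_As a quick sanity check (an optional addition, not part of the original workshop flow), the cell below is a minimal plain-NumPy sketch of the same estimator on a small sample; the Dask version further down applies the identical logic to much larger arrays._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative only: draw N points uniformly from the unit square and\n",
"# count how many fall inside the quarter circle of unit radius.\n",
"N = 1_000_000\n",
"xy = np.random.uniform(0, 1, size=(N, 2))\n",
"n_circ = ((xy ** 2).sum(axis=-1) < 1).sum()\n",
"print(\"pi estimate:\", 4 * n_circ / N)"
]
},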
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Core Lessons\n",
"\n",
"- Adaptive clusters\n",
"- Tuning the adaptivity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up a Slurm cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"from dask_jobqueue import SLURMCluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster = SLURMCluster(\n",
" cores=24,\n",
" processes=2,\n",
" memory=\"100GB\",\n",
" shebang='#!/usr/bin/env bash',\n",
" queue=\"batch\",\n",
" walltime=\"00:30:00\",\n",
" local_directory='/tmp',\n",
" death_timeout=\"15s\",\n",
" interface=\"ib0\",\n",
" log_directory=f'{os.environ[\"SCRATCH_cecam\"]}/{os.environ[\"USER\"]}/dask_jobqueue_logs/',\n",
" project=\"ecam\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = Client(cluster)\n",
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The job scripts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(cluster.job_script())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scale the cluster to two nodes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster.scale(4)"
]
},
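{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Optionally (not part of the original notebook), you can block until the requested workers have actually connected before submitting work. This assumes a `distributed` release that provides `Client.wait_for_workers`._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the 4 requested workers have registered with the scheduler\n",
"# (assumes Client.wait_for_workers is available in your distributed version).\n",
"client.wait_for_workers(4)\n",
"len(cluster.scheduler.workers)"
]
},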
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Monte Carlo Method"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import dask.array as da\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calc_pi_mc(size_in_bytes, chunksize_in_bytes=200e6):\n",
" \"\"\"Calculate PI using a Monte Carlo estimate.\"\"\"\n",
" \n",
" size = int(size_in_bytes / 8)\n",
" chunksize = int(chunksize_in_bytes / 8)\n",
" \n",
" xy = da.random.uniform(0, 1,\n",
" size=(size / 2, 2),\n",
" chunks=(chunksize / 2, 2))\n",
" \n",
" in_circle = ((xy ** 2).sum(axis=-1) < 1)\n",
" pi = 4 * in_circle.mean()\n",
"\n",
" return pi"
]
},
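{
"cell_type": "markdown",
"metadata": {},
"source": [
"_A quick optional check of what `calc_pi_mc` returns: the result is a lazy Dask scalar, so no random numbers are generated until `.compute()` is called in the loops below._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Building the task graph is cheap; execution only happens on .compute().\n",
"lazy_pi = calc_pi_mc(1e9)\n",
"lazy_pi"
]
},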
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def print_pi_stats(size, pi, time_delta, num_workers):\n",
" \"\"\"Print pi, calculate offset from true value, and print some stats.\"\"\"\n",
" print(f\"{size / 1e9} GB\\n\"\n",
" f\"\\tMC pi: {pi : 13.11f}\"\n",
" f\"\\tErr: {abs(pi - np.pi) : 10.3e}\\n\"\n",
" f\"\\tWorkers: {num_workers}\"\n",
" f\"\\t\\tTime: {time_delta : 7.3f}s\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The actual calculations\n",
"\n",
"We loop over different volumes of double-precision random numbers and estimate $\\pi$ as described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from time import time"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for size in (1e9 * n for n in (1, 10, 100)):\n",
" \n",
" start = time()\n",
" pi = calc_pi_mc(size).compute()\n",
" elaps = time() - start\n",
"\n",
" print_pi_stats(size, pi, time_delta=elaps,\n",
" num_workers=len(cluster.scheduler.workers))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scaling the Cluster to twice its size\n",
"\n",
"We increase the number of workers by 2 and the re-run the experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from time import sleep"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_num_workers = 2 * len(cluster.scheduler.workers)\n",
"\n",
"print(f\"Scaling from {len(cluster.scheduler.workers)} to {new_num_workers} workers.\")\n",
"\n",
"cluster.scale(new_num_workers)\n",
"\n",
"sleep(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Re-run same experiments with doubled cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for size in (1e9 * n for n in (1, 10, 100)):\n",
" \n",
" \n",
" start = time()\n",
" pi = calc_pi_mc(size).compute()\n",
" elaps = time() - start\n",
"\n",
" print_pi_stats(size, pi,\n",
" time_delta=elaps,\n",
" num_workers=len(cluster.scheduler.workers))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automatically scale the cluster towards a target duration\n",
"\n",
"We'll target a wall time of 30 seconds.\n",
"\n",
"_**Watch** how the cluster will scale down to the minimum a few seconds after being made adaptive._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ca = cluster.adapt(\n",
" minimum=2, maximum=30,\n",
" target_duration=\"360s\", # measured in CPU time per worker\n",
" # -> 30 seconds at 12 cores / worker\n",
" scale_factor=1.0 # prevent from scaling up because of CPU or MEM need\n",
");\n",
"\n",
"sleep(4) # Allow for scale-down"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Repeat the calculation from above with larger work loads\n",
"\n",
"(And watch the dash board!)"
]
},
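{
"cell_type": "markdown",
"metadata": {},
"source": [
"_If the dashboard is not already open, its URL can be printed from the client (an optional addition; `Client.dashboard_link` is assumed to be available in your `distributed` version)._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the scheduler dashboard URL so the adaptive scaling can be followed live.\n",
"print(client.dashboard_link)"
]
},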
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for size in (n * 1e9 for n in (200, 400, 800)):\n",
" \n",
" \n",
" start = time()\n",
" pi = calc_pi_mc(size, min(size / 1000, 500e6)).compute()\n",
" elaps = time() - start\n",
"\n",
" print_pi_stats(size, pi, time_delta=elaps,\n",
" num_workers=len(cluster.scheduler.workers))\n",
" \n",
" sleep(20) # allow for scale-down time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"- adaptivity with a target duration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Complete listing of software used here"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip list"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%conda list --explicit"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:dask_jobqueue_workshop]",
"language": "python",
"name": "conda-env-dask_jobqueue_workshop-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}