677 changes: 677 additions & 0 deletions _binder_notebooks/01_local_cluster_monte_carlo_estimate_of_pi.ipynb

Large diffs are not rendered by default.

822 changes: 822 additions & 0 deletions _binder_notebooks/02_slurm_cluster_monte_carlo_estimate_of_pi.ipynb

Large diffs are not rendered by default.

391 changes: 391 additions & 0 deletions _binder_notebooks/03_tuning_adaptive_clusters.ipynb
@@ -0,0 +1,391 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Monte-Carlo Estimate of $\\pi$\n",
"\n",
"We want to estimate the number $\\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods) exploiting that the area of a quarter circle of unit radius is $\\pi/4$ and that hence the probability of any randomly chosen point in a unit square to lie in a unit circle centerd at a corner of the unit square is $\\pi/4$ as well. So for N randomly chosen pairs $(x, y)$ with $x\\in[0, 1)$ and $y\\in[0, 1)$, we count the number $N_{circ}$ of pairs that also satisfy $(x^2 + y^2) < 1$ and estimage $\\pi \\approx 4 \\cdot N_{circ} / N$.\n",
"\n",
"[<img src=\"https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif\" \n",
" width=\"50%\" \n",
" align=top\n",
" alt=\"PI monte-carlo estimate\">](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods)"
]
},
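{
"cell_type": "markdown",
"metadata": {},
"source": [
"_As a quick sanity check (an optional addition, not part of the original workshop flow), the cell below is a minimal plain-NumPy sketch of the same estimator on a small sample; the Dask version further down applies the identical logic to much larger arrays._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative only: draw N points uniformly from the unit square and\n",
"# count how many fall inside the quarter circle of unit radius.\n",
"N = 1_000_000\n",
"xy = np.random.uniform(0, 1, size=(N, 2))\n",
"n_circ = ((xy ** 2).sum(axis=-1) < 1).sum()\n",
"print(\"pi estimate:\", 4 * n_circ / N)"
]
},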
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Core Lessons\n",
"\n",
"- Adaptive clusters\n",
"- Tuning the adaptivity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up a Slurm cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"from dask_jobqueue import SLURMCluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster = SLURMCluster(\n",
" cores=24,\n",
" processes=2,\n",
" memory=\"100GB\",\n",
" shebang='#!/usr/bin/env bash',\n",
" queue=\"batch\",\n",
" walltime=\"00:30:00\",\n",
" local_directory='/tmp',\n",
" death_timeout=\"15s\",\n",
" interface=\"ib0\",\n",
" log_directory=f'{os.environ[\"SCRATCH_cecam\"]}/{os.environ[\"USER\"]}/dask_jobqueue_logs/',\n",
" project=\"ecam\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = Client(cluster)\n",
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The job scripts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(cluster.job_script())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scale the cluster to two nodes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cluster.scale(4)"
]
},
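{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Optionally (not part of the original notebook), you can block until the requested workers have actually connected before submitting work. This assumes a `distributed` release that provides `Client.wait_for_workers`._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait until the 4 requested workers have registered with the scheduler\n",
"# (assumes Client.wait_for_workers is available in your distributed version).\n",
"client.wait_for_workers(4)\n",
"len(cluster.scheduler.workers)"
]
},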
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Monte Carlo Method"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import dask.array as da\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calc_pi_mc(size_in_bytes, chunksize_in_bytes=200e6):\n",
" \"\"\"Calculate PI using a Monte Carlo estimate.\"\"\"\n",
" \n",
" size = int(size_in_bytes / 8)\n",
" chunksize = int(chunksize_in_bytes / 8)\n",
" \n",
" xy = da.random.uniform(0, 1,\n",
" size=(size / 2, 2),\n",
" chunks=(chunksize / 2, 2))\n",
" \n",
" in_circle = ((xy ** 2).sum(axis=-1) < 1)\n",
" pi = 4 * in_circle.mean()\n",
"\n",
" return pi"
]
},
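{
"cell_type": "markdown",
"metadata": {},
"source": [
"_A quick optional check of what `calc_pi_mc` returns: the result is a lazy Dask scalar, so no random numbers are generated until `.compute()` is called in the loops below._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Building the task graph is cheap; execution only happens on .compute().\n",
"lazy_pi = calc_pi_mc(1e9)\n",
"lazy_pi"
]
},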
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def print_pi_stats(size, pi, time_delta, num_workers):\n",
" \"\"\"Print pi, calculate offset from true value, and print some stats.\"\"\"\n",
" print(f\"{size / 1e9} GB\\n\"\n",
" f\"\\tMC pi: {pi : 13.11f}\"\n",
" f\"\\tErr: {abs(pi - np.pi) : 10.3e}\\n\"\n",
" f\"\\tWorkers: {num_workers}\"\n",
" f\"\\t\\tTime: {time_delta : 7.3f}s\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The actual calculations\n",
"\n",
"We loop over different volumes of double-precision random numbers and estimate $\\pi$ as described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from time import time"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for size in (1e9 * n for n in (1, 10, 100)):\n",
" \n",
" start = time()\n",
" pi = calc_pi_mc(size).compute()\n",
" elaps = time() - start\n",
"\n",
" print_pi_stats(size, pi, time_delta=elaps,\n",
" num_workers=len(cluster.scheduler.workers))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scaling the Cluster to twice its size\n",
"\n",
"We increase the number of workers by 2 and the re-run the experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from time import sleep"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_num_workers = 2 * len(cluster.scheduler.workers)\n",
"\n",
"print(f\"Scaling from {len(cluster.scheduler.workers)} to {new_num_workers} workers.\")\n",
"\n",
"cluster.scale(new_num_workers)\n",
"\n",
"sleep(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Re-run same experiments with doubled cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for size in (1e9 * n for n in (1, 10, 100)):\n",
" \n",
" \n",
" start = time()\n",
" pi = calc_pi_mc(size).compute()\n",
" elaps = time() - start\n",
"\n",
" print_pi_stats(size, pi,\n",
" time_delta=elaps,\n",
" num_workers=len(cluster.scheduler.workers))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automatically scale the cluster towards a target duration\n",
"\n",
"We'll target a wall time of 30 seconds.\n",
"\n",
"_**Watch** how the cluster will scale down to the minimum a few seconds after being made adaptive._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ca = cluster.adapt(\n",
" minimum=2, maximum=30,\n",
" target_duration=\"360s\", # measured in CPU time per worker\n",
" # -> 30 seconds at 12 cores / worker\n",
" scale_factor=1.0 # prevent from scaling up because of CPU or MEM need\n",
");\n",
"\n",
"sleep(4) # Allow for scale-down"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Repeat the calculation from above with larger work loads\n",
"\n",
"(And watch the dash board!)"
]
},
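{
"cell_type": "markdown",
"metadata": {},
"source": [
"_If the dashboard is not already open, its URL can be printed from the client (an optional addition; `Client.dashboard_link` is assumed to be available in your `distributed` version)._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the scheduler dashboard URL so the adaptive scaling can be followed live.\n",
"print(client.dashboard_link)"
]
},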
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for size in (n * 1e9 for n in (200, 400, 800)):\n",
" \n",
" \n",
" start = time()\n",
" pi = calc_pi_mc(size, min(size / 1000, 500e6)).compute()\n",
" elaps = time() - start\n",
"\n",
" print_pi_stats(size, pi, time_delta=elaps,\n",
" num_workers=len(cluster.scheduler.workers))\n",
" \n",
" sleep(20) # allow for scale-down time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"- adaptivity with a target duration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Complete listing of software used here"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip list"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%conda list --explicit"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:dask_jobqueue_workshop]",
"language": "python",
"name": "conda-env-dask_jobqueue_workshop-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}