diff --git a/episodes/notebooks/0-introduction.ipynb b/episodes/notebooks/0-introduction.ipynb new file mode 100644 index 00000000..981f59cb --- /dev/null +++ b/episodes/notebooks/0-introduction.ipynb @@ -0,0 +1,80 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e711a389-fa02-411b-a549-4612172c6a01", + "metadata": {}, + "source": [ + "# This course\n", + "* Is designed for researchers who write Python but lack formal computer science training\n", + "* Teaches how to assess where time is spent during the execution of a Python program\n", + "* Provides a high-level understanding of how code executes\n", + "* Explains how execution maps to performance bottlenecks and highlights good practices" + ] + }, + { + "cell_type": "markdown", + "id": "e2ab7b89-eada-4c43-8259-066cd5ec53ab", + "metadata": {}, + "source": [ + "# Expected Outcomes:\n", + "After this training, participants will be able to:\n", + "* Use tools like cProfile and line_profiler to find which functions or lines of code take the most time.\n", + "* Inspect code to understand what slows it down.\n", + "* Recognise some common performance problems and apply fixes to make code run faster." + ] + }, + { + "cell_type": "markdown", + "id": "303b5679-d006-4682-aa54-f818a77dbffb", + "metadata": {}, + "source": [ + "# Requirements to follow along with the course:\n", + "1. Create a conda environment using Python 3.11 or newer\n", + " On the command line, run:\n", + "```bash\n", + "conda create --name py311_env python=3.11\n", + "conda activate py311_env\n", + "```\n", + "2. Install the required packages (the quotes stop shells such as zsh from treating the square brackets as a glob pattern):\n", + "```bash\n", + "pip install pytest snakeviz \"line_profiler[all]\" numpy pandas matplotlib\n", + "```\n", + "\n", + "**Note for macOS users**:\n", + "`line_profiler` can also be installed using conda:\n", + "```bash\n", + "conda install -c conda-forge line_profiler\n", + "```\n", + "\n", + "3. 
Install JupyterLab and register an ipykernel:\n", + "```bash\n", + "pip install jupyterlab\n", + "conda install ipykernel -y\n", + "python -m ipykernel install --user --name py311_env --display-name \"py311_env\"\n", + "```\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/episodes/notebooks/1-profiling-introduction.ipynb b/episodes/notebooks/1-profiling-introduction.ipynb new file mode 100644 index 00000000..a9ba2889 --- /dev/null +++ b/episodes/notebooks/1-profiling-introduction.ipynb @@ -0,0 +1,188 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a47fdbc2-1d23-4913-9474-49c255ecbcd9", + "metadata": {}, + "source": [ + "# What is Performance Profiling?\n", + "It is the process of analysing and measuring the performance of running code.\n", + "It is a form of dynamic analysis, i.e., the code is measured while it executes.\n", + "\n", + "# Why should you profile your code?\n", + "##### 1. To assess the performance of your program, i.e., to identify which operations take the longest to execute\n", + "##### 2. It is useful as code grows more complex, which makes slow parts harder to spot\n", + "##### 3. Profiling highlights true bottlenecks, avoiding wasted effort on minor optimisations, and can lead to dramatic speedups.\n", + "##### 4. In HPC and beyond, profiling also helps ensure efficient use of energy and resources.\n", + "##### 5. 
It is a quick and inexpensive process, i.e., you get near-instant feedback about your code's performance\n", + "* If no bottleneck is identified, you can be more confident that your code is performant\n", + "* Otherwise, the profiler will identify the piece of code that can benefit from optimisation.\n", + "##### 6. Profiling is for everyone, not only for novices!\n" + ] + }, + { + "cell_type": "markdown", + "id": "239476f1-1f05-4151-98f8-8586d7872897", + "metadata": {}, + "source": [ + "## Different types of profilers:\n", + "* Manual profiling\n", + "* Function-Level Profiling\n", + "* Line-Level Profiling, among others" + ] + }, + { + "cell_type": "markdown", + "id": "849a0bb2-5fd2-4a74-b41b-85cfa9384158", + "metadata": {}, + "source": [ + "# 1. Manual profiling\n", + "* It means manually adding timers around sections of code\n", + "* It provides a simple way to measure execution time and get a basic form of profiling.\n", + "* However, it is intrusive, as we add plenty of `temporary lines of code`" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "d2979752-71b3-4b56-89b2-5f40bce977bb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "first hello\n", + "hello all\n", + "A: 0.0003226249828003347 seconds\n", + "B: 3.500000457279384e-05 seconds\n", + "C: 6.700001540593803e-05 seconds\n", + "C: 0.00044095798511989415 seconds\n" + ] + } + ], + "source": [ + "# example of manual profiling:\n", + "import time\n", + "\n", + "# Record timestamps before and after different sections of code using time.monotonic()\n", + "t_a = time.monotonic()\n", + "print('first hello')\n", + "\n", + "t_b = time.monotonic()\n", + "a = \"hello\"\n", + "\n", + "t_c = time.monotonic()\n", + "c = a + \" all\"\n", + "print(c)\n", + "\n", + "t_d = time.monotonic()\n", + "mainTimer_stop = time.monotonic()\n", + "\n", + "# Calculate the time taken by subtracting the start time from the end time for each block.\n", + "print(f\"A: 
{t_b - t_a} seconds\")\n", + "print(f\"B: {t_c - t_b} seconds\")\n", + "print(f\"C: {t_d - t_c} seconds\")\n", + "print(f\"C: {mainTimer_stop - t_a} seconds\")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "cacf5d33-73d3-4d31-9019-929dc6f5813b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.7316456299404912" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "0.0003226249828003347/0.00044095798511989415" + ] + }, + { + "cell_type": "markdown", + "id": "b284fa39-1d01-45c4-9346-d5e73fe6f201", + "metadata": {}, + "source": [ + "### Summary of manual profiling:\n", + "* It is handy for small sections of code\n", + "* It becomes increasingly impractical as a project grows in size and complexity\n", + "* It is also time-consuming to routinely add and remove these timing statements when they are not wanted in the program's output" + ] + }, + { + "cell_type": "markdown", + "id": "db4706da-ddd2-4c5a-acd7-c82889eedba2", + "metadata": {}, + "source": [ + "# 2. Function-Level Profiling\n", + "Software is made up of many functions, including those you write and those from the standard library or third-party packages.\n", + "\n", + "* `Function-level profiling` measures how much time your program spends in each function, both including and excluding time spent in child functions\n", + "* It counts how often each function is called\n", + "* It helps identify the functions that take up the most time, so you can focus on optimising them.\n", + "* `Function-level profiling` may not always give enough detail, especially if a function is particularly complex.\n", + "\n", + "In this course, we will use `cProfile` for function-level profiling and `snakeviz` to visualise the results." + ] + }, + { + "cell_type": "markdown", + "id": "7649e2fd-5abc-4004-ba53-e74db341d5dc", + "metadata": {}, + "source": [ + "# 3. 
Line-Level Profiling\n", + "\n", + "* `Line-level profiling` looks at how much time is spent on `each individual line of code`.\n", + "* This helps identify specific lines that take up a large portion of the total runtime.\n", + "* In this course, we will use `line_profiler` for `line-level profiling`.\n", + "* `line_profiler` is deterministic: it tracks every line of code executed, which can be very expensive.\n", + "* To keep the cost down, profiling is restricted to functions marked with the `@profile` decorator.\n" + ] + }, + { + "cell_type": "markdown", + "id": "ebba3060-dccc-483b-bdee-95306246f6a7", + "metadata": {}, + "source": [ + "## Start Small, Scale Smart\n", + "Profile a representative test case that is large enough to expose any bottlenecks whilst still running to completion quickly.\n", + "\n", + "* Profiling slows programs down, so use a small, representative test case.\n", + "\n", + "* Keep runs short (a few minutes if possible) to avoid huge output data.\n", + "\n", + "* Start small (e.g., one day of a year-long model) and scale up only if needed to expose bottlenecks." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/episodes/notebooks/2-profiling-functions.ipynb b/episodes/notebooks/2-profiling-functions.ipynb new file mode 100644 index 00000000..8fe15365 --- /dev/null +++ b/episodes/notebooks/2-profiling-functions.ipynb @@ -0,0 +1,501 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a45fbb65-3542-47af-a57e-076033e1f147", + "metadata": {}, + "source": [ + "# Function profiling\n", + "* When is function level profiling appropriate?\n", + "* How can `cProfile` and `snakeviz` tools be used to profile a Python program?\n", + "* How are the outputs from function level profiling interpreted?" 
+ ] + }, + { + "cell_type": "markdown", + "id": "26c6f773-6a70-4a19-8ed4-35abd37abc77", + "metadata": {}, + "source": [ + "### Software is usually made up of a hierarchy of function calls:\n", + "* Some written by the developer\n", + "\n", + "* Others from the standard library or third-party packages\n", + "\n", + "#### `Function-level profiling` shows where time is being spent in functions.\n", + "* It counts how many times each function is called\n", + " \n", + "* It measures the total time spent in each function, both including and excluding calls to child functions\n", + "This helps quickly find functions that take up a large share of the runtime.\n", + "\n", + "#### In this episode, we will:\n", + "\n", + "* Use the function-level profiler `cProfile`\n", + "\n", + "* Visualise the output with `snakeviz`\n", + "\n", + "* Learn how to interpret the results" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "0085d249-088e-47b9-8021-eb0e19a8b285", + "metadata": {}, + "outputs": [], + "source": [ + "# ![title](stack.png)" + ] + }, + { + "cell_type": "markdown", + "id": "cb57ed26-da76-4077-a7db-122885bef904", + "metadata": {}, + "source": [ + "# What is a Call Stack?\n", + "* A last-in-first-out (LIFO) data structure\n", + "* It keeps track of function calls and their associated variables\n", + "* Information about a function is kept for as long as it is active, i.e., still running\n", + "\n", + "Example of a function call stack, printed using the `traceback` module:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7a05824c", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import traceback\n", + "\n", + "def a():\n", + "    b1()\n", + "    b2()\n", + " \n", + "def b1():\n", + "    pass\n", + " \n", + "def b2():\n", + "    c()\n", + " \n", + "def c():\n", + "    traceback.print_stack()\n", + "\n", + "a()\n", + "\n", + "# File \"/var/folders/s0/2zsybdkd50s2xdxzyg37j4611m1b1l/T/ipykernel_16681/3225044458.py\", line 16, in <module>\n", + "# a()\n", + "# File 
\"/var/folders/s0/2zsybdkd50s2xdxzyg37j4611m1b1l/T/ipykernel_16681/3225044458.py\", line 5, in a\n", + "# b2()\n", + "# File \"/var/folders/s0/2zsybdkd50s2xdxzyg37j4611m1b1l/T/ipykernel_16681/3225044458.py\", line 11, in b2\n", + "# c()\n", + "# File \"/var/folders/s0/2zsybdkd50s2xdxzyg37j4611m1b1l/T/ipykernel_16681/3225044458.py\", line 14, in c\n", + "# traceback.print_stack()" + ] + }, + { + "cell_type": "markdown", + "id": "de2c4b5c-7eb5-41ee-8391-484428444e82", + "metadata": {}, + "source": [ + "The stack trace shows the filename and line number of each function call.\n", + "\n", + "You may see stack traces like this when an unhandled exception is thrown by your code.\n", + "Stack traces are very helpful to identify the source line of the error" + ] + }, + { + "cell_type": "markdown", + "id": "90218270-5ded-42c2-b5aa-c3c01fb6b199", + "metadata": {}, + "source": [ + "# `cProfile`" + ] + }, + { + "cell_type": "markdown", + "id": "4e71fc96-2f99-4d74-b0b1-1bd222635468", + "metadata": {}, + "source": [ + "`cProfile` is part of python standard library, i.e., no need to install it.\n", + "* you can import it from within your python code\n", + "* or run it in command line as follows:\n", + " ``` bash\n", + " python -m cProfile -o