80 changes: 80 additions & 0 deletions episodes/notebooks/0-introduction.ipynb
@@ -0,0 +1,80 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e711a389-fa02-411b-a549-4612172c6a01",
"metadata": {},
"source": [
"# This course\n",
"* Is designed for researchers who write Python but lack formal computer science training\n",
"* Teaches how to assess where time is spent during the execution of a Python program\n",
"* Provides a high-level understanding of how code executes\n",
"* Explains how execution maps to performance bottlenecks and highlights good practices"
]
},
{
"cell_type": "markdown",
"id": "e2ab7b89-eada-4c43-8259-066cd5ec53ab",
"metadata": {},
"source": [
"# Expected Outcomes:\n",
"After this training, participants will be able to:\n",
"* Use tools like cProfile and line_profiler to find which functions or lines of code take the most time.\n",
"* Inspect code to understand what slows it down.\n",
"* Recognise some common performance problems and apply fixes to make code run faster."
]
},
{
"cell_type": "markdown",
"id": "303b5679-d006-4682-aa54-f818a77dbffb",
"metadata": {},
"source": [
"# Requirements to follow along with the course:\n",
"1. Create a conda environment using Python 3.11 or newer.\n",
"   On the command line, run:\n",
"```bash\n",
"conda create --name py311_env python=3.11\n",
"conda activate py311_env\n",
"```\n",
"2. Install the required packages (the quotes stop shells such as zsh from treating the square brackets as a glob pattern):\n",
"```bash\n",
"pip install pytest snakeviz \"line_profiler[all]\" numpy pandas matplotlib\n",
"```\n",
"\n",
"**Note for macOS users**:\n",
"`line_profiler` can also be installed using conda:\n",
"```bash\n",
"conda install -c conda-forge line_profiler\n",
"```\n",
"\n",
"3. Install JupyterLab and register an IPython kernel for the environment:\n",
"```bash\n",
"pip install jupyterlab\n",
"conda install ipykernel -y\n",
"python -m ipykernel install --user --name py311_env --display-name \"py311_env\"\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
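As a quick check of the setup steps in the notebook above, the sketch below reports which of the required packages cannot be found in the active environment. The function name `find_missing_packages` and the `REQUIRED` list are hypothetical, and the import names are assumed to match the package names given in the installation commands.

```python
import importlib.util

# Assumption: each package is importable under the same name used to install it.
REQUIRED = [
    "pytest",
    "snakeviz",
    "line_profiler",
    "numpy",
    "pandas",
    "matplotlib",
    "jupyterlab",
]


def find_missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated `py311_env` environment; an empty result suggests the installation steps completed.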
188 changes: 188 additions & 0 deletions episodes/notebooks/1-profiling-introduction.ipynb
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a47fdbc2-1d23-4913-9474-49c255ecbcd9",
"metadata": {},
"source": [
"# What is Performance Profiling?\n",
"It is the process of measuring and analysing the performance of code while it runs.\n",
"It is a form of dynamic analysis: the program is observed during execution rather than inspected statically.\n",
"\n",
"# Why should you profile your code?\n",
"##### 1. To assess the performance of your program/code, i.e., identify which operations take the longest to execute\n",
"##### 2. It is useful when code grows more complex, making slow parts harder to spot\n",
"##### 3. Profiling highlights true bottlenecks, avoiding wasted effort on minor optimisations, and can lead to dramatic speedups.\n",
"##### 4. In HPC and beyond, profiling also helps ensure efficient use of energy and resources.\n",
"##### 5. It is a quick and inexpensive process, i.e., you get near-instantaneous feedback about your code's performance\n",
"* If no bottleneck is identified, you can be more confident that your code performs well for the case you profiled\n",
"* Otherwise, the profiler will identify the piece of code that can benefit from optimisation.\n",
"##### 6. Profiling is for everyone, not only for novices!\n"
]
},
{
"cell_type": "markdown",
"id": "239476f1-1f05-4151-98f8-8586d7872897",
"metadata": {},
"source": [
"## Different types of profilers:\n",
"* Manual profiling\n",
"* Function-Level Profiling\n",
"* Line-Level Profiling, among others"
]
},
{
"cell_type": "markdown",
"id": "849a0bb2-5fd2-4a74-b41b-85cfa9384158",
"metadata": {},
"source": [
"# 1. Manual profiling\n",
"* Manual profiling means adding timers around sections of code by hand.\n",
"* It provides a simple way to measure execution time and get a basic form of profiling.\n",
"* However, it is intrusive: it adds many `temporary lines of code` that have to be removed again afterwards."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d2979752-71b3-4b56-89b2-5f40bce977bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"first hello\n",
"hello all\n",
"A: 0.0003226249828003347 seconds\n",
"B: 3.500000457279384e-05 seconds\n",
"C: 6.700001540593803e-05 seconds\n",
"Total: 0.00044095798511989415 seconds\n"
]
}
],
"source": [
"# example of manual profiling:\n",
"import time\n",
"\n",
"# Record timestamps before and after different sections of code using time.monotonic()\n",
"# Block A: print a string\n",
"t_a = time.monotonic()\n",
"print('first hello')\n",
"\n",
"# Block B: assign a string\n",
"t_b = time.monotonic()\n",
"a = \"hello\"\n",
"\n",
"# Block C: concatenate and print\n",
"t_c = time.monotonic()\n",
"c = a + \" all\"\n",
"print(c)\n",
"\n",
"t_d = time.monotonic()\n",
"mainTimer_stop = time.monotonic()\n",
"\n",
"# Calculate the time taken by subtracting the start time from the end time for each block.\n",
"print(f\"A: {t_b - t_a} seconds\")\n",
"print(f\"B: {t_c - t_b} seconds\")\n",
"print(f\"C: {t_d - t_c} seconds\")\n",
"# Total elapsed time from the first timestamp to the last\n",
"print(f\"Total: {mainTimer_stop - t_a} seconds\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "cacf5d33-73d3-4d31-9019-929dc6f5813b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7316456299404912"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fraction of the total runtime spent in block A (A / total)\n",
"0.0003226249828003347/0.00044095798511989415"
]
},
{
"cell_type": "markdown",
"id": "b284fa39-1d01-45c4-9346-d5e73fe6f201",
"metadata": {},
"source": [
"### Summary of manual profiling:\n",
"* It is handy for small sections of code.\n",
"* It becomes increasingly impractical as a project grows in size and complexity.\n",
"* Routinely adding and removing timing statements is also time-consuming when the timings are not part of the program's intended output."
]
},
{
"cell_type": "markdown",
"id": "db4706da-ddd2-4c5a-acd7-c82889eedba2",
"metadata": {},
"source": [
"# 2. Function-Level Profiling\n",
"Software is made up of many functions, including those you write and those from the standard library or third-party packages.\n",
"\n",
"* `Function-level profiling` measures how much time your program spends in each function, both including and excluding time spent in the functions it calls.\n",
"* It also counts how often each function is called.\n",
"* It helps identify the functions that take up the most time, so you can focus on optimising them.\n",
"* `Function-level profiling` may not always give enough detail, especially if a function is long or complex.\n",
"\n",
"In this course, we will use `cProfile` for function-level profiling and `snakeviz` to visualise the results."
]
},
{
"cell_type": "markdown",
"id": "7649e2fd-5abc-4004-ba53-e74db341d5dc",
"metadata": {},
"source": [
"# 3. Line-Level Profiling\n",
"\n",
"* `Line-level profiling` looks at how much time is spent on `each individual line of code`.\n",
"* This helps identify specific lines that take up a large portion of the total runtime.\n",
"* In this course, we will use `line_profiler` for `line-level profiling`.\n",
"* `line_profiler` is deterministic: it tracks every executed line of the code it profiles, which can be very expensive.\n",
"* To keep the cost down, profiling is restricted to functions marked with the `@profile` decorator.\n"
]
},
{
"cell_type": "markdown",
"id": "ebba3060-dccc-483b-bdee-95306246f6a7",
"metadata": {},
"source": [
"## Start Small, Scale Smart\n",
"Profile a representative test case that is large enough to expose any bottlenecks while still running to completion quickly.\n",
"\n",
"* Profiling slows programs down, so use a small, representative test case.\n",
"\n",
"* Keep runs short (a few minutes if possible) to avoid huge volumes of profiling output.\n",
"\n",
"* Start small (e.g., one day of a year-long model) and scale up only if needed to spot bottlenecks."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
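The function-level profiling described in the notebook above can be sketched with the standard-library `cProfile` and `pstats` modules. This is a minimal illustration, not the course's own exercise: `inner_work`, `outer_work` and `profile_to_text` are hypothetical names, and the workload sizes are arbitrary.

```python
import cProfile
import io
import pstats


def inner_work(n):
    """Child function: sum of squares, called many times by outer_work."""
    return sum(i * i for i in range(n))


def outer_work():
    """Parent function: does little itself and delegates to inner_work."""
    return [inner_work(5_000) for _ in range(50)]


def profile_to_text(func):
    """Run func under cProfile and return the statistics table as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time, which includes time spent in called functions,
    # and print at most the first 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


report = profile_to_text(outer_work)
print(report)
```

In the printed table, `ncalls` is the number of calls, `tottime` excludes time spent in called functions, and `cumtime` includes it. To inspect a profile with `snakeviz`, one common route is `python -m cProfile -o out.prof script.py` followed by `snakeviz out.prof`, where `out.prof` and `script.py` are placeholder names.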