11# Profiling
22
3- ``` {questions }
4- - When shall we worry about the performance of our code?
5- - How do we find bottlenecks in our code?
6- - How do we measure improvements in running time and memory usage?
7- ```
3+ :::{objectives}
4+ - Understand when improving code performance is worth the time and effort.
5+ - Know how to find performance bottlenecks in Python code.
6+ - Try `scalene` as one of many tools to profile Python code.
7+ :::
88
9- ``` {objectives}
10- - Understand when improving code performance is worth the time and effort.
11- - Learn how to use profilers in Python.
12- - Use `scalene` to find and optimize bottlenecks in a given code example.
13- ```
9+ :::{instructor-note}
10+ - Discussion: 20 min
11+ - Exercise: 20 min
12+ :::
1413
1514
16- > [ !IMPORTANT]
17- > Left to do:
18- > Give 20 minutes introduction to profiling:
19- > - [ ] Discuss when to profile
20- > - [ ] Discuss breifly manual profiling
21- > - [ ] Introduce function call profilers
22- > - [ ] Introduce line profilers
23- > - [ ] Visualize one code example using ` scalane `
15+ ## Should we even optimize the code?
2416
17+ Classic quote to keep in mind: "Premature optimization is the root of all evil." [Donald Knuth]
18+
19+ :::{discussion}
20+ It is important to ask ourselves whether optimizing is worth the effort.
21+ - Is it worth spending e.g. 2 days to make a program run 20% faster?
22+ - Is it worth optimizing the code so that it uses 90% less memory?
23+
24+ It depends. What does it depend on?
25+ :::
26+
27+
28+ ## Measure instead of guessing
29+
30+ Before doing code surgery to optimize the run time or lower the memory usage,
31+ we should **measure** where the bottlenecks are. This is called **profiling**.
32+
33+ Analogy: Medical doctors don't start surgery based on guessing. They first measure
34+ (X-ray, MRI, ...) to know precisely where the problem is.
35+
36+ Without measuring, it is easy to guess wrong. This applies not only to beginners:
37+ experienced programmers are also often surprised by the results of profiling.
38+
39+
40+ ## One of the simplest tools is to insert timers
41+
42+ Below we will list some tools that can be used to profile Python code.
43+ But even without these tools you can find **time-consuming parts** of your code
44+ by inserting timers:
45+
46+
47+
48+ ```{code-block} python
49+ ---
50+ emphasize-lines: 1,8,10
51+ ---
52+ import time
53+
54+
55+ # ...
56+ # code before the function
57+
58+
59+ start = time.time()
60+ result = some_function()
61+ print(f"some_function took {time.time() - start} seconds")
2562
26- ## Exercises
2763
28- :::{exercise} Exercise Profiling-1
29- Work in progress: we will provide an exercise showing the improvement in
30- performance when introducing numpy and/or pandas.
64+ # code after the function
65+ # ...
66+ ```
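
The timer pattern above can be wrapped in a small reusable helper. The sketch below is one possible way to do it, not part of the lesson's own code: the `timer` context manager, the `timings` dictionary, and the body of `some_function` are hypothetical names chosen for illustration. It uses `time.perf_counter()`, which is a monotonic, high-resolution clock intended for measuring short durations, instead of `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results=None):
    """Time the enclosed block and optionally store the result in a dict."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} took {elapsed:.6f} seconds")


def some_function():
    # Hypothetical stand-in for the code we want to measure.
    return sum(i * i for i in range(100_000))


timings = {}
with timer("some_function", timings):
    result = some_function()
```

Collecting the measurements in a dictionary makes it easy to compare several code sections after the run instead of reading them off scattered print statements.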
67+
68+
69+ ## Many tools exist
70+
71+ The list below is probably not complete, but it gives an overview of the
72+ different tools available for profiling Python code.
73+
74+ CPU profilers:
75+ - [cProfile and profile](https://docs.python.org/3/library/profile.html)
76+ - [line_profiler](https://kernprof.readthedocs.io/)
77+ - [py-spy](https://github.com/benfred/py-spy)
78+ - [Yappi](https://github.com/sumerc/yappi)
79+ - [pyinstrument](https://pyinstrument.readthedocs.io/)
80+ - [Perfetto](https://perfetto.dev/docs/analysis/trace-processor-python)
81+
82+ Memory profilers:
83+ - [memory_profiler](https://pypi.org/project/memory-profiler/) (not actively maintained)
84+ - [Pympler](https://pympler.readthedocs.io/)
85+ - [tracemalloc](https://docs.python.org/3/library/tracemalloc.html)
86+ - [guppy/heapy](https://github.com/zhuyifei1999/guppy3/)
87+
88+ Both CPU and memory:
89+ - [Scalene](https://github.com/plasma-umass/scalene)
90+
91+ In the exercise below, we will use Scalene to profile a Python program. Scalene
92+ is a sampling profiler that can profile CPU, memory, and GPU usage of Python code.
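
Of the memory tools listed above, `tracemalloc` is part of the standard library and needs no installation. The following is a minimal sketch of how it can report current and peak memory as well as the source lines that allocated the most; `build_squares` and the input size are hypothetical and only serve as a workload.

```python
import tracemalloc


def build_squares(n):
    # Hypothetical workload: allocate a list of n integers.
    return [i * i for i in range(n)]


tracemalloc.start()
data = build_squares(200_000)
# Memory currently traced and the peak since tracemalloc.start(), in bytes.
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
# Show the three source lines responsible for the largest allocations.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```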
93+
94+
95+ ## Tracing profilers vs. sampling profilers
96+
97+ **Tracing profilers** record every function call and event in the program,
98+ logging the exact sequence and duration of events.
99+ - **Pros:**
100+ - Provides detailed information on the program's execution.
101+ - Deterministic: Captures exact call sequences and timings.
102+ - **Cons:**
103+ - Higher overhead, slowing down the program.
104+ - Can generate large amounts of data.
105+
106+ **Sampling profilers** periodically sample the program's state (where it is
107+ and how much memory is used), providing a statistical view of where time is
108+ spent.
109+ - **Pros:**
110+ - Lower overhead, as they don't track every event.
111+ - Scales better with larger programs.
112+ - **Cons:**
113+ - Less precise, potentially missing infrequent or short calls.
114+ - Provides an approximation rather than exact timing.
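
As a small illustration of the tracing approach, the standard library's `cProfile` records every function call deterministically. The sketch below profiles a toy workload and prints the entries with the highest cumulative time; the functions `slow_square_sum` and `work` are hypothetical examples and not part of the lesson's exercise code.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    # Hypothetical hot spot: a plain Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    # Calls the hot spot 20 times so that it shows up clearly in the report.
    return [slow_square_sum(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the report in a string so it can be printed or saved.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists the exact number of calls per function, which a sampling profiler can only estimate. The price is the overhead of recording every call.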
115+
116+ :::{discussion} Analogy: Imagine we want to optimize the London Underground (subway) system
117+ We wish to detect bottlenecks in the system to improve the service, and for this we have
118+ asked a few passengers to help us by tracking their journey.
119+ - **Tracing**: We follow every train and passenger, recording every stop
120+ and delay. When passengers enter and exit the train, we record the exact time
121+ and location.
122+ - **Sampling**: Every 5 minutes, each passenger's phone prompts them to note
123+ down their current location. We then use this information to estimate
124+ the most crowded stations and trains.
31125:::
32126
33- ::::{exercise} Exercise Profiling-2
34- In this exercise we will use the ` scalene ` profiler to find out where most of the time is spent
127+
128+ ## Choosing the right system size
129+
130+ Sometimes we can configure the system size (for instance the time step in a simulation
131+ or the number of time steps or the matrix dimensions) to make the program finish sooner.
132+
133+ For profiling, we should choose a system size that is **representative of the real-world**
134+ use case. If we profile a program with a small input size, we might not see the same
135+ bottlenecks as when running the program with a larger input size.
136+
137+ When we scale up the system size or the number of processors, new bottlenecks often
138+ appear that we did not see before. This brings us back to: "measure instead of guessing".
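
To see how the picture can change with system size, one can time the same code for several input sizes. The sketch below compares two ways of counting unique words, loosely modelled on the list-based and set-based approaches in the exercise code. The function names, the synthetic word list, and the chosen sizes are assumptions made for illustration.

```python
import timeit


def count_unique_list(words):
    # Membership test on a list scans the list, so cost grows with its length.
    unique = []
    for word in words:
        if word not in unique:
            unique.append(word)
    return len(unique)


def count_unique_set(words):
    # A set uses hashing, so membership tests stay cheap as it grows.
    return len(set(words))


for n in (100, 500, 2_000):
    words = [f"word{i}" for i in range(n)]  # synthetic input, all words unique
    t_list = timeit.timeit(lambda: count_unique_list(words), number=3)
    t_set = timeit.timeit(lambda: count_unique_set(words), number=3)
    print(f"n={n:>5}: list {t_list:.4f} s, set {t_set:.4f} s")
```

For small inputs the two variants may look similar, and the difference is expected to grow with the input size. The actual numbers depend on the machine, which is why the profiling run should use a representative size.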
139+
140+
141+ ## Exercises
142+
143+ ::::{exercise} Exercise: Practicing profiling
144+ In this exercise we will use the Scalene profiler to find out where most of the time is spent
35145and most of the memory is used in a given code example.
36146
37147Please try to go through the exercise in the following steps:
@@ -58,55 +168,11 @@ Please try to go through the exercise in the following steps:
58168 You can find an example of the generated HTML report in the solution below.
591691 . Does the result match your prediction? Can you explain the results?
60170
61- ```python
62- """
63- The code below reads a text file and counts the number of unique words in it
64- (case-insensitive).
65- """
66- import re
67-
68-
69- def count_unique_words1(file_path: str) -> int:
70-     with open(file_path, "r", encoding="utf-8") as file:
71-         text = file.read()
72-     words = re.findall(r"\b\w+\b", text.lower())
73-     return len(set(words))
74-
75-
76- def count_unique_words2(file_path: str) -> int:
77-     unique_words = []
78-     with open(file_path, "r", encoding="utf-8") as file:
79-         for line in file:
80-             words = re.findall(r"\b\w+\b", line.lower())
81-             for word in words:
82-                 if word not in unique_words:
83-                     unique_words.append(word)
84-     return len(unique_words)
85-
86-
87- def count_unique_words3(file_path: str) -> int:
88-     unique_words = set()
89-     with open(file_path, "r", encoding="utf-8") as file:
90-         for line in file:
91-             words = re.findall(r"\b\w+\b", line.lower())
92-             for word in words:
93-                 unique_words.add(word)
94-     return len(unique_words)
95-
96-
97- def main():
98-     # book.txt is downloaded from https://www.gutenberg.org/cache/epub/2600/pg2600.txt
99-     _result = count_unique_words1("book.txt")
100-     _result = count_unique_words2("book.txt")
101-     _result = count_unique_words3("book.txt")
102-
103-
104- if __name__ == "__main__":
105-     main()
106- ```
171+ :::{literalinclude} profiling/exercise.py
172+ :::
107173
108174:::{solution}
109- ``` {figure} profiling/exercise2 .png
175+ ```{figure} profiling/exercise.png
110176 :alt: Result of the profiling run for the above code example.
111177 :width: 100%
112178