Commit 7a8bf22

committed
update the language
1 parent 0714228 commit 7a8bf22

1 file changed

+43
-18
lines changed

docs/docs/codeflash-concepts/benchmarking.md

Lines changed: 43 additions & 18 deletions
@@ -4,6 +4,27 @@ Codeflash reports benchmarking results that look like this.
44

55
⏱️ Runtime : 32.8 microseconds → 29.2 microseconds (best of 315 runs)
66

7+
To measure the runtime of code, Codeflash runs the function multiple times, on several inputs
8+
and sums the minimum time of each input to get the total runtime.
9+
10+
Simplified pseudocode for the benchmarking that Codeflash implements looks like this:
11+
12+
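```python
import time

# `function` and `test_inputs` are placeholders for the code under test and its inputs.
loops = 0
start_time = time.perf_counter()
min_input_runtime = [float('inf')] * len(test_inputs)
# Keep looping until at least 5 loops have run and 10 seconds have elapsed.
while loops < 5 or time.perf_counter() - start_time < 10:
    loops += 1
    for input_index, test_input in enumerate(test_inputs):
        t0 = time.perf_counter_ns()
        function(test_input)
        t = time.perf_counter_ns() - t0
        # Keep the fastest observation seen so far for this input.
        if t < min_input_runtime[input_index]:
            min_input_runtime[input_index] = t
total_runtime = sum(min_input_runtime)  # sum of per-input minimums, in nanoseconds
number_of_runs = loops
```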
25+
26+
The above code runs the function multiple times on different inputs and takes the minimum time for each input.
27+
728
In this document we explain -
829
- how we measure the runtime of code
930
- how we determine if an optimization is actually faster
@@ -71,61 +92,65 @@ So the key idea is that we find the minimum time that the train can achieve betw
7192

7293
## How Codeflash benchmarks code
7394

74-
The idea of measuring the fastest speed, which minimizes the noise, is the same idea that Codeflash uses to measure the runtime of code.
95+
The idea of measuring the fastest speed, which minimizes the external noise, is the same idea that Codeflash uses to measure the runtime of code.
7596
With computer processors, there are many different types of noise that can increase the runtime of a function.
7697
The noise can be caused by -
7798
- the hardware - there can be cache misses, CPU frequency scaling up and down, etc.
7899
- the operating system - there can be context switches, memory allocation, etc.
79100
- the language - there can be garbage collection, thread scheduling, etc.
80101

81102
Codeflash tries to minimize the noise by running the function multiple times and taking the minimum time.
82-
This is when the function is not slowed down by any hindrances. The processor frequency is at its maximum,
83-
cache misses are not happening, the operating system is not doing any context switches etc.
103+
This happens when the function is not slowed down by any hindrances. The processor frequency is at its maximum,
104+
cache misses are minimal, the operating system is not doing any context switches, etc.
84105
This is the fastest speed that the function can achieve, and is the most accurate measure of the intrinsic speed of the function.
85106

86107
When Codeflash wants to measure if an optimization is faster than the original function, it runs the two functions
87108
multiple times and takes the minimum time for each of the functions. This is the most accurate measurement of the
88109
intrinsic speed of the function, which is the signal we are looking for. We can now compare the two functions and see which one is faster.
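
As a rough illustration of comparing two implementations by their minimum observed time, here is a minimal sketch using the standard library's `timeit.repeat`. The functions `original` and `optimized` and the helper `min_runtime` are hypothetical names chosen for this example; this is not Codeflash's actual implementation.

```python
import timeit


def min_runtime(func, arg, repeat=5, number=1000):
    """Return the fastest observed per-call time of func(arg), in seconds."""
    # timeit.repeat returns one total (for `number` calls) per repetition.
    totals = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    # The minimum is the repetition that was least disturbed by noise.
    return min(totals) / number


def original(n):
    # Hypothetical baseline: builds an intermediate list before summing.
    return sum([i * i for i in range(n)])


def optimized(n):
    # Hypothetical candidate: sums directly from a generator.
    return sum(i * i for i in range(n))


t_original = min_runtime(original, 100)
t_optimized = min_runtime(optimized, 100)
print(f"original:  {t_original * 1e6:.2f} microseconds")
print(f"optimized: {t_optimized * 1e6:.2f} microseconds")
```

Which of the two comes out faster depends on the machine and the Python version, so the sketch only prints the two minimums rather than asserting a winner.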
89110

90111
We have found that when we run the function several times, the chance of getting "lucky" when the function is not
91-
slowed down by any hindrances gets very high. There codeflash tries to run the function as many times as reasonably possible.
92-
Currently we loop the code for 10 seconds and a minimum of 5 loops, which gives us a good balance between accuracy of runtime of and the time it takes to run the function.
112+
slowed down by any hindrances becomes very high. To maximize this luck, Codeflash tries to run the function as many times as reasonably possible.
113+
Currently, we loop the code for 10 seconds with a minimum of 5 loops, which gives us a good balance between accuracy of time measurement and the time it takes to run the function.
93114

94115
## What happens when there are multiple inputs to a function?
95116

96117
The above idea works well when there is only one input to a function. But what if there are multiple inputs?
97118

98-
Lets consider the train analogy again. Now the train goes between multiple stations. It first starts from Seattle up north,
119+
Let's consider the train analogy again. Now the train race is extended between multiple stations. It first starts from Seattle up north,
99120
and then goes south to San Francisco, then Los Angeles, and finally terminates at San Diego. We want to again measure
100-
which train is the faster one on this route.
121+
which train is the faster one for this route.
101122

102123
We can only measure the time taken by the train to go from one station to the next.
103124

104125
Here is what the timing data looks like. The units are in hours.
105126

106127
![img_1.png](img_1.png)
107-
Now our task becomes seemingly harder to decide which train is faster because now there are 50x3x2 data points to consider.
128+
Now the task to decide which train is faster becomes even harder since now there are 50x3x2 data points to consider.
108129
Unfortunately, the timing data is also noisy. Running the same train on the same route might not give the same time, because
109130
of external factors like weather, traffic, etc. This makes it hard to determine which train is faster.
110131

111132
So, which train is faster?
112133

113134
The above insight of measuring the fastest speed of the train is still applicable here. But since there are multiple
114135
sectors, we need to measure the fastest speed of the train separately for each sector. This is because one sector might
115-
have hills or winding tracks, which might slow down the train. But these will affect both the trains equally.
136+
have hills or winding tracks, which might slow down the train.
116137

117-
So to find the train that is fastest between the two stations, we find the minimum time taken by the train to go from one station to the next.
138+
So, we divide the route into sectors between the stations,
139+
and measure the fastest speed of the train for each sector. To find the train that is fastest between the two stations, we find the minimum time taken by the train to go from one station to the next.
118140
We then sum the minimum times for all the sectors to get the total time taken by the train to go from the first station to the last station.
119141
The train that has the smallest sum of minimum times is the fastest train, since this measures the intrinsic speed of the
120-
train on a given route.
142+
train on a given route. The reason to calculate the minimum time for each sector is to increase our "luck" of not
143+
getting slowed down by any hindrances. The chance of encountering external noise is lower in one sector than in the whole route.
144+
This makes the time measurement more accurate, when we measure the minimum time for each sector.
145+
146+
147+
This is the same idea that Codeflash applies to functions. For a workload composed of multiple inputs,
148+
it measures the intrinsic speed of a function on different inputs separately.
149+
The total intrinsic runtime of the workload is the sum of the intrinsic runtime of the function on each input.
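
The sum-of-minimums idea can be sketched in a few lines. The sector names follow the train analogy, but the timing values below are made up for illustration (in minutes, to keep the arithmetic exact), and the helper names are hypothetical.

```python
def sum_of_sector_minimums(timings):
    """Sum the fastest observed time of each sector (or each input)."""
    return sum(min(runs) for runs in timings.values())


def best_whole_trip(timings):
    """Fastest single end-to-end trip, where trip i uses the i-th run of every sector."""
    return min(sum(trip) for trip in zip(*timings.values()))


# Hypothetical data: three trips per train, times in minutes per sector.
train_a = {
    "Seattle-San Francisco": [790, 774, 840],
    "San Francisco-Los Angeles": [348, 366, 384],
    "Los Angeles-San Diego": [132, 150, 120],
}
train_b = {
    "Seattle-San Francisco": [760, 781, 770],
    "San Francisco-Los Angeles": [340, 352, 345],
    "Los Angeles-San Diego": [125, 123, 131],
}

print(sum_of_sector_minimums(train_a))  # 774 + 348 + 120 = 1242
print(sum_of_sector_minimums(train_b))  # 760 + 340 + 123 = 1223
print(best_whole_trip(train_a))         # 1270: no single trip was lucky on every sector
```

In this made-up data, train A's best whole trip (1270) is slower than its sum of per-sector minimums (1242), because no single trip avoided noise on every sector. Taking the minimum per sector lets each sector contribute its own luckiest run.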
121150

122-
This is the same idea that Codeflash applies to functions. It measures the intrinsic speed of a function on separate inputs.
123-
It then assumes a workload is composed of multiple inputs, and measures the intrinsic speed of the function on each input.
124-
Then the instrinsic runtime of the function on the workload which consists of multipe inputs
125-
is the sum of the intrinsic runtime of the function on each input.
126151
(make drawings for each of the concepts)
127152

128-
We have found that this approach is very accurate and is the best way to measure the speed of a function, even in noisy Virtual machines.
129-
We use a noise floor of 5% of the runtime, and only of the optimization is at least 5% faster than the original function, we consider it to be a significant improvement.
130-
This technique gets rid of most of the measurement noise, and gives us a very accurate measure of the intrinsic speed of the function.
153+
We have found this approach to be very accurate and a great way to measure the speed of a function, even in noisy virtual machines.
154+
We use a noise floor of 5% of the runtime (10% on GitHub Actions), and we consider an optimization to be a significant improvement only if it is at least 5% faster than the original function.
155+
This technique gets rid of most of the measurement noise, and gives us a very accurate measure of the noise-free intrinsic speed of the function.
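
The noise-floor check can be sketched as below. The function name and signature are hypothetical, and the sketch assumes that "X% faster" means `original / optimized - 1 >= X`; the text does not spell out the exact formula Codeflash uses, so treat this as one plausible reading.

```python
def is_significant_improvement(original_ns, optimized_ns, noise_floor=0.05):
    """Return True if the optimized runtime beats the original by at least the noise floor.

    Assumption: speedup is measured as original / optimized - 1.
    """
    if original_ns <= 0 or optimized_ns <= 0:
        raise ValueError("runtimes must be positive")
    speedup = original_ns / optimized_ns - 1.0
    return speedup >= noise_floor


# The runtimes from the sample report: 32.8 microseconds -> 29.2 microseconds.
print(is_significant_improvement(32_800, 29_200))                    # True (about 12% faster)
print(is_significant_improvement(1_000, 980))                        # False (about 2% faster)
print(is_significant_improvement(1_000, 940, noise_floor=0.10))      # False under a 10% floor
```

Passing `noise_floor=0.10` mirrors the higher threshold mentioned for GitHub Actions.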
131156
