Commit 0a39175

committed
improve the language
1 parent 871d4cc commit 0a39175

1 file changed

+70
-85
lines changed

@@ -1,11 +1,11 @@
11
## How Codeflash measures the runtime of code
22

3-
Codeflash reports benchmarking results that look like this.
3+
Codeflash reports benchmarking results that look like this:
44

55
⏱️ Runtime : 32.8 microseconds → 29.2 microseconds (best of 315 runs)
66

7-
To measure the runtime of code, Codeflash runs the function multiple times, on several inputs
8-
and sums the minimum time of each input to get the total runtime.
7+
To measure runtime, Codeflash runs a function multiple times with several inputs
8+
and sums the minimum time for each input to get the total runtime.
99

1010
Simplified pseudocode for Codeflash benchmarking looks like this:
1111

@@ -25,132 +25,117 @@ number_of_runs = loops
2525

2626
The above code runs the function multiple times on different inputs and uses the minimum time for each input.
2727

28-
In this document we explain -
28+
In this document we explain:
2929
- How we measure the runtime of code
3030
- How we determine if an optimization is actually faster
3131
- Why we measure the timing as best of N runs
3232
- How we measure the runtime when we run on a wide variety of test cases.
3333

3434
## Goals of Codeflash auto-benchmarking
3535

36-
A core principle of Codeflash is that it does not make assumptions
37-
on what types of optimizations that might be faster. It generates multiple possible optimizations with LLMs and then automatically benchmarks the code
38-
on a variety of inputs to verify empirically if the optimization is actually faster.
36+
A core principle of Codeflash is that it makes no assumptions about which optimizations might be faster.
37+
Instead, it generates multiple possible optimizations with LLMs and automatically benchmarks the code
38+
on a variety of inputs to empirically verify if the optimization is actually faster.
3939

4040
The goals of Codeflash auto-benchmarking are:
41-
- Accurately measure the runtime of code.
42-
- Measure runtime for a wide variety of code.
43-
- Measure runtime on a variety of inputs.
44-
- Do all the above on a real machine, where there might be other processes running, causing timing measurement noise.
45-
- Finally make a binary decision whether an optimization is faster or not.
41+
- Accurately measure the runtime of code
42+
- Measure runtime for a wide variety of code
43+
- Measure runtime on a variety of inputs
44+
- Do all the above on a real machine, where other processes might be running and causing timing measurement noise
45+
- Finally, make a binary decision on whether an optimization is faster or not
4646

4747
## Racing trains as an analogy
4848

49-
Imagine that you are a boss at a train company who wants to purchase a train to run between the two cities of San Francisco and Los Angeles.
50-
You want to decide between two trains, Train A and Train B, and want to run the train that is the fastest between the two cities.
49+
Imagine you're a boss at a train company choosing between two trains to run between San Francisco and Los Angeles.
50+
You want to determine which train is faster.
5151

52-
You can measure the speed of the trains by timing how long it takes to go from San Francisco to Los Angeles.
52+
You can measure their speed by timing how long each takes to travel between the two cities.
5353

54-
Unfortunately, there are real life factors that can affect the speed of the trains. There might
55-
be rail traffic, unfavorable weather conditions, hills or other factors that can slow down the trains.
54+
However, real-life factors affect train speeds: rail traffic, unfavorable weather, hills, and other obstacles.
55+
These can slow them down.
5656

57-
To settle the contest, you ask a train driver to race the two trains and run the trains as fast as possible.
58-
You measure the time it takes to go from San Francisco to Los Angeles.
57+
To settle the contest, you have a driver race the two trains at maximum possible speed.
58+
You measure the travel times between the two cities for each train.
5959

60-
Now the train A took 5% less time than train B. But the driver complaints that
61-
train B's run had poor weather on the way so they can't make conclusions yet. It is very important to definitively
62-
know which train is faster.
60+
Train A took 5% less time than Train B. But the driver points out that Train B encountered poor weather,
61+
making it impossible to draw firm conclusions. Since it's crucial to know which train is truly faster, you need more data.
6362

64-
You now ask the driver to repeat the race between the two trains multiple times.
65-
Now in this world they have plenty of free time so they end up repeating the race 50 times.
63+
You ask the driver to repeat the race multiple times. In this scenario, since they have plenty of time, they repeat the race 50 times.
6664

67-
This gives us timing data looking like the following. The units are in hours.
65+
This gives us timing data (in hours) that looks like the following.
6866

6967
![img_2.png](img_2.png)
7068

71-
Now the task to decide which train is faster becomes harder since now there are 50x2 data points.
69+
With 100 data points (50 per train), determining the faster train becomes more complex.
7270

73-
Unfortunately, the timing data is noisy. Other trains might be running on the tracks, the weather might change etc.
74-
This makes it hard to determine which train is faster in reality.
71+
The timing data contains noise from various factors: other trains on the tracks, changing weather, and so on.
72+
This makes it challenging to determine which train is faster.
7573

76-
The crucial point here is that the noise in the timing data is not the fault of the train.
77-
Speed of the train is an intrinsic property of the train and not the external hindrances.
78-
an important property of the noise is that it is only additive in nature , i.e. when there is a hindrance, the time taken only increases.
79-
There is no negative noise, which would make the trains go faster.
80-
The ideal way to decide the run time would be to clear out all the hindrances and rerun the race.
81-
That way we would have a clean data set that is not noisy, and the run times would be the true speed of the train.
74+
Here's the crucial insight: timing noise isn't the train's fault. A train's speed is an intrinsic property,
75+
independent of external hindrances. The noise only adds time—there's no "negative noise" that makes trains go faster.
76+
Ideally, we'd measure speed with no hindrances at all, giving us clean, noise-free data that shows true speed.
8277

83-
But in real life, we cannot do that. The best we can do is to try to minimize the noise,
84-
to get close to the "signal" which is the intrinsic speed of the train, and not the noise which is the time added by hindrances.
85-
Luckily, we can do that. When we repeat the race multiple times, we get multiple data points.
86-
There will be many cases where when the train runs between the two stations, all the conditions are favorable,
87-
and the train is able to run at its maximum speed. This will be when the noise is the least, and the
88-
measured time will be the smallest. This is the "signal" we are looking for - the fastest speed that the train
89-
can achieve over the whole route. This speed can be compared to find the fastest train.
9078

91-
So the key idea is that we find the minimum time that the train can achieve between two cities. That is very close to the fastest speed that the train can achieve.
79+
In reality, we can't eliminate all noise. Instead, we minimize it by focusing on the "signal"—the train's intrinsic
80+
speed—rather than the noise from hindrances. By running multiple races, we get multiple data points. Sometimes conditions
81+
are nearly perfect, allowing the train to reach maximum speed. These minimal-noise runs produce the smallest times—our
82+
"signal" that reveals the train's true capabilities. We can compare these best times to determine the faster train.
83+
84+
The key is finding each train's minimum time between cities—this closely approximates its maximum achievable speed.
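
As a rough illustration of why the minimum works when noise is purely additive, here is a small simulation sketch. The "true" travel times (6.0 and 6.3 hours), the exponential delay model, and the helper name `simulate_runs` are all assumptions made up for this example; they are not taken from the race data shown above.

```python
import random


def simulate_runs(true_time, n_runs, rng):
    """Simulate n_runs trips where every delay is non-negative (additive noise only)."""
    return [true_time + rng.expovariate(1.0) for _ in range(n_runs)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical intrinsic travel times in hours (illustrative values only).
train_a = simulate_runs(6.0, 50, rng)
train_b = simulate_runs(6.3, 50, rng)

best_a, best_b = min(train_a), min(train_b)
mean_a = sum(train_a) / len(train_a)
mean_b = sum(train_b) / len(train_b)

# The minimum can never fall below the true time, and it sits closer to the
# true time than the mean does, because every delay only adds time.
print(f"Train A: best={best_a:.3f}h mean={mean_a:.3f}h")
print(f"Train B: best={best_b:.3f}h mean={mean_b:.3f}h")
```

Under this model the best time converges toward the intrinsic time as more races are run, while the mean stays inflated by the average delay.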
9285

9386
## How Codeflash benchmarks code
9487

95-
The idea of measuring the fastest speed, which minimizes the external noise, is the same idea that Codeflash uses to measure the runtime of code.
96-
With computer processors, there are many different types of noise that can increase the runtime of a function.
97-
The noise can be caused by -
98-
- The hardware - there can be cache misses, cpu frequency scaling up and down, etc.
99-
- the operating system - there can be context switches, memory allocation, etc.
100-
- the language - there can be garbage collection, thread scheduling etc.
88+
This principle of measuring peak performance while minimizing external noise is exactly how Codeflash measures code runtime.
89+
Computer processors face various sources of noise that can increase function runtime:
90+
91+
- Hardware: cache misses, CPU frequency scaling, etc.
92+
- Operating system: context switches, memory allocation, etc.
93+
- Programming language: garbage collection, thread scheduling, etc.
10194

102-
Codeflash tries to minimize the noise by running the function multiple times and taking the minimum time.
103-
This happens when the function is not slowed down by any hindrances. The processor frequency is at its maximum,
104-
cache misses are minimal, the operating system is not doing any context switches etc.
105-
This is the fastest speed that the function can achieve, and is the most accurate measure of the intrinsic speed of the function.
95+
Codeflash minimizes noise by running functions multiple times and taking the minimum time.
96+
This minimum typically occurs when there are fewest hindrances: the processor frequency is maximal,
97+
cache misses are minimal, and the operating system is not doing context switches. This approaches the function's true speed.
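
The best-of-N idea can be sketched in a few lines of Python. This is a minimal illustration, not Codeflash's actual implementation; the helper name `best_of` and its parameters are hypothetical.

```python
import time


def best_of(func, args, runs=5):
    """Return the minimum wall-clock time in nanoseconds over `runs` calls of func(*args)."""
    best = None
    for _ in range(runs):
        start = time.perf_counter_ns()
        func(*args)
        elapsed = time.perf_counter_ns() - start
        # Keep only the fastest run; slower runs are treated as noise.
        if best is None or elapsed < best:
            best = elapsed
    return best


fastest_ns = best_of(sorted, ([3, 1, 2] * 1000,), runs=20)
print(f"best of 20 runs: {fastest_ns} ns")
```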
10698

107-
When Codeflash wants to measure if an optimization is faster than the original function, it runs the two functions
108-
multiple times and takes the minimum time for both the functions. This most accurate measurement of the
109-
intrinstic speed of the function, which is the signal we are looking for. We can now compare the two functions and see which one is faster.
99+
When comparing an optimization to the original function, Codeflash runs both multiple times and compares their
100+
minimum times. This gives us the most accurate measurement of each function's intrinsic speed, which is our signal, allowing for a
101+
meaningful comparison.
110102

111-
We have found that when we run the function several times, the chance of getting "lucky" when the function is not
112-
slowed down by any hindrances becomes very high. To maximize this luck, codeflash tries to run the function as many times as reasonably possible.
113-
Currently, we loop the code for 10 seconds with a minimum of 5 loops, which gives us a good balance between accuracy of time measurement and the time it takes to run the function.
103+
We've found that running a function multiple times increases the likelihood of getting these "lucky" minimal-noise runs.
104+
To maximize this, Codeflash runs each function for 10 seconds with a minimum of 5 loops, balancing measurement accuracy with reasonable runtime.
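
A time-budgeted loop with a minimum loop count might look roughly like the sketch below. The function name `timed_loops` and its parameters are hypothetical, and the sketch assumes one "loop" is a single call; how Codeflash defines a loop internally is not specified here.

```python
import time


def timed_loops(func, args, budget_seconds=10.0, min_loops=5):
    """Call func(*args) repeatedly until the time budget is spent,
    but never fewer than min_loops times. Returns per-call timings in ns."""
    timings = []
    deadline = time.perf_counter() + budget_seconds
    while len(timings) < min_loops or time.perf_counter() < deadline:
        start = time.perf_counter_ns()
        func(*args)
        timings.append(time.perf_counter_ns() - start)
    return timings


# A short budget keeps this example quick; the text above describes 10 seconds.
timings = timed_loops(sum, (range(10_000),), budget_seconds=0.05, min_loops=5)
print(f"{len(timings)} loops, best {min(timings)} ns")
```

The minimum loop count guarantees several samples even for a slow function that would exhaust the budget in one or two calls.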
114105

115106
## What happens when there are multiple inputs to a function?
116107

117-
The above idea works well when there is only one input to a function. But what if there are multiple inputs?
108+
While this approach works well for single inputs, what about multiple inputs?
118109

119-
Let's consider the train analogy again. Now the train race is extended between multiple stations. It first starts from Seattle up north,
120-
and then goes south to San Francisco, then Los Angeles, and finally terminating at San Diego. We want to again measure
121-
which train is the faster one for this route.
110+
Now the race runs through multiple stations: Seattle to San Francisco to Los Angeles to San Diego.
111+
We still need to determine the faster train for this route.
122112

123-
We can only measure the time taken by the train to go from one station to the next.
113+
We can only measure times between adjacent stations.
124114

125-
Here is how the timing data looks like. The units are in hours.
115+
Here is what the timing data looks like (in hours):
126116

127117
![img_1.png](img_1.png)
128-
Now the task to decide which train is faster becomes even harder since now there are 50x3x2 data points to consider.
129-
Unfortunately, the timing data is also noisy. Running the same train on the same route might not give the same time, because
130-
of external factors like weather, traffic, etc. This makes it hard to determine which train is faster.
131118

132-
So, which train is faster?
119+
With 300 data points (50 runs × 3 segments × 2 trains) and varying conditions on each segment,
120+
determining the faster train becomes even more challenging.
133121

134-
The above insight of measuring the fastest speed of the train is still applicable here. But since there are multiple
135-
sectors, we need to measure the fastest speed of the train separately for each sector. This is because one sector might
136-
have hills or winding tracks, which might slow down the train.
122+
Which train is faster?
137123

138-
So, we divide the route into sectors between the stations,
139-
and measure the fastest speed of the train for each sector. To find the train that is fastest between the two stations, we find the minimum time taken by the train to go from one station to the next.
140-
We then sum the minimum times for all the sectors to get the total time taken by the train to go from the first station to the last station.
141-
The train that has the smallest sum of minimum times is the fastest train. Since this measures the intrinsic speed of the
142-
train on a given route. The reason to calculate the minimum time for each sector is to increase our "luck" of not
143-
getting slowed down by any hindrances. The chance of encountering external noise is lower in one sector than in the whole route.
144-
This makes the time measurement more accurate, when we measure the minimum time for each sector.
124+
Our insight about measuring peak performance still applies, but we need to measure each segment separately
125+
since the track differs between segments due to hills and track curves.
145126

146127

147-
This is the same idea that Codeflash applies to functions. For a workload composed of multiple inputs,
148-
it measures the intrinsic speed of a function on different inputs separately.
149-
The total instrinsic runtime of the workload is the sum of the intrinsic runtime of the function on each input.
128+
We divide the route into segments between stations and measure each train's fastest time per segment.
129+
We find the minimum time for each segment, then sum these minimums to get the total route time.
130+
The train with the lowest sum of minimum times is fastest. This approach better captures each train's
131+
intrinsic speed because measuring shorter segments reduces the chance of encountering noise in that segment, compared to measuring the entire route.
132+
The result is more accurate timing data.
150133

151-
(make drawings for each of the concepts)
134+
Codeflash applies this same principle to functions with multiple inputs. For workloads with multiple inputs,
135+
it measures a function's intrinsic speed on each input separately. The total intrinsic runtime is the sum
136+
of these individual minimums.
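
The sum-of-minimums calculation can be sketched as follows. The timing values and the name `total_runtime` are made up for illustration.

```python
def total_runtime(timings_per_input):
    """Sum the minimum measured time for each input.

    timings_per_input maps an input label to the list of times measured
    for that input across repeated runs.
    """
    return sum(min(times) for times in timings_per_input.values())


# Hypothetical timings in microseconds for three inputs, three runs each.
timings = {
    "input_1": [12, 10, 15],
    "input_2": [20, 22, 21],
    "input_3": [7, 9, 6],
}

total = total_runtime(timings)
print(total)  # 10 + 20 + 6 = 36
```

Note that the per-input minimums may come from different runs. That is intended: each input gets its own chance at a low-noise measurement.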
152137

153-
We have found that this approach to be very accurate and is a great way to measure the speed of a function, even in noisy Virtual machines.
154-
We use a noise floor of 5% of the runtime (10% on Github Actions), and only the optimizations that are at least 5% faster than the original function, we consider it to be a significant improvement.
155-
This technique gets rid of most of the measurement noise, and gives us a very accurate measure of the noise-free intrinsic speed of the function.
156138

139+
This approach proves highly accurate, even on noisy virtual machines. We use a 5% noise floor for runtime
140+
(10% on GitHub Actions) and only consider optimizations significant if they're at least 5% faster than the original function.
141+
This technique effectively minimizes measurement noise, giving us an accurate measure of a function's true, noise-free, intrinsic speed.
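
The final decision can be sketched as a simple threshold check. This sketch assumes that "5% faster" means the optimized runtime is at least 5% lower than the original runtime; the function name `is_significantly_faster` and its parameters are hypothetical.

```python
def is_significantly_faster(original_ns, optimized_ns, noise_floor=0.05):
    """Return True if the runtime reduction meets or exceeds the noise floor.

    Both runtimes are the summed per-input minimums. noise_floor is a
    fraction of the original runtime (0.05 = 5%).
    """
    if original_ns <= 0:
        raise ValueError("original_ns must be positive")
    reduction = (original_ns - optimized_ns) / original_ns
    return reduction >= noise_floor


# The example from the top of this document: 32.8 µs -> 29.2 µs,
# a reduction of roughly 11%.
print(is_significantly_faster(32_800, 29_200))                    # True
print(is_significantly_faster(32_800, 29_200, noise_floor=0.10))  # True
print(is_significantly_faster(1_000, 970))                        # False (3%)
```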
