Commit 2be0980

committed
WIP Draft
1 parent 3aae8c2 commit 2be0980

4 files changed

+136
-0
lines changed

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
  "label": "Codeflash Concepts",
  "position": 4,
  "collapsed": false
}
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
## How Codeflash decides if an optimization is faster

Codeflash reports benchmarking results that look something like this:

⏱️ Runtime : 32.8 microseconds → 29.2 microseconds (best of 315 runs)

This document explains how we measure the runtime of code, how we determine whether an optimization is faster, why we report
timing as the best of N runs, and how we measure the runtime of a wide variety of code.

## Design of Codeflash auto-benchmarking

A core part of the design of Codeflash is that it does not make strong assumptions
about which types of optimizations are faster. Codeflash automatically benchmarks the code
on a variety of inputs and determines empirically whether the optimization is actually faster.

The aims behind the design of Codeflash auto-benchmarking are to:
- Accurately measure the runtime of code.
- Measure the runtime of a wide variety of code.
- Measure the runtime of code on a variety of inputs.
- Do all of the above on a real machine, where other processes may be running and adding noise to the timing measurements.
- Make a binary decision on whether an optimization is faster or not.

A train analogy is useful here. As with an optimization, the decision is binary: one train is faster or it is not.
Imagine that you are a train supervisor who must decide which of two trains, Train A or Train B, is faster.

[//]: # (Your objective is to figure out which train is the fastest to go from Seattle to San Diego. )

[//]: # (The route first goes to San Francisco, Los Angeles and then ends at San Diego.)
You can measure the speed of the trains by timing how long each takes to go from San Francisco to Los Angeles.

Unfortunately, real-life factors can affect the speed of the trains. Rail traffic, weather conditions,
terrain, or other factors can slow the trains down.

To settle the contest, you ask a train driver to race the two trains, running each as fast as possible.
You run both Train A and Train B from San Francisco to Los Angeles and measure the time each takes.

Train A took 5% less time than Train B to go from San Francisco to Los Angeles. But the driver complains that
Train B's run had poor weather on the way, so no conclusion can be drawn yet. It is very important to know
definitively which train is faster.

You now ask the driver to repeat the race between the two trains multiple times.
In this world they have plenty of free time, so they end up repeating the race 50 times.

This gives us timing data that looks like the following. The units are hours.

![img_2.png](img_2.png)

Deciding which train is faster now seems harder, because there are 50x2 data points.

Unfortunately, the timing data is also noisy. Other trains might be running on the tracks, the weather might change,
or the train might be delayed for some other reason. This makes it hard to determine which train is faster.

The crucial point is that the noise in the timing data is not the fault of the train.
When we ask which train is faster, speed is a property of the train and not of the hindrances.
The ideal way to measure the time would be to clear away all the hindrances first.
That way we would have a clean data set without noise.

But in real life we cannot do that. The best we can do is to minimize the noise
and recover the "signal", which is the speed of the train, rather than the noise, which is the time added by hindrances.
Luckily, we can do that. When we repeat the race multiple times, we get multiple data points.
In many of these runs all the conditions between the two stations are favorable,
and the train is able to run at its maximum speed. This is when the noise is smallest and the
measured time is shortest. This is the "signal" we are looking for: the fastest speed that the train
can achieve. The noise is purely additive: a hindrance can only increase the time taken. There is no
negative noise that would make the train go faster.

So the key idea is to find the minimum time that the train achieves on a sector. That is very close to the time the train would take at its fastest achievable speed.
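The additive-noise argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the hindrance-free travel time of 6 hours, the exponential delay model, and the name `simulate_run` are all hypothetical and chosen just because the delay is never negative.

```python
import random


def simulate_run(true_time: float, rng: random.Random) -> float:
    """Return one noisy measurement: the true time plus a non-negative delay."""
    # Assumption: delays follow an exponential distribution with a mean of 0.5 hours.
    # Any distribution that is never negative would illustrate the same point.
    return true_time + rng.expovariate(1.0 / 0.5)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_time = 6.0  # hypothetical hindrance-free travel time, in hours
runs = [simulate_run(true_time, rng) for _ in range(50)]

best = min(runs)
mean = sum(runs) / len(runs)
print(f"true={true_time:.2f} min={best:.2f} mean={mean:.2f}")
```

Because every delay is added on top of the true time, the minimum can never fall below it, while the mean is pulled upward by every slow run.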

## How Codeflash benchmarks code

From the above, it is clear that we want to measure the fastest speed, which corresponds to the minimum time that the train can achieve.
This measurement has the least additive noise and is the most accurate measure of the intrinsic speed of the train.

The same idea applies to Codeflash. On real processors, many different types of noise can increase the runtime of a function.
The noise can be caused by:
- The hardware: cache misses, CPU frequency scaling up and down, etc.
- The operating system: context switches, memory allocation, etc.
- The language: garbage collection, thread scheduling, etc.

Codeflash tries to minimize the noise by running the function multiple times and taking the minimum time.
The minimum corresponds to the run in which the function was least slowed down by hindrances: the processor frequency is at its maximum,
cache misses are minimal, the operating system is not context switching, and so on.
This is the fastest speed that the function can achieve, and it is the most accurate measure of the intrinsic speed of the function.

When Codeflash wants to determine whether an optimization is faster than the original function, it runs the two functions
multiple times and takes the minimum time for each. This is the most accurate measurement of the
intrinsic speed of each function, which is the signal we are looking for. We can then compare the two functions and see which one is faster.

We have found that when we run the function many times, the chance of getting a "lucky" run, in which the function is not
slowed down by any hindrances, becomes very high. Therefore Codeflash tries to run the function as many times as reasonably possible.
Currently we loop the code for 10 seconds, with a minimum of 5 loops, which gives a good balance between the accuracy of the runtime measurement and the time it takes to run the benchmark.
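A minimal sketch of such a best-of-N loop is shown below. It is not Codeflash's actual implementation: the function name `best_of_runs` and its parameters are hypothetical, and it assumes the rule means "keep looping until the time budget is used up, but never stop before the minimum number of loops".

```python
import time
from typing import Callable


def best_of_runs(
    func: Callable[[], object],
    budget_s: float = 10.0,
    min_loops: int = 5,
) -> tuple[int, int]:
    """Return (minimum runtime in nanoseconds, number of runs performed)."""
    if min_loops < 1:
        raise ValueError("min_loops must be at least 1")
    best_ns = None
    loops = 0
    deadline = time.perf_counter() + budget_s
    # Keep going while we still owe loops or still have time budget left.
    while loops < min_loops or time.perf_counter() < deadline:
        start = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - start
        # Only the fastest run is kept; slower runs are treated as noise.
        best_ns = elapsed if best_ns is None else min(best_ns, elapsed)
        loops += 1
    return best_ns, loops


# Usage with a short budget so the example finishes quickly.
best_ns, loops = best_of_runs(lambda: sorted(range(1000, 0, -1)), budget_s=0.05)
print(f"best of {loops} runs: {best_ns / 1000:.1f} microseconds")
```

The longer the budget, the more runs are collected and the more likely it is that at least one of them was undisturbed.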

## What happens when there are multiple inputs to a function?

The above idea works well when there is only one input to a function. But what if there are multiple inputs?

Let's consider the train analogy again. Now the train goes between multiple stations. It starts from Seattle up north,
then goes south to San Francisco, then Los Angeles, and finally terminates at San Diego. We again want to determine
which train is faster on this route.

We can only measure the time taken by a train to go from one station to the next.

Here is what the timing data looks like. The units are hours.

![img_1.png](img_1.png)

Deciding which train is faster now seems even harder, because there are 50x3x2 data points to consider.
Unfortunately, the timing data is also noisy. Running the same train on the same route might not give the same time because
of external factors like weather, traffic, etc. This makes it hard to determine which train is faster.

So, which train is faster?

The insight above, measuring the fastest speed of the train, still applies here. But since there are multiple
sectors, we need to measure the fastest speed of the train separately for each sector. One sector might
have hills or winding tracks that slow the train down, but these affect both trains equally.

So, for each sector, we find the minimum time the train takes to go from one station to the next.
We then sum the minimum times over all the sectors to get the total time the train takes to go from the first station to the last.
The train with the smallest sum of minimum times is the fastest train, since this sum measures the intrinsic speed of the
train on the given route.

Codeflash applies the same idea to functions. It treats a workload as being composed of multiple inputs
and measures the intrinsic runtime of the function separately on each input.
The intrinsic runtime of the function on the whole workload
is then the sum of the intrinsic runtimes of the function on each input.

[//]: # (TODO: make drawings for each of the concepts)
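The sum-of-minimums rule can be sketched in a few lines. The helper name `workload_runtime`, the input names, and all timings below are hypothetical illustration values, not Codeflash output.

```python
def workload_runtime(timings_by_input: dict[str, list[float]]) -> float:
    """Sum, over all inputs, the fastest observed runtime for each input."""
    return sum(min(runs) for runs in timings_by_input.values())


# Hypothetical timings in microseconds; each list holds repeated runs on one input.
original = {
    "empty_list": [1.9, 1.6, 2.4],
    "small_list": [32.8, 35.1, 40.2],
    "large_list": [410.0, 402.5, 455.9],
}
optimized = {
    "empty_list": [1.7, 1.6, 2.0],
    "small_list": [29.2, 30.4, 33.0],
    "large_list": [371.3, 365.0, 380.8],
}

original_total = workload_runtime(original)
optimized_total = workload_runtime(optimized)
print(f"original={original_total:.1f} us, optimized={optimized_total:.1f} us")
```

Taking the minimum per input before summing keeps a single noisy run on one input from distorting the total for the whole workload.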

We have found that this approach is very accurate and is the best way we have found to measure the speed of a function, even on noisy virtual machines.
We use a noise floor of 5% of the runtime: only if the optimization is at least 5% faster than the original function do we consider it a significant improvement.
This technique gets rid of most of the measurement noise and gives us a very accurate measure of the intrinsic speed of the function.
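The final binary decision can be sketched as follows. The name `is_significantly_faster` is hypothetical, and the sketch assumes that "5% faster" means the runtime reduction relative to the original runtime; the source does not specify the exact formula.

```python
def is_significantly_faster(
    original_runtime: float,
    optimized_runtime: float,
    noise_floor: float = 0.05,
) -> bool:
    """Return True if the relative runtime reduction reaches the noise floor."""
    if original_runtime <= 0:
        raise ValueError("original_runtime must be positive")
    # Assumption: the gain is the fraction of the original runtime that was removed.
    gain = (original_runtime - optimized_runtime) / original_runtime
    return gain >= noise_floor


# The runtimes from the report at the top of this document, in microseconds.
# The reduction is about 11%, which is above the 5% noise floor.
print(is_significantly_faster(32.8, 29.2))
# A 3% reduction stays below the noise floor and is not accepted.
print(is_significantly_faster(100.0, 97.0))
```

Both runtimes should be in the same unit; the comparison itself is unit-free.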