| 
 | 1 | +## How Codeflash decides if an optimization is faster  | 
 | 2 | + | 
 | 3 | +Codeflash reports benchmarking results that look something like this.  | 
 | 4 | + | 
 | 5 | +⏱️ Runtime : 32.8 microseconds → 29.2 microseconds (best of 315 runs)  | 
 | 6 | + | 
 | 7 | +In this document we explain how we measure the runtime of code, how we determine if an optimization is faster, why we report  | 
 | 8 | +timing as the best of N runs, and how we measure the runtime of a wide variety of code.  | 
 | 9 | + | 
 | 10 | +## Design of Codeflash auto-benchmarking  | 
 | 11 | + | 
 | 12 | +A core part of the design of Codeflash is that it does not make strong assumptions  | 
 | 13 | +about which types of optimizations are faster. Codeflash automatically benchmarks the code  | 
 | 14 | +on a variety of inputs and determines empirically if the optimization is actually faster.  | 
 | 15 | + | 
 | 16 | +The aims behind the design of Codeflash auto-benchmarking are:  | 
 | 17 | +- Be able to accurately measure the runtime of code.  | 
 | 18 | +- Be able to measure the runtime of a wide variety of code.  | 
 | 19 | +- Be able to measure the runtime of code on a variety of inputs.  | 
 | 20 | +- Do all of the above on a real machine, where other processes might be running and adding noise to the timing measurements.  | 
 | 21 | +- Be able to make a binary decision on whether an optimization is faster or not.  | 
 | 22 | + | 
 | 23 | +A train analogy is useful here.  | 
 | 24 | +Like the timing decision, the question it poses has a binary answer.  | 
 | 25 | +Imagine that you are a train supervisor who must decide which of two trains, Train A or Train B, is faster.  | 
 | 26 | + | 
 | 27 | +[//]: # (Your objective is to figure out which train is the fastest to go from Seattle to San Diego. )  | 
 | 28 | + | 
 | 29 | +[//]: # (The route first goes to San Francisco, Los Angeles and then ends at San Diego.)  | 
 | 30 | +You can measure the speed of the trains by timing how long it takes to go from San Francisco to Los Angeles.  | 
 | 31 | + | 
 | 32 | +Unfortunately, there are real life factors that can affect the speed of the trains. There might   | 
 | 33 | +be rail traffic, weather conditions, terrain or other factors that can slow down the trains.  | 
 | 34 | + | 
 | 35 | +To settle the contest, you ask a train driver to race the two trains and run the trains as fast as possible.  | 
 | 36 | +You run both the trains A and B from San Francisco to Los Angeles and measure the time it takes.  | 
 | 37 | + | 
 | 38 | +Train A took 5% less time to go from San Francisco to Los Angeles than Train B. But the driver complains that  | 
 | 39 | +Train B's run had poor weather on the way, so you cannot draw conclusions yet. It is very important to know  | 
 | 40 | +definitively which train is faster.  | 
 | 41 | + | 
 | 42 | +You now ask the driver to repeat the race between the two trains multiple times.  | 
 | 43 | +In this world they have plenty of free time, so they end up repeating the race 50 times.  | 
 | 44 | + | 
 | 45 | +This gives us timing data looking like the following. The units are in hours.  | 
 | 46 | + | 
 | 47 | +  | 
 | 48 | + | 
 | 49 | +Deciding which train is faster now seems harder, because there are 50x2 data points to compare.  | 
 | 50 | + | 
 | 51 | +Unfortunately, the timing data is also noisy. Other trains might be running on the tracks, the weather might change,   | 
 | 52 | +or the train might be delayed for some other reason. This makes it hard to determine which train is faster.  | 
 | 53 | + | 
 | 54 | +The crucial point is that the noise in the timing data is not the fault of the train.  | 
 | 55 | +When we ask which train is faster, speed is a property of the train and not of the hindrances.  | 
 | 56 | +The ideal way to measure the time would be to clear out all the hindrances and then measure.  | 
 | 57 | +That way we would have a clean data set that is not noisy.  | 
 | 58 | + | 
 | 59 | +But in real life, we cannot do that. The best we can do is to try to minimize the noise  | 
 | 60 | +and get the "signal", which is the speed of the train, and not the noise, which is the time added by hindrances.  | 
 | 61 | +Luckily, we can do that. When we repeat the race multiple times, we get multiple data points.  | 
 | 62 | +In many of these runs, all the conditions between the two stations are favorable,  | 
 | 63 | +and the train is able to run at its maximum speed. This is when the noise is smallest, and the  | 
 | 64 | +measured time is smallest too. This is the "signal" we are looking for: the fastest speed that the train  | 
 | 65 | +can achieve. The noise is purely additive, i.e. a hindrance can only increase the time taken; there is no  | 
 | 66 | +negative noise that would make the train go faster.  | 
 | 67 | + | 
 | 68 | +So the key idea is to find the minimum time that the train achieves on a sector. That minimum corresponds very closely to the fastest speed that the train can achieve.  | 
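
The additive-noise argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true travel times (8.0 and 8.4 hours), the exponential delay model, and the helper name `simulate_runs` are invented for illustration and are not part of Codeflash.

```python
import random


def simulate_runs(true_time, n_runs, rng):
    """Return n_runs noisy measurements of a trip that really takes true_time.

    Each measurement is true_time plus a non-negative delay, so the noise can
    only ever add time. The exponential delay (mean 0.5 hours) is an
    illustrative assumption, not a claim about real trains or real CPUs.
    """
    return [true_time + rng.expovariate(1.0 / 0.5) for _ in range(n_runs)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
runs_a = simulate_runs(true_time=8.0, n_runs=50, rng=rng)  # hypothetical Train A
runs_b = simulate_runs(true_time=8.4, n_runs=50, rng=rng)  # hypothetical Train B

best_a, best_b = min(runs_a), min(runs_b)
mean_a = sum(runs_a) / len(runs_a)

# The minimum sits much closer to the true time than the mean does,
# because every hindrance pushes a measurement upwards, never downwards.
print(f"Train A: best={best_a:.3f}h mean={mean_a:.3f}h (true time 8.0h)")
print(f"Train B: best={best_b:.3f}h (true time 8.4h)")
```

With purely additive noise, no measurement can fall below the true time, which is why the best of many runs is a better estimate of it than the average.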
 | 69 | + | 
 | 70 | +## How Codeflash benchmarks code  | 
 | 71 | + | 
 | 72 | +From the above, it is clear that we want to measure the fastest speed, which corresponds to the minimum time that the train can achieve.  | 
 | 73 | +This has the least amount of additive noise, and is the most accurate measure of the intrinsic speed of the train.  | 
 | 74 | + | 
 | 75 | +The same idea applies to Codeflash. With processors, there are many different types of noise that can increase the runtime of a function.  | 
 | 76 | +The noise can be caused by:  | 
 | 77 | +- The hardware: there can be cache misses, CPU frequency scaling up and down, etc.  | 
 | 78 | +- The operating system: there can be context switches, memory allocation, etc.  | 
 | 79 | +- The language: there can be garbage collection, thread scheduling, etc.  | 
 | 80 | + | 
 | 81 | +Codeflash tries to minimize the noise by running the function multiple times and taking the minimum time.  | 
 | 82 | +The minimum occurs when the function is not slowed down by any hindrances: the processor frequency is at its maximum,  | 
 | 83 | +cache misses are not happening, the operating system is not doing any context switches, etc.  | 
 | 84 | +This is the fastest speed that the function can achieve, and is the most accurate measure of the intrinsic speed of the function.  | 
 | 85 | + | 
 | 86 | +When Codeflash wants to measure if an optimization is faster than the original function, it runs the two functions  | 
 | 87 | +multiple times and takes the minimum time for each of them. This gives the most accurate measurement of the  | 
 | 88 | +intrinsic speed of each function, which is the signal we are looking for. We can now compare the two functions and see which one is faster.  | 
 | 89 | + | 
 | 90 | +We have found that when we run the function several times, the chance of getting a "lucky" run in which the function is not  | 
 | 91 | +slowed down by any hindrances gets very high. Therefore Codeflash tries to run the function as many times as reasonably possible.  | 
 | 92 | +Currently we loop the code for 10 seconds, with a minimum of 5 loops, which gives us a good balance between the accuracy of the runtime measurement and the time it takes to benchmark the function.  | 
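
The looping strategy described above could be sketched roughly as follows. This is not Codeflash's actual implementation: the function name `best_of_runs`, its parameters, and the reading of the rule as "keep looping until the time budget is spent, but never stop before the minimum number of loops" are assumptions made for illustration.

```python
import time


def best_of_runs(func, *args, time_budget_s=10.0, min_loops=5):
    """Call func(*args) repeatedly and return (best_ns, loops).

    Hypothetical helper: it keeps looping until the time budget is used up,
    and always completes at least min_loops iterations.
    """
    best_ns = None
    loops = 0
    start = time.perf_counter()
    while loops < min_loops or time.perf_counter() - start < time_budget_s:
        t0 = time.perf_counter_ns()
        func(*args)
        elapsed = time.perf_counter_ns() - t0
        # Keep only the smallest observation, i.e. the least noisy run.
        if best_ns is None or elapsed < best_ns:
            best_ns = elapsed
        loops += 1
    return best_ns, loops


# A short budget keeps this example quick; the text above describes 10 seconds.
data = list(range(1000, 0, -1))
best_ns, loops = best_of_runs(sorted, data, time_budget_s=0.05)
print(f"{best_ns / 1000:.1f} microseconds (best of {loops} runs)")
```

Fast functions get many loops within the budget, while slow functions still get the minimum number of loops, so both produce a usable "best of N" figure.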
 | 93 | + | 
 | 94 | +## What happens when there are multiple inputs to a function?  | 
 | 95 | + | 
 | 96 | +The above idea works well when there is only one input to a function. But what if there are multiple inputs?  | 
 | 97 | + | 
 | 98 | +Let's consider the train analogy again. Now the train goes between multiple stations. It first starts from Seattle up north,  | 
 | 99 | +and then goes south to San Francisco, then Los Angeles, and finally terminating at San Diego. We want to again measure   | 
 | 100 | +which train is the faster one on this route.  | 
 | 101 | + | 
 | 102 | +We can only measure the time taken by the train to go from one station to the next.  | 
 | 103 | + | 
 | 104 | +Here is what the timing data looks like. The units are in hours.  | 
 | 105 | + | 
 | 106 | +  | 
 | 107 | +Deciding which train is faster now seems even harder, because there are 50x3x2 data points to consider.  | 
 | 108 | +Unfortunately, the timing data is also noisy. Running the same train on the same route might not give the same time, because   | 
 | 109 | +of external factors like weather, traffic, etc. This makes it hard to determine which train is faster.  | 
 | 110 | + | 
 | 111 | +So, which train is faster?  | 
 | 112 | + | 
 | 113 | +The above insight of measuring the fastest speed of the train is still applicable here. But since there are multiple   | 
 | 114 | +sectors, we need to measure the fastest speed of the train separately for each sector. This is because one sector might  | 
 | 115 | +have hills or winding tracks, which might slow down the train. But these will affect both the trains equally.  | 
 | 116 | + | 
 | 117 | +So for each sector, we find the minimum time each train takes to go from one station to the next.  | 
 | 118 | +We then sum the minimum times over all the sectors to get the total time each train takes to go from the first station to the last station.  | 
 | 119 | +The train that has the smallest sum of minimum times is the faster train, since this sum measures the intrinsic speed of the  | 
 | 120 | +train on a given route.  | 
 | 121 | + | 
 | 122 | +This is the same idea that Codeflash applies to functions. It measures the intrinsic speed of a function on separate inputs.  | 
 | 123 | +It assumes that a workload is composed of multiple inputs, and measures the intrinsic runtime of the function on each input.  | 
 | 124 | +The intrinsic runtime of the function on a workload that consists of multiple inputs  | 
 | 125 | +is then the sum of the intrinsic runtimes of the function on each input.  | 
 | 126 | +(make drawings for each of the concepts)  | 
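
The "sum of per-input minimums" idea can be written down in a few lines. The sketch below is illustrative only: the helper name `workload_runtime`, the input labels, and all timing values are made up, and the data layout is an assumption rather than Codeflash's internal format.

```python
def workload_runtime(timings_per_input):
    """Sum, over all inputs, the minimum runtime observed for each input.

    timings_per_input maps an input label to the list of runtimes measured
    for that input (hypothetical layout, chosen for this sketch).
    """
    return sum(min(runs) for runs in timings_per_input.values())


# Made-up timings in microseconds: three inputs, four runs each.
original = {
    "input_1": [12.1, 10.4, 10.9, 15.0],
    "input_2": [20.3, 19.8, 22.5, 19.9],
    "input_3": [5.2, 4.9, 7.7, 5.0],
}
optimized = {
    "input_1": [9.6, 9.0, 11.2, 9.1],
    "input_2": [17.5, 18.4, 17.9, 21.0],
    "input_3": [4.6, 4.8, 6.3, 4.7],
}

original_total = workload_runtime(original)
optimized_total = workload_runtime(optimized)
print(f"original: {original_total:.1f} us, optimized: {optimized_total:.1f} us")
```

Taking the minimum per input before summing keeps a slow, noisy run on one input from contaminating the estimate for the whole workload.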
 | 127 | + | 
 | 128 | +We have found that this approach is very accurate and is the best way to measure the speed of a function, even in noisy virtual machines.  | 
 | 129 | +We use a noise floor of 5% of the runtime: only if the optimization is at least 5% faster than the original function do we consider it to be a significant improvement.  | 
 | 130 | +This technique gets rid of most of the measurement noise, and gives us a very accurate measure of the intrinsic speed of the function.  | 
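
The binary decision with a noise floor might look like the sketch below. It assumes that "5% faster" means the optimized runtime is at least 5% smaller than the original runtime; the text does not spell out the exact formula, so this definition and the function name `is_significantly_faster` are assumptions for illustration.

```python
def is_significantly_faster(original_runtime, optimized_runtime, noise_floor=0.05):
    """Return True if the optimization beats the original by the noise floor.

    Assumption: improvement is measured as the relative reduction in runtime
    with respect to the original runtime. Both runtimes must use the same unit.
    """
    if original_runtime <= 0:
        raise ValueError("original_runtime must be positive")
    improvement = (original_runtime - optimized_runtime) / original_runtime
    return improvement >= noise_floor


# Runtimes in microseconds, taken from the sample report at the top of this document.
print(is_significantly_faster(32.8, 29.2))
# A 3% reduction stays below the 5% noise floor and is rejected.
print(is_significantly_faster(100.0, 97.0))
```

Requiring a margin above the noise floor means that small differences, which could still be measurement noise, are not reported as optimizations.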
 | 131 | + | 