In this document we explain how we measure the runtime of code, how we determine if an optimization is faster, why we measure
8
-
timing as bes of N runs, how we measure the runtime of a wide variety of codes.
7
+
In this document we explain -
8
+
- how we measure the runtime of code
9
+
- how we determine if an optimization is actually faster
10
+
- why we measure the timing as best of N runs
11
+
- how we measure the runtime when we run on a wide variety of test cases.
9
12
10
-
## Design of Codeflash auto-benchmarking
13
+
## Goals of Codeflash auto-benchmarking
11
14
12
-
A core part of the design of Codeflash is that it does not make strong assumptions
13
-
of what types of optimizations are faster. Codeflash automatically benchmarks the code
14
-
on a variety of inputs and determines empirically if the optimization is actually faster.
15
+
A core design principle of Codeflash is that it does not make assumptions
16
+
about the types of optimizations that might be faster. It generates multiple possible optimizations with LLMs and then automatically benchmarks the code
17
+
on a variety of inputs to verify empirically if the optimization is actually faster.
15
18
16
-
The aims behind the design of Codeflash auto-benchmarking are:
17
-
-Be able to accurately measure the runtime of code.
18
-
-Be able to measure runtime of a wide variety of codes.
19
-
-Be able to measure runtime of code on a variety of inputs.
20
-
- Do all the above on real machine, where there might be other processes running, creating timing measurement noise.
21
-
-Be able to make a binary decision on whether an optimization is faster or not.
19
+
The goals of Codeflash auto-benchmarking are:
20
+
- Accurately measure the runtime of code.
21
+
- Measure the runtime of a wide variety of code.
22
+
- Measure the runtime on a variety of inputs.
23
+
- Do all of the above on a real machine, where other running processes may cause timing measurement noise.
24
+
- Make a binary decision on whether an optimization is faster or not.
22
25
23
-
A useful train analogy -
24
-
(timing decision is a binary decision)
25
-
Imagine that you are a train supervisor who is comparing that between two trains, Train A and Train B, which one is faster.
26
+
## A useful train analogy
26
27
27
-
[//]: #(Your objective is to figure out which train is the fastest to go from Seattle to San Diego. )
28
+
Imagine that you are a boss at a train company who wants to purchase a train to run between the two cities of San Francisco and Los Angeles.
29
+
You are deciding between two trains, Train A and Train B, and want to run the train that is the fastest between the two cities.
28
30
29
-
[//]: #(The route first goes to San Francisco, Los Angeles and then ends at San Diego.)
30
31
You can measure the speed of the trains by timing how long it takes to go from San Francisco to Los Angeles.
31
32
32
33
Unfortunately, there are real life factors that can affect the speed of the trains. There might
33
-
be rail traffic, weather conditions, terrain or other factors that can slow down the trains.
34
+
be rail traffic, unfavorable weather conditions, hills or other factors that can slow down the trains.
34
35
35
36
To settle the contest, you ask a train driver to race the two trains and run the trains as fast as possible.
36
-
You run both the trains A and B from San Francisco to Los Angeles and measure the time it takes.
37
+
You measure the time it takes to go from San Francisco to Los Angeles.
37
38
38
-
Now the train A took 5% less time to do Seattle->San Diego than train B. But the driver complaints that
39
+
Now train A took 5% less time than train B. But the driver complains that
39
40
train B's run had poor weather on the way so they can't make conclusions yet. It is very important to definitively
40
41
know which train is faster.
41
42
@@ -46,33 +47,32 @@ This gives us timing data looking like the following. The units are in hours.
46
47
47
48

48
49
49
-
Now our task becomes seemingly harder to decide which train is faster because now there are 50x2 data points.
50
+
Deciding which train is faster now becomes harder, since there are 50x2 data points.
50
51
51
-
Unfortunately, the timing data is also noisy. Other trains might be running on the tracks, the weather might change,
52
-
or the train might be delayed for some other reason. This makes it hard to determine which train is faster.
52
+
Unfortunately, the timing data is noisy. Other trains might be running on the tracks, the weather might change, etc.
53
+
This makes it hard to determine which train is faster in reality.
53
54
54
-
The crucial point is that, the noise in the timing data is not the fault of the train.
55
-
If we think about which train is fast - speed is a property of the train and not the hindrances.
56
-
The ideal way to decide the time would be to clear out all the hindrances and measure the time.
57
-
That way we would have a clean data set that is not noisy.
55
+
The crucial point here is that the noise in the timing data is not the fault of the train.
56
+
The speed of a train is an intrinsic property of the train, not of the external hindrances.
57
+
An important property of the noise is that it is only additive in nature, i.e. when there is a hindrance, the time taken only increases.
58
+
There is no negative noise, which would make the trains go faster.
59
+
The ideal way to decide the run time would be to clear out all the hindrances and rerun the race.
60
+
That way we would have a clean data set that is not noisy, and the run times would be the true speed of the train.
58
61
59
62
But in real life, we cannot do that. The best we can do is to try to minimize the noise,
60
-
and get the "signal" which is the speed of the train, and not the noise which is the time added by hindrances.
63
+
to get close to the "signal" which is the intrinsic speed of the train, and not the noise which is the time added by hindrances.
61
64
Luckily, we can do that. When we repeat the race multiple times, we get multiple data points.
62
-
There will be a lot of cases where when the train goes between two stations, all the conditions are favorable,
65
+
There will be many runs where, as the train travels between the two stations, all the conditions are favorable,
63
66
and the train is able to run at its maximum speed. This will be when the noise is the least, and the
64
67
measured time will be the smallest. This is the "signal" we are looking for - the fastest speed that the train
65
-
can achieve. The noise is only additive noise, i.e. when there is a hindrance, the time taken only increases, there is no
66
-
negative noise, which would make the train go faster.
68
+
can achieve over the whole route. The fastest speeds of the two trains can then be compared to find the faster train.
67
69
68
-
So the key idea is that we find the minimum time that the train can achieve at a sector. That is very close to the fastest speed that the train can achieve.
70
+
So the key idea is that we find the minimum time that the train can achieve between two cities. That is very close to the fastest speed that the train can achieve.
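
The analogy can be sketched as a small simulation. This is only an illustration under stated assumptions: the intrinsic travel times (6.0 and 6.3 hours), the exponential delay model, and the helper name `simulate_runs` are all hypothetical and do not come from the timing data described above. The only property taken from the text is that noise is additive, so every measured time is at least the intrinsic time and the minimum over many runs lands close to it.

```python
import random


def simulate_runs(true_time_hours, n_runs, rng):
    """Simulate n_runs trips; each trip takes the intrinsic time plus a non-negative delay."""
    # expovariate never returns a negative value, so noise can only add time.
    return [true_time_hours + rng.expovariate(1.0) for _ in range(n_runs)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical intrinsic times, in hours, for the two trains.
train_a = simulate_runs(6.0, 50, rng)
train_b = simulate_runs(6.3, 50, rng)

# The minimum is the run with the least noise, so it best reflects intrinsic speed.
best_a = min(train_a)
best_b = min(train_b)

# The average is inflated by every delayed run and is never below the minimum.
mean_a = sum(train_a) / len(train_a)
mean_b = sum(train_b) / len(train_b)

faster = "A" if best_a < best_b else "B"
print(f"best A = {best_a:.2f} h, best B = {best_b:.2f} h, faster train: {faster}")
print(f"mean A = {mean_a:.2f} h, mean B = {mean_b:.2f} h")
```

Comparing the minimums gives a single yes/no answer about which train is faster, while the means mix the train's own speed with however unlucky its runs happened to be.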
69
71
70
72
## How Codeflash benchmarks code
71
73
72
-
From the above, it is clear that we want to measure the fastest speed, which corresponds to the minimum time that the train can achieve.
73
-
This has the least amount of additive noise, and is the most accurate measure of the intrinsic speed of the train.
74
-
75
-
The same idea applies to Codeflash . With processors, there are many different types of noise that can increase the runtime of a function.
74
+
The idea of measuring the fastest speed, which minimizes the noise, is the same idea that Codeflash uses to measure the runtime of code.
75
+
With computer processors, there are many different types of noise that can increase the runtime of a function.
76
76
The noise can be caused by -
77
77
- The hardware - there can be cache misses, CPU frequency scaling up and down, etc.
78
78
- The operating system - there can be context switches, memory allocation, etc.
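
The best-of-N idea can be sketched in plain Python. This is a minimal illustration and not Codeflash's actual implementation: the helper `best_of_n` and the two example functions are hypothetical names introduced here, and the run count of 10 is an arbitrary choice. It only assumes the standard library's `time.perf_counter_ns`.

```python
import time


def best_of_n(func, args, n_runs=10):
    """Return the minimum wall-clock time in nanoseconds over n_runs calls of func(*args)."""
    best = None
    for _ in range(n_runs):
        start = time.perf_counter_ns()
        func(*args)
        elapsed = time.perf_counter_ns() - start
        # Keep the smallest time seen: the run least disturbed by system noise.
        if best is None or elapsed < best:
            best = elapsed
    return best


def sum_squares_loop(n):
    """Hypothetical 'original' code: sum of i*i for i in range(n), computed with a loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_formula(n):
    """Hypothetical 'optimized' candidate: the same sum via its closed form."""
    return (n - 1) * n * (2 * n - 1) // 6


# Both versions must agree before their timings are worth comparing.
assert sum_squares_loop(1000) == sum_squares_formula(1000)

original_ns = best_of_n(sum_squares_loop, (100_000,))
candidate_ns = best_of_n(sum_squares_formula, (100_000,))

# Binary decision: is the candidate's best run faster than the original's best run?
is_faster = candidate_ns < original_ns
print(f"original: {original_ns} ns, candidate: {candidate_ns} ns, faster: {is_faster}")
```

Taking the minimum instead of the mean means a single context switch or cache miss during one run does not change the verdict, as long as at least one run of each version was mostly undisturbed.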