Commit fadbedb

Merge branch 'main' into github-app-not-installed-flow
2 parents af8a735 + d3d8dee commit fadbedb

5 files changed: +153 -1 lines changed
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+  "label": "Codeflash Concepts",
+  "position": 4,
+  "collapsed": false
+}
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
---
2+
sidebar_position: 2
3+
---
4+
5+
# How Codeflash measures code runtime
6+
7+
Codeflash reports benchmarking results that look like this:
8+
9+
```text
⏱️ Runtime : 32.8 microseconds → 29.2 microseconds (best of 315 runs)
```
12+
13+
To measure runtime, Codeflash runs a function multiple times with several inputs
14+
and sums the minimum time for each input to get the total runtime.
15+
16+
Simplified pseudocode for Codeflash benchmarking looks like this:
17+
18+
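```python
import time

# `test_inputs` and `function_to_optimize` are placeholders supplied by the benchmark.
loops = 0
min_input_runtime = [float("inf")] * len(test_inputs)
start_time = time.perf_counter()
# Keep looping until at least 5 loops have run and 10 seconds have elapsed.
while loops < 5 or time.perf_counter() - start_time < 10:
    loops += 1
    for input_index, test_input in enumerate(test_inputs):
        t0 = time.perf_counter()
        function_to_optimize(test_input)
        t = time.perf_counter() - t0
        # Keep only the fastest time observed for this input.
        if t < min_input_runtime[input_index]:
            min_input_runtime[input_index] = t
total_runtime = sum(min_input_runtime)
number_of_runs = loops
```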
31+
32+
The above code runs the function multiple times on different inputs and uses the minimum time for each input.
33+
34+
In this document we explain:
35+
- How we measure the runtime of code
36+
- How we determine if an optimization is actually faster
37+
- Why we measure the timing as best of N runs
38+
- How we measure the runtime across a wide variety of test cases
39+
40+
## Goals of Codeflash auto-benchmarking
41+
42+
A core principle of Codeflash is that it makes no assumptions about which optimizations might be faster.
43+
Instead, it generates multiple possible optimizations with LLMs and automatically benchmarks the code
44+
on a variety of inputs to empirically verify if the optimization is actually faster.
45+
46+
The goals of Codeflash auto-benchmarking are:
47+
- Accurately measure the runtime of code
48+
- Measure runtime for a wide variety of code
49+
- Measure runtime on a variety of inputs
50+
- Do all the above on a real machine, where other processes might be running and causing timing measurement noise
51+
- Finally, make a binary decision on whether an optimization is faster or not
52+
53+
## Racing trains as an analogy
54+
55+
Imagine you're a boss at a train company choosing between two trains to run between San Francisco and Los Angeles.
56+
You want to determine which train is faster.
57+
58+
You can measure their speed by timing how long each takes to travel between the two cities.
59+
60+
However, real-life factors affect train speeds: rail traffic, unfavorable weather, hills, and other obstacles.
61+
These can slow them down.
62+
63+
To settle the contest, you have a driver race the two trains at maximum possible speed.
64+
You measure the travel times between the two cities for each train.
65+
66+
Train A took 5% less time than Train B. But the driver points out that Train B encountered poor weather,
67+
making it impossible to draw firm conclusions. Since it's crucial to know which train is truly faster, you need more data.
68+
69+
You ask the driver to repeat the race multiple times. In this scenario, since they have plenty of time, they repeat the race 50 times.
70+
71+
This gives us timing data (in hours) that looks like the following.
72+
73+
![img_2.png](img_2.png)
74+
75+
With 100 data points (50 per train), determining the faster train becomes more complex.
76+
77+
The timing data contains noise from various factors: other trains on the tracks, changing weather, and so on.
78+
This makes it challenging to determine which train is faster.
79+
80+
Here's the crucial insight: timing noise isn't the train's fault. A train's speed is an intrinsic property,
81+
independent of external hindrances. The noise only adds time—there's no "negative noise" that makes trains go faster.
82+
Ideally, we'd measure speed with no hindrances at all, giving us clean, noise-free data that shows true speed.
83+
84+
85+
In reality, we can't eliminate all noise. Instead, we minimize it by focusing on the "signal"—the train's intrinsic
86+
speed—rather than the noise from hindrances. By running multiple races, we get multiple data points. Sometimes conditions
87+
are nearly perfect, allowing the train to reach maximum speed. These minimal-noise runs produce the smallest times—our
88+
"signal" that reveals the train's true capabilities. We can compare these best times to determine the faster train.
89+
90+
The key is finding each train's minimum time between cities—this closely approximates its maximum achievable speed.
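
The claim that the minimum approximates the noise-free time can be illustrated with a small simulation. This is only a sketch: the true travel times (6.0 and 6.3 hours), the exponential noise model, and the helper name `simulate_runs` are assumptions made for illustration, not data from the races above.

```python
import random


def simulate_runs(true_time, n_runs, rng):
    """Return n_runs observed times: the true time plus non-negative noise."""
    # Exponential noise with a mean of 0.5 hours is an illustrative assumption.
    # It can only add time, mirroring the "no negative noise" argument.
    return [true_time + rng.expovariate(1 / 0.5) for _ in range(n_runs)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_a = simulate_runs(true_time=6.0, n_runs=50, rng=rng)
train_b = simulate_runs(true_time=6.3, n_runs=50, rng=rng)

best_a, best_b = min(train_a), min(train_b)
mean_a = sum(train_a) / len(train_a)

print(f"Train A: best {best_a:.2f} h, mean {mean_a:.2f} h (true time 6.00 h)")
print(f"Train B: best {best_b:.2f} h (true time 6.30 h)")
```

Because the noise is never negative, the best time can never fall below the true time, and it always sits at least as close to the true time as the mean does.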
91+
92+
## How Codeflash benchmarks code
93+
94+
This principle of measuring peak performance while minimizing external noise is exactly how Codeflash measures code runtime.
95+
Computer processors face various sources of noise that can increase function runtime:
96+
97+
- Hardware: cache misses, CPU frequency scaling, etc.
98+
- Operating system: context switches, memory allocation, etc.
99+
- Programming language: garbage collection, thread scheduling, etc.
100+
101+
Codeflash minimizes noise by running functions multiple times and taking the minimum time.
102+
This minimum typically occurs when there are the fewest hindrances: the processor frequency is maximal,
103+
cache misses are minimal, and the operating system is not doing context switches. This approaches the function's true speed.
104+
105+
When comparing an optimization to the original function, Codeflash runs both multiple times and compares their
106+
minimum times. This gives us the most accurate measurement of each function's intrinsic speed, which is our signal, allowing for a
107+
meaningful comparison.
108+
109+
We've found that running a function multiple times increases the likelihood of getting these "lucky" minimal-noise runs.
110+
To maximize this, Codeflash runs each function for 10 seconds with a minimum of 5 loops, balancing measurement accuracy with reasonable runtime.
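
A minimal sketch of this best-of-N comparison for a single input is shown below. The helper `best_of_n` and the two example functions are hypothetical and are not Codeflash's implementation. The time budget is shortened to 0.2 seconds so the example finishes quickly, while the defaults mirror the 10 seconds and 5 loops described above.

```python
import time


def best_of_n(func, arg, min_loops=5, time_budget_s=10.0):
    """Return (fastest observed runtime in seconds, number of loops run)."""
    best = float("inf")
    loops = 0
    start = time.perf_counter()
    # Stop only once both the loop minimum and the time budget are satisfied.
    while loops < min_loops or time.perf_counter() - start < time_budget_s:
        loops += 1
        t0 = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - t0)
    return best, loops


def original(n):
    """Sum the integers below n with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def candidate(n):
    """Compute the same sum with a closed-form expression."""
    return n * (n - 1) // 2


# Shortened budget for the example; the defaults follow the text above.
best_orig, runs_orig = best_of_n(original, 10_000, time_budget_s=0.2)
best_cand, runs_cand = best_of_n(candidate, 10_000, time_budget_s=0.2)
print(f"original:  {best_orig * 1e6:.2f} microseconds (best of {runs_orig} runs)")
print(f"candidate: {best_cand * 1e6:.2f} microseconds (best of {runs_cand} runs)")
```

Using `time.perf_counter` rather than `time.time` is a choice made for this sketch, since it is a monotonic, high-resolution clock intended for measuring short durations.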
111+
112+
## What happens when there are multiple inputs to a function?
113+
114+
While this approach works well for single inputs, what about multiple inputs?
115+
116+
Now the race runs through multiple stations: Seattle to San Francisco to Los Angeles to San Diego.
117+
We still need to determine the faster train for this route.
118+
119+
We can only measure times between adjacent stations.
120+
121+
Here is what the timing data looks like (in hours):
122+
123+
![img_1.png](img_1.png)
124+
125+
With 300 data points (50 runs × 3 segments × 2 trains) and varying conditions on each segment,
126+
determining the faster train becomes even more challenging.
127+
128+
Which train is faster?
129+
130+
Our insight about measuring peak performance still applies, but we need to measure each segment separately
131+
since the track differs between segments due to hills and track curves.
132+
133+
134+
We divide the route into segments between stations and measure each train's fastest time per segment.
135+
We find the minimum time for each segment, then sum these minimums to get the total route time.
136+
The train with the lowest sum of minimum times is fastest. This approach better captures each train's
137+
intrinsic speed because measuring shorter segments reduces the chance of encountering noise in that segment, compared to measuring the entire route.
138+
The result is more accurate timing data.
139+
140+
Codeflash applies this same principle to functions with multiple inputs. For workloads with multiple inputs,
141+
it measures a function's intrinsic speed on each input separately. The total intrinsic runtime is the sum
142+
of these individual minimums.
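
As a worked example of summing per-input minimums, consider the small table below. The timings are invented for illustration, with one row per loop and one column per input.

```python
# Hypothetical timings in microseconds: one row per loop, one column per input.
timings = [
    [12.1, 30.4, 8.9],  # loop 1
    [11.8, 33.0, 9.4],  # loop 2
    [12.6, 29.9, 8.7],  # loop 3
]

# zip(*timings) transposes the table so each tuple holds all runs of one input.
min_per_input = [min(runs) for runs in zip(*timings)]
total_runtime = sum(min_per_input)

# For comparison: the fastest single loop measured end to end.
fastest_whole_loop = min(sum(loop) for loop in timings)

print(f"per-input minimums: {min_per_input}")
print(f"sum of minimums:    {total_runtime:.1f} microseconds")
print(f"fastest whole loop: {fastest_whole_loop:.1f} microseconds")
```

Here the per-input minimums come from different loops (11.8 from loop 2, 29.9 and 8.7 from loop 3), so their sum of 50.4 is lower than the fastest whole loop at 51.2. The sum of minimums can never exceed the fastest whole loop, which is why it gives a tighter estimate.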
143+
144+
145+
This approach proves highly accurate, even on noisy virtual machines. We use a 5% noise floor for runtime
146+
(10% on GitHub Actions) and only consider optimizations significant if they're at least 5% faster than the original function.
147+
This technique effectively minimizes measurement noise, giving us an accurate measure of a function's true, noise-free, intrinsic speed.
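
The final binary decision can be sketched as a threshold check against the noise floor. The function name `is_significant_speedup` and the exact definition of "faster" used here (relative reduction in runtime) are assumptions for illustration; the text above does not specify the formula Codeflash uses.

```python
def is_significant_speedup(original_runtime, optimized_runtime, noise_floor=0.05):
    """Return True if the optimized runtime beats the original by the noise floor.

    Assumption: "X% faster" is read as a relative runtime reduction of at least X%.
    """
    if original_runtime <= 0:
        raise ValueError("original_runtime must be positive")
    relative_gain = (original_runtime - optimized_runtime) / original_runtime
    return relative_gain >= noise_floor


# Runtimes in microseconds, taken from the sample report at the top of this page.
print(is_significant_speedup(32.8, 29.2))                    # about 11% faster
# A smaller, hypothetical gain checked against the stricter GitHub Actions floor.
print(is_significant_speedup(32.8, 30.5, noise_floor=0.10))  # about 7% faster
```

Under this reading, the first call clears the 5% floor, while the second falls short of the 10% floor used on GitHub Actions.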

docs/docs/how-codeflash-works.md renamed to docs/docs/codeflash-concepts/how-codeflash-works.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 1
 ---
 # How Codeflash Works
