
Commit b516f54

amscanne authored and danobi committed
Add basic benchmark as a --test option
Standardized benchmarks are hard! Since we most often want to use benchmarks in an ad-hoc fashion, we can build it directly into the main binary as a `--test` option (a new test mode). The benchmark mechanism is implemented totally transparently to all the internals, and operates exclusively on the `PassManager`. It executes a benchmark for each pass.

This shows a structured output with every pass listed:

```
System
  OS: Linux 6.9.0-0_fbk3_1265_g43ac291a024d #1 SMP Wed Dec 4 07:06:17 PST 2024
  Arch: x86_64

Build
  version: v0.21.0-713-g4969-dirty
  LLVM: 18.1.8
  bfd: yes
  libdw (DWARF support): yes
  libsystemd (systemd notify support): no
  blazesym (advanced symbolization): yes

Kernel helpers
  probe_read: no
  probe_read_str: no
  probe_read_user: no
  probe_read_user_str: no
  probe_read_kernel: no
  probe_read_kernel_str: no
  get_current_cgroup_id: no
  send_signal: no
  override_return: no
  get_boot_ns: no
  dpath: no
  skboutput: no
  get_tai_ns: no
  get_func_ip: no
  jiffies64: no
  for_each_map_elem: no
  get_ns_current_pid_tgid: no
  lookup_percpu_elem: no

Kernel features
  Instruction limit: -1
  btf: yes
  module btf: no
  map batch: yes
  uprobe refcount: yes

Map types
  hash: yes
  array: yes
  percpu array: yes
  stack_trace: no
  perf_event_array: yes
  ringbuf: yes

Probe types
  kprobe: no
  tracepoint: no
  perf_event: no
  fentry: no
  kprobe_multi: no
  uprobe_multi: no
  kprobe_session: no
  iter: no

ast                           10000   7205047       720 ± 582 ns
bpftrace                      10000   7724379       772 ± 576 ns
parse                         2103    100040389     47 ± 5 μs
ConfigAnalyser                3355    100029020     29 ± 3 μs
ResolveImports                1478    100015641     67 ± 9 μs
ImportScripts                 10000   15044124      1504 ± 878 ns
UnstableFeature               10000   18684501      1868 ± 972 ns
MacroExpansion                10000   19686928      1968 ± 910 ns
Deprecated                    10000   14804521      1480 ± 921 ns
attachpoints                  10000   52022211      5202 ± 1792 ns
btf                           10000   31688754      3168 ± 178502 ns
tracepoint                    10000   23252392      2325 ± 871 ns
FieldAnalyser                 10000   30305116      3030 ± 1106 ns
ClangParser                   4       105528964     26 ± 3 ms
CMacroExpansion               10000   17427514      1742 ± 815 ns
MapSugar                      10000   36848365      3684 ± 1439 ns
FoldLiterals                  10000   19861081      1986 ± 833 ns
PidFilter                     10000   16219805      1621 ± 911 ns
Semantic                      10000   37571204      3757 ± 2062 ns
ResourceAnalyser              10000   56294908      5629 ± 1611 ns
RecursionCheck                10000   21742022      2174 ± 1016 ns
ReturnPath                    10000   10208386      1020 ± 581 ns
Probe                         10000   24246794      2424 ± 1238 ns
llvm-init                     10000   58527480      5852 ± 2056 ns
compile                       1974    100048050     50 ± 30 μs
optimize                      567     100030093     176 ± 58 μs
object                        115     100232684     871 ± 167 μs
extern                        10000   14664920      1466 ± 806 ns
link                          4511    100005119     22 ± 7 μs
total                         29      803344776     27 ± 3 ms
PASS
```

Additional docs have been added for how to use this functionality, with appropriate caveats.

Signed-off-by: Adin Scannell <[email protected]>
stack-info: PR: bpftrace#3998, branch: user/amscanne/map_pipelines2/3
1 parent 5ed9903 commit b516f54
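The commit message says the benchmark runs each pass repeatedly. A minimal sketch of the stopping rule used in `src/benchmark.cpp` follows; `collect_samples` and its `measure` callback are hypothetical names, and the callback stands in for timing one `pass.run(ctx)` call in nanoseconds. The defaults mirror the constants in the diff (a 100 ms goal and a 10,000-sample cap).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper mirroring the stopping rule in benchmark.cpp: keep
// sampling until more than three samples *and* at least `goal_ns` of measured
// time have been collected, or until `max_samples` samples exist.
// `measure` runs the code under test once and returns its cost in ns.
std::vector<int64_t> collect_samples(const std::function<int64_t()> &measure,
                                     int64_t goal_ns = 100'000'000,
                                     std::size_t max_samples = 10'000)
{
  std::vector<int64_t> samples;
  int64_t total = 0;
  while (true) {
    int64_t current = measure();
    samples.push_back(current);
    total += current;

    // Same condition as the diff: enough time (after >3 samples), or too many.
    if (samples.size() >= max_samples ||
        (samples.size() > 3 && total >= goal_ns)) {
      break;
    }
    // In the real benchmark, the AST is restored here before the next run.
  }
  return samples;
}
```

Because the time goal is only consulted once `samples.size() > 3`, at least four samples are taken before the 100 ms goal can end the loop, one more than the "at least three iterations" in the source comment. A pass that takes roughly 26 ms per run reaches the goal on its fourth sample, consistent with the count of 4 reported for `ClangParser` above.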


8 files changed (+224, -5 lines)


docs/developers.md

Lines changed: 9 additions & 0 deletions
@@ -40,6 +40,15 @@ The distro build is documented in [INSTALL.md](../INSTALL.md#generic-build-proce
 Every contribution should (1) not break the existing tests and (2) introduce new
 tests if relevant. See existing tests for inspiration on how to write new ones. [Read more on the different kinds and how to run them](../tests/README.md).
 
+## Performance
+
+We aim to not be wasteful, but always keep in mind that performance of the BPF
+programs and runtime are the things in the critical path. Often, simplicity and
+understandability on non-critical paths is often more important than
+performance. That said, occasionally it is useful to measure the performance of
+different parts of the pipeline. You may run bpftrace using `--test benchmark`
+in order to see the performance of the various passes during compilation.
+
 ## Continuous integration
 
 CI executes the above tests in a matrix of different LLVM versions on NixOS.

src/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ set_target_properties(libbpftrace PROPERTIES PREFIX "")
 
 add_executable(bpftrace
   main.cpp
+  benchmark.cpp
 )
 
 # TODO: Honor `STATIC_LINKING` properly.

src/ast/context.cpp

Lines changed: 7 additions & 0 deletions
@@ -31,4 +31,11 @@ ASTContext::ASTContext() : ASTContext("", "")
 {
 }
 
+void ASTContext::clear()
+{
+  root = nullptr;
+  nodes_.clear();
+  diagnostics_->clear();
+}
+
 } // namespace bpftrace::ast

src/ast/context.h

Lines changed: 5 additions & 0 deletions
@@ -87,6 +87,11 @@ class ASTContext : public ast::State<"ast"> {
     return source_;
   }
 
+  // clears all the nodes and diagnostics, but does not affect the underlying
+  // `ASTSource` object. This is useful if you want to e.g. reparse the full
+  // syntax tree in place.
+  void clear();
+
   Program *root = nullptr;
 
 private:
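`ASTContext::clear()` lets the benchmark restore a pristine tree between runs: it snapshots the root into a separate `ASTContext`, runs the pass, clears the live context, and clones the snapshot back. The sketch below reproduces that save/clear/re-clone cycle with toy types; `Node`, `Context`, and `run_and_restore` are hypothetical stand-ins, not bpftrace's AST API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy node type (hypothetical), owned by a Context.
struct Node {
  std::string name;
  std::vector<Node *> children;
};

// Toy context that owns every node it creates, like ASTContext owns nodes_.
class Context {
public:
  Node *make(std::string name)
  {
    nodes_.push_back(std::make_unique<Node>(Node{std::move(name), {}}));
    return nodes_.back().get();
  }

  // Deep-copies `src` (which may live in another context) into this one.
  Node *clone(const Node *src)
  {
    if (src == nullptr)
      return nullptr;
    Node *copy = make(src->name);
    for (const Node *child : src->children)
      copy->children.push_back(clone(child));
    return copy;
  }

  // Analogue of ASTContext::clear(): drop the root and every owned node.
  void clear()
  {
    root = nullptr;
    nodes_.clear();
  }

  std::size_t size() const { return nodes_.size(); }

  Node *root = nullptr;

private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Runs a (possibly mutating) pass, then restores the tree from the snapshot.
template <typename Pass>
void run_and_restore(Context &ctx, const Context &saved, Pass &&pass)
{
  pass(ctx);
  ctx.clear();
  ctx.root = ctx.clone(saved.root);
}
```

Clearing before re-cloning keeps the live context from accumulating one copy of the tree per iteration, which matters when a pass is run up to 10,000 times.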

src/ast/pass_manager.h

Lines changed: 11 additions & 1 deletion
@@ -30,7 +30,9 @@ class PassContext {
   char value[N];
   std::string str() const
   {
-    return std::string(value, sizeof(value));
+    // N.B. the value here includes the trailing zero, so when constructing a
+    // string we truncate this zero.
+    return std::string(value, sizeof(value) - 1);
   }
 };
 

@@ -87,6 +89,14 @@ class PassContext {
     no_object_failure(type_id);
   }
 
+  // has indicates whether the given state is present or not.
+  template <typename T>
+  bool has()
+  {
+    int type_id = TypeId<T>::type_id();
+    return state_.contains(type_id) || extern_state_.contains(type_id);
+  }
+
 private:
   // for the failed lookup path, see above.
   [[noreturn]] static void no_object_failure(int type_id);

src/benchmark.cpp

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
+#include <chrono>
+#include <cmath>
+#include <ctime>
+#include <iomanip>
+#include <iostream>
+#include <sstream>
+
+#include "ast/ast.h"
+#include "ast/context.h"
+#include "ast/passes/printer.h"
+#include "benchmark.h"
+
+namespace bpftrace {
+
+char TimerError::ID;
+void TimerError::log(llvm::raw_ostream &OS) const
+{
+  OS << "timer error: " << strerror(err_);
+}
+
+using time_point = std::chrono::time_point<std::chrono::steady_clock,
+                                           std::chrono::nanoseconds>;
+
+static Result<time_point> processor_time()
+{
+  struct timespec ts = {};
+  int rc = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
+  if (rc < 0) {
+    return make_error<TimerError>(errno);
+  }
+  return time_point(std::chrono::seconds(ts.tv_sec) +
+                    std::chrono::nanoseconds(ts.tv_nsec));
+}
+
+static int64_t delta(time_point start, time_point end)
+{
+  return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
+      .count();
+}
+
+Result<> benchmark(std::ostream &out, ast::PassManager &mgr)
+{
+  ast::PassContext ctx;
+
+  // See below; we aggregate at the end.
+  int64_t full_mean = 0;
+  double full_variance = 0;
+  size_t full_count = 0;
+
+  // We print out the confidence interval at p95, which corresponds to a
+  // z-score of 1.96 (see the `err` value below).
+  auto emit = [&](const std::string &name,
+                  int64_t total,
+                  int64_t count,
+                  double variance) {
+    size_t mean = total / count;
+    auto stddev = std::sqrt(variance);
+    auto err = static_cast<int64_t>(1.96 * stddev /
+                                    std::sqrt(static_cast<double>(count)));
+    std::string unit = "ns";
+    if (mean > 10000000) {
+      unit = "ms";
+      mean /= 1000000;
+      err /= 1000000;
+    } else if (mean > 10000) {
+      unit = "μs";
+      mean /= 1000;
+      err /= 1000;
+    }
+    out << std::left << std::setw(30) << name;
+    out << std::left << std::setw(8) << count;
+    out << std::left << std::setw(14) << total;
+    out << mean << " ± " << err << " " << unit << std::endl;
+  };
+
+  auto ok = mgr.foreach([&](auto &pass) -> Result<> {
+    // Copy out the AST. We allow passes to mutate the AST, and therefore we
+    // copy this out and reset it each time.
+    ast::ASTContext saved;
+    if (ctx.has<ast::ASTContext>()) {
+      auto &ast = ctx.get<ast::ASTContext>();
+      saved.root = saved.clone_node(ast.root, ast::Location());
+    }
+
+    // We run the function until we are able to accumulate at least three
+    // iterations, and 100 milliseconds (but we never bother doing more than
+    // 10,000). This should provide reasonable data for the below. The times
+    // are all recorded in process CPU time, only while the pass itself is
+    // running. We may accumulate additional time rebuilding the AST, etc.
+    int64_t goal = std::chrono::duration_cast<std::chrono::nanoseconds>(
+                       std::chrono::milliseconds(100))
+                       .count();
+    std::vector<int64_t> samples;
+    int64_t total = 0;
+    while (true) {
+      auto start = processor_time();
+      if (!start) {
+        return start.takeError();
+      }
+      auto ok = pass.run(ctx);
+      if (!ok) {
+        return ok.takeError();
+      }
+      auto end = processor_time();
+      if (!end) {
+        return end.takeError();
+      }
+      int64_t current = delta(*start, *end);
+      samples.push_back(current);
+      total += current;
+
+      // Do we have enough (or too much)?
+      if (samples.size() >= 10000 || (samples.size() > 3 && total >= goal)) {
+        break;
+      }
+
+      // Restore the original tree.
+      auto &ast = ctx.get<ast::ASTContext>();
+      ast.clear();
+      ast.root = clone(ast, saved.root, ast::Location());
+    }
+
+    // Compute the variance of the samples.
+    int64_t mean = total / samples.size();
+    double variance = 0;
+    for (const auto &sample : samples) {
+      variance += std::pow(static_cast<double>(sample - mean), 2);
+    }
+    emit(pass.name(), total, samples.size(), variance);
+
+    // Aggregate for printing the final stats. Note that we treat each pass as
+    // independent, therefore the final variance is the sum of the variances.
+    full_mean += mean;
+    full_variance += variance;
+    full_count++;
+    return OK();
+  });
+  if (!ok) {
+    out << "FAIL\n"; // See below.
+    return ok.takeError();
+  }
+
+  // The final `PASS` is emitted when all passes have finished correctly. This
+  // makes the output format compatible with `gobench` or other aggregation
+  // tools that can compare benchmarks.
+  emit("total", full_mean * full_count, full_count, full_variance);
+  out << "PASS\n";
+  return OK();
+}
+
+} // namespace bpftrace
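In `benchmark.cpp`, the value handed to `emit` as `variance` is the sum of squared deviations (it is never divided by the sample count), so `1.96 * sqrt(variance) / sqrt(count)` evaluates to 1.96 times the population standard deviation. That describes the spread of individual runs rather than the uncertainty of the mean. For comparison, a sketch of the conventional normal-approximation 95% confidence interval of the mean, plus the same unit thresholds `emit` uses; `Summary`, `summarize`, and `unit_for` are hypothetical names, not part of the commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

struct Summary {
  double mean;    // ns
  double stddev;  // sample standard deviation (n - 1 denominator), ns
  double ci95;    // half-width of the 95% CI of the mean, ns
};

// Precondition: `samples` is non-empty (the benchmark loop always takes >= 1).
// CI half-width = 1.96 * s / sqrt(n), with s the sample standard deviation.
Summary summarize(const std::vector<int64_t> &samples)
{
  const double n = static_cast<double>(samples.size());
  double sum = 0;
  for (int64_t s : samples)
    sum += static_cast<double>(s);
  const double mean = sum / n;

  // Sum of squared deviations: the quantity benchmark.cpp passes as "variance".
  double sum_sq = 0;
  for (int64_t s : samples)
    sum_sq += (s - mean) * (s - mean);

  const double stddev = n > 1 ? std::sqrt(sum_sq / (n - 1)) : 0.0;
  return {mean, stddev, 1.96 * stddev / std::sqrt(n)};
}

// Same thresholds as emit(): above 10^7 ns print ms, above 10^4 ns print μs.
std::string unit_for(double mean_ns)
{
  if (mean_ns > 10000000)
    return "ms";
  if (mean_ns > 10000)
    return "μs";
  return "ns";
}
```

With 10,000 samples, the standard error is 100 times smaller than the standard deviation, so the two conventions can differ by orders of magnitude in the `±` column.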

src/benchmark.h

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#pragma once
+
+#include <iostream>
+
+#include "ast/pass_manager.h"
+#include "util/result.h"
+
+namespace bpftrace {
+
+class TimerError : public ErrorInfo<TimerError> {
+public:
+  TimerError(int err) : err_(err) {};
+  static char ID;
+  void log(llvm::raw_ostream &OS) const override;
+
+private:
+  int err_;
+};
+
+Result<OK> benchmark(std::ostream &out, ast::PassManager &mgr);
+
+} // namespace bpftrace
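`TimerError` carries the `errno` from a failed `clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...)` call. A standalone sketch of the same CPU-time measurement for POSIX systems, assuming `std::system_error` in place of bpftrace's `Result`/`TimerError` types; `process_cpu_time` and `cpu_ns` are hypothetical names. It returns a plain duration rather than a `steady_clock` time point, since only differences between two readings are meaningful.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <system_error>

// CPU time consumed so far by the whole process (all threads). Unlike wall
// time, this excludes periods where the process is blocked or descheduled.
std::chrono::nanoseconds process_cpu_time()
{
  struct timespec ts = {};
  if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
    // Report the failure with errno, as TimerError does in the diff.
    throw std::system_error(errno, std::generic_category(), "clock_gettime");
  }
  return std::chrono::seconds(ts.tv_sec) + std::chrono::nanoseconds(ts.tv_nsec);
}

// Measures the process CPU time spent while `fn` runs, in nanoseconds.
template <typename Fn>
int64_t cpu_ns(Fn &&fn)
{
  auto start = process_cpu_time();
  fn();
  auto end = process_cpu_time();
  return (end - start).count();
}
```

Because the clock is process-wide, CPU time burned by other threads during `fn` is also counted; for the single-threaded compilation passes measured here that is usually negligible.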

src/main.cpp

Lines changed: 18 additions & 4 deletions
@@ -32,6 +32,7 @@
 #include "ast/passes/resource_analyser.h"
 #include "ast/passes/return_path_analyser.h"
 #include "ast/passes/semantic_analyser.h"
+#include "benchmark.h"
 #include "bpffeature.h"
 #include "bpftrace.h"
 #include "btf.h"
@@ -62,8 +63,9 @@ enum class OutputBufferConfig {
 };
 
 enum class TestMode {
-  UNSET = 0,
+  NONE = 0,
   CODEGEN,
+  BENCHMARK,
 };
 
 enum class BuildMode {
@@ -335,7 +337,7 @@ struct Args {
   bool usdt_file_activation = false;
   int helper_check_level = 1;
   bool no_warnings = false;
-  TestMode test_mode = TestMode::UNSET;
+  TestMode test_mode = TestMode::NONE;
   std::string script;
   std::string search;
   std::string filename;
@@ -465,8 +467,10 @@ Args parse_args(int argc, char* argv[])
         args.helper_check_level = 0;
         break;
       case Options::TEST: // --test
-        if (std::strcmp(optarg, "codegen") == 0)
+        if (std::strcmp(optarg, "codegen") == 0) {
           args.test_mode = TestMode::CODEGEN;
+        } else if (std::strcmp(optarg, "benchmark") == 0)
+          args.test_mode = TestMode::BENCHMARK;
         else {
           LOG(ERROR) << "USAGE: --test can only be 'codegen'.";
           exit(1);
@@ -829,7 +833,7 @@ int main(int argc, char* argv[])
   }
 
   // If we are not running anything, then we don't require root.
-  if (args.test_mode != TestMode::CODEGEN) {
+  if (args.test_mode == TestMode::NONE) {
     check_is_root();
 
     auto lockdown_state = lockdown::detect();
@@ -949,6 +953,16 @@ int main(int argc, char* argv[])
   pm.add(ast::CreateExternObjectPass());
   pm.add(ast::CreateLinkPass());
 
+  if (args.test_mode == TestMode::BENCHMARK) {
+    info(args.no_feature);
+    auto ok = benchmark(std::cout, pm);
+    if (!ok) {
+      std::cerr << "Benchmark error: " << ok.takeError();
+      return 1;
+    }
+    return 0;
+  }
+
   auto pmresult = pm.run();
   if (!pmresult) {
     std::cerr << pmresult.takeError() << "\n";
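In the `parse_args` hunk, the usage message for an unrecognized value still reads `--test can only be 'codegen'.` even though `benchmark` is now accepted. One way to keep the accepted values and the message in sync is a lookup table; this is a sketch only, reusing the `TestMode` enum from the diff, with `kTestModes`, `parse_test_mode`, and `test_mode_usage` as hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

enum class TestMode {
  NONE = 0,
  CODEGEN,
  BENCHMARK,
};

// Every accepted `--test` value lives in one place.
constexpr std::array<std::pair<std::string_view, TestMode>, 2> kTestModes = {{
    {"codegen", TestMode::CODEGEN},
    {"benchmark", TestMode::BENCHMARK},
}};

// Returns the matching mode, or std::nullopt for an unknown value.
std::optional<TestMode> parse_test_mode(std::string_view arg)
{
  for (const auto &[name, mode] : kTestModes) {
    if (arg == name)
      return mode;
  }
  return std::nullopt;
}

// Builds the usage message from the same table, quoting each value.
std::string test_mode_usage()
{
  std::string msg = "USAGE: --test can only be ";
  for (std::size_t i = 0; i < kTestModes.size(); ++i) {
    if (i > 0)
      msg += (i + 1 == kTestModes.size()) ? " or " : ", ";
    msg += "'";
    msg += kTestModes[i].first;
    msg += "'";
  }
  msg += ".";
  return msg;
}
```

Adding a future test mode then means adding one table entry, and the error text updates itself.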
