Record and Use Engine Flags in Summarize/Effect-Size Tools by posborne · Pull Request #293 · bytecodealliance/sightglass

posborne · 2025-12-03T21:16:04Z

Recently, for a number of performance comparisons I've had the need to compare the same engine with different engine flags. While this has been possible to do with workarounds that involve having copies of the engine that are identical with different names, it is somewhat cumbersome.

Given that I think this sort of comparison is likely to continue to be useful, this PR introduces the following changes:

When benchmarking, the ability to override the engine name is provided. This is what I used for some time as a workaround and which could still be useful for some. Ultimately, it was still cumbersome to update this to track the flags being used.
Changes were introduced to not lose the engine flags by including it in the Measurements recorded and written to raw data records.
Changes were made to use this information and display it in a reasonable way when using the summarize or effect-size tooling.

I also have prepped changes for HTML report generation to follow these changes which will also make use of this information (e.g. as used with bytecodealliance/wasmtime#1749 (comment) to compare 3 sets of flags for the same engine .so). That will follow this change once the dust settles.

This can be useful when the engine filename is not useful or when wanting a concise way to differentiate benchmark runs that may use the same engine but with different flags.

Previously, only the engine name/path was stored for each measurement. The flags used represents valuable information for both understanding how a historical run was performed as well as determining how different configurations impact performance for the same engine.

Previously, recordings were updated to capture engine flag information. This change updates the code used for summaries and effect-size analysis to allow for comparing engine/flag combinations against each other which is useful for a class of comparisons.

abrown

I think this is the right change to make. Thanks for this!

abrown · 2025-12-04T19:09:11Z

crates/analysis/src/effect_size.rs

-        let engines: BTreeSet<_> = key_measurements.iter().map(|m| &m.engine).collect();
+        let engines: BTreeSet<_> = key_measurements
+            .iter()
+            .map(|m| (&m.engine, &m.engine_flags))


I debated whether to even bring this up, please take it or leave it: if we were to coalesce the engine path and engine flags into a single field, it seems like we might be able to avoid a lot of the "oh, let's also add flags in here too" kinds of changes. Perhaps we still want separate CSV columns for each part, but IIRC there is a way to flatten a struct Engine { path: ..., flags: ... } out into separate columns. If so, using a single struct everywhere (with auto-derived PartialEq, Eq, ...) could make your life easier in the future and simplify this PR.

And then just generating the label is a matter of impl Display for Engine... but now I'm over-selling this suggestion.

@abrown I like this idea; it occurred to me that something like it could be a good approach as I continued to have to update more callsites with a_engine, b_engine, etc. I will take a pass at implementing the idea as it should simplify a good chunk of code (probably exchanging a bit of complexity around ser/de).

Implemented in 31bd2de; things did get gross with ser/de as rust-csv doesn't support flattened structs. After trying a few approaches, I ended up just have private "Wire" versions of each of the data structs to handle dealing with that.

An alternative could have been to format the struct as a single field/string but I didn't love that idea, especially if we may want to process the data with external programs (that would then need to parse/split that column to get the individual pieces of data).

This will break ingest of existing csv that doesn't have headers -- I don't think that really exists in the wild.

abrown · 2025-12-04T19:10:43Z

crates/analysis/src/effect_size.rs

+    format!(
+        "{}{}",
+        engine,
+        if let Some(ef) = engine_flags {
+            format!(" ({ef})")
+        } else {
+            "".into()
+        }
+    )


Another style suggestion:

Suggested change

format!(

"{}{}",

engine,

if let Some(ef) = engine_flags {

format!(" ({ef})")

} else {

"".into()

}

)

if let Some(ef) = engine_flags {

format!("{engine} ({ef})")

} else {

format!("{engine}")

}

This simplifies consuming and building measurement and other related structures but at the cost of somewhat more complex serialization. This is largely due to our use of CSV as an input/output format and rust-csv's longstanding inability to handle flattened structures. See BurntSushi/rust-csv#98

posborne added 3 commits December 2, 2025 21:01

Allow benchmark names to be overriden

92e600b

This can be useful when the engine filename is not useful or when wanting a concise way to differentiate benchmark runs that may use the same engine but with different flags.

posborne force-pushed the record-and-use-engine-flags branch from 8f2b993 to d29ef92 Compare December 4, 2025 00:19

Fix issues raised by the tests, improve coverage

3c81ea5

posborne force-pushed the record-and-use-engine-flags branch from d29ef92 to 3c81ea5 Compare December 4, 2025 00:19

abrown approved these changes Dec 4, 2025

View reviewed changes

abrown merged commit 372c47a into bytecodealliance:main Dec 8, 2025
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record and Use Engine Flags in Summarize/Effect-Size Tools#293

Record and Use Engine Flags in Summarize/Effect-Size Tools#293
abrown merged 5 commits intobytecodealliance:mainfrom
posborne:record-and-use-engine-flags

posborne commented Dec 3, 2025

Uh oh!

abrown left a comment

Uh oh!

abrown Dec 4, 2025

Uh oh!

abrown Dec 4, 2025

Uh oh!

posborne Dec 4, 2025

Uh oh!

posborne Dec 5, 2025

Uh oh!

abrown Dec 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

posborne commented Dec 3, 2025

Uh oh!

abrown left a comment

Choose a reason for hiding this comment

Uh oh!

abrown Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

abrown Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

posborne Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

posborne Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

abrown Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants