Record and Use Engine Flags in Summarize/Effect-Size Tools #293

Merged
abrown merged 5 commits into bytecodealliance:main from posborne:record-and-use-engine-flags
Dec 8, 2025
Collaborator

@posborne posborne commented Dec 3, 2025

Recently, for a number of performance comparisons, I've needed to compare the same engine with different engine flags. This has been possible with a workaround that keeps identical copies of the engine under different names, but it is somewhat cumbersome.

Given that I think this sort of comparison is likely to continue to be useful, this PR introduces the following changes:

  • When benchmarking, the engine name can now be overridden. This is what I used for some time as a workaround, and it could still be useful for some. Ultimately, it was still cumbersome to keep the name updated to track the flags being used.
  • Engine flags are no longer lost: they are included in the Measurements recorded and written to raw data records.
  • The summarize and effect-size tooling uses this information and displays it in a reasonable way.

I have also prepped changes for HTML report generation that make use of this information (e.g. as used with bytecodealliance/wasmtime#1749 (comment) to compare 3 sets of flags for the same engine .so). Those will follow once the dust settles on this change.

This can be useful when the engine filename is not informative
or when wanting a concise way to differentiate benchmark
runs that may use the same engine but with different flags.

Previously, only the engine name/path was stored for each measurement.
The flags used represent valuable information both for understanding
how a historical run was performed and for determining how different
configurations impact performance for the same engine.

Previously, recordings were updated to capture
engine flag information. This change updates the code
used for summaries and effect-size analysis to allow
comparing engine/flag combinations against each other,
which is useful for a class of comparisons.
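
As a rough illustration of comparing engine/flag combinations, the sketch below collects the distinct (engine, flags) pairs from a list of measurements. The `Measurement` struct here is a trimmed-down, hypothetical stand-in (only the `engine` and `engine_flags` names come from this PR; `count`, the engine filename, and the flag string are placeholders), so treat it as an assumption-laden sketch rather than the project's actual code.

```rust
use std::collections::BTreeSet;

// Hypothetical, trimmed-down measurement record. The real struct has more
// fields; `count` is just a placeholder for a measured value.
#[derive(Debug, Clone)]
struct Measurement {
    engine: String,
    engine_flags: Option<String>,
    count: u64,
}

/// Collect the distinct (engine, flags) combinations present in `measurements`.
/// Two runs of the same engine with different flags yield two entries.
fn distinct_configs(measurements: &[Measurement]) -> BTreeSet<(&str, Option<&str>)> {
    measurements
        .iter()
        .map(|m| (m.engine.as_str(), m.engine_flags.as_deref()))
        .collect()
}

// Placeholder data: one engine file, run with and without a made-up flag.
fn sample_measurements() -> Vec<Measurement> {
    vec![
        Measurement { engine: "libengine.so".into(), engine_flags: None, count: 100 },
        Measurement { engine: "libengine.so".into(), engine_flags: Some("--flag-a".into()), count: 90 },
        Measurement { engine: "libengine.so".into(), engine_flags: Some("--flag-a".into()), count: 110 },
    ]
}

fn main() {
    let measurements = sample_measurements();
    for (engine, flags) in distinct_configs(&measurements) {
        // Sum the placeholder values belonging to this engine/flag combination.
        let total: u64 = measurements
            .iter()
            .filter(|m| m.engine == engine && m.engine_flags.as_deref() == flags)
            .map(|m| m.count)
            .sum();
        println!("{engine} {flags:?}: total count {total}");
    }
}
```

Keying on the pair instead of the engine name alone is what lets the same engine file show up as separate entries in a comparison.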
@posborne posborne force-pushed the record-and-use-engine-flags branch from 8f2b993 to d29ef92 Compare December 4, 2025 00:19
@posborne posborne force-pushed the record-and-use-engine-flags branch from d29ef92 to 3c81ea5 Compare December 4, 2025 00:19
Member

@abrown abrown left a comment

I think this is the right change to make. Thanks for this!

- let engines: BTreeSet<_> = key_measurements.iter().map(|m| &m.engine).collect();
+ let engines: BTreeSet<_> = key_measurements
+     .iter()
+     .map(|m| (&m.engine, &m.engine_flags))

Member

I debated whether to even bring this up, please take it or leave it: if we were to coalesce the engine path and engine flags into a single field, it seems like we might be able to avoid a lot of the "oh, let's also add flags in here too" kinds of changes. Perhaps we still want separate CSV columns for each part, but IIRC there is a way to flatten a struct Engine { path: ..., flags: ... } out into separate columns. If so, using a single struct everywhere (with auto-derived PartialEq, Eq, ...) could make your life easier in the future and simplify this PR.

Member

And then just generating the label is a matter of impl Display for Engine... but now I'm over-selling this suggestion.
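
A minimal sketch of what this suggestion might look like, assuming the field names `path` and `flags` from the comment above and the `engine (flags)` label format used elsewhere in this PR. The concrete types (`String`, `Option<String>`) and the sample values are assumptions for illustration.

```rust
use std::fmt;

// Single value type for "which engine, run how". Deriving the comparison
// traits lets it be used directly as a key in sets and maps.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Engine {
    path: String,
    flags: Option<String>,
}

impl fmt::Display for Engine {
    // Render "path" or "path (flags)" so callers can use `{engine}` as a label.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.flags {
            Some(flags) => write!(f, "{} ({})", self.path, flags),
            None => write!(f, "{}", self.path),
        }
    }
}

fn main() {
    // Placeholder engine path and flag string.
    let plain = Engine { path: "libengine.so".into(), flags: None };
    let flagged = Engine { path: "libengine.so".into(), flags: Some("--flag-a".into()) };

    // Same path, different flags: distinct values under the derived `Eq`.
    assert_ne!(plain, flagged);
    println!("{plain}");
    println!("{flagged}");
}
```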

Collaborator Author

@abrown I like this idea; it occurred to me that something like it could be a good approach as I continued to have to update more callsites with a_engine, b_engine, etc. I will take a pass at implementing the idea as it should simplify a good chunk of code (probably exchanging a bit of complexity around ser/de).

Collaborator Author

Implemented in 31bd2de; things did get gross with ser/de as rust-csv doesn't support flattened structs. After trying a few approaches, I ended up just having private "Wire" versions of each of the data structs to handle dealing with that.

An alternative could have been to format the struct as a single field/string but I didn't love that idea, especially if we may want to process the data with external programs (that would then need to parse/split that column to get the individual pieces of data).

This will break ingest of existing CSV files that don't have headers -- I don't think those really exist in the wild.
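
The "Wire" approach can be sketched roughly as follows. This is not the code from 31bd2de: the struct and field names are hypothetical (only `engine` and `engine_flags` appear in this PR), and the sketch uses only the standard library. In a real implementation the flat wire struct is presumably the one that would derive serde's `Serialize`/`Deserialize` for use with rust-csv.

```rust
// In-memory form: engine path and flags travel together as one value.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Engine {
    path: String,
    flags: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Measurement {
    engine: Engine,
    count: u64, // placeholder for the measured value
}

// Flat "wire" form: exactly one field per CSV column, no nesting.
// Kept private to the serialization layer.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MeasurementWire {
    engine: String,
    engine_flags: Option<String>,
    count: u64,
}

impl From<Measurement> for MeasurementWire {
    // Split the nested engine into separate columns before writing.
    fn from(m: Measurement) -> Self {
        MeasurementWire {
            engine: m.engine.path,
            engine_flags: m.engine.flags,
            count: m.count,
        }
    }
}

impl From<MeasurementWire> for Measurement {
    // Rebuild the nested engine after reading a flat row.
    fn from(w: MeasurementWire) -> Self {
        Measurement {
            engine: Engine { path: w.engine, flags: w.engine_flags },
            count: w.count,
        }
    }
}

fn main() {
    let original = Measurement {
        engine: Engine { path: "libengine.so".into(), flags: Some("--flag-a".into()) },
        count: 42,
    };
    let wire = MeasurementWire::from(original.clone());
    let restored = Measurement::from(wire);
    assert_eq!(original, restored);
    println!("round trip ok: {restored:?}");
}
```

Because the wire struct keeps path and flags as separate columns, external programs can read them without parsing a combined string, which is the concern raised about the single-field alternative.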

Comment on lines 85 to 93
format!(
    "{}{}",
    engine,
    if let Some(ef) = engine_flags {
        format!(" ({ef})")
    } else {
        "".into()
    }
)

Member

Another style suggestion:

Suggested change
- format!(
-     "{}{}",
-     engine,
-     if let Some(ef) = engine_flags {
-         format!(" ({ef})")
-     } else {
-         "".into()
-     }
- )
+ if let Some(ef) = engine_flags {
+     format!("{engine} ({ef})")
+ } else {
+     format!("{engine}")
+ }
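
To check that the suggestion is purely stylistic, both forms can be wrapped in small functions and compared. The function names and the parameter types (`&str`, `Option<&str>`) are assumptions made for this sketch, since the surrounding code is not shown here.

```rust
// Original form: build the optional suffix first, then concatenate.
fn label_original(engine: &str, engine_flags: Option<&str>) -> String {
    format!(
        "{}{}",
        engine,
        if let Some(ef) = engine_flags {
            format!(" ({ef})")
        } else {
            "".into()
        }
    )
}

// Suggested form: branch once and format the whole label in each arm.
fn label_suggested(engine: &str, engine_flags: Option<&str>) -> String {
    if let Some(ef) = engine_flags {
        format!("{engine} ({ef})")
    } else {
        format!("{engine}")
    }
}

fn main() {
    // Placeholder engine name and flag string.
    for flags in [None, Some("--flag-a")] {
        let a = label_original("libengine.so", flags);
        let b = label_suggested("libengine.so", flags);
        assert_eq!(a, b);
        println!("{b}");
    }
}
```

The suggested form also avoids allocating an intermediate `String` for the suffix.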

This simplifies consuming and building measurement and other
related structures but at the cost of somewhat more complex
serialization.  This is largely due to our use of CSV as
an input/output format and rust-csv's longstanding inability
to handle flattened structures.

See BurntSushi/rust-csv#98
@abrown abrown merged commit 372c47a into bytecodealliance:main Dec 8, 2025
17 checks passed