Record and Use Engine Flags in Summarize/Effect-Size Tools#293
Record and Use Engine Flags in Summarize/Effect-Size Tools#293abrown merged 5 commits intobytecodealliance:mainfrom
Conversation
This can be useful when the engine filename is not useful or when wanting a concise way to differentiate benchmark runs that may use the same engine but with different flags.
Previously, only the engine name/path was stored for each measurement. The flags used represents valuable information for both understanding how a historical run was performed as well as determining how different configurations impact performance for the same engine.
Previously, recordings were updated to capture engine flag information. This change updates the code used for summaries and effect-size analysis to allow for comparing engine/flag combinations against each other which is useful for a class of comparisons.
8f2b993 to
d29ef92
Compare
d29ef92 to
3c81ea5
Compare
abrown
left a comment
There was a problem hiding this comment.
I think this is the right change to make. Thanks for this!
crates/analysis/src/effect_size.rs
Outdated
| let engines: BTreeSet<_> = key_measurements.iter().map(|m| &m.engine).collect(); | ||
| let engines: BTreeSet<_> = key_measurements | ||
| .iter() | ||
| .map(|m| (&m.engine, &m.engine_flags)) |
There was a problem hiding this comment.
I debated whether to even bring this up, please take it or leave it: if we were to coalesce the engine path and engine flags into a single field, it seems like we might be able to avoid a lot of the "oh, let's also add flags in here too" kinds of changes. Perhaps we still want separate CSV columns for each part, but IIRC there is a way to flatten a struct Engine { path: ..., flags: ... } out into separate columns. If so, using a single struct everywhere (with auto-derived PartialEq, Eq, ...) could make your life easier in the future and simplify this PR.
There was a problem hiding this comment.
And then just generating the label is a matter of impl Display for Engine... but now I'm over-selling this suggestion.
There was a problem hiding this comment.
@abrown I like this idea; it occurred to me that something like it could be a good approach as I continued to have to update more callsites with a_engine, b_engine, etc. I will take a pass at implementing the idea as it should simplify a good chunk of code (probably exchanging a bit of complexity around ser/de).
There was a problem hiding this comment.
Implemented in 31bd2de; things did get gross with ser/de as rust-csv doesn't support flattened structs. After trying a few approaches, I ended up just have private "Wire" versions of each of the data structs to handle dealing with that.
An alternative could have been to format the struct as a single field/string but I didn't love that idea, especially if we may want to process the data with external programs (that would then need to parse/split that column to get the individual pieces of data).
This will break ingest of existing csv that doesn't have headers -- I don't think that really exists in the wild.
crates/analysis/src/effect_size.rs
Outdated
| format!( | ||
| "{}{}", | ||
| engine, | ||
| if let Some(ef) = engine_flags { | ||
| format!(" ({ef})") | ||
| } else { | ||
| "".into() | ||
| } | ||
| ) |
There was a problem hiding this comment.
Another style suggestion:
| format!( | |
| "{}{}", | |
| engine, | |
| if let Some(ef) = engine_flags { | |
| format!(" ({ef})") | |
| } else { | |
| "".into() | |
| } | |
| ) | |
| if let Some(ef) = engine_flags { | |
| format!("{engine} ({ef})") | |
| } else { | |
| format!("{engine}") | |
| } |
This simplifies consuming and building measurement and other related structures but at the cost of somewhat more complex serialization. This is largely due to our use of CSV as an input/output format and rust-csv's longstanding inability to handle flattened structures. See BurntSushi/rust-csv#98
Recently, for a number of performance comparisons I've had the need to compare the same engine with different engine flags. While this has been possible to do with workarounds that involve having copies of the engine that are identical with different names, it is somewhat cumbersome.
Given that I think this sort of comparison is likely to continue to be useful, this PR introduces the following changes:
I also have prepped changes for HTML report generation to follow these changes which will also make use of this information (e.g. as used with bytecodealliance/wasmtime#1749 (comment) to compare 3 sets of flags for the same engine
.so). That will follow this change once the dust settles.