
Conversation

@Kobzol Kobzol (Member) commented Nov 17, 2024

This PR adds a new comparison mode to the compile result compare page. When the same commit is compared against itself, this mode lets users compare the results of the LLVM and Cranelift backends.

[Screenshot: llvm-vs-cranelift comparison]

This was IMO the easiest way to introduce this comparison mode. Implementing it on the backend would require non-trivial changes, and while this is arguably a bit of a hack, I think it will work fine for the use-case we need it for. In the future this could easily be generalized to multiple backends, but for now it wasn't worth the hassle.

How to test:

  1. Get results:
$ rustup component add rustc-codegen-cranelift-preview --toolchain nightly
$ cargo run --bin collector -- bench_local `rustup +nightly which rustc` --include serde --profiles Debug --backends Llvm,Cranelift
  2. Start the website (if the local DB only contains the data from the run above, it should open the single commit compared against itself)
  3. Click on "Self-compare backend"

@Kobzol Kobzol requested a review from lqd November 17, 2024 09:26
@lqd lqd (Member) left a comment


Exciting!

Here are a few comments/questions

Review comment on this frontend snippet:

    });
    }
    const record = benchmarkMap.get(key);
    if (comparison.backend === "llvm") {
@lqd (Member) commented:

we normalize the case here somewhere I guess?

@Kobzol (Member, Author) replied:

We don't, the name comes from here.
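To illustrate why the case question matters, here is a minimal, hypothetical sketch of the kind of backend split the snippet performs. The names `ComparisonRow` and `splitByBackend` are invented for illustration and do not appear in rustc-perf; the point is that a strict `=== "llvm"` check is case-sensitive and relies on the backend name arriving already lowercased from the data source.

```typescript
// Hypothetical sketch: partition comparison rows into the two sides of a
// self-comparison by backend. Assumes the data source always serializes
// backend names in lowercase ("llvm", "cranelift").
interface ComparisonRow {
  benchmark: string;
  backend: string; // e.g. "llvm" or "cranelift"
  value: number;
}

function splitByBackend(rows: ComparisonRow[]): {
  llvm: ComparisonRow[];
  cranelift: ComparisonRow[];
} {
  const llvm: ComparisonRow[] = [];
  const cranelift: ComparisonRow[] = [];
  for (const row of rows) {
    // Strict, case-sensitive comparison: a row with backend "Llvm" would
    // NOT match here and would silently land in the other bucket.
    if (row.backend === "llvm") {
      llvm.push(row);
    } else {
      cranelift.push(row);
    }
  }
  return { llvm, cranelift };
}
```

Because the comparison is case-sensitive, the check only works if the serialized name never varies in case; this is why it matters that the name comes from a single, consistent source rather than being normalized in the frontend.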

@Kobzol Kobzol (Member, Author) commented Nov 18, 2024

Thanks for the comments; I went through them and resolved them.

@lqd lqd (Member) commented Nov 18, 2024

There are a few cases where we have before/after assumptions on the compare page, and I don't know if we need to do anything here now:

  • the links to the detailed query results; the three or so links we have there are about commits (it could be interesting to have the "before" link be for the LLVM backend and "after" for Cranelift), but I don't remember whether the page supports selecting the backend, whether we store and load measureme files per backend in the first place, and so on
  • the frontend/backend/linker sections graph

These are all useful for understanding where the gains/losses are, but it's likely a bit of work, and maybe it's fine not to do anything yet.

Probably not a big deal, but when using this we'll also need to remember to use the "after" commit in the compare UI when doing actual single-commit comparisons on rustc-perf. Otherwise there won't be results for both backends on the baseline commit until we profile Cranelift on each merge.

There's also an interesting case that may be more common for the foreseeable future: are broken benchmarks only shown when the two commits are different? I say "common" because cg_clif currently doesn't support LTO, so your recent try build shows broken benchmarks here but not on the single-commit comparison. (At some point some of the exa benchmarks were showing as -100% for me, but I can't reproduce this anymore: after some reloads the failed benchmark rows seem to be gone, so it was likely a transient caching issue.)

@Kobzol Kobzol (Member, Author) commented Nov 18, 2024

are broken benchmarks only shown when the 2 commits are different?

Yes, it seems so, because the page only shows "newly broken" benchmarks:

    newly_failed_benchmarks: errors_in_b.into_iter().collect(),
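The set difference behind that field can be sketched as follows. This is a hedged illustration in the frontend's language; `newlyFailedBenchmarks`, `errorsA`, and `errorsB` are hypothetical names, not identifiers from rustc-perf.

```typescript
// Hypothetical sketch: a benchmark counts as "newly broken" only if it
// errored in run B and did not error in run A — i.e. the errors present
// in B minus those already present in A.
function newlyFailedBenchmarks(
  errorsA: Map<string, string>, // benchmark name -> error message in run A
  errorsB: Map<string, string>  // benchmark name -> error message in run B
): string[] {
  const result: string[] = [];
  for (const benchmark of errorsB.keys()) {
    if (!errorsA.has(benchmark)) {
      result.push(benchmark);
    }
  }
  return result;
}
```

One consequence: when a run is compared against itself, both sides share the same error set, so the difference is empty and no benchmark ever shows as newly broken — which matches the observed behavior that broken benchmarks only appear when the two compared runs have different errors.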

To be honest, I consider this functionality a hack for the specific use-case of comparing the LLVM and Cranelift backends, rather than something meant to work universally in all situations. So things like detailed results and charts are out of scope, IMO.

@lqd lqd (Member) commented Nov 18, 2024

At some point some of the exa benchmarks were showing as -100%

These shouldn't have data, since they're broken benchmarks in this run IIUC (and incidentally, toggling "Display raw data" filters these rows out). I'm not sure exactly when it happens, but for historical purposes, this is how it looks when it does:

[Screenshot: failed exa benchmark rows showing -100%]

@lqd lqd (Member) left a comment


🚀

@Kobzol Kobzol merged commit 0695b84 into master Nov 18, 2024
11 checks passed
@Kobzol Kobzol deleted the compare-backends branch November 18, 2024 15:09