-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Optimize ISLE compilation #12303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optimize ISLE compilation #12303
Conversation
|
@bongjunj thanks for this experimentation! Before moving further, would you mind benchmarking compilation time as well? One of the reasons for putting all of the codegen for an ISLE term in one function is so that the compiler can optimize the code together; splitting that code between functions and (especially) introducing ABI boundaries may produce slowdowns when Cranelift runs. IMHO, it's worth it to spend a few extra seconds when compiling Cranelift if it makes Cranelift itself run faster. If no effect, of course, then no problem and I'll be happy to review this. Thanks! |
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
An 11x reduction in compile time is quite massive, so even if this is a ~NN% regression in compile-time-of-wasm-code I think this would be a great thing to land at least for debug builds and maybe optionally for release builds with some sort of tunable too. |
|
Note, that's an 11x reduction with Bongjun's huge new ruleset; ~10% on |
|
If it does result in Wasm compilation time regressions, it may still make sense to have this as an option of the ISLE compiler that we can enable during development or something, if we can make the maintenance burden minimal. |
| // Compile-time optimization: huge function bodies (often from very large match arms | ||
| // of constructor bodies)cause rustc to spend a lot of time in analysis passes. | ||
| // Wrap such bodies in a local closure to move the bulk of the work into a separate body | ||
| // without needing to know the types of captured locals. | ||
| const MATCH_ARM_BODY_CLOSURE_THRESHOLD: usize = 256; | ||
| if ret_kind == ReturnKind::Iterator | ||
| && Codegen::block_weight(&arm.body) > MATCH_ARM_BODY_CLOSURE_THRESHOLD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding my previous comment: it could potentially make sense to make MATCH_ARM_BODY_CLOSURE_THRESHOLD a dynamic ISLE compilation option, rather than a constant, and then tweak that value in Cranelift's invocation of the ISLE compiler depending on if a cargo feature is enabled or whether this is a release vs debug build of Cranelift or something.
Overview
The current implementation of ISLE code generation emits a single Rust function for each ISLE term, and
rustccompiles the Rust code to integrate ISLE with other modules of Cranelift.This can cause a compilation bottleneck when a ISLE term contains numerous rules ending up with producing a very very large Rust function, since
rustccannot efficiently compile such functions. Notably, the termsimplifywhich implements the mid-end peephole optimizations has about 500 rules, and the symptom is already visible. (And the ruleset is growing!) You can check, compiling Cranelift,cranelift-codegentakes most of the time in the following report: cargo-timing.htmlPlus: a timing report for compiling wasmtime:
With this PR, the ISLE codegen helps
rustcby wrapping a large match statement generated byislecin a closure. By introducing closures, a portion of a huge Rust function (for a ISLE term) can be split into several smaller compilation units forrustc, reducing the compilation time.Evaluation
On the
mainbranch, the build times forcranelift-codegenbefore/after this optimization are measured. This PR saves ~2 seconds on my machine (x86-64, 64 core, 512GB memory)Extra
I am experimenting with over 6,000 ISLE
simplifyrules.The compilation time reduced from 841.37 seconds to 69.34 seconds with this PR. (+11x speedup)