This issue holds various status information and work items for improving performance on platforms that use the interpreter with CoreCLR. It contains both interpreter performance work items and any other changes we need to make in order to have a good experience on these platforms, iOS and wasm.
iOS
- automated tracking of size and startup for iOS (to be reported at https://dataexplorer.azure.com/dashboards/e772ff35-1630-4e28-aced-70f2b3f1c80a, some pending issues currently).
- Initial numbers for the maui sample suggest startup is decent while size is pretty bad, both with room for improvement. (https://gist.github.com/kotlarmilos/702a8a979d7c23cf2c3f26a416efad14)
- symbol stripping and `libclrjit.dylib` exclusion (initially done by Milos) included in R2R PR [dotnet] Add support for ReadyToRun. macios#24437
- investigate why r2r composite compilation results in a significant increase in size for each assembly involved: Reduce the number of sections in R2R PE files #122511 and Skip empty sections when calculating a R2R PE file layout #122820
- around 5% of samples in the startup profile for ios-maui seem related to the interpreter. Aim to further reduce the number of interpreted methods.
- Investigate why iOS interprets a lot of IL stubs; this behavior is also present on desktop when using an iOS-like configuration: Composite-r2r with interpreter fallback leads to significant IL_STUB interpretation if FEATURE_JIT is disabled #123120
- Investigate why using a PGO profile would lead to more compiled methods, and whether we could have this behavior as the default. Example: https://gist.github.com/BrzVlad/5d84a8eefe962bb84fde3e4a63fddeca (David N.)
- Investigation of the startup profile of ios-maui for any potential CLR/R2R improvements. It seems unlikely we will get any significant improvements here, but it is worth checking.
- Test a huge app. Check how much we are interpreting and the resulting app size. Identify additional sources of improvement.
- iOS Debug builds should use R2R for framework assemblies and the interpreter for user assemblies. Using the full interpreter, as current mono does, is likely too slow.
- Compare iOS Debug build times between CoreCLR and Mono, and figure out improvements so we avoid excessive R2R compilation.
General Interpreter Performance
- Initial investigation of interpreter performance was done on desktop on a subset of benchmarks (https://github.com/BrzVlad/benchmarks), with the full clr-interpreter being around 8x slower than the mono interpreter. Composite R2R with interpreter fallback shows only a small regression and is on average a faster configuration than mono. Full performance numbers: https://gist.github.com/BrzVlad/a2b27d4c633f92f85fae594d6bf8ed14
- The full interpreter is slower both due to code quality (from missing compile-time optimizations) and due to the higher cost of executing a single instruction. More detailed numbers and investigation: https://gist.github.com/BrzVlad/59a120f25f168a6e298b40b251455fd0
- `USE_COMPUTED_GOTO` on mono increases perf by over 20%. We can't benefit from this improvement because of an Apple clang issue. Waiting for more info from Apple developers.
- Microbenchmarking for sources of performance improvement Microbenchmarking for clrinterpreter for iOS performance #123143
- Execution time optimizations backlog. These would be of higher priority since they would be useful for debugging performance.
- optimization of virtual calls. I believe we need a per-call-site cache here.
- avoid indirection through call stubs for interp-to-interp calli
- Compile time optimizations
- investigate how heavily we plan to commit to these optimizations; the main driver would be blazor-wasm, where we should rely less on R2R. Basic per-bblock optimizations + inlining, which are relatively easy to implement, might give us around 80% of the perf potential.
Blazor Wasm
- Testing on the blazorwasm sample, on the mono configuration, the compressed bits of the `_framework` amount to 2.8MB on the interpreter vs 4.8MB when using aot, with the gap expected to increase for bigger apps. IL amounts to about 100K (compressed) and we already have some form of IL stripping when using AOT, meaning we don't have much room for improvement on this front. We expect a similar increase in size if we attempt to use full r2r with CoreCLR. Given that the full interpreter is the default configuration on mono-wasm, this might not be an acceptable regression.
- Mono on wasm, in interpreter mode, uses jitted thunks on top of the interpreter code in order to further improve performance (a rough estimate is a 2x improvement). Even if we fully optimize the CoreCLR interpreter, we would still miss this piece, so we will need to rely a bit more heavily on R2R, or prioritize JIT capabilities.