
Interpreter Performance #122464

@BrzVlad

Description

This issue holds various status information and work items for improving performance on platforms that use the interpreter with CoreCLR. It contains interpreter-performance work items as well as any other changes we need to make in order to have a good experience on these platforms, iOS and wasm.

iOS

General Interpreter Performance

  • Initial investigation of interpreter performance was done on desktop on a subset of benchmarks (https://github.com/BrzVlad/benchmarks), with the full clr-interpreter being around 8x slower than the mono interpreter. Composite R2R with interpreter fallback shows only a small regression and is on average a faster configuration than mono. Full performance numbers: https://gist.github.com/BrzVlad/a2b27d4c633f92f85fae594d6bf8ed14
  • The full interpreter is slower both due to code quality (from missing compile-time optimizations) and due to the higher cost of executing a single instruction. More detailed numbers and investigation: https://gist.github.com/BrzVlad/59a120f25f168a6e298b40b251455fd0
    • USE_COMPUTED_GOTO on mono increases perf by over 20%. We can't benefit from this improvement because of an Apple Clang issue. Waiting for more info from Apple developers.
  • Microbenchmarking for sources of performance improvement: Microbenchmarking for clrinterpreter for iOS performance #123143
  • Execution-time optimizations backlog. These are of higher priority since they are also useful when debugging performance.
    • Optimization of virtual calls. I believe we need a per-call-site cache here.
    • Avoid indirection through call stubs for interp-to-interp calli.
  • Compile-time optimizations
    • Investigate how heavily we plan to commit to these optimizations; the main driver would be blazor-wasm, where we should rely less on R2R. Basic per-bblock optimizations plus inlining, which are relatively easy to implement, might give us around 80% of the perf potential.

Blazor Wasm

  • Testing on the blazorwasm sample, in the mono configuration, the compressed bits of the _framework amount to 2.8MB with the interpreter vs 4.8MB when using AOT, with the gap expected to grow for bigger apps. IL amounts to approximately 100K (compressed), and we already have some form of IL stripping when using AOT, so we don't have much room for improvement on that front. We expect a similar size increase if we attempt to use full R2R with CoreCLR. Given that the full interpreter is the default configuration on mono-wasm, this might not be an acceptable regression.
  • Mono on wasm, in interpreter mode, uses jitted thunks on top of the interpreter code in order to further improve performance (a rough estimate is a 2x improvement). Even if we fully optimize the CoreCLR interpreter, we would still be missing this piece, so we will need to rely a bit more heavily on R2R, or prioritize JIT capabilities.
