Experiment with storing target method for static and opt-virtual callsites in reloc info #106
base: premain
… sites in reloc info Signed-off-by: Ashutosh Mehra <[email protected]>
❗ This change is not yet ready to be integrated.

To make it convenient to measure perf impact, the change in …
iwanowww left a comment:
Nice work!
    PerfTickCounters* SharedRuntime::_perf_handle_wrong_method_total_time = nullptr;
    PerfTickCounters* SharedRuntime::_perf_ic_miss_total_time = nullptr;

    uint SharedRuntime::_perf_resolve_static_cache_hit_ctr = 0;
PerfCounters are usually more convenient to use than raw counters. For example, they can be sampled on-the-fly from a live process.
Okay. I used these counters just to quickly check how much static call resolution can be optimized this way. If we go with this approach, I will replace them with PerfCounters, or remove them entirely if they are not needed.
    AtomicAccess::inc(addr);

    if (UseNewCode2) {
      bool is_mhi;
I believe disabling inlining through MH linkers when generating archived code should simplify things. Then there should be no attached methods for MH linkers in archived code, and vice versa.
Are you suggesting disabling inlining through MH linkers for both AOT and JIT code, or only for AOT code? If we do it only for AOT code, it wouldn't help unless we also decide to apply this optimization only to AOT code. As it stands, it benefits both JIT and AOT code.
I'd assume it is less important for JITed code. The problem is so acute for AOTed code because AOT code is so cheap to retrieve and install that plenty of it gets published in a short period during application startup.
@ashu-mehra This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a …
This work aims to reduce the time spent in call resolution by caching the target method of direct calls (static and opt-virtual) in the reloc info when a method is compiled.
Relocations for static and opt-virtual calls already have a `method_index` field, which is used to store the "real" method to be invoked by the method handle. It is currently used only in C2 compilations.

This patch reuses the `method_index` field for static and opt-virtual calls to store the target method. The runtime call used by compiled code to resolve the call site (`SharedRuntime::resolve_helper`) can then shortcut the resolution: it reads the target method from the reloc info and patches the call site through `CompiledDirectCall`.

No special handling is needed for AOT code.
On a 4-CPU system, `spring-boot-getting-started` improves by around 3%; JavacBench improves by 0-3%.

`-Xlog:init` reports the time spent in call resolution from compiled code.

For `spring-boot-getting-started` before this patch:

For `spring-boot-getting-started` after this patch:

For JavacBench before this patch:

For JavacBench after this patch:
Reviewing
Using `git`

Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/106/head:pull/106
$ git checkout pull/106

Update a local copy of the PR:
$ git checkout pull/106
$ git pull https://git.openjdk.org/leyden.git pull/106/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 106

View PR using the GUI difftool:
$ git pr show -t 106

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/106.diff