@shipilev
Member

@shipilev shipilev commented Nov 21, 2025

We know from JDK-8372284 that G1 C2 stubs can take ~10% of total instructions. So minor optimizations in hand-written assembly pay off for code density. This PR does a little x86-specific polishing: testptr where possible, short forward branches where possible. I rewired some code to make it abundantly clear the branches in question are short. It also makes clear that lots of the affected methods are essentially fall-through.
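
To make the density argument concrete, here is a rough sketch of the instruction lengths involved, based on the standard x86-64 encodings for 64-bit register operands. It assumes the replaced compares had the form `cmp reg, 0` with an 8-bit immediate, which the description above does not spell out. All names in it are hypothetical helpers for illustration, not HotSpot APIs.

```cpp
#include <cassert>
#include <cstdint>

// Encoded lengths in bytes (standard x86-64 encodings, 64-bit register operands).
constexpr int kCmpRegImm8 = 4;  // REX.W 83 /7 ib, e.g. cmp rax, 0
constexpr int kTestRegReg = 3;  // REX.W 85 /r,    e.g. test rax, rax
constexpr int kJccRel32   = 6;  // 0F 8x cd
constexpr int kJccRel8    = 2;  // 7x cb
constexpr int kJmpRel32   = 5;  // E9 cd
constexpr int kJmpRel8    = 2;  // EB cb

// A short branch is only encodable when the displacement fits in a signed byte.
constexpr bool fits_rel8(std::int64_t disp) {
  return disp >= -128 && disp <= 127;
}

// Length of an explicitly short conditional branch; the caller promises
// that the target is near, and the assert checks that promise.
inline int short_jcc_length(std::int64_t disp) {
  assert(fits_rel8(disp) && "target too far for a short branch");
  return kJccRel8;
}

// Bytes saved by turning "cmp reg, 0; jcc rel32" into "test reg, reg; jcc rel8".
// Assumption: this is the shape of the rewritten null checks.
constexpr int null_check_savings() {
  return (kCmpRegImm8 + kJccRel32) - (kTestRegReg + kJccRel8);
}

// Bytes saved by shortening an unconditional jump.
constexpr int jmp_savings() {
  return kJmpRel32 - kJmpRel8;
}
```

Under these assumptions a single null check shrinks from 10 to 5 bytes, which is why the short form only works where the target is known to be close.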

The patch is deliberately on the simpler side, so we can backport it to 25u if the need arises.

Additional testing:

  • Linux x86_64 server fastdebug, tier1
  • Linux x86_64 server fastdebug, all

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8372285: G1: Micro-optimize x86 barrier code (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28446/head:pull/28446
$ git checkout pull/28446

Update a local copy of the PR:
$ git checkout pull/28446
$ git pull https://git.openjdk.org/jdk.git pull/28446/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28446

View PR using the GUI difftool:
$ git pr show -t 28446

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28446.diff

Using Webrev

Link to Webrev Comment

@shipilev
Member Author

Sample experiments show this saves ~1.6% of code:

$ for I in `seq 1 3`; do build/linux-x86_64-server-release/images/jdk/bin/java -Xcomp -XX:+CITime 2>&1 | grep "nmethod code"; done 

# Before
  nmethod code size         :  5764304 bytes
  nmethod code size         :  5764336 bytes
  nmethod code size         :  5764480 bytes

# After (-1.6%)
  nmethod code size         :  5670136 bytes
  nmethod code size         :  5670136 bytes
  nmethod code size         :  5670168 bytes

$ for I in `seq 1 3`; do build/linux-x86_64-server-release/images/jdk/bin/java -Xcomp -XX:+CITime Hello.java 2>&1 | grep "nmethod code"; done

# Before
  nmethod code size         : 25394184 bytes
  nmethod code size         : 25394552 bytes
  nmethod code size         : 25393968 bytes

# After (-1.6%)
  nmethod code size         : 24988544 bytes
  nmethod code size         : 24991696 bytes
  nmethod code size         : 24991040 bytes
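
As a sanity check on the quoted -1.6%, the relative reduction can be recomputed from the first run of each experiment. This is only a minimal sketch: the function names are made up for illustration, and picking the first run rather than a mean or median is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>

// Relative reduction in percent, given code sizes in bytes.
inline double percent_reduction(std::int64_t before, std::int64_t after) {
  assert(before > 0);
  return 100.0 * static_cast<double>(before - after) / static_cast<double>(before);
}

// First run of each experiment, copied from the output above.
inline double reduction_without_app() {
  return percent_reduction(5764304, 5670136);    // about 1.63
}

inline double reduction_hello() {
  return percent_reduction(25394184, 24988544);  // about 1.60
}
```

Both values round to the 1.6% reported above.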

@bridgekeeper

bridgekeeper bot commented Nov 21, 2025

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 21, 2025

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8372285: G1: Micro-optimize x86 barrier code

Reviewed-by: ayang, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 2 new commits pushed to the master branch:

  • 5b5d85b: 8372360: Exclude jdk.jsobject from micros-javac input source packages
  • e4b583a: 8372294: Fix Malformed problem list entry in ProblemList-jvmti-stress-agent.txt

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Nov 21, 2025

@shipilev The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 21, 2025
@mlbridge

mlbridge bot commented Nov 21, 2025

Webrevs

Contributor

@tschatzl tschatzl left a comment

Looks good, thanks.

I assume that the jmh writebarrier micros were run just in case. FWIW, the earlier GHA failures also looked like infra issues.

@shipilev
Member Author

GHA failure is due to #28445.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 21, 2025
@shipilev
Member Author

> I assume that the jmh writebarrier micros were run just in case.

As expected, I see no real impact on an EPYC machine, since we realistically only touch GC-active paths and/or slow paths:

Benchmark                                                                         Mode  Cnt     Score    Error  Units

# ----- Baseline

WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge          avgt   12  2074.042 ± 33.941  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall          avgt   12    31.908 ±  0.020  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge     avgt   12  2052.188 ±  2.993  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall     avgt   12    31.923 ±  0.127  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge    avgt   12  2648.758 ± 12.689  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall    avgt   12    41.843 ±  6.851  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge          avgt   12  1860.052 ± 41.707  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall          avgt   12    29.635 ±  0.026  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge    avgt   12  2647.011 ±  3.035  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall    avgt   12    40.217 ±  0.053  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge  avgt   12  1838.099 ± 11.536  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall  avgt   12    29.637 ±  0.031  ns/op
WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath                   avgt   12     1.694 ±  0.001  ns/op
WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef           avgt   12     2.709 ±  0.001  ns/op

WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge              avgt   12  2245.868 ±  1.523  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall              avgt   12    36.056 ±  0.008  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge         avgt   12  2247.127 ±  7.293  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall         avgt   12    36.046 ±  0.012  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge        avgt   12  2812.237 ± 32.421  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall        avgt   12    44.899 ±  0.258  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealLarge              avgt   12  2251.210 ± 18.101  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealSmall              avgt   12    36.018 ±  0.011  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge        avgt   12  2821.869 ± 32.633  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall        avgt   12    44.800 ±  0.018  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge      avgt   12  2247.837 ± 14.136  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall      avgt   12    36.021 ±  0.015  ns/op
WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPath                       avgt   12     1.694 ±  0.001  ns/op
WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPathYoungRef               avgt   12     2.710 ±  0.001  ns/op


# ----- Patched

WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge          avgt   12  2058.748 ± 11.193  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall          avgt   12    31.943 ±  0.031  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge     avgt   12  2052.097 ±  1.134  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall     avgt   12    31.927 ±  0.021  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge    avgt   12  2661.495 ± 36.916  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall    avgt   12    40.327 ±  0.463  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge          avgt   12  1841.228 ±  7.491  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall          avgt   12    29.644 ±  0.021  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge    avgt   12  2671.222 ± 45.797  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall    avgt   12    40.214 ±  0.073  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge  avgt   12  1833.984 ±  9.946  ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall  avgt   12    29.635 ±  0.070  ns/op
WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath                   avgt   12     1.694 ±  0.001  ns/op
WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef           avgt   12     2.710 ±  0.001  ns/op

WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge              avgt   12  2244.271 ± 37.550  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall              avgt   12    36.044 ±  0.006  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge         avgt   12  2245.466 ± 18.204  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall         avgt   12    36.036 ±  0.009  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge        avgt   12  2811.951 ± 26.061  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall        avgt   12    44.692 ±  0.041  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealLarge              avgt   12  2241.369 ±  0.614  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealSmall              avgt   12    36.019 ±  0.014  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge        avgt   12  2827.016 ± 43.966  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall        avgt   12    44.700 ±  0.060  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge      avgt   12  2242.395 ±  5.700  ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall      avgt   12    36.018 ±  0.010  ns/op
WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPath                       avgt   12     1.693 ±  0.001  ns/op
WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPathYoungRef               avgt   12     2.710 ±  0.001  ns/op
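
One rough way to read the two tables is to check, row by row, whether the baseline and patched intervals (score ± error) overlap, and how large the relative change is. The sketch below does only that. It is a quick heuristic rather than a proper statistical test, and the type and function names are hypothetical.

```cpp
#include <cassert>

// One JMH result row: mean score and the reported error half-width, both in ns/op.
struct Result {
  double score;
  double error;
};

// True when [score - error, score + error] of the two rows intersect.
inline bool intervals_overlap(const Result& a, const Result& b) {
  assert(a.error >= 0.0 && b.error >= 0.0);
  return (a.score - a.error) <= (b.score + b.error) &&
         (b.score - b.error) <= (a.score + a.error);
}

// Relative change of patched vs. baseline in percent (negative means faster).
inline double percent_change(const Result& baseline, const Result& patched) {
  assert(baseline.score > 0.0);
  return 100.0 * (patched.score - baseline.score) / baseline.score;
}
```

For example, WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall (41.843 ± 6.851 vs. 40.327 ± 0.463) overlaps, so the apparent difference is within noise. WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall (44.800 ± 0.018 vs. 44.700 ± 0.060) does not overlap, but the change is only about -0.2%.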

@albertnetymk
Member

/cc hotspot-gc

@openjdk

openjdk bot commented Nov 21, 2025

@albertnetymk
The hotspot-gc label was successfully added.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 21, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 21, 2025
Contributor

@vnkozlov vnkozlov left a comment

Comments.

thread, pre_val, tmp);
__ jmp(done);
__ testptr(pre_val, pre_val);
__ jccb(Assembler::equal, L_null);
Contributor

I know that this short jump will be fused into one instruction with testptr on modern x86. But you will have a jump-to-jump sequence. So you may win size-wise, but "throughput" could be worse, especially if it is the "fast" path.

Can you check the performance of these changes vs. using jcc(Assembler::equal, L_done); here?
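
The trade-off in question can be modeled with a small sketch. It assumes that the nearby label is followed by a jump to the exit that already exists for other reasons, so only the bytes at the check site differ. The patch's actual layout is not shown in this thread, and the names below are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of how the null path reaches the exit label.
struct NullPath {
  int site_bytes;      // bytes of the conditional branch at the check site
  int taken_branches;  // taken branches executed when pre_val is null
};

// Short branch (jcc rel8, 2 bytes) to a nearby label that jumps on to the exit.
// Assumption: the second jump is already present and is not counted here.
constexpr NullPath via_near_label() { return {2, 2}; }

// Long branch (jcc rel32, 6 bytes) straight to the exit label.
constexpr NullPath direct_to_done() { return {6, 1}; }
```

In this model the short form saves 4 bytes per site at the cost of one extra taken branch on the null path. Whether that extra branch matters is what the requested measurement would show.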


void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators,
Register addr, Register count, Register tmp) {
Label done;
Contributor

Since you are touching this code, can you add the L_ prefix to the labels here?
This is our usual practice, so that labels are clearly visible.


Register thread = r15_thread;

Label done;
Contributor

Please use L_done.

Labels

  • hotspot
  • hotspot-gc
  • ready (Pull request is ready to be integrated)
  • rfr (Pull request is ready for review)

4 participants