-
Notifications
You must be signed in to change notification settings - Fork 6.3k
8365205: C2: Optimize popcount value computation using knownbits #27075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
/label add hotspot-compiler-dev |
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@jatin-bhateja |
Webrevs
|
The change looks good, but I wonder:
|
Hi @TobiHartmann , @SirYwell , @eme64 , can you kindly verify the changes in the latest patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice improvement @jatin-bhateja , thanks for working on it :)
test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java
Outdated
Show resolved
Hide resolved
@Benchmark | ||
public int StockKernelInt() { | ||
int res = 0; | ||
for (int i = lower_bound; i < upper_bound; i++) { | ||
int constrained_i = i & 0xFFFF; | ||
res += constrained_i; | ||
} | ||
return res; | ||
} | ||
|
||
@Benchmark | ||
public int LogicFoldingKerenlInt() { | ||
int res = 0; | ||
for (int i = lower_bound; i < upper_bound; i++) { | ||
int constrained_i = i & 0xFFFF; | ||
if (Integer.bitCount(constrained_i) > 16) { | ||
throw new AssertionError("Uncommon trap"); | ||
} | ||
res += constrained_i; | ||
} | ||
return res; | ||
} | ||
|
||
@Benchmark | ||
public long StockKernelLong() { | ||
long res = 0; | ||
for (int i = lower_bound; i < upper_bound; i++) { | ||
long constrained_i = i & 0xFFFFFFL; | ||
res += constrained_i; | ||
} | ||
return res; | ||
} | ||
|
||
@Benchmark | ||
public long LogicFoldingKerenLong() { | ||
long res = 0; | ||
for (int i = lower_bound; i < upper_bound; i++) { | ||
long constrained_i = i & 0xFFFFFFL; | ||
if (Long.bitCount(constrained_i) > 24) { | ||
throw new AssertionError("Uncommon trap"); | ||
} | ||
res += constrained_i; | ||
} | ||
return res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the stock
kernels are there to show performance if there is no op, the folding
kernels you hope have the same performance. It would be nice to have one where the bitCount
does not fold away, just to keep that comparison :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, on a second thought, since any benchmarks compare the performance of kernels with and without optimization it's better to do away with the stock variants and only retain folding kernels.
test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java
Outdated
Show resolved
Hide resolved
@jatin-bhateja I'm going to be out of the office for about 3 weeks, so feel free to ask others for reviews! |
@jatin-bhateja The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Hi @eme64 , @chhagedorn , @SirYwell , let me know if its good to land now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I'm about to board my plane for a 3-week vacation, I'm leaving a last little note for the reviewers.
I think this is a really nice addition, so thanks for doing it @jatin-bhateja 😊 . Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as JDK-8367341.
Please make sure you test it, and make sure the random values generated with the Generators in test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java
make sense. Currently, there is for example a 32 bit range for a 64 bit long value, which is not correct, I think.
By default, my recommendation is to not constrain the Generators ranges, unless there is a really good reason. Generators are already built to produce values close to zero at an over-proportional rate. But by not restricting we may at some point also hit cases that we did not anticipate, and catch bugs that way.
test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java
Outdated
Show resolved
Hide resolved
Correct, currently KnownBits information is constrained as they are generated for limited value ranges, as discussed in |
// We use the KnownBits information from the integer types to derive how many one bits | ||
// we have at least and at most. | ||
// From the definition of KnownBits, we know: | ||
// zeros: Indicates which bits must be 0: ones[i] =1 -> t[i]=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this, is ones[i] mixed up with zeros[i]? I.e., t[i]=0 if zeros[i]=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SirYwell , comment updated.
Links to formal z3 proofs for this:-
Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now, although I'm not exactly sure about the semantics of widen and whether Type::WidenMax
is the right choice here. Someone else has to look at that.
Thanks for the work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A suggestion, we might do this as a later RFE.
Hi @TobiHartmann @chhagedorn , can you kindly run this through your testing and share if its good to land in. |
I submitted testing for this and will report back once it finished. |
return Type::TOP; | ||
} | ||
KnownBits<juint> bits = t->isa_int()->_bits; | ||
return TypeInt::make(population_count(bits._ones), population_count(~bits._zeros), Type::WidenMax); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The widen
of the output should be the same as the widen
of the input, not WidenMax
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @merykitty, widen is mainly used for optimistic data flow analysis pass like CCP where type analysis begins with TOP and progressively grows the value range till convergence / fixed point.
it's good to preserve the widen of input to delay eager convergence.
Testing all passed. I'll pass the review to someone else. |
Is the |
That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific. |
Right, makes sense. I'll go ahead and remove that label. /label remove core-libs |
@eirbjo |
Hi @merykitty , Can you kindly be the second reviewer? |
FYI this patch was marked core-libs because of the benchmark addition in java/lang. I wonder if it belongs to vm/compiler instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
if (t == Type::TOP) { | ||
return Type::TOP; | ||
} | ||
const TypeInt* tint = t->isa_int(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be is_int
. isa_int
is fine, but in cases of unexpected types, it will SIGSEGV instead of throwing an assertion, which is more difficult to debug.
This patch optimizes PopCount value transforms using KnownBits information.
Following are the results of the micro-benchmark included with the patch
Kindly review and share your feedback.
Best Regards,
Jatin
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075
$ git checkout pull/27075
Update a local copy of the PR:
$ git checkout pull/27075
$ git pull https://git.openjdk.org/jdk.git pull/27075/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27075
View PR using the GUI difftool:
$ git pr show -t 27075
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27075.diff
Using Webrev
Link to Webrev Comment