Skip to content

Conversation

liach
Copy link
Member

@liach liach commented Sep 16, 2025

There are some remnants from previous iterations of value objects in Unsafe. We should aim to remove them for cleaner code in the future.


Progress

  • Change must not contain extraneous whitespace

Issue

  • JDK-8367792: [lworld] Remove the Unsafe remnants of old valhalla prototypes (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1593/head:pull/1593
$ git checkout pull/1593

Update a local copy of the PR:
$ git checkout pull/1593
$ git pull https://git.openjdk.org/valhalla.git pull/1593/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1593

View PR using the GUI difftool:
$ git pr show -t 1593

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1593.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 16, 2025

👋 Welcome back liach! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 16, 2025

@liach This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8367792: [lworld] Remove the Unsafe remnants of old valhalla prototypes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the lworld branch:

  • 54c8105: 8349037: [lworld] runtime/valhalla/inlinetypes/CircularityTest.java crashes with SIGSEGV

Please see this link for an up-to-date comparison between the source branch of this pull request and the lworld branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the lworld branch, type /integrate in a new comment.

@liach liach changed the title Remove old unsafe value stuff 8367792: [lworld] Remove the Unsafe remnants of old valhalla prototypes Sep 16, 2025
@liach liach marked this pull request as ready for review September 16, 2025 22:37
@liach
Copy link
Member Author

liach commented Sep 16, 2025

Testing: tier 1-3 on linux-x64 finished clear.

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 16, 2025
@mlbridge
Copy link

mlbridge bot commented Sep 16, 2025

Webrevs

@RogerRiggs
Copy link
Collaborator

MakePrivateBuffer is still being used in some prototypes of vector support and should not be removed.

@liach
Copy link
Member Author

liach commented Sep 17, 2025

The vector support prototype was last updated 5 months ago. makePrivateBuffer introduces significant complexity into core library code and JVM. Besides this complexity, the vector prototype also introduces "multifield", which is also uncertain.

I will reach out to Paul Sandoz and Jatin Bhateja to check on the status of this prototype - it hasn't been updated since the new lworld model, maybe our mainline hacks can continue to work after the vector classes are marked value.

@jatin-bhateja
Copy link
Member

jatin-bhateja commented Sep 25, 2025

Hi @liach, For now, vectorAPI fallback does make heavy use of unsafe APIs for explicit larval transitions.
There is a whole lot of discussion that went into the usage of the @multifield annotation for contiguous backing storage allocation. Sharing some links for your reference.

https://cr.openjdk.org/~jrose/values/multi-field.html
https://cr.openjdk.org/~jbhateja/Valhalla/Valhalla-VectorAPI-OracleF2F-3-Apr_2024.pptx

@jatin-bhateja
Copy link
Member

jatin-bhateja commented Sep 26, 2025

Hi @liach ,

Currently, Unsafe.put* APIs expect to operate on a mutable value, without Unsafe.makePrivateBuffer, there is no way to transition a value object to larval state.

image

Here is a typical update kernel for the nary operation fallback implementation.

image

Here are some relevant FAQs on the need for multifield annotation.

Q. Why do we need @multifield annotated field in VectorPayloads and not just retain the array-backed backing storage?
A. Even currently, Vector instances are immutable, with each modification or application of an operation, a new vector is generated.
Each new vector has a distinct backing storage in the form of an array; thus, no two vector ever share their backing storage, which makes vectors an immutable quantity.

 Vector<Float>  newVector  =  Vec1.lanewise(VectorOperators.ADD, Vec2);

Since arrays are always allocated over the heap, they carry an identity, which is the distinctive heap address for each new backing storage array.

This contradicts the philosophy of value type instances, which are identity-free; the compiler treats two values with the same contents as equivalent entities and is free to substitute one with another.

By replacing existing array-backed storage with a @multifield annotated storage, we ensure that payload adheres to true value semantics, a @multifiled is seen as a bundle of fields, encapsulating payload is a value class, unlike an array, a multifield is never allocated an explicit heap storage.

Here is an example code

image

Even though Payload is a value class, its two instances with the same backing storage are not equal, because arrays have identity.
By treating vectors as value objects, we expect that two vectors with the same contents should be equal.

Q. Is there any alternative to @multifield?
A. All we need to ensure is that the backing storage has no identity. Thus, we could have multiple primitive type fields in the payload, one for each lane of the vector.

image

Consider the above modified payload class ‘TrueValuePayload’, here we create a separate primitive type field for each lane of the vector, thus two vector values with the same contents are treated as equivalent entities.

With @multifield we intend to create identity-free backing storage.

Q. What are the complications with explicit fields for each lane in the container payload?
A. First, at load time, build_layout randomizes the fields' layout, but that is not a major concern since fields can always be updated using their effective offsets, for a multifield update , existing reflective name-based APIs don't suffice, we need an UNSAFE api which accepts the offsets, since a multifield has hidden / synthetic fields which can only be accessed by adding offsets w.r.t to base field

A bigger concern is that the C2 compiler will see scalar fields as different inputs to InlineTypeNode. The compiler creates an InlineTypeNode for each instance of a value class and scalarizes the fields, with different scalar field we may observe multiple inputs to IR, one for each lane. Each scalar field will be of primitive Type, while the type of InlineTypeNode will be PayloadType, since an inline type node corresponds to a value object. We expect to deal in vector-type field, i.e., the one that carry an ideal type TypeVect.

Vector IR is forwarded to its user every time we perform a vector_unbox operation over VectorPayload, this way, vector IR inputs are directly forwarded to the vector operation nodes.

Keeping multiple scalar fields, one for each lane type in the VectorPayload class, while creating a vector IR for a bundle of lanes, will result in adding kludges in the code. Consider the following case

 var  v1 = FloatVector.fromArray(SP, arr1, index)
 var  v2 = FloatVector.fromArray(SP, arr2, index)
 res =  v1.lanewise(VectorOperators.ADD, v2)

v1 and v2 are concrete vector class instances, where the vector class is a valuetype, with stock Valhalla, IR corresponding to fromArray API will look like the following
image

While the expected IR pallet should look like the following

image

@liach
Copy link
Member Author

liach commented Sep 26, 2025

Hi @jatin-bhateja, as I know, the larval bit is completely unused in current Valhalla - I believe what we should do with that assert is to simply remove it. I am thinking of some internal API that accepts a Consumer that handles the "early larval" API via unsafe before returning it.

I thought the merge of lworld into vector would be trivial, but it turned out troublesome - can you push the merge as soon as it is ready? I am more than happy to help you migrate off makePrivateBuffer and this to-be-removed larval bit.

@jatin-bhateja
Copy link
Member

jatin-bhateja commented Sep 30, 2025

Hi @liach ,

Here is a quick sync-up on the merge status.

I started the merge yesterday, but it may take a few more days as there have been lots of changes in the code over the last 4 months. I am studying the code changes and documenting the new code model, in crystalline form. Following is our new model for larval objects.

Core concept (New model):-

  • A value object can be updated when it's in the larval state, either explicitly through makePrivateBuffer or implicitly during new object creation.
  • A value in a larval state must be in a non-scalarized form, i.e., it should exist as an oop.
  • Updates to a larval value field are done by emitting an explicit store operation/IR, This mandates an OOP form of larval object; this is a major change from the earlier model, where puffield could directly update the incoming edge of scalarized InlineTypeNode IR. Given that a value object is an immutable quantity, any update is only possible by way of creating a new value, thereby going through a temporary larval state.
  • Value object is transitioned to a non-larval state after the constructor call or after finishPrivateBuffer.
  • For vector API, backing storage is in the form of a @Multifield payload, since only the base field can be exposed in Java code, and the rest of the synthetic fields are internally generated based on the annotation processing during class file parsing; hence, we definitely need an Unsafe API to update all the synthetic fields.
  • New flexible constructor does allow updates to fields before the super call, but we still cannot pass uninitializedThis to an external initialization routine (like an unsafe API) from a constructor. As a result of this, currently the VectorPayalod*MF constructors only initialize the base multifield through a putfield bytecode. JDK-8368810 tracks this issue.

In the current context, we have the following options to perform vector updates in the fallback implementation:-

Premise: Value objects are immutable, and two vector values with the same lane contents must be treated as equals.

a/ Current update loop:-

                          res = Unsafe.makePrivateBuffer(arg1);
                          for (int i = 0; i < laneCount; i++) {
                                 larg1 =  Unsafe.getFloat(arg1, offset[i]);
                                 larg2 =  Unsafe.getFloat(arg2, offset[i]);
	                         Unsafe.putFloat(res, larg1 + larg2);                // res is in oop form since its in a mutable state.         
                          }
                          res = Unsafe.finishPrivateBuffer(res);                   // re-create scalarized representation from larval state. 

The lifetime of a larval is very restricted today, and by Java application compilation, we don't emit any bytecode b/w the new and its subsequent initializing constructor. However, using handcrafted bytecode, we can have a gap b/w a new ininitialized allocation and its initializing constructor, but the JVM runtime rules for bytecode verification catch any undefined behavior. Unsafe operations are outside the purview of JVM bytecode verification.

Problem arises when we make the lifetime of the larval object being acted upon by an unsafe API span across a block or even loop of bytecode; in such cases, we may need to handle larval and non-larval merges OR emit an undefined behavior error.

Q. How performant is the unsafe get / put operation?
A. There are intrinsic and native implementations of get/put APIs, given that it's an unsafe operation compiler does not care about emitting any out-of-bound checks, as it's the responsibility of the user. In such a case, an unsafe get simply computes an effective address (base + offset) and emits load / store instructions. Does the loop involving unsafe operation auto-vectorized ? Yes, since the auto-vectorizer looks for contiguity in the address across iterations.

Consider the impact of long-lived larval res in the above update loop on intrinsicfication

               <statistics type='intrinsic'>
                      Compiler intrinsic usage:
                         2 (50.0%) _getFloat (worked)
                         1 (25.0%) _putFloat (worked)
                         1 (25.0%) _makePrivateBuffer (worked)
                         0 ( 0.0%) _finishPrivateBuffer (failed)
                         4 (100.0%) total (worked,failed)
                    </statistics> 

res, was made a larval quantity using makePrivateBuffer, but scalarization triggered at gen_checkcast, which observed a value type oop, which by the way is in larval state. As a result, finishPrivateBuffer receives an InlineTypeNode while it always expects to receive a larval value oop, and hence intrinsification fails, it has a side effect on auto-vectorization, which does not observe an IR expression for store and its associated address computation since we take the native implementation route. Hence, the performance of fall fallback implementation degrades.

Q. What are the possible solutions here?
A. Compiler should never try to scalarize the larval oop at checkcast bytecode; this handling exists because type profiling sharpens the oop's type to a value type, thus facilitating scalarization. Another solution is to make finishPrivateBuffer intrinsic more robust; it should simply return the InlineTypeNode by setting the Allocation->_larval to false. This is attempted by an unmerged patch[1]
All in all, there are loopholes if we increase the life span of the larval object; in this case, it spans across the loop.

b/ Other larval lifetime limiting options:-

In the fallback implementation, we intend to operate at the lane level; it may turn out that the generated IR is auto-vectorizable, but it's purely a performance optimization by the Compiler, and was not the intent of the user in the first place. So we read the input lanes using Unsafe API, we need an Unsafe get here, since synthetic multifields are not part of user visible Java code and have no symbol or type information in the constant pool. Thus, with multi-field, we ought to use unsafe APIs. Unsafe. get does not mandate larval transitions of the input value vector. What needs to change, well, to limit the larval life time, we need to allocate a temporary buffer before the loop for storing the result of the computation loop.

We need a new Unsafe API that accepts the temporary buffer and the result value object. This API will internally buffer the scalarized value, update its field from the temporary result storage, and then re-materialize the scalarized representation from the updated buffer.

Q. Will this API again have both native and intrinsic implementation?
A. Yes, it will definitely have native implementation, and the intrinsic implementation should make use of vector IR if the target supports the required vector size, OR it will need to emit scalar load / store IR, We can be intelligent in the intrinsic implementation to optimize this copy operation using a narrower vector.

Q. Will this be performant over the existing solution?
A. Additional allocation of a temporary buffer is costly; we can bank on escape analysis and loop unrolling to eliminate this allocation, but given that the temporary storage is passed to a native routine, it escapes the scope. With a Vector IR-based intrinsic implementation, we do need a contiguous memory view of temporary storage; otherwise, we may need to emit multiple scalar inserts into a vector, which is very costly. Overall, we don't expect the new implementation to be performant, mainly due to the additional allocation penalty. BTW, even today, vectors are immutable and vector factory routines are used in the fallback implementation to allocate memory for backing storage, thus we not only allocate a new concrete vector but also its backing storage, with current Valhalla and flat payloads we only need one allocation for vector and its backing storage, with new approach we may have to give away that benefit for reduced larval lifetime. It may turn out that the new approach is inferior in terms of performance with respect to the existing VectorAPI-Valhalla solution, but eventually may be at par with mainline support.

So it's tempting to try out the new solution to limit the complexity due to long-lifespan larval objects.

c/ Alternative approach : Create a new immutable value with each iteration of the update loop this is near to new code model .

                          res = __default_value__
                          for (int i = 0; i < laneCount; i++) {
                                 larg1 =  Unsafe.getFloat(arg1, offset[i]);
                                 larg2 =  Unsafe.getFloat(arg2, offset[i]);
	                         res =     Unsafe.updateLane(res, larg1 + larg2);     // res is in oop form since its in a mutable state.         
                          }

This also limits the lifetime of the larval value object, but on the hind side performs a buffering and scalarization with each iteration. When we buffer, we not only allocate a new temporary storage, but also emit field-level stores to dump the contents into temporary storage; this is very expensive if done for each lane update, given that we are only updating one lane of vector but are still paying the penalty to save and restore all unmodified lanes per iteration.

While this does limit the larval life span, it has a huge performance penalty in comparison to both the existing approaches of larval life time across loop and single update with temporary storage allocation.

In the vector API, the fallback implementation is wrapped within the intrinsic entry point. As long as intrinsification is successful, we are not hit by a fallback; a poor fallback may delay the warmup stage, but for long-running SIMD kernels, warmups are never an issue as long as intrinsifications are successful.

I have created a prototype application [2] mimicking the vector API use case for you. Please give it a try with the existing lworld+vector branch after integrating the patch[1]

While I am merging and understanding the implications of the new code mode, it will help if you can share your plan/test sample for the latest API to limit the scope of larval transition, keeping in view the above context.

Some notes on the implications of patch[1]
Currently, the compiler does not handle vector calling convention emitting out of scalarization; this is a concern with the generic handling of multifield, where the return value was a container of a multifield and the multifield value was held by a vector register, in such a case, the caller will expect to receive the value in a vector register, with the vector API we deal in abstract vectors across method boundaries, which are never scalarized, anyway, extending the calling convention is on our list of items for Valhalla.

To circumvent this issue seen with the test case, we can disable scalarization of return values using -XX:-InlineTypeReturnedAsFields

[1] https://github.com/jatin-bhateja/external_staging/blob/main/Backup/lworld_vector_performance_uplift_patch_30_9_2025.diff

[2] https://github.com/jatin-bhateja/external_staging/blob/main/Code/java/valhalla/valhalla_401/valhalla_vector_api_mf_proto.java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready Pull request is ready to be integrated rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

3 participants