Skip to content

Conversation

@wangpc-pp
Copy link
Contributor

@wangpc-pp wangpc-pp commented Jun 6, 2024

We need this when optimizing some locks' implementation, but using inline assembly will stop some optimizations. For example, if we use inline assembly, the compiler doesn't know lr.w will sign-extend the result.

ARM provides similar intrinsics like __builtin_arm_ldaex.

Currently, we only add intrinsics for Zawrs.

@wangpc-pp
Copy link
Contributor Author

I don't know if we should add intrinsics for AMO instructions as we have some "standard" C builtins for AMO.

@wangpc-pp
Copy link
Contributor Author

Ping. :-)

@cmuellner
Copy link
Collaborator

@wangpc-pp is there a GCC or LLVM patch that adds support for this?
I see the referenced LLVM PR (which has been closed), which also included LR and SC.

@wangpc-pp
Copy link
Contributor Author

@wangpc-pp is there a GCC or LLVM patch that adds support for this? I see the referenced LLVM PR (which has been closed), which also included LR and SC.

I almost forgot this PR :-)
If we have the consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I preceed on?

@cmuellner
Copy link
Collaborator

I almost forgot this PR :-) If we have the consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I preceed on?

From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API.

I'm fine with this PR, but we need the OK from the GCC and LLVM community.

I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly.

@wangpc-pp
Copy link
Contributor Author

I almost forgot this PR :-) If we have the consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I preceed on?

From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API.

I'm fine with this PR, but we need the OK from the GCC and LLVM community.

I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly.

Aha! I have done it (llvm/llvm-project#96283) before, but I forgot it. :-)

We need this when optimizing some locks' implementation, but using
inline assembly will stop some optimizations. For example, if we use
inline assembly, the compiler doesn't know `lr.w` will sign-extend the
result.

ARM provides similar intrinsics like `__builtin_arm_ldaex`.

Currently, we only add intrinsics for `Zawrs`.
@jrtc27
Copy link

jrtc27 commented Jan 16, 2025

How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely?

@wangpc-pp
Copy link
Contributor Author

How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely?

I have the question as well so I haven't merged the implementation. Maybe we can provide a higher abstract layer that generates LR/SC+WRS directly?

@aswaterman
Copy link
Contributor

aswaterman commented Jan 16, 2025

We probably don't want to add intrinsics that expose LR and SC individually because of the lack of a forward-progress guarantee for unconstrained LR/SC loops. Exposing higher-level abstractions that use LR/SC under the hood avoids this pitfall.

This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too.

We could define e.g. wrs_sto_until(addr, value) to emit something like 1: lr.d t0, (addr); bne t0, value, 1f; wrs.sto; j 1b; 1:, and wrs_sto_while that inverts the branch condition, but unfortunately there might be a proliferation of these for more complex loop-exit conditions. I guess at some point you just need to write assembly code.

@jrtc27
Copy link

jrtc27 commented Jan 16, 2025

This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too.

But if there's an LR that comes in between for whatever reason then you may have a valid reservation for a different location. Waiting with a timeout will then risk waiting a bit too long, but waiting without a timeout will risk waiting indefinitely even though the intended location has been modified. Probably there will be no LR added by the compiler, but who's to say for sure...

@aswaterman
Copy link
Contributor

Indeed. In any case, the upshot is that we shouldn't go down the route of adding intrinsics for the primitives. Adding intrinsics for entire WRS loops, maybe, if we think we can capture the important use cases without a proliferation of new intrinsics.

@cmuellner
Copy link
Collaborator

@wangpc-pp, what is the status of this?
Did this land in LLVM or GCC?

@wangpc-pp
Copy link
Contributor Author

@wangpc-pp, what is the status of this? Did this land in LLVM or GCC?

No, it is not landed and I am not working on this currently. I can close it if it bothers others.

@cmuellner
Copy link
Collaborator

Thanks for the update!
Let's close this.

@cmuellner cmuellner closed this Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants