Skip to content

Conversation

@cjacek
Copy link
Contributor

@cjacek cjacek commented May 6, 2025

Treat ARM64EC as an AArch64 target for the most part. Since the vararg calling convention differs, add a dedicated implementation based on the aarch64 variant. The differences include:

  • Only the first four parameters are passed via registers
  • x4 contains a pointer to the first stack argument
  • x5 contains the size of arguments passed via the stack
  • The call is preceded by a call to __os_arm64x_check_icall, which optionally sets up an indirect call via an exit thunk if the target is x86_64 code

@cjacek cjacek requested a review from mstorsjo May 6, 2025 21:49
@llvmbot llvmbot added the openmp:libomp OpenMP host runtime label May 6, 2025
@github-actions
Copy link

github-actions bot commented May 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Treat ARM64EC as an AArch64 target for the most part. Since the vararg calling convention
differs, add a dedicated implementation based on the AArch64 variant. The differences include:

- Only the first four parameters are passed via registers
- x4 contains a pointer to the first stack argument
- x5 contains the size of arguments passed via the stack
- The call is preceded by a call to __os_arm64x_check_icall, which optionally sets up an indirect
  call via an exit thunk if the target is x86_64 code
@cjacek cjacek force-pushed the arm64ec-openmp branch from 83e12ad to af3bac5 Compare May 6, 2025 22:06
#if defined(__KNC__)
#error ARCHITECTURE=mic
#elif defined(__aarch64__) || defined(_M_ARM64) || defined(__arm64ec__) || defined(_M_ARM64EC)
#error ARCHITECTURE=aarch64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should leave a comment here to say that arm64ec also has x86_64 defined, and therefore has to be ordered here before that (so nobody tries to reorder it back later)?

Then again, if we add a comment here we should probably do the same in kmp_platform.h below too. So perhaps it's not necessary; once things are working well enough we'll probably build this regularly enough anyway, so we'd catch any breakage that way.

.globl KMP_PREFIX_UNDERSCORE(\proc)
KMP_PREFIX_UNDERSCORE(\proc):
.endm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Stray change

b KMP_LABEL(kmp_0)
KMP_LABEL(kmp_1):
adrp x10, $iexit_thunk$cdecl$v$varargs
add x10, x10, :lo12:$iexit_thunk$cdecl$v$varargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which object file provides this symbol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of .cpp files from openmp generate this exit thunk, for example for __kmp_fatal guest exit thunk.


sub w9, w9, #1
cbz w9, KMP_LABEL(kmp_1)
ldr x3, [x10, #8]!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calling convention details are a bit unclear to me here; we're accepting arguments to this function up to x4 in registers to this function, but for the called function we only pass up to x3 in registers and the rest on the stack? Is this something that the thunks and __os_arm64x_check_icall sort out if the called function is arm64ec, reloading the later arguments into registers? Or is this that the called function has a different calling convention as a variadic function?

I'm a little unsure about this aspect, since I'm not sure if the called function really is a variadic function; I have a faint memory that this function is used as adapter for calling non-variadic functions with a varying number of arguments. Did you manage to test this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__os_arm64x_check_icall can’t resolve the target correctly if it’s ARM64EC. In that case, it leaves the target address unchanged and we call the function directly. As you noted, this means that if the target isn’t actually a varargs function, it will misinterpret the arguments.

If we want to support such cases, ARM64EC alone likely won’t suffice. We may need to build x86_64 assembly instead. If we do that and the target is ARM64EC, control will return from the emulator to the entry chunk, which should handle it correctly. The downside is that it's a bit tricky for the build system to handle. I’ll investigate this further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another aspect overall here, is that openmp involves a lot of compiler generated calls to the helper functions; I'm curious about whether all of that really works out correctly without any further touches anywhere; both for the simple case of everything compiled as arm64ec (which would be fine if as such as a first step, if that works). Mixing in object files built as x86_64 would be the advanced case...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

openmp:libomp OpenMP host runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants