docs/build/arm64-windows-abi-conventions.md
Lines changed: 14 additions & 14 deletions
@@ -5,11 +5,11 @@ ms.date: 03/25/2025
---
# Overview of ARM64 ABI conventions
-
The basic application binary interface (ABI) for Windows when compiled and run on ARM processors in 64-bit mode (ARMv8 or later architectures), for the most part, follows ARM's standard AArch64 EABI. This article highlights some of the key assumptions and changes from what is documented in the EABI. For information about the 32-bit ABI, see [Overview of ARM ABI conventions](overview-of-arm-abi-conventions.md). For more information about the standard ARM EABI, see [Application Binary Interface (ABI) for the ARM Architecture](https://github.com/ARM-software/abi-aa) (external link).
+
The basic application binary interface (ABI) for Windows when compiled and run on ARM processors in 64-bit mode (ARMv8 or later architectures) usually follows ARM's standard AArch64 EABI. This article highlights some of the key assumptions and changes from what is documented in the EABI. For information about the 32-bit ABI, see [Overview of ARM ABI conventions](overview-of-arm-abi-conventions.md). For more information about the standard ARM EABI, see [Application Binary Interface (ABI) for the ARM Architecture](https://github.com/ARM-software/abi-aa) (external link).
## Definitions
-
With the introduction of 64-bit support, ARM has defined several terms:
+
With the introduction of 64-bit support, ARM defined several terms:
-**AArch32** – the legacy 32-bit instruction set architecture (ISA) defined by ARM, including Thumb mode execution.
-**AArch64** – the new 64-bit instruction set architecture (ISA) defined by ARM.
@@ -19,7 +19,7 @@ With the introduction of 64-bit support, ARM has defined several terms:
Windows also uses these terms:
-**ARM** – refers to the 32-bit ARM architecture (AArch32), sometimes referred to as WoA (Windows on ARM).
-
-**ARM32** – same as ARM, above; used in this document for clarity.
+
-**ARM32** – same as **ARM**; used in this document for clarity.
-**ARM64** – refers to the 64-bit ARM architecture (AArch64). There's no such thing as WoA64.
Finally, when referring to data types, the following definitions from ARM are referenced:
@@ -30,7 +30,7 @@ Finally, when referring to data types, the following definitions from ARM are re
## Base requirements
-
The ARM64 version of Windows presupposes that it's running on an ARMv8 or later architecture at all times. Both floating-point and NEON support are presumed to be present in hardware.
+
The ARM64 version of Windows presupposes that it's always running on an ARMv8 or later architecture. Both floating-point and NEON support are presumed to be present in hardware.
The ARMv8 specification describes new optional crypto and CRC helper opcodes for both AArch32 and AArch64. Support for them is currently optional, but recommended. To take advantage of these opcodes, apps should first make runtime checks for their existence.
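A minimal sketch of such a runtime check, assuming the Win32 `IsProcessorFeaturePresent` API with its `PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE` and `PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE` constants. The type and function names (`OptionalArmFeatures`, `QueryOptionalArmFeatures`, `SelectCrc32Path`) are hypothetical. On builds where the constants aren't available, the sketch reports both features as absent so that callers fall back to a software path.

```cpp
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical result type for this sketch.
struct OptionalArmFeatures {
    bool crypto;  // optional crypto helper opcodes
    bool crc32;   // optional CRC helper opcodes
};

enum class Crc32Path { Hardware, Software };

// Queries the OS once; both flags stay false when the check can't be made.
OptionalArmFeatures QueryOptionalArmFeatures() {
    OptionalArmFeatures f = {false, false};
#if defined(_WIN32) && defined(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE)
    f.crypto = IsProcessorFeaturePresent(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE) != FALSE;
#endif
#if defined(_WIN32) && defined(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE)
    f.crc32 = IsProcessorFeaturePresent(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE) != FALSE;
#endif
    return f;
}

// Picks an implementation based on the runtime check, never on build flags alone.
Crc32Path SelectCrc32Path(const OptionalArmFeatures& f) {
    return f.crc32 ? Crc32Path::Hardware : Crc32Path::Software;
}
```

An app would typically call `QueryOptionalArmFeatures` once at startup and cache the result before dispatching to an accelerated routine.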
| x18 | N/A | Reserved platform register: in kernel mode, points to KPCR for the current processor; In user mode, points to TEB |
| x19-x28 | Non-volatile | Scratch registers |
| x29/fp | Non-volatile | Frame pointer |
-
| x30/lr | Both | Link Register: Callee function must preserve it for its own return, but caller's value will be lost. |
+
| x30/lr | Both | Link Register: Callee function must preserve it for its own return, but caller's value is lost. |
Each register may be accessed as a full 64-bit value (via x0-x30) or as a 32-bit value (via w0-w30). 32-bit operations zero-extend their results up to 64 bits.
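The zero-extension rule can be modeled in portable C++. The sketch below isn't ARM64 code; `ModelAddW` is a hypothetical helper that mimics what a 32-bit `add w0, w0, w1` would leave in the full 64-bit register.

```cpp
#include <cstdint>

// Models a 32-bit add on w registers: only the low 32 bits of each input
// participate, and the 32-bit result is zero-extended into the 64-bit register.
std::uint64_t ModelAddW(std::uint64_t x0, std::uint64_t x1) {
    const std::uint32_t w = static_cast<std::uint32_t>(x0) + static_cast<std::uint32_t>(x1);
    return static_cast<std::uint64_t>(w);  // upper 32 bits become zero
}
```

For example, any stale upper bits in the inputs are discarded, and a carry out of bit 31 is dropped rather than propagated into the upper half.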
@@ -86,7 +86,7 @@ The frame pointer (x29) is required for compatibility with fast stack walking us
## Floating-point/SIMD registers
-
The AArch64 architecture also supports 32 floating-point/SIMD registers, summarized below:
+
The AArch64 architecture also supports these 32 floating-point/SIMD registers:
| Register | Volatility | Role |
| - | - | - |
@@ -118,7 +118,7 @@ Like AArch32, the AArch64 specification provides three system-controlled "thread
## Floating-point exceptions
-
Most ARM hardware doesn't support IEEE floating-point exceptions. You can determine if an ARM CPU supports them by writing a value that enables exceptions to the FPCR register and then reading it back. If the CPU supports floating-point exceptions, the bits corresponding to supported exceptions will remain set, while the bits corresponding to unsupported exceptions will be reset by the CPU.
+
Most ARM hardware doesn't support IEEE floating-point exceptions. You can determine if an ARM CPU supports them by writing a value that enables exceptions to the FPCR register and then reading it back. If the CPU supports floating-point exceptions, the bits corresponding to supported exceptions remain set, while the bits corresponding to unsupported exceptions are reset by the CPU.
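The write-then-read-back probe can be sketched against a software model of the register. `ModelFpcr` is a hypothetical stand-in for the hardware, and the bit positions are the FPCR trap-enable bits as given in the Arm architecture reference (IOE 8, DZE 9, OFE 10, UFE 11, IXE 12, IDE 15). On real hardware the reads and writes would presumably go through intrinsics such as `_ReadStatusReg` and `_WriteStatusReg`; that mapping is an assumption and isn't shown here.

```cpp
#include <cstdint>

// FPCR trap-enable bits.
constexpr std::uint32_t kIOE = 1u << 8;   // invalid operation
constexpr std::uint32_t kDZE = 1u << 9;   // divide by zero
constexpr std::uint32_t kOFE = 1u << 10;  // overflow
constexpr std::uint32_t kUFE = 1u << 11;  // underflow
constexpr std::uint32_t kIXE = 1u << 12;  // inexact
constexpr std::uint32_t kIDE = 1u << 15;  // input denormal
constexpr std::uint32_t kAllTrapEnables = kIOE | kDZE | kOFE | kUFE | kIXE | kIDE;

// Hypothetical model: trap-enable bits the CPU doesn't implement read back as zero.
struct ModelFpcr {
    std::uint32_t implemented_traps;
    std::uint32_t value;

    explicit ModelFpcr(std::uint32_t implemented)
        : implemented_traps(implemented), value(0) {}

    void Write(std::uint32_t v) { value = v & (~kAllTrapEnables | implemented_traps); }
    std::uint32_t Read() const { return value; }
};

// Sets every trap-enable bit, reads back which ones stuck, then restores the register.
std::uint32_t ProbeSupportedTraps(ModelFpcr& fpcr) {
    const std::uint32_t saved = fpcr.Read();
    fpcr.Write(saved | kAllTrapEnables);
    const std::uint32_t supported = fpcr.Read() & kAllTrapEnables;
    fpcr.Write(saved);
    return supported;
}
```

A result of zero corresponds to the common case described above, where the hardware supports no floating-point exceptions.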
For ARM CPUs that do support IEEE floating-point exceptions, the behavior on Windows is as follows:
@@ -166,7 +166,7 @@ For each argument in the list, the following rules are applied in turn until the
1. If the argument is an HFA, an HVA, a Quad-precision Floating-point or Short Vector Type, then the NSAA is rounded up to the larger of 8 or the Natural Alignment of the argument's type.
-
1. If the argument is a Half- or Single-precision Floating Point type, then the size of the argument is set to 8 bytes. The effect is as if the argument had been copied to the least significant bits of a 64-bit register, and the remaining bits filled with unspecified values.
+
1. If the argument is a Half- or Single-precision Floating Point type, then the size of the argument is set to 8 bytes. The effect is as if the argument were copied to the least significant bits of a 64-bit register, and the remaining bits filled with unspecified values.
1. If the argument is an HFA, an HVA, a Half-, Single-, Double-, or Quad-precision Floating-point or Short Vector Type, then the argument is copied to memory at the adjusted NSAA. The NSAA is incremented by the size of the argument. The argument has now been allocated.
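The three stack-placement rules quoted in this hunk can be sketched as a small function. This covers only those three rules, not the full argument-passing algorithm; `FpArgKind` and `PlaceOnStack` are hypothetical names, and the caller is assumed to supply the argument's size and natural alignment.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical classification of the argument kinds named in the rules.
enum class FpArgKind { Hfa, Hva, Half, Single, Double, Quad, ShortVector };

std::uint64_t RoundUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Applies the three rules in order and returns the stack offset the argument
// is copied to. `nsaa` (next stacked argument address) is advanced in place.
std::uint64_t PlaceOnStack(std::uint64_t& nsaa, FpArgKind kind,
                           std::uint64_t size, std::uint64_t natural_alignment) {
    // Rule 1: HFA, HVA, quad-precision, or short vector: round the NSAA up to
    // the larger of 8 or the type's natural alignment.
    if (kind == FpArgKind::Hfa || kind == FpArgKind::Hva ||
        kind == FpArgKind::Quad || kind == FpArgKind::ShortVector) {
        nsaa = RoundUp(nsaa, std::max<std::uint64_t>(8, natural_alignment));
    }
    // Rule 2: half- and single-precision values occupy a full 8-byte slot.
    if (kind == FpArgKind::Half || kind == FpArgKind::Single) {
        size = 8;
    }
    // Rule 3: copy to the adjusted NSAA, then advance the NSAA by the size.
    const std::uint64_t offset = nsaa;
    nsaa += size;
    return offset;
}
```

Starting from an NSAA of 8, a quad-precision argument lands at offset 16 because of the 16-byte alignment, and a following `float` consumes 8 bytes rather than 4.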
@@ -207,7 +207,7 @@ Floating-point values are returned in s0, d0, or v0, as appropriate.
A type is considered to be an HFA or HVA if all of the following hold:
- It's non-empty,
-
- It doesn't have any non-trivial default or copy constructors, destructors, or assignment operators,
+
- It doesn't have any nontrivial default or copy constructors, destructors, or assignment operators,
- All of its members have the same HFA or HVA type, or are float, double, or neon types that match the other members' HFA or HVA types.
HVA values with four or fewer elements are returned in s0-s3, d0-d3, or v0-v3, as appropriate.
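The second condition in the list above can be approximated with standard type traits. This is only a sketch: `HasOnlyTrivialSpecialMembers`, `Vec4`, and `Initialized` are hypothetical names, the check says nothing about the other two conditions, and whether the compiler's HFA classification matches these traits exactly is an assumption.

```cpp
#include <type_traits>

// True when T has trivial default and copy constructors, a trivial
// copy-assignment operator, and a trivial destructor.
template <typename T>
constexpr bool HasOnlyTrivialSpecialMembers() {
    return std::is_trivially_default_constructible<T>::value &&
           std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_copy_assignable<T>::value &&
           std::is_trivially_destructible<T>::value;
}

// Four floats with no user-provided special members: an HFA candidate.
struct Vec4 { float x, y, z, w; };

// Default member initializers make the default constructor nontrivial,
// so this type fails the triviality condition.
struct Initialized { float x = 0.0f; float y = 0.0f; };

static_assert(HasOnlyTrivialSpecialMembers<Vec4>(), "Vec4 meets the triviality condition");
static_assert(!HasOnlyTrivialSpecialMembers<Initialized>(), "Initialized does not");
```

A seemingly harmless change such as adding default member initializers can therefore move a type out of the HFA category and change how it's passed and returned.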
@@ -218,20 +218,20 @@ Types returned by value are handled differently depending on whether they have c
- they have a trivial copy-assignment operator, and
- they have a trivial destructor,
-
and are returned by non-member functions or static member functions, use the following return style:
+
and are returned by nonmember functions or static member functions, use the following return style:
- Types that are HFAs with four or fewer elements are returned in s0-s3, d0-d3, or v0-v3, as appropriate.
- Types less than or equal to 8 bytes are returned in x0.
- Types less than or equal to 16 bytes are returned in x0 and x1, with x0 containing the lower-order 8 bytes.
-
- For other aggregate types, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine. The callee isn't required to preserve the value stored in x8.
+
- For other aggregate types, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as another argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine. The callee isn't required to preserve the value stored in x8.
All other types use this convention:
- The caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x0, or x1 if $this is passed in x0. The callee may modify the result memory block at any point during the execution of the subroutine. The callee returns the address of the memory block in x0.
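The return-style rules above can be summarized as a decision function. This is a sketch under assumptions: `ReturnTypeInfo` and `ClassifyReturn` are hypothetical, and the `trivial_special_members` flag stands in for the full list of triviality conditions, part of which is elided in this hunk.

```cpp
#include <cstddef>

enum class ReturnStyle {
    FpRegisters,      // s0-s3, d0-d3, or v0-v3
    X0,               // value in x0
    X0X1,             // value in x0 and x1, low 8 bytes in x0
    MemoryViaX8,      // caller-allocated block, address passed in x8
    MemoryViaX0OrX1   // caller-allocated block, address passed in x0 (or x1), returned in x0
};

// Hypothetical description of a type returned by value.
struct ReturnTypeInfo {
    std::size_t size_bytes;
    bool trivial_special_members;          // all listed triviality conditions hold
    bool returned_by_nonmember_or_static;  // nonmember or static member function
    bool is_hfa;
    unsigned hfa_elements;
};

ReturnStyle ClassifyReturn(const ReturnTypeInfo& t) {
    if (t.trivial_special_members && t.returned_by_nonmember_or_static) {
        if (t.is_hfa && t.hfa_elements <= 4) return ReturnStyle::FpRegisters;
        if (t.size_bytes <= 8) return ReturnStyle::X0;
        if (t.size_bytes <= 16) return ReturnStyle::X0X1;
        return ReturnStyle::MemoryViaX8;
    }
    // All other types use the second convention.
    return ReturnStyle::MemoryViaX0OrX1;
}
```

Note that the same 12-byte struct is returned in x0 and x1 from a free function but through a memory block when it doesn't meet the first set of conditions.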
## Stack
-
Following the ABI put forth by ARM, the stack must remain 16-byte aligned at all times. AArch64 contains a hardware feature that generates stack alignment faults whenever the SP isn't 16-byte aligned and an SP-relative load or store is done. Windows runs with this feature enabled at all times.
+
Following the ABI put forth by ARM, the stack must always remain 16-byte aligned. AArch64 contains a hardware feature that generates stack alignment faults whenever the SP isn't 16-byte aligned and an SP-relative load or store is done. Windows always runs with this feature enabled.
Functions that allocate 4k or more worth of stack must ensure that each page prior to the final page is touched in order. This action ensures no code can "leap over" the guard pages that Windows uses to expand the stack. Typically the touching is done by the `__chkstk` helper, which has a custom calling convention that passes the total stack allocation divided by 16 in x15.
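The arithmetic behind these stack rules can be sketched as follows. The helper names are hypothetical, and the sketch assumes a 4096-byte page for the "4k" threshold; it computes the values a prologue would use rather than performing the probe itself.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kStackAlignment = 16;   // required SP alignment
constexpr std::uint64_t kProbeThreshold = 4096; // "4k" in the text, assumed to be one page

// Rounds a requested frame size up to the next multiple of 16 bytes.
std::uint64_t AlignFrameSize(std::uint64_t bytes) {
    return (bytes + kStackAlignment - 1) & ~(kStackAlignment - 1);
}

// A frame of 4k or more must touch its pages in order.
bool NeedsStackProbe(std::uint64_t frame_bytes) {
    return frame_bytes >= kProbeThreshold;
}

// Value passed to the probe helper in x15: the total allocation divided by 16.
std::uint64_t ChkstkArgument(std::uint64_t frame_bytes) {
    assert(frame_bytes % kStackAlignment == 0);  // frame must already be aligned
    return frame_bytes / kStackAlignment;
}
```

For a function that needs 4100 bytes of locals, the frame is rounded up to 4112 bytes, a probe is required, and x15 would hold 257.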
@@ -249,7 +249,7 @@ Code within Windows is compiled with frame pointers enabled ([/Oy-](reference/oy
## Exception unwinding
-
Unwinding during exception handling is assisted through the use of unwind codes. The unwind codes are a sequence of bytes stored in the .xdata section of the executable. They describe the operation of the prologue and epilogue in an abstract manner, such that the effects of a function's prologue can be undone in preparation for backing up to the caller's stack frame. For more information on the unwind codes, see [ARM64 exception handling](arm64-exception-handling.md).
+
Unwinding during exception handling is assisted by using unwind codes. The unwind codes are a sequence of bytes stored in the .xdata section of the executable. They describe the operation of the prologue and epilogue in an abstract manner, such that the effects of a function's prologue can be undone in preparation for backing up to the caller's stack frame. For more information on the unwind codes, see [ARM64 exception handling](arm64-exception-handling.md).
The ARM EABI also specifies an exception unwinding model that uses unwind codes. However, the specification as presented is insufficient for unwinding in Windows, which must handle cases where the PC is in the middle of a function prologue or epilogue.
@@ -259,7 +259,7 @@ Code that is dynamically generated should be described with dynamic function tab
All ARMv8 CPUs are required to support a cycle counter register, a 64-bit register that Windows configures to be readable at any exception level, including user mode. It can be accessed via the special PMCCNTR_EL0 register, using the MRS opcode in assembly code, or the `_ReadStatusReg` intrinsic in C/C++ code.
-
The cycle counter here is a true cycle counter, not a wall clock. The counting frequency will vary with the processor frequency. If you feel you must know the frequency of the cycle counter, you shouldn't be using the cycle counter. Instead, you want to measure wall clock time, for which you should use `QueryPerformanceCounter`.
+
The cycle counter here is a true cycle counter, not a wall clock. The counting frequency varies with the processor frequency. If you feel you must know the frequency of the cycle counter, you shouldn't be using the cycle counter. Instead, you want to measure wall clock time, for which you should use `QueryPerformanceCounter`.
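A minimal sketch of measuring wall clock time instead of cycles, assuming the Win32 `QueryPerformanceCounter` and `QueryPerformanceFrequency` functions on Windows. `WallClockSeconds` is a hypothetical helper; on other platforms the sketch falls back to `std::chrono::steady_clock` so that it stays portable.

```cpp
#include <chrono>

#if defined(_WIN32)
#include <windows.h>
#endif

// Returns a monotonic timestamp in seconds, suitable for interval measurement.
double WallClockSeconds() {
#if defined(_WIN32)
    LARGE_INTEGER frequency;
    LARGE_INTEGER now;
    QueryPerformanceFrequency(&frequency);  // ticks per second
    QueryPerformanceCounter(&now);          // current tick count
    return static_cast<double>(now.QuadPart) / static_cast<double>(frequency.QuadPart);
#else
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
#endif
}
```

Elapsed time is the difference between two calls. Unlike a cycle count, the result doesn't change meaning when the processor frequency varies.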