Commit 9b074bb

Merge branches 'for-next/sysreg', 'for-next/compat-hwcap' and 'for-next/sme2' into for-next/sysreg-hwcaps
Patches on this branch depend on the branches merged above.
3 parents 1abf363 + 4f2c9bf + b2ab432 commit 9b074bb

40 files changed, with 1589 additions and 80 deletions.

Documentation/arm64/booting.rst

Lines changed: 10 additions & 0 deletions
@@ -369,6 +369,16 @@ Before jumping into the kernel, the following conditions must be met:

     - HCR_EL2.ATA (bit 56) must be initialised to 0b1.

+  For CPUs with the Scalable Matrix Extension version 2 (FEAT_SME2):
+
+  - If EL3 is present:
+
+    - SMCR_EL3.EZT0 (bit 30) must be initialised to 0b1.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1.
+
 The requirements described above for CPU mode, caches, MMUs, architected
 timers, coherency and system registers apply to all CPUs. All CPUs must
 enter the kernel in the same exception level. Where the values documented
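
The FEAT_SME2 requirement added above comes down to leaving bit 30 (EZT0) set in SMCR_EL3, and in SMCR_EL2 when the kernel is entered at EL1. The sketch below models that on a plain 64-bit value so it compiles on any host. The helper names are hypothetical, and real firmware would access the register with MRS/MSR instead of passing values around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the booting.rst text above: SMCR_ELx.EZT0 is bit 30. */
#define SMCR_ELx_EZT0_SHIFT	30
#define SMCR_ELx_EZT0		(UINT64_C(1) << SMCR_ELx_EZT0_SHIFT)

/* Hypothetical helper: return the register value with EZT0 enabled. */
static inline uint64_t smcr_enable_zt0(uint64_t smcr)
{
	return smcr | SMCR_ELx_EZT0;
}

/* Hypothetical helper: check whether a saved SMCR_ELx value meets the requirement. */
static inline bool smcr_zt0_enabled(uint64_t smcr)
{
	return (smcr & SMCR_ELx_EZT0) != 0;
}
```

A boot-time self check could call smcr_zt0_enabled() on the value it is about to write and refuse to continue if the bit is clear.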

Documentation/arm64/elf_hwcaps.rst

Lines changed: 18 additions & 0 deletions
@@ -284,6 +284,24 @@ HWCAP2_RPRFM
 HWCAP2_SVE2P1
     Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010.

+HWCAP2_SME2
+    Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001.
+
+HWCAP2_SME2P1
+    Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0010.
+
+HWCAP2_SMEI16I32
+    Functionality implied by ID_AA64SMFR0_EL1.I16I32 == 0b0101
+
+HWCAP2_SMEBI32I32
+    Functionality implied by ID_AA64SMFR0_EL1.BI32I32 == 0b1
+
+HWCAP2_SMEB16B16
+    Functionality implied by ID_AA64SMFR0_EL1.B16B16 == 0b1
+
+HWCAP2_SMEF16F16
+    Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1
+
 4. Unused AT_HWCAP bits
 -----------------------
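
Userspace would normally test the new HWCAP2_SME2 entry through the aux vector. The following is a minimal sketch, assuming a glibc-style `getauxval()`; the fallback bit value is copied from the uapi hwcap.h hunk in this commit, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* Fallback for older userspace headers; value matches the uapi hunk in this commit. */
#ifndef HWCAP2_SME2
#define HWCAP2_SME2 (1ULL << 37)
#endif

/* Pure helper so the bit test can be exercised without SME2 hardware. */
static inline bool hwcap2_has(unsigned long long hwcap2, unsigned long long mask)
{
	return (hwcap2 & mask) == mask;
}

/* Query the running process's AT_HWCAP2 entry. */
static inline bool cpu_has_sme2(void)
{
	return hwcap2_has(getauxval(AT_HWCAP2), HWCAP2_SME2);
}
```

Keeping the bit test separate from the `getauxval()` call makes it easy to check the mask logic on a machine without SME2.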

Documentation/arm64/sme.rst

Lines changed: 43 additions & 9 deletions
@@ -18,14 +18,19 @@ model features for SME is included in Appendix A.
 1. General
 -----------

-* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA
-  register state and TPIDR2_EL0 are tracked per thread.
+* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when
+  present) ZTn register state and TPIDR2_EL0 are tracked per thread.

 * The presence of SME is reported to userspace via HWCAP2_SME in the aux vector
   AT_HWCAP2 entry. Presence of this flag implies the presence of the SME
   instructions and registers, and the Linux-specific system interfaces
   described in this document. SME is reported in /proc/cpuinfo as "sme".

+* The presence of SME2 is reported to userspace via HWCAP2_SME2 in the
+  aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of
+  the SME2 instructions and ZT0, and the Linux-specific system interfaces
+  described in this document. SME2 is reported in /proc/cpuinfo as "sme2".
+
 * Support for the execution of SME instructions in userspace can also be
   detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
   instruction, and checking that the value of the SME field is nonzero. [3]
@@ -44,6 +49,7 @@ model features for SME is included in Appendix A.
 	HWCAP2_SME_B16F32
 	HWCAP2_SME_F32F32
 	HWCAP2_SME_FA64
+	HWCAP2_SME2

   This list may be extended over time as the SME architecture evolves.

@@ -52,8 +58,8 @@ model features for SME is included in Appendix A.
   cpu-feature-registers.txt for details.

 * Debuggers should restrict themselves to interacting with the target via the
-  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets. The recommended way
-  of detecting support for these regsets is to connect to a target process
+  NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended
+  way of detecting support for these regsets is to connect to a target process
   first and then attempt a

 	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
@@ -89,13 +95,13 @@ be zeroed.
 -------------------------

 * On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
-  ZA matrix are preserved.
+  ZA matrix and ZTn (if present) are preserved.

 * On syscall PSTATE.SM will be cleared and the SVE registers will be handled
   as per the standard SVE ABI.

-* Neither the SVE registers nor ZA are used to pass arguments to or receive
-  results from any syscall.
+* None of the SVE registers, ZA or ZTn are used to pass arguments to
+  or receive results from any syscall.

 * On process creation (eg, clone()) the newly created process will have
   PSTATE.SM cleared.
@@ -134,6 +140,14 @@ be zeroed.
   __reserved[] referencing this space. za_context is then written in the
   extra space. Refer to [1] for further details about this mechanism.

+* If ZTn is supported and PSTATE.ZA==1 then a signal frame record for ZTn will
+  be generated.
+
+* The signal record for ZTn has magic ZT_MAGIC (0x5a544e01) and consists of a
+  standard signal frame header followed by a struct zt_context specifying
+  the number of ZTn registers supported by the system, then zt_context.nregs
+  blocks of 64 bytes of data per register.
+

 5. Signal return
 -----------------
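
The ZTn signal record described in the hunk above can be located by walking the header-prefixed records of a signal frame. This is a minimal sketch over a plain byte buffer, not the kernel's own parser: `struct rec_head`, `struct zt_rec`, `zt_record_size()` and `find_record()` are hypothetical local names, the field layout of `struct zt_rec` is an assumption about `struct zt_context`, and the sketch does not follow the extra-space record that the surrounding text mentions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZT_MAGIC	0x5a544e01u	/* from the sme.rst text above */
#define ZT_REG_BYTES	64u		/* 64 bytes of data per ZTn register */

/* Local stand-in for the standard signal frame header (magic, size). */
struct rec_head {
	uint32_t magic;
	uint32_t size;
};

/*
 * Assumed layout of struct zt_context: header, register count, padding.
 * Check the uapi sigcontext.h on the target before relying on it.
 */
struct zt_rec {
	struct rec_head head;
	uint16_t nregs;
	uint16_t reserved[3];
};

/* Bytes occupied by a ZT record holding nregs registers. */
static inline size_t zt_record_size(unsigned int nregs)
{
	return sizeof(struct zt_rec) + (size_t)nregs * ZT_REG_BYTES;
}

/*
 * Walk header-prefixed records in buf and return the offset of the first
 * record with the requested magic, or -1 if none is found. A zero magic
 * or a malformed size ends the walk.
 */
static inline long find_record(const unsigned char *buf, size_t len, uint32_t magic)
{
	size_t off = 0;

	while (len - off >= sizeof(struct rec_head)) {
		struct rec_head h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.magic == 0 || h.size < sizeof(h) || h.size > len - off)
			return -1;
		if (h.magic == magic)
			return (long)off;
		off += h.size;
	}
	return -1;
}
```

In a signal handler the buffer would be the `__reserved[]` area of the saved context. The register data for ZT0 then starts `sizeof(struct zt_rec)` bytes after the returned offset, under the layout assumption stated above.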
@@ -151,6 +165,9 @@ When returning from a signal handler:
   the signal frame does not match the current vector length, the signal return
   attempt is treated as illegal, resulting in a forced SIGSEGV.

+* If ZTn is not supported or PSTATE.ZA==0 then it is illegal to have a
+  signal frame record for ZTn, resulting in a forced SIGSEGV.
+

 6. prctl extensions
 --------------------
@@ -214,8 +231,8 @@ prctl(PR_SME_SET_VL, unsigned long arg)
       vector length that will be applied at the next execve() by the calling
       thread.

-    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
-      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
+    * Changing the vector length causes all of ZA, ZTn, P0..P15, FFR and all
+      bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
       unspecified, including both streaming and non-streaming SVE state.
       Calling PR_SME_SET_VL with vl equal to the thread's current vector
       length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
@@ -317,6 +334,15 @@ The regset data starts with struct user_za_header, containing:

 * The effect of writing a partial, incomplete payload is unspecified.

+* A new regset NT_ARM_ZT is defined for access to ZTn state via
+  PTRACE_GETREGSET and PTRACE_SETREGSET.
+
+* The NT_ARM_ZT regset consists of a single 512 bit register.
+
+* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZTn as 0.
+
+* Writes to NT_ARM_ZT will set PSTATE.ZA to 1.
+

 8. ELF coredump extensions
 ---------------------------
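
A debugger could read the NT_ARM_ZT regset introduced in the hunk above roughly as follows. This is a sketch under stated assumptions: the numeric value used for `NT_ARM_ZT` when the system headers lack it is assumed, not taken from this commit, and `zt_iovec()` and `read_zt()` are hypothetical helper names. The tracee must already be attached and stopped.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumed note type value; newer <elf.h> headers may already define it. */
#ifndef NT_ARM_ZT
#define NT_ARM_ZT 0x40d
#endif

#define ZT_REG_BYTES 64	/* a single 512 bit register */

/* Describe a caller-provided 64-byte buffer for the regset transfer. */
static inline struct iovec zt_iovec(unsigned char buf[ZT_REG_BYTES])
{
	struct iovec iov = { .iov_base = buf, .iov_len = ZT_REG_BYTES };

	return iov;
}

/*
 * Read ZT0 from a stopped, already-attached tracee.
 * Returns 0 on success and -1 otherwise, for example when the kernel
 * or CPU has no NT_ARM_ZT regset.
 */
static inline int read_zt(pid_t pid, unsigned char buf[ZT_REG_BYTES])
{
	struct iovec iov = zt_iovec(buf);

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_ARM_ZT, &iov) == -1)
		return -1;
	return iov.iov_len == ZT_REG_BYTES ? 0 : -1;
}
```

Per the text above, a successful read while PSTATE.ZA==0 returns all zeros, so a zero buffer alone does not show whether ZT0 holds live data.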
@@ -331,6 +357,11 @@ The regset data starts with struct user_za_header, containing:
   been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
   when the coredump was generated.

+* A NT_ARM_ZT note will be added to each coredump for each thread of the
+  dumped process. The contents will be equivalent to the data that would have
+  been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread
+  when the coredump was generated.
+
 * The NT_ARM_TLS note will be extended to two registers, the second register
   will contain TPIDR2_EL0 on systems that support SME and will be read as
   zero with writes ignored otherwise.
@@ -406,6 +437,9 @@ In A64 state, SME adds the following:
   For best system performance it is strongly encouraged for software to enable
   ZA only when it is actively being used.

+* A new ZT0 register is introduced when SME2 is present. This is a 512 bit
+  register which is accessible when PSTATE.ZA is set, as ZA itself is.
+
 * Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
   SMSTOP instructions or by access to the SVCR system register:

arch/arm64/include/asm/cpufeature.h

Lines changed: 6 additions & 0 deletions
@@ -769,6 +769,12 @@ static __always_inline bool system_supports_sme(void)
 		cpus_have_const_cap(ARM64_SME);
 }

+static __always_inline bool system_supports_sme2(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_SME) &&
+		cpus_have_const_cap(ARM64_SME2);
+}
+
 static __always_inline bool system_supports_fa64(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_SME) &&

arch/arm64/include/asm/esr.h

Lines changed: 1 addition & 0 deletions
@@ -341,6 +341,7 @@
 #define ESR_ELx_SME_ISS_ILL 1
 #define ESR_ELx_SME_ISS_SM_DISABLED 2
 #define ESR_ELx_SME_ISS_ZA_DISABLED 3
+#define ESR_ELx_SME_ISS_ZT_DISABLED 4

 #ifndef __ASSEMBLY__
 #include <asm/types.h>
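
The new ESR_ELx_SME_ISS_ZT_DISABLED value extends the set of reason codes shown in the hunk above. As an illustration only, a trap handler might map those codes to a description as below; `sme_trap_reason()` is a hypothetical helper and the strings are informal readings of the macro names, not text from the architecture.

```c
#include <stdint.h>

/* Values copied from the esr.h hunk above. */
#define ESR_ELx_SME_ISS_ILL		1
#define ESR_ELx_SME_ISS_SM_DISABLED	2
#define ESR_ELx_SME_ISS_ZA_DISABLED	3
#define ESR_ELx_SME_ISS_ZT_DISABLED	4

/* Hypothetical helper: describe the reason code of an SME access trap. */
static inline const char *sme_trap_reason(uint32_t iss)
{
	switch (iss) {
	case ESR_ELx_SME_ISS_ILL:
		return "illegal instruction";
	case ESR_ELx_SME_ISS_SM_DISABLED:
		return "streaming mode disabled";
	case ESR_ELx_SME_ISS_ZA_DISABLED:
		return "ZA disabled";
	case ESR_ELx_SME_ISS_ZT_DISABLED:
		return "ZT0 disabled";
	default:
		return "other";
	}
}
```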

arch/arm64/include/asm/fpsimd.h

Lines changed: 22 additions & 8 deletions
@@ -61,7 +61,7 @@ extern void fpsimd_kvm_prepare(void);
 struct cpu_fp_state {
 	struct user_fpsimd_state *st;
 	void *sve_state;
-	void *za_state;
+	void *sme_state;
 	u64 *svcr;
 	unsigned int sve_vl;
 	unsigned int sme_vl;
@@ -105,19 +105,27 @@ static inline void *sve_pffr(struct thread_struct *thread)
 	return (char *)thread->sve_state + sve_ffr_offset(vl);
 }

+static inline void *thread_zt_state(struct thread_struct *thread)
+{
+	/* The ZT register state is stored immediately after the ZA state */
+	unsigned int sme_vq = sve_vq_from_vl(thread_get_sme_vl(thread));
+	return thread->sme_state + ZA_SIG_REGS_SIZE(sme_vq);
+}
+
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   int restore_ffr);
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 extern void sme_set_vq(unsigned long vq_minus_1);
-extern void za_save_state(void *state);
-extern void za_load_state(void const *state);
+extern void sme_save_state(void *state, int zt);
+extern void sme_load_state(void const *state, int zt);

 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
+extern void sme2_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 extern void fa64_kernel_enable(const struct arm64_cpu_capabilities *__unused);

 extern u64 read_zcr_features(void);
@@ -355,14 +363,20 @@ extern int sme_get_current_vl(void);

 /*
  * Return how many bytes of memory are required to store the full SME
- * specific state (currently just ZA) for task, given task's currently
- * configured vector length.
+ * specific state for task, given task's currently configured vector
+ * length.
  */
-static inline size_t za_state_size(struct task_struct const *task)
+static inline size_t sme_state_size(struct task_struct const *task)
 {
 	unsigned int vl = task_get_sme_vl(task);
+	size_t size;
+
+	size = ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
+
+	if (system_supports_sme2())
+		size += ZT_SIG_REG_SIZE;

-	return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
+	return size;
 }

 #else
@@ -382,7 +396,7 @@ static inline int sme_max_virtualisable_vl(void) { return 0; }
 static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; }
 static inline int sme_get_current_vl(void) { return -EINVAL; }

-static inline size_t za_state_size(struct task_struct const *task)
+static inline size_t sme_state_size(struct task_struct const *task)
 {
 	return 0;
 }
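
The buffer layout implied by `thread_zt_state()` and `sme_state_size()` in the hunks above is ZA first, then ZT0 directly after it. The sketch below reproduces that arithmetic in portable C, assuming that ZA occupies one streaming vector length squared in bytes (which is how I understand `ZA_SIG_REGS_SIZE()`) and that a vector quadword is 16 bytes. All names are hypothetical stand-ins for the kernel macros. The ZT0 reservation is passed in by the caller so that no value is assumed for `ZT_SIG_REG_SIZE`.

```c
#include <stddef.h>

#define SVE_VQ_BYTES 16	/* one vector quadword (128 bits) */

/* Mirrors sve_vq_from_vl(): vector length in bytes to quadword count. */
static inline unsigned int vq_from_vl(unsigned int vl)
{
	return vl / SVE_VQ_BYTES;
}

/* ZA is a square array: vl rows of vl bytes each. */
static inline size_t za_bytes(unsigned int vq)
{
	return (size_t)(vq * SVE_VQ_BYTES) * (vq * SVE_VQ_BYTES);
}

/* ZT0 is stored immediately after ZA, so its offset equals the ZA size. */
static inline size_t zt_offset(unsigned int vl)
{
	return za_bytes(vq_from_vl(vl));
}

/*
 * Total buffer size. zt_reserved is whatever the kernel reserves for ZT0
 * (ZT_SIG_REG_SIZE in the hunk above); pass 0 when SME2 is absent.
 */
static inline size_t sme_buf_size(unsigned int vl, size_t zt_reserved)
{
	return zt_offset(vl) + zt_reserved;
}
```

For a 32-byte streaming vector length, ZA takes 1024 bytes under these assumptions, so ZT0 would start at offset 1024 in the shared `sme_state` buffer.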

arch/arm64/include/asm/fpsimdmacros.h

Lines changed: 22 additions & 0 deletions
@@ -220,6 +220,28 @@
 		| ((\offset) & 7)
 .endm

+/*
+ * LDR (ZT0)
+ *
+ *	LDR ZT0, nx
+ */
+.macro _ldr_zt nx
+	_check_general_reg \nx
+	.inst	0xe11f8000	\
+		| (\nx << 5)
+.endm
+
+/*
+ * STR (ZT0)
+ *
+ *	STR ZT0, nx
+ */
+.macro _str_zt nx
+	_check_general_reg \nx
+	.inst	0xe13f8000	\
+		| (\nx << 5)
+.endm
+
 /*
  * Zero the entire ZA array
  *	ZERO ZA
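
The `_ldr_zt` and `_str_zt` macros above build the instruction word by OR-ing the base register number, shifted left by 5, into a fixed opcode. The same arithmetic can be sketched in C as below; the function names are hypothetical, and the 5-bit mask is an addition of this sketch (the macros rely on `_check_general_reg` instead).

```c
#include <stdint.h>

/* Base opcodes copied from the .inst lines in the hunk above. */
#define LDR_ZT0_BASE UINT32_C(0xe11f8000)
#define STR_ZT0_BASE UINT32_C(0xe13f8000)

/* Place the base register number in bits [9:5], as the macros do. */
static inline uint32_t encode_ldr_zt0(unsigned int xn)
{
	return LDR_ZT0_BASE | ((uint32_t)(xn & 0x1f) << 5);
}

static inline uint32_t encode_str_zt0(unsigned int xn)
{
	return STR_ZT0_BASE | ((uint32_t)(xn & 0x1f) << 5);
}
```

Emitting the raw word with `.inst` lets the kernel build with assemblers that do not yet know the SME2 mnemonics.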

arch/arm64/include/asm/hwcap.h

Lines changed: 14 additions & 0 deletions
@@ -31,12 +31,20 @@
 #define COMPAT_HWCAP_VFPD32 (1 << 19)
 #define COMPAT_HWCAP_LPAE (1 << 20)
 #define COMPAT_HWCAP_EVTSTRM (1 << 21)
+#define COMPAT_HWCAP_FPHP (1 << 22)
+#define COMPAT_HWCAP_ASIMDHP (1 << 23)
+#define COMPAT_HWCAP_ASIMDDP (1 << 24)
+#define COMPAT_HWCAP_ASIMDFHM (1 << 25)
+#define COMPAT_HWCAP_ASIMDBF16 (1 << 26)
+#define COMPAT_HWCAP_I8MM (1 << 27)

 #define COMPAT_HWCAP2_AES (1 << 0)
 #define COMPAT_HWCAP2_PMULL (1 << 1)
 #define COMPAT_HWCAP2_SHA1 (1 << 2)
 #define COMPAT_HWCAP2_SHA2 (1 << 3)
 #define COMPAT_HWCAP2_CRC32 (1 << 4)
+#define COMPAT_HWCAP2_SB (1 << 5)
+#define COMPAT_HWCAP2_SSBS (1 << 6)

 #ifndef __ASSEMBLY__
 #include <linux/log2.h>
@@ -123,6 +131,12 @@
 #define KERNEL_HWCAP_CSSC __khwcap2_feature(CSSC)
 #define KERNEL_HWCAP_RPRFM __khwcap2_feature(RPRFM)
 #define KERNEL_HWCAP_SVE2P1 __khwcap2_feature(SVE2P1)
+#define KERNEL_HWCAP_SME2 __khwcap2_feature(SME2)
+#define KERNEL_HWCAP_SME2P1 __khwcap2_feature(SME2P1)
+#define KERNEL_HWCAP_SME_I16I32 __khwcap2_feature(SME_I16I32)
+#define KERNEL_HWCAP_SME_BI32I32 __khwcap2_feature(SME_BI32I32)
+#define KERNEL_HWCAP_SME_B16B16 __khwcap2_feature(SME_B16B16)
+#define KERNEL_HWCAP_SME_F16F16 __khwcap2_feature(SME_F16F16)

 /*
  * This yields a mask that user programs can use to figure out what

arch/arm64/include/asm/processor.h

Lines changed: 1 addition & 1 deletion
@@ -161,7 +161,7 @@ struct thread_struct {
 	enum fp_type fp_type; /* registers FPSIMD or SVE? */
 	unsigned int fpsimd_cpu;
 	void *sve_state; /* SVE registers, if any */
-	void *za_state; /* ZA register, if any */
+	void *sme_state; /* ZA and ZT state, if any */
 	unsigned int vl[ARM64_VEC_MAX]; /* vector length */
 	unsigned int vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long fault_address; /* fault info */

arch/arm64/include/uapi/asm/hwcap.h

Lines changed: 6 additions & 0 deletions
@@ -96,5 +96,11 @@
 #define HWCAP2_CSSC (1UL << 34)
 #define HWCAP2_RPRFM (1UL << 35)
 #define HWCAP2_SVE2P1 (1UL << 36)
+#define HWCAP2_SME2 (1UL << 37)
+#define HWCAP2_SME2P1 (1UL << 38)
+#define HWCAP2_SME_I16I32 (1UL << 39)
+#define HWCAP2_SME_BI32I32 (1UL << 40)
+#define HWCAP2_SME_B16B16 (1UL << 41)
+#define HWCAP2_SME_F16F16 (1UL << 42)

 #endif /* _UAPI__ASM_HWCAP_H */
