Commit 6d75c6f

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "The major features are support for LPA2 (52-bit VA/PA with 4K and 16K
  pages), the dpISA extension and Rust enabled on arm64. The changes are
  mostly contained within the usual arch/arm64/, drivers/perf, the arm64
  Documentation and kselftests. The exception is the Rust support which
  touches some generic build files.

  Summary:

   - Reorganise the arm64 kernel VA space and add support for LPA2 (at
     stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address
     range with 4KB and 16KB pages

   - Enable Rust on arm64

   - Support for the 2023 dpISA extensions (data processing ISA), host
     only

   - arm64 perf updates:

      - StarFive's StarLink (integrates one or more CPU cores with a
        shared L3 memory system) PMU support

      - Enable HiSilicon Erratum 162700402 quirk for HIP09

      - Several updates for the HiSilicon PCIe PMU driver

      - Arm CoreSight PMU support

      - Convert all drivers under drivers/perf/ to use .remove_new()

   - Miscellaneous:

      - Don't enable workarounds for "rare" errata by default

      - Clean up the DAIF flags handling for EL0 returns (in preparation
        for NMI support)

      - Kselftest update for ptrace()

      - Update some of the sysreg field definitions

      - Slight improvement in the code generation for inline asm I/O
        accessors to permit offset addressing

      - kretprobes: acquire regs via a BRK exception (previously done
        via a trampoline handler)

      - SVE/SME cleanups, comment updates

      - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously
        disabled due to gcc silently ignoring -falign-functions=N)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (134 commits)
  Revert "mm: add arch hook to validate mmap() prot flags"
  Revert "arm64: mm: add support for WXN memory translation attribute"
  Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"
  ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
  kselftest/arm64: Add 2023 DPISA hwcap test coverage
  kselftest/arm64: Add basic FPMR test
  kselftest/arm64: Handle FPMR context in generic signal frame parser
  arm64/hwcap: Define hwcaps for 2023 DPISA features
  arm64/ptrace: Expose FPMR via ptrace
  arm64/signal: Add FPMR signal handling
  arm64/fpsimd: Support FEAT_FPMR
  arm64/fpsimd: Enable host kernel access to FPMR
  arm64/cpufeature: Hook new identification registers up to cpufeature
  docs: perf: Fix build warning of hisi-pcie-pmu.rst
  perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
  MAINTAINERS: Add entry for StarFive StarLink PMU
  docs: perf: Add description for StarFive's StarLink PMU
  dt-bindings: perf: starfive: Add JH8100 StarLink PMU
  perf: starfive: Add StarLink PMU support
  docs: perf: Update usage for target filter of hisi-pcie-pmu
  ...
2 parents fe46a7d + 1ef21fc commit 6d75c6f

137 files changed: +5508 -1578 lines changed

Documentation/admin-guide/perf/hisi-pcie-pmu.rst

Lines changed: 24 additions & 8 deletions
@@ -37,9 +37,21 @@ Example usage of perf::
 hisi_pcie0_core0/rx_mwr_cnt/ [kernel PMU event]
 ------------------------------------------
 
-$# perf stat -e hisi_pcie0_core0/rx_mwr_latency/
-$# perf stat -e hisi_pcie0_core0/rx_mwr_cnt/
-$# perf stat -g -e hisi_pcie0_core0/rx_mwr_latency/ -e hisi_pcie0_core0/rx_mwr_cnt/
+$# perf stat -e hisi_pcie0_core0/rx_mwr_latency,port=0xffff/
+$# perf stat -e hisi_pcie0_core0/rx_mwr_cnt,port=0xffff/
+
+The related events usually used to calculate the bandwidth, latency or others.
+They need to start and end counting at the same time, therefore related events
+are best used in the same event group to get the expected value. There are two
+ways to know if they are related events:
+
+a) By event name, such as the latency events "xxx_latency, xxx_cnt" or
+bandwidth events "xxx_flux, xxx_time".
+b) By event type, such as "event=0xXXXX, event=0x1XXXX".
+
+Example usage of perf group::
+
+$# perf stat -e "{hisi_pcie0_core0/rx_mwr_latency,port=0xffff/,hisi_pcie0_core0/rx_mwr_cnt,port=0xffff/}"
 
 The current driver does not support sampling. So "perf record" is unsupported.
 Also attach to a task is unsupported for PCIe PMU.
@@ -51,8 +63,12 @@ Filter options
 
 PMU could only monitor the performance of traffic downstream target Root
 Ports or downstream target Endpoint. PCIe PMU driver support "port" and
-"bdf" interfaces for users, and these two interfaces aren't supported at the
-same time.
+"bdf" interfaces for users.
+Please notice that, one of these two interfaces must be set, and these two
+interfaces aren't supported at the same time. If they are both set, only
+"port" filter is valid.
+If "port" filter not being set or is set explicitly to zero (default), the
+"bdf" filter will be in effect, because "bdf=0" meaning 0000:000:00.0.
 
 - port
 
@@ -95,7 +111,7 @@ Filter options
 
 Example usage of perf::
 
-$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,port=0xffff,trig_len=0x4,trig_mode=1/ sleep 5
 
 3. Threshold filter
 
@@ -109,7 +125,7 @@ Filter options
 
 Example usage of perf::
 
-$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,port=0xffff,thr_len=0x4,thr_mode=1/ sleep 5
 
 4. TLP Length filter
 
@@ -127,4 +143,4 @@ Filter options
 
 Example usage of perf::
 
-$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,len_mode=0x1/ sleep 5
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,port=0xffff,len_mode=0x1/ sleep 5
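
Reviewer note: the updated documentation asks for related events to be counted in one event group, each carrying a "port" (or "bdf") filter. The Python sketch below is not part of the commit; it only illustrates how such an event string could be assembled. The helper names `pmu_event` and `event_group` are hypothetical, and filter values are rendered in hex purely as a formatting choice.

```python
def pmu_event(pmu: str, event: str, **filters: int) -> str:
    """Format one perf event term such as 'pmu/event,port=0xffff/'."""
    terms = [event] + [f"{key}={value:#x}" for key, value in filters.items()]
    return f"{pmu}/{','.join(terms)}/"


def event_group(pmu: str, events: list[str], **filters: int) -> str:
    """Wrap related events in one '{...}' group so they start and stop together."""
    return "{" + ",".join(pmu_event(pmu, e, **filters) for e in events) + "}"


# Latency and count events from the same pair, both filtered on port=0xffff.
group = event_group("hisi_pcie0_core0", ["rx_mwr_latency", "rx_mwr_cnt"], port=0xFFFF)
print(group)
```

The resulting string is what would be passed, quoted, to `perf stat -e`, as in the grouped example in the diff above.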

Documentation/admin-guide/perf/index.rst

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Performance monitor support
 imx-ddr
 qcom_l2_pmu
 qcom_l3_pmu
+starfive_starlink_pmu
 arm-ccn
 arm-cmn
 xgene-pmu
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+================================================
+StarFive StarLink Performance Monitor Unit (PMU)
+================================================
+
+StarFive StarLink Performance Monitor Unit (PMU) exists within the
+StarLink Coherent Network on Chip (CNoC) that connects multiple CPU
+clusters with an L3 memory system.
+
+The uncore PMU supports overflow interrupt, up to 16 programmable 64bit
+event counters, and an independent 64bit cycle counter.
+The PMU can only be accessed via Memory Mapped I/O and are common to the
+cores connected to the same PMU.
+
+Driver exposes supported PMU events in sysfs "events" directory under::
+
+/sys/bus/event_source/devices/starfive_starlink_pmu/events/
+
+Driver exposes cpu used to handle PMU events in sysfs "cpumask" directory
+under::
+
+/sys/bus/event_source/devices/starfive_starlink_pmu/cpumask/
+
+Driver describes the format of config (event ID) in sysfs "format" directory
+under::
+
+/sys/bus/event_source/devices/starfive_starlink_pmu/format/
+
+Example of perf usage::
+
+$ perf list
+
+starfive_starlink_pmu/cycles/ [Kernel PMU event]
+starfive_starlink_pmu/read_hit/ [Kernel PMU event]
+starfive_starlink_pmu/read_miss/ [Kernel PMU event]
+starfive_starlink_pmu/read_request/ [Kernel PMU event]
+starfive_starlink_pmu/release_request/ [Kernel PMU event]
+starfive_starlink_pmu/write_hit/ [Kernel PMU event]
+starfive_starlink_pmu/write_miss/ [Kernel PMU event]
+starfive_starlink_pmu/write_request/ [Kernel PMU event]
+starfive_starlink_pmu/writeback/ [Kernel PMU event]
+
+
+$ perf stat -a -e /starfive_starlink_pmu/cycles/ sleep 1
+
+Sampling is not supported. As a result, "perf record" is not supported.
+Attaching to a task is not supported, only system-wide counting is supported.
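
Reviewer note: the new document says the driver publishes its events under the sysfs "events" directory. As a rough illustration only (not part of the commit), the Python sketch below enumerates such a directory. The function name `list_pmu_events` is hypothetical, and the assumption that each entry is a small text file holding a config string follows the general perf sysfs convention rather than anything stated in this diff.

```python
from pathlib import Path

# Path taken from the documentation above; it exists only on hardware with this PMU.
STARLINK_PMU = Path("/sys/bus/event_source/devices/starfive_starlink_pmu")


def list_pmu_events(pmu_dir: Path) -> dict[str, str]:
    """Map each event name under <pmu_dir>/events to the text stored in its file."""
    events_dir = pmu_dir / "events"
    if not events_dir.is_dir():
        # PMU not present (or driver not loaded): report no events.
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(events_dir.iterdir())
        if entry.is_file()
    }
```

On a system with the driver loaded, `list_pmu_events(STARLINK_PMU)` would be expected to return names matching the `perf list` output shown in the document; elsewhere it returns an empty dict.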

Documentation/arch/arm64/elf_hwcaps.rst

Lines changed: 49 additions & 0 deletions
@@ -317,6 +317,55 @@ HWCAP2_LRCPC3
 HWCAP2_LSE128
 Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0011.
 
+HWCAP2_FPMR
+Functionality implied by ID_AA64PFR2_EL1.FMR == 0b0001.
+
+HWCAP2_LUT
+Functionality implied by ID_AA64ISAR2_EL1.LUT == 0b0001.
+
+HWCAP2_FAMINMAX
+Functionality implied by ID_AA64ISAR3_EL1.FAMINMAX == 0b0001.
+
+HWCAP2_F8CVT
+Functionality implied by ID_AA64FPFR0_EL1.F8CVT == 0b1.
+
+HWCAP2_F8FMA
+Functionality implied by ID_AA64FPFR0_EL1.F8FMA == 0b1.
+
+HWCAP2_F8DP4
+Functionality implied by ID_AA64FPFR0_EL1.F8DP4 == 0b1.
+
+HWCAP2_F8DP2
+Functionality implied by ID_AA64FPFR0_EL1.F8DP2 == 0b1.
+
+HWCAP2_F8E4M3
+Functionality implied by ID_AA64FPFR0_EL1.F8E4M3 == 0b1.
+
+HWCAP2_F8E5M2
+Functionality implied by ID_AA64FPFR0_EL1.F8E5M2 == 0b1.
+
+HWCAP2_SME_LUTV2
+Functionality implied by ID_AA64SMFR0_EL1.LUTv2 == 0b1.
+
+HWCAP2_SME_F8F16
+Functionality implied by ID_AA64SMFR0_EL1.F8F16 == 0b1.
+
+HWCAP2_SME_F8F32
+Functionality implied by ID_AA64SMFR0_EL1.F8F32 == 0b1.
+
+HWCAP2_SME_SF8FMA
+Functionality implied by ID_AA64SMFR0_EL1.SF8FMA == 0b1.
+
+HWCAP2_SME_SF8DP4
+Functionality implied by ID_AA64SMFR0_EL1.SF8DP4 == 0b1.
+
+HWCAP2_SME_SF8DP2
+Functionality implied by ID_AA64SMFR0_EL1.SF8DP2 == 0b1.
+
+HWCAP2_SME_SF8DP4
+Functionality implied by ID_AA64SMFR0_EL1.SF8DP4 == 0b1.
+
+
 4. Unused AT_HWCAP bits
 -----------------------
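
Reviewer note: the new HWCAP2_* entries are reported to userspace through the AT_HWCAP2 entry of the auxiliary vector. The Python sketch below, which is not part of the commit, shows one way to test a single bit in a raw 64-bit auxiliary vector. The function names are hypothetical, and the bit number used in the demonstration is a placeholder on synthetic data; the real bit for any given HWCAP2_* flag has to be taken from the kernel's uapi `asm/hwcap.h`.

```python
import struct

AT_NULL = 0     # terminates the auxiliary vector
AT_HWCAP2 = 26  # auxv key carrying the HWCAP2_* bits on Linux


def hwcap2_from_auxv(auxv: bytes) -> int:
    """Return the AT_HWCAP2 value from a raw 64-bit auxiliary vector, or 0."""
    for offset in range(0, len(auxv) - 15, 16):
        key, value = struct.unpack_from("=QQ", auxv, offset)
        if key == AT_NULL:
            break
        if key == AT_HWCAP2:
            return value
    return 0


def has_hwcap2(auxv: bytes, bit: int) -> bool:
    """True if the given bit number is set in AT_HWCAP2."""
    return bool(hwcap2_from_auxv(auxv) & (1 << bit))


# Synthetic vector for illustration: AT_HWCAP2 with bit 3 set, then AT_NULL.
sample = struct.pack("=QQQQ", AT_HWCAP2, 1 << 3, AT_NULL, 0)
print(has_hwcap2(sample, 3))  # True
print(has_hwcap2(sample, 4))  # False
```

On a running arm64 system the same functions could be fed the bytes read from `/proc/self/auxv`.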

Documentation/arch/arm64/silicon-errata.rst

Lines changed: 3 additions & 2 deletions
@@ -35,8 +35,9 @@ can be triggered by Linux).
 For software workarounds that may adversely impact systems unaffected by
 the erratum in question, a Kconfig entry is added under "Kernel
 Features" -> "ARM errata workarounds via the alternatives framework".
-These are enabled by default and patched in at runtime when an affected
-CPU is detected. For less-intrusive workarounds, a Kconfig option is not
+With the exception of workarounds for errata deemed "rare" by Arm, these
+are enabled by default and patched in at runtime when an affected CPU is
+detected. For less-intrusive workarounds, a Kconfig option is not
 available and the code is structured (preferably with a comment) in such
 a way that the erratum will not be hit.

Documentation/arch/arm64/sme.rst

Lines changed: 5 additions & 6 deletions
@@ -75,7 +75,7 @@ model features for SME is included in Appendix A.
 2. Vector lengths
 ------------------
 
-SME defines a second vector length similar to the SVE vector length which is
+SME defines a second vector length similar to the SVE vector length which
 controls the size of the streaming mode SVE vectors and the ZA matrix array.
 The ZA matrix is square with each side having as many bytes as a streaming
 mode SVE vector.
@@ -238,12 +238,12 @@ prctl(PR_SME_SET_VL, unsigned long arg)
 bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
 unspecified, including both streaming and non-streaming SVE state.
 Calling PR_SME_SET_VL with vl equal to the thread's current vector
-length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
+length, or calling PR_SME_SET_VL with the PR_SME_SET_VL_ONEXEC flag,
 does not constitute a change to the vector length for this purpose.
 
 * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
 Calling PR_SME_SET_VL with vl equal to the thread's current vector
-length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
+length, or calling PR_SME_SET_VL with the PR_SME_SET_VL_ONEXEC flag,
 does not constitute a change to the vector length for this purpose.
 
 
@@ -379,9 +379,8 @@ The regset data starts with struct user_za_header, containing:
 /proc/sys/abi/sme_default_vector_length
 
 Writing the text representation of an integer to this file sets the system
-default vector length to the specified value, unless the value is greater
-than the maximum vector length supported by the system in which case the
-default vector length is set to that maximum.
+default vector length to the specified value rounded to a supported value
+using the same rules as for setting vector length via PR_SME_SET_VL.
 
 The result can be determined by reopening the file and reading its
 contents.

Documentation/arch/arm64/sve.rst

Lines changed: 2 additions & 8 deletions
@@ -117,11 +117,6 @@ the SVE instruction set architecture.
 * The SVE registers are not used to pass arguments to or receive results from
 any syscall.
 
-* In practice the affected registers/bits will be preserved or will be replaced
-with zeros on return from a syscall, but userspace should not make
-assumptions about this. The kernel behaviour may vary on a case-by-case
-basis.
-
 * All other SVE state of a thread, including the currently configured vector
 length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
 length (if any), is preserved across all syscalls, subject to the specific
@@ -428,9 +423,8 @@ The regset data starts with struct user_sve_header, containing:
 /proc/sys/abi/sve_default_vector_length
 
 Writing the text representation of an integer to this file sets the system
-default vector length to the specified value, unless the value is greater
-than the maximum vector length supported by the system in which case the
-default vector length is set to that maximum.
+default vector length to the specified value rounded to a supported value
+using the same rules as for setting vector length via PR_SVE_SET_VL.
 
 The result can be determined by reopening the file and reading its
 contents.
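
Reviewer note: both the SME and SVE documents now say that the written value is rounded to a supported vector length and that the effective value is found by reopening the file and reading it back. The Python sketch below (not part of the commit) illustrates that write-then-reread pattern. The function name `set_default_vl` is hypothetical, and treating the value as a length in bytes is an assumption carried over from the prctl() interface rather than something this diff states.

```python
from pathlib import Path

# Sysctl paths as named in the SVE and SME documents.
SVE_DEFAULT_VL = Path("/proc/sys/abi/sve_default_vector_length")
SME_DEFAULT_VL = Path("/proc/sys/abi/sme_default_vector_length")


def set_default_vl(sysctl: Path, requested: int) -> int:
    """Write the requested default vector length and return the value in effect."""
    sysctl.write_text(f"{requested}\n")
    # Reopen and read back: the kernel may have rounded the request
    # to a supported vector length.
    return int(sysctl.read_text().strip())
```

A caller with sufficient privileges on an SVE-capable system might use `set_default_vl(SVE_DEFAULT_VL, 32)` and compare the return value with the request to see whether rounding occurred.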
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/arm,coresight-pmu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Coresight Performance Monitoring Unit Architecture
+
+maintainers:
+  - Robin Murphy <[email protected]>
+
+properties:
+  compatible:
+    const: arm,coresight-pmu
+
+  reg:
+    items:
+      - description: Register page 0
+      - description: Register page 1, if the PMU implements the dual-page extension
+    minItems: 1
+
+  interrupts:
+    items:
+      - description: Overflow interrupt
+
+  cpus:
+    description: If the PMU is associated with a particular CPU or subset of CPUs,
+      array of phandles to the appropriate CPU node(s)
+
+  reg-io-width:
+    description: Granularity at which PMU register accesses are single-copy atomic
+    default: 4
+    enum: [4, 8]
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/starfive,jh8100-starlink-pmu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: StarFive JH8100 StarLink PMU
+
+maintainers:
+  - Ji Sheng Teoh <[email protected]>
+
+description:
+  StarFive's JH8100 StarLink PMU integrates one or more CPU cores with a
+  shared L3 memory system. The PMU support overflow interrupt, up to
+  16 programmable 64bit event counters, and an independent 64bit cycle
+  counter. StarFive's JH8100 StarLink PMU is accessed via MMIO.
+
+properties:
+  compatible:
+    const: starfive,jh8100-starlink-pmu
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    soc {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        pmu@12900000 {
+            compatible = "starfive,jh8100-starlink-pmu";
+            reg = <0x0 0x12900000 0x0 0x10000>;
+            interrupts = <34>;
+        };
+    };

Documentation/rust/arch-support.rst

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ support corresponds to ``S`` values in the ``MAINTAINERS`` file.
 ============= ================ ==============================================
 Architecture  Level of support Constraints
 ============= ================ ==============================================
+``arm64``     Maintained       Little Endian only.
 ``loongarch`` Maintained       -
 ``um``        Maintained       ``x86_64`` only.
 ``x86``       Maintained       ``x86_64`` only.
