Commit bfed6ef

Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Borislav Petkov:

 - Add support for handling hw errors in SGX pages: poisoning,
   recovering from poison memory and error injection into SGX pages

 - A bunch of changes to the SGX selftests to simplify and allow of SGX
   features testing without the need of a whole SGX software stack

 - Add a sysfs attribute which is supposed to show the amount of SGX
   memory in a NUMA node, similar to what /proc/meminfo is to normal
   memory

 - The usual bunch of fixes and cleanups too

* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/sgx: Fix NULL pointer dereference on non-SGX systems
  selftests/sgx: Fix corrupted cpuid macro invocation
  x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
  x86/sgx: Fix minor documentation issues
  selftests/sgx: Add test for multiple TCS entry
  selftests/sgx: Enable multiple thread support
  selftests/sgx: Add page permission and exception test
  selftests/sgx: Rename test properties in preparation for more enclave tests
  selftests/sgx: Provide per-op parameter structs for the test enclave
  selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
  selftests/sgx: Move setup_test_encl() to each TEST_F()
  selftests/sgx: Encpsulate the test enclave creation
  selftests/sgx: Dump segments and /proc/self/maps only on failure
  selftests/sgx: Create a heap for the test enclave
  selftests/sgx: Make data measurement for an enclave segment optional
  selftests/sgx: Assign source for each segment
  selftests/sgx: Fix a benign linker warning
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Hook arch_memory_failure() into mainline code
  ...
2 parents d3c20bf + 2056e29 commit bfed6ef

23 files changed

+698
-103
lines changed

Documentation/ABI/stable/sysfs-devices-node

Lines changed: 6 additions & 0 deletions
@@ -176,3 +176,9 @@ Contact: Keith Busch <[email protected]>
 Description:
 		The cache write policy: 0 for write-back, 1 for write-through,
 		other or unknown.
+
+What:		/sys/devices/system/node/nodeX/x86/sgx_total_bytes
+Date:		November 2021
+Contact:	Jarkko Sakkinen <[email protected]>
+Description:
+		The total amount of SGX physical memory in bytes.
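
Reviewer note: the new attribute is a plain decimal byte count, so user space can read it like any other sysfs value. The following is a minimal user-space sketch, not part of this commit. The helper names `parse_sysfs_u64()` and `read_sgx_total_bytes()` are hypothetical; only the sysfs path comes from the ABI entry above.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the decimal text a sysfs attribute returns, e.g. "98304\n".
 * Returns 0 on success, -1 on malformed input.
 * (Hypothetical helper, for illustration only.)
 */
int parse_sysfs_u64(const char *text, unsigned long long *out)
{
	char *end;

	errno = 0;
	*out = strtoull(text, &end, 10);
	if (errno || end == text)
		return -1;
	/* The value is expected to end in a newline; tolerate its absence. */
	if (*end != '\n' && *end != '\0')
		return -1;
	return 0;
}

/*
 * Read sgx_total_bytes for one NUMA node.
 * Returns -1 if the file is missing (no SGX, or a kernel without
 * this attribute) or cannot be parsed.
 */
int read_sgx_total_bytes(int nid, unsigned long long *out)
{
	char path[128], buf[64];
	FILE *f;
	int ret;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/x86/sgx_total_bytes", nid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ret = fgets(buf, sizeof(buf), f) ? parse_sysfs_u64(buf, out) : -1;
	fclose(f);
	return ret;
}
```

A caller would loop over the node IDs it cares about and treat a -1 return as "no SGX memory reported for this node".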

Documentation/firmware-guide/acpi/apei/einj.rst

Lines changed: 19 additions & 0 deletions
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will setup
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
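
Reviewer note: steps 4 to 7 of the SGX injection sequence (store, CLFLUSH, spin delay, read back) can be sketched in C as below. This is an illustration under stated assumptions, not code from this commit: it operates on ordinary memory, the function names are hypothetical, and real enclave code cannot make system calls, so the `clock_gettime()`-based delay would need an enclave-safe replacement such as a calibrated busy loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() */
#endif
#include <stdint.h>
#include <time.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_clflush(), _mm_mfence() */
#define flush_line(p)	do { _mm_clflush(p); _mm_mfence(); } while (0)
#else
/* Placeholder on non-x86 builds: no cache-line flush is performed. */
#define flush_line(p)	((void)(p))
#endif

/* Busy-wait for roughly @ms milliseconds (step 6 asks for a spin delay). */
static void spin_delay_ms(long ms)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000L +
		 (now.tv_nsec - start.tv_nsec) / 1000000L < ms);
}

/*
 * Hypothetical helper mirroring steps 4-7: write the target line, push it
 * out to memory so the armed injection can take effect, wait, then read
 * it back. On a real armed target the final read is what consumes the
 * injected error.
 */
uint64_t store_flush_wait_read(volatile uint64_t *addr, uint64_t value,
			       long delay_ms)
{
	*addr = value;				/* step 4: store */
	flush_line((const void *)addr);		/* step 5: CLFLUSH */
	spin_delay_ms(delay_ms);		/* step 6: spin delay */
	return *addr;				/* step 7: read */
}
```

With the delay from the text, the call would be `store_flush_wait_read(addr, value, 250)`.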

Documentation/x86/sgx.rst

Lines changed: 7 additions & 7 deletions
@@ -10,7 +10,7 @@ Overview
 Software Guard eXtensions (SGX) hardware enables for user space applications
 to set aside private memory regions of code and data:
 
-* Privileged (ring-0) ENCLS functions orchestrate the construction of the.
+* Privileged (ring-0) ENCLS functions orchestrate the construction of the
   regions.
 * Unprivileged (ring-3) ENCLU functions allow an application to enter and
   execute inside the regions.
@@ -91,7 +91,7 @@ In addition to the traditional compiler and linker build process, SGX has a
 separate enclave “build” process. Enclaves must be built before they can be
 executed (entered). The first step in building an enclave is opening the
 **/dev/sgx_enclave** device. Since enclave memory is protected from direct
-access, special privileged instructions are Then used to copy data into enclave
+access, special privileged instructions are then used to copy data into enclave
 pages and establish enclave page permissions.
 
 .. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
@@ -126,13 +126,13 @@ the need to juggle signal handlers.
 ksgxd
 =====
 
-SGX support includes a kernel thread called *ksgxwapd*.
+SGX support includes a kernel thread called *ksgxd*.
 
 EPC sanitization
 ----------------
 
 ksgxd is started when SGX initializes. Enclave memory is typically ready
-For use when the processor powers on or resets. However, if SGX has been in
+for use when the processor powers on or resets. However, if SGX has been in
 use since the reset, enclave pages may be in an inconsistent state. This might
 occur after a crash and kexec() cycle, for instance. At boot, ksgxd
 reinitializes all enclave pages so that they can be allocated and re-used.
@@ -147,7 +147,7 @@ Page reclaimer
 
 Similar to the core kswapd, ksgxd, is responsible for managing the
 overcommitment of enclave memory. If the system runs out of enclave memory,
-*ksgxwapd* “swaps” enclave memory to normal memory.
+*ksgxd* “swaps” enclave memory to normal memory.
 
 Launch Control
 ==============
@@ -156,7 +156,7 @@ SGX provides a launch control mechanism. After all enclave pages have been
 copied, kernel executes EINIT function, which initializes the enclave. Only after
 this the CPU can execute inside the enclave.
 
-ENIT function takes an RSA-3072 signature of the enclave measurement. The function
+EINIT function takes an RSA-3072 signature of the enclave measurement. The function
 checks that the measurement is correct and signature is signed with the key
 hashed to the four **IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs representing the
 SHA256 of a public key.
@@ -184,7 +184,7 @@ CPUs starting from Icelake use Total Memory Encryption (TME) in the place of
 MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
 means integrity and replay-attacks are not mitigated. B, it includes
 additional changes to prevent cipher text from being returned and SW memory
-aliases from being Created.
+aliases from being created.
 
 DMA to enclave memory is blocked by range registers on both MEE and TME systems
 (SDM section 41.10).

arch/Kconfig

Lines changed: 4 additions & 0 deletions
@@ -1312,6 +1312,10 @@ config ARCH_HAS_PARANOID_L1D_FLUSH
 config DYNAMIC_SIGFRAME
 	bool
 
+# Select, if arch has a named attribute group bound to NUMA device nodes.
+config HAVE_ARCH_NODE_DEV_GROUP
+	bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"

arch/x86/Kconfig

Lines changed: 2 additions & 0 deletions
@@ -269,6 +269,7 @@ config X86
 	select HAVE_ARCH_KCSAN if X86_64
 	select X86_FEATURE_NAMES if PROC_FS
 	select PROC_PID_ARCH_STATUS if PROC_FS
+	select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
 
 config INSTRUCTION_DECODER
@@ -1921,6 +1922,7 @@ config X86_SGX
 	select SRCU
 	select MMU_NOTIFIER
 	select NUMA_KEEP_MEMINFO if NUMA
+	select XARRAY_MULTI
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code

arch/x86/include/asm/processor.h

Lines changed: 8 additions & 0 deletions
@@ -855,4 +855,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
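
Reviewer note: the `#define arch_memory_failure arch_memory_failure` lines use the common kernel idiom in which an architecture header defines a macro with the same name as its function, so that generic code can supply a fallback under `#ifndef`. The sketch below shows the idiom in a single user-space file. The generic fallback bodies and the toy address range are assumptions made for illustration; the fallback code itself is not shown in the hunks on this page.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* ---- "arch header" side: overrides arch_is_platform_page() only ---- */

/* Hypothetical toy range standing in for platform-owned memory. */
#define TOY_PLATFORM_BASE	0x100000ULL
#define TOY_PLATFORM_SIZE	0x4000ULL

static inline bool arch_is_platform_page(uint64_t paddr)
{
	return paddr >= TOY_PLATFORM_BASE &&
	       paddr - TOY_PLATFORM_BASE < TOY_PLATFORM_SIZE;
}
/* Self-referential macro: marks the name as "provided by the arch". */
#define arch_is_platform_page arch_is_platform_page

/* ---- "generic header" side: fallbacks for anything not provided ---- */

#ifndef arch_is_platform_page
static inline bool arch_is_platform_page(uint64_t paddr)
{
	(void)paddr;
	return false;		/* assumed fallback: no platform pages */
}
#endif

#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;		/* assumed fallback: address not handled */
}
#endif
```

Here the toy "arch" version of `arch_is_platform_page()` is used, while `arch_memory_failure()` falls through to the generic stub because no macro of that name was defined.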

arch/x86/include/asm/set_memory.h

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -99,6 +100,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 * set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);

arch/x86/kernel/cpu/sgx/main.c

Lines changed: 161 additions & 1 deletion
@@ -6,11 +6,13 @@
 #include <linux/highmem.h>
 #include <linux/kthread.h>
 #include <linux/miscdevice.h>
+#include <linux/node.h>
 #include <linux/pagemap.h>
 #include <linux/ratelimit.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include <linux/sysfs.h>
 #include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
@@ -20,6 +22,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -60,6 +63,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		/*
+		 * Checking page->poison without holding the node->lock
+		 * is racy, but losing the race (i.e. poison is set just
+		 * after the check) just means __eremove() will be uselessly
+		 * called for a page that sgx_free_epc_page() will put onto
+		 * the node->sgx_poison_page_list later.
+		 */
+		if (page->poison) {
+			struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+			struct sgx_numa_node *node = section->node;
+
+			spin_lock(&node->lock);
+			list_move(&page->list, &node->sgx_poison_page_list);
+			spin_unlock(&node->lock);
+
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -471,6 +492,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 	atomic_long_dec(&sgx_nr_free_pages);
@@ -624,7 +646,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &node->sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
+	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
 	atomic_long_inc(&sgx_nr_free_pages);
@@ -648,17 +675,102 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously. Send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If the page is on a free list, move it to the per-node
+	 * poison page list.
+	 */
+	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+		list_move(&page->list, &node->sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
@@ -670,6 +782,48 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
 	       ((high & GENMASK_ULL(19, 0)) << 32);
 }
 
+#ifdef CONFIG_NUMA
+static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%lu\n", sgx_numa_nodes[dev->id].size);
+}
+static DEVICE_ATTR_RO(sgx_total_bytes);
+
+static umode_t arch_node_attr_is_visible(struct kobject *kobj,
+		struct attribute *attr, int idx)
+{
+	/* Make all x86/ attributes invisible when SGX is not initialized: */
+	if (nodes_empty(sgx_numa_mask))
+		return 0;
+
+	return attr->mode;
+}
+
+static struct attribute *arch_node_dev_attrs[] = {
+	&dev_attr_sgx_total_bytes.attr,
+	NULL,
+};
+
+const struct attribute_group arch_node_dev_group = {
+	.name = "x86",
+	.attrs = arch_node_dev_attrs,
+	.is_visible = arch_node_attr_is_visible,
+};
+
+static void __init arch_update_sysfs_visibility(int nid)
+{
+	struct node *node = node_devices[nid];
+	int ret;
+
+	ret = sysfs_update_group(&node->dev.kobj, &arch_node_dev_group);
+
+	if (ret)
+		pr_err("sysfs update failed (%d), files may be invisible", ret);
+}
+#else /* !CONFIG_NUMA */
+static void __init arch_update_sysfs_visibility(int nid) {}
+#endif
+
 static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
@@ -713,10 +867,16 @@ static bool __init sgx_page_cache_init(void)
 		if (!node_isset(nid, sgx_numa_mask)) {
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
+			sgx_numa_nodes[nid].size = 0;
+
+			/* Make SGX-specific node sysfs files visible: */
+			arch_update_sysfs_visibility(nid);
 		}
 
 		sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
+		sgx_numa_nodes[nid].size += size;
 
 		sgx_nr_epc_sections++;
 	}
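
Reviewer note: the core of the poison handling in main.c is a physical-address-to-page lookup followed by a small state update. The user-space model below sketches that flow under stated assumptions. A linear scan over a section array stands in for the XArray range lookup, an `on_poison_list` field stands in for the list move, locking and SIGBUS delivery are omitted, and every `toy_` / `TOY_` name is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12
#define TOY_PAGE_IS_FREE	0x1	/* stand-in for SGX_EPC_PAGE_IS_FREE */

struct toy_page {
	unsigned int flags;
	unsigned int poison;
	unsigned int on_poison_list;	/* models the list_move() target */
};

struct toy_section {
	uint64_t phys_addr;		/* start of the section */
	uint64_t size;			/* length in bytes */
	struct toy_page *pages;		/* one entry per 4 KiB page */
};

/* Map a physical address to its page descriptor, or NULL if unowned. */
struct toy_page *toy_paddr_to_page(struct toy_section *secs, size_t nr,
				   uint64_t paddr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		struct toy_section *s = &secs[i];

		if (paddr >= s->phys_addr && paddr - s->phys_addr < s->size)
			return &s->pages[(paddr - s->phys_addr) >> TOY_PAGE_SHIFT];
	}
	return NULL;
}

/*
 * Model of the poison bookkeeping: unknown addresses are rejected,
 * already-poisoned pages are left alone, and a page that is currently
 * free is moved to the poison list right away. A page that is in use
 * is only marked; it reaches the poison list when it is freed.
 */
int toy_memory_failure(struct toy_section *secs, size_t nr, unsigned long pfn)
{
	struct toy_page *page;

	page = toy_paddr_to_page(secs, nr, (uint64_t)pfn << TOY_PAGE_SHIFT);
	if (!page)
		return -ENXIO;
	if (page->poison)
		return 0;

	page->poison = 1;
	if (page->flags & TOY_PAGE_IS_FREE)
		page->on_poison_list = 1;
	return 0;
}
```

The linear scan keeps the model short; the commit itself stores each section over its whole address range in an XArray so that a single `xa_load()` answers the same question.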
