Commit 47af12b

Aili Yao authored and torvalds committed
mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
When memory_failure() is called with MF_ACTION_REQUIRED on a page that has already been hwpoisoned, memory_failure() could fail to send SIGBUS to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 if it is called for an already hwpoisoned page, so the caller, kill_me_maybe(), could return without sending SIGBUS to the current process. An action-required MCE is raised when the current process accesses the broken memory, so no SIGBUS means that the current process continues to run and soon accesses the error page again, running into an MCE loop.

This issue can arise, for example, in the following scenarios:

- Two or more threads access the poisoned page concurrently. If local MCE is enabled, the MCE handler handles each MCE event independently. So there is a race among MCE events, and the second or later threads fall into the situation in question.

- If there was a preceding memory error event and memory_failure() for that event failed to unmap the error page for some reason, the subsequent memory access to the error page triggers the MCE loop situation.

To fix the issue, make memory_failure() return an error code when the error page has already been hwpoisoned. This allows the memory error handler to control how it sends signals to userspace. Also make sure that any process touching a hwpoisoned page gets a SIGBUS even in the "already hwpoisoned" path of memory_failure(), as is done in the page fault path.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aili Yao <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent 171936d commit 47af12b

1 file changed: +2 -1 lines changed

mm/memory-failure.c

Lines changed: 2 additions & 1 deletion
@@ -1253,7 +1253,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1461,6 +1461,7 @@ int memory_failure(unsigned long pfn, int flags)
 		if (TestSetPageHWPoison(p)) {
 			pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			       pfn);
+			res = -EHWPOISON;
 			goto unlock_mutex;
 		}
 
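The effect of the new return value can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct model_page`, `model_test_set_hwpoison()`, `model_memory_failure()` and `model_caller_should_sigbus()` are hypothetical names, the success path is assumed to always return 0, and the caller policy (signal on any non-zero return) is a simplification of what kill_me_maybe() does.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux errno value; fallback so the sketch builds elsewhere */
#endif

/* Hypothetical stand-in for struct page: only the poison flag is modelled. */
struct model_page {
	bool hwpoison;
};

/* Models TestSetPageHWPoison(): set the flag and return its previous value. */
static bool model_test_set_hwpoison(struct model_page *p)
{
	bool old = p->hwpoison;

	p->hwpoison = true;
	return old;
}

/*
 * Models the patched behaviour: a page that was already poisoned is
 * reported with -EHWPOISON instead of 0. The first-time path is assumed
 * to succeed, which the real function does not guarantee.
 */
static int model_memory_failure(struct model_page *p)
{
	if (model_test_set_hwpoison(p))
		return -EHWPOISON;
	return 0;
}

/*
 * Simplified caller policy (an assumption): any non-zero result means the
 * caller must not return quietly and should deliver SIGBUS.
 */
static bool model_caller_should_sigbus(int ret)
{
	return ret != 0;
}
```

In this model the first call on a page returns 0 and a second call on the same page returns -EHWPOISON, so the second caller no longer mistakes "already poisoned" for "handled" and takes the SIGBUS path.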