-
Notifications
You must be signed in to change notification settings - Fork 10
[SIG CLOUD 8] rebase custom changes to 4.18.0-553.40.1.el8_10 #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIG CLOUD 8] rebase custom changes to 4.18.0-553.40.1.el8_10 #163
Conversation
jira LE-1482 commit 415d832 upstream-diff - Conflicts around use of arch_atomic vs atomic - there are no arch_atomic* ops in Rocky 8.x These operations are documented as always ordered in include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer type use cases where one side needs to ensure a flag is left pending after some shared data was updated rely on this ordering, even in the failure case. This is the case with the workqueue code, which currently suffers from a reproducible ordering violation on Apple M1 platforms (which are notoriously out-of-order) that ends up causing the TTY layer to fail to deliver data to userspace properly under the right conditions. This change fixes that bug. Change the documentation to restrict the "no order on failure" story to the _lock() variant (for which it makes sense), and remove the early-exit from the generic implementation, which is what causes the missing barrier semantics in that case. Without this, the remaining atomic op is fully ordered (including on ARM64 LSE, as of recent versions of the architecture spec). Suggested-by: Linus Torvalds <[email protected]> Cc: [email protected] Fixes: e986a0d ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs") Fixes: 61e0239 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()") Signed-off-by: Hector Martin <[email protected]> Acked-by: Will Deacon <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> (cherry picked from commit 415d832) Signed-off-by: Greg Rose <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
…tead of a two-phase approach jira roc-2673 commit fbf6449 Instead of setting x86_virt_bits to a possibly-correct value and then correcting it later, do all the necessary checks before setting it. At this point, the #VC handler references boot_cpu_data.x86_virt_bits, and in the previous version, it would be triggered by the CPUIDs between the point at which it is set to 48 and when it is set to the correct value. Suggested-by: Dave Hansen <[email protected]> Signed-off-by: Adam Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Jacob Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira roc-2673 commit 3e32552 c->x86_cache_alignment is initialized from c->x86_clflush_size. However, commit fbf6449 moved c->x86_clflush_size initialization to later in boot without moving the c->x86_cache_alignment assignment: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") This presumably left c->x86_cache_alignment set to zero for longer than it should be. The result was an oops on 32-bit kernels while accessing a pointer at 0x20. The 0x20 came from accessing a structure member at offset 0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10. kmalloc() can evidently return ZERO_SIZE_PTR when it's given 0 as its alignment requirement. Move the c->x86_cache_alignment initialization to be after c->x86_clflush_size has an actual value. Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 3e32552) Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
In sig-cloud-8/4.18.0-553.36.1.el8_10 we added the three commits below because they were fixes to fbf6449:
Any reason not to have them here? |
Because I based it on 22.1 since for my tree So I need to go fix that since, I'll be back and restarting reviews. |
Actually I'm about to push a new update to rocky9.5 we'll just go with that. |
During our test, it is found that a warning can be trigger in try_grab_folio as follow: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1678 at mm/gup.c:147 try_grab_folio+0x106/0x130 Modules linked in: CPU: 0 UID: 0 PID: 1678 Comm: syz.3.31 Not tainted 6.15.0-rc5 #163 PREEMPT(undef) RIP: 0010:try_grab_folio+0x106/0x130 Call Trace: <TASK> follow_huge_pmd+0x240/0x8e0 follow_pmd_mask.constprop.0.isra.0+0x40b/0x5c0 follow_pud_mask.constprop.0.isra.0+0x14a/0x170 follow_page_mask+0x1c2/0x1f0 __get_user_pages+0x176/0x950 __gup_longterm_locked+0x15b/0x1060 ? gup_fast+0x120/0x1f0 gup_fast_fallback+0x17e/0x230 get_user_pages_fast+0x5f/0x80 vmci_host_unlocked_ioctl+0x21c/0xf80 RIP: 0033:0x54d2cd ---[ end trace 0000000000000000 ]--- Digging into the source, context->notify_page may init by get_user_pages_fast and can be seen in vmci_ctx_unset_notify which will try to put_page. However get_user_pages_fast is not finished here and lead to following try_grab_folio warning. The race condition is shown as follow: cpu0 cpu1 vmci_host_do_set_notify vmci_host_setup_notify get_user_pages_fast(uva, 1, FOLL_WRITE, &context->notify_page); lockless_pages_from_mm gup_pgd_range gup_huge_pmd // update &context->notify_page vmci_host_do_set_notify vmci_ctx_unset_notify notify_page = context->notify_page; if (notify_page) put_page(notify_page); // page is freed __gup_longterm_locked __get_user_pages follow_trans_huge_pmd try_grab_folio // warn here To slove this, use local variable page to make notify_page can be seen after finish get_user_pages_fast. Fixes: a1d8843 ("VMCI: Fix two UVA mapping bugs") Cc: stable <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Wupeng Ma <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 1bd6406 upstream. During our test, it is found that a warning can be trigger in try_grab_folio as follow: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1678 at mm/gup.c:147 try_grab_folio+0x106/0x130 Modules linked in: CPU: 0 UID: 0 PID: 1678 Comm: syz.3.31 Not tainted 6.15.0-rc5 #163 PREEMPT(undef) RIP: 0010:try_grab_folio+0x106/0x130 Call Trace: <TASK> follow_huge_pmd+0x240/0x8e0 follow_pmd_mask.constprop.0.isra.0+0x40b/0x5c0 follow_pud_mask.constprop.0.isra.0+0x14a/0x170 follow_page_mask+0x1c2/0x1f0 __get_user_pages+0x176/0x950 __gup_longterm_locked+0x15b/0x1060 ? gup_fast+0x120/0x1f0 gup_fast_fallback+0x17e/0x230 get_user_pages_fast+0x5f/0x80 vmci_host_unlocked_ioctl+0x21c/0xf80 RIP: 0033:0x54d2cd ---[ end trace 0000000000000000 ]--- Digging into the source, context->notify_page may init by get_user_pages_fast and can be seen in vmci_ctx_unset_notify which will try to put_page. However get_user_pages_fast is not finished here and lead to following try_grab_folio warning. The race condition is shown as follow: cpu0 cpu1 vmci_host_do_set_notify vmci_host_setup_notify get_user_pages_fast(uva, 1, FOLL_WRITE, &context->notify_page); lockless_pages_from_mm gup_pgd_range gup_huge_pmd // update &context->notify_page vmci_host_do_set_notify vmci_ctx_unset_notify notify_page = context->notify_page; if (notify_page) put_page(notify_page); // page is freed __gup_longterm_locked __get_user_pages follow_trans_huge_pmd try_grab_folio // warn here To slove this, use local variable page to make notify_page can be seen after finish get_user_pages_fast. Fixes: a1d8843 ("VMCI: Fix two UVA mapping bugs") Cc: stable <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Wupeng Ma <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Update process (This kernel CentOS base for
4.18.0-553
)src.rpm
s hosted by RESFsig-cloud-8/4.18.0-553.40.1.el8_10
branchel
release.Removed Patches
Build
Login
Testing
Just ran a basic two pass
