CloudHv: bump PcdCpuMaxLogicalProcessorNumber to 256 #6
Closed
tpressure wants to merge 1 commit into cloud-hypervisor:ch from
Conversation
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de> On-behalf-of: SAP thomas.prescher@sap.com
tpressure (Author):
@rbradford is this the right place for this change, or should I rather propose this in upstream EDK2?
Member:
@tpressure Yes, you should send it upstream - this repository only exists to produce binary assets for the main CH CI.
mergify bot pushed a commit to tpressure/edk2 that referenced this pull request on Jul 24, 2025:
…eEntry

This patch introduces a synchronization point between the BSP and APs to ensure all APs have entered their SMM wait loop (while (TRUE) in APHandler()) before the BSP calls into the SMI handler logic via gSmmCpuPrivate->SmmCoreEntry().

Previously, the BSP would invoke ReleaseAllAPs() and immediately proceed to SmmCoreEntry() without confirming whether the APs had reached the stable waiting state. If SmmStartupThisAp() was called inside the SMI handler shortly after ReleaseAllAPs(), this could lead to a race condition:

- APs are issued two consecutive wait signals (SmmCpuSyncWaitForBsp()).
- The BSP sends two consecutive releases (ReleaseAllAPs() + SmmStartupThisAp()).

If an AP has not yet responded to the first release, the second release may overwrite the semaphore state, and the AP might miss the notification, causing it to hang or behave unpredictably.

To address this:

- A SmmCpuSyncWaitForAPs() call is added in the BSP after SmmCpuPlatformHookBeforeMmiHandler() and before entering SmmCoreEntry().
- A matching SmmCpuSyncReleaseBsp() call is added in each AP immediately after its own SmmCpuPlatformHookBeforeMmiHandler().

This ensures that the BSP does not enter SMI handler logic or dispatch any AP-related requests before all APs are confirmed to be idle and ready. Debug sync point markers (e.g., /// cloud-hypervisor#6, tianocore#7) are updated accordingly.

This change eliminates a subtle but critical race condition in multi-processor/multi-socket systems during SMM entry and improves overall synchronization safety.

Signed-off-by: Wei6 Xu <wei6.xu@intel.com>
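A minimal C-style sketch of the ordering described in that commit message, assuming simplified zero-argument stand-ins for the real SmmCpuSyncLib and platform-hook functions (the actual PiSmmCpuDxeSmm prototypes take sync-context and CPU-index parameters not shown here):

```c
//
// Sketch of the BSP/AP rendezvous described above. The function names
// come from the commit message; the zero-argument prototypes below are
// simplified stand-ins, not the real EDK2 signatures.
//
typedef void VOID;   // stand-in for the EDK2 base type

extern VOID ReleaseAllAPs (void);
extern VOID SmmCpuPlatformHookBeforeMmiHandler (void);
extern VOID SmmCpuSyncWaitForAPs (void);
extern VOID SmmCpuSyncReleaseBsp (void);
extern VOID SmmCoreEntry (void);
extern VOID WaitForBspRequests (void);   // models the while (TRUE) loop in APHandler ()

VOID
BspFlow (void)
{
  ReleaseAllAPs ();                       // wake the APs
  SmmCpuPlatformHookBeforeMmiHandler ();  // BSP-side platform hook

  //
  // New sync point: do not proceed until every AP has signalled that it
  // is parked in its wait loop, so a later SmmStartupThisAp () cannot
  // race with the initial release.
  //
  SmmCpuSyncWaitForAPs ();

  SmmCoreEntry ();                        // now safe to run SMI handlers
}

VOID
ApFlow (void)
{
  SmmCpuPlatformHookBeforeMmiHandler ();  // AP-side platform hook

  //
  // Matching release: tell the BSP this AP has reached its stable wait
  // state before it starts spinning for SmmStartupThisAp () requests.
  //
  SmmCpuSyncReleaseBsp ();

  WaitForBspRequests ();                  // wait loop in APHandler ()
}
```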
Problem
When using CLOUDHV.fd on cloud-hypervisor, only 64 vCPUs in total can be used. This is because cloud-hypervisor does not (yet) implement a fw_cfg device that reports the maximum number of logical processors, and the default cloud-hypervisor configuration sets this value to 64. The firmware therefore only allocates stacks for 64 CPUs, and when more than 64 vCPUs are in use, random vCPUs crash due to stack collisions.
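As a hedged illustration only (the real stack setup lives in EDK2's MP init code and is more involved), the sketch below models why a build-time cap of 64 CPUs leads to collisions once a 65th vCPU comes up: each CPU's stack is an offset into one reserved region whose size is derived from the cap.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Simplified model of the failure mode, not the actual EDK2 MP init code:
 * the firmware reserves one stack slot per CPU it was built to support
 * (PcdCpuMaxLogicalProcessorNumber), so a CPU index beyond that limit
 * ends up on memory that was never reserved for it.
 */
#define AP_STACK_SIZE     0x8000u  /* illustrative per-CPU stack size        */
#define MAX_LOGICAL_CPUS  64u      /* models PcdCpuMaxLogicalProcessorNumber */

/* Size of the contiguous region set aside for per-CPU stacks. */
static size_t
reserved_stack_region_size (void)
{
  return (size_t)MAX_LOGICAL_CPUS * AP_STACK_SIZE;
}

/* Offset of a given CPU's stack top inside that region. */
static size_t
stack_top_offset_for_cpu (uint32_t cpu_index)
{
  return ((size_t)cpu_index + 1) * AP_STACK_SIZE;
}

int
main (void)
{
  /* CPU 63 still fits; CPU 64 falls outside the reserved region,
     i.e. it would run on memory owned by something else.          */
  for (uint32_t cpu = 63; cpu <= 64; cpu++) {
    printf ("CPU %u stack top offset 0x%zx, region size 0x%zx -> %s\n",
            cpu,
            stack_top_offset_for_cpu (cpu),
            reserved_stack_region_size (),
            stack_top_offset_for_cpu (cpu) <= reserved_stack_region_size ()
              ? "in bounds" : "collision");
  }
  return 0;
}
```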
Proposed solution
We should bump the default here to at least what cloud-hypervisor currently supports, that is, 254 vCPUs.
I'm also open to bumping this to 1024 or 8192 (the maximum number of CPUs Linux supports), since I'm working on >254 vCPU support in CHV anyway.
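For reference, the change itself boils down to a one-line PCD bump in the CloudHv platform description. The exact file (likely OvmfPkg/CloudHv/CloudHvX64.dsc) and the DSC section it sits in are assumptions here; the PCD name and new value of 256 come from the PR title, and 64 is the current default mentioned above:

```diff
-  gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
+  gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|256
```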