
Commit 1aaf506

Hub freshness
1 parent 7f59972 commit 1aaf506

2 files changed: +64 -63 lines changed

azure-stack/operator/azure-stack-capacity-planning-compute.md

Lines changed: 33 additions & 32 deletions
@@ -18,15 +18,15 @@ ms.lastreviewed: 03/02/2021
The [virtual machine (VM) sizes](../user/azure-stack-vm-sizes.md) supported on Azure Stack Hub are a subset of those supported on Azure. Azure imposes resource limits along many vectors to avoid overconsumption of resources (server local and service-level). Without imposing some limits on tenant consumption, the tenant experience suffers when other tenants overconsume resources. For networking egress from the VM, there are bandwidth caps in place on Azure Stack Hub that match Azure limitations. For storage resources on Azure Stack Hub, storage IOPS limits avoid basic overconsumption of resources by tenants for storage access.

> [!IMPORTANT]
- > The [Azure Stack Hub Capacity Planner](https://aka.ms/azstackcapacityplanner) does not consider or guarantee IOPS performance. The administrator portal shows a warning alert when the total system memory consumption has reached 85%. This alert can be remediated by [adding additional capacity](azure-stack-add-scale-node.md), or by removing virtual machines that are no longer required.
+ > The [Azure Stack Hub Capacity Planner](https://aka.ms/azstackcapacityplanner) does not consider or guarantee IOPS performance. The administrator portal shows a warning alert when the total system memory consumption reaches 85%. This alert can be remediated by [adding more capacity](azure-stack-add-scale-node.md), or by removing virtual machines that are no longer required.
## VM placement

The Azure Stack Hub placement engine places tenant VMs across the available hosts.

Azure Stack Hub uses two considerations when placing VMs: one, is there enough memory on the host for that VM type, and two, are the VMs part of an [availability set](/azure/virtual-machines/windows/manage-availability) or a [virtual machine scale set](/azure/virtual-machine-scale-sets/overview)?

- To achieve high availability of a multi-VM production workload in Azure Stack Hub, virtual machines (VMs) are placed in an availability set that spreads them across multiple fault domains. A fault domain in an availability set is defined as a single node in the scale unit. Azure Stack Hub supports having an availability set with a maximum of three fault domains to be consistent with Azure. VMs placed in an availability set will be physically isolated from each other by spreading them as evenly as possible over multiple fault domains (Azure Stack Hub nodes). If there's a hardware failure, VMs from the failed fault domain are restarted in other fault domains. If possible, they're kept in separate fault domains from the other VMs in the same availability set. When the host comes back online, VMs are rebalanced to maintain high availability.
+ To achieve high availability of a multi-VM production workload in Azure Stack Hub, virtual machines (VMs) are placed in an availability set that spreads them across multiple fault domains. A fault domain in an availability set is defined as a single node in the scale unit. Azure Stack Hub supports having an availability set with a maximum of three fault domains to be consistent with Azure. VMs placed in an availability set are physically isolated from each other by spreading them as evenly as possible over multiple fault domains (Azure Stack Hub nodes). If there's a hardware failure, VMs from the failed fault domain are restarted in other fault domains. If possible, they're kept in separate fault domains from the other VMs in the same availability set. When the host comes back online, VMs are rebalanced to maintain high availability.
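To make the spreading behavior concrete, here's a minimal sketch of round-robin assignment across three fault domains. It's illustrative only, not the actual placement engine, and `assign_fault_domains` is a hypothetical helper:

```python
# Minimal sketch: spread VMs in an availability set as evenly as possible
# across fault domains (Azure Stack Hub nodes). Illustrative only.
def assign_fault_domains(vm_names, fault_domain_count=3):
    return {name: i % fault_domain_count for i, name in enumerate(vm_names)}

print(assign_fault_domains(["vm0", "vm1", "vm2", "vm3"]))
# {'vm0': 0, 'vm1': 1, 'vm2': 2, 'vm3': 0}
```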

Virtual machine scale sets use availability sets on the back end and make sure each virtual machine scale set instance is placed in a different fault domain, which means they use separate Azure Stack Hub infrastructure nodes. For example, on a four-node Azure Stack Hub system, creating a virtual machine scale set of three instances might fail if there isn't capacity to place the three instances on three separate Azure Stack Hub nodes. In addition, Azure Stack Hub nodes can be filled to varying levels before placement is attempted.

@@ -36,7 +36,7 @@ Since placement algorithms don't look at the existing virtual to physical core o

## Consideration for total number of VMs

- There is a limit on the total number of VMs that can be created. The maximum number of VMs on Azure Stack Hub is 700 and 60 per scale unit node. For example, an eight-server Azure Stack Hub VM limit would be 480 (8 * 60). For a 12 to 16 server Azure Stack Hub solution, the limit would be 700. This limit has been created keeping all the compute capacity considerations in mind, such as the resiliency reserve and the CPU virtual-to-physical ratio that an operator would like to maintain on the stamp.
+ There is a limit on the total number of VMs that can be created. The maximum number of VMs on Azure Stack Hub is 700 and 60 per scale unit node. For example, an eight-server Azure Stack Hub VM limit would be 480 (8 * 60). For a 12 to 16 server Azure Stack Hub solution, the limit would be 700. This limit was created with all the compute capacity considerations in mind, such as the resiliency reserve and the CPU virtual-to-physical ratio that an operator would like to maintain on the stamp.
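As a quick sanity check, the limit described above can be computed directly. This is a convenience sketch, not a product API:

```python
def max_vms(node_count: int) -> int:
    """Maximum tenant VMs on a stamp: 60 per scale unit node, capped at 700."""
    return min(700, 60 * node_count)

print(max_vms(8))   # 480 (8 * 60)
print(max_vms(12))  # 700 (the per-stamp cap applies)
```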

If the VM scale limit is reached, the following error codes are returned as a result: `VMsPerScaleUnitLimitExceeded`, `VMsPerScaleUnitNodeLimitExceeded`.

@@ -45,15 +45,15 @@ If the VM scale limit is reached, the following error codes are returned as a re
## Consideration for batch deployment of VMs

- In releases prior to and including 2002, 2-5 VMs per batch with 5 mins gap in between batches provided reliable VM deployments to reach a scale of 700 VMs. With the 2005 version of Azure Stack Hub onwards, we are able to reliably provision VMs at batch sizes of 40 with 5 mins gap in between batch deployments. Start, Stop-deallocate, and update operations should be done at a batch size of 30, leaving 5 mins in between each batch.
+ In releases up to and including 2002, deploying 2-5 VMs per batch with a 5-minute gap between batches provided reliable VM deployments to reach a scale of 700 VMs. From version 2005 of Azure Stack Hub onwards, VMs can be reliably provisioned at batch sizes of 40 with a 5-minute gap between batch deployments. Start, stop-deallocate, and update operations should be done at a batch size of 30, leaving 5 minutes between each batch.
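As an illustration of this batching guidance, a deployment driver might pace provisioning as follows. This is a sketch only: `deploy_vm` is a hypothetical stand-in for whatever provisioning call you use (ARM template, SDK, or CLI), not an Azure Stack Hub API:

```python
import time

BATCH_SIZE = 40       # provisioning batch size supported from version 2005 onwards
GAP_SECONDS = 5 * 60  # 5-minute gap between batches

def deploy_in_batches(vm_names, deploy_vm):
    """Provision VMs in batches of BATCH_SIZE, pausing GAP_SECONDS between batches."""
    for i in range(0, len(vm_names), BATCH_SIZE):
        for name in vm_names[i:i + BATCH_SIZE]:
            deploy_vm(name)  # start one VM deployment (hypothetical callable)
        if i + BATCH_SIZE < len(vm_names):
            time.sleep(GAP_SECONDS)  # let the stamp settle before the next batch
```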

## Consideration for GPU VMs

Azure Stack Hub reserves memory so that infrastructure and tenant VMs can fail over. Unlike other VMs, GPU VMs run in non-HA (high availability) mode and therefore don't fail over. As a result, the reserved memory for a GPU VM-only stamp is only what the infrastructure requires for failover, rather than also accounting for HA tenant VM memory.

## Azure Stack Hub memory

- Azure Stack Hub is designed to keep VMs running that have been successfully provisioned. For example, if a host is offline because of a hardware failure, Azure Stack Hub will attempt to restart that VM on another host. A second example during patching and updating of the Azure Stack Hub software. If there's a need to reboot a physical host, an attempt is made to move the VMs executing on that host to another available host in the solution.
+ Azure Stack Hub is designed to keep VMs running that were successfully provisioned. For example, if a host is offline because of a hardware failure, Azure Stack Hub attempts to restart that VM on another host. A second example is during patching and updating of the Azure Stack Hub software. If there's a need to reboot a physical host, an attempt is made to move the VMs running on that host to another available host in the solution.

This VM management or movement can only be achieved if there's reserved memory capacity to allow for the restart or migration to occur. A portion of the total host memory is reserved and unavailable for tenant VM placement.

@@ -66,7 +66,7 @@ Used memory is made up of several components. The following components consume t
- **Host OS usage or reserve:** The memory used by the operating system (OS) on the host, virtual memory page tables, processes that are running on the host OS, and the Spaces Direct memory cache. Since this value depends on the memory used by the different Hyper-V processes running on the host, it can fluctuate.
- **Infrastructure services:** The infrastructure VMs that make up Azure Stack Hub. As discussed previously, these VMs are part of the 700 VM maximum. The memory utilization of the infrastructure services component may change as we work on making our infrastructure services more scalable and resilient. For more information, see the [Azure Stack Hub Capacity Planner](https://aka.ms/azstackcapacityplanner).
- **Resiliency reserve:** Azure Stack Hub reserves a portion of the memory to allow for tenant availability during a single host failure as well as during patch and update to allow for successful live migration of VMs.
- - **Tenant VMs:** The tenant VMs created by Azure Stack Hub users. In addition to running VMs, memory is consumed by any VMs that have landed on the fabric. This means that VMs in "Creating" or "Failed" state, or VMs shut down from within the guest, will consume memory. However, VMs that have been deallocated using the stop deallocated option from portal/powershell/cli won't consume memory from Azure Stack Hub.
+ - **Tenant VMs:** The tenant VMs created by Azure Stack Hub users. In addition to running VMs, memory is consumed by any VMs that have landed on the fabric. This means that VMs in "Creating" or "Failed" state, or VMs shut down from within the guest, consume memory. However, VMs that were deallocated by using the stop-deallocate option from the portal, PowerShell, or the CLI won't consume memory from Azure Stack Hub.
- **Value-add resource providers (RPs):** VMs deployed for the value-add RPs like SQL, MySQL, App Service, and so on.

The best way to understand memory consumption on the portal is to use the [Azure Stack Hub Capacity Planner](https://aka.ms/azstackcapacityplanner) to see the impact of various workloads. The following calculation is the same one used by the planner.
@@ -75,31 +75,32 @@ This calculation results in the total available memory that can be used for tena

Available memory for VM placement = total host memory - resiliency reserve - memory used by running tenant VMs - Azure Stack Hub Infrastructure Overhead <sup>1</sup>

- * Total host memory = Sum of memory from all nodes
- * Resiliency reserve = H + R * ((N-1) * H) + V * (N-2)
- * Memory used by tenant VMs = Actual memory consumed by tenant workload, does not depend on HA configuration
- * Azure Stack Hub Infrastructure Overhead = 268 GB + (4GB x N)
+ - Total host memory = Sum of memory from all nodes
+ - Resiliency reserve = H + R * ((N-1) * H) + V * (N-2)
+ - Memory used by tenant VMs = Actual memory consumed by tenant workload, does not depend on HA configuration
+ - Azure Stack Hub Infrastructure Overhead = 268 GB + (4 GB x N)

- > Where:
- > - H = Size of single server memory
- > - N = Size of Scale Unit (number of servers)
- > - R = The operating system reserve for OS overhead, which is .15 in this formula<sup>2</sup>
- > - V = Largest HA VM in the scale unit
+ Where:
+
+ - H = Size of single server memory
+ - N = Size of Scale Unit (number of servers)
+ - R = The operating system reserve for OS overhead, which is .15 in this formula<sup>2</sup>
+ - V = Largest HA VM in the scale unit

<sup>1</sup> Azure Stack Hub Infrastructure overhead = 268 GB + (4 GB x # of nodes). Approximately 31 VMs are used to host Azure Stack Hub's infrastructure and, in total, consume about 268 GB + (4 GB x # of nodes) of memory and 146 virtual cores. The rationale for this number of VMs is to satisfy the needed service separation to meet security, scalability, servicing, and patching requirements. This internal service structure allows for the future introduction of new infrastructure services as they're developed.

- <sup>2</sup> Operating system reserve for overhead = 15% (.15) of node memory. The operating system reserve value is an estimate and will vary based on the physical memory capacity of the server and general operating system overhead.
+ <sup>2</sup> Operating system reserve for overhead = 15% (.15) of node memory. The operating system reserve value is an estimate and varies based on the physical memory capacity of the server and general operating system overhead.

- The value V, largest HA VM in the scale unit, is dynamically based on the largest tenant VM memory size. For example, the largest HA VM value is a minimum of 12 GB (accounting for the infrastructure VM) or 112 GB or any other supported VM memory size in the Azure Stack Hub solution. Changing the largest HA VM on the Azure Stack Hub fabric will result in an increase in the resiliency reserve and also to the increase in the memory of the VM itself. Remember that GPU VMs run in non-HA mode.
+ The value V, the largest HA VM in the scale unit, is dynamically based on the largest tenant VM memory size. For example, the largest HA VM value could be a minimum of 12 GB (accounting for the infrastructure VM), 112 GB, or any other supported VM memory size in the Azure Stack Hub solution. Changing the largest HA VM on the Azure Stack Hub fabric increases the resiliency reserve in addition to increasing the memory of the VM itself. Remember that GPU VMs run in non-HA mode.
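The formula above translates directly into a few lines of code. The following is a rough-estimate sketch only; the [Azure Stack Hub Capacity Planner](https://aka.ms/azstackcapacityplanner) remains the authoritative tool:

```python
def available_memory_gb(h, n, v, tenant_gb, r=0.15):
    """Estimate memory available for VM placement on an Azure Stack Hub stamp.

    h         -- size of single server memory (GB)
    n         -- size of scale unit (number of servers)
    v         -- largest HA VM in the scale unit (GB)
    tenant_gb -- memory used by running tenant VMs (GB)
    r         -- operating system reserve for OS overhead (.15 in this formula)
    """
    total_host_memory = h * n
    resiliency_reserve = h + r * ((n - 1) * h) + v * (n - 2)
    infrastructure_overhead = 268 + 4 * n  # 268 GB + (4 GB x N)
    return total_host_memory - resiliency_reserve - tenant_gb - infrastructure_overhead
```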

### Sample calculation

- We have a small four-node Azure Stack Hub deployment with 768 GB RAM on each node. We plan to place a virtual machine for SQL server with 128GB of RAM (Standard_E16_v3). What will be the available memory for VM placement?
+ We have a small four-node Azure Stack Hub deployment with 768 GB of RAM on each node. We plan to place a virtual machine for SQL Server with 128 GB of RAM (Standard_E16_v3). What is the available memory for VM placement?

- * Total host memory = Sum of memory from all nodes = 4 * 768 GB = 3072 GB
- * Resiliency reserve = H + R * ((N-1) * H) + V * (N-2) = 768 + 0.15 * ((4 - 1) * 768) + 128 * (4 - 2) = 1370 GB
- * Memory used by tenant VMs = Actual memory consumed by tenant workload, does not depend on HA configuration = 0 GB
- * Azure Stack Hub Infrastructure Overhead = 268 GB + (4GB x N) = 268 + (4 * 4) = 284 GB
+ - Total host memory = Sum of memory from all nodes = 4 * 768 GB = 3072 GB
+ - Resiliency reserve = H + R * ((N-1) * H) + V * (N-2) = 768 + 0.15 * ((4 - 1) * 768) + 128 * (4 - 2) = 1370 GB
+ - Memory used by tenant VMs = Actual memory consumed by tenant workload, does not depend on HA configuration = 0 GB
+ - Azure Stack Hub Infrastructure Overhead = 268 GB + (4 GB x N) = 268 + (4 * 4) = 284 GB

Available memory for VM placement = total host memory - resiliency reserve - memory used by running tenant VMs - Azure Stack Hub Infrastructure Overhead
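Running the sample numbers through the sketch above reproduces the manual calculation (the reserve of 1369.6 GB appears rounded to 1370 GB in the bullets):

```python
print(available_memory_gb(h=768, n=4, v=128, tenant_gb=0))
# 3072 - 1369.6 - 0 - 284 = 1418.4, roughly 1418 GB available for placement
```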

@@ -126,43 +127,43 @@ There are three ways to deallocate memory for VM placement using the formula **R

### Reduce the size of the largest VM

- Reducing the size of the largest VM to the next smallest VM in stamp (24 GB) will reduce the size of the resiliency reserve.
+ Reducing the size of the largest VM to the next smallest VM in the stamp (24 GB) reduces the size of the resiliency reserve.

![Reduce the VM size](media/azure-stack-capacity-planning/decrease-vm-size.png)

Resiliency reserve = 384 + (0.15)((3) * 384) + 24 * (2) = 384 + 172.8 + 48 = 604.8 GB

| Total memory | Infra GB | Tenant GB | Resiliency reserve | Total memory reserved | Total GB available for placement |
|--------------|----------|-----------|--------------------|-----------------------------------|----------------------------------|
| 1536 GB | 258 GB | 329.25 GB | 604.8 GB | 258 + 329.25 + 604.8 = 1192.05 GB | **~344 GB** |

### Add a node

- [Adding an Azure Stack Hub node](./azure-stack-add-scale-node.md) will deallocate memory by equally distributing the memory between the two nodes.
+ [Adding an Azure Stack Hub node](./azure-stack-add-scale-node.md) deallocates memory by distributing the memory equally across the nodes.

![Add a node](media/azure-stack-capacity-planning/add-a-node.png)

Resiliency reserve = 384 + (0.15)((4) * 384) + 112 * (3) = 384 + 230.4 + 336 = 950.4 GB

| Total Memory | Infra GB | Tenant GB | Resiliency reserve | Total memory reserved | Total GB available for placement |
|--------------|----------|-----------|--------------------|-----------------------------------|----------------------------------|
| 1920 (5*384) GB | 258 GB | 329.25 GB | 950.4 GB | 258 + 329.25 + 950.4 = 1537.65 GB | **~382 GB** |

### Increase memory on each node to 512 GB

- [Increasing the memory of each node](./azure-stack-manage-storage-physical-memory-capacity.md) will increase the total available memory.
+ [Increasing the memory of each node](./azure-stack-manage-storage-physical-memory-capacity.md) increases the total available memory.

![Increase the size of the node](media/azure-stack-capacity-planning/increase-node-size.png)

Resiliency reserve = 512 + (0.15)((3) * 512) + 112 * (2) = 512 + 230.4 + 224 = 966.4 GB

| Total Memory | Infra GB | Tenant GB | Resiliency reserve | Total memory reserved | Total GB available for placement |
|-----------------|----------|-----------|--------------------|-----------------------------------|----------------------------------|
| 2048 (4*512) GB | 258 GB | 505.75 GB | 966.4 GB | 258 + 505.75 + 966.4 = 1730.15 GB | **~318 GB** |
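The resiliency-reserve values in the three scenarios above can be cross-checked with the reserve term from the earlier sketch:

```python
def resiliency_reserve_gb(h, n, v, r=0.15):
    """Resiliency reserve = H + R * ((N-1) * H) + V * (N-2)."""
    return h + r * ((n - 1) * h) + v * (n - 2)

print(resiliency_reserve_gb(h=384, n=4, v=24))   # 604.8 (reduce the largest VM)
print(resiliency_reserve_gb(h=384, n=5, v=112))  # 950.4 (add a node)
print(resiliency_reserve_gb(h=512, n=4, v=112))  # 966.4 (increase node memory)
```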

## Frequently Asked Questions

- **Q**: My tenant deployed a new VM, how long will it take for the capability chart on the administrator portal to show remaining capacity?
+ **Q**: My tenant deployed a new VM. How long does it take for the capacity chart on the administrator portal to show the remaining capacity?

**A**: The capacity blade refreshes every 15 minutes, so take that into consideration.

@@ -192,7 +193,7 @@ Resiliency reserve = 512 + 230.4 + 224 = 966.4 GB

**Q**: What state do tenant VMs have to be in to consume memory?

- **A**: In addition to running VMs, memory is consumed by any VMs that have landed on the fabric. This means that VMs that are in a "Creating" or "Failed" state will consume memory. VMs shut down from within the guest as opposed to stop deallocated from portal/powershell/cli will also consume memory.
+ **A**: In addition to running VMs, memory is consumed by any VMs that have landed on the fabric. This means that VMs that are in a "Creating" or "Failed" state consume memory. VMs shut down from within the guest, as opposed to stop-deallocated from the portal, PowerShell, or the CLI, also consume memory.

**Q**: I have a four-host Azure Stack Hub. My tenant has 3 VMs that consume 56 GB of RAM (D5_v2) each. One of the VMs was resized to 112 GB of RAM (D14_v2), and available memory reporting on the dashboard showed a spike of 168 GB of usage on the capacity blade. Subsequent resizing of the other two D5_v2 VMs to D14_v2 resulted in an increase of only 56 GB of RAM each. Why is this so?
