Merged
Changes from 1 commit
Commits
43 commits
bbed453
Revise blog on AKS capacity management strategies
wdarko1 Dec 3, 2025
868afa1
Refine capacity management details in blog post
wdarko1 Dec 3, 2025
874a978
Revise content for Node Auto Provisioning and VM Node Pools
wdarko1 Dec 3, 2025
946109b
Fix NAP documentation link in blog post
wdarko1 Dec 3, 2025
adf8530
Enhance documentation for NAP and VM node pools
wdarko1 Dec 4, 2025
2c8e5a1
Apply suggestions from code review
wdarko1 Dec 12, 2025
3a5386a
Revise node auto-provisioning and VM node pools docs
wdarko1 Dec 13, 2025
59efdf0
Revise NAP requirements section title
wdarko1 Dec 13, 2025
8664648
Refine language on VM SKU availability and scaling
wdarko1 Dec 17, 2025
af4654a
Apply suggestions from code review
wdarko1 Dec 17, 2025
7de73c4
Update website/blog/2025-12-06-node-auto-provisioning-capacity-manage…
wdarko1 Dec 17, 2025
092f67d
Update website/blog/2025-12-06-node-auto-provisioning-capacity-manage…
wdarko1 Dec 17, 2025
bc0d444
Revise NAP and VM node pool documentation
wdarko1 Dec 19, 2025
1948871
Update description of VM node pools auto-scaling
wdarko1 Dec 19, 2025
0458cd5
update index.md - clean up typo
wdarko1 Dec 19, 2025
92eea8f
Merge branch 'Azure:master' into nap-capacity
wdarko1 Dec 19, 2025
e3137fe
Add tags for Node Auto Provisioning and VM Node Pools
wdarko1 Dec 19, 2025
f34f669
Add Wilson Darko to authors list
wdarko1 Dec 19, 2025
aae8200
Update publication date for blog post on AKS
wdarko1 Dec 19, 2025
ca91a31
Revise blog on node auto provisioning and capacity management
wdarko1 Dec 22, 2025
1a28f37
Fix formatting and improve clarity in blog post
wdarko1 Dec 22, 2025
743721d
Correct author name in blog post metadata
wdarko1 Jan 27, 2026
4c3fca4
Update publication date and author format
wdarko1 Jan 27, 2026
5484743
Update node pool management commands in blog post
wdarko1 Jan 27, 2026
5d6db52
Fix formatting and improve clarity in blog post
wdarko1 Jan 27, 2026
95b6caa
Revise bullet points in NAP section for consistency
wdarko1 Jan 27, 2026
bee9e35
Update blog post on Node Auto Provisioning features
wdarko1 Jan 27, 2026
8773d11
Update website/blog/tags.yml
wdarko1 Jan 27, 2026
aa1bbca
Apply suggestions from code review
wdarko1 Jan 27, 2026
4332043
Update documentation for Node Auto Provisioning and VM pools
wdarko1 Jan 28, 2026
0ef452c
Fix formatting and clarify NAP vs cluster autoscaler section
wdarko1 Jan 28, 2026
f83cbaa
Update Capacity Error list
wdarko1 Jan 29, 2026
ba92bdb
Enhance blog post on AKS node provisioning
wdarko1 Jan 29, 2026
e58365b
Add files via upload
wdarko1 Jan 29, 2026
022f127
Add visual demo and documentation links for node provisioning
wdarko1 Jan 29, 2026
deddc4b
Fix formatting and improve clarity in blog post
wdarko1 Jan 29, 2026
d2fcebf
Fix formatting issue in guidance section
wdarko1 Jan 29, 2026
33352be
Update index.md
wdarko1 Jan 29, 2026
7b1fee7
Update index.md
wdarko1 Jan 29, 2026
afd825d
Fix formatting issues in NAP vs Cluster Autoscaler section
wdarko1 Jan 29, 2026
698e436
Update recommendations for NAP and VM node pools
wdarko1 Jan 29, 2026
4a0de81
Fix formatting issue in guidance section
wdarko1 Jan 29, 2026
adc4251
Update headings for consistency and clarity
wdarko1 Jan 29, 2026
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
---
title: "Navigating Capacity Challenges on AKS with Node Auto Provisioning or Virtual Machine Node Pools"
description: "Learn how Node auto provisioning and Virtual Machine node pools can address capacity constraints when scaling an AKS cluster"
date: 2025-11-26

Copilot AI Dec 15, 2025

The date field is set to 2025-11-26, which is in the past relative to the blog post directory name (2025-12-06). According to the AKS Blog Post Content Guidelines, the date can be future-dated for future publishing, but there should be consistency between the directory name date and the front matter date. Consider updating the date to match the directory name (2025-12-06) or vice versa to maintain consistency.

Suggested change
- date: 2025-11-26
+ date: 2025-12-06

authors: ["wilson darko"]
tags:
- node-auto-provisioning
- vm-node-pools
---

<!-- truncate -->

Copilot AI Dec 3, 2025

The blog post is missing a hero image. According to the blog post guidelines, a hero image should be included after the <!-- truncate --> marker using the pattern ![Hero Image](./hero-image.png). Please add a hero image with descriptive alt text.

Copilot generated this review using guidance from repository custom instructions.

:::info

Learn more in the official documentation: [Node Auto Provisioning](https://learn.microsoft.com/azure/aks/node-auto-provisioning) or [Virtual Machine Node Pool](https://learn.microsoft.com/azure/aks/virtual-machines-node-pools)

:::

---

## When Growth Meets a Wall
Imagine this: your application is thriving, traffic spikes, and Kubernetes promises elasticity. You hit “scale,” expecting magic—only to be greeted by cryptic errors like:

- **Insufficient regional capacity**: Azure can’t allocate the VM size you requested.
- **Quota exceeded**: Your subscription has hit its compute limits.
- **Overconstrained allocation**: The VM SKU you chose isn’t available in the zone.

For customers, these aren’t just error messages - they’re roadblocks. Pods remain pending, deployments stall, and SLAs tremble. Scaling isn’t just about adding nodes; it’s about finding capacity in a dynamic, multi-tenant cloud where demand often outpaces supply.

Comment on lines 27 to 35

Copilot AI Jan 27, 2026

According to blog post guidelines, the blog post should include a hero image after the truncate marker. Consider adding a descriptive diagram or image illustrating the capacity management concepts (e.g., NAP vs traditional node pools, multi-SKU flexibility). Use the format: ![Descriptive alt text](./hero-image.png)

Copilot generated this review using guidance from repository custom instructions.
---

## The Hidden Complexity Behind Capacity
Why does this happen? Because scaling in Kubernetes isn’t just horizontal; it’s logistical. Every node pool is tied to a VM SKU, region, and zone. When workloads diversify (GPU jobs, memory-heavy analytics, latency-sensitive microservices), the rigid structure of fixed node pools becomes a bottleneck. You’re left juggling trade-offs: Do you overprovision expensive SKUs “just in case”? Or risk underprovisioning and throttling growth? AKS offers two solutions that aim to address these capacity scaling challenges.

---

## Breaking the Mold: Features That Change the Game

### Node Auto Provisioning (NAP): Smarter Scaling

Copilot AI Dec 3, 2025

According to the Microsoft Style Guide, use sentence-style capitalization for headings. "Node Auto Provisioning" should be "Node auto provisioning" (capitalize only the first word and proper nouns). Update to: "### Node auto provisioning: Smarter Scaling"

Copilot generated this review using guidance from repository custom instructions.
NAP flips the script. Instead of you guessing the right VM size, NAP uses **pending pod resource requests** to dynamically provision nodes that fit your workloads. Built on the open-source **Karpenter** project, NAP:

- **Automates VM selection**: Chooses optimal SKUs based on CPU, memory, and constraints.
- **Consolidates intelligently**: Removes underutilized nodes, reducing cost.
- **Adapts in real time**: Responds to pod pressure without manual intervention.

Think of NAP as Kubernetes with foresight—provisioning what you need, when you need it, without the spreadsheet gymnastics. Without NAP, a single unavailable VM SKU can block scaling entirely. With NAP, AKS dynamically adapts to capacity fluctuations, ensuring workloads keep running on available VM sizes - even during regional/zonal shortages.

#### How NAP handles capacity errors

When a requested VM SKU isn’t available due to regional or zonal capacity constraints, NAP doesn’t fail outright. Instead, NAP will automatically:

* Evaluate pending pod resource requirements (CPU, memory, GPU, etc.).
* Check whether pending pods can fit on existing nodes.
* Search across multiple VM SKUs within the allowed families defined in your NAP configuration files (the custom resources known as NodePool and AKSNodeClass).
* Provision an alternative SKU that meets the workload requirements and policy constraints.
* Only if no VM size matching your requirements is available, report an error stating that no available SKU meets your configuration definition. **Mitigation**: Reference a broad range of size options in the NAP configuration files (e.g., the D-series, or multiple SKU families).

This flexibility is key to avoiding hard failures during scale-out.
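
The fallback behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not NAP's or Karpenter's actual implementation: the `Sku` type, the `pick_sku` function, the SKU names, and the "smallest fitting size first" ordering are all assumptions made for the example, and the step that checks existing nodes is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    """Hypothetical VM size record; not an Azure API type."""
    name: str
    family: str
    cpu: int            # vCPUs
    memory_gib: int
    available: bool     # stand-in for regional/zonal capacity


class NoSkuAvailableError(Exception):
    """Raised only after every allowed SKU has been ruled out."""


def pick_sku(cpu_needed, memory_needed_gib, allowed_families, catalog):
    """Return the smallest SKU with capacity that fits the pending pods."""
    candidates = [
        s for s in catalog
        if s.family in allowed_families
        and s.cpu >= cpu_needed
        and s.memory_gib >= memory_needed_gib
    ]
    # Assumed ordering: try the smallest fitting sizes first and
    # skip any size that currently has no capacity.
    for sku in sorted(candidates, key=lambda s: (s.cpu, s.memory_gib)):
        if sku.available:
            return sku
    raise NoSkuAvailableError("no available SKU meets the configuration")


# Made-up catalog: the preferred D-family size is out of capacity.
CATALOG = [
    Sku("d4-example", "D", 4, 16, available=False),
    Sku("d8-example", "D", 8, 32, available=True),
    Sku("e4-example", "E", 4, 32, available=True),
]

if __name__ == "__main__":
    print(pick_sku(4, 16, {"D"}, CATALOG).name)
    print(pick_sku(4, 16, {"D", "E"}, CATALOG).name)
```

With only the D family allowed, the sketch falls back to a larger D size; allowing the E family as well gives it a closer-fitting alternative, which is the reason for the mitigation above.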

### Virtual Machine Node Pools: Flexibility at Scale
Traditional node pools are rigid: one SKU per pool. Virtual Machine node pools break that limitation. With multi-SKU support, you can:

* Mix VM sizes within a single pool for diverse workloads.
* Fine-tune capacity without creating dozens of pools.
* Reduce operational overhead while improving resilience.

This isn’t just flexibility - it’s versatility in capacity-constrained regions.

#### How Virtual Machine node pools handle capacity errors

You can manually add alternative VM SKUs to new or existing node pools. When a requested VM SKU isn't available due to a regional or zonal capacity constraint, you receive a capacity error, which you can resolve by adding or updating the VM SKUs in the node pool.
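
To make the manual workflow concrete, here is a minimal Python sketch of the operator's runbook: try the preferred SKU, and on a capacity error move to the next alternative. Everything here is hypothetical: `VmNodePool`, `CapacityError`, `scale_with_fallback`, and the SKU names are stand-ins for illustration and do not correspond to the Azure CLI or SDK.

```python
class CapacityError(Exception):
    """Stand-in for a regional or zonal capacity error."""


class VmNodePool:
    """Toy model of a node pool that can hold several VM SKUs."""

    def __init__(self, unavailable_skus):
        self.unavailable = set(unavailable_skus)
        self.profiles = {}  # SKU name -> node count

    def scale(self, sku, count):
        """Add or update one SKU in the pool, failing if it has no capacity."""
        if sku in self.unavailable:
            raise CapacityError(f"{sku} has no capacity")
        self.profiles[sku] = count


def scale_with_fallback(pool, preferred_skus, count):
    """Try each SKU in order, as an operator would after each error."""
    for sku in preferred_skus:
        try:
            pool.scale(sku, count)
            return sku
        except CapacityError:
            continue  # pick the next alternative SKU
    raise CapacityError("none of the listed SKUs has capacity")


if __name__ == "__main__":
    pool = VmNodePool(unavailable_skus={"sku-a"})
    print(scale_with_fallback(pool, ["sku-a", "sku-b"], count=3))
    print(pool.profiles)
```

The difference from NAP is who runs the loop: here the list of alternatives and the retry are the operator's responsibility.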

## Quick Guidance: When to Use What

Copilot AI Dec 3, 2025

According to the Microsoft Style Guide, use sentence-style capitalization for headings. "Quick Guidance" should be "Quick guidance". Update to: "## Quick guidance: When to Use What"

Copilot generated this review using guidance from repository custom instructions.
Generally, NAP and Virtual Machine node pools are mutually exclusive. With NAP, standalone VMs managed by NAP take the place of traditional node pools, which allows for **mixed SKU autoscaling**. Virtual Machine node pools use traditional node pools but allow for **mixed SKU manual scaling**.

* (Recommended) Choose NAP for dynamic environments where manual SKU planning is impractical.
* Choose Virtual Machine node pools when you need control—specific SKUs for compliance, predictable performance, or cost modeling.

Avoid NAP if you require strict SKU governance or have regulatory constraints. Avoid VM node pools if you want full automation without manual profiles.

## Best Practice for Resilience

Copilot AI Dec 3, 2025

According to the Microsoft Style Guide, use sentence-style capitalization for headings. "Best Practice" should be "Best practice" (and grammatically should be plural "Best practices"). Update to: "## Best practices for Resilience"

Copilot generated this review using guidance from repository custom instructions.

To maximize NAP's ability to handle capacity errors:
* Define broad SKU families (e.g., D, E) in your NodePool requirements.
* Avoid overly restrictive affinity rules.
* Enable multiple NodePools with different priorities for fallback.
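
As a rough illustration of the first and third points, the Python sketch below builds two NodePool manifests as plain dictionaries: a preferred pool and a broader fallback pool. The field names (`karpenter.sh/v1`, `spec.weight`, the `karpenter.azure.com/sku-family` requirement key, and the `AKSNodeClass` reference) reflect the Karpenter NodePool schema as I understand it and are assumptions here; verify them against the NAP documentation for your cluster's API version. The `node_pool` helper and the pool names are hypothetical.

```python
import json


def node_pool(name, sku_families, weight):
    """Build a NodePool-style manifest (assumed schema, see lead-in)."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            # Assumption: a higher weight means the pool is tried first.
            "weight": weight,
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.azure.com",
                        "kind": "AKSNodeClass",
                        "name": "default",
                    },
                    "requirements": [
                        {
                            "key": "karpenter.azure.com/sku-family",
                            "operator": "In",
                            "values": sorted(sku_families),
                        }
                    ],
                }
            },
        },
    }


# Preferred pool restricted to one family, plus a broader fallback pool.
POOLS = [
    node_pool("preferred", {"D"}, weight=100),
    node_pool("fallback", {"D", "E"}, weight=10),
]

if __name__ == "__main__":
    for pool in POOLS:
        print(json.dumps(pool, indent=2))
```

The printed JSON can be converted to YAML or applied as-is, since `kubectl apply -f` accepts JSON manifests.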

## What’s Next on the AKS Roadmap

* **NAP**: Expect deeper integration with cost optimization tools and advanced disruption policies for even smarter consolidation.
* **Virtual Machine node pools**: Auto-scaling profiles are on the horizon, reducing manual configuration and enabling adaptive scaling across mixed SKUs.