refs

brown9804 · web-flow · commit 5526405cd5e7 · 2025-08-20T09:20:09.000-06:00
diff --git a/README.md b/README.md
@@ -43,11 +43,34 @@ Last updated: 2025-07-30
     - [Cloud Computing Era 1990s-2010](#cloud-computing-era-1990s-2010)
     - [Cloud-Native & Beyond 2013-Present](#cloud-native--beyond-2013-present)
     - [Energy & Sustainability Milestones 1992-present](#energy--sustainability-milestones-1992-present)
+- [Recommended Approaches and Best Practices](#recommended-approaches-and-best-practices)
+    - [Monitor and Request Quota Increases](#monitor-and-request-quota-increases)
+    - [Multi-Region Planning](#multi-region-planning)
+    - [Leverage Newer/Alternative Regions](#leverage-neweralternative-regions)
+    - [Right-Size and Optimize Consumption](#right-size-and-optimize-consumption)
+    - [Work with Azure Support](#work-with-azure-support)
+    - [Adopt Carbon- and Cost-Aware Scheduling](#adopt-carbon--and-cost-aware-scheduling)
 
 </details>
 
 ## History → Cloud → Azure 
 
+> `Why There’s Not Enough Quota/Capacity for Some Cloud Services in Some Regions?`
+
+> 1. **Physical Hardware Limits**:
+>     - Each cloud region is made up of physical data centers (“Availability Zones”) with finite amounts of servers, storage, and network hardware.
+>     - Newer hardware (like GPUs or specialized AI accelerators) or high-demand VM types may be deployed in limited quantities, especially in smaller or newer regions.
+> 2. **High Regional Demand**:
+>     - Popular regions (e.g., “East US”, “West Europe”) can experience surges in demand, especially for trendy services (e.g., AI, GPUs, large VM sizes).
+>     - Capacity is often allocated on a first-come, first-served basis; sudden spikes (product launches, AI workloads, seasonal trends) can exhaust available resources.
+> 3. **Quota as a Control Mechanism**:
+>    - Cloud providers set default quotas (“service limits”) per subscription to prevent accidental overspending and to protect the underlying infrastructure from noisy-neighbor effects.
+>    - Quotas help manage risk, avoid abuse, and ensure fair access for all tenants.
+> 4. **Supply Chain and Energy Constraints**:
+>    - Hardware supply can be delayed due to global supply chain issues, and energy constraints (grid limits, sustainability targets) can cap regional expansion.
+>    - Highly sustainable regions may have stricter power/capacity planning to meet carbon goals.
+> 5. **Regulatory and Compliance Restrictions**: Some services are only available in specific regions due to regulatory, data residency, or export control reasons.
+
 <img width="1143" height="921" alt="cloud-evolution-timeline-Computing Timeline drawio" src="https://github.com/user-attachments/assets/4096eb52-7a0e-4a98-8b01-0b8e7884cdd8" />
 
 [Diagram](./docs/cloud-evolution-timeline.drawio)
@@ -1096,6 +1119,165 @@ From [K8s cluster components](https://kubernetes.io/docs/concepts/architecture/)
 
 </details>
 
+## Recommended Approaches and Best Practices
+
+> In Azure, service/capacity limitations in certain regions are common, especially for high-demand resources (like GPUs, large VM sizes, or new services).
+
+> By monitoring quotas, planning for multi-region deployment, optimizing usage, and collaborating with Microsoft, you can minimize disruption from regional Azure capacity constraints, while meeting energy and sustainability goals.
+
+### Monitor and Request Quota Increases
+
+- Regularly monitor quotas: Use the Azure Portal `Usage + quotas` (per Subscription/Region/Provider) and service blades (e.g., vCPU—virtual CPU families).
+- Proactively request increases: Submit quota increases ahead of scale events; some are self‑serve, others require support.
+- Automate monitoring: Script checks and alerts for thresholds.
+- What to monitor (examples):
+  - Compute vCPU families (Dv5, Fsv2, ND/NC GPU families), Public/Static IPs, disk SKUs, vNet Gateways (per region).
+  - AKS (Azure Kubernetes Service) underlying VM quotas (node pools map to VM families/regions).
+- CLI/PowerShell quick checks (Windows PowerShell):
+  ```powershell
+  # vCPU usage/limits for a region
+  az vm list-usage -l eastus -o table
+
+  # Unattached Public IPs (proxy for capacity housekeeping)
+  az network public-ip list -g MyRg --query "[?ipConfiguration==null]" -o table
+
+  # Regions list
+  az account list-locations --query "[].{name:name, displayName:displayName}" -o table
+  ```
+- Quota change request:
+  - Portal: Subscription > Usage + quotas > Microsoft.Compute (or service) > Region > Request increase.
+  - Support ticket (CLI)
+    ```powershell
+    az extension add --name support
+    az support ticket create `
+      --problem-classification "Service and subscription limits (quotas)" `
+      --service-id "Microsoft.Compute" `
+      --title "Increase Dv5 vCPU quota in East US" `
+      --severity minimal `
+      --description "Need +500 vCPU for AKS scale event in 3 weeks"
+    ```
+
+- Alerting pattern: Schedule a task (Windows Task Scheduler/Azure Automation) that pages if usage/limit > 80%.
+
+### Multi-Region Planning
+
+- Architect for portability: Parameterize region/zone in ARM (Azure Resource Manager), Bicep, or Terraform; avoid hard‑coding.
+- Infrastructure as Code (IaC): One template, multiple parameter files for primary/secondary; validate with what‑if.
+- Data replication:
+  - Storage: LRS → ZRS → GRS/RAGRS/GZRS (pick per RPO/RTO).
+  - Azure SQL: Auto‑Failover Groups.
+  - Cosmos DB: Multi‑region writes; pick consistency (eventual → strong).
+  - Key Vault: Zone‑redundant vaults, soft‑delete, purge protection.
+- Routing and failover:
+  - Azure Front Door (global L7) or Traffic Manager (DNS) for active/active or active/passive; health probes drive failover.
+  - DNS (Domain Name System) TTL tuning for faster cutover.
+- Region‑parameterized Bicep:
+  ```bicep
+  // Deploy a VM Scale Set; pass -p region=eastus|westus2 on deploy
+  @allowed([
+    'eastus'
+    'westus2'
+    'westeurope'
+  ])
+  param region string = resourceGroup().location
+
+  resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2023-09-01' = {
+    name: 'fleet'
+    location: region
+    sku: {
+      name: 'Standard_D4s_v5'
+      capacity: 3
+      tier: 'Standard'
+    }
+    properties: {
+      upgradePolicy: { mode: 'Rolling' }
+      // ...other properties...
+    }
+  }
+  ```
+- What‑if to verify drift/impact:
+  ```powershell
+  az deployment group what-if -g MyRg -f main.bicep -p region=westus2
+  ```
+
+### Leverage Newer/Alternative Regions
+
+- Check capacity in alternatives: Newer/less‑used regions can have more GPU/large VM stock.
+- Evaluate constraints: Latency, data residency, compliance, AZ (Availability Zone) count, price/SLA.
+- Practical steps:
+  - Maintain a ranked “preferred regions” list per workload (latency‑critical vs batch).
+  - Measure latency to users/backends:
+    ```powershell
+    az network watcher test-connectivity `
+      --source-resource <vmIdOrName> `
+      --dest-address myapp.contoso.com --dest-port 443
+    ```
+  - Keep equivalency maps of VM SKUs by region (families differ regionally).
+
+- Cost/feature checks: Compare regional pricing and SLAs; validate managed service feature parity before moving.
+
+### Right-Size and Optimize Consumption
+
+- Use autoscaling and Spot VMs (Virtual Machine Scale Sets—VMSS):
+  ```powershell
+  az vmss create -g MyRg -n batch-spot `
+    --image UbuntuLTS --orchestration-mode Uniform `
+    --priority Spot --max-price -1 --instance-count 0
+  ```
+- Deallocate unused resources:
+  ```powershell
+  # Stop/deallocate
+  az vm deallocate -g MyRg -n DevVm01
+
+  # Find and clean up unattached disks
+  az disk list --query "[?managedBy==null].{name:name, rg:resourceGroup}" -o table
+  ```
+- Schedule non‑critical work: Run batch at off‑peak times/regions (AKS CronJobs, Functions timers, Logic Apps).
+- Instance/disk optimization: Size VMs to actual CPU/mem/IO; use Ephemeral OS disks for stateless nodes; choose Premium SSD v2/Ultra only where needed.
+- Purchasing options: Reservations/Savings Plans for steady workloads; track with Cost Management + Advisor.
+
+
+### Work with Azure Support
+
+- Engage Microsoft early: For large GPU (AI/ML) clusters or bursts, loop in your account team; consider capacity reservations.
+- Capacity reservations (pin capacity):
+  ```powershell
+  az capacity reservation group create -g MyRg -n capGroup -l eastus
+  az capacity reservation create -g MyRg --reservation-group capGroup `
+    -n capDv5 --sku "Standard_D8s_v5" --capacity 50 --zone 1
+  ```
+
+- Monitor advisories: Azure Service Health alerts for regional constraints/maintenance.
+- Open a support ticket (CLI):
+  ```powershell
+  az support ticket create `
+    --problem-classification "Service and subscription limits (quotas)" `
+    --service-id "Microsoft.Compute" `
+    --title "GPU quota & capacity planning - East US & West Europe" `
+    --severity moderate `
+    --description "Requesting NDv5 vGPU quota + capacity reservation for Q4 training."
+  ```
+### Adopt Carbon- and Cost-Aware Scheduling
+
+- Use carbon/cost‑aware strategies: Prefer regions/times with lower carbon intensity or price for deferrable jobs.
+- Automate with orchestrators: AKS scheduler plugins, external metrics; Functions/Logic Apps for timers.
+- Signals and data: Combine grid carbon indicators (gCO₂/kWh) with price/capacity; track kgCO₂e and $/hr in dashboards.
+- AKS autoscale and placement:
+  ```powershell
+  az aks nodepool update -g MyRg -n np1 --cluster-name MyAks `
+    --enable-cluster-autoscaler --min-count 1 --max-count 20
+  ```
+  - Label/taint `green` pools; HPA (Horizontal Pod Autoscaler) with external carbon metrics to pause/resume background work.
+- Batch/ETL (Extract–Transform–Load) patterns:
+  - Time windows aligned to low‑carbon forecasts
+  - Bounded retries with DLQs (Dead‑Letter Queues) for safety
+- Governance:
+  - Budgets/alerts for cost and emissions
+  - Documented RTO/RPO when shifting regions/windows
+
+> [!NOTE]
+> Quotas, capacity, cost, and carbon are coupled constraints. Make deployments portable (IaC), keep a ranked list of viable regions/SKUs, and wire alerts so you can react before users feel it.
+
 <!-- START BADGE -->
 <div align="center">
   <img src="https://img.shields.io/badge/Total%20views-1341-limegreen" alt="Total views">