diff --git a/.github/skills/network/SKILL.md b/.github/skills/network/SKILL.md
index ada055e..a90f347 100644
--- a/.github/skills/network/SKILL.md
+++ b/.github/skills/network/SKILL.md
@@ -133,9 +133,9 @@ The network module automatically derives a PE pool from all subnets whose name s
- **Explicit `pe_subnet_key` in tenant config** (`var.tenants[key].pe_subnet_key`) — **ALWAYS set**, validated at plan time
- Resolution is strict: invalid/missing key in the shared PE pool fails at plan time (no silent fallback)
-Each tenant creates up to 5 PEs (Key Vault, AI Search, Cosmos DB, Document Intelligence, Speech Services). All PEs for a tenant land on the **same** subnet ("tenant affinity"). Storage Account has no PE (public access in Landing Zone).
+Each tenant creates up to 5 PEs (Key Vault, AI Search, Cosmos DB, Document Intelligence, Speech Services) that consume up to 6 IPs in total, because the Cosmos DB PE takes 2 IPs (sql global + canadacentral regional endpoint). All PEs for a tenant land on the **same** subnet ("tenant affinity"). Storage Account has no PE (public access in Landing Zone).
-Shared stack PEs (AI Foundry Hub, Language Service, Hub Key Vault) always use the primary `privateendpoints-subnet` (~4-5 PEs).
+Shared stack PEs always use the primary `privateendpoints-subnet` — consuming exactly **5 IPs**: AI Foundry Hub PE (3 IPs: cognitiveservices, openai, services.ai sub-resources), Language Service PE (1 IP), Hub Key Vault PE (1 IP).
### PE Subnet Assignment Strategy
@@ -143,9 +143,9 @@ Shared stack PEs (AI Foundry Hub, Language Service, Hub Key Vault) always use th
**Capacity math:**
- Each `/24` PE subnet holds ~251 usable IPs (Azure reserves 5)
-- Each tenant consumes up to 5 PE IPs → ~50 tenants per `/24` subnet
-- Shared stack consumes ~5 PEs on primary subnet (reducing tenant capacity to ~49 on primary)
-- Prod has 3 PE subnets → theoretical max ~148 tenants
+- Each tenant consumes up to 6 PE IPs (Cosmos DB = 2) → ~41 tenants per `/24` subnet
+- Shared stack consumes exactly 5 IPs on primary subnet: Foundry Hub 3 IPs (AIServices kind exposes cognitiveservices + openai + services.ai) + Language Service 1 IP + Hub KV 1 IP → reduces tenant capacity to ~41 on primary (246 ÷ 6)
+- Prod has 3 PE subnets → theoretical max ~123 tenants
**Assignment rules for new tenants:**
1. Check current PE count per subnet (Azure Portal → subnet → Connected devices, or `az network vnet subnet show`)
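The capacity math in this hunk can be checked with a few lines of Python. This is a minimal sketch, assuming the figures stated above (5 Azure-reserved addresses per subnet, up to 6 PE IPs per tenant, 5 shared-stack IPs on the primary subnet); the function and constant names are hypothetical and not part of the module.

```python
AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def usable_ips(prefix_len: int) -> int:
    """Usable addresses in a single subnet of the given prefix length."""
    return 2 ** (32 - prefix_len) - AZURE_RESERVED_PER_SUBNET


def tenant_capacity(prefix_len: int, ips_per_tenant: int = 6, shared_ips: int = 0) -> int:
    """Whole tenants that fit after subtracting shared-stack PE IPs."""
    return (usable_ips(prefix_len) - shared_ips) // ips_per_tenant


# Primary subnet also carries the shared stack: Foundry Hub 3 + Language 1 + Hub KV 1.
primary = tenant_capacity(24, shared_ips=5)
# Overflow subnets (privateendpoints-subnet-1, -2) carry tenant PEs only.
overflow = tenant_capacity(24)
prod_total = primary + 2 * overflow

print(primary, overflow, prod_total)
```

Integer division is used on purpose: a tenant whose PEs do not all fit on one subnet cannot be placed there, since tenant affinity keeps every PE on the same subnet.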
diff --git a/docs/_pages/diagrams.html b/docs/_pages/diagrams.html
index 4ed9e27..df6135a 100644
--- a/docs/_pages/diagrams.html
+++ b/docs/_pages/diagrams.html
@@ -402,7 +402,7 @@
What's Included / Not
IP Budget Breakdown
-
Detailed IP allocation: base infrastructure, per-tenant consumption, 50 IP calculation
+
Detailed IP allocation: base infrastructure, per-tenant consumption (~6 IPs each), capacity math (~41 tenants per /24 PE subnet)
@@ -424,7 +424,7 @@
Networking Architecture (Detailed)
Network Environments
-
All 4 VNets (prod, test, dev, tools) with subnet allocations and NSG rules
+
3 VNets (prod, test, dev) with subnet allocations and NSG rules
@@ -646,7 +646,7 @@
Network Architecture
Network Environments
Complete environment layout:
-
All 4 VNets (da4cf6-prod/test/dev/tools)
+
3 VNets (da4cf6-prod/test/dev) — tools VNet is a separate peered spoke (CI/CD only, not in this allocation)
Subnet allocations per environment
Canada Central (prod) vs Canada East (non-prod)
Client connectivity via App Gateway + APIM
diff --git a/docs/assets/ip-budget-breakdown.svg b/docs/assets/ip-budget-breakdown.svg
index 770aa4b..14c10cc 100644
--- a/docs/assets/ip-budget-breakdown.svg
+++ b/docs/assets/ip-budget-breakdown.svg
@@ -16,7 +16,7 @@
BC Gov AI Hub - IP Budget Breakdown
- How we calculated 50 IPs per tenant resource group
+ Actual PE consumption: ~6 IPs per tenant (Cosmos DB uses 2 IPs: sql global + canadacentral regional)
@@ -25,12 +25,12 @@
PROD VNet
- /22 = 1,024 IPs
+ 4×/24 = 1,024 IPs
TEST VNet
- /23 = 512 IPs
+ 2×/24 = 512 IPs
@@ -38,19 +38,19 @@
/24 = 256 IPs
-
- TOOLS VNet
- /24 = 256 IPs
-
- TOTAL
- 2,048 IPs
+ 1,792 IPs
-
+ USABLE*
- ~1,800 IPs
+ ~1,550 IPs
+
+
+
+ TOOLS VNet
+ External peered spoke — CI/CD only
@@ -72,43 +72,43 @@
- AppGatewaySubnet
+ appgw-subnet
/26
64
59
- WAF instances (~15 used)
+ App Gateway WAF v2 · No delegation
- APIMSubnet
- /26
- 64
- 59
- Internal APIM (~12 used)
+ apim-subnet
+ /25
+ 128
+ 123
+ APIM internal mode · Web/serverFarms
-
+
- AIFoundryCompute
- /25
- 128
- 123
- Compute instances, training
+ aca-subnet
+ /27
+ 32
+ 27
+ Container Apps Env · App/environments
-
+
- PrivateLinkSubnet
- /24
- 256
- 251
- TENANT PRIVATE ENDPOINTS
+ privateendpoints-subnet (×3 /24s)
+ 3×/24
+ 768
+ 753
+ TENANT PRIVATE ENDPOINTS (~123 tenants)
PROD TOTAL
- /22
+ 4×/24
1,024
- ~999
- *Azure reserves 5 per subnet
+ ~962
+ *Azure reserves 5 per subnet (6 subnets)
@@ -118,98 +118,83 @@
App Gateway (WAF instances, frontend IPs):
- ~15 IPs
+ ~5 IPs
- APIM Internal Mode (units, management):
- ~12 IPs
+ APIM internal mode (VNet injection):
+ ~5 IPs
- Shared Platform Private Endpoints:
- ~10 IPs
- (TF state, shared KV, ACR)
+ Shared stack Private Endpoints (on primary PE subnet):
+ ~5 IPs
+ (Foundry Hub 3 IPs + Lang Svc 1 + Hub KV 1)
- AI Foundry Hub (shared compute pool):
- ~20 IPs
+ Container Apps Env (key rotation job):
+ ~5 IPs
- Azure Reserved (5 per subnet × 5 subnets):
- ~25 IPs
+ Azure Reserved (5 per subnet × 7 subnets):
+ ~35 IPs
- TOTAL SHARED INFRASTRUCTURE: ~82 IPs consumed
+ TOTAL SHARED INFRASTRUCTURE: ~55 IPs consumed
-
-
-
- TOOLS VNET - PLATFORM TEAM ONLY
+
+
+
+ PE SUBNET CAPACITY (WHERE TENANT PEs LIVE)
- Subnet
+ PE Subnet
CIDR
IPs
- Used
- Purpose
+ Usable
+ Tenant Capacity (~6 IPs each)
- AzureBastionSubnet
- /26
- 64
- ~5
- Admin access (platform team)
+ privateendpoints-subnet (prod)
+ /24
+ 256
+ 251
+ ~41 tenants (246 IPs ÷ 6 per tenant; 5 shared: Foundry Hub 3 + Lang 1 + KV 1)
- JumpboxSubnet
- /28
- 16
- ~2
- 1-2 VMs for emergency access
+ privateendpoints-subnet-1 (prod)
+ /24
+ 256
+ 251
+ ~41 tenants
- RunnersSubnet
- /26
- 64
- ~10
- GitHub Actions (via Chisel proxy)
-
-
- ToolsPrivateLink
- /26
- 64
- ~8
- TF state, shared secrets
-
-
- TOOLS TOTAL
- /24
- 256
- ~25
- NOT for tenant use
+ privateendpoints-subnet-2 (prod)
+ /24
+ 256
+ 251
+ ~41 tenants
+
+
+ PROD PE TOTAL
+ 3×/24
+ 753
+ ~123
+ max tenants in prod (6 IPs each)
- AVAILABLE FOR TENANT PRIVATE ENDPOINTS
+ CAPACITY: ~6 IPs PER TENANT
- PrivateLink Subnet Capacity (where tenant PEs live):
+ Each tenant's private endpoints (all land on one PE subnet — tenant affinity):
- PROD PrivateLink Subnet (/24):
- 256 IPs - 5 reserved - 10 shared =
- ~241 IPs
-
- TEST PrivateLink Subnet (/25):
- 128 IPs - 5 reserved - 5 shared =
- ~118 IPs
-
- DEV PrivateLink Subnet (/26):
- 64 IPs - 5 reserved - 5 shared =
- ~54 IPs
+ Key Vault PE: 1 IP | AI Search PE: 1 IP | Cosmos DB PE: 2 IPs
+ Doc Intelligence PE: 1 IP | Speech Services PE: 1 IP | (Cosmos: sql global + canadacentral regional)
+ Storage Account: NO private endpoint (public access via Landing Zone)
- TOTAL AVAILABLE FOR TENANT PRIVATE ENDPOINTS
- ~413 IPs across all environments
+ UP TO 6 IPs PER TENANT → ~41 TENANTS PER /24 PE SUBNET
+ PROD max ~123 tenants (3 PE subnets × ~41)
@@ -228,119 +213,99 @@
- ALWAYS INCLUDED:
+ ALWAYS INCLUDED (up to 5 PEs = 6 IPs total per tenant):
- Storage Account
- blob, file, queue, table
- 4 IPs
+ Key Vault
+ vault endpoint
+ 1 IP
- Key Vault
- vault
+ AI Search (dedicated)
+ searchService
1 IP
- AI Foundry Project
- workspace
+ Cosmos DB
+ sql global + canadacentral regional
2 IPs
- Azure OpenAI
+ Document Intelligence
cognitiveservices
1 IP
-
- BASE MINIMUM
- 8 IPs
-
-
-
- CONDITIONAL (if dedicated, not shared):
-
-
- AI Search (dedicated)
- searchService
- 3-5 IPs
-
-
- Document Intelligence (dedicated)
- cognitiveservices
- 1-2 IPs
-
-
- Cosmos DB
- sql, mongodb
- 3-5 IPs
-
-
- Container Registry (dedicated)
- registry
- 2-3 IPs
+
+ Speech Services
+ cognitiveservices
+ 1 IP
+
+
+ Storage Account
+ PUBLIC ACCESS — no PE
+ 0 IPs
+
+
+ MAXIMUM PER TENANT
+ 6 IPs
- Example Tenant Scenarios:
+ Example Tenant Scenarios (PE IPs consumed):
-
- MINIMAL: Basic RAG Chatbot
- Storage + KV + Foundry + OpenAI + Shared Search
- Buffer for growth
- = 8 IPs
- = 7 IPs
-
- ~15 IPs
+
+ MINIMAL: KV only
+ Key Vault PE only
+ = 1 IP
+
+ 1 IP
-
- STANDARD: Full AI Solution
- Base minimum
- + Dedicated AI Search + Cosmos DB
- + Doc Intelligence + Buffer
- = 8 IPs
- = 8 IPs
- = 12 IPs
-
- ~28 IPs
+
+ STANDARD: KV + Search + Cosmos
+ Key Vault + AI Search + Cosmos DB (2 IPs)
+ = 4 IPs
+
+ 4 IPs
+ + Doc Intelligence
+ = 5 IPs
+
+ 5 IPs
-
- FULL: Extended Enterprise
- Standard allocation
- + Container Apps Env
- + ACR + Additional buffer
- = 28 IPs
- = 10 IPs
- = 12 IPs
-
- ~50 IPs
+
+ FULL: All 5 PEs (6 IPs) deployed
+ KV + AI Search + Cosmos (×2) + Doc Intel + Speech
+ = 6 IPs
+
+ 6 IPs MAX
-
+
- WHY 50 IPs PER TENANT?
+ WHY ~41 TENANTS PER /24?
Calculation:
- PROD PrivateLink available: ~241 IPs
- Target: 5 ministries in PROD
- 241 ÷ 5 = ~48 IPs per tenant
- Rounded to 50 for clean allocation
+ Each /24 PE subnet: 256 - 5 reserved = 251 usable
+ Each tenant: up to 6 PE IPs (KV, Search, Cosmos×2, DocIntel, Speech)
+ Primary subnet: 251 - 5 (Foundry Hub 3 + Lang 1 + KV 1) = 246 ÷ 6 → ~41 tenants
+ PE overflow subnets: ~41 tenants each (251 ÷ 6; no shared stack PEs)
- Why 50 works:
- ✓ Covers minimal scenario (15 IPs) with room
- ✓ Covers standard scenario (28 IPs) comfortably
- ⚠ Extended scenario needs increase request
+ Prod capacity (3 PE subnets):
+ ✓ ~41 (primary) + ~41 + ~41 = ~123 max tenants
+ ✓ Storage Account has NO PE (saves IPs per tenant)
+ ✓ pe_subnet_key is sticky — assign once, never change
- Policy:
- Default: 50 IPs per tenant resource group
- Increase: Submit capacity plan with justification
+ To expand capacity:
+ Add privateendpoints-subnet-3 /24 (subnet name pattern)
+ Add to params/prod/shared.tfvars subnet_allocation map
@@ -353,68 +318,61 @@
-
- Shared Infra
- ~82 IPs
-
-
-
- Tools
- ~25
+
+ Shared
+ ~55
-
- WLRS
- 50 IPs
+
+ WLRS
-
- SDPR
- 50 IPs
+
+ SDPR
-
- NR-DAP
- 50 IPs
+
+ NR
-
- Future
- 50 IPs
+
+ T4
-
- Future
- 50 IPs
+
+ T5
+
+
+ GROWTH CAPACITY — ~118 more tenants (each ~6 IPs)
+ Each tick above = 6 IPs. 3 PE subnets × ~251 IPs → ~123 max tenants in PROD.
-
- RESERVE / GROWTH CAPACITY
- ~143 IPs remaining for additional tenants or expansions
+
+ GROWTH CAPACITY
+ ~118 more tenants remaining (3 PE subnets × ~41 minus current 5)
- PROD PRIVATELINK MATH
- Total: 256 - Reserved: 5 - Shared: 10 = 241 usable
- 5 tenants × 50 IPs = 250 IPs allocated
- NOTE: Slightly over! May need /23 for PrivateLink
+ PROD PE CAPACITY MATH
+ 3 PE subnets × (251 usable / 6 IPs per tenant) = ~123 tenants
+ Currently 3 tenants deployed → ~120 tenant slots remain
+ Storage Account excluded: public access, no PE needed.
EXPANSION OPTIONS
- 1. Increase PrivateLink subnet to /23 (512 IPs)
- 2. Use shared services more aggressively
- 3. Request IP quota increase with justification
+ 1. Add privateendpoints-subnet-3 /24 (+~41 tenants)
+ 2. Add more address spaces to subnet_allocation in tfvars
+ 3. Pool naming: privateendpoints-subnet-N (N starting at 1)
KEY TAKEAWAY
- 50 IPs/tenant is based on real capacity planning.
- Share services by default. Dedicated = justify the IPs.
- Extended deployments may need increase request.
+ Storage Account has NO PE (public access in Landing Zone).
+ Capacity: ~41 tenants per /24 PE subnet.
@@ -438,14 +396,14 @@
2^(32-22) = 2^10
1,024
~999
- PROD VNet (large)
+ PROD VNet total (4×/24 address spaces)
/23
2^(32-23) = 2^9
512
~507
- TEST VNet / large subnets
+ TEST VNet (2×/24 = 512 IPs total)
/24
@@ -459,28 +417,28 @@
2^(32-25) = 2^7
128
~123
- AI Foundry Compute
+ apim-subnet (PROD)
/26
2^(32-26) = 2^6
64
~59
- App Gateway, APIM, Bastion
+ appgw-subnet (PROD)
/27
2^(32-27) = 2^5
32
~27
- Small service subnet
+ aca-subnet; TEST/DEV workload; DEV PE
/28
2^(32-28) = 2^4
16
~11
- Jumpbox subnet (minimal)
+ Minimal — not used in this landing zone
@@ -489,41 +447,41 @@
HOW MANY TENANTS CAN WE SUPPORT?
- Calculation per environment (50 IPs/tenant baseline):
+ Calculation per environment (up to 6 IPs/tenant):
- PROD (/22 = 1,024 IPs)
- PrivateLink subnet /24 = 256 IPs → 256 - 5 reserved - 10 shared = 241 usable
- 241 ÷ 50 IPs/tenant =
- 4-5 tenants
- (or 8 minimal tenants, 3 extended tenants)
+ PROD (4×/24 = 1,024 IPs)
+ 3 PE subnets: 251 usable each → 246 ÷ 6 (primary, after 5 shared-stack IPs) + 251 ÷ 6 + 251 ÷ 6
+ Total capacity:
+ ~123 tenants
+ (3 currently deployed → ~120 slots remain)
- TEST (/23 = 512 IPs)
- PrivateLink subnet /25 = 128 IPs → 128 - 5 reserved - 5 shared = 118 usable
- 118 ÷ 50 IPs/tenant =
- 2-3 tenants
- (most tenants share TEST env anyway)
+ TEST (2×/24 = 512 IPs)
+ PE subnet /24 (dedicated) = 251 usable → ~5 IPs shared stack → 246 for tenants
+ 246 ÷ 6 IPs/tenant =
+ ~41 tenants
+ (typically fewer tenants share test)
- DEV (/24 = 256 IPs)
- PrivateLink subnet /26 = 64 IPs → 64 - 5 reserved - 5 shared = 54 usable
- 54 ÷ 50 IPs/tenant =
- 1 tenant
- (DEV is for platform team / single dev tenant)
+ DEV (1×/24 = 256 IPs)
+ PE subnet /27 = 27 usable → ~5 IPs shared stack → 22 for tenants
+ 22 ÷ 6 IPs/tenant =
+ ~3 tenants
+ (DEV is for platform team; small PE subnet /27)
Notes:
* Azure reserves 5 IPs per subnet (network address, gateway, broadcast, 2× Azure DNS). These are NOT usable for resources.
- * Private endpoints consume 1 IP each. Multi-endpoint services (Storage with blob+file+queue+table) consume multiple IPs.
- * Container Apps Environments may use 10-20 IPs. Plan capacity before provisioning conditional services.
- * Shared services (Doc Intelligence, AI Search) can serve multiple tenants via RBAC - this SAVES IPs. Dedicated instances = consume tenant's IP budget.
- * To support more tenants: (1) Expand PrivateLink subnet to /23, (2) Encourage shared services over dedicated, (3) Request IP increase with justification.
+ * Most private endpoints consume 1 IP each; the Cosmos DB PE consumes 2 (sql global + canadacentral regional). Storage Account has NO private endpoint (public access in Landing Zone).
+ * Container Apps (aca-subnet /27) hosts the key rotation job; it does NOT host per-tenant AI workloads.
+ * Each tenant uses up to 5 PEs (6 IPs): Key Vault, AI Search, Cosmos DB (2 IPs), Document Intelligence, Speech Services.
+ * To add more tenant capacity: add privateendpoints-subnet-N address spaces to subnet_allocation in shared.tfvars params.
* CIDR Quick Reference: Each step up (e.g., /24 → /23) DOUBLES the IPs. Each step down (e.g., /24 → /25) HALVES the IPs.
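The example tenant scenarios in the IP budget diagram reduce to summing the IPs of whichever private endpoints a tenant enables. A minimal Python sketch, assuming the per-PE IP counts stated in this diff; the dictionary keys and the `tenant_ips` helper are hypothetical names chosen for illustration.

```python
# IPs consumed by each tenant private endpoint, as stated in the diagram.
# Storage Account is absent on purpose: it has no PE (public access).
PE_IPS = {
    "key_vault": 1,
    "ai_search": 1,
    "cosmos_db": 2,  # sql global + canadacentral regional endpoint
    "document_intelligence": 1,
    "speech_services": 1,
}


def tenant_ips(enabled) -> int:
    """Sum the PE IPs for the services a tenant enables."""
    return sum(PE_IPS[name] for name in enabled)


minimal = tenant_ips(["key_vault"])
standard = tenant_ips(["key_vault", "ai_search", "cosmos_db"])
standard_plus = tenant_ips(["key_vault", "ai_search", "cosmos_db", "document_intelligence"])
full = tenant_ips(PE_IPS)  # iterating the dict yields every service name

print(minimal, standard, standard_plus, full)
```

Planning with the full figure rather than the per-scenario figure is the conservative choice, since a tenant's subnet assignment cannot change after first deploy.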
diff --git a/docs/assets/network-architecture.svg b/docs/assets/network-architecture.svg
index 27ab5c8..4c9c4fc 100644
--- a/docs/assets/network-architecture.svg
+++ b/docs/assets/network-architecture.svg
@@ -204,10 +204,11 @@
Shared Stack Private Endpoints
-
- AI Foundry Hub
- OpenAI models (Canada East)
- cognitiveservices
+
+ AI Foundry Hub
+ OpenAI models (Canada East)
+ cognitiveservices · openai
+ services.ai · 3 IPs
APIM PE
@@ -277,8 +278,8 @@
100 IN Allow 10.x.136.0/24 → 10.x.136.0/24 (self)
101 IN Allow 10.x.82.0/24 → 10.x.136.0/24 (infra subnets)
- 300 IN Allow 10.x.115.0/24 → PE subnet (tools VNet)
- 200-301 OUT Allow PE ↔ VNet + PE ↔ Tools VNet
+ 300 IN Allow peered VNets → PE subnet (external projects)
+ 200-301 OUT Allow PE ↔ VNet address spaces
diff --git a/docs/assets/network-environments.svg b/docs/assets/network-environments.svg
index e0adc26..438cfbf 100644
--- a/docs/assets/network-environments.svg
+++ b/docs/assets/network-environments.svg
@@ -1,4 +1,4 @@
-
diff --git a/docs/diagrams.html b/docs/diagrams.html
index d9f3b7c..66c2175 100644
--- a/docs/diagrams.html
+++ b/docs/diagrams.html
@@ -1480,7 +1480,7 @@
What's Included / Not
IP Budget Breakdown
-
Detailed IP allocation: base infrastructure, per-tenant consumption, 50 IP calculation
+
Detailed IP allocation: base infrastructure, per-tenant consumption (~6 IPs each), capacity math (~41 tenants per /24 PE subnet)
@@ -1502,7 +1502,7 @@
Networking Architecture (Detailed)
Network Environments
-
All 4 VNets (prod, test, dev, tools) with subnet allocations and NSG rules
+
3 VNets (prod, test, dev) with subnet allocations and NSG rules
@@ -1724,7 +1724,7 @@
Network Architecture
Network Environments
Complete environment layout:
-
All 4 VNets (da4cf6-prod/test/dev/tools)
+
3 VNets (da4cf6-prod/test/dev) — tools VNet is a separate peered spoke (CI/CD only, not in this allocation)
Subnet allocations per environment
Canada Central (prod) vs Canada East (non-prod)
Client connectivity via App Gateway + APIM
diff --git a/infra-ai-hub/README.md b/infra-ai-hub/README.md
index a6a4afb..97bbab2 100644
--- a/infra-ai-hub/README.md
+++ b/infra-ai-hub/README.md
@@ -898,7 +898,7 @@ Before creating the tenant config, determine which PE subnet to assign:
2. **Prod**: Check current PE count per subnet to find the one with most remaining capacity:
- Azure Portal → VNet → Subnets → each `privateendpoints-subnet*` → "Connected devices" count
- Or CLI: `az network vnet subnet show --resource-group <resource-group> --vnet-name <vnet-name> --name privateendpoints-subnet --query 'ipConfigurations | length(@)'`
-3. Each `/24` PE subnet holds ~251 usable IPs; each tenant uses up to 5 PEs
+3. Each `/24` PE subnet holds ~251 usable IPs; each tenant uses up to 6 PE IPs (Cosmos DB = 2 IPs: sql global + canadacentral regional)
4. Record the chosen key (e.g., `privateendpoints-subnet-1`) — it is **immutable** after first deploy
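The selection in steps 2 to 4 can be sketched in Python once the per-subnet IP counts have been collected by hand or from the CLI query. This is an illustration under stated assumptions: the counts below are made-up sample values, `pick_pe_subnet` is a hypothetical helper, and the count for the primary subnet is assumed to already include the shared-stack PEs.

```python
USABLE_PER_24 = 251      # 256 addresses minus the 5 Azure reserves
IPS_PER_TENANT = 6       # up to 5 PEs; Cosmos DB takes 2 IPs


def pick_pe_subnet(used_by_subnet: dict, ips_needed: int = IPS_PER_TENANT) -> str:
    """Return the PE subnet key with the most remaining IPs.

    Raises ValueError when no subnet can hold a whole tenant, because
    tenant affinity requires all of a tenant's PEs on one subnet.
    """
    remaining = {name: USABLE_PER_24 - used for name, used in used_by_subnet.items()}
    best = max(remaining, key=remaining.get)
    if remaining[best] < ips_needed:
        raise ValueError("no PE subnet has room; add another privateendpoints-subnet-N")
    return best


# Hypothetical counts, e.g. gathered with the `az network vnet subnet show` query.
sample_counts = {
    "privateendpoints-subnet": 120,
    "privateendpoints-subnet-1": 60,
    "privateendpoints-subnet-2": 90,
}
chosen = pick_pe_subnet(sample_counts)
print(chosen)
```

The returned key is what would be recorded as `pe_subnet_key` in the tenant config.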
#### 1. Create Tenant Configuration File