@@ -50,33 +50,11 @@ Examples: sv2-long-run-12345, sv2-long-run-67890
 - **Lifecycle**: Can be cleaned up after testing completes
 - **Example**: PR validation run with Build ID 12345 → `sv2-long-run-12345`

-**3. Parallel/Custom Environments**:
-```
-Pattern: sv2-long-run-<region>-<suffix>
-Examples: sv2-long-run-centraluseuap-dev, sv2-long-run-eastus-staging
-```
-- **When to use**: Parallel environments, feature testing, version upgrades
-- **Purpose**: Isolated environment alongside production
-- **Lifecycle**: Persistent or temporary based on use case
-- **Example**: Development environment in Central US EUAP → `sv2-long-run-centraluseuap-dev`
-
 **Important Notes**:
-- ⚠️ Always follow the naming pattern for scheduled runs on master: `sv2-long-run-<region>`
-- ⚠️ Do not use build IDs for production scheduled infrastructure (it breaks continuity)
-- ⚠️ Region name should match the `location` parameter for consistency
-- ✅ All resource names within the setup use the resource group name as BUILD_ID prefix
-
-### Mode 1: Scheduled Test Runs (Default)
-**Trigger**: Automated cron schedule every 1 hour
-**Purpose**: Continuous validation of long-running infrastructure
-**Setup Stages**: Disabled
-**Test Duration**: ~30-40 minutes per run
-**Resource Group**: Static (default: `sv2-long-run-<region>`, e.g., `sv2-long-run-centraluseuap`)
+- Always follow the naming pattern for scheduled runs on master: `sv2-long-run-<region>`
+- Do not use build IDs for production scheduled infrastructure (it breaks continuity)
+- All resource names within the setup use the resource group name as the BUILD_ID prefix

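The prefixing rule can be sketched with a small helper (a hypothetical script, not part of the pipeline; it only assumes the `pn-`/`pni-` prefixes and the `cx_vnet_a1` → `a1` VNet shortening that the generated names in this README exhibit):

```shell
#!/bin/sh
# Hypothetical illustration of the naming scheme; the pipeline template
# composes these names itself - this script is not in the repo.
BUILD_ID="sv2-long-run-centraluseuap"   # the resource group name doubles as BUILD_ID

# VNet names are shortened before use, e.g. cx_vnet_a1 -> a1
short_vnet() { printf '%s\n' "${1#cx_vnet_}"; }

pn_name()  { printf 'pn-%s-%s-%s\n'  "$BUILD_ID" "$(short_vnet "$1")" "$2"; }
pni_name() { printf 'pni-%s-%s-%s\n' "$BUILD_ID" "$(short_vnet "$1")" "$2"; }

pn_name  cx_vnet_a1 s1   # -> pn-sv2-long-run-centraluseuap-a1-s1
pni_name cx_vnet_b1 s1   # -> pni-sv2-long-run-centraluseuap-b1-s1
```

Because the resource group name is the prefix, two setups in different resource groups can never produce colliding resource names.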
-```yaml
-# Runs automatically every 1 hour
-# No manual/external triggers allowed
-```

 ### Mode 2: Initial Setup or Rebuild
 **Trigger**: Manual run with parameter change
@@ -120,15 +98,6 @@ Parameters are organized by usage:
 |-----------|---------|-------------|
 | `resourceGroupName` | `""` (empty) | **Leave empty** to auto-generate based on usage pattern. See Resource Group Naming Conventions below. |

-**Resource Group Naming Conventions**:
-- **For scheduled runs on master/main branch**: Use `sv2-long-run-<region>` (e.g., `sv2-long-run-centraluseuap`)
-  - This ensures consistent naming for production scheduled tests
-  - Example: Creating infrastructure in `centraluseuap` for scheduled runs → `sv2-long-run-centraluseuap`
-- **For test/dev runs or PR validation**: Use `sv2-long-run-$(Build.BuildId)`
-  - Auto-cleanup after testing
-  - Example: `sv2-long-run-12345` (where 12345 is the build ID)
-- **For parallel environments**: Use a descriptive suffix (e.g., `sv2-long-run-centraluseuap-dev`, `sv2-long-run-eastus-staging`)
-
 **Note**: VM SKUs are hardcoded as constants in the pipeline template:
 - Default nodepool: `Standard_D4s_v3` (low-nic capacity, 1 NIC)
 - NPLinux nodepool: `Standard_D16s_v3` (high-nic capacity, 7 NICs)
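For illustration, such constants might be declared like this in the template (a sketch only; the actual variable names in the template are not shown in this README and are assumptions — only the SKU values come from the note above):

```yaml
# Hypothetical variable names - only the SKU values are from this README
variables:
  DEFAULT_NODEPOOL_VM_SKU: Standard_D4s_v3   # low-nic capacity, 1 NIC
  NPLINUX_NODEPOOL_VM_SKU: Standard_D16s_v3  # high-nic capacity, 7 NICs
```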
@@ -161,49 +130,42 @@ The pipeline is organized into stages based on workload type, allowing sequentia
 ### Future Stages (Planned Architecture)
 Additional stages can be added to test different workload types sequentially:

-**Example: Stage 3 - BYONodeDataPathTests**
+**Example: Stage 3 - LinuxBYONodeDataPathTests**
 ```yaml
-- stage: BYONodeDataPathTests
+- stage: LinuxBYONodeDataPathTests
   displayName: "SwiftV2 Data Path Tests - BYO Node ID"
   dependsOn: ManagedNodeDataPathTests
   variables:
-    WORKLOAD_TYPE: "swiftv2-byonodeid"
+    WORKLOAD_TYPE: "swiftv2-linuxbyon"
     # Same job structure as ManagedNodeDataPathTests
     # Tests run on nodes labeled: workload-type=swiftv2-byonodeid
 ```

-**Example: Stage 4 - WindowsNodeDataPathTests**
+**Example: Stage 4 - L1vhAccelnetNodeDataPathTests**
 ```yaml
-- stage: WindowsNodeDataPathTests
-  displayName: "SwiftV2 Data Path Tests - Windows Nodes"
+- stage: L1vhAccelnetNodeDataPathTests
+  displayName: "SwiftV2 Data Path Tests - Windows Nodes Accelnet"
   dependsOn: BYONodeDataPathTests
   variables:
     WORKLOAD_TYPE: "swiftv2-windows"
     # Same job structure
     # Tests run on nodes labeled: workload-type=swiftv2-windows
 ```

-**Benefits of Stage-Based Architecture**:
-- ✅ Sequential execution: Each workload type tested independently
-- ✅ Isolated node pools: No resource contention between workload types
-- ✅ Same infrastructure: All stages use the same VNets, storage, NSGs
-- ✅ Same test suite: Connectivity and private endpoint tests run for each workload type
-- ✅ Easy extensibility: Add new stages without modifying existing ones
-- ✅ Clear results: Separate test results per workload type
-
 **Node Labeling for Multiple Workload Types**:
 Each node pool gets labeled with its designated workload type during setup:
 ```bash
 # During cluster creation or node pool addition:
-kubectl label nodes -l agentpool=nodepool1 workload-type=swiftv2-linux
-kubectl label nodes -l agentpool=byonodepool workload-type=swiftv2-byonodeid
-kubectl label nodes -l agentpool=winnodepool workload-type=swiftv2-windows
+kubectl label nodes -l agentpool=<nodepool-name> workload-type=swiftv2-linux
+kubectl label nodes -l agentpool=<nodepool-name> workload-type=swiftv2-linuxbyon
+kubectl label nodes -l agentpool=<nodepool-name> workload-type=swiftv2-l1vhaccelnet
+kubectl label nodes -l agentpool=<nodepool-name> workload-type=swiftv2-l1vhib
 ```
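With those labels in place, a workload can be pinned to its pool with a `nodeSelector` (illustrative sketch; the pod name and image are made up — only the label key/value comes from the setup above):

```yaml
# Hypothetical pod pinned to the swiftv2-linux pool via the node label
apiVersion: v1
kind: Pod
metadata:
  name: example-swiftv2-linux-pod   # hypothetical name
spec:
  nodeSelector:
    workload-type: swiftv2-linux    # matches the label applied above
  containers:
    - name: http-server
      image: nginx                  # placeholder image
```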

 ## How It Works

 ### Scheduled Test Flow
-Every 1 hour, the pipeline:
+Every 3 hours, the pipeline:
 1. Skips setup stages (infrastructure already exists)
 2. **Job 1 - Create Resources**: Creates 8 test scenarios (PodNetwork, PNI, Pods with HTTP servers on port 8080)
 3. **Job 2 - Connectivity Tests**: Tests HTTP connectivity between pods (9 test cases), then waits 20 minutes
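Conceptually, each connectivity check in Job 2 boils down to something like the following (hypothetical command, not the pipeline's actual test code; the target IP is a placeholder, and only the pod name and port 8080 come from this README):

```
# Probe one pod's HTTP server from another pod (placeholder target IP)
kubectl exec pod-c1-aks1-a1s1-low -- curl -s --max-time 5 http://<target-pod-ip>:8080
```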
@@ -361,142 +323,6 @@ pod-c1-aks1-a1s1-low

 **All infrastructure resources are tagged with `SkipAutoDeleteTill=2032-12-31`** to prevent automatic cleanup by Azure subscription policies.

-## Resource Naming
-
-All test resources use the pattern: `<type>-static-setup-<vnet>-<subnet>`
-
-**Examples**:
-- PodNetwork: `pn-static-setup-a1-s1`
-- PodNetworkInstance: `pni-static-setup-a1-s1`
-- Pod: `pod-c1-aks1-a1s1-low`
-- Namespace: `pn-static-setup-a1-s1`
-
-VNet names are simplified:
-- `cx_vnet_a1` → `a1`
-- `cx_vnet_b1` → `b1`
-
-## Switching to a New Setup
-
-**Scenario**: You created a new setup in RG `sv2-long-run-eastus` and want scheduled runs to use it.
-
-**Steps**:
-1. Go to Pipeline → Edit
-2. Update the `location` parameter default value:
-```yaml
-- name: location
-  default: "centraluseuap"  # Change this
-```
-3. Save and commit
-4. The RG name will automatically become `sv2-long-run-eastus`
-
-Alternatively, manually trigger with the new location or override `resourceGroupName` directly.
-
-## Creating Multiple Test Setups
-
-**Use Case**: You want to create a new test environment without affecting the existing one (e.g., for testing different configurations, regions, or versions).
-
-**Steps**:
-1. Go to Pipeline → Run pipeline
-2. Set `runSetupStages` = `true`
-3. **Set `resourceGroupName`** based on usage:
-   - **For scheduled runs on master/main branch**: `sv2-long-run-<region>` (e.g., `sv2-long-run-centraluseuap`, `sv2-long-run-eastus`)
-     - Use this naming pattern for production scheduled tests
-   - **For test/dev runs**: `sv2-long-run-$(Build.BuildId)` or custom (e.g., `sv2-long-run-12345`)
-     - For temporary testing or PR validation
-   - **For parallel environments**: Custom with descriptive suffix (e.g., `sv2-long-run-centraluseuap-dev`, `sv2-long-run-centraluseuap-v2`)
-4. Optionally adjust `location`
-5. Run pipeline
-
-**After setup completes**:
-- The new infrastructure will be tagged with `SkipAutoDeleteTill=2032-12-31`
-- Resources are isolated by the unique resource group name
-- To run tests against the new setup, the scheduled pipeline would need to be updated with the new RG name
-
-**Example Scenarios**:
-| Scenario | Resource Group Name | Purpose | Naming Pattern |
-|----------|---------------------|---------|----------------|
-| Production scheduled (Central US EUAP) | `sv2-long-run-centraluseuap` | Daily scheduled tests on master | `sv2-long-run-<region>` |
-| Production scheduled (East US) | `sv2-long-run-eastus` | Regional scheduled testing on master | `sv2-long-run-<region>` |
-| Temporary test run | `sv2-long-run-12345` | One-time testing (Build ID: 12345) | `sv2-long-run-$(Build.BuildId)` |
-| Development environment | `sv2-long-run-centraluseuap-dev` | Development/testing | Custom with suffix |
-| Version upgrade testing | `sv2-long-run-centraluseuap-v2` | Parallel environment for upgrades | Custom with suffix |
-
-## Resource Naming
-instead of ping use
-The pipeline uses the **resource group name as the BUILD_ID** to ensure unique resource names per test setup. This allows multiple parallel test environments without naming collisions.
-
-**Generated Resource Names**:
-```
-BUILD_ID = <resourceGroupName>
-
-PodNetwork:         pn-<BUILD_ID>-<vnet>-<subnet>
-PodNetworkInstance: pni-<BUILD_ID>-<vnet>-<subnet>
-Namespace:          pn-<BUILD_ID>-<vnet>-<subnet>
-Pod:                pod-<scenario-suffix>
-```
-
-**Example for `resourceGroupName=sv2-long-run-centraluseuap`**:
-```
-pn-sv2-long-run-centraluseuap-b1-s1   (PodNetwork for cx_vnet_b1, subnet s1)
-pni-sv2-long-run-centraluseuap-b1-s1  (PodNetworkInstance)
-pn-sv2-long-run-centraluseuap-a1-s1   (PodNetwork for cx_vnet_a1, subnet s1)
-pni-sv2-long-run-centraluseuap-a1-s2  (PodNetworkInstance for cx_vnet_a1, subnet s2)
-```
-
-**Example for a different setup, `resourceGroupName=sv2-long-run-eastus`**:
-```
-pn-sv2-long-run-eastus-b1-s1  (Different from the centraluseuap setup)
-pni-sv2-long-run-eastus-b1-s1
-pn-sv2-long-run-eastus-a1-s1
-```
-
-This ensures **no collision** between different test setups running in parallel.
-
-## Deletion Strategy
-### Phase 1: Delete All Pods
-Deletes all pods across all scenarios first. This ensures IP reservations are released.
-
-```
-Deleting pod pod-c2-aks2-b1s1-low...
-Deleting pod pod-c2-aks2-b1s1-high...
-...
-```
-
-### Phase 2: Delete Shared Resources
-Groups resources by vnet/subnet/cluster and deletes the PNI/PN/Namespace once per group.
-
-```
-Deleting PodNetworkInstance pni-static-setup-b1-s1...
-Deleting PodNetwork pn-static-setup-b1-s1...
-Deleting namespace pn-static-setup-b1-s1...
-```
-
-**Why**: Multiple pods can share the same PNI. Deleting a PNI while pods exist causes "ReservationInUse" errors.
-
-## Troubleshooting
-
-### Tests are running on the wrong cluster
-- Check that the `resourceGroupName` parameter points to the correct RG
-- Verify the RG contains the aks-1 and aks-2 clusters
-- Check kubeconfig retrieval in the logs
-
-### Setup stages not running
-- Verify the `runSetupStages` parameter is set to `true`
-- Check the condition: `condition: eq(${{ parameters.runSetupStages }}, true)`
-
-### Schedule not triggering
-- Verify the cron expression: `"0 */1 * * *"` (every 1 hour)
-- Check that the branch in the schedule matches your working branch
-- Ensure `always: true` is set (runs even without code changes)
-
-### PNI stuck with "ReservationInUse"
-- Check if pods were deleted first (Phase 1 logs)
-- Manual fix: Delete pod → Wait 10s → Patch the PNI to remove finalizers
-
-### Pipeline timeout after 6 hours
-- This is expected behavior (timeoutInMinutes: 360)
-- Tests should complete in ~30-40 minutes
-- If tests hang, check the deletion logs for stuck resources

 ## Manual Testing

@@ -544,21 +370,6 @@ kubectl label nodes -l agentpool=nodepool1 nic-capacity=low-nic --overwrite
 kubectl label nodes -l agentpool=nplinux nic-capacity=high-nic --overwrite
 ```

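To confirm the capacity labels landed on the expected nodes, a quick query against the cluster can help (hypothetical spot-check, not part of the pipeline):

```
# Assumes kubeconfig already points at aks-1 or aks-2
kubectl get nodes -l nic-capacity=high-nic --show-labels
```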
-**Example Node Labels**:
-```yaml
-# Low-NIC node (nodepool1)
-labels:
-  agentpool: nodepool1
-  workload-type: swiftv2-linux
-  nic-capacity: low-nic
-
-# High-NIC node (nplinux)
-labels:
-  agentpool: nplinux
-  workload-type: swiftv2-linux
-  nic-capacity: high-nic
-```
-
 ### Node Selection in Tests

 Tests use these labels to select appropriate nodes dynamically:
@@ -588,20 +399,6 @@ Tests use these labels to select appropriate nodes dynamically:

 **Note**: VM SKUs are hardcoded as constants in the pipeline template and cannot be changed by users.

-## Schedule Modification
-
-To change test frequency, edit the cron schedule:
-
-```yaml
-schedules:
-- cron: "0 */1 * * *"  # Every 1 hour (current)
-# Examples:
-# - cron: "0 */2 * * *"    # Every 2 hours
-# - cron: "0 */6 * * *"    # Every 6 hours
-# - cron: "0 0,8,16 * * *" # At 12am, 8am, 4pm
-# - cron: "0 0 * * *"      # Daily at midnight
-```
-
 ## File Structure

 ```
@@ -639,23 +436,3 @@ test/integration/swiftv2/longRunningCluster/
    - Storage accounts
 5. **Avoid resource group collisions**: Always use a unique `resourceGroupName` when creating new setups
 6. **Document changes**: Update this README when modifying test scenarios or infrastructure
-
-## Resource Tags
-
-All infrastructure resources are automatically tagged during creation:
-
-```bash
-SkipAutoDeleteTill=2032-12-31
-```
-
-This prevents automatic cleanup by Azure subscription policies that delete resources after a certain period. The tag is applied to:
-- Resource group (via the create_resource_group job)
-- AKS clusters (aks-1, aks-2)
-- AKS cluster VNets
-- Customer VNets (cx_vnet_a1, cx_vnet_a2, cx_vnet_a3, cx_vnet_b1)
-- Storage accounts (sa1xxxx, sa2xxxx)
-
-To manually update the tag date:
-```bash
-az resource update --ids <resource-id> --set tags.SkipAutoDeleteTill=2033-12-31
-```