Add SwiftV2 long-running pipeline with scheduled tests #4142

sivakami-projects · 2025-11-24T16:53:34Z

Pipeline to run repeated tests on long running Swiftv2 AKS clusters.

Testing Approach
Test Lifecycle (per stage):
Create 8 pod scenarios with PodNetwork, PodNetworkInstance, Pods
Run 9 connectivity tests (HTTP-based)
Run private endpoint tests (storage access)
Delete all resources (Phase 1: Pods, Phase 2: PNI/PN/Namespaces)

Node Selection:
Tests filter by workload-type=$WORKLOAD_TYPE AND nic-capacity labels
Ensures isolation between different workload type stages
Currently: WORKLOAD_TYPE=swiftv2-linux

Files Changed
Pipeline Configuration
pipeline.yaml: Main pipeline with schedule trigger
long-running-pipeline-template.yaml: Stage definitions with VM SKU constants

Setup Scripts
create_aks.sh: AKS cluster creation with node labeling
create_vnets.sh: Customer VNet creation
create_peerings.sh: VNet peering mesh
create_storage.sh: Storage accounts with public access disabled (SA1 only)
create_nsg.sh: NSG rule application with retry logic
create_pe.sh: Private endpoint and DNS zone setup

Test Code
datapath.go: Enhanced with node label filtering, private endpoint testing
datapath_create_test.go: Resource creation scenarios
datapath_connectivity_test.go: HTTP connectivity validation
datapath_private_endpoint_test.go: Private endpoint access/isolation tests
datapath_delete_test.go: Resource cleanup

Documentation
README.md:

Reason for Change:

Issue Fixed:

Requirements:

uses conventional commit messages
includes documentation
adds unit tests
relevant PR labels added

Notes:

- Implemented scheduled pipeline running every 1 hour with persistent infrastructure - Split test execution into 2 jobs: Create (with 20min wait) and Delete - Added 8 test scenarios across 2 AKS clusters, 4 VNets, different subnets - Implemented two-phase deletion strategy to prevent PNI ReservationInUse errors - Added context timeouts on kubectl commands with force delete fallbacks - Resource naming uses RG name as BUILD_ID for uniqueness across parallel setups - Added SkipAutoDeleteTill tags to prevent automatic resource cleanup - Conditional setup stages controlled by runSetupStages parameter - Auto-generate RG name from location or allow custom names for parallel setups - Added comprehensive README with setup instructions and troubleshooting - Node selection by agentpool labels with usage tracking to prevent conflicts - Kubernetes naming compliance (RFC 1123) for all resources fix ginkgo flag. Add datapath tests. Delete old test file. Add testcases for provate endpoint. Ginkgo run specs only on specified files. update pipeline params. Add ginkgo tags Add datapath tests. Add ginkgo build tags. remove wait time. set namespace. update pod image. Add more nsg rules to block subnets s1 and s2 test change. Change delegated subnet address range. Use delegated interface for network connectivity tests. Datapath test between clusters. test. test private endpoints. fix private endpoint tests. Set storage account names in putput var. set storage account name. fix pn names. update pe update pe test. update sas token generation. Add node labels for sw2 scenario, cleanup pods on any test failure. enable nsg tests. update storage. Add rules to nsg. disable private endpoint negative test. disable public network access on storage account with private endpoint. wait for default nsg to be created. disable negative test on private endpoint. private endpoint depends on aks cluster vnets, change pipeline job dependencies. Add node labels for each workload type and nic capacity. make sku constant. Update readme, set schedule for long running cluster on test branch.

Copilot

Pull request overview

This PR adds a comprehensive long-running test pipeline for SwiftV2 pod networking on Azure AKS. The pipeline creates persistent infrastructure (2 AKS clusters, 4 VNets, storage accounts with private endpoints, NSGs) and runs scheduled tests every 3 hours to validate pod-to-pod connectivity, network security group isolation, and private endpoint access across multi-tenant scenarios.

Key Changes:

Adds scheduled pipeline with conditional infrastructure setup (runSetupStages parameter)
Implements 8 pod test scenarios across 2 clusters and 4 VNets with different NIC capacities
Includes 9 connectivity tests and 5 private endpoint tests with tenant isolation validation

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
.pipelines/swiftv2-long-running/pipeline.yaml	Main pipeline with 3-hour scheduled trigger and runSetupStages parameter
.pipelines/swiftv2-long-running/template/long-running-pipeline-template.yaml	Two-stage template: setup (conditional) and datapath tests with 4 jobs
.pipelines/swiftv2-long-running/scripts/*.sh	Infrastructure setup scripts for AKS, VNets, storage, NSGs, and private endpoints
test/integration/swiftv2/longRunningCluster/datapath*.go	Test implementation split into create, connectivity, private endpoint, and delete tests
test/integration/swiftv2/helpers/az_helpers.go	Azure CLI and kubectl helper functions for resource management
test/integration/manifests/swiftv2/long-running-cluster/*.yaml	Kubernetes resource templates for PodNetwork, PNI, and Pods
go.mod, go.sum	Updates to support Ginkgo v2 testing framework
hack/aks/Makefile	Updates for SwiftV2 cluster creation with multi-tenancy tags
.pipelines/swiftv2-long-running/README.md	Comprehensive documentation of pipeline architecture and test scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-24T17:14:47Z

go.mod

+go 1.24.0
+
+toolchain go1.24.10


The Go version specification is inconsistent. Line 3 declares go 1.24.0 but line 5 specifies a toolchain of go1.24.10. Note that Go 1.24 doesn't exist yet (the current stable version as of January 2025 is Go 1.22). This appears to be an invalid version that will likely cause build failures. Consider using a valid Go version like 1.22.5 (which is used in the pipeline) or 1.23.x.

Suggested change

go 1.24.0

toolchain go1.24.10

go 1.22.5

toolchain go1.22.5

Copilot · 2025-11-24T17:14:47Z

.pipelines/swiftv2-long-running/scripts/create_vnets.sh

+    cmd_delegator_curl="'curl -X PUT http://localhost:8080/DelegatedSubnet/$modified_custsubnet'"
+    cmd_containerapp_exec="az containerapp exec -n subnetdelegator-westus-u3h4j -g subnetdelegator-westus --subscription 9b8218f9-902a-4d20-a65c-e98acec5362f --command $cmd_delegator_curl"


Hardcoded credentials and subscription IDs in the script. The script contains a hardcoded subscription ID 9b8218f9-902a-4d20-a65c-e98acec5362f and references a specific container app subnetdelegator-westus-u3h4j in resource group subnetdelegator-westus. These hardcoded values make the script non-portable and could expose sensitive information. Consider parameterizing these values or using environment variables.

Copilot · 2025-11-24T17:14:48Z

.pipelines/swiftv2-long-running/scripts/create_aks.sh

+    responseFile="response.txt"
+    modified_vnet="${vnet_id//\//%2F}"
+    cmd_stamp_curl="'curl -v -X PUT http://localhost:8080/VirtualNetwork/$modified_vnet/stampcreatorservicename'"
+    cmd_containerapp_exec="az containerapp exec -n subnetdelegator-westus-u3h4j -g subnetdelegator-westus --subscription 9b8218f9-902a-4d20-a65c-e98acec5362f --command $cmd_stamp_curl"


Same hardcoded credentials issue. The script contains hardcoded subscription ID 9b8218f9-902a-4d20-a65c-e98acec5362f and references to subnetdelegator-westus-u3h4j container app. Consider parameterizing these values.

Copilot · 2025-11-24T17:14:48Z

.pipelines/swiftv2-long-running/scripts/create_storage.sh

+# # Disable public network access ONLY on SA1 (Tenant A storage with private endpoint)
+# echo "==> Disabling public network access on $SA1"
+# az storage account update \
+#   --name "$SA1" \
+#   --resource-group "$RG" \
+#   --public-network-access Disabled \
+#   --output none \
+#   && echo "[OK] Public network access disabled on $SA1"


Commented-out code should be removed. Lines 74-81 contain commented-out code for disabling public network access on storage accounts. If this functionality is not needed, remove it. If it's planned for future use, add a TODO comment explaining why it's disabled and when it should be re-enabled.

Copilot · 2025-11-24T17:14:48Z

test/integration/swiftv2/longRunningCluster/datapath_create_test.go

+	rg := os.Getenv("RG")
+	buildId := os.Getenv("BUILD_ID")
+
+	if rg == "" || buildId == "" {
+		ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))
+	}
+
+	ginkgo.It("creates PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {


Missing error handling for the ginkgo.Fail() calls at the module level. The environment variable validation (lines 25-27) executes at test suite initialization time, outside of the ginkgo.It() block. If these checks fail, the test suite won't properly handle the error. Move this validation inside the ginkgo.It() block (like in datapath_private_endpoint_test.go lines 30-36) for proper error handling.

Suggested change

rg := os.Getenv("RG")

buildId := os.Getenv("BUILD_ID")

if rg == "" || buildId == "" {

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))

}

ginkgo.It("creates PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {

ginkgo.It("creates PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {

rg := os.Getenv("RG")

buildId := os.Getenv("BUILD_ID")

if rg == "" || buildId == "" {

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))

}

Copilot · 2025-11-24T17:14:49Z

test/integration/swiftv2/longRunningCluster/datapath_delete_test.go

+	rg := os.Getenv("RG")
+	buildId := os.Getenv("BUILD_ID")
+
+	if rg == "" || buildId == "" {
+		ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))
+	}
+
+	ginkgo.It("deletes PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {


Same issue as in datapath_create_test.go. Environment variable validation at lines 25-27 happens outside the test block and won't be properly handled. Move this validation inside the ginkgo.It() block for consistent error handling.

Suggested change

rg := os.Getenv("RG")

buildId := os.Getenv("BUILD_ID")

if rg == "" || buildId == "" {

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))

}

ginkgo.It("deletes PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {

ginkgo.It("deletes PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {

rg := os.Getenv("RG")

buildId := os.Getenv("BUILD_ID")

if rg == "" || buildId == "" {

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))

}

Copilot · 2025-11-24T17:14:49Z

test/integration/swiftv2/longRunningCluster/datapath_connectivity_test.go

+	rg := os.Getenv("RG")
+	buildId := os.Getenv("BUILD_ID")
+
+	if rg == "" || buildId == "" {
+		ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))
+	}
+
+	ginkgo.It("tests HTTP connectivity between pods", ginkgo.NodeTimeout(0), func() {


Same environment variable validation issue. Lines 25-29 perform validation outside the test block. Move inside the ginkgo.It() block for proper error handling.

Suggested change

rg := os.Getenv("RG")

buildId := os.Getenv("BUILD_ID")

if rg == "" || buildId == "" {

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))

}

ginkgo.It("tests HTTP connectivity between pods", ginkgo.NodeTimeout(0), func() {

ginkgo.It("tests HTTP connectivity between pods", ginkgo.NodeTimeout(0), func() {

rg := os.Getenv("RG")

buildId := os.Getenv("BUILD_ID")

if rg == "" || buildId == "" {

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))

}

Copilot · 2025-11-24T17:14:49Z

test/integration/swiftv2/longRunningCluster/datapath_delete_test.go

@@ -0,0 +1,117 @@
+// +build delete_test


Inconsistent build tag syntax. Line 1 uses the old // +build format while line 2 uses the new //go:build format. According to Go 1.17+ conventions, you should only use the //go:build directive. The old // +build format can be removed.

Suggested change

// +build delete_test

//go:build delete_test

sivakami added 2 commits November 24, 2025 08:38

Update readme file.

873c05e

sivakami-projects marked this pull request as ready for review November 24, 2025 17:07

sivakami-projects requested a review from a team as a code owner November 24, 2025 17:07

sivakami-projects requested review from Copilot and ravisraju November 24, 2025 17:07

Copilot started reviewing on behalf of sivakami-projects November 24, 2025 17:07 View session

Copilot finished reviewing on behalf of sivakami-projects November 24, 2025 17:09

Copilot AI reviewed Nov 24, 2025

View reviewed changes

fix syntax for pe test.

3395415

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add SwiftV2 long-running pipeline with scheduled tests #4142

Add SwiftV2 long-running pipeline with scheduled tests #4142

Uh oh!

sivakami-projects commented Nov 24, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Copilot AI Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		cmd_delegator_curl="'curl -X PUT http://localhost:8080/DelegatedSubnet/$modified_custsubnet'"
		cmd_containerapp_exec="az containerapp exec -n subnetdelegator-westus-u3h4j -g subnetdelegator-westus --subscription 9b8218f9-902a-4d20-a65c-e98acec5362f --command $cmd_delegator_curl"

Add SwiftV2 long-running pipeline with scheduled tests #4142

Are you sure you want to change the base?

Add SwiftV2 long-running pipeline with scheduled tests #4142

Uh oh!

Conversation

sivakami-projects commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sivakami-projects commented Nov 24, 2025 •

edited

Loading