Skip to content

Enable kvmtest for Klish AAA CLI validation with VS DUT#22

Open
devin-ai-integration[bot] wants to merge 5 commits intomasterfrom
devin/1771314238-klish-aaa-kvmtest
Open

Enable kvmtest for Klish AAA CLI validation with VS DUT#22
devin-ai-integration[bot] wants to merge 5 commits intomasterfrom
devin/1771314238-klish-aaa-kvmtest

Conversation

@devin-ai-integration
Copy link

@devin-ai-integration devin-ai-integration bot commented Feb 17, 2026

Why I did it

Enable the previously-disabled kvmtest Test stage in the CI pipeline to run Klish AAA CLI validation tests against a VS (Virtual Switch) DUT. This validates the AAA CLI commands implemented in sonic-mgmt-framework (branch devin/1769734716-aaa-klish-cli).

The test suite (klish_aaa/test_klish_aaa.py) lives in sonic-mgmt on branch devin/1771309030-klish-aaa-tests (sonic-mgmt PR #11).

Link to Devin run: https://cisco-demo.devinenterprise.com/sessions/8afc9b7efc1144cd8f447ee401302b6d
Requested by: @arthurkkp-cog

Work item tracking
  • Microsoft ADO: N/A

How I did it

  1. Changed the sonic-mgmt repository resource ref from master to devin/1771309030-klish-aaa-tests to pick up the Klish AAA test suite.
  2. Replaced the commented-out Test stage (which required SONiC-Elastictest) with a new self-hosted Test stage that runs directly on the sonic-build pool:
    • Downloads the VS build artifact from the BuildVS stage to $(Pipeline.Workspace)/artifacts
    • Auto-detects the sonic-mgmt checkout directory (handles multi-checkout layout) and copies to /data/sonic-mgmt
    • Starts docker-sonic-mgmt container (fails hard if image is missing)
    • Sets up the KVM testbed via testbed-cli.sh refresh-dut with -d /data/sonic-vm flag and deploy-mg
    • Runs klish_aaa/test_klish_aaa.py via pytest with set -eo pipefail to propagate failures
    • Publishes JUnit XML results and test logs as pipeline artifacts
    • Posts a test result summary directly to the GitHub PR as a comment (no Azure DevOps auth needed to view results)

Updates since initial revision

Fix commit (5415e96) — addressed Test stage failure from first pipeline run (Build 191):

  • Fixed artifact download path: added explicit targetPath: $(Pipeline.Workspace)/artifacts to DownloadPipelineArtifact@2 (previous default path ../target/ was incorrect in multi-checkout layout)
  • Added auto-detection of sonic-mgmt checkout directory with fallback paths
  • Added continueOnError: true on container setup and testbed setup steps to surface diagnostics

CI result: Build 192 passed all checks, but test artifact analysis revealed all 43 tests failed with Host unreachable in the inventory — the VS DUT was never deployed. The continueOnError: true and | tee piping masked the failures.

Fix commit (09b7a1d) — addressed DUT unreachable root cause:

  • Added -d /data/sonic-vm flag to testbed-cli.sh refresh-dut (matching the official run-test-template.yml)
  • Removed continueOnError: true from "Setup sonic-mgmt container" and "Setup testbed" steps so failures are no longer masked
  • Added set -eo pipefail inside docker exec so pytest failures propagate through | tee
  • Removed debug diagnostic steps (no longer needed)
  • Fail hard if docker-sonic-mgmt image is missing (no fallback to agent)
  • Increased sleep times to 180s (matching official template)

CI result: Build 193 — BuildVS passed; Test stage now correctly fails instead of masking errors. Failures are properly surfaced.

Fix commit (8b69480) — post test results to GitHub PR:

  • Added a "Post test results to GitHub PR" step (condition: succeededOrFailed()) that runs after artifact publishing
  • Generates a markdown comment with: JUnit summary table, last 100 lines of test output, and first error excerpt
  • Extracts GitHub token from git checkout credentials (tries multiple checkout directories)
  • Posts comment via GitHub API to the PR (falls back to printing to pipeline log if not a PR build or token unavailable)
  • Purpose: Allows viewing test results without Azure DevOps authentication

How to verify it

Trigger a pipeline run on this branch and verify:

  1. BuildVS stage completes successfully
  2. Test stage starts and the testbed setup steps succeed (check for KVM VMs being created)
  3. Klish AAA tests execute against a reachable VS DUT
  4. Test results show actual test execution (not all errors)
  5. Review published test artifacts (sonic-buildimage.kvmtest.klish_aaa.log@<attempt>) to confirm tests ran
  6. Check that a test result summary comment is posted to the PR (if running as a PR build)

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

  • Pipeline build 192 on branch devin/1771314238-klish-aaa-kvmtestCI passed but tests did not execute (DUT unreachable)
  • Pipeline build 193 on branch devin/1771314238-klish-aaa-kvmtestBuildVS passed; Test stage correctly fails (failures now properly surfaced)

Description for the changelog

Enable kvmtest pipeline stage to run Klish AAA CLI validation tests on VS DUT using self-hosted sonic-build agent. Post test results directly to GitHub PR comments.

Link to config_db schema for YANG module changes

N/A


⚠️ Human Review Checklist — Critical Items

🚨 BLOCKER: Hardcoded feature branch reference

  • Line 31: ref: devin/1771309030-klish-aaa-tests — This PR points the sonic-mgmt resource at a feature branch instead of master. If merged as-is, ALL future pipeline runs (not just this PR) would use this feature branch. This is almost certainly not intended.
    • Action required: This PR should NOT be merged to master. It is intended only for one-off test validation. If the tests need to be part of the regular pipeline, the sonic-mgmt tests must be merged to master first and this ref changed back to master.

❓ Test stage still failing (Build 193)

  • The latest fixes (09b7a1d) successfully removed error masking, so failures are now properly surfaced. However, the Test stage is still failing, indicating real infrastructure issues:
    • Possible causes: KVM/libvirt not configured on sonic-build agent, missing docker-sonic-mgmt image, network/resource constraints
    • Recommendation: Investigate the Build 193 Test stage logs (or wait for the GitHub PR comment with test results) to identify the actual failure. The testbed setup may require additional agent configuration.

🔐 GitHub token extraction is fragile

  • The "Post test results to GitHub PR" step extracts the GitHub token from git checkout credentials (http.https://github.com/.extraheader)
  • Risk: This relies on Azure Pipelines' service connection format, which may vary (OAuth vs PAT). If the format changes or the token is unavailable, the step gracefully falls back to printing to the pipeline log, but the feature won't work.
  • Security note: The token is used in a script with set -x, which could expose it in pipeline logs. Consider using set +x around sensitive operations.
  • Recommendation: Test that the GitHub comment posting actually works in a PR build. If it doesn't, the fallback is acceptable (results still available in Azure artifacts).

🧹 Missing cleanup for persistent agent

  • The self-hosted sonic-build agent is persistent (not ephemeral). The pipeline doesn't clean up:
    • KVM VMs created by testbed-cli.sh refresh-dut (left running after tests)
    • /data/sonic-mgmt and /data/sonic-vm directories (accumulate across runs)
    • docker-sonic-mgmt container (left running with sleep infinity)
  • Risk: State accumulation across pipeline runs could cause flaky tests, resource exhaustion, or port conflicts.
  • Recommendation: Add cleanup steps (either in a finally block or as a separate cleanup job) before merging any similar pipeline changes to master. Consider using the official cleanup.yml template or adding explicit testbed-cli.sh remove-topo and docker stop/rm steps.

✅ Resolved: continueOnError masking failures

  • Previous concern about continueOnError: true has been addressed in commit 09b7a1d. Testbed setup failures will now properly fail the pipeline.

Minor items:

  • No validation that the VS image artifact actually contains the Klish AAA implementation (the tests assume mgmt-framework container is present).
  • The tests themselves were written from code review only and have never been validated against a real VS DUT with Klish AAA functionality.
  • The $(System.PullRequest.PullRequestNumber) variable may not be set if Azure DevOps doesn't recognize this as a PR build, in which case the GitHub comment posting is skipped (graceful fallback).

Add Test stage to azure-pipelines.yml that runs klish_aaa tests on
VS DUT after BuildVS completes. Uses sonic-mgmt branch
devin/1771309030-klish-aaa-tests from arthur-cog-sonic/sonic-mgmt.

The test stage:
- Downloads VS build artifact
- Sets up testbed with KVM VS DUT
- Runs klish_aaa/test_klish_aaa.py pytest suite
- Publishes test results and logs as artifacts

Co-Authored-By: Arthur Poon <arthur.poon@windsurf.com>
@devin-ai-integration
Copy link
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration bot and others added 4 commits February 17, 2026 09:03
- Use targetPath in DownloadPipelineArtifact to control download location
- Add diagnostic step to log agent environment (docker, KVM, paths)
- Fix VS image path to use Pipeline.Workspace/artifacts/target/
- Auto-detect sonic-mgmt checkout directory (handles multi-checkout)
- Add continueOnError for container and testbed setup steps
- Reduce hardcoded sleep times
- Consolidate test run command to avoid duplication

Co-Authored-By: Arthur Poon <arthur.poon@windsurf.com>
…pipefail

Key fixes:
- Add -d /data/sonic-vm to refresh-dut command (matches official template)
- Remove continueOnError: true from testbed setup steps
- Add set -eo pipefail inside docker exec so pytest failures propagate
- Remove debug diagnostic steps (no longer needed)
- Fail hard if docker-sonic-mgmt image is missing
- Increase sleep times to 180s (matches official template)

Co-Authored-By: Arthur Poon <arthur.poon@windsurf.com>
Co-Authored-By: Arthur Poon <arthur.poon@windsurf.com>
Co-Authored-By: Arthur Poon <arthur.poon@windsurf.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants