TestTranslSubscribe is flaky due to hardcoded port 8081 conflict #613

@hdwhdw

Summary

TestTranslSubscribe fails approximately 22% of the time in CI (Azure Pipelines), causing nearly all PRs to fail regardless of code correctness.

Symptom

The test panics or errors immediately with:

bind: address already in use

This occurs before any subtest executes, inside createServer(t, 8081) at the parent TestTranslSubscribe level.
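The error can be reproduced outside the test suite. The following is a minimal sketch, not code from the repository: `bindTwice` is a hypothetical helper that binds a loopback port and then tries to bind the same address again while the first listener is still open. It uses an OS-assigned port rather than 8081 so that it does not itself depend on 8081 being free, and the `syscall.EADDRINUSE` check assumes a Unix-like system such as the Linux CI agents.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// bindTwice (hypothetical helper) listens on an OS-assigned loopback port,
// then tries to bind the same address again while the first listener is
// still open. It returns the error from the second attempt, or nil if the
// second bind unexpectedly succeeds.
func bindTwice() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("first listen failed: %w", err)
	}
	defer first.Close()

	// Same address and port as the listener that is still open.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return nil
	}
	return err
}

func main() {
	err := bindTwice()
	fmt.Println("second bind error:", err)
	// On Unix-like systems the underlying errno should be EADDRINUSE.
	fmt.Println("is EADDRINUSE:", errors.Is(err, syscall.EADDRINUSE))
}
```

Any process or earlier server instance still holding 8081 puts createServer(t, 8081) in the position of the second net.Listen call here.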

Root Cause

The root cause is a hardcoded port 8081 passed to createServer(t, 8081) at line 41 of gnmi_server/transl_sub_test.go. When a prior test run leaves the port in TIME_WAIT or the socket has not yet been released by the OS, the new server fails to bind.

Contributing factors:

  1. Long-running SAMPLE subtests: several subtests use SAMPLE mode with 25-second intervals, so the gRPC server lingers for tens of seconds after a subtest ends. The defer s.s.Stop() at the parent level fires only when all subtests complete, so the port stays in use until the parent exits.

  2. Incomplete Redis flush: prepareDbTranslib(t) flushes only one Redis DB. If a prior test leaves state in a different DB number that translib touches during ACL writes, subsequent test runs may behave unexpectedly, which may make a race during server teardown more likely.

  3. Single shared server: all 15 active subtests share one createServer call, so a single port conflict fails the entire test function immediately. There is no per-subtest server creation and therefore no partial fallback.

Affected Tests

All subtests under TestTranslSubscribe:

  • origin=openconfig, origin=sonic-db, and ~13 more active subtests covering ONCE, POLL, STREAM, ON_CHANGE, SAMPLE, and TARGET_DEFINED subscription modes.

Proposed Fix

Temporary (this PR): Skip the test with a t.Skip() linking back to this issue, allowing PRs to pass CI while the root cause is tracked.

Permanent: Allocate a dynamic port using net.Listen("tcp", ":0") in createServer instead of hardcoding 8081, eliminating the port-reuse race entirely. This requires updating all callers.
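A minimal sketch of the dynamic-port approach follows. It is an illustration under assumptions, not the repository's code: `listenOnFreePort` is a hypothetical helper, and how createServer would consume the listener or port is not specified in this issue. The sketch binds to "127.0.0.1:0" rather than ":0" so that it listens on loopback only; both forms ask the OS to pick a free port.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnFreePort (hypothetical helper) asks the OS for an unused TCP port
// by binding to port 0, and returns the open listener together with the
// port number that was assigned.
func listenOnFreePort() (net.Listener, int, error) {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	// For a "tcp" listener, Addr() is a *net.TCPAddr carrying the chosen port.
	port := lis.Addr().(*net.TCPAddr).Port
	return lis, port, nil
}

func main() {
	a, portA, err := listenOnFreePort()
	if err != nil {
		panic(err)
	}
	defer a.Close()

	// A second listener opened while the first is still held gets its own port.
	b, portB, err := listenOnFreePort()
	if err != nil {
		panic(err)
	}
	defer b.Close()

	fmt.Printf("server A on port %d, server B on port %d\n", portA, portB)
}
```

If the listener is closed and only the port number is passed on, another process could claim the port before the server binds it again. Keeping the listener open and handing it to the server avoids that window; grpc-go's Server.Serve accepts a net.Listener, so this should be possible if createServer is changed to take one.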

Labels

bug (Something isn't working)
