fix(monitoring): avoid port conflicts on concurrent restores #14239

Draft
dimakr wants to merge 1 commit into scylladb:master from dimakr:docker_provision_test_fail

Conversation

@dimakr
Contributor

@dimakr dimakr commented Mar 29, 2026

When Docker provision tests restore the monitoring stack concurrently, they race on container names and ports. Each retry picks new ports, leaving orphan containers from earlier attempts that hold the Prometheus DB lock, so all subsequent attempts fail.
This change selects the ports once, before the retry loop, and force-removes containers left by prior attempts so that each retry starts from a clean state.

Fixes: SCT-179
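
The retry pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `pick_free_ports`, `restore_with_retries`, the `start_stack` and `force_remove` callables, and the default service names are all hypothetical and chosen only for the example. In a real setup `force_remove` would presumably wrap something like `docker rm -f <name>`.

```python
import contextlib
import socket
from typing import Callable, Dict, Iterable, List, Optional


def pick_free_ports(count: int) -> List[int]:
    """Ask the OS for `count` distinct, currently unused TCP ports (hypothetical helper)."""
    with contextlib.ExitStack() as stack:
        ports = []
        for _ in range(count):
            # Keep every socket open until all ports are chosen so none repeats.
            sock = stack.enter_context(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            sock.bind(("127.0.0.1", 0))
            ports.append(sock.getsockname()[1])
        return ports


def restore_with_retries(
    start_stack: Callable[[Dict[str, int]], None],
    force_remove: Callable[[str], None],
    container_names: Iterable[str],
    services: Iterable[str] = ("prometheus", "grafana", "alertmanager"),  # illustrative names
    attempts: int = 3,
) -> Dict[str, int]:
    """Start the stack, retrying on failure with the same ports every time."""
    service_list = list(services)
    names = list(container_names)
    # Select ports once, before the retry loop, so every attempt reuses them.
    ports = dict(zip(service_list, pick_free_ports(len(service_list))))
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        # Force-remove leftovers from a prior attempt so nothing holds a lock or port.
        for name in names:
            force_remove(name)
        try:
            start_stack(ports)
            return ports
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError(f"restore failed after {attempts} attempts") from last_error
```

Passing the start and remove steps in as callables keeps the sketch independent of any particular Docker client. The ports are only reserved while they are being chosen, so another process could still claim one before the containers bind to it.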

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@dimakr dimakr self-assigned this Mar 29, 2026
@dimakr dimakr added the backport/none (Backport is not required), test-provision-docker (Run provision test on Docker (Scylla)), and test-provision-vs-docker (Run provision test for Vector Store on Docker) labels Mar 29, 2026
@github-actions github-actions bot added the P2 High Priority label Mar 29, 2026
@scylladb-promoter
Collaborator

scylladb-promoter commented Mar 29, 2026

✅ Test Summary: PASSED

✅ Precommit: PASSED

| Total | Passed | Failed | Skipped |
|-------|--------|--------|---------|
| 24    | 14     | 0      | 10      |

✅ Tests: PASSED

| Total | Passed | Failed | Errors | Skipped |
|-------|--------|--------|--------|---------|
| 1705  | 1690   | 0      | 0      | 15      |

Full build log

@dimakr dimakr force-pushed the docker_provision_test_fail branch 3 times, most recently from 3bccf2a to a4dd19c Compare March 30, 2026 09:01
@dimakr dimakr force-pushed the docker_provision_test_fail branch from a4dd19c to 4a4b861 Compare March 30, 2026 12:58
@dimakr dimakr requested review from a team and fruch March 30, 2026 13:24