Skip to content

refactor(nemesis): Remove Class:N count syntax from nemesis_class_name, enforce explicit list format#14256

Open
cezarmoise wants to merge 1 commit intoscylladb:masterfrom
cezarmoise:parallel-nemesis
Open

refactor(nemesis): Remove Class:N count syntax from nemesis_class_name, enforce explicit list format#14256
cezarmoise wants to merge 1 commit intoscylladb:masterfrom
cezarmoise:parallel-nemesis

Conversation

@cezarmoise
Copy link
Copy Markdown
Contributor

@cezarmoise cezarmoise commented Mar 30, 2026

Parallel nemesis runs using the space-separated nemesis_class_name format (e.g. 'SisyphusMonkey:1 SisyphusMonkey:1') were silently broken after the pydantic config migration (aa9cd75).

See https://argus.scylladb.com/tests/scylla-cluster-tests/1b6ed4e9-2c22-4153-94c4-3081d46c28d1/nemesis
Only 1 nemesis thread ran.

Summary

Replaces the implicit count/space-separated syntax for running parallel nemesis threads with an
explicit YAML list. Adds validation so misconfigurations are caught at config-load time rather
than producing silent surprises at runtime.

Problem

The old nemesis_class_name syntax accepted three formats simultaneously:

# Single string with count — one entry expanded to N threads
nemesis_class_name: "SisyphusMonkey:2"

# Space-separated string — split into multiple threads
nemesis_class_name: "SisyphusMonkey SisyphusMonkey"

# Explicit list (already worked, never formally documented)
nemesis_class_name: ["SisyphusMonkey", "SisyphusMonkey"]

This made nemesis_selector and nemesis_seed assignment opaque: if you had two threads and
one selector, was the selector applied to thread 0 only, or broadcast? The code used
selectors[i % len(class_names)], which silently cycled through selectors in ways that were
hard to reason about.

Changes

sdcm/tester.pyget_nemesis_class()

  • The Class:N count expansion loop is removed.
  • Explicit ValueError is raised if any entry in nemesis_class_name contains : (count
    syntax) or a space (space-separated format), with a clear migration message.
  • Selector/seed assignment semantics are simplified and made explicit:
    • empty / not set → empty string / None per thread
    • single-element list → broadcast to all threads
    • list of length N (matching nemesis_class_name) → 1:1 mapping, enforced by config validator

sdcm/sct_config.py

  • nemesis_class_name field description updated to document the new list-only API and
    explicitly deprecate the old syntax.
  • New _validate_nemesis_parallel_config() method validates that multi-element
    nemesis_selector and nemesis_seed lists match the number of entries in
    nemesis_class_name. Mismatches raise ValueError with a helpful error message at
    config-load time. Called unconditionally from verify_configuration().

sdcm/nemesis/__init__.pyNemesisRunner.nemesis_selector

  • __init__ parameter typed as Optional[str] = None.
  • Property uses None as the sentinel for "not explicitly set" (replaces the old truthiness
    check that treated "" identically to None).
    • None → falls back to cluster.params.get("nemesis_selector"), enabling direct
      instantiation paths such as sct.py nemesis-list to pick up the config value.
    • "" → explicitly means "run all nemeses, no filtering".
    • Any non-empty string → used as-is.
  • Config fallback handles both list (takes [0]) and str/None return types from params.

sdcm/utils/operations_thread.py

  • Removed the isinstance(nemesis_seed, str) branch (the IntOrList type already resolves
    seeds to int or list[int]; string seeds no longer reach this code).
  • Fixed the Random() seed guard from truthiness to is not None so that seed value 0 is
    not silently discarded.

YAML test-case files converted

All occurrences of Class:N and space-separated nemesis_class_name are replaced with
explicit YAML lists. Two files also had nemesis_seed values in the old space-separated
string format, converted to proper YAML lists/integers.

Migration guide

Before:

nemesis_class_name: "SisyphusMonkey:2"
# or
nemesis_class_name: "SisyphusMonkey SisyphusMonkey"

After:

nemesis_class_name: ["SisyphusMonkey", "SisyphusMonkey"]
# yaml list syntax also works
nemesis_class_name:
  - SisyphusMonkey
  - SisyphusMonkey

If you use nemesis_selector or nemesis_seed with multiple threads, the valid forms are:

# Broadcast the same selector to all threads
nemesis_selector: "not disruptive"

# Or map one selector per thread (must match nemesis_class_name length exactly)
nemesis_selector: ["not disruptive", "topology_changes"]

Any mismatch in list lengths will raise a ValueError at config-load time with a
descriptive message showing both lists.


Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@cezarmoise
Copy link
Copy Markdown
Contributor Author

@claude review this change please

@claude

This comment was marked as outdated.

@scylladb-promoter
Copy link
Copy Markdown
Collaborator

scylladb-promoter commented Mar 30, 2026

✅ Test Summary: PASSED

✅ Precommit: PASSED

Total Passed Failed Skipped
25 15 0 10

✅ Tests: PASSED

Total Passed Failed Errors Skipped
1713 1698 0 0 15

Full build log

@cezarmoise cezarmoise marked this pull request as draft March 31, 2026 11:53
@cezarmoise
Copy link
Copy Markdown
Contributor Author

Temporarily marked as draft as I want to improve further.

@cezarmoise cezarmoise changed the title fix(nemesis): fix space-separated nemesis_class_name parsing broken b… refactor(nemesis): Remove Class:N count syntax from nemesis_class_name, enforce explicit list format Mar 31, 2026
… format

- nemesis_class_name now accepts only plain class names or explicit YAML
  lists; the former 'Class:N' count syntax and space-separated strings
  both raise ValueError with a migration hint
- nemesis_selector / nemesis_seed: single value broadcasts to all threads;
  N-element list maps 1:1 to class names; mismatched lengths raise
  ValueError at config-load time (_validate_nemesis_parallel_config)
- NemesisRunner.nemesis_selector: None (not set) falls back to config for
  direct-instantiation paths (sct.py nemesis-list); explicit "" means
  "run all nemeses"
- Convert 12 YAML test-case files from count/space syntax to explicit lists
- Update docs and unit tests for new API (14 tests, all passing)

Co-authored-by: Claude <claude@anthropic.com>
gce_n_local_ssd_disk_db: 8

nemesis_class_name: 'SisyphusMonkey'
nemesis_selector: ["topology_changes", "schema_changes and not disruptive"]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was wrong, confirmed by @timtimb0t

@cezarmoise cezarmoise marked this pull request as ready for review March 31, 2026 16:13
@cezarmoise cezarmoise requested a review from fruch as a code owner March 31, 2026 16:13
@cezarmoise cezarmoise requested a review from a team March 31, 2026 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants