apalan60
Contributor

Purpose

Allow running multiple Kafka System Test clusters on the same host without conflicts.


Background

  • ducker-ak up creates cluster.json and publishes the debug port as 5678:5678 (host:container) on the first node.
  • Only one cluster can run per host: a second ducker-ak up fails because the network and container names are fixed and the host debug port is static.

Key Changes

Prefix isolation

  • Usage: --prefix <string> CLI option or DUCKER_PREFIX environment variable to set a unique cluster prefix.
  • Effects when prefix is provided:
    • Network: <prefix>-ducknet
    • Containers: <prefix>-ducker01, <prefix>-ducker02, ...
    • Cluster file: cluster.<prefix>.json; tests read the matching file.
    • Debug port: with a prefix, the first node of the cluster is assigned a host port automatically when running ducker-ak up or ducker-ak test. In this auto-assigned mode, the chosen host port is written to debug-port.<prefix>.txt. If a specific port is required, it can be set using the --debug-port CLI option or the DUCKER_DEBUGPY_PORT environment variable.
  • Without prefix: behavior is unchanged (5678:5678, cluster.json, ducknet, ducker01, ducker02, ...).
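
The naming rules above can be summarized in a small sketch. This is illustrative Python, not the ducker-ak implementation (which is a bash script); `ClusterResources` and `resource_names` are hypothetical names, and the two-digit node suffix is inferred from names such as ducker01 in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ClusterResources:
    network: str
    containers: List[str]
    cluster_file: str
    debug_port_file: Optional[str]  # None when no prefix is set


def resource_names(prefix: Optional[str], num_nodes: int) -> ClusterResources:
    """Derive the resource names for a cluster from an optional prefix."""
    # An empty or missing prefix keeps the legacy, unprefixed names.
    name_prefix = f"{prefix}-" if prefix else ""
    containers = [f"{name_prefix}ducker{i:02d}" for i in range(1, num_nodes + 1)]
    return ClusterResources(
        network=f"{name_prefix}ducknet",
        containers=containers,
        cluster_file=f"cluster.{prefix}.json" if prefix else "cluster.json",
        debug_port_file=f"debug-port.{prefix}.txt" if prefix else None,
    )
```

Calling `resource_names("cluster1", 14)` would describe the first cluster used in the examples in this conversation, and `resource_names(None, 14)` the legacy layout.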

Compatibility

| Case | Debug port mapping (host:container) | Cluster file | Network | Container names | Debug port file |
|---|---|---|---|---|---|
| No prefix | 5678:5678 | cluster.json | ducknet | ducker01, ducker02 | none |
| Prefix | random:5678 | cluster.<prefix>.json | <prefix>-ducknet | <prefix>-ducker01, <prefix>-ducker02 | debug-port.<prefix>.txt |

@github-actions github-actions bot added the triage PRs from the community label Sep 29, 2025
@apalan60
Contributor Author

For easier step-by-step review, I’ve summarized some usage examples along with the test results below.


1. Run Multiple Clusters

Bring up two clusters (cluster1, cluster2)

bash tests/docker/ducker-ak up --prefix cluster1
bash tests/docker/ducker-ak up --prefix cluster2

Expected outcome:
Both clusters run independently, each with its own network, containers, and configuration files.
You will see:

  • cluster.cluster1.json, debug-port.cluster1.txt, node_hosts.cluster1
  • cluster.cluster2.json, debug-port.cluster2.txt, node_hosts.cluster2

Verification

echo "=== docker ps ==="
docker ps --format "table {{.ID}}\t{{.Ports}}\t{{.Names}}"

echo
echo "=== build directory ==="
ls tests/docker/build/

Output:

=== docker ps ===
CONTAINER ID   PORTS                                           NAMES
6681d068d87e                                                   cluster2-ducker14
a9416f9c1beb                                                   cluster2-ducker13
c5b5fb11d321                                                   cluster2-ducker12
ca7e7077fcd7                                                   cluster2-ducker11
65326c9654bb                                                   cluster2-ducker10
28ec3873812c                                                   cluster2-ducker09
8f0ca237aeda                                                   cluster2-ducker08
6584ce88a80b                                                   cluster2-ducker07
ee4ca868c125                                                   cluster2-ducker06
ab68dc1dfa7e                                                   cluster2-ducker05
b656463543ff                                                   cluster2-ducker04
037221c7430d                                                   cluster2-ducker03
412acfb157ea                                                   cluster2-ducker02
2ebd75e99cef   0.0.0.0:55001->5678/tcp, [::]:55001->5678/tcp   cluster2-ducker01
9c85b2bf32a2                                                   cluster1-ducker14
e28f7ae1e5d9                                                   cluster1-ducker13
d83e20ef08f0                                                   cluster1-ducker12
e6c5581cd3da                                                   cluster1-ducker11
48d32edc8ce4                                                   cluster1-ducker10
0ec0bae72056                                                   cluster1-ducker09
f4170adca696                                                   cluster1-ducker08
67c39990397d                                                   cluster1-ducker07
dc2481e4a00b                                                   cluster1-ducker06
8196fad921c2                                                   cluster1-ducker05
fc102b1e97be                                                   cluster1-ducker04
8ec48e2b7ce1                                                   cluster1-ducker03
56f899483461                                                   cluster1-ducker02
db112a0f296c   0.0.0.0:55000->5678/tcp, [::]:55000->5678/tcp   cluster1-ducker01

=== build directory ===
cluster.cluster1.json   cluster.cluster2.json   debug-port.cluster1.txt debug-port.cluster2.txt node_hosts.cluster1     node_hosts.cluster2
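
As a cross-check against debug-port.<prefix>.txt, the host debug port can also be read from the PORTS column shown above. A minimal sketch, assuming the column keeps the `host:port->5678/tcp` shape seen in this output; `host_port_for` is a hypothetical helper, not part of ducker-ak.

```python
import re
from typing import Optional


def host_port_for(ports_column: str, container_port: int = 5678) -> Optional[int]:
    """Return the host port published for container_port, or None if absent."""
    # Matches entries such as "0.0.0.0:55001->5678/tcp" or "[::]:55001->5678/tcp".
    pattern = re.compile(r":(\d+)->" + str(container_port) + r"/tcp")
    match = pattern.search(ports_column)
    return int(match.group(1)) if match else None
```

Nodes other than the first have an empty PORTS column, so the helper returns None for them.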

2. Run Tests Against Each Cluster

Open two terminals, and run tests against each cluster:

# Terminal 1 (against cluster1)
bash tests/docker/ducker-ak test tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop --prefix cluster1
# Terminal 2 (against cluster2)
bash tests/docker/ducker-ak test tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop --prefix cluster2

Expected outcome:
Tests run in parallel on their respective clusters, confirming isolation.

Verification

Output:

Cluster1

> Configure project :
...
docker exec cluster1-ducker01 bash -c "cd /opt/kafka-dev && ducktape --cluster-file /opt/kafka-dev/tests/docker/build/cluster.cluster1.json  ./tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop "
/usr/local/lib/python3.9/dist-packages/paramiko/pkey.py:82: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.
  "cipher": algorithms.TripleDES,
/usr/local/lib/python3.9/dist-packages/paramiko/transport.py:260: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.
  "class": algorithms.TripleDES,
[INFO:2025-09-29 02:43:09,587]: starting test run with session id 2025-09-29--004...
[INFO:2025-09-29 02:43:09,587]: running 1 tests...
[INFO:2025-09-29 02:43:09,587]: Triggering test 1 of 1...
[INFO:2025-09-29 02:43:09,593]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/tests/client', 'file_name': 'pluggable_test.py', 'cls_name': 'PluggableConsumerTest', 'method_name': 'test_start_stop', 'injected_args': {'metadata_quorum': 'ISOLATED_KRAFT'}}
[INFO:2025-09-29 02:43:09,594]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: on run 1/1
[INFO:2025-09-29 02:43:09,595]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Setting up...
[INFO:2025-09-29 02:43:17,782]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Running...
[INFO:2025-09-29 02:43:19,572]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Tearing down...
[INFO:2025-09-29 02:43:24,079]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: PASS
[WARNING - 2025-09-29 02:43:24,079 - runner_client - log - lineno:459]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Test requested 4 nodes, used only 3
[WARNING:2025-09-29 02:43:24,079]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Test requested 4 nodes, used only 3
[INFO:2025-09-29 02:43:24,080]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Data: None
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-09-29--004
run time:         14.604 seconds
tests run:        1
passed:           1
flaky:            0
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT
status:     PASS
run time:   14.484 seconds
--------------------------------------------------------------------------------

Cluster2

> Configure project :
...
docker exec cluster2-ducker01 bash -c "cd /opt/kafka-dev && ducktape --cluster-file /opt/kafka-dev/tests/docker/build/cluster.cluster2.json  ./tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop "
/usr/local/lib/python3.9/dist-packages/paramiko/pkey.py:82: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.
  "cipher": algorithms.TripleDES,
/usr/local/lib/python3.9/dist-packages/paramiko/transport.py:260: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.
  "class": algorithms.TripleDES,
[INFO:2025-09-29 02:43:11,949]: starting test run with session id 2025-09-29--005...
[INFO:2025-09-29 02:43:11,949]: running 1 tests...
[INFO:2025-09-29 02:43:11,949]: Triggering test 1 of 1...
[INFO:2025-09-29 02:43:11,954]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/tests/client', 'file_name': 'pluggable_test.py', 'cls_name': 'PluggableConsumerTest', 'method_name': 'test_start_stop', 'injected_args': {'metadata_quorum': 'ISOLATED_KRAFT'}}
[INFO:2025-09-29 02:43:11,956]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: on run 1/1
[INFO:2025-09-29 02:43:11,957]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Setting up...
[INFO:2025-09-29 02:43:19,917]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Running...
[INFO:2025-09-29 02:43:21,566]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Tearing down...
[INFO:2025-09-29 02:43:26,210]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: PASS
[WARNING - 2025-09-29 02:43:26,210 - runner_client - log - lineno:459]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Test requested 4 nodes, used only 3
[WARNING:2025-09-29 02:43:26,210]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Test requested 4 nodes, used only 3
[INFO:2025-09-29 02:43:26,211]: RunnerClient: kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT: Data: None
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-09-29--005
run time:         14.371 seconds
tests run:        1
passed:           1
flaky:            0
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.tests.client.pluggable_test.PluggableConsumerTest.test_start_stop.metadata_quorum=ISOLATED_KRAFT
status:     PASS
run time:   14.254 seconds
--------------------------------------------------------------------------------

3. Tear Down a Specific Cluster

Stop one cluster while leaving the other running:

bash tests/docker/ducker-ak down --prefix cluster1

Expected outcome:
Only the cluster1 cluster is removed. cluster2 continues running.

Verification

echo "=== docker ps ==="
docker ps --format "table {{.ID}}\t{{.Ports}}\t{{.Names}}"

echo
echo "=== build directory ==="
ls tests/docker/build/

Output:

=== docker ps ===
CONTAINER ID   PORTS                                           NAMES
6681d068d87e                                                   cluster2-ducker14
a9416f9c1beb                                                   cluster2-ducker13
c5b5fb11d321                                                   cluster2-ducker12
ca7e7077fcd7                                                   cluster2-ducker11
65326c9654bb                                                   cluster2-ducker10
28ec3873812c                                                   cluster2-ducker09
8f0ca237aeda                                                   cluster2-ducker08
6584ce88a80b                                                   cluster2-ducker07
ee4ca868c125                                                   cluster2-ducker06
ab68dc1dfa7e                                                   cluster2-ducker05
b656463543ff                                                   cluster2-ducker04
037221c7430d                                                   cluster2-ducker03
412acfb157ea                                                   cluster2-ducker02
2ebd75e99cef   0.0.0.0:55001->5678/tcp, [::]:55001->5678/tcp   cluster2-ducker01

=== build directory ===
cluster.cluster2.json   debug-port.cluster2.txt node_hosts.cluster2
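
To confirm from a script that only cluster2 is left, the container names can be grouped by prefix. A minimal sketch, assuming names follow the `<prefix>-duckerNN` pattern described in this PR; `group_by_prefix` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches "ducker01" (no prefix) and "cluster2-ducker14" (prefix "cluster2").
_NODE_NAME = re.compile(r"^(?:(?P<prefix>.+)-)?ducker(?P<index>\d+)$")


def group_by_prefix(container_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group ducker container names by cluster prefix ("" means no prefix)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in container_names:
        match = _NODE_NAME.match(name)
        if match:  # names that are not ducker nodes are ignored
            groups[match.group("prefix") or ""].append(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}
```

After the teardown above, feeding the NAMES column into `group_by_prefix` should yield a single key, `cluster2`.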

4. Start with Custom Debug Port

By default, a prefixed cluster gets a randomly assigned debug host port. To pin it to a specific port:

bash tests/docker/ducker-ak up --prefix cluster1 --debug-port 55006

Expected outcome:
The cluster starts as in step 1, but the first node is mapped to host port 55006 instead of a random port. The file debug-port.cluster1.txt records this port.

Verification

cat tests/docker/build/debug-port.cluster1.txt

Output:

55006

5. Run Tests in Debug Mode

Enable debug mode; the host port will be random unless overridden:

bash tests/docker/ducker-ak test tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop --prefix cluster1 --debug

Expected outcome:
A host port (random unless overridden) is mapped to debugpy on the first node and recorded in debug-port.cluster1.txt. Use that port to attach an IDE debugger.

Verification

.vscode/launch.json

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Attach to Ducker",
      "type": "python",
      "request": "attach",
      "connect": { "host": "localhost", "port": 55006 },
      "justMyCode": false,
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}",
          "remoteRoot": "."
        }
      ]
    }
  ]
}
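
Rather than copying the port by hand, the attach configuration could be generated from the recorded port. A sketch, assuming debug-port.<prefix>.txt contains only the port number (as the cat output in step 4 suggests); `attach_config` is a hypothetical helper that mirrors the launch.json above.

```python
import json
from pathlib import Path


def attach_config(build_dir: str, prefix: str) -> dict:
    """Build a VS Code attach configuration from debug-port.<prefix>.txt."""
    port_file = Path(build_dir) / f"debug-port.{prefix}.txt"
    # Assumption: the file holds a single integer, e.g. "55006".
    port = int(port_file.read_text().strip())
    return {
        "name": f"Python: Attach to Ducker ({prefix})",
        "type": "python",
        "request": "attach",
        "connect": {"host": "localhost", "port": port},
        "justMyCode": False,
        "pathMappings": [
            {"localRoot": "${workspaceFolder}", "remoteRoot": "."}
        ],
    }


def launch_json(build_dir: str, prefix: str) -> str:
    """Render a complete launch.json document for one cluster."""
    document = {"version": "0.2.0", "configurations": [attach_config(build_dir, prefix)]}
    return json.dumps(document, indent=2)
```

For the cluster from step 4, `launch_json("tests/docker/build", "cluster1")` would produce the same port, 55006, as the hand-written file.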

6. Directly Run Tests with CLI Option

TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop" bash tests/docker/run_tests.sh --prefix cluster1

Expected outcome:
ducker-ak up is invoked internally to start cluster cluster1, and then the specified test runs.


7. Directly Run Tests with Environment Variables

With prefix defined by env variable

DUCKER_PREFIX=cluster1 TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop" bash tests/docker/run_tests.sh

Expected outcome:
Equivalent to passing --prefix cluster1 on the CLI: a cluster named cluster1 is created and the test runs.

With debugpy port specified by env variable

DUCKER_DEBUGPY_PORT=55006 DUCKER_PREFIX=cluster1 TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop" bash tests/docker/run_tests.sh

Expected outcome:
Equivalent to passing --prefix cluster1 --debug-port 55006 on the CLI: a cluster named cluster1 is created with the first node mapped to host port 55006, and the test runs.
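
The relationship between the CLI options and the environment variables can be sketched as follows. This is an illustration in Python, not the actual option parsing in ducker-ak or run_tests.sh. In particular, the precedence (an explicit CLI option wins over the environment variable) is an assumption, not something this PR states, and `parse_cluster_options` is a hypothetical name.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_cluster_options(
    argv: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Resolve --prefix and --debug-port, falling back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="ducker-ak-sketch")
    # Environment variables supply the defaults; explicit CLI options
    # override them (assumed precedence). Empty values count as unset.
    parser.add_argument("--prefix", default=env.get("DUCKER_PREFIX") or None)
    parser.add_argument(
        "--debug-port", type=int, default=env.get("DUCKER_DEBUGPY_PORT") or None
    )
    return parser.parse_args(list(argv))
```

With `DUCKER_PREFIX=cluster1` and `DUCKER_DEBUGPY_PORT=55006` in the environment and no CLI options, the sketch resolves to the same values as `--prefix cluster1 --debug-port 55006`.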

@github-actions github-actions bot added the tests Test fixes (including flaky tests) label Sep 29, 2025
Labels: ci-approved; tests (Test fixes, including flaky tests); triage (PRs from the community)