Commit b45bd2d

Changed node selection (#1881)
This PR adds a reusable withHealthyNode(...) wrapper and a set of health checks that verify a node before any heavy work runs on it. If a node fails a check (e.g., the Docker daemon is down or GPU devices are missing), the node is blacklisted and the pipeline automatically retries on the next candidate until it finds a healthy executor or exhausts the retry budget.
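
The check, blacklist, and retry flow described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the code from this commit: the function name with_healthy_node, its parameters, the NoHealthyNodeError exception, and the sample node names and check names are all hypothetical, and the real checks would probe the actual Docker daemon and GPU devices rather than read from a dictionary.

```python
from typing import Callable, Dict, Iterable, Set, TypeVar

T = TypeVar("T")


class NoHealthyNodeError(RuntimeError):
    """Raised when the retry budget runs out before a healthy node is found."""


def with_healthy_node(
    candidates: Iterable[str],
    checks: Dict[str, Callable[[str], bool]],
    body: Callable[[str], T],
    blacklist: Set[str],
    max_attempts: int = 3,
) -> T:
    """Run body on the first candidate that passes every health check.

    Nodes that fail any check are added to blacklist and skipped afterwards.
    At most max_attempts nodes are probed (hypothetical retry budget).
    """
    attempts = 0
    for node in candidates:
        if node in blacklist:
            continue  # already known to be unhealthy
        if attempts >= max_attempts:
            break  # retry budget exhausted
        attempts += 1
        failed = [name for name, check in checks.items() if not check(node)]
        if failed:
            blacklist.add(node)  # do not offer this node again
            continue
        return body(node)  # heavy work runs only on a verified node
    raise NoHealthyNodeError(f"no healthy node after {attempts} attempt(s)")


# Simulated node state, standing in for real probes (assumption for the demo).
state = {
    "gpu-1": {"docker": False, "gpu": True},
    "gpu-2": {"docker": True, "gpu": False},
    "gpu-3": {"docker": True, "gpu": True},
}
checks = {
    "docker_daemon": lambda n: state[n]["docker"],
    "gpu_devices": lambda n: state[n]["gpu"],
}
blacklist: Set[str] = set()
result = with_healthy_node(
    ["gpu-1", "gpu-2", "gpu-3"], checks, lambda n: f"built on {n}", blacklist
)
print(result, sorted(blacklist))
```

In this sketch the blacklist is passed in by the caller so that it can outlive a single call. Whether the actual change keeps it per build or shares it across builds is not stated in the commit description.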
1 parent 1358852 commit b45bd2d

1 file changed: +730 -915 lines changed
