Commit d4e7e55

Merge pull request #2081 from nebius/SCHED-840/turn-off-dcgmi-diag-active-checks
turn off dcgmi diag active checks
2 parents 454f486 + d090aa1 commit d4e7e55

2 files changed: +4 -4 lines changed

helm/soperator-activechecks/scripts/extensive-check.sh

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ health_checker_runs=(
   all_reduce_with_ib
   all_reduce_without_ib
   cuda_samples
-  dcgmi_diag_r2
+  # dcgmi_diag_r2
   gpu_fryer
   # ib_gpu_perf
   mem_perf
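
The effect of commenting out `dcgmi_diag_r2` can be checked in isolation. The sketch below copies the array entries visible in this hunk and assumes the script later iterates over `health_checker_runs`; that loop is not part of the diff, so the one here is a hypothetical stand-in. Bash treats a `#` line inside an array literal as a comment, so the entry is simply absent from the array.

```shell
#!/usr/bin/env bash
# Array entries mirror the hunk above. The real array may hold more
# entries outside the lines shown in the diff.
health_checker_runs=(
  all_reduce_with_ib
  all_reduce_without_ib
  cuda_samples
  # dcgmi_diag_r2
  gpu_fryer
  # ib_gpu_perf
  mem_perf
)

# Hypothetical consumer loop: collect the names that would be run.
scheduled=""
for check in "${health_checker_runs[@]}"; do
  scheduled+="${check} "
done

echo "checks that would run: ${scheduled}"
echo "count: ${#health_checker_runs[@]}"
```

Commented entries such as `# ib_gpu_perf` stay in the file as a record of what was disabled and can be restored by removing the `#`.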

helm/soperator-activechecks/values.yaml

Lines changed: 3 additions & 3 deletions
@@ -55,12 +55,12 @@ checks:
       runAfterCreation: true
       drainReasonPrefix: "[node_problem]"
     dcgmiDiagR2:
-      suspend: false
-      runAfterCreation: true
+      suspend: true
+      runAfterCreation: false
       drainReasonPrefix: "[node_problem]"
     dcgmiDiagR3:
       suspend: true
-      runAfterCreation: true
+      runAfterCreation: false
       drainReasonPrefix: "[node_problem]"
     enrootCleanup:
       suspend: false
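
Because these are chart defaults, a deployment that still wants the R2 diagnostic could restore the previous values with an override file. The sketch below is a minimal illustration, assuming `dcgmiDiagR2` sits directly under `checks` as the hunk header suggests; the temporary file and the `<release>` and `<chart>` placeholders are hypothetical and not taken from the repository.

```shell
#!/usr/bin/env bash
# Write a values override that re-enables dcgmiDiagR2.
# Key names come from the values.yaml hunk above; the nesting under
# "checks" is an assumption based on the hunk header.
override_file="$(mktemp)"
cat > "${override_file}" <<'EOF'
checks:
  dcgmiDiagR2:
    suspend: false
    runAfterCreation: true
EOF

cat "${override_file}"

# The file could then be passed to Helm with -f, for example:
#   helm upgrade <release> <chart> -f "${override_file}"
# where <release> and <chart> are placeholders for your deployment.
```

Note that this only covers the chart values. The `extensive-check.sh` change in the same commit is a separate edit to the script's run list.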
