Commit e793631
[core][autoscaler] Fix RAY_NODE_TYPE_NAME handling when autoscaler is in read-only mode (#58460)
This ensures node type names are correctly reported even when the
autoscaler is disabled (read-only mode).
## Description
Autoscaler v2 fails to report Prometheus metrics when operating in
read-only mode on KubeRay, raising the following `KeyError`:
```
2025-11-08 12:06:57,402 ERROR autoscaler.py:215 -- 'small-group'
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/v2/autoscaler.py", line 200, in update_autoscaling_state
return Reconciler.reconcile(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/v2/instance_manager/reconciler.py", line 120, in reconcile
Reconciler._step_next(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/v2/instance_manager/reconciler.py", line 275, in _step_next
Reconciler._scale_cluster(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/v2/instance_manager/reconciler.py", line 1125, in _scale_cluster
reply = scheduler.schedule(sched_request)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/v2/scheduler.py", line 933, in schedule
ResourceDemandScheduler._enforce_max_workers_per_type(ctx)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/v2/scheduler.py", line 1006, in _enforce_max_workers_per_type
node_config = ctx.get_node_type_configs()[node_type]
KeyError: 'small-group'
```
This happens because the `ReadOnlyProviderConfigReader` populates
`ctx.get_node_type_configs()` using node IDs as node types. That is
correct for local Ray, where `RAY_NODE_TYPE_NAME` is not set, but
incorrect for KubeRay, where `RAY_NODE_TYPE_NAME` is set and
`ray_node_type_name` is therefore present and expected.
As a result, in read-only mode the scheduler sees a node type name (e.g.
`small-group`) that does not exist in the populated configs.
This PR fixes the issue by using `ray_node_type_name` when it exists,
and only falling back to node ID when it does not.
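
The fallback described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `NodeState`, `resolve_node_type`, and `build_node_type_configs` are hypothetical names, and the config value (a list of node IDs) is a placeholder for whatever the real reader stores per node type.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeState:
    """Hypothetical stand-in for a node reported by the cluster."""

    node_id: str
    # Assumed to be unset on local Ray and set on KubeRay.
    ray_node_type_name: Optional[str] = None


def resolve_node_type(node: NodeState) -> str:
    # Prefer the reported node type name; fall back to the node ID
    # only when no type name is present.
    return node.ray_node_type_name or node.node_id


def build_node_type_configs(nodes: List[NodeState]) -> Dict[str, dict]:
    # Key the configs by the resolved node type so that a later lookup
    # by node type name finds an entry.
    configs: Dict[str, dict] = {}
    for node in nodes:
        entry = configs.setdefault(resolve_node_type(node), {"node_ids": []})
        entry["node_ids"].append(node.node_id)
    return configs


nodes = [
    NodeState("node-a", "small-group"),  # KubeRay-style node
    NodeState("node-b", "small-group"),
    NodeState("node-c"),  # local-Ray-style node, no type name
]
configs = build_node_type_configs(nodes)
print(configs)
```

With configs keyed this way, a lookup such as `configs["small-group"]` succeeds, whereas keying by node ID alone would leave `small-group` missing and reproduce the `KeyError` shown in the traceback.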
## Related issues
Fixes #58227
Signed-off-by: Rueian <[email protected]>
1 parent 654feda commit e793631
File tree: 3 files changed, +72 -9 lines changed
- python/ray/autoscaler
  - _private/readonly
  - v2
    - instance_manager
    - tests