Commit 7d971ce

Author: Sunil Thaha
docs: update redfish proposal to match implementation
Signed-off-by: Sunil Thaha <[email protected]>
1 parent f194409 commit 7d971ce

1 file changed: +15 -17 lines changed

docs/developer/proposal/EP_001-redfish-support.md

Lines changed: 15 additions & 17 deletions
@@ -69,17 +69,17 @@ C4Container
 
 Nodes are automatically identified using the following priority:
 
-1. **CLI flag**: `--experimental.platform.redfish.node-id=worker-1`
-2. **Configuration**: `experimental.platform.redfish.nodeID` in config.yaml
+1. **CLI flag**: `--experimental.platform.redfish.node-name=worker-1`
+2. **Configuration**: `experimental.platform.redfish.nodeName` in config.yaml
 3. **Kubernetes node name**: Automatically detected when Kubernetes is enabled
 4. **Hostname fallback**: System hostname used if no explicit identifier provided
 
 ```bash
-# Explicit node ID
-kepler --experimental.platform.redfish.node-id=worker-1
+# Explicit node name
+kepler --experimental.platform.redfish.node-name=worker-1
 
 # Or automatic resolution from Kubernetes node name
-kepler --kube.enable --kube.node-name=worker-1
+kepler --kube.enabled --kube.node-name=worker-1
 ```
 
 **Configuration Example:**
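Note on the hunk above: the four-step priority order can be illustrated with a small Go sketch. This is not taken from the Kepler source. `resolveNodeName` and its parameters are hypothetical names, and the hostname lookup is passed in as a function only so the fallback can be exercised without touching the real system.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNodeName is a hypothetical helper (not Kepler's actual code) that
// applies the documented priority: CLI flag, then configuration value, then
// Kubernetes node name, then the system hostname as a last resort.
func resolveNodeName(flagValue, configValue, kubeNodeName string, hostname func() (string, error)) (string, error) {
	// The first non-empty explicit source wins, in priority order.
	for _, candidate := range []string{flagValue, configValue, kubeNodeName} {
		if candidate != "" {
			return candidate, nil
		}
	}
	// Nothing explicit was provided, so fall back to the hostname.
	name, err := hostname()
	if err != nil {
		return "", fmt.Errorf("resolving hostname fallback: %w", err)
	}
	return name, nil
}

func main() {
	// No flag and no config value: the Kubernetes node name is used.
	name, err := resolveNodeName("", "", "worker-1", os.Hostname)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // prints "worker-1"
}
```

Treating the sources as an ordered list keeps the precedence visible in one place, which is the property the renamed `node-name` flag and `nodeName` key rely on.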
@@ -89,7 +89,7 @@ kepler --kube.enable --kube.node-name=worker-1
 experimental:
   platform:
     redfish:
-      nodeID: worker-1 # Optional - will auto-resolve if not provided
+      nodeName: worker-1 # Optional - will auto-resolve to kube.node or hostname if not provided
 ```
 
 ```mermaid
@@ -147,9 +147,8 @@ type Platform struct {
 
 type Redfish struct {
     Enabled     *bool         `yaml:"enabled"`
-    NodeID      string        `yaml:"nodeID"`
+    NodeName    string        `yaml:"nodeName"`
     ConfigFile  string        `yaml:"configFile"`
-    Staleness   time.Duration `yaml:"staleness"`   // Max age before forcing new collection (simplified caching)
     HTTPTimeout time.Duration `yaml:"httpTimeout"` // HTTP client timeout for BMC requests
 }
 ```
@@ -158,9 +157,9 @@ type Redfish struct {
 
 ```bash
 --experimental.platform.redfish.enabled=true
---experimental.platform.redfish.node-id=worker-1
+--experimental.platform.redfish.node-name=worker-1
 --experimental.platform.redfish.config=/etc/kepler/redfish.yaml
-# Note: staleness and httpTimeout are configuration-only (not exposed as CLI flags)
+# Note: httpTimeout is configuration-only (not exposed as CLI flag)
 ```
 
 **Main Configuration (`hack/config.yaml`):**
@@ -172,9 +171,8 @@ experimental:
   platform:
     redfish:
       enabled: true
-      nodeID: "worker-1" # Node identifier for BMC mapping
+      nodeName: "worker-1" # Node identifier for BMC mapping
       configFile: "/etc/kepler/redfish.yaml"
-      staleness: 30s # Cache readings for 30 seconds
       httpTimeout: 5s # HTTP client timeout for BMC requests
 ```
 
@@ -222,7 +220,7 @@ The Redfish service implements a **on-demand collection with caching**:
 
 - No background collection or periodic polling
 - Direct BMC API calls during Prometheus scrape via `Power()`
-- Implements simple caching with configurable staleness (default 30 seconds) to
+- Implements simple caching with staleness-based expiration to
   support multiple Prometheus scrapes in a short period (High Availability)
 - Returns cached data if available and fresh, otherwise collects fresh data
 - Returns all chassis with detailed PowerControl readings in a single call
@@ -338,7 +336,7 @@ kepler_node_cpu_watts{zone="package",node_name="worker-1"} 125.2
 - On-demand collection with caching reduces BMC load
 - Simplified architecture minimizes overhead
 - Multiple chassis data collected in single BMC interaction
-- Configurable staleness for different performance requirements
+- Built-in staleness management to optimize performance
 
 ## Migration
 
@@ -376,13 +374,13 @@ kepler_node_cpu_watts{zone="package",node_name="worker-1"} 125.2
 
 - Power-only metrics (no energy counters due to intermittent BMC polling)
 - Basic staleness-based caching (more advanced cache management could be added)
-- BMC calls during Prometheus scrape when cache is stale (mitigated by configurable staleness)
+- BMC calls during Prometheus scrape when cache is stale (mitigated by built-in caching)
 - Tested with mock servers (Dell, HPE, Lenovo, Generic scenarios)
 
 ## Future Enhancements
 
 - Background collection with better caching for improved performance
-- Enhanced staleness management and retry logic
+- Enhanced cache management and retry logic
 - Circuit breaker patterns for BMC failure handling
 - External secret integration (Kubernetes, Vault)
 - Chassis sub-component power zones (PSU, fans, storage)
@@ -392,7 +390,7 @@ kepler_node_cpu_watts{zone="package",node_name="worker-1"} 125.2
 ## Open Questions
 
 1. ~~Multi-chassis server handling for complex hardware?~~ **Addressed**: `ChassisPower()` returns all chassis with power readings
-2. ~~Need for caching layer in future versions?~~ **Partially Addressed**: Simple staleness-based caching implemented
+2. ~~Need for caching layer in future versions?~~ **Addressed**: Simple caching layer implemented
 3. Sub-component power exposure (PSU, fans) priority?
 4. Integration with other platform monitoring tools?
 5. Performance impact of BMC calls during Prometheus scrape (mitigated by caching)?