@@ -69,17 +69,17 @@ C4Container
 
 Nodes are automatically identified using the following priority:
 
-1. **CLI flag**: `--experimental.platform.redfish.node-id=worker-1`
-2. **Configuration**: `experimental.platform.redfish.nodeID` in config.yaml
+1. **CLI flag**: `--experimental.platform.redfish.node-name=worker-1`
+2. **Configuration**: `experimental.platform.redfish.nodeName` in config.yaml
 3. **Kubernetes node name**: Automatically detected when Kubernetes is enabled
 4. **Hostname fallback**: System hostname used if no explicit identifier provided
 
 ```bash
-# Explicit node ID
-kepler --experimental.platform.redfish.node-id=worker-1
+# Explicit node name
+kepler --experimental.platform.redfish.node-name=worker-1
 
 # Or automatic resolution from Kubernetes node name
-kepler --kube.enable --kube.node-name=worker-1
+kepler --kube.enabled --kube.node-name=worker-1
 ```
 
 **Configuration Example:**
@@ -89,7 +89,7 @@ kepler --kube.enable --kube.node-name=worker-1
 experimental:
   platform:
     redfish:
-      nodeID: worker-1 # Optional - will auto-resolve if not provided
+      nodeName: worker-1 # Optional - will auto-resolve to kube.node or hostname if not provided
 ```
 
 ```mermaid
@@ -147,9 +147,8 @@ type Platform struct {
 
 type Redfish struct {
     Enabled     *bool         `yaml:"enabled"`
-    NodeID      string        `yaml:"nodeID"`
+    NodeName    string        `yaml:"nodeName"`
     ConfigFile  string        `yaml:"configFile"`
-    Staleness   time.Duration `yaml:"staleness"`   // Max age before forcing new collection (simplified caching)
     HTTPTimeout time.Duration `yaml:"httpTimeout"` // HTTP client timeout for BMC requests
 }
 ```
@@ -158,9 +157,9 @@ type Redfish struct {
 
 ```bash
 --experimental.platform.redfish.enabled=true
---experimental.platform.redfish.node-id=worker-1
+--experimental.platform.redfish.node-name=worker-1
 --experimental.platform.redfish.config=/etc/kepler/redfish.yaml
-# Note: staleness and httpTimeout are configuration-only (not exposed as CLI flags)
+# Note: httpTimeout is configuration-only (not exposed as CLI flag)
 ```
 
 **Main Configuration (`hack/config.yaml`):**
@@ -172,9 +171,8 @@ experimental:
   platform:
     redfish:
       enabled: true
-      nodeID: "worker-1" # Node identifier for BMC mapping
+      nodeName: "worker-1" # Node identifier for BMC mapping
       configFile: "/etc/kepler/redfish.yaml"
-      staleness: 30s # Cache readings for 30 seconds
       httpTimeout: 5s # HTTP client timeout for BMC requests
 ```
 
@@ -222,7 +220,7 @@ The Redfish service implements a **on-demand collection with caching**:
 
 - No background collection or periodic polling
 - Direct BMC API calls during Prometheus scrape via `Power()`
-- Implements simple caching with configurable staleness (default 30 seconds) to
+- Implements simple caching with staleness-based expiration to
   support multiple Prometheus scrapes in a short period (High Availability)
 - Returns cached data if available and fresh, otherwise collects fresh data
 - Returns all chassis with detailed PowerControl readings in a single call
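The on-demand behaviour in this hunk (serve the cached reading while it is fresh, otherwise query the BMC during the scrape) can be sketched as below. This is an illustration under stated assumptions, not Kepler's implementation: `cachedPower`, its fields, and the `collect` callback are hypothetical names, and the 30-second window is borrowed from the default mentioned on the removed line.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reading is a hypothetical cached sample: a power value and when it was taken.
type reading struct {
	watts     float64
	timestamp time.Time
}

// cachedPower is a hypothetical wrapper around a BMC query.
// collect stands in for the real Redfish call; staleness is the maximum
// age of a cached reading before a new collection is forced.
type cachedPower struct {
	mu        sync.Mutex
	staleness time.Duration
	collect   func() (float64, error)
	last      *reading
}

// Power returns the cached reading if it is younger than staleness,
// otherwise it calls collect and caches the result.
func (c *cachedPower) Power() (float64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := time.Now()
	if c.last != nil && now.Sub(c.last.timestamp) < c.staleness {
		return c.last.watts, nil // fresh enough: no BMC call
	}

	w, err := c.collect()
	if err != nil {
		return 0, err
	}
	c.last = &reading{watts: w, timestamp: now}
	return w, nil
}

func main() {
	calls := 0
	c := &cachedPower{
		staleness: 30 * time.Second, // assumed value, see lead-in
		collect: func() (float64, error) {
			calls++
			return 125.2, nil
		},
	}

	// Two scrapes in quick succession, as with two Prometheus replicas.
	_, _ = c.Power()
	_, _ = c.Power()
	fmt.Println("BMC calls:", calls) // prints "BMC calls: 1"
}
```

Holding the mutex across `collect` keeps concurrent scrapes from issuing duplicate BMC requests, at the cost of making the second scrape wait for the first.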
@@ -338,7 +336,7 @@ kepler_node_cpu_watts{zone="package",node_name="worker-1"} 125.2
 - On-demand collection with caching reduces BMC load
 - Simplified architecture minimizes overhead
 - Multiple chassis data collected in single BMC interaction
-- Configurable staleness for different performance requirements
+- Built-in staleness management to optimize performance
 
 ## Migration
 
@@ -376,13 +374,13 @@ kepler_node_cpu_watts{zone="package",node_name="worker-1"} 125.2
 
 - Power-only metrics (no energy counters due to intermittent BMC polling)
 - Basic staleness-based caching (more advanced cache management could be added)
-- BMC calls during Prometheus scrape when cache is stale (mitigated by configurable staleness)
+- BMC calls during Prometheus scrape when cache is stale (mitigated by built-in caching)
 - Tested with mock servers (Dell, HPE, Lenovo, Generic scenarios)
 
 ## Future Enhancements
 
 - Background collection with better caching for improved performance
-- Enhanced staleness management and retry logic
+- Enhanced cache management and retry logic
 - Circuit breaker patterns for BMC failure handling
 - External secret integration (Kubernetes, Vault)
 - Chassis sub-component power zones (PSU, fans, storage)
@@ -392,7 +390,7 @@ kepler_node_cpu_watts{zone="package",node_name="worker-1"} 125.2
 ## Open Questions
 
 1. ~~Multi-chassis server handling for complex hardware?~~ **Addressed**: `ChassisPower()` returns all chassis with power readings
-2. ~~Need for caching layer in future versions?~~ **Addressed**: Simple staleness-based caching implemented
+2. ~~Need for caching layer in future versions?~~ **Addressed**: Simple caching layer implemented
 3. Sub-component power exposure (PSU, fans) priority?
 4. Integration with other platform monitoring tools?
 5. Performance impact of BMC calls during Prometheus scrape (mitigated by caching)?
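
The node identification priority from the first hunk (CLI flag, then configuration, then Kubernetes node name, then hostname) can be sketched in Go as follows. This is a minimal illustration and not the code in this change: `resolveNodeName` and its parameters are hypothetical, and the hostname lookup is passed in as a function only so the fallback can be exercised without depending on the machine it runs on.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNodeName is a hypothetical helper that returns the first non-empty
// identifier in priority order: CLI flag, config file value, Kubernetes node
// name. If all three are empty it falls back to the system hostname.
func resolveNodeName(flagValue, configValue, kubeNodeName string, hostname func() (string, error)) (string, error) {
	for _, candidate := range []string{flagValue, configValue, kubeNodeName} {
		if candidate != "" {
			return candidate, nil
		}
	}

	h, err := hostname()
	if err != nil {
		return "", fmt.Errorf("hostname fallback failed: %w", err)
	}
	return h, nil
}

func main() {
	// No explicit node name given; Kubernetes reports "worker-1".
	name, err := resolveNodeName("", "", "worker-1", os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(name) // prints "worker-1"
}
```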