Commit a1be458

feat: issues/1597: Temperature Monitoring - Thanks @MitchellThompkins
# Temperature Monitoring System

## Description

This PR implements a comprehensive temperature monitoring system integrated into the existing `MetricsResolver`. It provides real-time access to temperature data from various system sensors via GraphQL queries and subscriptions.

I tested these changes locally using the provided Apollo Server on Unraid `7.2.0`.

Addresses [#1597](#1597)

## Key Features

* **Multi-Source Support:**
  * **CPU & Motherboard:** Via `lm-sensors` integration.
  * **Disks:** Modifies the `smartctl` integration.
    * `smartctl` output is now parsed as JSON instead of raw strings.
  * **IPMI:** Support for `ipmitool` sensors (see notes below!).
* **GraphQL API:**
  * New `temperature` field on the `Metrics` node.
  * `systemMetricsTemperature` subscription for real-time updates.
  * Exposes `current`, `min`, `max`, and `history` for each sensor.
* **Configuration:**
  * Fully configurable via `api.json` (enabled status, polling intervals, thresholds).
  * Support for `default_unit` preference (Celsius/Fahrenheit) with automatic conversion.
* **History & Aggregation:**
  * In-memory history tracking with configurable retention.
  * Summary statistics (average temp, hottest/coolest sensor counts).

## Implementation Details

* **Binary Management:** Relies on system-installed tools (`sensors`, `smartctl`, `ipmitool`) rather than bundling binaries, aligning with the base OS integration strategy. **There is no attempt here to package sensor tooling as part of this feature.** This is per a [conversation on the feature scope with the Unraid team](#1597 (comment)).
* **Architecture:** Implemented a modular `TemperatureSensorProvider` interface allowing for easy addition of new sensor types.
* **Robustness:** `DisksService` was updated to parse `smartctl` JSON output directly, resolving issues with raw string parsing on certain drive models. Some Seagate drives report raw values with extra data in parentheses, such as `24 (0 14 0 0 0)`. Parsing the last value in that string returned incorrect data; parsing the JSON output avoids this.
* **Testing:** Added unit tests for all additional modules.

## Documentation

* Added developer documentation at `docs/developer/temperature.md` detailing configuration options and API usage.

## Scope Notes

* **IPMI Integration:** The `IpmiSensorsService` has been implemented to parse standard `ipmitool` output, but **it has not been tested on live hardware** as I do not have access to a system with IPMI support. It relies on standard `ipmitool` output formats (at least the documentation I saw and what AI told me).
* **GPU:** GPU temperature monitoring is currently out of scope and has not been implemented. I do not have access to a machine with a GPU and could not reliably test it.
* **Alerts:** The API calculates and exposes `WARNING` and `CRITICAL` statuses based on thresholds but does not currently trigger active system notifications. This type of alert is passive only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## New Features

* Added comprehensive temperature monitoring with real-time metrics for CPU, disk, and system sensors
* Configurable temperature thresholds and alerts (Warning/Critical states)
* Support for multiple temperature units (Celsius, Fahrenheit, Kelvin, Rankine)
* Temperature history tracking and trend analysis capability
* Real-time temperature metric subscriptions

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
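
The Seagate raw-value problem noted under Robustness can be illustrated with a short sketch. It is not the code from `DisksService`: the function names are hypothetical, and the assumption that the temperature is read from the top-level `temperature.current` field of `smartctl --json` output is mine, since the description does not say which JSON field is used.

```typescript
// Hypothetical shape: only the field this sketch reads from `smartctl --json` output.
interface SmartctlJson {
  temperature?: { current?: number };
}

// Naive approach: take the last whitespace-separated token of the raw string.
function parseLastToken(raw: string): number {
  const tokens = raw.trim().split(/\s+/);
  return Number.parseInt(tokens[tokens.length - 1] ?? "", 10);
}

// JSON approach: read the numeric field directly; null when it is absent.
function parseFromJson(stdout: string): number | null {
  const data = JSON.parse(stdout) as SmartctlJson;
  return data.temperature?.current ?? null;
}

// The last token of the Seagate-style raw value is "0)", which parses to 0.
console.log(parseLastToken("24 (0 14 0 0 0)"));
// The JSON field carries the plain number.
console.log(parseFromJson('{"temperature":{"current":24}}'));
```

With the raw string, the last token is `0)` rather than the reported `24`, which is the failure mode the description mentions.
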
1 parent db88eb8 commit a1be458

31 files changed: +4290 -50 lines changed

api/README.md

Lines changed: 1 addition & 0 deletions
@@ -84,6 +84,7 @@ For detailed information about specific features:
 - [Feature Flags](docs/developer/feature-flags.md) - Conditionally enabling functionality
 - [Repository Organization](docs/developer/repo-organization.md) - Codebase structure
 - [Development Workflows](docs/developer/workflows.md) - Development processes
+- [Temperature Monitoring](docs/developer/temperature.md) - Configuration and API details for temperature sensors

 ## License

api/docs/developer/temperature.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
# Temperature Monitoring

The Temperature Monitoring feature allows the Unraid API to collect and expose temperature metrics from various sensors (CPU, Disks, Motherboard, etc.).

## Configuration

You can configure the temperature monitoring behavior in your `api.json`.
The `api.json` file is typically found at
`/boot/config/plugins/dynamix.my.servers/configs/`.

### `api.temperature` Object

| Key | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `enabled` | `boolean` | `true` | Globally enable or disable temperature monitoring. |
| `default_unit` | `string` | `"celsius"` | The unit to return values in. Options: `"celsius"`, `"fahrenheit"`, `"kelvin"`, `"rankine"`. |
| `polling_interval` | `number` | `5000` | Polling interval in milliseconds for the subscription. |
| `history.max_readings` | `number` | `1000` | (Internal) Number of historical data points to keep in memory per sensor. |
| `history.retention_ms` | `number` | `86400000` | (Internal) Retention period for historical data in milliseconds. |

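The four `default_unit` options map onto standard conversion formulas. The sketch below assumes readings arrive in Celsius before conversion. The source does not state this, and the function name is illustrative rather than taken from the implementation.

```typescript
type TemperatureUnit = "celsius" | "fahrenheit" | "kelvin" | "rankine";

// Convert a Celsius reading into the configured unit.
// Assumption: the underlying reading is in Celsius.
function fromCelsius(celsius: number, unit: TemperatureUnit): number {
  switch (unit) {
    case "celsius":
      return celsius;
    case "fahrenheit":
      return (celsius * 9) / 5 + 32;
    case "kelvin":
      return celsius + 273.15;
    case "rankine":
      // Rankine is the Kelvin value scaled by 9/5.
      return ((celsius + 273.15) * 9) / 5;
  }
}

console.log(fromCelsius(100, "fahrenheit")); // 212
```
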
### `api.temperature.sensors` Object

Enable or disable specific sensor providers.

| Key | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `lm_sensors.enabled` | `boolean` | `true` | Enable `lm-sensors` provider (requires `sensors` binary). |
| `lm_sensors.config_path` | `string` | `""` | Optional path to a specific sensors config file (passed as `-c` to `sensors`). |
| `smartctl.enabled` | `boolean` | `true` | Enable disk temperature monitoring via `smartctl` (via DiskService). |
| `ipmi.enabled` | `boolean` | `true` | Enable IPMI sensor provider (requires `ipmitool`). |
| `ipmi.args` | `string[]` | `[]` | Optional array of arguments to pass to the `ipmitool` command. |

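As a rough illustration of how these options could turn into command-line arguments, here is a minimal sketch. Only the `-c` mapping for `config_path` comes from the table above. Requesting JSON with `sensors -j`, using the `sdr type Temperature` subcommand, and placing `ipmi.args` before the subcommand are all assumptions, and both function names are hypothetical.

```typescript
interface LmSensorsConfig {
  enabled: boolean;
  config_path: string;
}

interface IpmiConfig {
  enabled: boolean;
  args: string[];
}

// Arguments for the `sensors` binary.
function buildSensorsArgs(config: LmSensorsConfig): string[] {
  const args: string[] = ["-j"]; // assumption: JSON output is requested
  if (config.config_path !== "") {
    args.push("-c", config.config_path); // documented: passed as `-c`
  }
  return args;
}

// Arguments for `ipmitool`: user-supplied arguments first, then the subcommand.
// Assumption: the subcommand and the argument order are illustrative only.
function buildIpmitoolArgs(config: IpmiConfig): string[] {
  return [...config.args, "sdr", "type", "Temperature"];
}

console.log(buildSensorsArgs({ enabled: true, config_path: "/etc/sensors3.conf" }));
```

An empty `config_path` (the default) leaves the `-c` flag out entirely, so `sensors` falls back to its own default configuration lookup.
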
### `api.temperature.thresholds` Object

Customize warning and critical thresholds.

| Key | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `warning` | `number` | `80` | Global warning threshold for other sensors. |
| `critical` | `number` | `90` | Global critical threshold for other sensors. |
| `cpu_warning` | `number` | `70` | Warning threshold for CPU. |
| `cpu_critical` | `number` | `85` | Critical threshold for CPU. |
| `disk_warning` | `number` | `50` | Warning threshold for Disks. |
| `disk_critical` | `number` | `60` | Critical threshold for Disks. |

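The threshold table implies a per-sensor-type lookup followed by a comparison. The following is a minimal sketch using the documented defaults. The `NORMAL` status name, the `SensorKind` values, and the inclusive (`>=`) comparison are assumptions. Only `WARNING` and `CRITICAL` are named in the source.

```typescript
type SensorKind = "cpu" | "disk" | "other";
type Status = "NORMAL" | "WARNING" | "CRITICAL";

interface Thresholds {
  warning: number;
  critical: number;
  cpu_warning: number;
  cpu_critical: number;
  disk_warning: number;
  disk_critical: number;
}

// Defaults copied from the table above.
const DEFAULT_THRESHOLDS: Thresholds = {
  warning: 80,
  critical: 90,
  cpu_warning: 70,
  cpu_critical: 85,
  disk_warning: 50,
  disk_critical: 60,
};

// Pick the warning/critical pair that applies to a sensor kind.
function limitsFor(kind: SensorKind, t: Thresholds): { warning: number; critical: number } {
  if (kind === "cpu") return { warning: t.cpu_warning, critical: t.cpu_critical };
  if (kind === "disk") return { warning: t.disk_warning, critical: t.disk_critical };
  return { warning: t.warning, critical: t.critical };
}

// Classify a reading; the critical check runs first so it wins over warning.
function classify(value: number, kind: SensorKind, t: Thresholds = DEFAULT_THRESHOLDS): Status {
  const { warning, critical } = limitsFor(kind, t);
  if (value >= critical) return "CRITICAL";
  if (value >= warning) return "WARNING";
  return "NORMAL";
}

console.log(classify(72, "cpu")); // WARNING
console.log(classify(72, "other")); // NORMAL
```

The same reading of 72 is a warning for a CPU but normal for a generic sensor, which is the point of having per-type thresholds.
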
### Sample Configuration

Example of an `api.json` configuration file:

```json
{
  "version": "4.28.2+9196778e",
  "extraOrigins": [],
  "sandbox": true,
  "ssoSubIds": [],
  "plugins": [
    "unraid-api-plugin-connect"
  ],
  "temperature": {
    "enabled": true,
    "polling_interval": 10000,
    "default_unit": "celsius",
    "history": {
      "max_readings": 144,
      "retention_ms": 86400000
    },
    "thresholds": {
      "cpu_warning": 75,
      "cpu_critical": 90,
      "disk_warning": 50,
      "disk_critical": 60
    },
    "sensors": {
      "lm_sensors": {
        "enabled": true,
        "config_path": "/etc/sensors3.conf"
      },
      "smartctl": {
        "enabled": true
      },
      "ipmi": {
        "enabled": false
      }
    }
  }
}
```

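To show how the two `history` settings could interact, here is a minimal pruning sketch. It is an assumption about the behavior, not the actual implementation: readings older than `retention_ms` are dropped first, then only the newest `max_readings` are kept. The `Reading` shape and the function name are hypothetical.

```typescript
interface Reading {
  value: number;
  timestamp: number; // milliseconds since epoch
}

interface HistoryConfig {
  max_readings: number;
  retention_ms: number;
}

// Drop expired readings, then cap the list at the newest `max_readings`.
// Assumes `readings` is ordered oldest to newest.
function pruneHistory(readings: Reading[], config: HistoryConfig, now: number): Reading[] {
  const cutoff = now - config.retention_ms;
  const fresh = readings.filter((r) => r.timestamp >= cutoff);
  // slice(-0) would return everything, so handle a zero cap explicitly.
  return config.max_readings > 0 ? fresh.slice(-config.max_readings) : [];
}

const sample: Reading[] = [
  { value: 40, timestamp: 1_000 },
  { value: 41, timestamp: 11_000 },
  { value: 42, timestamp: 21_000 },
];
console.log(pruneHistory(sample, { max_readings: 2, retention_ms: 86_400_000 }, 30_000));
```

If one reading is stored per poll, the sample values above (`polling_interval` of 10000 ms and `max_readings` of 144) cover 144 × 10 s = 24 minutes, so under this model the count cap would take effect long before the 24-hour `retention_ms` does.
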
## GraphQL API

### Query: `metrics` -> `temperature`

Returns a snapshot of the current temperature metrics.

```graphql
query {
  metrics {
    temperature {
      id
      summary {
        average
        hottest {
          name
          current { value unit }
        }
      }
      sensors {
        id
        name
        type
        current {
          value
          unit
          status
        }
        history {
          value
          timestamp
        }
      }
    }
  }
}
```

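A client might send a trimmed version of this query and filter the result by status. The sketch below is hedged throughout: the response field types (`value` as a number, `status` as a string), the `x-api-key` header name, and both function names are assumptions, not confirmed by this document. Only the `WARNING` and `CRITICAL` status names come from the PR description.

```typescript
const TEMPERATURE_QUERY = `
  query {
    metrics {
      temperature {
        sensors { name current { value unit status } }
      }
    }
  }
`;

// Assumed response shapes, limited to the fields requested above.
interface SensorReading {
  name: string;
  current: { value: number; unit: string; status: string };
}

interface TemperatureResponse {
  data?: { metrics: { temperature: { sensors: SensorReading[] } | null } };
}

// Build the options for an HTTP POST carrying the query.
// Assumption: the server accepts an API key in an `x-api-key` header.
function buildRequestInit(apiKey: string): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify({ query: TEMPERATURE_QUERY }),
  };
}

// Return the sensors whose status is WARNING or CRITICAL.
function sensorsNeedingAttention(response: TemperatureResponse): SensorReading[] {
  const sensors = response.data?.metrics.temperature?.sensors ?? [];
  return sensors.filter((s) => s.current.status === "WARNING" || s.current.status === "CRITICAL");
}
```

The object returned by `buildRequestInit` can be passed as the second argument to `fetch` along with your server's GraphQL endpoint URL, and the parsed JSON body handed to `sensorsNeedingAttention`.
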
### Subscription: `systemMetricsTemperature`

Subscribes to temperature updates (pushed at `polling_interval`).

```graphql
subscription {
  systemMetricsTemperature {
    summary {
      average
    }
    sensors {
      name
      current {
        value
      }
    }
  }
}
```
