Commit 45bb2cb

fix: Add Windows CPU temperature fallback chain to handle WBEM_E_NOT_FOUND (#103)
* fix: Add Windows CPU temperature fallback chain to handle WBEM_E_NOT_FOUND

  Implements a cascading fallback mechanism for CPU temperature monitoring on Windows to address the issue where MSAcpi_ThermalZoneTemperature is not available on many systems (error 0x8004100C).

  The fallback chain tries these sources in order:
  1. MSAcpi_ThermalZoneTemperature (standard ACPI thermal zones)
  2. AMD Ryzen Master SDK (AMD CPUs only, via FFI)
  3. Intel WMI (Intel CPUs only, root/Intel namespace)
  4. LibreHardwareMonitor WMI (any CPU, if app running)
  5. None (graceful fallback without error spam)

  Key changes:
  - New module src/device/windows_temp/ with temperature sources
  - TemperatureManager with OnceCell caching for availability status
  - CPU vendor detection (AMD/Intel) for source selection
  - Removed eprintln! calls that caused repeated error messages
  - Updated documentation with Windows temperature limitations

  Closes #102

* fix(security): Address security and robustness issues in Windows temperature module

  Security fixes:
  - Remove relative DLL path to prevent DLL hijacking attacks (CRITICAL)
  - Only use absolute paths for AMD SDK DLL loading

  Robustness improvements:
  - Add compile-time struct size verification for RmQuickStats FFI safety
  - Replace panic-on-lock-poisoning with graceful recovery pattern
  - Use round() instead of truncation for temperature conversion accuracy

  Code quality:
  - Add Clone, Copy, PartialEq, Eq derives to TemperatureResult
  - Add security and thread safety documentation to amd_ryzen.rs
  - Simplify try_load_library() implementation

* fix: Resolve build warnings and improve WMI connection handling

  - Add #[allow(dead_code)] to sensor_type field in libre_hwmon.rs (required for WMI deserialization but not used in code)
  - Refactor Intel/LibreHardwareMonitor WMI sources to avoid caching WMIConnection (not Send + Sync), cache only namespace availability
  - Fix AMD DLL path dereference in amd_ryzen.rs
  - Use CpuRefreshKind::everything() for proper CPU vendor detection

* style: Apply rustfmt to intel_wmi.rs

* fix: Address cargo fmt and clippy warnings

  - Use matches! macro instead of match expression in mod.rs
  - Inline format arguments in eprintln! calls in amd_windows.rs
1 parent 240aa48 commit 45bb2cb
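
The fallback chain described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from `src/device/windows_temp/`: the `Source` struct, its `query` function pointer, and `first_available` are hypothetical names, and `std::sync::OnceLock` is swapped in for the OnceCell caching the commit mentions. Only the `TemperatureResult` variant names come from the diff. The sketch assumes that `NotFound` is permanent and worth caching, while `Error` and `NoValidReading` are retried on the next poll.

```rust
use std::sync::OnceLock;

/// Outcome of one query. Variant names follow the diff; the payload is degrees Celsius.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TemperatureResult {
    Success(u32),
    NotFound,
    Error,
    NoValidReading,
}

/// Hypothetical wrapper around a single temperature source.
pub struct Source {
    pub name: &'static str,
    query: fn() -> TemperatureResult,
    /// Set once the source has reported `NotFound` (assumed to be permanent).
    unavailable: OnceLock<()>,
}

impl Source {
    pub fn new(name: &'static str, query: fn() -> TemperatureResult) -> Self {
        Self { name, query, unavailable: OnceLock::new() }
    }

    pub fn is_cached_unavailable(&self) -> bool {
        self.unavailable.get().is_some()
    }

    /// Query the source unless it is already known to be missing.
    pub fn read(&self) -> Option<u32> {
        if self.is_cached_unavailable() {
            return None;
        }
        match (self.query)() {
            TemperatureResult::Success(celsius) => Some(celsius),
            TemperatureResult::NotFound => {
                // Remember the miss so later polls skip this source silently.
                let _ = self.unavailable.set(());
                None
            }
            // Treated as transient in this sketch: try again on the next poll.
            TemperatureResult::Error | TemperatureResult::NoValidReading => None,
        }
    }
}

/// Walk the sources in priority order and return the first reading, if any.
pub fn first_available(sources: &[Source]) -> Option<u32> {
    sources.iter().find_map(Source::read)
}
```

Building a slice such as `[Source::new("acpi", acpi_query), Source::new("libre_hwmon", libre_query)]` (both query functions being placeholders) and calling `first_available` on every poll yields `None` without any logging once every source has failed.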

11 files changed: +1028 -66 lines changed

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ openssl = { version = "0.10.75", features = ["vendored"] }
 # Windows-specific dependencies
 [target.'cfg(target_os = "windows")'.dependencies]
 wmi = "0.18"
+libloading = "0.9"

 # macOS-specific dependencies
 [target.'cfg(target_os = "macos")'.dependencies]

README.md

Lines changed: 12 additions & 0 deletions
@@ -162,6 +162,18 @@ http://gpu-node3:9090
 - **No Sudo Required:** NVIDIA GPU monitoring works without sudo privileges
 - **Driver Required:** NVIDIA proprietary drivers must be installed

+### Windows
+- **No Sudo Required:** GPU and CPU monitoring works without administrator privileges
+- **CPU Temperature Limitations:**
+  - Standard Windows WMI thermal zones (MSAcpi_ThermalZoneTemperature) are not available on all systems
+  - The application uses a fallback chain to try multiple temperature sources:
+    1. ACPI Thermal Zones (standard WMI)
+    2. AMD Ryzen Master SDK (AMD CPUs - requires AMD drivers or Ryzen Master)
+    3. Intel WMI (Intel CPUs - if chipset drivers support it)
+    4. LibreHardwareMonitor WMI (any CPU - if [LibreHardwareMonitor](https://github.com/LibreHardwareMonitor/LibreHardwareMonitor) is running)
+  - If temperature is not available, it will be shown as "N/A" without error messages
+  - For best temperature monitoring on Windows, install and run LibreHardwareMonitor in the background
+
 ## Features

 ### GPU Monitoring
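
A minimal sketch of the "N/A" behaviour the README text above describes, assuming the reading reaches the display layer as an `Option<u32>` in degrees Celsius (the type `get_cpu_temperature` returns in this commit). `format_cpu_temperature` is a hypothetical helper; the actual rendering code is not part of this diff.

```rust
/// Render a CPU temperature for display.
/// Hypothetical helper: `None` means every source in the fallback chain failed.
pub fn format_cpu_temperature(temp_celsius: Option<u32>) -> String {
    match temp_celsius {
        Some(t) => format!("{t}°C"),
        // No reading available: show a placeholder instead of logging an error.
        None => "N/A".to_string(),
    }
}
```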

docs/ARCHITECTURE.md

Lines changed: 12 additions & 0 deletions
@@ -238,6 +238,18 @@ pub trait MetricsExporter: Send + Sync {
 - `sysctl` for runtime metrics
 - P/E-core detection for Apple Silicon

+#### Windows (`src/device/cpu_windows.rs`, `src/device/windows_temp/`)
+- WMI for processor information (MaxClockSpeed, cache sizes, socket count)
+- Thread-local WMI connections for efficiency
+- Temperature monitoring via fallback chain:
+  1. **MSAcpi_ThermalZoneTemperature**: Standard ACPI thermal zones (root\WMI namespace)
+  2. **AMD Ryzen Master SDK**: FFI integration with `AMDRyzenMasterMonitoringDLL.dll`
+  3. **Intel WMI**: Intel-specific thermal zones (root\Intel namespace)
+  4. **LibreHardwareMonitor**: Third-party WMI (root\LibreHardwareMonitor namespace)
+- Graceful fallback: Returns `None` silently when all sources unavailable
+- Availability caching: OnceCell/RwLock pattern to avoid repeated failed queries
+- No error spam: WBEM_E_NOT_FOUND (0x8004100C) handled silently
+
 ### Conditional Compilation

 ```rust

src/device/cpu_windows.rs

Lines changed: 31 additions & 58 deletions
@@ -22,12 +22,8 @@ use std::sync::RwLock;
 use sysinfo::{CpuRefreshKind, System};
 use wmi::WMIConnection;

-// WMI structures for thermal zone temperature
-#[derive(Deserialize, Debug)]
-#[serde(rename_all = "PascalCase")]
-struct ThermalZoneTemperature {
-    current_temperature: Option<u32>, // Temperature in tenths of Kelvin
-}
+// Import the temperature fallback chain
+use super::windows_temp::TemperatureManager;

 // WMI structures for processor information
 #[derive(Deserialize, Debug)]
@@ -40,23 +36,21 @@ struct Win32Processor {

 // Thread-local WMI connections for reuse within the same thread
 thread_local! {
-    static WMI_CIMV2_CONNECTION: std::cell::RefCell<Option<WMIConnection>> = std::cell::RefCell::new(None);
-    static WMI_ROOT_WMI_CONNECTION: std::cell::RefCell<Option<WMIConnection>> = std::cell::RefCell::new(None);
+    static WMI_CIMV2_CONNECTION: std::cell::RefCell<Option<WMIConnection>> =
+        const { std::cell::RefCell::new(None) };
+    static WMI_ROOT_WMI_CONNECTION: std::cell::RefCell<Option<WMIConnection>> =
+        const { std::cell::RefCell::new(None) };
 }

 /// Helper to get or create CIMV2 connection
 fn with_cimv2_connection<T, F: FnOnce(&WMIConnection) -> T>(f: F) -> Option<T> {
     WMI_CIMV2_CONNECTION.with(|cell| {
         let mut conn_ref = cell.borrow_mut();
         if conn_ref.is_none() {
-            match WMIConnection::new() {
-                Ok(wmi_con) => {
-                    *conn_ref = Some(wmi_con);
-                }
-                Err(e) => {
-                    eprintln!("Failed to create WMI CIMV2 connection: {e}");
-                }
+            if let Ok(wmi_con) = WMIConnection::new() {
+                *conn_ref = Some(wmi_con);
             }
+            // Silently fail if connection cannot be created
         }
         conn_ref.as_ref().map(f)
     })
@@ -67,14 +61,10 @@ fn with_root_wmi_connection<T, F: FnOnce(&WMIConnection) -> T>(f: F) -> Option<T
     WMI_ROOT_WMI_CONNECTION.with(|cell| {
         let mut conn_ref = cell.borrow_mut();
         if conn_ref.is_none() {
-            match WMIConnection::with_namespace_path("root\\WMI") {
-                Ok(wmi_con) => {
-                    *conn_ref = Some(wmi_con);
-                }
-                Err(e) => {
-                    eprintln!("Failed to create WMI root\\WMI connection: {e}");
-                }
+            if let Ok(wmi_con) = WMIConnection::with_namespace_path("root\\WMI") {
+                *conn_ref = Some(wmi_con);
             }
+            // Silently fail if connection cannot be created
         }
         conn_ref.as_ref().map(f)
     })
@@ -87,6 +77,8 @@ pub struct WindowsCpuReader {
     cached_max_frequency: RwLock<Option<u32>>,
     cached_cache_size: RwLock<Option<u32>>,
     cached_socket_count: RwLock<Option<u32>>,
+    // Temperature manager with fallback chain
+    temperature_manager: TemperatureManager,
 }

 impl Default for WindowsCpuReader {
@@ -112,45 +104,26 @@ impl WindowsCpuReader {
             cached_max_frequency: RwLock::new(None),
             cached_cache_size: RwLock::new(None),
             cached_socket_count: RwLock::new(None),
+            temperature_manager: TemperatureManager::new(),
         }
     }

-    /// Get CPU temperature from WMI thermal zones (using thread-local connection)
+    /// Get CPU temperature using the fallback chain.
+    ///
+    /// Tries multiple temperature sources in order:
+    /// 1. MSAcpi_ThermalZoneTemperature (ACPI thermal zones)
+    /// 2. AMD Ryzen Master SDK (AMD CPUs only)
+    /// 3. Intel WMI (Intel CPUs only)
+    /// 4. LibreHardwareMonitor WMI (any CPU)
+    /// 5. None (graceful fallback)
     fn get_cpu_temperature(&self) -> Option<u32> {
-        with_root_wmi_connection(|wmi_con| {
-            let results: Result<Vec<ThermalZoneTemperature>, _> = wmi_con
-                .raw_query("SELECT CurrentTemperature FROM MSAcpi_ThermalZoneTemperature");
-
-            match results {
-                Ok(zones) => {
-                    if zones.is_empty() {
-                        eprintln!("CPU temperature: No thermal zones found in WMI");
-                        return None;
-                    }
-                    for zone in zones {
-                        if let Some(temp_tenths_kelvin) = zone.current_temperature {
-                            // Convert from tenths of Kelvin to Celsius
-                            // Formula: (K / 10) - 273.15 = C
-                            let celsius = (temp_tenths_kelvin as f64 / 10.0) - 273.15;
-                            if celsius > 0.0 && celsius < 150.0 {
-                                return Some(celsius as u32);
-                            } else {
-                                eprintln!(
-                                    "CPU temperature: Out of range value {:.1}°C (raw: {} tenths K)",
-                                    celsius, temp_tenths_kelvin
-                                );
-                            }
-                        }
-                    }
-                    None
-                }
-                Err(e) => {
-                    eprintln!("CPU temperature: WMI query failed: {e}");
-                    None
-                }
-            }
+        // Get the root\WMI connection for ACPI thermal zones
+        with_root_wmi_connection(|wmi_conn| {
+            self.temperature_manager.get_temperature(Some(wmi_conn))
         })
         .flatten()
+        // If root\WMI connection failed, still try other sources
+        .or_else(|| self.temperature_manager.get_temperature(None))
     }

     /// Get static CPU info from WMI (max frequency, cache size, socket count)
@@ -304,7 +277,7 @@ impl WindowsCpuReader {
             });
         }

-        // Get CPU temperature from WMI
+        // Get CPU temperature using fallback chain (no more error spam)
         let temperature = self.get_cpu_temperature();

         // Get static info from WMI (max frequency, cache size, socket count)
@@ -356,8 +329,8 @@ impl CpuReader for WindowsCpuReader {
     fn get_cpu_info(&self) -> Vec<CpuInfo> {
         match self.get_cpu_info_from_system() {
             Ok(cpu_info) => vec![cpu_info],
-            Err(e) => {
-                eprintln!("Error reading CPU info: {e}");
+            Err(_) => {
+                // Silently return empty - errors are expected on some systems
                 vec![]
             }
         }
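
The thread-local connection helper and the `.flatten().or_else(...)` call in `get_cpu_temperature` follow a pattern that can be tried in isolation. In the sketch below, `FakeConnection` is a hypothetical stand-in for `wmi::WMIConnection`, and the `query` parameter stands in for `TemperatureManager::get_temperature`; neither is part of the commit. One thing the sketch makes visible: `or_else` runs both when no connection could be created and when the first call returned `None`, so the query can be invoked a second time without a connection.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for `wmi::WMIConnection`.
pub struct FakeConnection {
    pub namespace: &'static str,
}

impl FakeConnection {
    /// Pretend to connect; an empty namespace models a failed connection.
    pub fn open(namespace: &'static str) -> Result<Self, String> {
        if namespace.is_empty() {
            Err("empty namespace".to_string())
        } else {
            Ok(Self { namespace })
        }
    }
}

thread_local! {
    // One lazily created connection per thread, as in the diff.
    static CONNECTION: RefCell<Option<FakeConnection>> = const { RefCell::new(None) };
}

/// Run `f` with the cached connection, creating it on first use.
/// Returns `None` (without logging) if the connection cannot be created.
pub fn with_connection<T, F: FnOnce(&FakeConnection) -> T>(f: F) -> Option<T> {
    CONNECTION.with(|cell| {
        let mut slot = cell.borrow_mut();
        if slot.is_none() {
            if let Ok(conn) = FakeConnection::open("root\\WMI") {
                *slot = Some(conn);
            }
        }
        slot.as_ref().map(f)
    })
}

/// Mirror of the call shape in `get_cpu_temperature`.
pub fn get_cpu_temperature(
    query: impl Fn(Option<&FakeConnection>) -> Option<u32>,
) -> Option<u32> {
    with_connection(|conn| query(Some(conn)))
        // Option<Option<u32>> -> Option<u32>
        .flatten()
        // Reached when the connection failed OR the first query returned None.
        .or_else(|| query(None))
}
```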

src/device/mod.rs

Lines changed: 4 additions & 0 deletions
@@ -27,6 +27,10 @@ pub mod cpu_macos;
 #[cfg(target_os = "windows")]
 pub mod cpu_windows;

+// Windows temperature fallback chain
+#[cfg(target_os = "windows")]
+pub mod windows_temp;
+
 // Container resource support
 #[cfg(target_os = "linux")]
 pub mod container_info;

src/device/readers/amd_windows.rs

Lines changed: 2 additions & 8 deletions
@@ -123,16 +123,10 @@ impl AmdWindowsGpuReader {
         // Warn if the reported VRAM is suspiciously close to 4GB limit or 0
         const FOUR_GB: u64 = 4 * 1024 * 1024 * 1024; // 4,294,967,296 bytes
         if total_memory == 0 {
-            eprintln!(
-                "AMD GPU '{}': VRAM size unavailable (reported as 0)",
-                name
-            );
+            eprintln!("AMD GPU '{name}': VRAM size unavailable (reported as 0)");
         } else if total_memory >= FOUR_GB - (512 * 1024 * 1024) {
             // If reported value is >= 3.5GB, it might be capped/wrapped for >4GB GPU
-            eprintln!(
-                "AMD GPU '{}': VRAM reported as {} bytes, may be inaccurate for >4GB GPUs due to WMI 32-bit limitation",
-                name, total_memory
-            );
+            eprintln!("AMD GPU '{name}': VRAM reported as {total_memory} bytes, may be inaccurate for >4GB GPUs due to WMI 32-bit limitation");
         }

         // Build detail map
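
The VRAM sanity check touched by this hunk can be exercised on its own. The sketch below returns the warning text instead of printing it so the thresholds are easy to test, and it uses the same inlined format arguments (`{name}`, `{total_memory}`) the commit switches to. `vram_warning` and `MARGIN` are hypothetical names, not part of `amd_windows.rs`.

```rust
const FOUR_GB: u64 = 4 * 1024 * 1024 * 1024; // 4,294,967,296 bytes
const MARGIN: u64 = 512 * 1024 * 1024; // values within 512 MiB of 4 GiB are suspect

/// Return a warning when the WMI-reported VRAM size looks unreliable.
/// Hypothetical helper: the original code prints with `eprintln!` instead.
pub fn vram_warning(name: &str, total_memory: u64) -> Option<String> {
    if total_memory == 0 {
        Some(format!("AMD GPU '{name}': VRAM size unavailable (reported as 0)"))
    } else if total_memory >= FOUR_GB - MARGIN {
        // A value of 3.5 GiB or more may be capped or wrapped by a 32-bit field.
        Some(format!(
            "AMD GPU '{name}': VRAM reported as {total_memory} bytes, may be inaccurate for >4GB GPUs due to WMI 32-bit limitation"
        ))
    } else {
        None
    }
}
```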
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+// Copyright 2025 Lablup Inc. and Jeongkyu Shin
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! ACPI Thermal Zone temperature source.
+//!
+//! Queries MSAcpi_ThermalZoneTemperature from root\WMI namespace.
+//! This is the standard Windows method but is not available on all systems.
+
+use super::{is_wmi_not_found_error, TemperatureResult};
+use serde::Deserialize;
+use wmi::WMIConnection;
+
+/// WMI structure for thermal zone temperature.
+#[derive(Deserialize, Debug)]
+#[serde(rename_all = "PascalCase")]
+struct ThermalZoneTemperature {
+    current_temperature: Option<u32>, // Temperature in tenths of Kelvin
+}
+
+/// ACPI Thermal Zone temperature source.
+pub struct AcpiThermalSource {
+    // No state needed - uses passed WMI connection
+}
+
+impl Default for AcpiThermalSource {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl AcpiThermalSource {
+    /// Create a new ACPI thermal source.
+    pub fn new() -> Self {
+        Self {}
+    }
+
+    /// Get temperature from ACPI thermal zones.
+    ///
+    /// # Arguments
+    /// * `wmi_conn` - Optional WMI connection to root\WMI namespace
+    ///
+    /// # Returns
+    /// * `TemperatureResult::Success(temp)` - Temperature in Celsius
+    /// * `TemperatureResult::NotFound` - MSAcpi_ThermalZoneTemperature class not found
+    /// * `TemperatureResult::Error` - Transient error (connection issue, etc.)
+    /// * `TemperatureResult::NoValidReading` - Class exists but no valid temperature
+    pub fn get_temperature(&self, wmi_conn: Option<&WMIConnection>) -> TemperatureResult {
+        let conn = match wmi_conn {
+            Some(c) => c,
+            None => return TemperatureResult::Error,
+        };
+
+        let results: Result<Vec<ThermalZoneTemperature>, _> =
+            conn.raw_query("SELECT CurrentTemperature FROM MSAcpi_ThermalZoneTemperature");
+
+        match results {
+            Ok(zones) => {
+                if zones.is_empty() {
+                    // No thermal zones found - this might be a permanent condition
+                    return TemperatureResult::NoValidReading;
+                }
+
+                for zone in zones {
+                    if let Some(temp_tenths_kelvin) = zone.current_temperature {
+                        // Convert from tenths of Kelvin to Celsius
+                        // Formula: (K / 10) - 273.15 = C
+                        let celsius = (temp_tenths_kelvin as f64 / 10.0) - 273.15;
+                        if celsius > 0.0 && celsius < 150.0 {
+                            // Use round() for more accurate conversion
+                            return TemperatureResult::Success(celsius.round() as u32);
+                        }
+                        // Out of range value, continue to next zone
+                    }
+                }
+                TemperatureResult::NoValidReading
+            }
+            Err(e) => {
+                if is_wmi_not_found_error(&e) {
+                    // WBEM_E_NOT_FOUND - class doesn't exist
+                    TemperatureResult::NotFound
+                } else {
+                    // Other WMI error - likely transient
+                    TemperatureResult::Error
+                }
+            }
+        }
+    }
+}
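
The unit conversion in `get_temperature` can be checked without WMI. The sketch below lifts the same formula and range check into a standalone function; `tenths_kelvin_to_celsius` is a hypothetical name used only for illustration. A raw value of 3288 (328.8 K) is 55.65 °C, which rounds to 56, whereas the truncating cast removed from `cpu_windows.rs` would have given 55.

```rust
/// Convert a raw `CurrentTemperature` value (tenths of Kelvin) to whole degrees Celsius.
/// Returns `None` for readings outside the plausible (0, 150) °C window used in the diff.
pub fn tenths_kelvin_to_celsius(raw: u32) -> Option<u32> {
    // Formula from the diff: (K / 10) - 273.15 = C
    let celsius = (raw as f64 / 10.0) - 273.15;
    if celsius > 0.0 && celsius < 150.0 {
        // round() instead of truncation, as introduced by this commit
        Some(celsius.round() as u32)
    } else {
        None
    }
}
```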
