Linux mdadm collector fails if any software RAID array has a delayed check or resync #3500

In some situations, if you (or system cron jobs, systemd timers, and so on) trigger a check or a resync of multiple software RAID arrays at the same time, the action is delayed for all arrays but one. While any array is in this delayed state, the mdadm collector fails, reporting:

```
time=2025-12-07T01:28:23.387-05:00 level=ERROR source=collector.go:168 msg="collector failed" name=mdadm duration_seconds=0.125841512 err="error parsing mdraids: expected integer"
```

The underlying cause is prometheus/procfs/issues/770, but I'm filing this issue against node_exporter as well so that you can track it and pick up the fix once it's made.

People with such software RAID setups are likely to hit this issue regularly (which may trigger alerts if they alert on unexpected collector failures), because most Linux distributions perform periodic software RAID 'check' operations, and it seems common to nominally start them at the same time on all software RAID arrays. Specifically, Ubuntu LTS releases do this weekly.
