Linux mdadm collector fails if any software RAID array has a delayed check or resync #3500

@siebenmann

In some situations, if you (or system cron jobs, systemd timers, etc.) trigger a check or a resync of multiple software RAID arrays at the same time, the action is delayed for all arrays but one. When this happens, the mdadm collector fails, reporting:

time=2025-12-07T01:28:23.387-05:00 level=ERROR source=collector.go:168 msg="collector failed" name=mdadm duration_seconds=0.125841512 err="error parsing mdraids: expected integer"

The underlying cause is prometheus/procfs/issues/770, but I'm filing this issue against node_exporter as well so that you can track it and pick up the fix once it's made.

People who have such software RAID setups are likely to experience this issue regularly (which may trigger alarms if they alert on unexpected collector failures), because most Linux distributions perform periodic software RAID 'check' operations and it seems common to nominally start them at the same time on all software RAID arrays. Specifically, Ubuntu LTS releases do this weekly.
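For anyone who wants to confirm whether they are affected, here is a minimal sketch that scans node_exporter's text output for collectors reporting failure. It assumes the standard `node_scrape_collector_success` metric with a `collector` label; the function name `failed_collectors` and the sample text are hypothetical and only for illustration.

```python
import re

# Matches lines such as: node_scrape_collector_success{collector="mdadm"} 0
# Assumes the metric carries only the "collector" label.
_SUCCESS_RE = re.compile(
    r'^node_scrape_collector_success\{collector="([^"]+)"\}\s+(\S+)\s*$'
)


def failed_collectors(metrics_text):
    """Return the names of collectors whose last scrape reported failure (value 0)."""
    failed = []
    for line in metrics_text.splitlines():
        match = _SUCCESS_RE.match(line.strip())
        if match and float(match.group(2)) == 0.0:
            failed.append(match.group(1))
    return failed


# Hypothetical sample of exporter output, for illustration only.
sample = """\
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="mdadm"} 0
"""

print(failed_collectors(sample))
```

Run against the real /metrics output while a check is delayed on one or more arrays, this should list mdadm if the collector is hitting the parse error above.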
