Description
Describe the bug
Hello,
I'm experiencing a severe stability issue on a Raspberry Pi CM5 (eMMC) running Ubuntu 24.x.
After approximately 3.5 days of continuous uptime, the system becomes completely unresponsive (freezes), although it still responds to ping.
Environment
- Board: Raspberry Pi Compute Module 5 (eMMC model)
- OS: Ubuntu 24.x
- Kernel versions tested (the issue occurs on both):
  - 6.11.0-1009-raspi
  - 6.14.0-1012-raspi
- Storage: CM5 onboard eMMC
- Uptime before issue: typically 3.5 days or more
Symptoms when the issue occurs
- SSH is unreachable
- Ping works normally
- Network communication becomes extremely slow or stops entirely
- Connections to file-based databases fail
- Sometimes logs are written, other times nothing is logged at all
The system requires a hard reboot to recover.
Observed logs
1. The filesystem suddenly becomes read-only
In many cases, just before the freeze, I see multiple log entries like:
fallocate[323]: fallocate: cannot open /swapfile: Read-only file system
Once this message appears, the freeze almost always follows shortly after.
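To pin down exactly when the root filesystem switches to read-only, so that the moment can be lined up with uptime and with the USB errors, a small poller can watch `/proc/mounts`. This is a minimal hypothetical helper, not part of any existing tool. It assumes the root filesystem is mounted at `/` and that its output goes somewhere other than the eMMC (for example a serial console or a remote log), since the eMMC itself may no longer be writable.

```python
import time
from datetime import datetime, timezone


def is_read_only(mounts_text: str, mount_point: str = "/") -> bool:
    """Return True if mount_point carries the 'ro' option in /proc/mounts-style text."""
    read_only = False
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # Compare whole options so 'errors=remount-ro' is not mistaken for 'ro'.
        # Keep the last match: a later mount on the same point shadows earlier ones.
        read_only = "ro" in fields[3].split(",")
    return read_only


def watch_root(interval_s: float = 10.0, mounts_path: str = "/proc/mounts") -> None:
    """Poll forever; print a UTC timestamp whenever / switches between rw and ro."""
    previous = None
    while True:
        with open(mounts_path) as f:
            current = is_read_only(f.read())
        if current != previous:
            state = "ro" if current else "rw"
            print(f"{datetime.now(timezone.utc).isoformat()} / is {state}", flush=True)
            previous = current
        time.sleep(interval_s)
```

Calling `watch_root()` from a long-running process prints one line per transition, so the first `rw` to `ro` change gets a precise timestamp even when the on-disk logs are lost.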
2. USB errors sometimes appear before the freeze
These USB errors often appear before the read-only filesystem message:
usb 2-1.4.2: device descriptor read/64, error -71
usb 2-1.4.2: new high-speed USB device number 37 using xhci-hcd
usb 2-1.4-port2: attempt power cycle
I have also occasionally seen USB disconnect errors and current-related USB errors.
However, the USB devices use a separate, stable power supply, so I do not believe this is a power issue.
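To check whether the USB errors cluster just before each freeze, kernel log lines can be tallied per device. The sketch below is a hypothetical helper that only recognises the message shapes quoted above plus the kernel's "USB disconnect" line; on Linux, error -71 corresponds to `EPROTO` (protocol error).

```python
import re
from collections import Counter
from typing import Iterable

# Captures the device path, e.g. "2-1.4.2" or "2-1.4-port2", and the rest of the message.
USB_LINE = re.compile(r"\busb (?P<dev>[0-9][0-9.\-]*(?:-port\d+)?): (?P<msg>.*)$")
ERROR_CODE = re.compile(r"error (-\d+)")


def tally_usb_events(lines: Iterable[str]) -> Counter:
    """Count (device, event) pairs for USB errors, power cycles and disconnects."""
    counts: Counter = Counter()
    for line in lines:
        match = USB_LINE.search(line.rstrip("\n"))
        if not match:
            continue
        dev, msg = match["dev"], match["msg"]
        err = ERROR_CODE.search(msg)
        if err:
            counts[(dev, f"error {err.group(1)}")] += 1
        elif "attempt power cycle" in msg:
            counts[(dev, "power cycle")] += 1
        elif "disconnect" in msg:
            counts[(dev, "disconnect")] += 1
        # Other lines (e.g. "new high-speed USB device ...") are ignored.
    return counts
```

For example, the lines of a saved `dmesg -T` or `journalctl -k` output can be passed in directly; a sharp rise in one device's counts shortly before a freeze would be worth reporting alongside the other logs.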
3. Sometimes there are no logs at all
In several cases:
- no syslog entries
- no dmesg updates
- the system freezes, but the filesystem is not remounted read-only
What I have verified
- No CPU spikes
- Memory usage is stable
- Disk usage is fine
- No network congestion
- System temperature is within normal range
- eMMC I/O load is not heavy
Nothing obvious appears to trigger the freeze.
Possible root cause: eMMC CQE deadlock?
Based on my research, I found discussions mentioning CQE (Command Queue Engine) deadlock issues on certain Raspberry Pi eMMC configurations.
My questions:
- Is there a known CQE-related freeze/deadlock issue for CM5 eMMC in these kernel versions?
- If so, has this been addressed in a newer kernel or firmware update?
- Some users suggest disabling CQE, but is there an official or recommended workaround other than disabling CQE entirely?
- Is long-uptime instability with eMMC + CQE a known issue on CM5?
This system must operate 24/7, so long-term stability is critical.
If additional logs or traces are needed, I can provide them.
Thank you very much for your help. Let me know what further information I can collect to help diagnose this issue.
Steps to reproduce the behaviour
- Install Ubuntu 24.04 (or later) on a Raspberry Pi CM5 with eMMC.
- Use a kernel such as 6.11.0-1009-raspi or 6.14.0-1012-raspi (the issue occurs on both).
- Run normal workloads (logging, DB file access, attached USB devices, light-to-moderate I/O). No heavy stress is required.
- Let the system run continuously for 3.5 days or longer.
- After ~3.5 days of uptime, the system gradually becomes unstable:
  - Network traffic slows down severely or stalls.
  - SSH stops responding.
  - File operations start failing.
  - Eventually, the system freezes completely while still replying to ping.
- In some cases, "Read-only file system" messages or USB errors appear shortly before the freeze; in other cases, no logs are produced.
Device(s)
Raspberry Pi CM5
System
- Raspberry Pi Compute Module 5 (eMMC 4G/8G)
- OS: Ubuntu 24.04 LTS / 24.10 (not Raspberry Pi OS)
- Kernel: 6.11.0-1009-raspi or 6.14.0-1012-raspi (issue present in both)
- Firmware: N/A on Ubuntu images
- Uptime before failure: typically ~3.5 days
This system must operate continuously (24/7), so resolving long-term stability issues is essential.
Logs
Common logs before freeze:
fallocate: cannot open /swapfile: Read-only file system
In other cases, the logs below were captured shortly before the system froze.
They show multiple kernel "hung task" events in which essential filesystem-related tasks (jbd2, systemd-journal, application modules, sync) remain blocked in uninterruptible sleep (state D) for more than 122 seconds. This suggests that the eMMC or the EXT4 journaling layer has stopped responding, which is consistent with the "Read-only file system" message observed in other freeze events.
Such behaviour points to a possible deadlock in the block layer, the EXT4 journal, or the eMMC/CQE command-queue path. Once this occurs, all write operations stall and the entire system becomes unresponsive, although it still answers ping.
Logs (excerpt, some lines omitted):
----------------------------------------------------------------------
[Sat Nov 29 06:54:29 2025] INFO: task jbd2/mmcblk0p2-:258 blocked for more than 122 seconds.
[Sat Nov 29 06:54:29 2025] Tainted: G C E 6.14.0-1012-raspi #12-Ubuntu
[Sat Nov 29 06:54:29 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Nov 29 06:54:29 2025] task:jbd2/mmcblk0p2- state:D stack:0 pid:258 tgid:258 ppid:2 task_flags:0x240040 flags:0x00000008
[Sat Nov 29 06:54:29 2025] Call trace:
[Sat Nov 29 06:54:29 2025] __switch_to+0xe8/0x148 (T)
[Sat Nov 29 06:54:29 2025] __schedule+0x32c/0x990
[Sat Nov 29 06:54:29 2025] schedule+0x3c/0x118
[Sat Nov 29 06:54:29 2025] jbd2_journal_wait_updates+0x70/0xf0
[Sat Nov 29 06:54:29 2025] jbd2_journal_commit_transaction+0x19c/0x16b0
[Sat Nov 29 06:54:29 2025] kjournald2+0xc4/0x248
[Sat Nov 29 06:54:29 2025] kthread+0x110/0x1e0
[Sat Nov 29 06:54:29 2025] ret_from_fork+0x10/0x20
...
[Sat Nov 29 06:54:29 2025] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
----------------------------------------------------------------------
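The hung-task reports above follow a fixed pattern, so they can be pulled out of a saved kernel log to see which tasks stalled first. This is a hypothetical sketch that assumes the exact `INFO: task NAME:PID blocked for more than N seconds` wording shown in the excerpt; task names are taken verbatim, including the kernel's 15-character truncation (hence `jbd2/mmcblk0p2-`).

```python
import re
from typing import Iterable, List, NamedTuple

# Matches e.g. "INFO: task jbd2/mmcblk0p2-:258 blocked for more than 122 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    name: str
    pid: int
    seconds: int


def parse_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Return hung-task reports in log order (earliest first)."""
    reports = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            reports.append(
                HungTask(match["name"], int(match["pid"]), int(match["secs"]))
            )
    return reports
```

Because the kernel stops printing these reports once `kernel.hung_task_warnings` is used up (as the last excerpted line notes), the parsed list only covers the first few stalled tasks.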
USB-related errors seen before some freezes:
usb 2-1.4.2: device descriptor read/64, error -71
usb 2-1.4-port2: attempt power cycle
Other observations:
- Network becomes extremely slow or stops.
- SSH becomes unavailable while ping still responds.
- Sometimes no logs appear at all.