Commit 7fa304f
Fix critical flash corruption bug in flash_cache_flush()
PROBLEM:
--------
Users reported that BLE keyboards fail to reconnect after sleep, requiring
manual re-pairing. Investigation revealed that bonding data files were
becoming corrupted during sleep/wake cycles, when BLE connections are most
unstable and prone to unexpected disconnections.
ROOT CAUSE ANALYSIS:
-------------------
The flash_cache_flush() function in flash_cache.c had a critical bug where
it ignored return values from flash erase and program operations:
```c
// BROKEN CODE (lines 89-90):
fc->erase(fc->cache_addr); // Return value ignored
fc->program(fc->cache_addr, fc->cache_buf, FLASH_CACHE_SIZE); // Return value ignored
fc->cache_addr = FLASH_CACHE_INVALID_ADDR; // Always cleared cache
```
This meant that:
1. Flash operations could fail (especially during BLE disconnections)
2. The failure was silently ignored
3. The cache was cleared anyway, losing the data forever
4. LittleFS believed the write succeeded when it actually failed
5. Subsequent operations worked on corrupted filesystem state
TECHNICAL DETAILS:
-----------------
The underlying flash operations (fal_erase/fal_program) already had proper
retry logic (20 attempts each, added in commit 3ea7855), but their failures
were being ignored by flash_cache_flush().
This issue was particularly problematic during BLE sleep/wake cycles because:
- BLE connections are unstable during wake-up
- Flash operations are more likely to be interrupted
- Multiple filesystem operations occur simultaneously
- Unexpected disconnections corrupt in-progress writes
INVESTIGATION RESOURCES:
-----------------------
This fix was developed after extensive research including:
1. Adafruit nRF52 Arduino Issue adafruit#350: Documents this exact problem
"The library completely ignores flash errors, both when reported
synchronously and when reported via NRF_EVT_FLASH_OPERATION_ERROR"
2. Meshtastic firmware Issue #5839: BLE disconnection corruption
meshtastic/firmware#5839
3. Meshtastic PR #5916: Application-layer workaround
meshtastic/firmware#5916
4. Todd Herbert's flash operation improvements (our commit 3ea7855)
Already implemented retry logic, but failures still ignored
5. Analysis of Nordic SoftDevice 6.1.1 flash operation behavior
Confirmed that sd_flash_write/sd_flash_page_erase can fail during
BLE instability
SOLUTION:
--------
Modified flash_cache_flush() to:
1. Check return values from both fc->erase() and fc->program() operations
2. Only clear cache (fc->cache_addr = FLASH_CACHE_INVALID_ADDR) if BOTH succeed
3. Preserve cache data for retry if either operation fails
4. Handle the verify-skip optimization path correctly
This ensures that failed flash operations are properly detected and the cache
remains valid for subsequent retry attempts, preventing data loss and
filesystem corruption.
TESTING IMPACT:
--------------
This fix addresses the root cause of BLE reconnection failures after sleep.
Users should experience:
- Reliable BLE reconnection after keyboard wake from sleep
- No more manual re-pairing requirements
- Elimination of "corrupted dir pair" LittleFS errors
- Overall improved filesystem integrity
Related commits:
- d8b2ea8: Disabled CCCD persistence (high-level workaround)
- 3ea7855: Added flash operation retry logic (Todd Herbert)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>1 parent d8b2ea8 commit 7fa304f
1 file changed
+13
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
86 | 86 | | |
87 | 87 | | |
88 | 88 | | |
89 | | - | |
90 | | - | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
91 | 97 | | |
92 | 98 | | |
93 | 99 | | |
94 | | - | |
95 | | - | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
96 | 105 | | |
97 | 106 | | |
98 | 107 | | |
| |||
0 commit comments