-
-
Notifications
You must be signed in to change notification settings - Fork 9
Add node mutex #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add node mutex #103
Conversation
To prevent crashes when updating controls frequently
|
Note
|
| Cohort / File(s) | Summary |
|---|---|
Node core src/MoonBase/Nodes.h |
Add nodeMutex (created with xSemaphoreCreateMutex()), null-check in constructor, and destructor that deletes the mutex. |
Manager synchronization src/MoonBase/NodeManager.h |
Acquire/release per-node mutex around control update and onUpdate calls to protect control value modifications. |
Layer synchronization src/MoonLight/Layers/PhysicalLayer.cpp, src/MoonLight/Layers/VirtualLayer.cpp |
Guard node->loop(), node->onSizeChanged(), and related per-node callbacks with xSemaphoreTake/xSemaphoreGive when nodes are active or size changes. |
Module notes & assignment src/MoonLight/Modules/ModuleDrivers.h, src/MoonLight/Modules/ModuleEffects.h |
Ensure driver-layer node assignment in existing-index path; add comments marking where effects run concurrently. |
UI component interface/src/lib/components/moonbase/FieldRenderer.svelte |
Add throttled slider input handling (batch updates ~50ms), immediate local visual updates, and flush-on-destroy behavior. |
Control logic tweak src/MoonLight/Modules/ModuleLightsControl.h |
Replace read-with-callback usage with direct inline checks against _state.data["monitorOn"]. |
Configuration platformio.ini |
Bump APP_DATE build flag from 20260109 to 20260112. |
Documentation docs/gettingstarted/installer.md, docs/moonlight/effects.md, docs/develop/architecture.md |
Reword installer steps (terminology and formatting), correct effect asset filenames (capitalize StarSky files), and expand architecture task/core/timing and synchronization notes. |
Sequence Diagram(s)
sequenceDiagram
participant EffectTask as Effect/Driver Task
participant Layer as Physical/Virtual Layer
participant Manager as NodeManager
participant Node as Node (has nodeMutex)
EffectTask->>Layer: call loop()
Layer->>Manager: iterate nodes / trigger control update path
Manager->>Node: xSemaphoreTake(nodeMutex)
Manager->>Node: updateControl(...) / onUpdate(...)
Manager->>Node: xSemaphoreGive(nodeMutex)
Layer->>Node: xSemaphoreTake(nodeMutex) before node->loop()/onSizeChanged()
Node->>Node: node->loop()/onSizeChanged() (protected)
Node-->>Layer: xSemaphoreGive(nodeMutex)
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
- Add Home Assistant integration #87 — touches
ModuleLightsControl.hand nearby logic; may conflict with the state-read changes. - rename UI components #75 — updates
FieldRenderer/UI components; likely relevant to the throttled slider changes. - Prepare for 0.7.1 #94 — overlaps node/mutex additions and NodeManager synchronization changes.
Suggested labels
enhancement
Poem
🐰 I hopped through code with tiny paws,
I tucked each node behind locked jaws,
Sliders whisper, batched and bright,
Docs polished for the installer’s light,
Concurrency snug — a carroted applause! 🥕
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add node mutex' accurately reflects the primary change: introducing per-node mutexes throughout the codebase for thread-safe node operations. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
interface/src/lib/components/moonbase/FieldRenderer.sveltelib/framework/WWWData.hplatformio.ini
🚧 Files skipped from review as they are similar to previous changes (1)
- interface/src/lib/components/moonbase/FieldRenderer.svelte
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
📚 Learning: 2025-11-29T16:17:30.619Z
Learnt from: CR
Repo: MoonModules/MoonLight PR: 0
File: misc/parking/GEMINI.md:0-0
Timestamp: 2025-11-29T16:17:30.619Z
Learning: Applies to misc/parking/**/platformio.ini : Use PlatformIO to build and upload the ESP32 backend C++ code
Applied to files:
platformio.ini
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (1)
platformio.ini (1)
59-59: LGTM! Routine build metadata update.The APP_DATE build flag update from "20260109" to "20260112" is a standard metadata change that reflects the build date. This has no functional impact on the codebase.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @src/MoonLight/Layers/PhysicalLayer.cpp:
- Around line 74-78: Currently nodeMutex is held for the entire duration of
layer->loop(), causing unnecessary serialization; change the code to grab
nodeMutex only for the minimal critical section (e.g., copy the layer pointer or
perform the pointer swap/read under the mutex), then release the mutex before
calling layer->loop(), or alternatively move the mutex into the implementation
of Layer::loop() so it only protects specific shared-state accesses; ensure the
pointer copy/read uses nodeMutex (nodeMutex, layer) to prevent use-after-free
while leaving long-running rendering in layer->loop() outside the lock.
In @src/MoonLight/Layers/PhysicalLayer.h:
- Line 124: The code creates nodeMutex via xSemaphoreCreateMutex() without
checking for NULL and never deletes it; validate the return of
xSemaphoreCreateMutex() (e.g., after the call to xSemaphoreCreateMutex() check
if nodeMutex == NULL and log/handle the error) and add lifecycle cleanup by
declaring a PhysicalLayer destructor (~PhysicalLayer()) and implementing it to
vSemaphoreDelete(nodeMutex) (and clear nodeMutex) so future
xSemaphoreTake/xSemaphoreGive calls won't operate on a NULL/leaked handle.
🧹 Nitpick comments (3)
src/MoonBase/NodeManager.h (1)
274-277: Consider adding a timeout and null check for the mutex.Using
portMAX_DELAYblocks indefinitely, which could cause deadlocks if the mutex is held elsewhere or not properly initialized. Based on learnings from similar mutex usage, minimizing hold time and avoiding indefinite blocking improves robustness.🔒 Suggested improvements
- Add a null check for the mutex:
+if (layerP.nodeMutex == NULL) { + EXT_LOGW(ML_TAG, "nodeMutex not initialized"); + return; +}
- Consider using a timeout instead of
portMAX_DELAY:-xSemaphoreTake(layerP.nodeMutex, portMAX_DELAY); +if (xSemaphoreTake(layerP.nodeMutex, pdMS_TO_TICKS(100)) != pdTRUE) { + EXT_LOGW(ML_TAG, "Failed to acquire nodeMutex for control update"); + return; +}src/MoonLight/Layers/PhysicalLayer.cpp (2)
113-117: Add timeout for mutex acquisition.Using
portMAX_DELAYinloopDrivers()risks blocking indefinitely if the mutex is held elsewhere or not initialized. Consider using a timeout as suggested for NodeManager.h.-xSemaphoreTake(nodeMutex, portMAX_DELAY); +if (xSemaphoreTake(nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) { node->onSizeChanged(prevSize); xSemaphoreGive(nodeMutex); +} else { + EXT_LOGW(ML_TAG, "Failed to acquire nodeMutex for onSizeChanged"); +}
131-133: Consider timeout and verify protection granularity.Similar to the other mutex usage locations:
portMAX_DELAYrisks indefinite blockingnode->onLayout()could be a lengthy operation, introducing serialization across all nodesIf only specific state modifications within
onLayout()require synchronization, consider moving the mutex protection inside the function to minimize hold time. Otherwise, apply the timeout pattern:-xSemaphoreTake(nodeMutex, portMAX_DELAY); +if (xSemaphoreTake(nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) { node->onLayout(); xSemaphoreGive(nodeMutex); +} else { + EXT_LOGW(ML_TAG, "Failed to acquire nodeMutex for onLayout"); +}
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/gettingstarted/installer.mddocs/moonlight/effects.mdsrc/MoonBase/NodeManager.hsrc/MoonLight/Layers/PhysicalLayer.cppsrc/MoonLight/Layers/PhysicalLayer.h
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
📚 Learning: 2025-12-28T10:28:24.157Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Applied to files:
src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T15:24:17.118Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.
Applied to files:
src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T15:24:29.770Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.
Applied to files:
src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T13:45:03.067Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
Applied to files:
src/MoonLight/Layers/PhysicalLayer.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
docs/gettingstarted/installer.md (2)
27-27: Terminology clarification improves user guidance.Changing "Restart" to "Reset Device" is more explicit and accurately describes the action users should perform in the installer context. This improves clarity for users following the setup instructions.
144-144: Proper name capitalization corrected.Updating "Visual Studio code" to "Visual Studio Code" corrects the capitalization of the proper product name.
docs/moonlight/effects.md (1)
60-60: LGTM!Asset path capitalization corrected for consistency.
src/MoonLight/Layers/PhysicalLayer.h
Outdated
|
|
||
| Coord3D prevSize; // to calculate size change | ||
|
|
||
| SemaphoreHandle_t nodeMutex = xSemaphoreCreateMutex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate mutex creation and add lifecycle management.
xSemaphoreCreateMutex() can return NULL if memory allocation fails. Without validation, subsequent xSemaphoreTake/xSemaphoreGive calls on a NULL handle will fail silently or cause undefined behavior.
Additionally, the mutex should be deleted in the destructor to prevent resource leaks.
🔧 Recommended fixes
- Add validation after creation (in the constructor or setup):
// In PhysicalLayer::PhysicalLayer() or PhysicalLayer::setup()
if (nodeMutex == NULL) {
EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
// Handle error appropriately
}- Add cleanup in destructor:
// In PhysicalLayer.h, add destructor declaration:
~PhysicalLayer();
// In PhysicalLayer.cpp, implement:
PhysicalLayer::~PhysicalLayer() {
if (nodeMutex != NULL) {
vSemaphoreDelete(nodeMutex);
nodeMutex = NULL;
}
}🤖 Prompt for AI Agents
In @src/MoonLight/Layers/PhysicalLayer.h at line 124, The code creates nodeMutex
via xSemaphoreCreateMutex() without checking for NULL and never deletes it;
validate the return of xSemaphoreCreateMutex() (e.g., after the call to
xSemaphoreCreateMutex() check if nodeMutex == NULL and log/handle the error) and
add lifecycle cleanup by declaring a PhysicalLayer destructor (~PhysicalLayer())
and implementing it to vSemaphoreDelete(nodeMutex) (and clear nodeMutex) so
future xSemaphoreTake/xSemaphoreGive calls won't operate on a NULL/leaked
handle.
Docs ==== - Architecture: add update and loop in Task Interaction Flow Backend ======= - Node manager: node onUpdate within nodeMutex - Nodes: nodeMutex per node (not one global), constructor and destructor checks - Physical layer: nodeMutex on onSizeChanged and loop and onLayout - Virtual layer: nodeMutex on onSizeChanged and loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @docs/develop/architecture.md:
- Line 13: The timing listed for ESP32SvelteKit is inconsistent: the task table
row showing "10ms" should match the detailed task description which states "Runs
every: 20ms"; update the table entry for the ESP32SvelteKit row (the table cell
containing "10ms") to "20ms" so both the main table and the detailed description
align.
In @src/MoonBase/Nodes.h:
- Around line 90-93: nodeMutex can be null after creation and callers
unconditionally call xSemaphoreTake(node->nodeMutex,...), which will crash;
change the node creation path in Nodes.h so semaphore creation failure is
propagated (fail-fast) by returning an error/false or nullptr from the node
creation/initialization routine (or throw) instead of only logging, and update
all call sites in NodeManager.h, VirtualLayer.cpp and PhysicalLayer.cpp to
handle a failed creation (check returned pointer/boolean and abort
initialization) or guard every xSemaphoreTake with a null check for
node->nodeMutex until callers are fixed.
🧹 Nitpick comments (2)
docs/develop/architecture.md (1)
72-94: Excellent documentation of task interactions and mutex synchronization.The new task narrative sections clearly document the synchronization strategy using
nodeMutexto protect node updates, size changes, and loops across HTTPP, SvelteKit, Driver, and Effect tasks. This aligns well with the PR objective of preventing crashes during frequent control updates.Please address the following minor grammar/spelling issues:
📝 Suggested grammar and spelling fixes
Line 77: Hyphenate compound adjective:
-* MoonLight Modules: runs Modules::compareRecursive and Modules::checkReOrderSwap which calls postUpdate which presents a semaphore guarded updatedItem (updateProcessedSem, updateReadySem) +* MoonLight Modules: runs Modules::compareRecursive and Modules::checkReOrderSwap which calls postUpdate which presents a semaphore-guarded updatedItem (updateProcessedSem, updateReadySem)Line 83: Correct spelling:
-* processUpdatedItem() calls Module::onUpdate(), which is a virtual function which is overriden by Modules to implement custom functionality +* processUpdatedItem() calls Module::onUpdate(), which is a virtual function which is overridden by Modules to implement custom functionalityLine 82: Fix typo:
-* Module::loop() runs in the SvelteKit task and calls getUpdate() to retrieve the updatedItam in a synchronized way, getUpdate() calls processUpdatedItem() +* Module::loop() runs in the SvelteKit task and calls getUpdate() to retrieve the updatedItem in a synchronized way, getUpdate() calls processUpdatedItem()Based on static analysis hints and code review.
src/MoonBase/NodeManager.h (1)
273-279: Mutex usage is correct, but consider a null-check for defensive coding.The synchronization pattern correctly protects
updateControlandonUpdatefrom concurrentloop()execution.requestMappings()is appropriately placed outside the critical section since it only sets atomic flags.However, if mutex creation fails (as checked in Node constructor), this code will crash on
xSemaphoreTake(nullptr, ...).🛡️ Optional: Add defensive null-check
if (nodeClass != nullptr) { - - xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY); + if (nodeClass->nodeMutex) xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY); nodeClass->updateControl(control); nodeClass->onUpdate(updatedItem.oldValue, control); // custom onUpdate for the node - xSemaphoreGive(nodeClass->nodeMutex); + if (nodeClass->nodeMutex) xSemaphoreGive(nodeClass->nodeMutex); nodeClass->requestMappings();
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docs/develop/architecture.mdsrc/MoonBase/NodeManager.hsrc/MoonBase/Nodes.hsrc/MoonLight/Layers/PhysicalLayer.cppsrc/MoonLight/Layers/VirtualLayer.cppsrc/MoonLight/Modules/ModuleDrivers.hsrc/MoonLight/Modules/ModuleEffects.h
✅ Files skipped from review due to trivial changes (1)
- src/MoonLight/Modules/ModuleEffects.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
📚 Learning: 2025-12-28T10:28:24.157Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Applied to files:
src/MoonLight/Layers/VirtualLayer.cppsrc/MoonLight/Layers/PhysicalLayer.cppdocs/develop/architecture.md
📚 Learning: 2026-01-06T15:24:11.096Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:11.096Z
Learning: In VirtualLayer.cpp (setLight and getLight), the unmapped path intentionally disables bounds checking when indexV >= mappingTableSize or oneToOneMapping is true to optimize performance for very large LED counts (100K+). This is a deliberate design decision; only rely on this behavior if you can guarantee calls stay within bounds. Document this contract and avoid adding defensive checks in the hot path. Use explicit bounds checks in all other code paths or when the input cannot be guaranteed to be in-range.
Applied to files:
src/MoonLight/Layers/VirtualLayer.cpp
📚 Learning: 2026-01-06T15:24:29.770Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.
Applied to files:
src/MoonLight/Layers/VirtualLayer.cppsrc/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T15:24:17.118Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.
Applied to files:
src/MoonLight/Modules/ModuleDrivers.hsrc/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T13:45:03.067Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
Applied to files:
src/MoonLight/Layers/PhysicalLayer.cppdocs/develop/architecture.md
📚 Learning: 2025-12-25T13:31:32.367Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2025-12-25T13:34:40.097Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2026-01-04T09:31:07.533Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: lib/framework/ESP32SvelteKit.cpp:229-229
Timestamp: 2026-01-04T09:31:07.533Z
Learning: In lib/framework/ESP32SvelteKit.cpp, the SvelteKit loop task stack size of 6KB (6 * 1024 bytes) for non-PSRAM devices has been validated through testing, showing approximately 1500 bytes of free stack remaining under load, providing adequate ~24% headroom.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2025-11-29T16:17:30.619Z
Learnt from: CR
Repo: MoonModules/MoonLight PR: 0
File: misc/parking/GEMINI.md:0-0
Timestamp: 2025-11-29T16:17:30.619Z
Learning: Applies to misc/parking/src/**/*.{h,cpp} : Modify C++ backend code in the `src/` and `lib/` directories for ESP32 development
Applied to files:
docs/develop/architecture.md
📚 Learning: 2026-01-08T10:20:09.363Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T10:20:09.363Z
Learning: In MoonLight's Idle Watchdog approach: `vTaskDelay(1)` is preferred over `taskYIELD()` because taskYIELD() only yields to equal or higher priority tasks. Since Effect Task and Driver Task run at priority 3 while the Idle task runs at priority 0, only vTaskDelay(1) actually blocks the calling task and allows the Idle task to run and reset the watchdog timer.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2026-01-08T11:39:52.862Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
Applied to files:
docs/develop/architecture.md
🪛 LanguageTool
docs/develop/architecture.md
[grammar] ~77-~77: Use a hyphen to join words.
Context: ...ls postUpdate which presents a semaphore guarded updatedItem (updateProcessedSem,...
(QB_NEW_EN_HYPHEN)
[grammar] ~83-~83: Ensure spelling is correct
Context: ...), which is a virtual function which is overriden by Modules to implement custom function...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (10)
docs/develop/architecture.md (2)
35-35: Verify UI responsiveness impact of timing change.The ESP32SvelteKit task cadence changed from 10ms to 20ms (doubled period). While this may reduce CPU load and potential mutex contention, please verify that UI responsiveness remains acceptable, especially for:
- WebSocket message latency
- Real-time control updates
- Monitor data refresh rates
Based on the architecture described, UI updates will now occur at half the previous frequency. If this was an intentional optimization related to the mutex synchronization work, consider documenting the rationale in the PR description.
308-310: LGTM: Correct watchdog integration pattern.The code example properly demonstrates the watchdog pattern with
esp_task_wdt_reset()at the start of the loop iteration, followed byloop()execution andvTaskDelay(1)yield. This ensures the Idle task (priority 0) gets CPU time to reset the watchdog timer, which is critical for preventing Task Watchdog Timer crashes under heavy loads.Based on learnings: vTaskDelay(1) is essential (not taskYIELD()) because it blocks the calling task and allows lower-priority Idle task to run.
src/MoonLight/Modules/ModuleDrivers.h (1)
168-174: LGTM - Comment clarifies concurrency boundary.The added comment correctly documents that after this point, the node runs concurrently in the drivers task. The
elsebranch assignment ensures proper node replacement at existing indices.src/MoonLight/Layers/VirtualLayer.cpp (2)
62-71: Mutex synchronization looks correct for loop() and onSizeChanged.The mutex acquisition/release pattern properly protects per-node operations, and
addYield(10)is correctly placed outside the critical section.
77-81: No synchronization issue:loop20ms()is a placeholder method with no actual implementations in any derived classes. The base Node class declares it as an empty virtual method, and no subclasses override it. Sinceloop20ms()executes no code, there is no risk of race conditions or state modification. The pattern inconsistency (no mutex inloop20ms()vs. mutex inloop()) is immaterial becauseloop20ms()does nothing. Additionally, the comment referencesonUpdate()which does not exist in the codebase.Likely an incorrect or invalid review comment.
src/MoonLight/Layers/PhysicalLayer.cpp (3)
74-76: Minor formatting improvement.Adding explicit braces improves readability without changing behavior.
110-122: Mutex usage in loopDrivers is consistent with VirtualLayer.The synchronization pattern correctly protects
onSizeChangedandloopoperations.addYield(10)remains outside the critical section, which is important for watchdog feeding per your previous learnings about extreme loads (49K channels @ 100 fps).
129-137: Mutex around onLayout is correct.Layout operations are now thread-safe with respect to control updates and loop execution.
src/MoonBase/Nodes.h (2)
78-78: Mutex initialization at member declaration is correct for ESP32/FreeRTOS.Creating the mutex inline ensures it's available immediately when the Node object is allocated.
95-101: Destructor cleanup is correct.The null-check before
vSemaphoreDeleteand the subsequent null assignment prevent double-free and use-after-free issues.
| if (nodeMutex == nullptr) { | ||
| EXT_LOGE(ML_TAG, "Failed to create nodeMutex"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error logging but continued operation may lead to crashes.
When nodeMutex creation fails, the code logs an error but allows the node to continue. However, all call sites (NodeManager.h, VirtualLayer.cpp, PhysicalLayer.cpp) unconditionally call xSemaphoreTake(node->nodeMutex, ...), which will crash if nodeMutex is null.
Consider either:
- Propagating the failure (return false or throw), or
- Adding null checks at all call sites (as suggested for NodeManager.h)
🔧 Option: Fail fast by returning early or setting a flag
virtual void constructor(VirtualLayer* layer, const JsonArray& controls) {
this->layer = layer;
this->controls = controls;
if (nodeMutex == nullptr) {
EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
+ // Consider: mark node as invalid so it won't be used
+ // on = false; // prevent loop execution
}
}Alternatively, ensure all mutex call sites have null guards (more defensive but adds overhead).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (nodeMutex == nullptr) { | |
| EXT_LOGE(ML_TAG, "Failed to create nodeMutex"); | |
| } | |
| } | |
| if (nodeMutex == nullptr) { | |
| EXT_LOGE(ML_TAG, "Failed to create nodeMutex"); | |
| // Consider: mark node as invalid so it won't be used | |
| // on = false; // prevent loop execution | |
| } | |
| } |
🤖 Prompt for AI Agents
In @src/MoonBase/Nodes.h around lines 90 - 93, nodeMutex can be null after
creation and callers unconditionally call xSemaphoreTake(node->nodeMutex,...),
which will crash; change the node creation path in Nodes.h so semaphore creation
failure is propagated (fail-fast) by returning an error/false or nullptr from
the node creation/initialization routine (or throw) instead of only logging, and
update all call sites in NodeManager.h, VirtualLayer.cpp and PhysicalLayer.cpp
to handle a failed creation (check returned pointer/boolean and abort
initialization) or guard every xSemaphoreTake with a null check for
node->nodeMutex until callers are fixed.
|
@coderabbitai, while making the system more robust by adding mutexes and semaphores, updating the global brightness is very slow (triggering [ 61188][I][StatefulService.h:202] beginTransaction(): [🐼] _accessMutex lightscontrol wait 100ms) and updating node controls is quite slow (triggering [395992][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) star fill = 200 t:slider p:0x3c28eb2e). Can you make an analysis why this is and how we can speed up things? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 62 🏁 Script executed: Length of output: 23318 🏁 Script executed: Length of output: 24379 🏁 Script executed: Length of output: 2978 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 6966 🏁 Script executed: Length of output: 7500 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 114 🏁 Script executed: Length of output: 273 @ewowi, I've analyzed the performance issue and identified the root causes. The slowness is due to mutex contention where control updates from the UI thread must wait for Effect/Driver tasks that hold Root Causes1. nodeMutex held during loop() execution
|
|
@coderabbitai, regarding
Is that true? As each node has its own mutex it only waits for one frame (loop) which is about 20ms , not more isn't it? And as we don't know what the loop is doing (lots of users will make effects) and the loop will use allocated variables in onUpdate and onSizeChanged. So long story short: it needs a lock on loop but that will not take long ... |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 1722 🏁 Script executed: Length of output: 62 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 1681 🏁 Script executed: Length of output: 31 🏁 Script executed: Length of output: 1047 🏁 Script executed: Length of output: 33 🏁 Script executed: Length of output: 819 🏁 Script executed: Length of output: 6966 🏁 Script executed: Length of output: 3388 🏁 Script executed: Length of output: 536 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 405 🏁 Script executed: Length of output: 2223 🏁 Script executed: Length of output: 2727 🏁 Script executed: Length of output: 2746 @ewowi, you're absolutely right—I apologize for the confusion! 🙏 My Error CorrectedYou're correct: each node has its own Actual Root Cause of 100ms _accessMutex WaitI found the real culprit! The monitor emission code is holding ModuleLightsControl.h lines 526-537: static unsigned long monitorMillis = 0;
if (millis() - monitorMillis >= MAX(20, layerP.lights.header.nrOfLights / 300)) { // Every 20-40ms
monitorMillis = millis();
read( // ← This acquires _accessMutex!
[&](ModuleState& _state) {
if (_socket->getActiveClients() && _state.data["monitorOn"]) {
_socket->emitEvent("monitor", (char*)layerP.lights.channelsD, ...);
}
},
_moduleName);
}The contention scenario:
StatefulService.h shows the timeout: if (xSemaphoreTakeRecursive(_accessMutex, pdMS_TO_TICKS(100))==pdFALSE) {
ESP_LOGI("🐼", "_accessMutex %s wait 100ms", originId.c_str());Node Control UpdatesThe "quite slow" node control updates are likely just the ~20ms per-node
Solutions for _accessMutex ContentionOption 1: Cache monitorOn flagAvoid // Add member variable
std::atomic<bool> cachedMonitorOn{false};
// In setup/onUpdate, cache the flag:
void onUpdate(const UpdatedItem& updatedItem) {
if (updatedItem.name == "monitorOn") {
cachedMonitorOn = _state.data["monitorOn"];
}
// ... rest of onUpdate
}
// In monitor emission:
if (_socket->getActiveClients() && cachedMonitorOn.load()) {
// Emit directly without read() call
_socket->emitEvent("monitor", ...);
}Option 2: Use non-blocking trylock pattern// Try to read, but don't block if mutex busy
if (xSemaphoreTakeRecursive(_accessMutex, 0) == pdTRUE) {
if (_state.data["monitorOn"]) {
_socket->emitEvent("monitor", ...);
}
xSemaphoreGiveRecursive(_accessMutex);
}
// If busy, skip this frame—next frame is only 20-40ms awayOption 3: Separate monitor state mutexMove Summary
The quickest fix is Option 1 (cache 🧠 Learnings used |
|
@coderabbitai, eliminating read for emitEvent monitor was an excellent catch, that solved the wait on global brightness and the _accessmutex is also not waiting anymore! [234479][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 3 t:slider p:0x3c35131d Can we do that faster, keeping in mind I want to keep the slider responsiveness so the loop must keep updating while changing the slider. Also take into account that some control updates won't recreate memory objects (e.g. speed) but others do (e.g. nr of stars), for the latter some more delay is acceptable |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 15252 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 6693 🏁 Script executed: Length of output: 2807 🏁 Script executed: Length of output: 4299 🏁 Script executed: Length of output: 176 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 3179 🏁 Script executed: Length of output: 2701 🏁 Script executed: Length of output: 10634 🏁 Script executed: Length of output: 3135 🏁 Script executed: Length of output: 2270 @ewowi, great catch! 🎯 I've identified the root cause. The slowness isn't from the Root Cause AnalysisLooking at the timestamps, most delays are 100-300ms (not ~20ms from a single loop), with two big gaps of ~900ms. Here's what's happening: Architecture ContextFrom docs/develop/architecture.md: ESP32SvelteKit task runs on Core 1 at Priority 2, while Effect Task and Driver Task both run at Priority 3. In FreeRTOS, higher number = higher priority, so Effect/Driver tasks can preempt the SvelteKit task. The Flow:
The delays you're seeing are not mutex waits—they're the time SvelteKit spends waiting for CPU time because Effect/Driver tasks keep preempting it! Solutions🚀 Solution 1: Client-Side Throttling (Recommended)Throttle slider updates on the client side to send at most 1 update per 50-100ms. This maintains visual responsiveness while reducing server load: In let throttleTimer: ReturnType<typeof setTimeout> | null = null;
let pendingValue: number | null = null;
function handleSliderChange(event: Event) {
const value = (event.target as HTMLInputElement).value;
// Update UI immediately for responsiveness
localValue = value;
// Throttle WebSocket sends
pendingValue = value;
if (!throttleTimer) {
throttleTimer = setTimeout(() => {
if (pendingValue !== null) {
socket.sendEvent('update', {
id: controlId,
value: pendingValue
});
pendingValue = null;
}
throttleTimer = null;
}, 50); // Send max 20 updates/sec
}
}Benefits:
🔧 Solution 2: Server-Side Update CoalescingProcess only the latest value if multiple updates arrive in quick succession: In NodeManager.h, replace the control update section: // Add member variable to track pending updates per control
struct PendingUpdate {
uint8_t nodeIndex;
uint8_t controlIndex;
JsonObject control;
unsigned long timestamp;
};
std::vector<PendingUpdate> pendingUpdates;
// In onUpdate(), instead of immediate processing:
else if (updatedItem.parent[1] == "controls" && updatedItem.name == "value") {
JsonObject control = nodeState["controls"][updatedItem.index[1]];
// Check if there's already a pending update for this control
bool found = false;
for (auto& pending : pendingUpdates) {
if (pending.nodeIndex == updatedItem.index[0] &&
pending.controlIndex == updatedItem.index[1]) {
// Update existing pending update with latest value
pending.control = control;
pending.timestamp = millis();
found = true;
break;
}
}
if (!found) {
pendingUpdates.push_back({
updatedItem.index[0],
updatedItem.index[1],
control,
millis()
});
}
}
// Process pending updates (call from Module::loop every ~50ms):
void processPendingControlUpdates() {
if (pendingUpdates.empty()) return;
for (const auto& pending : pendingUpdates) {
if (pending.nodeIndex < nodes->size()) {
Node* nodeClass = (*nodes)[pending.nodeIndex];
if (nodeClass) {
xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY);
nodeClass->updateControl(pending.control);
nodeClass->onUpdate("", pending.control);
xSemaphoreGive(nodeClass->nodeMutex);
nodeClass->requestMappings();
}
}
}
pendingUpdates.clear();
}⏱️ Solution 3: Timeout Instead of Blocking ForeverReplace if (xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) {
nodeClass->updateControl(control);
nodeClass->onUpdate(updatedItem.oldValue, control);
xSemaphoreGive(nodeClass->nodeMutex);
nodeClass->requestMappings();
} else {
EXT_LOGD(ML_TAG, "⏱️ Skipped control update (node busy)");
// Optionally retry later or drop the update
}🎯 Solution 4: Differentiate Control TypesFor "fast" controls like speed (no memory reallocation), process immediately. For "slow" controls like "number of stars", allow delays: // Add control metadata
bool isHeavyUpdate = (control["name"] == "count" || control["name"] == "size");
if (!isHeavyUpdate) {
// Fast path: try-lock with short timeout
if (xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(20)) == pdTRUE) {
nodeClass->updateControl(control);
nodeClass->onUpdate(updatedItem.oldValue, control);
xSemaphoreGive(nodeClass->nodeMutex);
nodeClass->requestMappings();
} else {
// Queue for later if node is busy
}
} else {
// Heavy update: use longer timeout (acceptable delay)
xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(100));
// ... update ...
}RecommendationStart with Solution 1 (client-side throttling)—it's the cleanest fix that maintains responsiveness while reducing unnecessary traffic. If you still see issues, combine it with Solution 3 (timeouts) or Solution 4 (differentiate control types). The key insight: Your per-node mutex design is correct—the issue is too many updates arriving too fast combined with task priority preemption. Throttling on the client side solves both problems elegantly. Would you like me to prepare a patch for the client-side throttling? 🧠 Learnings used |
|
@coderabbitai, should I give the sveltekit task also priority 3 to give it a fair use of time, when controls are updated specifically but also for all other use cases ? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 11846 🏁 Script executed: Length of output: 2887 🏁 Script executed: Length of output: 2867 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 @ewowi, No, I recommend keeping SvelteKit at Priority 2 for the following reasons: Why Priority 2 is Correct1. LED Timing is More Critical Than UI Responsiveness 🎯The current design is intentional per the architecture docs:
Driver Task performs timing-critical DMA operations that must maintain consistent ~60fps output. If SvelteKit were Priority 3, heavy WebSocket processing or JSON serialization could delay LED frames, causing visible glitches. 2. Priority 3 Won't Actually Fix the Control Update Speed IssueEven with equal priority (3), FreeRTOS uses round-robin scheduling—Driver Task would still get CPU time, interleaving with SvelteKit. The delays you're seeing aren't from priority preemption alone; they're from:
Raising priority would only reduce, not eliminate, the delays. 3. Current Priority Already Uses Inheritance ✅FreeRTOS has priority inheritance built into mutexes:
What You SHOULD Do Instead✅ Option 1: Client-Side Throttling (Best Solution)Throttle slider updates in the browser to send max 1 update per 50-100ms (as I detailed in my previous response). This gives you:
✅ Option 2: Differentiated Priority for Control Updates OnlyIf you want better control update responsiveness specifically, create a dedicated control update task at Priority 3: // In main.cpp
xTaskCreatePinnedToCore(
controlUpdateTask,
"ControlUpdates",
2 * 1024,
NULL,
3, // Same as Driver/Effect
&controlUpdateTaskHandle,
1 // Core 1
);Then move the mutex-protected control update logic from NodeManager into this task via a lightweight message queue. This way:
|
|
@coderabbitai, regarding slider control. Please provide a solution which fits the current code. Ho tp FieldRederer.svelte, there is so we should add Client-Side Throttling here isn't it? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 81 🏁 Script executed: Length of output: 58 🏁 Script executed: Length of output: 11855 @ewowi, yes exactly! Here's the client-side throttling solution that fits perfectly into your existing Add Throttling to FieldRenderer.svelteAdd this to the // Throttling for slider control updates
let throttleTimer: ReturnType<typeof setTimeout> | null = null;
let pendingSliderEvent: Event | null = null;
function handleSliderInput(event: Event) {
// Update local value immediately for visual responsiveness
// (bind:value already handles this)
// Store the event for throttled sending
pendingSliderEvent = event;
// Only send to server once per 50ms
if (!throttleTimer) {
throttleTimer = setTimeout(() => {
if (pendingSliderEvent) {
onChange(pendingSliderEvent);
pendingSliderEvent = null;
}
throttleTimer = null;
}, 50); // Max 20 updates/second
}
}
// Clean up throttle timer on component destroy
onDestroy(() => {
clearInterval(interval);
if (throttleTimer) {
clearTimeout(throttleTimer);
// Send final pending update if any
if (pendingSliderEvent) {
onChange(pendingSliderEvent);
}
}
});Update the existing onDestroy(() => {
// console.log('FieldRenderer clearing interval');
clearInterval(interval);
// Clean up throttle timer
if (throttleTimer) {
clearTimeout(throttleTimer);
// Send final pending update if any
if (pendingSliderEvent) {
onChange(pendingSliderEvent);
}
}
});Update the slider input event handler (line 179): {:else if property.type == 'slider'}
<!-- range colors: https://daisyui.com/components/range/
on:input: throttled response to server for performance
-->
<input
type="range"
min={property.min ? property.min : 0}
max={property.max ? property.max : 255}
{step}
class={'flex-1 range ' +
(disabled == false
? property.color == 'Red'
? 'range-error'
: property.color == 'Green'
? 'range-success'
: 'range-primary'
: 'range-secondary')}
{disabled}
bind:value
on:input={handleSliderInput}
/>
{#if hasNumber}
<input
type="number"
min={property.min ? property.min : 0}
max={property.max ? property.max : 255}
{step}
class="input"
style="height: 2rem; width: 5rem"
{disabled}
bind:value
on:change={onChange}
/>
{/if}What This Does
Expected Results
This should make your slider controls feel much more responsive! 🎯 🧠 Learnings used |
Doc ==== - Installer language check frontend ======= - Field Renderer: throttle slider updates backend ======= Lights control: monitoring without read monitorOn (to avoid _accessMutex congestion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/develop/architecture.md:
- Around line 306-311: The code sample calls esp_task_wdt_reset() before loop(),
but the documentation text says the watchdog reset occurs after loop() — fix by
making code and prose consistent: either move the esp_task_wdt_reset() call to
after loop() (update the while(true) body around loop(), esp_task_wdt_reset(),
then vTaskDelay(1)) if you want the reset to happen post-loop, or update the
surrounding documentation to state that esp_task_wdt_reset() is called before
loop() and describe the implications; refer to the loop(), esp_task_wdt_reset(),
vTaskDelay(), and the while(true) loop in the example when applying the change.
🧹 Nitpick comments (2)
interface/src/lib/components/moonbase/FieldRenderer.svelte (1)
41-58: Consider leading-edge throttle for immediate first update.The current throttling implementation waits up to 50ms before sending the first event. For better perceived responsiveness, consider a "leading-edge" pattern that sends the first event immediately, then throttles subsequent events:
♻️ Optional: Leading-edge throttle for better UX
function handleSliderInput(event: Event) { - // Update local value immediately for visual responsiveness - // (bind:value already handles this) - - // Store the event for throttled sending pendingSliderEvent = event; - // Only send to server once per 50ms if (!throttleTimer) { + // Send immediately on first interaction + onChange(event); + pendingSliderEvent = null; + throttleTimer = setTimeout(() => { if (pendingSliderEvent) { onChange(pendingSliderEvent); pendingSliderEvent = null; } throttleTimer = null; - }, 50); // Max 20 updates/second + }, 50); } }docs/gettingstarted/installer.md (1)
1-172: Documentation improvements look good.The editorial changes improve clarity and consistency. The wording updates and formatting standardization are appropriate.
Note: Static analysis flagged multiple images missing alt text (lines 66, 70, 78, 92, 103, 112, 120, 124, 151, 163). While these appear to be pre-existing issues not introduced by this PR, adding descriptive alt text would improve accessibility.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/develop/architecture.mddocs/gettingstarted/installer.mdinterface/src/lib/components/moonbase/FieldRenderer.sveltelib/framework/WWWData.hsrc/MoonLight/Modules/ModuleLightsControl.h
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
📚 Learning: 2026-01-06T13:45:03.067Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
Applied to files:
docs/develop/architecture.md
📚 Learning: 2025-12-25T13:34:40.097Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2025-12-25T13:31:32.367Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2026-01-04T09:31:07.533Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: lib/framework/ESP32SvelteKit.cpp:229-229
Timestamp: 2026-01-04T09:31:07.533Z
Learning: In lib/framework/ESP32SvelteKit.cpp, the SvelteKit loop task stack size of 6KB (6 * 1024 bytes) for non-PSRAM devices has been validated through testing, showing approximately 1500 bytes of free stack remaining under load, providing adequate ~24% headroom.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2026-01-08T10:20:09.363Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T10:20:09.363Z
Learning: In MoonLight's Idle Watchdog approach: `vTaskDelay(1)` is preferred over `taskYIELD()` because taskYIELD() only yields to equal or higher priority tasks. Since Effect Task and Driver Task run at priority 3 while the Idle task runs at priority 0, only vTaskDelay(1) actually blocks the calling task and allows the Idle task to run and reset the watchdog timer.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2025-11-29T16:17:30.619Z
Learnt from: CR
Repo: MoonModules/MoonLight PR: 0
File: misc/parking/GEMINI.md:0-0
Timestamp: 2025-11-29T16:17:30.619Z
Learning: Applies to misc/parking/src/**/*.{h,cpp} : Modify C++ backend code in the `src/` and `lib/` directories for ESP32 development
Applied to files:
docs/develop/architecture.md
📚 Learning: 2026-01-08T11:39:52.862Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
Applied to files:
docs/develop/architecture.md
📚 Learning: 2025-12-28T10:28:24.157Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Applied to files:
docs/develop/architecture.mdsrc/MoonLight/Modules/ModuleLightsControl.h
📚 Learning: 2026-01-06T15:24:29.770Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.
Applied to files:
src/MoonLight/Modules/ModuleLightsControl.h
🪛 LanguageTool
docs/gettingstarted/installer.md
[style] ~133-~133: Using many exclamation marks might seem excessive (in this case: 24 exclamation marks for a text that’s 10358 characters long)
Context: ...lation/#usb-to-serial-chip-drivers).
!!! info "Next steps"
* You can now ch...
(EN_EXCESSIVE_EXCLAMATION)
🪛 markdownlint-cli2 (0.18.1)
docs/gettingstarted/installer.md
66-66: Images should have alternate text (alt text)
(MD045, no-alt-text)
70-70: Images should have alternate text (alt text)
(MD045, no-alt-text)
78-78: Images should have alternate text (alt text)
(MD045, no-alt-text)
92-92: Images should have alternate text (alt text)
(MD045, no-alt-text)
103-103: Images should have alternate text (alt text)
(MD045, no-alt-text)
112-112: Images should have alternate text (alt text)
(MD045, no-alt-text)
120-120: Images should have alternate text (alt text)
(MD045, no-alt-text)
124-124: Images should have alternate text (alt text)
(MD045, no-alt-text)
135-135: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
151-151: Images should have alternate text (alt text)
(MD045, no-alt-text)
163-163: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
src/MoonLight/Modules/ModuleLightsControl.h (1)
511-528: LGTM! Direct state access removes _accessMutex contention.This change correctly addresses the performance issue identified in the PR discussion where
read()was holding_accessMutexevery 20-40ms for monitor emission, causing brightness updates to timeout waiting 100ms. Directly accessing_state.data["monitorOn"]is safe here since it's a simple boolean check where a momentary stale read has negligible impact (at worst, one frame is skipped or sent unnecessarily).Based on learnings, this maintains parallelism between effects rendering, monitor emission, and driver tasks.
interface/src/lib/components/moonbase/FieldRenderer.svelte (1)
60-70: LGTM! Proper cleanup ensures no lost updates.The cleanup correctly clears the timeout and flushes any pending slider event on component destruction, ensuring the final user value is sent to the server.
docs/develop/architecture.md (1)
72-94: LGTM! Clear documentation of task responsibilities and synchronization.The new task narratives clearly document which operations are protected by
nodeMutex, helping developers understand the synchronization model. This aligns well with the per-node mutex changes inVirtualLayer.cppandPhysicalLayer.cpp.
| while (true) { | ||
| loop(); | ||
| esp_task_wdt_reset(); | ||
| loop(); | ||
| vTaskDelay(1); | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code example doesn't match the description.
The code shows esp_task_wdt_reset() being called before loop(), but the AI summary states it was moved to occur "after loop() execution." If the intent is to reset the watchdog after the loop completes, the code should be:
📝 Suggested fix if watchdog reset should be after loop
while (true) {
- esp_task_wdt_reset();
loop();
+ esp_task_wdt_reset();
vTaskDelay(1);
}Otherwise, update the surrounding documentation to clarify the current behavior.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while (true) { | |
| loop(); | |
| esp_task_wdt_reset(); | |
| loop(); | |
| vTaskDelay(1); | |
| } | |
| while (true) { | |
| loop(); | |
| esp_task_wdt_reset(); | |
| vTaskDelay(1); | |
| } |
🤖 Prompt for AI Agents
In @docs/develop/architecture.md around lines 306 - 311, The code sample calls
esp_task_wdt_reset() before loop(), but the documentation text says the watchdog
reset occurs after loop() — fix by making code and prose consistent: either move
the esp_task_wdt_reset() call to after loop() (update the while(true) body
around loop(), esp_task_wdt_reset(), then vTaskDelay(1)) if you want the reset
to happen post-loop, or update the surrounding documentation to state that
esp_task_wdt_reset() is called before loop() and describe the implications;
refer to the loop(), esp_task_wdt_reset(), vTaskDelay(), and the while(true)
loop in the example when applying the change.
|
@coderabbitai, one more thing, can you check virtual ~Node() I understood C++ constructors are not inherited, so I declare it as normal function: constructor(). Should we do the same for ~Node() ? And what about derived classes, e.g. ~TetrixEffect() override { freeMB(drops); }; |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 580 🏁 Script executed: Length of output: 2138 🏁 Script executed: Length of output: 7927 @ewowi, great question! The destructor behavior is fundamentally different from constructors in C++. Here's the complete explanation: 1. Destructor Syntax:
|
| Aspect | Constructor Pattern | Destructor Pattern |
|---|---|---|
| Syntax | constructor() (custom) |
~Node() (mandatory C++ syntax) |
| Inheritance | Manual call from derived | Automatic chain (derived → base) |
| Virtual needed? | N/A | Yes (for polymorphic cleanup) |
| Your code | ✅ Correct custom pattern | ✅ Correct C++ destructors |
No changes needed! Your destructor implementation is correct and follows C++ best practices. 🎉
🐰 Mood
/\___/\
( ^._.^ ) "Destructors are automatic!
\ ~ / No manual cleanup needed!"
(__m__m)
🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.
To prevent crashes when updating controls frequently
Possible fix for #101
Summary by CodeRabbit
Documentation
User Experience
Chores
✏️ Tip: You can customize this high-level summary in your review settings.