@ewowi ewowi commented Jan 11, 2026

To prevent crashes when updating controls frequently

Possible fix for #101

Summary by CodeRabbit

  • Documentation

    • Reworded and clarified installer instructions, callouts, image captions, update/compatibility notes, and standardized punctuation/links.
    • Corrected MoonLight effects asset references and expanded architecture docs with task descriptions, core assignments, and timing notes.
  • User Experience

    • Slider inputs are now throttled for smoother local feedback and reduced rapid updates.
  • Chores

    • Strengthened internal thread-safety around per-node operations.


coderabbitai bot commented Jan 11, 2026

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'review', 'context'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
Walkthrough

Per-node mutexes were added to Node and are used in NodeManager, PhysicalLayer, and VirtualLayer to serialize per-node state updates and callbacks. Docs and UI were also edited (installer, effects asset names, architecture), FieldRenderer gained throttled slider input handling, and minor module comment/assignment fixes were applied.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Node core: src/MoonBase/Nodes.h | Add nodeMutex (created with xSemaphoreCreateMutex()), null-check in constructor, and destructor that deletes the mutex. |
| Manager synchronization: src/MoonBase/NodeManager.h | Acquire/release per-node mutex around control update and onUpdate calls to protect control value modifications. |
| Layer synchronization: src/MoonLight/Layers/PhysicalLayer.cpp, src/MoonLight/Layers/VirtualLayer.cpp | Guard node->loop(), node->onSizeChanged(), and related per-node callbacks with xSemaphoreTake/xSemaphoreGive when nodes are active or size changes. |
| Module notes & assignment: src/MoonLight/Modules/ModuleDrivers.h, src/MoonLight/Modules/ModuleEffects.h | Ensure driver-layer node assignment in existing-index path; add comments marking where effects run concurrently. |
| UI component: interface/src/lib/components/moonbase/FieldRenderer.svelte | Add throttled slider input handling (batch updates ~50ms), immediate local visual updates, and flush-on-destroy behavior. |
| Control logic tweak: src/MoonLight/Modules/ModuleLightsControl.h | Replace read-with-callback usage with direct inline checks against _state.data["monitorOn"]. |
| Configuration: platformio.ini | Bump APP_DATE build flag from 20260109 to 20260112. |
| Documentation: docs/gettingstarted/installer.md, docs/moonlight/effects.md, docs/develop/architecture.md | Reword installer steps (terminology and formatting), correct effect asset filenames (capitalize StarSky files), and expand architecture task/core/timing and synchronization notes. |
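
The throttled slider handling listed for FieldRenderer.svelte is not shown in this conversation. As a rough illustration of the idea only (immediate local value, at most one send per interval, flush on destroy), here is a sketch written in C++ to match the firmware code in this PR. The class name `SliderThrottle`, the `tick` method, and the caller-supplied clock are all assumptions, not the component's actual code.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical throttle; names and the 50 ms default are illustrative only.
class SliderThrottle {
 public:
  using Sender = std::function<void(int)>;

  explicit SliderThrottle(Sender send, uint32_t intervalMs = 50)
      : send_(std::move(send)), intervalMs_(intervalMs) {}

  // Flush-on-destroy: a pending value is sent before the control goes away.
  ~SliderThrottle() { flush(); }

  // Called on every slider movement. The local value updates immediately;
  // the send happens at most once per interval.
  void onInput(int value, uint32_t nowMs) {
    localValue_ = value;
    pending_ = true;
    tick(nowMs);
  }

  // Called periodically so the last value of a burst still goes out.
  void tick(uint32_t nowMs) {
    if (!pending_) return;
    if (sentOnce_ && nowMs - lastSendMs_ < intervalMs_) return;
    lastSendMs_ = nowMs;
    sentOnce_ = true;
    flush();
  }

  // Sends the pending value, if any.
  void flush() {
    if (!pending_) return;
    pending_ = false;
    send_(localValue_);
  }

  int localValue() const { return localValue_; }

 private:
  Sender send_;
  uint32_t intervalMs_;
  uint32_t lastSendMs_ = 0;
  bool sentOnce_ = false;
  bool pending_ = false;
  int localValue_ = 0;
};
```

The clock is passed in by the caller so the behavior can be exercised deterministically; a real UI component would more likely use a timer.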

Sequence Diagram(s)

sequenceDiagram
  participant EffectTask as Effect/Driver Task
  participant Layer as Physical/Virtual Layer
  participant Manager as NodeManager
  participant Node as Node (has nodeMutex)

  EffectTask->>Layer: call loop()
  Layer->>Manager: iterate nodes / trigger control update path
  Manager->>Node: xSemaphoreTake(nodeMutex)
  Manager->>Node: updateControl(...) / onUpdate(...)
  Manager->>Node: xSemaphoreGive(nodeMutex)
  Layer->>Node: xSemaphoreTake(nodeMutex) before node->loop()/onSizeChanged()
  Node->>Node: node->loop()/onSizeChanged() (protected)
  Node-->>Layer: xSemaphoreGive(nodeMutex)
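
The locking order in the diagram can be tried out on a host machine. The following is a minimal sketch, assuming `std::mutex` as a stand-in for the FreeRTOS `nodeMutex` and two `std::thread`s as stand-ins for the update path and the effect/driver task. `DemoNode`, `updateControl`, `runLoop`, and `stressNode` are hypothetical names, not MoonLight code.

```cpp
#include <mutex>
#include <thread>

// Host-side stand-in for a node; std::mutex plays the role of nodeMutex.
struct DemoNode {
  std::mutex nodeMutex;
  int width = 0;
  int height = 0;
  long tornFrames = 0;

  // Control update: changes two fields that must stay consistent.
  void onUpdate(int size) {
    width = size;
    height = size;
  }

  // Render loop: counts frames that observed a half-applied update.
  void loop() {
    if (width != height) ++tornFrames;
  }
};

// Manager side: take the node's mutex around the update.
void updateControl(DemoNode& node, int size) {
  std::lock_guard<std::mutex> lock(node.nodeMutex);
  node.onUpdate(size);
}

// Layer side: take the same mutex around the node's loop.
void runLoop(DemoNode& node) {
  std::lock_guard<std::mutex> lock(node.nodeMutex);
  node.loop();
}

// Runs both paths concurrently and reports how many torn frames were seen.
long stressNode(int iterations) {
  DemoNode node;
  std::thread updater([&] {
    for (int i = 0; i < iterations; ++i) updateControl(node, i);
  });
  std::thread renderer([&] {
    for (int i = 0; i < iterations; ++i) runLoop(node);
  });
  updater.join();
  renderer.join();
  return node.tornFrames;
}
```

Because both paths lock the same per-node mutex, `loop()` never runs in the middle of `onUpdate()`, so no torn frames are counted. Different nodes have different mutexes and do not block each other.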

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Poem

🐰 I hopped through code with tiny paws,
I tucked each node behind locked jaws,
Sliders whisper, batched and bright,
Docs polished for the installer’s light,
Concurrency snug — a carroted applause! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add node mutex' accurately reflects the primary change: introducing per-node mutexes throughout the codebase for thread-safe node operations. |



📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 336af74 and 9db9dc5.

📒 Files selected for processing (3)
  • interface/src/lib/components/moonbase/FieldRenderer.svelte
  • lib/framework/WWWData.h
  • platformio.ini
🚧 Files skipped from review as they are similar to previous changes (1)
  • interface/src/lib/components/moonbase/FieldRenderer.svelte
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
📚 Learning: 2025-11-29T16:17:30.619Z
Learnt from: CR
Repo: MoonModules/MoonLight PR: 0
File: misc/parking/GEMINI.md:0-0
Timestamp: 2025-11-29T16:17:30.619Z
Learning: Applies to misc/parking/**/platformio.ini : Use PlatformIO to build and upload the ESP32 backend C++ code

Applied to files:

  • platformio.ini
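
The PR #84 learning quoted above (read from the front buffer and hold the mutex only for the pointer swap) can be sketched on a host machine as follows. This is an illustration under assumptions: `ChannelBuffers` and its methods are hypothetical, `std::mutex` stands in for `monitorMutex`, and the reader here copies the front buffer under the lock, which may differ from what the firmware does.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical double buffer; channelsE/channelsD mirror the names in the learning.
class ChannelBuffers {
 public:
  explicit ChannelBuffers(std::size_t channels)
      : a_(channels), b_(channels), channelsE_(&a_), channelsD_(&b_) {}

  // Effects render into the back buffer without taking the lock.
  std::vector<uint8_t>& effectsBuffer() { return *channelsE_; }

  // The lock is held only for the pointer swap, not for rendering.
  void swap() {
    std::lock_guard<std::mutex> lock(swapMutex_);
    std::swap(channelsE_, channelsD_);
  }

  // The monitor takes a copy of the front (driver) buffer under the lock.
  std::vector<uint8_t> snapshotFront() {
    std::lock_guard<std::mutex> lock(swapMutex_);
    return *channelsD_;
  }

 private:
  std::vector<uint8_t> a_;
  std::vector<uint8_t> b_;
  std::vector<uint8_t>* channelsE_;  // back buffer, written by effects
  std::vector<uint8_t>* channelsD_;  // front buffer, read by driver/monitor
  std::mutex swapMutex_;
};
```

A value written through `effectsBuffer()` becomes visible to `snapshotFront()` only after `swap()`, which is what keeps partially rendered frames out of the monitor output.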
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (1)
platformio.ini (1)

59-59: LGTM! Routine build metadata update.

The APP_DATE build flag update from "20260109" to "20260112" is a standard metadata change that reflects the build date. This has no functional impact on the codebase.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @src/MoonLight/Layers/PhysicalLayer.cpp:
- Around line 74-78: Currently nodeMutex is held for the entire duration of
layer->loop(), causing unnecessary serialization; change the code to grab
nodeMutex only for the minimal critical section (e.g., copy the layer pointer or
perform the pointer swap/read under the mutex), then release the mutex before
calling layer->loop(), or alternatively move the mutex into the implementation
of Layer::loop() so it only protects specific shared-state accesses; ensure the
pointer copy/read uses nodeMutex (nodeMutex, layer) to prevent use-after-free
while leaving long-running rendering in layer->loop() outside the lock.

In @src/MoonLight/Layers/PhysicalLayer.h:
- Line 124: The code creates nodeMutex via xSemaphoreCreateMutex() without
checking for NULL and never deletes it; validate the return of
xSemaphoreCreateMutex() (e.g., after the call to xSemaphoreCreateMutex() check
if nodeMutex == NULL and log/handle the error) and add lifecycle cleanup by
declaring a PhysicalLayer destructor (~PhysicalLayer()) and implementing it to
vSemaphoreDelete(nodeMutex) (and clear nodeMutex) so future
xSemaphoreTake/xSemaphoreGive calls won't operate on a NULL/leaked handle.
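
The first prompt above (copy the layer pointer under the mutex, then call `loop()` outside it) could look roughly like this on a host machine. It is a sketch under assumptions: `std::shared_ptr` is used here to keep the layer alive after the lock is released, whereas the firmware's ownership model is not shown in this review. `Layer`, `LayerHolder`, and `loopOnce` are hypothetical names.

```cpp
#include <memory>
#include <mutex>
#include <utility>

struct Layer {
  int frames = 0;
  void loop() { ++frames; }  // stands in for long-running rendering
};

class LayerHolder {
 public:
  // Replaces (or clears) the current layer under the lock.
  void setLayer(std::shared_ptr<Layer> layer) {
    std::lock_guard<std::mutex> lock(mutex_);
    layer_ = std::move(layer);
  }

  // Returns true if a layer was rendered.
  bool loopOnce() {
    std::shared_ptr<Layer> local;
    {
      // Short critical section: only the pointer copy is protected.
      std::lock_guard<std::mutex> lock(mutex_);
      local = layer_;
    }
    if (!local) return false;
    // Runs without the lock; the local shared_ptr keeps the layer alive
    // even if setLayer() replaces it in the meantime.
    local->loop();
    return true;
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<Layer> layer_;
};
```

This protects the lifetime of the layer only. Any state that `loop()` shares with other tasks would still need its own, narrower protection inside `loop()`.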
🧹 Nitpick comments (3)
src/MoonBase/NodeManager.h (1)

274-277: Consider adding a timeout and null check for the mutex.

Using portMAX_DELAY blocks indefinitely, which could cause deadlocks if the mutex is held elsewhere or not properly initialized. Based on learnings from similar mutex usage, minimizing hold time and avoiding indefinite blocking improves robustness.

🔒 Suggested improvements
  1. Add a null check for the mutex:
+if (layerP.nodeMutex == NULL) {
+  EXT_LOGW(ML_TAG, "nodeMutex not initialized");
+  return;
+}
  2. Consider using a timeout instead of portMAX_DELAY:
-xSemaphoreTake(layerP.nodeMutex, portMAX_DELAY);
+if (xSemaphoreTake(layerP.nodeMutex, pdMS_TO_TICKS(100)) != pdTRUE) {
+  EXT_LOGW(ML_TAG, "Failed to acquire nodeMutex for control update");
+  return;
+}
src/MoonLight/Layers/PhysicalLayer.cpp (2)

113-117: Add timeout for mutex acquisition.

Using portMAX_DELAY in loopDrivers() risks blocking indefinitely if the mutex is held elsewhere or not initialized. Consider using a timeout as suggested for NodeManager.h.

-xSemaphoreTake(nodeMutex, portMAX_DELAY);
+if (xSemaphoreTake(nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) {
   node->onSizeChanged(prevSize);
   xSemaphoreGive(nodeMutex);
+} else {
+  EXT_LOGW(ML_TAG, "Failed to acquire nodeMutex for onSizeChanged");
+}

131-133: Consider timeout and verify protection granularity.

Similar to the other mutex usage locations:

  1. portMAX_DELAY risks indefinite blocking
  2. node->onLayout() could be a lengthy operation, introducing serialization across all nodes

If only specific state modifications within onLayout() require synchronization, consider moving the mutex protection inside the function to minimize hold time. Otherwise, apply the timeout pattern:

-xSemaphoreTake(nodeMutex, portMAX_DELAY);
+if (xSemaphoreTake(nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) {
   node->onLayout();
   xSemaphoreGive(nodeMutex);
+} else {
+  EXT_LOGW(ML_TAG, "Failed to acquire nodeMutex for onLayout");
+}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc239a and b9cea1b.

📒 Files selected for processing (5)
  • docs/gettingstarted/installer.md
  • docs/moonlight/effects.md
  • src/MoonBase/NodeManager.h
  • src/MoonLight/Layers/PhysicalLayer.cpp
  • src/MoonLight/Layers/PhysicalLayer.h
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
📚 Learning: 2025-12-28T10:28:24.157Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Applied to files:

  • src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T15:24:17.118Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Applied to files:

  • src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T15:24:29.770Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Applied to files:

  • src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T13:45:03.067Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Applied to files:

  • src/MoonLight/Layers/PhysicalLayer.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
docs/gettingstarted/installer.md (2)

27-27: Terminology clarification improves user guidance.

Changing "Restart" to "Reset Device" is more explicit and accurately describes the action users should perform in the installer context. This improves clarity for users following the setup instructions.


144-144: Proper name capitalization corrected.

Updating "Visual Studio code" to "Visual Studio Code" corrects the capitalization of the proper product name.

docs/moonlight/effects.md (1)

60-60: LGTM!

Asset path capitalization corrected for consistency.


Coord3D prevSize; // to calculate size change

SemaphoreHandle_t nodeMutex = xSemaphoreCreateMutex();
⚠️ Potential issue | 🟠 Major

Validate mutex creation and add lifecycle management.

xSemaphoreCreateMutex() can return NULL if memory allocation fails. Without validation, subsequent xSemaphoreTake/xSemaphoreGive calls on a NULL handle will fail silently or cause undefined behavior.

Additionally, the mutex should be deleted in the destructor to prevent resource leaks.

🔧 Recommended fixes
  1. Add validation after creation (in the constructor or setup):
// In PhysicalLayer::PhysicalLayer() or PhysicalLayer::setup()
if (nodeMutex == NULL) {
  EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
  // Handle error appropriately
}
  2. Add cleanup in destructor:
// In PhysicalLayer.h, add destructor declaration:
~PhysicalLayer();

// In PhysicalLayer.cpp, implement:
PhysicalLayer::~PhysicalLayer() {
  if (nodeMutex != NULL) {
    vSemaphoreDelete(nodeMutex);
    nodeMutex = NULL;
  }
}
🤖 Prompt for AI Agents
In @src/MoonLight/Layers/PhysicalLayer.h at line 124, The code creates nodeMutex
via xSemaphoreCreateMutex() without checking for NULL and never deletes it;
validate the return of xSemaphoreCreateMutex() (e.g., after the call to
xSemaphoreCreateMutex() check if nodeMutex == NULL and log/handle the error) and
add lifecycle cleanup by declaring a PhysicalLayer destructor (~PhysicalLayer())
and implementing it to vSemaphoreDelete(nodeMutex) (and clear nodeMutex) so
future xSemaphoreTake/xSemaphoreGive calls won't operate on a NULL/leaked
handle.
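
The create, validate, and delete lifecycle requested above can be wrapped in a small owner class. The sketch below is host-side: `createMutex` and `deleteMutex` are hypothetical stand-ins for `xSemaphoreCreateMutex()` and `vSemaphoreDelete()`, and the `simulateOutOfMemory` flag exists only so the failure path can be exercised.

```cpp
#include <mutex>
#include <new>

// Host-side stand-ins for the FreeRTOS handle API (hypothetical names).
using MutexHandle = std::mutex*;

inline MutexHandle createMutex(bool simulateOutOfMemory = false) {
  return simulateOutOfMemory ? nullptr : new (std::nothrow) std::mutex();
}

inline void deleteMutex(MutexHandle handle) { delete handle; }

// Owns one mutex handle for its whole lifetime.
class NodeMutexOwner {
 public:
  explicit NodeMutexOwner(bool simulateOutOfMemory = false)
      : handle_(createMutex(simulateOutOfMemory)) {}

  // Cleanup in the destructor, so the handle cannot leak.
  ~NodeMutexOwner() {
    if (handle_ != nullptr) {
      deleteMutex(handle_);
      handle_ = nullptr;
    }
  }

  // Copying would lead to the same handle being deleted twice.
  NodeMutexOwner(const NodeMutexOwner&) = delete;
  NodeMutexOwner& operator=(const NodeMutexOwner&) = delete;

  // Callers check this before any take/give on the handle.
  bool valid() const { return handle_ != nullptr; }
  MutexHandle handle() const { return handle_; }

 private:
  MutexHandle handle_;
};
```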

Docs
====
- Architecture: add update and loop in Task Interaction Flow

Backend
=======
- Node manager: node onUpdate within nodeMutex
- Nodes: nodeMutex per node (not one global), constructor and destructor checks
- Physical layer: nodeMutex on onSizeChanged and loop and onLayout
- Virtual layer: nodeMutex on onSizeChanged and loop
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @docs/develop/architecture.md:
- Line 13: The timing listed for ESP32SvelteKit is inconsistent: the task table
row showing "10ms" should match the detailed task description which states "Runs
every: 20ms"; update the table entry for the ESP32SvelteKit row (the table cell
containing "10ms") to "20ms" so both the main table and the detailed description
align.

In @src/MoonBase/Nodes.h:
- Around line 90-93: nodeMutex can be null after creation and callers
unconditionally call xSemaphoreTake(node->nodeMutex,...), which will crash;
change the node creation path in Nodes.h so semaphore creation failure is
propagated (fail-fast) by returning an error/false or nullptr from the node
creation/initialization routine (or throw) instead of only logging, and update
all call sites in NodeManager.h, VirtualLayer.cpp and PhysicalLayer.cpp to
handle a failed creation (check returned pointer/boolean and abort
initialization) or guard every xSemaphoreTake with a null check for
node->nodeMutex until callers are fixed.
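
One way to read the fail-fast suggestion for Nodes.h is a factory that returns nothing when the mutex cannot be created, so that no caller ever sees a node without a mutex. The following host-side sketch assumes that design. `SketchNode`, `makeNode`, `addNode`, and the `simulateOutOfMemory` flag are hypothetical and do not reflect the actual node allocation code.

```cpp
#include <memory>
#include <mutex>
#include <new>
#include <utility>
#include <vector>

struct SketchNode {
  std::unique_ptr<std::mutex> nodeMutex;  // null would mean creation failed
};

// Hypothetical factory: returns nullptr instead of a half-initialized node.
std::unique_ptr<SketchNode> makeNode(bool simulateOutOfMemory = false) {
  std::unique_ptr<SketchNode> node(new SketchNode());
  if (!simulateOutOfMemory) {
    node->nodeMutex.reset(new (std::nothrow) std::mutex());
  }
  if (!node->nodeMutex) return nullptr;  // fail fast, propagate to the caller
  return node;
}

// Caller side: only fully created nodes are stored, so later lock calls
// never see a null mutex.
bool addNode(std::vector<std::unique_ptr<SketchNode>>& nodes,
             bool simulateOutOfMemory = false) {
  std::unique_ptr<SketchNode> node = makeNode(simulateOutOfMemory);
  if (!node) return false;
  nodes.push_back(std::move(node));
  return true;
}
```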
🧹 Nitpick comments (2)
docs/develop/architecture.md (1)

72-94: Excellent documentation of task interactions and mutex synchronization.

The new task narrative sections clearly document the synchronization strategy using nodeMutex to protect node updates, size changes, and loops across HTTPD, SvelteKit, Driver, and Effect tasks. This aligns well with the PR objective of preventing crashes during frequent control updates.

Please address the following minor grammar/spelling issues:

📝 Suggested grammar and spelling fixes

Line 77: Hyphenate compound adjective:

-* MoonLight Modules: runs Modules::compareRecursive and Modules::checkReOrderSwap which calls postUpdate which presents a semaphore guarded updatedItem (updateProcessedSem, updateReadySem)
+* MoonLight Modules: runs Modules::compareRecursive and Modules::checkReOrderSwap which calls postUpdate which presents a semaphore-guarded updatedItem (updateProcessedSem, updateReadySem)

Line 83: Correct spelling:

-* processUpdatedItem() calls Module::onUpdate(), which is a virtual function which is overriden by Modules to implement custom functionality 
+* processUpdatedItem() calls Module::onUpdate(), which is a virtual function which is overridden by Modules to implement custom functionality

Line 82: Fix typo:

-* Module::loop() runs in the SvelteKit task and calls getUpdate() to retrieve the updatedItam in a synchronized way, getUpdate() calls processUpdatedItem()
+* Module::loop() runs in the SvelteKit task and calls getUpdate() to retrieve the updatedItem in a synchronized way, getUpdate() calls processUpdatedItem()

Based on static analysis hints and code review.

src/MoonBase/NodeManager.h (1)

273-279: Mutex usage is correct, but consider a null-check for defensive coding.

The synchronization pattern correctly protects updateControl and onUpdate from concurrent loop() execution. requestMappings() is appropriately placed outside the critical section since it only sets atomic flags.

However, if mutex creation fails (as checked in Node constructor), this code will crash on xSemaphoreTake(nullptr, ...).

🛡️ Optional: Add defensive null-check
         if (nodeClass != nullptr) {
-
-            xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY);
+            if (nodeClass->nodeMutex) xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY);
             nodeClass->updateControl(control);
             nodeClass->onUpdate(updatedItem.oldValue, control);  // custom onUpdate for the node
-            xSemaphoreGive(nodeClass->nodeMutex);
+            if (nodeClass->nodeMutex) xSemaphoreGive(nodeClass->nodeMutex);

             nodeClass->requestMappings();
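
The paired null-checks in the diff above can also be expressed as a scoped guard, which releases the mutex on every exit path. This is a host-side sketch with `std::mutex*` standing in for the FreeRTOS handle. `OptionalLock` and `applyUpdate` are hypothetical names.

```cpp
#include <mutex>

// Locks only when a mutex exists, and unlocks when the scope ends.
class OptionalLock {
 public:
  explicit OptionalLock(std::mutex* mutex) : mutex_(mutex) {
    if (mutex_ != nullptr) mutex_->lock();
  }
  ~OptionalLock() {
    if (mutex_ != nullptr) mutex_->unlock();
  }

  OptionalLock(const OptionalLock&) = delete;
  OptionalLock& operator=(const OptionalLock&) = delete;

 private:
  std::mutex* mutex_;
};

// Applies a control update under the node's mutex if it has one.
// Returns the previous value, similar in spirit to the oldValue passed to onUpdate.
int applyUpdate(std::mutex* nodeMutex, int& control, int newValue) {
  OptionalLock lock(nodeMutex);
  int oldValue = control;
  control = newValue;
  return oldValue;
}
```

When the handle is null the update runs unprotected, so this avoids the crash but not the race. It is a stopgap compared with refusing to create a node without a mutex.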
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9cea1b and 35e87d3.

📒 Files selected for processing (7)
  • docs/develop/architecture.md
  • src/MoonBase/NodeManager.h
  • src/MoonBase/Nodes.h
  • src/MoonLight/Layers/PhysicalLayer.cpp
  • src/MoonLight/Layers/VirtualLayer.cpp
  • src/MoonLight/Modules/ModuleDrivers.h
  • src/MoonLight/Modules/ModuleEffects.h
✅ Files skipped from review due to trivial changes (1)
  • src/MoonLight/Modules/ModuleEffects.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
📚 Learning: 2025-12-28T10:28:24.157Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Applied to files:

  • src/MoonLight/Layers/VirtualLayer.cpp
  • src/MoonLight/Layers/PhysicalLayer.cpp
  • docs/develop/architecture.md
📚 Learning: 2026-01-06T15:24:11.096Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:11.096Z
Learning: In VirtualLayer.cpp (setLight and getLight), the unmapped path intentionally disables bounds checking when indexV >= mappingTableSize or oneToOneMapping is true to optimize performance for very large LED counts (100K+). This is a deliberate design decision; only rely on this behavior if you can guarantee calls stay within bounds. Document this contract and avoid adding defensive checks in the hot path. Use explicit bounds checks in all other code paths or when the input cannot be guaranteed to be in-range.

Applied to files:

  • src/MoonLight/Layers/VirtualLayer.cpp
📚 Learning: 2026-01-06T15:24:29.770Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Applied to files:

  • src/MoonLight/Layers/VirtualLayer.cpp
  • src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T15:24:17.118Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Applied to files:

  • src/MoonLight/Modules/ModuleDrivers.h
  • src/MoonLight/Layers/PhysicalLayer.cpp
📚 Learning: 2026-01-06T13:45:03.067Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Applied to files:

  • src/MoonLight/Layers/PhysicalLayer.cpp
  • docs/develop/architecture.md
📚 Learning: 2025-12-25T13:31:32.367Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2025-12-25T13:34:40.097Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2026-01-04T09:31:07.533Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: lib/framework/ESP32SvelteKit.cpp:229-229
Timestamp: 2026-01-04T09:31:07.533Z
Learning: In lib/framework/ESP32SvelteKit.cpp, the SvelteKit loop task stack size of 6KB (6 * 1024 bytes) for non-PSRAM devices has been validated through testing, showing approximately 1500 bytes of free stack remaining under load, providing adequate ~24% headroom.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2025-11-29T16:17:30.619Z
Learnt from: CR
Repo: MoonModules/MoonLight PR: 0
File: misc/parking/GEMINI.md:0-0
Timestamp: 2025-11-29T16:17:30.619Z
Learning: Applies to misc/parking/src/**/*.{h,cpp} : Modify C++ backend code in the `src/` and `lib/` directories for ESP32 development

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2026-01-08T10:20:09.363Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T10:20:09.363Z
Learning: In MoonLight's Idle Watchdog approach: `vTaskDelay(1)` is preferred over `taskYIELD()` because taskYIELD() only yields to equal or higher priority tasks. Since Effect Task and Driver Task run at priority 3 while the Idle task runs at priority 0, only vTaskDelay(1) actually blocks the calling task and allows the Idle task to run and reset the watchdog timer.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2026-01-08T11:39:52.862Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

Applied to files:

  • docs/develop/architecture.md
🪛 LanguageTool
docs/develop/architecture.md

[grammar] ~77-~77: Use a hyphen to join words.
Context: ...ls postUpdate which presents a semaphore guarded updatedItem (updateProcessedSem,...

(QB_NEW_EN_HYPHEN)


[grammar] ~83-~83: Ensure spelling is correct
Context: ...), which is a virtual function which is overriden by Modules to implement custom function...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (10)
docs/develop/architecture.md (2)

35-35: Verify UI responsiveness impact of timing change.

The ESP32SvelteKit task cadence changed from 10ms to 20ms (doubled period). While this may reduce CPU load and potential mutex contention, please verify that UI responsiveness remains acceptable, especially for:

  • WebSocket message latency
  • Real-time control updates
  • Monitor data refresh rates

Based on the architecture described, UI updates will now occur at half the previous frequency. If this was an intentional optimization related to the mutex synchronization work, consider documenting the rationale in the PR description.


308-310: LGTM: Correct watchdog integration pattern.

The code example properly demonstrates the watchdog pattern with esp_task_wdt_reset() at the start of the loop iteration, followed by loop() execution and vTaskDelay(1) yield. This ensures the Idle task (priority 0) gets CPU time to reset the watchdog timer, which is critical for preventing Task Watchdog Timer crashes under heavy loads.

Based on learnings: vTaskDelay(1) is essential (not taskYIELD()) because it blocks the calling task and allows lower-priority Idle task to run.

src/MoonLight/Modules/ModuleDrivers.h (1)

168-174: LGTM - Comment clarifies concurrency boundary.

The added comment correctly documents that after this point, the node runs concurrently in the drivers task. The else branch assignment ensures proper node replacement at existing indices.

src/MoonLight/Layers/VirtualLayer.cpp (2)

62-71: Mutex synchronization looks correct for loop() and onSizeChanged.

The mutex acquisition/release pattern properly protects per-node operations, and addYield(10) is correctly placed outside the critical section.


77-81: No synchronization issue: loop20ms() is a placeholder method with no actual implementations in any derived classes. The base Node class declares it as an empty virtual method, and no subclasses override it. Since loop20ms() executes no code, there is no risk of race conditions or state modification. The pattern inconsistency (no mutex in loop20ms() vs. mutex in loop()) is immaterial because loop20ms() does nothing. Additionally, the comment references onUpdate() which does not exist in the codebase.

Likely an incorrect or invalid review comment.

src/MoonLight/Layers/PhysicalLayer.cpp (3)

74-76: Minor formatting improvement.

Adding explicit braces improves readability without changing behavior.


110-122: Mutex usage in loopDrivers is consistent with VirtualLayer.

The synchronization pattern correctly protects onSizeChanged and loop operations. addYield(10) remains outside the critical section, which is important for watchdog feeding per your previous learnings about extreme loads (49K channels @ 100 fps).


129-137: Mutex around onLayout is correct.

Layout operations are now thread-safe with respect to control updates and loop execution.

src/MoonBase/Nodes.h (2)

78-78: Mutex initialization at member declaration is correct for ESP32/FreeRTOS.

Creating the mutex inline ensures it's available immediately when the Node object is allocated.


95-101: Destructor cleanup is correct.

The null-check before vSemaphoreDelete and the subsequent null assignment prevent double-free and use-after-free issues.

Comment on lines +90 to 93
if (nodeMutex == nullptr) {
EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
}
}

⚠️ Potential issue | 🟠 Major

Error logging but continued operation may lead to crashes.

When nodeMutex creation fails, the code logs an error but allows the node to continue. However, all call sites (NodeManager.h, VirtualLayer.cpp, PhysicalLayer.cpp) unconditionally call xSemaphoreTake(node->nodeMutex, ...), which will crash if nodeMutex is null.

Consider either:

  1. Propagating the failure (return false or throw), or
  2. Adding null checks at all call sites (as suggested for NodeManager.h)
🔧 Option: Fail fast by returning early or setting a flag
   virtual void constructor(VirtualLayer* layer, const JsonArray& controls) {
     this->layer = layer;
     this->controls = controls;
     if (nodeMutex == nullptr) {
       EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
+      // Consider: mark node as invalid so it won't be used
+      // on = false;  // prevent loop execution
     }
   }

Alternatively, ensure all mutex call sites have null guards (more defensive but adds overhead).
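One way to keep the null guard from being repeated at every call site is to put it in a small RAII helper. The sketch below is a host-side analogy and not MoonLight code: it uses std::timed_mutex in place of a FreeRTOS SemaphoreHandle_t, and the names MutexHandle, NodeLock and setSpeed are hypothetical.

```cpp
#include <chrono>
#include <mutex>

// Host-side stand-in for a mutex handle that may be null when creation failed.
using MutexHandle = std::timed_mutex*;

// RAII guard: acquires the mutex if the handle is valid, releases it on destruction.
class NodeLock {
 public:
  NodeLock(MutexHandle handle, std::chrono::milliseconds timeout) : handle_(handle) {
    locked_ = handle_ != nullptr && handle_->try_lock_for(timeout);
  }
  ~NodeLock() {
    if (locked_) handle_->unlock();
  }
  NodeLock(const NodeLock&) = delete;
  NodeLock& operator=(const NodeLock&) = delete;

  // True only when the mutex exists and was acquired within the timeout.
  explicit operator bool() const { return locked_; }

 private:
  MutexHandle handle_;
  bool locked_ = false;
};

// Example call site: skip the update instead of dereferencing a null handle.
inline bool setSpeed(MutexHandle handle, int& speed, int value) {
  NodeLock lock(handle, std::chrono::milliseconds(50));
  if (!lock) return false;  // no mutex, or still busy after 50 ms
  speed = value;
  return true;
}
```

With a helper like this, a node whose mutex could not be created simply rejects updates, and the caller decides whether to log or retry.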

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    if (nodeMutex == nullptr) {
-      EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
-    }
-  }
+    if (nodeMutex == nullptr) {
+      EXT_LOGE(ML_TAG, "Failed to create nodeMutex");
+      // Consider: mark node as invalid so it won't be used
+      // on = false; // prevent loop execution
+    }
+  }
🤖 Prompt for AI Agents
In @src/MoonBase/Nodes.h around lines 90 - 93, nodeMutex can be null after
creation and callers unconditionally call xSemaphoreTake(node->nodeMutex,...),
which will crash; change the node creation path in Nodes.h so semaphore creation
failure is propagated (fail-fast) by returning an error/false or nullptr from
the node creation/initialization routine (or throw) instead of only logging, and
update all call sites in NodeManager.h, VirtualLayer.cpp and PhysicalLayer.cpp
to handle a failed creation (check returned pointer/boolean and abort
initialization) or guard every xSemaphoreTake with a null check for
node->nodeMutex until callers are fixed.

@ewowi
Collaborator Author

ewowi commented Jan 12, 2026

@coderabbitai, while making the system more robust by adding mutexes and semaphores, updating the global brightness is very slow (triggering [ 61188][I][StatefulService.h:202] beginTransaction(): [🐼] _accessMutex lightscontrol wait 100ms) and updating node controls is quite slow (triggering [395992][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) star fill = 200 t:slider p:0x3c28eb2e).

Can you make an analysis why this is and how we can speed up things?

@coderabbitai

coderabbitai bot commented Jan 12, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read StatefulService.h to understand _accessMutex and beginTransaction
cat src/MoonBase/StatefulService.h

Length of output: 62


🏁 Script executed:

#!/bin/bash
# Read ModuleLightsControl to understand brightness updates
cat src/MoonLight/Modules/ModuleLightsControl.h

Length of output: 23318


🏁 Script executed:

#!/bin/bash
# Read Nodes.cpp to see updateControl implementation and timing
cat src/MoonBase/Nodes.cpp

Length of output: 24379


🏁 Script executed:

#!/bin/bash
# Search for where nodeMutex is acquired to understand contention points
rg -n "xSemaphoreTake.*nodeMutex" -A 5 -B 2

Length of output: 2978


🏁 Script executed:

#!/bin/bash
# Look for any nested mutex scenarios - where _accessMutex might be held while nodeMutex is acquired
rg -n "beginTransaction|_accessMutex" -A 10 -B 2 src/MoonLight/Modules/ModuleLightsControl.h

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find StatefulService.h in the correct location
fd -t f "StatefulService.h" --exec cat {}

Length of output: 6966


🏁 Script executed:

#!/bin/bash
# Look at the architecture to understand task priorities and core assignments
cat docs/develop/architecture.md | head -200

Length of output: 7500


🏁 Script executed:

#!/bin/bash
# Check how loop() is called and what it might be doing - look at effect nodes
rg -n "class.*Node.*:" -A 30 src/MoonLight/Modules/ModuleEffects.h | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for how long operations might be happening in loops
rg -n "void.*loop\(\)" -A 20 src/MoonLight/Modules/ModuleEffects.h | head -150

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Check the task structure and timing
rg -n "effectTask|driverTask|loopDrivers" -B 3 -A 10 src/MoonLight/Layers/PhysicalLayer.cpp

Length of output: 273


@ewowi, I've analyzed the performance issue and identified the root causes. The slowness is due to mutex contention where control updates from the UI thread must wait for Effect/Driver tasks that hold nodeMutex during long-running loop() operations.

Root Causes

1. nodeMutex held during loop() execution ⚠️

Using portMAX_DELAY means tasks wait indefinitely to acquire locks, and your code holds nodeMutex while calling node->loop():

PhysicalLayer.cpp & VirtualLayer.cpp:

xSemaphoreTake(node->nodeMutex, portMAX_DELAY);  // Blocks indefinitely
node->loop();  // Can take 5-15ms per node!
xSemaphoreGive(node->nodeMutex);

Impact: If you have 10 nodes taking 10ms each, a control update arriving mid-iteration must wait ~100ms for all nodes to complete their loops.

2. Nested mutex scenario with brightness updates

When brightness is updated through ModuleLightsControl:

  1. SvelteKit task acquires _accessMutex (100ms timeout)
  2. Calls onUpdate() which modifies layerP.lights.header.brightness
  3. Meanwhile, if Effect/Driver tasks are processing nodes, they hold nodeMutex
  4. If brightness update needs to propagate to nodes, deadlock/timeout occurs

3. Task priorities

  • SvelteKit (priority 2) waits on a mutex held by the Effect/Driver tasks (priority 3)
  • This is not classic priority inversion, because the waiter has the lower priority. FreeRTOS priority inheritance only boosts a low-priority mutex holder while a higher-priority task waits, so it does not shorten this wait

Solutions

Immediate Fix: Minimize Critical Sections 🎯

Only hold nodeMutex when accessing control data, NOT during computation:

In PhysicalLayer.cpp & VirtualLayer.cpp:

// ❌ Current (bad):
xSemaphoreTake(node->nodeMutex, portMAX_DELAY);
node->loop();  // Computation while holding lock!
xSemaphoreGive(node->nodeMutex);

// ✅ Better:
if (node->on) {
  node->loop();  // Run without lock - nodes manage their own data consistency
  addYield(10);
}

In NodeManager.h - Add timeout:

// ❌ Current:
xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY);

// ✅ Better - fail fast:
if (xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(50)) == pdTRUE) {
  nodeClass->updateControl(control);
  nodeClass->onUpdate(updatedItem.oldValue, control);
  xSemaphoreGive(nodeClass->nodeMutex);
  nodeClass->requestMappings();
} else {
  EXT_LOGW(ML_TAG, "⏱️ Control update timeout for %s", updatedItem.name.c_str());
}

Better Approach: Double-Buffer Control Values 🔄

Apply control updates at the START of loop(), not during UI updates:

Add to Node class:

struct NodeControls {
  std::atomic<bool> needsUpdate{false};
  JsonDocument pendingValues;  // Updated from UI thread
  SemaphoreHandle_t valueMutex;
};

UI thread (NodeManager):

// Just queue the update; skip it if the lock is not free within 10ms
if (xSemaphoreTake(nodeClass->controls.valueMutex, pdMS_TO_TICKS(10)) == pdTRUE) {
  nodeClass->controls.pendingValues = control;
  nodeClass->controls.needsUpdate = true;
  xSemaphoreGive(nodeClass->controls.valueMutex);  // only release a lock we actually hold
}

Effect/Driver loop start:

void Node::loop() {
  // Apply pending control updates at loop start
  if (controls.needsUpdate.exchange(false)) {
    xSemaphoreTake(controls.valueMutex, portMAX_DELAY);
    applyControlUpdates(controls.pendingValues);
    xSemaphoreGive(controls.valueMutex);
  }
  
  // Now do the actual effect computation...
}
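The deferred-update idea above can be tried out on a desktop before porting it to the firmware. The following is a minimal sketch under stated assumptions: std::mutex and std::atomic stand in for the FreeRTOS primitives, a single int stands in for the JSON control values, and PendingControl and EffectNode are hypothetical names.

```cpp
#include <atomic>
#include <mutex>

// Mailbox written by the UI thread and drained by the render loop.
class PendingControl {
 public:
  // UI side: store the newest value; an older unapplied value is overwritten.
  void post(int value) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_ = value;
    dirty_.store(true, std::memory_order_release);
  }

  // Loop side: copy the pending value out if there is one.
  // Returns true when `out` was updated.
  bool take(int& out) {
    if (!dirty_.load(std::memory_order_acquire)) return false;  // cheap fast path
    std::lock_guard<std::mutex> lock(mutex_);
    dirty_.store(false, std::memory_order_relaxed);
    out = pending_;
    return true;
  }

 private:
  std::mutex mutex_;
  int pending_ = 0;
  std::atomic<bool> dirty_{false};
};

struct EffectNode {
  PendingControl speedControl;
  int speed = 1;
  long position = 0;

  void loop() {
    speedControl.take(speed);  // apply the newest value at a frame boundary
    position += speed;         // the computation itself runs without a lock
  }
};
```

Because the flag is cleared while the mutex is held, a value posted during take() is either picked up in this frame or left marked for the next one. The lock is only held for the copy, so the UI side never waits for a whole frame.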

Optimal: Lock-Free Control Queue 🚀

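For high-performance systems, use a lock-free ring buffer. The version below is only safe with a single producer and a single consumer, and it stores at most SIZE - 1 items:

#include <array>
#include <atomic>
#include <cstddef>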

template<typename T, size_t SIZE>
class LockFreeQueue {
  std::array<T, SIZE> buffer;
  std::atomic<size_t> head{0}, tail{0};
  
public:
  bool push(const T& item) {
    size_t t = tail.load(std::memory_order_relaxed);
    size_t next = (t + 1) % SIZE;
    if (next == head.load(std::memory_order_acquire)) return false;
    buffer[t] = item;
    tail.store(next, std::memory_order_release);
    return true;
  }
  
  bool pop(T& item) {
    size_t h = head.load(std::memory_order_relaxed);
    if (h == tail.load(std::memory_order_acquire)) return false;
    item = buffer[h];
    head.store((h + 1) % SIZE, std::memory_order_release);
    return true;
  }
};
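A short usage sketch may help show how such a queue would sit between the UI thread and a render loop. It is an illustration under assumptions, not MoonLight code: it repeats a compact copy of the ring so that it compiles on its own, uses std::thread instead of FreeRTOS tasks, and the names SpscQueue, ControlUpdate and transferUpdates are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Compact single-producer/single-consumer ring; holds at most SIZE - 1 items.
template <typename T, std::size_t SIZE>
class SpscQueue {
  std::array<T, SIZE> buffer{};
  std::atomic<std::size_t> head{0}, tail{0};

 public:
  bool push(const T& item) {  // call from the producer thread only
    const std::size_t t = tail.load(std::memory_order_relaxed);
    const std::size_t next = (t + 1) % SIZE;
    if (next == head.load(std::memory_order_acquire)) return false;  // full
    buffer[t] = item;
    tail.store(next, std::memory_order_release);
    return true;
  }

  bool pop(T& item) {  // call from the consumer thread only
    const std::size_t h = head.load(std::memory_order_relaxed);
    if (h == tail.load(std::memory_order_acquire)) return false;  // empty
    item = buffer[h];
    head.store((h + 1) % SIZE, std::memory_order_release);
    return true;
  }
};

struct ControlUpdate {
  int controlIndex;
  int value;
};

// A producer thread pushes `count` updates; the calling thread pops them all.
inline std::vector<int> transferUpdates(int count) {
  SpscQueue<ControlUpdate, 8> queue;
  std::thread producer([&] {
    for (int i = 0; i < count; ++i) {
      // Queue full: give the consumer a chance to drain it, then retry.
      while (!queue.push(ControlUpdate{0, i})) std::this_thread::yield();
    }
  });

  std::vector<int> received;
  ControlUpdate update{};
  while (static_cast<int>(received.size()) < count) {
    if (queue.pop(update)) {
      received.push_back(update.value);
    } else {
      std::this_thread::yield();
    }
  }
  producer.join();
  return received;
}
```

Values arrive in the order they were pushed. For slider traffic, where only the newest value matters, a consumer could also drain the queue at the start of each frame and keep just the last item.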

Recommendations

  1. Quick win: Remove nodeMutex from around loop() calls in PhysicalLayer/VirtualLayer, and where a lock is still needed use a realistic timeout instead of portMAX_DELAY
  2. Short-term: Add 50ms timeout to NodeManager control updates
  3. Medium-term: Move control application to node loop start (deferred update pattern)
  4. Long-term: Consider lock-free queues for control updates at extreme loads

The key insight: Mutexes should protect data, not computation. Your current design holds locks during the entire loop() execution, which serializes everything.

Would you like me to generate a complete patch implementing the immediate fix?


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

@ewowi
Collaborator Author

ewowi commented Jan 12, 2026

@coderabbitai, regarding

Impact: If you have 10 nodes taking 10ms each, a control update arriving mid-iteration must wait ~100ms for all nodes to complete their loops.

Is that true? As each node has its own mutex, it only waits for one frame (loop), which is about 20ms, not more, doesn't it? Also, we don't know what the loop is doing (lots of users will make effects), and the loop will use variables allocated in onUpdate and onSizeChanged.

So long story short: it needs a lock on loop but that will not take long ...

@coderabbitai

coderabbitai bot commented Jan 12, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at how brightness updates work and where _accessMutex contention occurs
cat src/MoonLight/Modules/ModuleLightsControl.h | grep -A 30 "void onUpdate"

Length of output: 1722


🏁 Script executed:

#!/bin/bash
# Check StatefulService beginTransaction implementation and where it might block
cat src/MoonBase/StatefulService.h | grep -A 20 "beginTransaction"

Length of output: 62


🏁 Script executed:

#!/bin/bash
# Find what other code paths might hold _accessMutex for extended periods
rg -n "_accessMutex|beginTransaction" src/MoonLight/Modules/ModuleLightsControl.h -B 2 -A 8

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at the effect task and see if it holds any global mutexes
rg -n "xSemaphoreTake.*accessMutex|SemaphoreTake.*lights" src/MoonLight/Modules/ModuleLightsControl.h -B 3 -A 5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check where monitorMutex or other global mutexes might be held for long periods
rg -n "monitorMutex|xSemaphoreTake" src/MoonLight/Modules/ModuleLightsControl.h -B 2 -A 10

Length of output: 1681


🏁 Script executed:

#!/bin/bash
# Find StatefulService.h location
fd -t f "StatefulService.h"

Length of output: 31


🏁 Script executed:

#!/bin/bash
# Look at the log message location to understand _accessMutex
rg -n "accessMutex.*wait" -B 5 -A 5

Length of output: 1047


🏁 Script executed:

#!/bin/bash
# Check if StatefulService is in ESP32SvelteKit directory
find . -name "StatefulService.h" -type f

Length of output: 33


🏁 Script executed:

#!/bin/bash
# Look for where the 100ms timeout warning comes from
rg -n "wait 100ms|_accessMutex.*wait" -B 3 -A 3

Length of output: 819


🏁 Script executed:

#!/bin/bash
# Look at full StatefulService to understand _accessMutex usage
cat lib/framework/StatefulService.h

Length of output: 6966


🏁 Script executed:

#!/bin/bash
# Find what operations call beginTransaction or hold _accessMutex for extended periods
rg -n "beginTransaction|update\(" lib/framework/StatefulService.h -B 2 -A 15

Length of output: 3388


🏁 Script executed:

#!/bin/bash
# Check if effectTask or driverTask might be calling read() which holds _accessMutex
rg -n "\.read\(|\.update\(" src/MoonLight/Modules/ModuleLightsControl.h -B 3 -A 10

Length of output: 536


🏁 Script executed:

#!/bin/bash
# Look at the effectTask to see if it holds _accessMutex during loop operations
rg -n "effectTask|layerP\.loop" src/MoonLight/Modules/ModuleLightsControl.h -B 5 -A 15

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for all places that call read() or update() on ModuleLightsControl state
rg -n "\.read\(|\.update\(" src/MoonLight/Modules/ModuleLightsControl.h -C 5 | head -200

Length of output: 405


🏁 Script executed:

#!/bin/bash
# Check the monitor emission code that seems to call read() frequently
cat src/MoonLight/Modules/ModuleLightsControl.h | sed -n '500,540p'

Length of output: 2223


🏁 Script executed:

#!/bin/bash
# Search for how often monitor emission happens and if it blocks brightness updates
rg -n "monitorMillis|monitorOn" src/MoonLight/Modules/ModuleLightsControl.h -B 3 -A 10

Length of output: 2727


🏁 Script executed:

#!/bin/bash
# Check if there are nested read() calls or update() calls from within onUpdate
rg -n "class ModuleLightsControl" src/MoonLight/Modules/ModuleLightsControl.h -A 50

Length of output: 2746


@ewowi, you're absolutely right—I apologize for the confusion! 🙏

My Error Corrected

You're correct: each node has its own nodeMutex, so a control update for Node A only waits for Node A's loop() to complete (~20ms), not all nodes sequentially. My analysis about 100ms cumulative wait was wrong.
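The per-node behaviour can be checked with a small host-side model. This is a sketch under assumptions, not the PR's code: std::timed_mutex stands in for the FreeRTOS nodeMutex, and Node, updateControl, Outcome and updateWhileOneNodeLoops are hypothetical names for this illustration.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

struct Node {
  std::timed_mutex nodeMutex;  // one mutex per node
  int speed = 0;
};

// Returns false if the node's mutex could not be taken within the timeout.
inline bool updateControl(Node& node, int value, std::chrono::milliseconds timeout) {
  std::unique_lock<std::timed_mutex> lock(node.nodeMutex, timeout);
  if (!lock.owns_lock()) return false;
  node.speed = value;
  return true;
}

struct Outcome {
  bool busyUpdated;
  bool idleUpdated;
};

// Keeps `busy` locked (as if its loop() were running) while a second thread,
// playing the UI task, tries to update both nodes.
inline Outcome updateWhileOneNodeLoops(Node& busy, Node& idle) {
  std::lock_guard<std::timed_mutex> looping(busy.nodeMutex);
  Outcome outcome{false, false};
  std::thread ui([&] {
    outcome.idleUpdated = updateControl(idle, 5, std::chrono::milliseconds(10));
    outcome.busyUpdated = updateControl(busy, 5, std::chrono::milliseconds(10));
  });
  ui.join();
  return outcome;
}
```

The update to the idle node goes through immediately, while only the update to the node whose loop holds the lock times out, which matches the point that waits do not accumulate across nodes.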

Actual Root Cause of 100ms _accessMutex Wait

I found the real culprit! The monitor emission code is holding _accessMutex frequently:

ModuleLightsControl.h lines 526-537:

static unsigned long monitorMillis = 0;
if (millis() - monitorMillis >= MAX(20, layerP.lights.header.nrOfLights / 300)) {  // Every 20-40ms
  monitorMillis = millis();
  
  read(  // ← This acquires _accessMutex!
      [&](ModuleState& _state) {
        if (_socket->getActiveClients() && _state.data["monitorOn"]) {
          _socket->emitEvent("monitor", (char*)layerP.lights.channelsD, ...);
        }
      },
      _moduleName);
}

The contention scenario:

  1. Driver/Effect task calls read() for monitor emission → holds _accessMutex
  2. UI thread tries to update brightness → calls beginTransaction() → waits for _accessMutex
  3. If monitor is emitting data (every 20-40ms), brightness update can hit the 100ms timeout

StatefulService.h shows the timeout:

if (xSemaphoreTakeRecursive(_accessMutex, pdMS_TO_TICKS(100))==pdFALSE) {
    ESP_LOGI("🐼", "_accessMutex %s wait 100ms", originId.c_str());

Node Control Updates

The "quite slow" node control updates are likely just the ~20ms per-node nodeMutex wait you mentioned, which is probably acceptable since:

  • Each node's loop must complete atomically with its data
  • User effects need consistent state between onUpdate() and loop()
  • 20ms is reasonable for UI responsiveness

Solutions for _accessMutex Contention

Option 1: Cache monitorOn flag

Avoid read() call when monitor is off:

// Add member variable
std::atomic<bool> cachedMonitorOn{false};

// In setup/onUpdate, cache the flag:
void onUpdate(const UpdatedItem& updatedItem) {
  if (updatedItem.name == "monitorOn") {
    cachedMonitorOn = _state.data["monitorOn"];
  }
  // ... rest of onUpdate
}

// In monitor emission:
if (_socket->getActiveClients() && cachedMonitorOn.load()) {
  // Emit directly without read() call
  _socket->emitEvent("monitor", ...);
}

Option 2: Use non-blocking trylock pattern

// Try to read, but don't block if mutex busy
if (xSemaphoreTakeRecursive(_accessMutex, 0) == pdTRUE) {
  if (_state.data["monitorOn"]) {
    _socket->emitEvent("monitor", ...);
  }
  xSemaphoreGiveRecursive(_accessMutex);
}
// If busy, skip this frame—next frame is only 20-40ms away

Option 3: Separate monitor state mutex

Move monitorOn out of the main state to avoid contention with brightness/color updates.

Summary

  • ✅ You're right: per-node mutex = ~20ms wait per node (acceptable)
  • ❌ I was wrong: no cumulative wait across nodes
  • 🎯 Real problem: monitor read() every 20-40ms blocks _accessMutex causing 100ms brightness update delays

The quickest fix is Option 1 (cache monitorOn flag) to eliminate most read() calls from the hot path. Would you like me to prepare a patch for that?


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

@ewowi
Collaborator Author

ewowi commented Jan 12, 2026

@coderabbitai, eliminating read() for the monitor emitEvent was an excellent catch: that solved the wait on global brightness, and _accessMutex is not waiting anymore either!
Changing the node controls is still a bit slow. When I move a slider, multiple calls to updateControl take place, which I want in order to get a fast slider response, but all these changes together take about 3 seconds:

[234479][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 3 t:slider p:0x3c35131d
[234609][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 4 t:slider p:0x3c35131d
[234806][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 5 t:slider p:0x3c35131d
[234952][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 6 t:slider p:0x3c35131d
[235888][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 7 t:slider p:0x3c35131d
[236052][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 8 t:slider p:0x3c35131d
[236192][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 9 t:slider p:0x3c35131d
[236327][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 13 t:slider p:0x3c35131d
[236473][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 16 t:slider p:0x3c35131d
[236615][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 17 t:slider p:0x3c35131d
[236898][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 18 t:slider p:0x3c35131d
[237027][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 19 t:slider p:0x3c35131d
[237181][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 20 t:slider p:0x3c35131d
[237312][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 21 t:slider p:0x3c35131d
[237730][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 23 t:slider p:0x3c35131d
[238026][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 24 t:slider p:0x3c35131d
[238166][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 25 t:slider p:0x3c35131d
[238334][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 26 t:slider p:0x3c35131d
[239322][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 27 t:slider p:0x3c35131d
[239445][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 28 t:slider p:0x3c35131d
[239587][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 29 t:slider p:0x3c35131d
[239732][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 30 t:slider p:0x3c35131d
[239862][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 31 t:slider p:0x3c35131d
[239994][D][Nodes.cpp:22] updateControl(): [💫] ESP32 SvelteKit (3720) speed = 31 t:slider p:0x3c35131d

Can we do that faster? Keep in mind that I want to keep the slider responsive, so the loop must keep updating while the slider changes. Also take into account that some control updates don't recreate memory objects (e.g. speed) while others do (e.g. nr of stars); for the latter, some more delay is acceptable.

@coderabbitai

coderabbitai bot commented Jan 12, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read NodeManager.h to see the control update flow
cat src/MoonBase/NodeManager.h

Length of output: 15252


🏁 Script executed:

#!/bin/bash
# Check architecture docs for task priorities
cat docs/develop/architecture.md | grep -A 30 "Task Priorities"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for requestMappings implementation
rg -n "requestMappings" -A 10 -B 2

Length of output: 6693


🏁 Script executed:

#!/bin/bash
# Look at the Nodes.cpp updateControl to see timing details
cat src/MoonBase/Nodes.cpp | head -50

Length of output: 2807


🏁 Script executed:

#!/bin/bash
# Look at architecture doc for task information and priorities
cat docs/develop/architecture.md | grep -B 5 -A 10 "Priority"

Length of output: 4299


🏁 Script executed:

#!/bin/bash
# Check how often SvelteKit task runs
rg -n "SvelteKit.*task|ESP32SvelteKit.*loop|sveltekit.*20ms" docs/develop/architecture.md

Length of output: 176


🏁 Script executed:

#!/bin/bash
# Look at StatefulService to understand how control updates are processed
cat lib/framework/StatefulService.h | grep -A 30 "void update"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's any throttling or debouncing on control updates
rg -n "throttle|debounce|delay.*control|control.*delay" src/MoonBase/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for how WebSocket messages are processed and if there's queuing
rg -n "WebSocket.*message|ws.*onmessage|emitControl" -A 5 -B 2 | head -100

Length of output: 3179


🏁 Script executed:

#!/bin/bash
# Look at how WebSocket control updates are sent from the client
cat interface/src/lib/stores/socket.ts | grep -A 20 -B 5 "emit\|send"

Length of output: 2701


🏁 Script executed:

#!/bin/bash
# Check how often the SvelteKit task processes messages
rg -n "loop\(\)|processLoop" lib/framework/ -A 10 -B 2

Length of output: 10634


🏁 Script executed:

#!/bin/bash
# Look for any message queuing or processing in StatefulService
cat lib/framework/StatefulService.h | head -100

Length of output: 3135


🏁 Script executed:

#!/bin/bash
# Check if there are multiple WebSocket messages being queued
rg -n "updateControl|onUpdate" src/MoonBase/Nodes.cpp -C 10

Length of output: 2270


@ewowi, great catch! 🎯 I've identified the root cause. The slowness isn't from the nodeMutex wait itself—it's from task priority preemption.

Root Cause Analysis

Looking at the timestamps, most delays are 100-300ms (not ~20ms from a single loop), with two big gaps of ~900ms. Here's what's happening:

Architecture Context

From docs/develop/architecture.md: ESP32SvelteKit task runs on Core 1 at Priority 2, while Effect Task and Driver Task both run at Priority 3. In FreeRTOS, higher number = higher priority, so Effect/Driver tasks can preempt the SvelteKit task.

The Flow:

  1. Client floods the server: Browser sends WebSocket message for every slider position (no throttling) → 24 messages in 5.5 seconds
  2. SvelteKit task runs every 20ms: Processes incoming WebSocket messages in its loop
  3. Effect/Driver tasks preempt it: While SvelteKit is trying to process messages, higher-priority Effect/Driver tasks (Priority 3) constantly interrupt it
  4. When SvelteKit finally runs: It takes nodeMutex, calls updateControl() + onUpdate() (fast), releases mutex, calls requestMappings() (just sets flags—very fast)

The delays you're seeing are not mutex waits—they're the time SvelteKit spends waiting for CPU time because Effect/Driver tasks keep preempting it!

Solutions

🚀 Solution 1: Client-Side Throttling (Recommended)

Throttle slider updates on the client side to send at most 1 update per 50-100ms. This maintains visual responsiveness while reducing server load:

In interface/src/lib/components/SliderControl.svelte (or similar):

let throttleTimer: ReturnType<typeof setTimeout> | null = null;
let pendingValue: number | null = null;

function handleSliderChange(event: Event) {
  // valueAsNumber yields a number, matching pendingValue's declared type
  const value = (event.target as HTMLInputElement).valueAsNumber;
  
  // Update UI immediately for responsiveness
  localValue = value;
  
  // Throttle WebSocket sends
  pendingValue = value;
  
  if (!throttleTimer) {
    throttleTimer = setTimeout(() => {
      if (pendingValue !== null) {
        socket.sendEvent('update', {
          id: controlId,
          value: pendingValue
        });
        pendingValue = null;
      }
      throttleTimer = null;
    }, 50); // Send max 20 updates/sec
  }
}

Benefits:

  • Slider feels instant (UI updates immediately)
  • Loop keeps updating (receives fewer but meaningful values)
  • Reduces WebSocket traffic (roughly ~80% as an estimate, not a measurement)
  • No server-side changes needed
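
The throttling pattern above can also be factored into a small reusable helper. The following is a minimal sketch, not code from the MoonLight repository: `createTrailingThrottle`, `Schedule`, and the injectable `schedule` parameter are hypothetical names, and the scheduler is injectable only so the behaviour can be checked without real timers.

```typescript
// Hypothetical helper (not part of MoonLight): trailing-edge throttle.
// At most one send per interval, and only the newest value is sent.
type Schedule = (callback: () => void, delayMs: number) => unknown;

function createTrailingThrottle<T>(
  send: (value: T) => void,
  intervalMs: number,
  // Assumption: defaults to setTimeout; tests can pass a fake scheduler.
  schedule: Schedule = (cb, ms) => setTimeout(cb, ms)
): (value: T) => void {
  let timerActive = false;
  let hasPending = false;
  let pending: T | undefined;

  return (value: T) => {
    // Always remember the newest value; older pending values are dropped.
    pending = value;
    hasPending = true;

    if (timerActive) return; // a send is already scheduled
    timerActive = true;
    schedule(() => {
      timerActive = false;
      if (hasPending) {
        hasPending = false;
        send(pending as T);
      }
    }, intervalMs);
  };
}
```

A slider handler would then call the returned function on every input event, and only the last value seen within each 50 ms window would reach the socket.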

🔧 Solution 2: Server-Side Update Coalescing

Process only the latest value if multiple updates arrive in quick succession:

In NodeManager.h, replace the control update section:

// Add member variable to track pending updates per control
struct PendingUpdate {
  uint8_t nodeIndex;
  uint8_t controlIndex;
  JsonObject control;
  unsigned long timestamp;
};
std::vector<PendingUpdate> pendingUpdates;

// In onUpdate(), instead of immediate processing:
else if (updatedItem.parent[1] == "controls" && updatedItem.name == "value") {
  JsonObject control = nodeState["controls"][updatedItem.index[1]];
  
  // Check if there's already a pending update for this control
  bool found = false;
  for (auto& pending : pendingUpdates) {
    if (pending.nodeIndex == updatedItem.index[0] && 
        pending.controlIndex == updatedItem.index[1]) {
      // Update existing pending update with latest value
      pending.control = control;
      pending.timestamp = millis();
      found = true;
      break;
    }
  }
  
  if (!found) {
    pendingUpdates.push_back({
      updatedItem.index[0],
      updatedItem.index[1],
      control,
      millis()
    });
  }
}

// Process pending updates (call from Module::loop every ~50ms):
void processPendingControlUpdates() {
  if (pendingUpdates.empty()) return;
  
  for (const auto& pending : pendingUpdates) {
    if (pending.nodeIndex < nodes->size()) {
      Node* nodeClass = (*nodes)[pending.nodeIndex];
      if (nodeClass) {
        xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY);
        nodeClass->updateControl(pending.control);
        nodeClass->onUpdate("", pending.control);
        xSemaphoreGive(nodeClass->nodeMutex);
        nodeClass->requestMappings();
      }
    }
  }
  pendingUpdates.clear();
}
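
The "latest value wins" idea behind this coalescing can be illustrated independently of FreeRTOS. The sketch below is a language-neutral model written in TypeScript, not MoonLight code: `UpdateCoalescer` and `ControlUpdate` are hypothetical names, and the fields only mirror the PendingUpdate struct above. A map keyed by node and control index replaces the linear search.

```typescript
// Hypothetical model of update coalescing (not MoonLight code).
interface ControlUpdate {
  nodeIndex: number;
  controlIndex: number;
  value: number;
}

class UpdateCoalescer {
  private pending = new Map<string, ControlUpdate>();

  // Queue an update; a later update for the same control replaces the earlier one.
  enqueue(update: ControlUpdate): void {
    this.pending.set(`${update.nodeIndex}:${update.controlIndex}`, update);
  }

  // Apply each pending update once, clear the queue, and return how many were applied.
  flush(apply: (update: ControlUpdate) => void): number {
    const batch = Array.from(this.pending.values());
    this.pending.clear();
    batch.forEach((update) => apply(update));
    return batch.length;
  }
}
```

Calling `flush` from a periodic loop means a burst of slider messages for one control costs a single mutex acquisition per period instead of one per message.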

⏱️ Solution 3: Timeout Instead of Blocking Forever

Replace portMAX_DELAY with a timeout to fail gracefully:

if (xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) {
  nodeClass->updateControl(control);
  nodeClass->onUpdate(updatedItem.oldValue, control);
  xSemaphoreGive(nodeClass->nodeMutex);
  nodeClass->requestMappings();
} else {
  EXT_LOGD(ML_TAG, "⏱️ Skipped control update (node busy)");
  // Optionally retry later or drop the update
}

🎯 Solution 4: Differentiate Control Types

For "fast" controls like speed (no memory reallocation), process immediately. For "slow" controls like "number of stars", allow delays:

// Add control metadata
bool isHeavyUpdate = (control["name"] == "count" || control["name"] == "size"); 

if (!isHeavyUpdate) {
  // Fast path: try-lock with short timeout
  if (xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(20)) == pdTRUE) {
    nodeClass->updateControl(control);
    nodeClass->onUpdate(updatedItem.oldValue, control);
    xSemaphoreGive(nodeClass->nodeMutex);
    nodeClass->requestMappings();
  } else {
    // Queue for later if node is busy
  }
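} else {
  // Heavy update: use longer timeout (acceptable delay)
  // Only touch the node if the mutex was actually acquired
  if (xSemaphoreTake(nodeClass->nodeMutex, pdMS_TO_TICKS(100)) == pdTRUE) {
    // ... update ...
    xSemaphoreGive(nodeClass->nodeMutex);
  }
}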

Recommendation

Start with Solution 1 (client-side throttling)—it's the cleanest fix that maintains responsiveness while reducing unnecessary traffic. If you still see issues, combine it with Solution 3 (timeouts) or Solution 4 (differentiate control types).

The key insight: Your per-node mutex design is correct—the issue is too many updates arriving too fast combined with task priority preemption. Throttling on the client side solves both problems elegantly.

Would you like me to prepare a patch for the client-side throttling?


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

@ewowi
Collaborator Author

ewowi commented Jan 12, 2026

@coderabbitai, should I also give the SvelteKit task priority 3 so it gets a fair share of CPU time, specifically when controls are updated but also for all other use cases?

@coderabbitai

coderabbitai bot commented Jan 12, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the full architecture doc section on tasks and priorities
cat docs/develop/architecture.md | grep -B 10 -A 30 "Priority\|Core\|Task"

Length of output: 11846


🏁 Script executed:

#!/bin/bash
# Search for all task creation calls to see priorities
rg -n "xTaskCreatePinnedToCore|xTaskCreate" -B 2 -A 5

Length of output: 2887


🏁 Script executed:

#!/bin/bash
# Look for any comments or documentation about why specific priorities were chosen
rg -n "priority.*2|Priority.*2|SvelteKit.*priority" -B 3 -A 3

Length of output: 2867


🏁 Script executed:

#!/bin/bash
# Check if there are other tasks in the system we should be aware of
rg -n "Task.*create|createTask" -B 2 -A 5 | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at the specific task creation for SvelteKit, Effect, and Driver tasks
rg -n "effectTask|driverTask|SvelteKit.*Task" -B 5 -A 10 src/MoonLight/

Length of output: 0


@ewowi, No, I recommend keeping SvelteKit at Priority 2 for the following reasons:

Why Priority 2 is Correct

1. LED Timing is More Critical Than UI Responsiveness 🎯

The current design is intentional per the architecture docs:

  • "Driver Task (priority 3) preempts SvelteKit (priority 2)"
  • "LED output never stalls for HTTP requests"

Driver Task performs timing-critical DMA operations that must maintain consistent ~60fps output. If SvelteKit were Priority 3, heavy WebSocket processing or JSON serialization could delay LED frames, causing visible glitches.

2. Priority 3 Won't Actually Fix the Control Update Speed Issue

Even with equal priority (3), FreeRTOS uses round-robin scheduling—Driver Task would still get CPU time, interleaving with SvelteKit. The delays you're seeing aren't from priority preemption alone; they're from:

  • Control update volume: 24 messages in 5.5 seconds
  • Task scheduling overhead: Context switching between tasks
  • Mutex waits: Even short nodeMutex waits add up

Raising priority would only reduce, not eliminate, the delays.

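3. Mutexes Already Provide Priority Inheritance

FreeRTOS mutexes (as opposed to binary semaphores) have priority inheritance built in. Assuming nodeMutex is created with xSemaphoreCreateMutex():

  • When an Effect or Driver task (Priority 3) blocks on a mutex held by SvelteKit (Priority 2), SvelteKit is temporarily raised to Priority 3 until it releases the mutex
  • This limits priority inversion: the holder finishes its critical section sooner, so the higher-priority task waits less
  • It does not help in the other direction: when SvelteKit waits on a mutex held by a Priority 3 task, nothing is boosted because the holder already outranks it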

What You SHOULD Do Instead

Option 1: Client-Side Throttling (Best Solution)

Throttle slider updates in the browser to send max 1 update per 50-100ms (as I detailed in my previous response). This gives you:

  • Instant visual feedback (UI updates immediately)
  • Fast server response (fewer messages to process)
  • Reduced WebSocket traffic (roughly ~80% as an estimate, not a measurement)

Option 2: Differentiated Priority for Control Updates Only

If you want better control update responsiveness specifically, create a dedicated control update task at Priority 3:

// In main.cpp
xTaskCreatePinnedToCore(
  controlUpdateTask,
  "ControlUpdates", 
  2 * 1024,
  NULL,
  3,  // Same as Driver/Effect
  &controlUpdateTaskHandle,
  1   // Core 1
);

Then move the mutex-protected control update logic from NodeManager into this task via a lightweight message queue. This way:

  • LED timing remains unaffected (Driver Task still runs)
  • Control updates get Priority 3 treatment
  • SvelteKit remains Priority 2 for general HTTP/WebSocket work

⚠️ Option 3: Temporary Priority Boost (Advanced)

If you want to briefly boost SvelteKit during control updates only:

// In NodeManager.h, during control update
UBaseType_t oldPriority = uxTaskPriorityGet(NULL);  // Remember the current priority
vTaskPrioritySet(NULL, 3);  // Boost to Priority 3
xSemaphoreTake(nodeClass->nodeMutex, portMAX_DELAY);
nodeClass->updateControl(control);
nodeClass->onUpdate(updatedItem.oldValue, control);
xSemaphoreGive(nodeClass->nodeMutex);
vTaskPrioritySet(NULL, oldPriority);  // Restore the original priority

This is hacky but effective for short bursts.

My Recommendation

Keep the current Priority 2 and implement client-side throttling. Your architecture is correctly prioritizing LED timing over UI responsiveness, which is the right trade-off for a lighting system. The control update delays are a symptom of too many updates, not priority misconfiguration.

If throttling isn't enough, Option 2 (dedicated control update task) is the cleanest architectural solution.

Would you like me to prepare a patch for the client-side throttling, or would you prefer to explore the dedicated control update task approach?


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T10:20:09.363Z
Learning: In MoonLight's Idle Watchdog approach: `vTaskDelay(1)` is preferred over `taskYIELD()` because taskYIELD() only yields to equal or higher priority tasks. Since Effect Task and Driver Task run at priority 3 while the Idle task runs at priority 0, only vTaskDelay(1) actually blocks the calling task and allows the Idle task to run and reset the watchdog timer.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.

@ewowi
Collaborator Author

ewowi commented Jan 12, 2026

@coderabbitai, regarding slider control.

Please provide a solution that fits the current code. In FieldRenderer.svelte, there is:

{:else if property.type == 'slider'}
	<!-- range colors: https://daisyui.com/components/range/ 
     on:input: direct response to server
     -->
	<input
		type="range"
		min={property.min ? property.min : 0}
		max={property.max ? property.max : 255}
		{step}
		class={'flex-1 range ' +
			(disabled == false
				? property.color == 'Red'
					? 'range-error'
					: property.color == 'Green'
						? 'range-success'
						: 'range-primary'
				: 'range-secondary')}
		{disabled}
		bind:value
		on:input={onChange}
	/>
	{#if hasNumber}
		<input
			type="number"
			min={property.min ? property.min : 0}
			max={property.max ? property.max : 255}
			{step}
			class="input"
			style="height: 2rem; width: 5rem"
			{disabled}
			bind:value
			on:change={onChange}
		/>
	{/if}

So we should add client-side throttling here, shouldn't we?

@coderabbitai

coderabbitai bot commented Jan 12, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read FieldRenderer.svelte to understand the full component structure
cat interface/src/lib/components/FieldRenderer.svelte

Length of output: 81


🏁 Script executed:

#!/bin/bash
# Find FieldRenderer.svelte
fd -t f "FieldRenderer.svelte"

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Read the correct FieldRenderer.svelte file
cat interface/src/lib/components/moonbase/FieldRenderer.svelte

Length of output: 11855


@ewowi, yes exactly! Here's the client-side throttling solution that fits perfectly into your existing FieldRenderer.svelte:

Add Throttling to FieldRenderer.svelte

Add this to the <script> section (around line 20, after the existing variables):

// Throttling for slider control updates
let throttleTimer: ReturnType<typeof setTimeout> | null = null;
let pendingSliderEvent: Event | null = null;

function handleSliderInput(event: Event) {
	// Update local value immediately for visual responsiveness
	// (bind:value already handles this)
	
	// Store the event for throttled sending
	pendingSliderEvent = event;
	
	// Only send to server once per 50ms
	if (!throttleTimer) {
		throttleTimer = setTimeout(() => {
			if (pendingSliderEvent) {
				onChange(pendingSliderEvent);
				pendingSliderEvent = null;
			}
			throttleTimer = null;
		}, 50); // Max 20 updates/second
	}
}

// Throttle cleanup belongs in the component's existing onDestroy block (next step).
// Registering a second onDestroy here would send the pending update twice.

Update the existing onDestroy block (around line 33) to include the throttle cleanup:

onDestroy(() => {
	// console.log('FieldRenderer clearing interval');
	clearInterval(interval);
	// Clean up throttle timer
	if (throttleTimer) {
		clearTimeout(throttleTimer);
		// Send final pending update if any
		if (pendingSliderEvent) {
			onChange(pendingSliderEvent);
		}
	}
});

Update the slider input event handler (line 179):

{:else if property.type == 'slider'}
	<!-- range colors: https://daisyui.com/components/range/ 
         on:input: throttled response to server for performance
         -->
	<input
		type="range"
		min={property.min ? property.min : 0}
		max={property.max ? property.max : 255}
		{step}
		class={'flex-1 range ' +
			(disabled == false
				? property.color == 'Red'
					? 'range-error'
					: property.color == 'Green'
						? 'range-success'
						: 'range-primary'
				: 'range-secondary')}
		{disabled}
		bind:value
		on:input={handleSliderInput}
	/>
	{#if hasNumber}
		<input
			type="number"
			min={property.min ? property.min : 0}
			max={property.max ? property.max : 255}
			{step}
			class="input"
			style="height: 2rem; width: 5rem"
			{disabled}
			bind:value
			on:change={onChange}
		/>
	{/if}

What This Does

  1. Instant Visual Feedback: The bind:value still updates immediately as you drag, so the slider feels responsive
  2. Throttled Server Updates: Server receives at most 1 update per 50ms (20 updates/sec) instead of every mouse movement
  3. Final Update Guaranteed: When component unmounts, any pending update is sent immediately
  4. Number Input Unchanged: The number input still uses on:change (only fires on blur/enter), which is already optimal

Expected Results

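  • Before: 24 updates in 5.5 seconds = ~4.4 updates/sec (but with huge delays between them)
  • After: at most 20 updates/sec sent by the client, which should be processed faster because fewer messages queue up
  • Slider feel: Identical to before (instant visual feedback)
  • Server load: Estimated to drop by ~60-80% (an estimate, not a measurement; the actual reduction depends on how many input events the browser fires while dragging)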
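
The rates quoted here follow from simple arithmetic, sketched below. The function names are illustrative only. The 24 messages in 5.5 seconds come from this thread; the thread does not say whether that figure counts messages sent by the browser or messages processed by the server, so the comparison is indicative.

```typescript
// Upper bound on the send rate for a throttle that fires at most once per interval.
function maxUpdatesPerSecond(intervalMs: number): number {
  return 1000 / intervalMs;
}

// Average rate for a number of messages observed over a time window.
function averageRate(messages: number, seconds: number): number {
  return messages / seconds;
}

console.log(maxUpdatesPerSecond(50));          // 20
console.log(maxUpdatesPerSecond(100));         // 10
console.log(averageRate(24, 5.5).toFixed(1));  // 4.4
```

A 50 ms throttle caps the client at 20 sends per second, so it only reduces traffic when the browser emits input events faster than that while dragging.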

This should make your slider controls feel much more responsive! 🎯


Doc
====
- Installer language check

frontend
=======
- Field Renderer: throttle slider updates

backend
=======
Lights control: monitoring without read monitorOn (to avoid _accessMutex congestion)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/develop/architecture.md:
- Around line 306-311: The code sample calls esp_task_wdt_reset() before loop(),
but the documentation text says the watchdog reset occurs after loop() — fix by
making code and prose consistent: either move the esp_task_wdt_reset() call to
after loop() (update the while(true) body around loop(), esp_task_wdt_reset(),
then vTaskDelay(1)) if you want the reset to happen post-loop, or update the
surrounding documentation to state that esp_task_wdt_reset() is called before
loop() and describe the implications; refer to the loop(), esp_task_wdt_reset(),
vTaskDelay(), and the while(true) loop in the example when applying the change.
🧹 Nitpick comments (2)
interface/src/lib/components/moonbase/FieldRenderer.svelte (1)

41-58: Consider leading-edge throttle for immediate first update.

The current throttling implementation waits up to 50ms before sending the first event. For better perceived responsiveness, consider a "leading-edge" pattern that sends the first event immediately, then throttles subsequent events:

♻️ Optional: Leading-edge throttle for better UX
 function handleSliderInput(event: Event) {
-	// Update local value immediately for visual responsiveness
-	// (bind:value already handles this)
-
-	// Store the event for throttled sending
 	pendingSliderEvent = event;
 
-	// Only send to server once per 50ms
 	if (!throttleTimer) {
+		// Send immediately on first interaction
+		onChange(event);
+		pendingSliderEvent = null;
+
 		throttleTimer = setTimeout(() => {
 			if (pendingSliderEvent) {
 				onChange(pendingSliderEvent);
 				pendingSliderEvent = null;
 			}
 			throttleTimer = null;
-		}, 50); // Max 20 updates/second
+		}, 50);
 	}
 }
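
The leading-edge variant suggested in this diff can be sketched as a standalone helper. This is a hedged illustration, not the component's code: `createLeadingThrottle`, `Schedule`, and the injectable `schedule` parameter are hypothetical, and the scheduler is injectable only to make the behaviour checkable without real timers.

```typescript
// Hypothetical helper (not part of MoonLight): leading-edge throttle with a trailing flush.
type Schedule = (callback: () => void, delayMs: number) => unknown;

function createLeadingThrottle<T>(
  send: (value: T) => void,
  intervalMs: number,
  // Assumption: defaults to setTimeout; tests can pass a fake scheduler.
  schedule: Schedule = (cb, ms) => setTimeout(cb, ms)
): (value: T) => void {
  let timerActive = false;
  let hasPending = false;
  let pending: T | undefined;

  return (value: T) => {
    if (!timerActive) {
      // Leading edge: the first value of a burst is sent right away.
      send(value);
      timerActive = true;
      schedule(() => {
        timerActive = false;
        // Trailing edge: flush the newest value seen during the window, if any.
        if (hasPending) {
          hasPending = false;
          send(pending as T);
        }
      }, intervalMs);
      return;
    }
    // Inside the window: keep only the newest value.
    pending = value;
    hasPending = true;
  };
}
```

One property worth noting: because the window closes when the trailing value is flushed, an event arriving just afterwards is sent immediately, so two sends can briefly be closer together than the interval.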
docs/gettingstarted/installer.md (1)

1-172: Documentation improvements look good.

The editorial changes improve clarity and consistency. The wording updates and formatting standardization are appropriate.

Note: Static analysis flagged multiple images missing alt text (lines 66, 70, 78, 92, 103, 112, 120, 124, 151, 163). While these appear to be pre-existing issues not introduced by this PR, adding descriptive alt text would improve accessibility.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35e87d3 and 336af74.

📒 Files selected for processing (5)
  • docs/develop/architecture.md
  • docs/gettingstarted/installer.md
  • interface/src/lib/components/moonbase/FieldRenderer.svelte
  • lib/framework/WWWData.h
  • src/MoonLight/Modules/ModuleLightsControl.h
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).
📚 Learning: 2026-01-06T13:45:03.067Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2025-12-25T13:34:40.097Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2025-12-25T13:31:32.367Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2026-01-04T09:31:07.533Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: lib/framework/ESP32SvelteKit.cpp:229-229
Timestamp: 2026-01-04T09:31:07.533Z
Learning: In lib/framework/ESP32SvelteKit.cpp, the SvelteKit loop task stack size of 6KB (6 * 1024 bytes) for non-PSRAM devices has been validated through testing, showing approximately 1500 bytes of free stack remaining under load, providing adequate ~24% headroom.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2026-01-08T10:20:09.363Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T10:20:09.363Z
Learning: In MoonLight's Idle Watchdog approach: `vTaskDelay(1)` is preferred over `taskYIELD()` because taskYIELD() only yields to equal or higher priority tasks. Since Effect Task and Driver Task run at priority 3 while the Idle task runs at priority 0, only vTaskDelay(1) actually blocks the calling task and allows the Idle task to run and reset the watchdog timer.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2025-11-29T16:17:30.619Z
Learnt from: CR
Repo: MoonModules/MoonLight PR: 0
File: misc/parking/GEMINI.md:0-0
Timestamp: 2025-11-29T16:17:30.619Z
Learning: Applies to misc/parking/src/**/*.{h,cpp} : Modify C++ backend code in the `src/` and `lib/` directories for ESP32 development

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2026-01-08T11:39:52.862Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

Applied to files:

  • docs/develop/architecture.md
📚 Learning: 2025-12-28T10:28:24.157Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Applied to files:

  • docs/develop/architecture.md
  • src/MoonLight/Modules/ModuleLightsControl.h
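
The front/back buffer idea from the PR #84 learning can be sketched in portable C++. This is only an illustration under stated assumptions: `std::mutex` stands in for the FreeRTOS `monitorMutex`, and `FrameBuffers`, `publishFrame` and `snapshotFront` are hypothetical names, not MoonLight's API. Only the `channelsE`/`channelsD` naming comes from the learning. For simplicity the monitor copies the front buffer under the lock here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the effects (back) and driver (front) buffers.
struct FrameBuffers {
  std::vector<uint8_t> channelsE;  // written by the effects task
  std::vector<uint8_t> channelsD;  // read by the driver and the monitor
  std::mutex swapMutex;            // stand-in for monitorMutex (assumption)

  explicit FrameBuffers(std::size_t nrOfChannels)
      : channelsE(nrOfChannels, 0), channelsD(nrOfChannels, 0) {
    assert(nrOfChannels > 0);
  }

  // Effects side: render without holding the lock, then swap briefly.
  void publishFrame(uint8_t value) {
    for (auto& c : channelsE) c = value;  // long-running work, unlocked
    std::lock_guard<std::mutex> lock(swapMutex);
    std::swap(channelsE, channelsD);  // swaps the vectors' storage, no element copy
  }

  // Monitor side: take a complete copy of what the driver would output.
  std::vector<uint8_t> snapshotFront() {
    std::lock_guard<std::mutex> lock(swapMutex);
    return channelsD;
  }
};
```

Because rendering happens outside the lock, the effects side only blocks the monitor for the duration of the swap, which is the property the learning highlights.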
📚 Learning: 2026-01-06T15:24:29.770Z
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Applied to files:

  • src/MoonLight/Modules/ModuleLightsControl.h
🪛 LanguageTool
docs/gettingstarted/installer.md

[style] ~133-~133: Using many exclamation marks might seem excessive (in this case: 24 exclamation marks for a text that’s 10358 characters long)
Context: ...lation/#usb-to-serial-chip-drivers).

!!! info "Next steps"
* You can now ch...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)
docs/gettingstarted/installer.md

66-66: Images should have alternate text (alt text)

(MD045, no-alt-text)


70-70: Images should have alternate text (alt text)

(MD045, no-alt-text)


78-78: Images should have alternate text (alt text)

(MD045, no-alt-text)


92-92: Images should have alternate text (alt text)

(MD045, no-alt-text)


103-103: Images should have alternate text (alt text)

(MD045, no-alt-text)


112-112: Images should have alternate text (alt text)

(MD045, no-alt-text)


120-120: Images should have alternate text (alt text)

(MD045, no-alt-text)


124-124: Images should have alternate text (alt text)

(MD045, no-alt-text)


135-135: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)


151-151: Images should have alternate text (alt text)

(MD045, no-alt-text)


163-163: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
src/MoonLight/Modules/ModuleLightsControl.h (1)

511-528: LGTM! Direct state access removes _accessMutex contention.

This change correctly addresses the performance issue identified in the PR discussion where read() was holding _accessMutex every 20-40ms for monitor emission, causing brightness updates to timeout waiting 100ms. Directly accessing _state.data["monitorOn"] is safe here since it's a simple boolean check where a momentary stale read has negligible impact (at worst, one frame is skipped or sent unnecessarily).

Based on learnings, this maintains parallelism between effects rendering, monitor emission, and driver tasks.

interface/src/lib/components/moonbase/FieldRenderer.svelte (1)

60-70: LGTM! Proper cleanup ensures no lost updates.

The cleanup correctly clears the timeout and flushes any pending slider event on component destruction, ensuring the final user value is sent to the server.

docs/develop/architecture.md (1)

72-94: LGTM! Clear documentation of task responsibilities and synchronization.

The new task narratives clearly document which operations are protected by nodeMutex, helping developers understand the synchronization model. This aligns well with the per-node mutex changes in VirtualLayer.cpp and PhysicalLayer.cpp.
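
A minimal sketch of the per-node locking model mentioned here, assuming `std::mutex` as a portable stand-in for the FreeRTOS `nodeMutex`. `DemoNode` and `renderAll` are hypothetical names used only for illustration; the real Node class and its callers are in the PR diff.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical node: each instance owns its own lock.
struct DemoNode {
  std::mutex nodeMutex;  // stand-in for the FreeRTOS nodeMutex (assumption)
  int brightness = 0;
  int framesRendered = 0;

  // Update path (for example a UI control change).
  void onUpdate(int newBrightness) {
    std::lock_guard<std::mutex> lock(nodeMutex);
    brightness = newBrightness;
  }

  // Render path: reads state under the same lock as onUpdate.
  int loop() {
    std::lock_guard<std::mutex> lock(nodeMutex);
    ++framesRendered;
    return brightness;
  }
};

// Locks one node at a time, so an update to one node never waits
// on the rendering of a different node.
inline int renderAll(std::vector<std::unique_ptr<DemoNode>>& nodes) {
  int sum = 0;
  for (auto& node : nodes) {
    assert(node != nullptr);
    sum += node->loop();
  }
  return sum;
}
```

Keeping the lock per node, rather than one global lock, limits contention to updates and rendering of the same node.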

Comment on lines 306 to +311
while (true) {
loop();
esp_task_wdt_reset();
loop();
vTaskDelay(1);
}

⚠️ Potential issue | 🟡 Minor

Code example doesn't match the description.

The code shows esp_task_wdt_reset() being called before loop(), but the AI summary states it was moved to occur "after loop() execution." If the intent is to reset the watchdog after the loop completes, the code should be:

📝 Suggested fix if watchdog reset should be after loop
   while (true) {
-    esp_task_wdt_reset();
     loop();
+    esp_task_wdt_reset();
     vTaskDelay(1);
   }

Otherwise, update the surrounding documentation to clarify the current behavior.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-while (true) {
-  esp_task_wdt_reset();
-  loop();
-  vTaskDelay(1);
-}
+while (true) {
+  loop();
+  esp_task_wdt_reset();
+  vTaskDelay(1);
+}
🤖 Prompt for AI Agents
In @docs/develop/architecture.md around lines 306 - 311, The code sample calls
esp_task_wdt_reset() before loop(), but the documentation text says the watchdog
reset occurs after loop() — fix by making code and prose consistent: either move
the esp_task_wdt_reset() call to after loop() (update the while(true) body
around loop(), esp_task_wdt_reset(), then vTaskDelay(1)) if you want the reset
to happen post-loop, or update the surrounding documentation to state that
esp_task_wdt_reset() is called before loop() and describe the implications;
refer to the loop(), esp_task_wdt_reset(), vTaskDelay(), and the while(true)
loop in the example when applying the change.

@ewowi ewowi merged commit dea3aad into main Jan 12, 2026
2 checks passed
@ewowi ewowi deleted the Node-update-mutex branch January 12, 2026 13:33
Collaborator Author

ewowi commented Jan 12, 2026

@coderabbitai, one more thing, can you check virtual ~Node() I understood C++ constructors are not inherited, so I declare it as normal function: constructor(). Should we do the same for ~Node() ?

And what about derived classes, e.g. ~TetrixEffect() override { freeMB(drops); };
Is ~Node() implicitly called or do we need to call it here ?


coderabbitai bot commented Jan 12, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read Node.h to see the current destructor implementation
cat src/MoonBase/Nodes.h | grep -A 15 "~Node"

Length of output: 580


🏁 Script executed:

#!/bin/bash
# Look at derived class examples to see destructor usage
rg -n "~.*Effect.*override" -A 3 -B 1

Length of output: 2138


🏁 Script executed:

#!/bin/bash
# Check if there are any other derived class destructors
rg -n "class.*:.*public Node" -A 20 | grep -E "class|~.*\(\)|virtual ~"

Length of output: 7927


@ewowi, great question! The destructor behavior is fundamentally different from constructors in C++. Here's the complete explanation:

1. Destructor Syntax: ~Node() is Correct ✅

No, you should NOT make it a normal function. Unlike your constructor() pattern, destructors must use the ~ClassName() syntax—it's the only way C++ recognizes them as destructors.

Your current code is correct:

virtual ~Node() {
  if (nodeMutex != nullptr) {
    vSemaphoreDelete(nodeMutex);
    nodeMutex = NULL;
  }
}

Why the difference from constructor()?

  • Constructors: You use constructor() as a normal function because you want explicit control over initialization timing (probably called from derived classes explicitly)
  • Destructors: C++ requires the ~ClassName() syntax to hook into the language's automatic cleanup mechanism

2. Base Destructor is ALWAYS Automatically Called ✅

Yes, ~Node() is implicitly called! You never need to call it manually, and you must not: an explicit call would run the base destructor a second time.

Destruction Order (Guaranteed by C++)

When a derived class object is destroyed:

~TetrixEffect() override { 
  freeMB(drops);     // 1️⃣ Derived destructor body runs first
}
// 2️⃣ Then the member variables declared in TetrixEffect are destroyed (reverse declaration order)
// 3️⃣ Then ~Node() runs automatically (you don't call it), followed by Node's own member variables

Conceptually, the compiler appends the remaining steps after your destructor body:

~TetrixEffect() override {
  freeMB(drops);           // Your code
  // (compiler) destroy TetrixEffect's member variables
  // (compiler) run Node::~Node(), then destroy Node's member variables
}
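
The C++ destruction order (derived destructor body, derived members, base destructor body, base members) can be checked with a small standalone program. This is a sketch with hypothetical types (`Tracer`, `BaseNode`, `DerivedEffect`); it does not use MoonLight's Node class.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which destructors run.
static std::vector<std::string> destructionLog;

// Member type that logs its own destruction.
struct Tracer {
  std::string name;
  explicit Tracer(std::string n) : name(std::move(n)) {}
  ~Tracer() { destructionLog.push_back(name); }
};

struct BaseNode {
  Tracer baseMember{"BaseNode member"};
  virtual ~BaseNode() { destructionLog.push_back("~BaseNode body"); }
};

struct DerivedEffect : BaseNode {
  Tracer derivedMember{"DerivedEffect member"};
  ~DerivedEffect() override { destructionLog.push_back("~DerivedEffect body"); }
};

// Destroys one DerivedEffect and returns the recorded order.
inline std::vector<std::string> runDestructionDemo() {
  destructionLog.clear();
  { DerivedEffect effect; }  // destroyed at the end of this scope
  assert(destructionLog.size() == 4);
  return destructionLog;
}
```

Calling `runDestructionDemo()` yields the log entries in the order: `~DerivedEffect body`, `DerivedEffect member`, `~BaseNode body`, `BaseNode member`. The base destructor is never called explicitly anywhere in the code.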

3. Why virtual is Critical 🎯

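Without virtual ~Node(), deleting a derived object through a Node* is undefined behavior; in practice the derived destructor is typically skipped, so its resources leak:

Node* node = new TetrixEffect();
delete node;  // ❌ Without virtual: undefined behavior, typically only ~Node() runs and drops leaks
              // ✅ With virtual: ~TetrixEffect() runs, then ~Node()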
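
A small sketch of polymorphic deletion with a virtual base destructor, using hypothetical types (`NodeBase`, `BufferEffect`) and a plain `new[]` buffer in place of MoonLight's allocator. It only demonstrates the well-defined (virtual) case, since the non-virtual case is undefined behavior and cannot be tested reliably.

```cpp
#include <cassert>
#include <memory>

static int liveBuffers = 0;  // counts outstanding allocations in this demo

struct NodeBase {
  virtual ~NodeBase() = default;  // virtual: deleting via NodeBase* is safe
};

struct BufferEffect : NodeBase {
  int* drops;  // resource owned by the derived class
  BufferEffect() : drops(new int[16]()) { ++liveBuffers; }
  BufferEffect(const BufferEffect&) = delete;
  BufferEffect& operator=(const BufferEffect&) = delete;
  ~BufferEffect() override {
    delete[] drops;
    --liveBuffers;
  }
};

// Creates a BufferEffect, destroys it through a base-class pointer,
// and returns the number of buffers still alive afterwards.
inline int destroyThroughBasePointer() {
  {
    std::unique_ptr<NodeBase> node = std::make_unique<BufferEffect>();
    assert(liveBuffers == 1);
  }  // unique_ptr<NodeBase> deletes through a NodeBase*
  return liveBuffers;
}
```

With the virtual destructor in place, `destroyThroughBasePointer()` returns 0 because `~BufferEffect()` runs before `~NodeBase()`.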

Your current code is perfect because:

  • virtual ~Node() ensures proper cleanup through base pointers
  • Derived classes use override keyword (good practice)
  • Each destructor cleans up its own resources
  • Base destructor automatically called after derived

Summary

| Aspect | Constructor Pattern | Destructor Pattern |
|---|---|---|
| Syntax | constructor() (custom) | ~Node() (mandatory C++ syntax) |
| Inheritance | Manual call from derived | Automatic chain (derived → base) |
| Virtual needed? | N/A | Yes (for polymorphic cleanup) |
| Your code | ✅ Correct custom pattern | ✅ Correct C++ destructors |

No changes needed! Your destructor implementation is correct and follows C++ best practices. 🎉


🐰 Mood
    /\___/\
   ( ^._.^ )  "Destructors are automatic!
    \  ~  /    No manual cleanup needed!"
   (__m__m)

🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR #84: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-08T11:39:52.862Z
Learning: In MoonLight WebSocket monitor: When browser tabs are hidden, browsers throttle JavaScript (including WebSocket onmessage handlers) but keep TCP connections open. Monitor data (48KB frames every 20-300ms) continues arriving at TCP level but isn't processed. This fills ESP32's httpd_ws_send_frame_async() buffers, causing ESP_FAIL. When tab becomes visible, hundreds of queued frames overwhelm the JavaScript main thread, freezing the UI. Solution: use Page Visibility API (client_visibility event) to stop emitting monitor data to hidden clients, preventing buffer overflow and backlog buildup.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.
