feat: diagnostic telemetry system (Plan 06) #84
Merged
basmeerman merged 6 commits into master on Mar 19, 2026
Add a built-in diagnostic telemetry system that captures 64-byte state snapshots into a RAM ring buffer (8 KB, 128 slots). Supports multiple capture profiles (GENERAL, SOLAR, LOADBAL, MODBUS, FAST) and exposes REST endpoints for status, start/stop, and binary download.

New modules:
- diag_telemetry.h/c: Pure C ring buffer with CRC32 serialization
- diag_sampler.h/cpp: Firmware bridge sampling globals every 1s
- test_diag_telemetry.c: 29 native tests with SbE annotations
- tools/diag_decode.py: Python script to decode .diag binary to JSON

REST API: /diag/status, /diag/start, /diag/stop, /diag/download

Verified: native tests (39 suites), sanitizers, cppcheck, ESP32 + CH32 builds.
RAM impact: +8.2 KB (23.7% total). Flash: 85.7%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
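For readers unfamiliar with the ring-buffer approach described in this commit, a minimal sketch follows. Only the 64-byte snapshot size and the 128-slot capacity come from the commit message; the type names, field layout, and function names are hypothetical and are not taken from diag_telemetry.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIAG_SLOTS 128u   /* 128 slots x 64 bytes = 8 KB, as stated in the commit */

/* Hypothetical snapshot layout: only the 64-byte total size is from the PR. */
typedef struct {
    uint32_t timestamp_s;
    uint8_t  payload[60];
} diag_snapshot_t;

static_assert(sizeof(diag_snapshot_t) == 64, "snapshot must stay 64 bytes");

typedef struct {
    diag_snapshot_t slots[DIAG_SLOTS];
    uint32_t head;    /* index of the next slot to write */
    uint32_t count;   /* number of valid snapshots, capped at DIAG_SLOTS */
} diag_ring_t;

static void diag_ring_init(diag_ring_t *r)
{
    memset(r, 0, sizeof *r);
}

/* Store one snapshot, overwriting the oldest once the ring is full. */
static void diag_ring_push(diag_ring_t *r, const diag_snapshot_t *s)
{
    r->slots[r->head] = *s;
    r->head = (r->head + 1u) % DIAG_SLOTS;
    if (r->count < DIAG_SLOTS) {
        r->count++;
    }
}

/* Copy out the i-th oldest snapshot (0 = oldest).
 * Returns 0 on success, -1 if i is out of range. */
static int diag_ring_get(const diag_ring_t *r, uint32_t i, diag_snapshot_t *out)
{
    if (i >= r->count) {
        return -1;
    }
    uint32_t start = (r->head + DIAG_SLOTS - r->count) % DIAG_SLOTS;
    *out = r->slots[(start + i) % DIAG_SLOTS];
    return 0;
}
```

Because the buffer has a fixed size and overwrites the oldest entry, memory use stays constant no matter how long capture runs, which is presumably why the RAM impact can be quoted as a single figure.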
…48)

Add diagnostic telemetry persistence and auto-trigger capabilities:
- diag_storage.h/cpp: LittleFS dump/load with 4-file retention policy, extended data capture (per-node LB state), auto-delete oldest on overflow
- Auto-dump triggers: error onset, unexpected C→A disconnect, solar timer max, meter timeout (fire automatically when capture is active)
- REST endpoints: POST /diag/dump, GET /diag/files, GET /diag/file/*, DELETE /diag/file/* for file management
- MQTT Set/DiagProfile command: accepts off/general/solar/loadbal/modbus/fast (string or numeric 0-5) to start/stop capture remotely
- 5 new MQTT parser tests for DiagProfile command validation

Verified: 39 test suites, sanitizers, cppcheck, ESP32 (85.9%) + CH32 builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
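The Set/DiagProfile payload handling described in this commit (a profile name or a number 0-5) could be sketched roughly as below. The enum, its numeric order (assumed to follow the order the names are listed in), the function name, and the case-sensitive matching are all assumptions for illustration, not the firmware's actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum: names and the 0-5 range come from the commit message;
 * the numeric order is assumed to follow the listed order. */
typedef enum {
    DIAG_PROFILE_OFF = 0,
    DIAG_PROFILE_GENERAL,
    DIAG_PROFILE_SOLAR,
    DIAG_PROFILE_LOADBAL,
    DIAG_PROFILE_MODBUS,
    DIAG_PROFILE_FAST
} diag_profile_t;

static_assert(DIAG_PROFILE_FAST == 5, "numeric payload range is 0-5");

/* Parse an MQTT payload into a profile.
 * Returns 0 and sets *out on success, -1 on invalid input. */
static int diag_parse_profile(const char *payload, diag_profile_t *out)
{
    static const char *const names[] = {
        "off", "general", "solar", "loadbal", "modbus", "fast"
    };

    if (payload == NULL || out == NULL) {
        return -1;
    }
    /* Numeric form: exactly one digit in the range 0..5. */
    if (payload[0] >= '0' && payload[0] <= '5' && payload[1] == '\0') {
        *out = (diag_profile_t)(payload[0] - '0');
        return 0;
    }
    /* String form: exact, case-sensitive match (an assumption). */
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(payload, names[i]) == 0) {
            *out = (diag_profile_t)i;
            return 0;
        }
    }
    return -1;
}
```

Rejecting anything that is not an exact match keeps a malformed remote command from silently starting a capture.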
Add real-time diagnostic streaming and Modbus communication monitoring:
- diag_modbus.h/c: Pure C ring buffer for Modbus frame timing events (32 entries x 8 bytes = 256 bytes RAM). Records sent/received/error events with timestamps from ModbusSend8, MBhandleData, MBhandleError.
- WebSocket /diag/stream endpoint in network_common.cpp: pushes 64-byte binary snapshots to connected clients on each sample tick, following the existing wsLcdConnections pattern.
- Modbus event ring auto-enables for MODBUS and FAST capture profiles.
- tools/diag_viewer.html: standalone browser diagnostic viewer with live WebSocket table, profile control, and .diag download button.
- 8 new native tests for Modbus event ring (test_diag_modbus.c).

Verified: 40 test suites, sanitizers, cppcheck, ESP32 (85.9%) + CH32 builds.
RAM impact: +264 bytes (23.8% total).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
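As an illustration of the Modbus event ring described in this commit, here is a minimal sketch. The 32-entry capacity, the 8-byte entry size, and the sent/received/error event kinds come from the commit message; the field layout and all identifiers are hypothetical and do not reflect diag_modbus.h.

```c
#include <assert.h>
#include <stdint.h>

#define MB_EVENT_SLOTS 32u   /* 32 entries x 8 bytes = 256 bytes */

typedef enum {
    MB_EVT_SENT = 0,
    MB_EVT_RECEIVED = 1,
    MB_EVT_ERROR = 2
} mb_event_type_t;

/* Hypothetical 8-byte entry: only the total size is from the PR. */
typedef struct {
    uint32_t timestamp_ms;
    uint8_t  type;       /* one of mb_event_type_t */
    uint8_t  address;    /* Modbus node address */
    uint8_t  function;   /* Modbus function code */
    uint8_t  detail;     /* e.g. an error code, 0 if unused */
} mb_event_t;

static_assert(sizeof(mb_event_t) == 8, "event entry must stay 8 bytes");

typedef struct {
    mb_event_t events[MB_EVENT_SLOTS];
    uint8_t head;      /* next slot to write */
    uint8_t count;     /* valid entries, capped at MB_EVENT_SLOTS */
    uint8_t enabled;   /* set when a MODBUS or FAST profile is active */
} mb_event_ring_t;

/* Record one event; does nothing while the ring is disabled. */
static void mb_ring_record(mb_event_ring_t *r, uint32_t now_ms,
                           mb_event_type_t type, uint8_t address,
                           uint8_t function, uint8_t detail)
{
    if (!r->enabled) {
        return;
    }
    mb_event_t *e = &r->events[r->head];
    e->timestamp_ms = now_ms;
    e->type = (uint8_t)type;
    e->address = address;
    e->function = function;
    e->detail = detail;
    r->head = (uint8_t)((r->head + 1u) % MB_EVENT_SLOTS);
    if (r->count < MB_EVENT_SLOTS) {
        r->count++;
    }
}

/* Milliseconds between the two most recent events, or 0 if fewer than two. */
static uint32_t mb_ring_last_gap_ms(const mb_event_ring_t *r)
{
    if (r->count < 2u) {
        return 0;
    }
    uint32_t last = (r->head + MB_EVENT_SLOTS - 1u) % MB_EVENT_SLOTS;
    uint32_t prev = (r->head + MB_EVENT_SLOTS - 2u) % MB_EVENT_SLOTS;
    return r->events[last].timestamp_ms - r->events[prev].timestamp_ms;
}
```

A call such as `mb_ring_record(&ring, now, MB_EVT_SENT, addr, fn, 0)` would sit at the send hook, with matching calls at the receive and error hooks; the gap helper shows the kind of request/response timing the ring makes available.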
Add a .diag file loader and replay test framework for the native test harness, enabling developers to replay captured diagnostic data:
- diag_loader.h/c: Pure C loader for .diag binary files. Supports loading from file, in-memory buffer, and synthetic generation. Validates magic, version, CRC32, and handles truncated files.
- test_diag_replay.c: 9 native tests covering loader functionality, field mapping to evse_ctx_t, state transition tracking, solar oscillation detection, and full serialize/load round-trip.

The replay framework operates in advisory mode: snapshots are mapped to evse_ctx_t fields for analysis rather than strict assertion, since the 1 Hz sample rate cannot capture all sub-second transients.

Verified: 41 test suites, sanitizers, ESP32 + CH32 builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
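The validation order the loader commit describes (magic, version, CRC32, truncation) can be sketched as follows. The actual .diag layout is not given in this PR, so the format here is purely hypothetical: an 8-byte header ("DIAG", version byte, reserved byte, little-endian 16-bit snapshot count), then 64-byte snapshots, then a little-endian CRC-32 over everything before it. The CRC routine is the standard reflected CRC-32 (polynomial 0xEDB88320); whether the firmware uses that exact variant is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIAG_SNAPSHOT_SIZE 64u
#define DIAG_HEADER_SIZE   8u   /* hypothetical header size */
#define DIAG_VERSION       1u   /* hypothetical format version */

typedef enum {
    DIAG_OK = 0,
    DIAG_ERR_TRUNCATED,
    DIAG_ERR_MAGIC,
    DIAG_ERR_VERSION,
    DIAG_ERR_CRC
} diag_load_result_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF). */
static uint32_t diag_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Validate a .diag image held in memory (hypothetical layout, see above).
 * On success, stores the snapshot count in *count_out. */
static diag_load_result_t diag_validate(const uint8_t *buf, size_t len,
                                        uint16_t *count_out)
{
    assert(buf != NULL && count_out != NULL);

    if (len < DIAG_HEADER_SIZE + 4u) {
        return DIAG_ERR_TRUNCATED;
    }
    if (memcmp(buf, "DIAG", 4) != 0) {
        return DIAG_ERR_MAGIC;
    }
    if (buf[4] != DIAG_VERSION) {
        return DIAG_ERR_VERSION;
    }
    uint16_t count = (uint16_t)(buf[6] | (buf[7] << 8));
    size_t body = DIAG_HEADER_SIZE + (size_t)count * DIAG_SNAPSHOT_SIZE;
    if (len < body + 4u) {
        return DIAG_ERR_TRUNCATED;   /* header promises more data than present */
    }
    uint32_t stored = (uint32_t)buf[body]
                    | ((uint32_t)buf[body + 1] << 8)
                    | ((uint32_t)buf[body + 2] << 16)
                    | ((uint32_t)buf[body + 3] << 24);
    if (diag_crc32(buf, body) != stored) {
        return DIAG_ERR_CRC;
    }
    *count_out = count;
    return DIAG_OK;
}
```

Checking the length before reading the trailer means a file cut off mid-download is reported as truncated instead of producing an out-of-bounds read or a misleading CRC error.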
CI enforces -Wstack-usage=1024 (GCC). diag_capture_t contains a 256-entry snapshot array (16 KB) which exceeded the limit when declared as a local variable. Move to file-scope static instance shared across all test functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
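The fix pattern in this commit can be illustrated with a small sketch. `capture_t` is a hypothetical stand-in for diag_capture_t (only the 256 x 64-byte array size comes from the commit message), and the helper and test names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAPTURE_SLOTS 256u

/* Hypothetical stand-in for diag_capture_t: 256 snapshots x 64 bytes = 16 KB. */
typedef struct {
    uint8_t  snapshots[CAPTURE_SLOTS][64];
    uint32_t count;
} capture_t;

static_assert(sizeof(capture_t) > 1024, "too large for a 1024-byte stack frame");

/* One shared instance in static storage: it uses no stack in any test. */
static capture_t g_capture;

/* Each test resets the shared instance first so tests stay independent. */
static capture_t *fresh_capture(void)
{
    memset(&g_capture, 0, sizeof g_capture);
    return &g_capture;
}

/* Example test body: appends one snapshot and returns the resulting count. */
static uint32_t test_append_one(void)
{
    capture_t *c = fresh_capture();   /* instead of a 16 KB local: capture_t c; */
    memset(c->snapshots[c->count], 0xAB, sizeof c->snapshots[0]);
    c->count++;
    return c->count;
}
```

The trade-off is that tests now share state, so the explicit reset at the start of each test matters; without it, results would depend on execution order.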
Two more test functions (test_push_and_read_single, test_frozen_allows_read) exceeded the 1024-byte stack limit with buf[8] + out[8] arrays. Use file-scope static buffers instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
basmeerman added a commit that referenced this pull request on Mar 19, 2026
- Add Plan 06 (Diagnostic Telemetry) features section with PR #84
- Add Plan 07 (Web UI Modernization) features with PR #85
- Add Plan 09 (Power Input Methods) features: API staleness detection, HomeWizard P1 energy data, manual IP fallback, metering diagnostics
- Add per-phase energy MQTT and metering diagnostic counters to MQTT section
- Update Testing & Quality metrics (43 suites, 870+ scenarios)
- Update Roadmap: all 9 plans marked Done with PR links
- Add PR references throughout for traceability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- diag_telemetry.h/c: Pure C circular buffer with 64-byte packed snapshots (128 slots = 8 KB RAM), CRC32 serialization, configurable capture profiles (OFF/GENERAL/SOLAR/LOADBAL/MODBUS/FAST)
- diag_sampler.h/cpp: Bridges 50+ firmware globals into snapshots every 1s, with REST endpoints for status/start/stop/download
- diag_storage.h/cpp: On-demand dump to flash with 4-file retention, auto-dump on error transitions (error onset, unexpected C→A, solar timer max, meter timeout)
- MQTT Set/DiagProfile command to start/stop capture remotely
- WebSocket /diag/stream: Real-time binary snapshot push to browser clients
- diag_modbus.h/c: 32-entry timing ring (256 bytes) instrumented in ModbusSend8/MBhandleData/MBhandleError
- diag_loader.h/c: Load .diag files in native tests, advisory replay with field mapping to evse_ctx_t
- Tools: diag_decode.py (binary→JSON), diag_viewer.html (live browser dashboard)

Verification
Issues closed
Closes #47, closes #48, closes #49, closes #50
Test plan
- cd SmartEVSE-3/test/native && make clean test
- make clean test CFLAGS_EXTRA="-fsanitize=address,undefined -fno-omit-frame-pointer"
- pio run -e release -d SmartEVSE-3/
- pio run -e ch32 -d SmartEVSE-3/
- REST endpoints (/diag/status, /diag/start?profile=general, /diag/download)
- MQTT Set/DiagProfile command
- diag_viewer.html
- diag_decode.py decodes downloaded .diag files

🤖 Generated with Claude Code