
feat: diagnostic telemetry system (Plan 06)#84

Merged
basmeerman merged 6 commits into master from work/plan-06 on Mar 19, 2026

basmeerman (Owner) commented on Mar 19, 2026

Summary

  • Core ring buffer (diag_telemetry.h/c): Pure C circular buffer with 64-byte packed snapshots (128 slots = 8 KB RAM), CRC32 serialization, configurable capture profiles (OFF/GENERAL/SOLAR/LOADBAL/MODBUS/FAST)
  • Firmware sampler (diag_sampler.h/cpp): Bridges 50+ firmware globals into snapshots every 1s, with REST endpoints for status/start/stop/download
  • LittleFS persistence (diag_storage.h/cpp): On-demand dump to flash with 4-file retention, auto-dump on error transitions (error onset, unexpected C→A, solar timer max, meter timeout)
  • MQTT control: Set/DiagProfile command to start/stop capture remotely
  • WebSocket stream (/diag/stream): Real-time binary snapshot push to browser clients
  • Modbus event ring (diag_modbus.h/c): 32-entry timing ring (256 bytes) instrumented in ModbusSend8/MBhandleData/MBhandleError
  • Test replay framework (diag_loader.h/c): Load .diag files in native tests, advisory replay with field mapping to evse_ctx_t
  • Tools: diag_decode.py (binary→JSON), diag_viewer.html (live browser dashboard)
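The core idea behind the RAM ring is a fixed array of 64-byte snapshots that overwrites the oldest slot once full. The sketch below is hypothetical: the field names in `diag_snapshot_t`, their units, and the helpers `ring_init`/`ring_push`/`ring_get` are illustrative assumptions, not the contents of diag_telemetry.h. It only shows how 128 packed 64-byte slots give the 8 KB footprint quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte snapshot; the real field set is defined in diag_telemetry.h. */
typedef struct __attribute__((packed)) { /* GCC/Clang packing attribute */
    uint32_t timestamp;        /* sample time in seconds (assumed unit) */
    uint8_t  state;            /* EVSE state code (assumed encoding) */
    uint8_t  error_flags;      /* error bitmask (assumed) */
    int16_t  mains_current[3]; /* per-phase mains current (assumed deci-amps) */
    uint8_t  reserved[52];     /* remaining fields, padded to 64 bytes */
} diag_snapshot_t;

_Static_assert(sizeof(diag_snapshot_t) == 64, "snapshot must stay 64 bytes");

#define DIAG_RING_SLOTS 128u /* 128 x 64 B = 8192 B = 8 KB */

typedef struct {
    diag_snapshot_t slots[DIAG_RING_SLOTS];
    uint32_t head; /* total pushes so far (monotonic counter) */
} diag_ring_t;

static void ring_init(diag_ring_t *r) { memset(r, 0, sizeof *r); }

/* Writes the next slot, overwriting the oldest one once the ring is full. */
static void ring_push(diag_ring_t *r, const diag_snapshot_t *s) {
    assert(r != NULL && s != NULL);
    r->slots[r->head % DIAG_RING_SLOTS] = *s;
    r->head++;
}

/* Number of snapshots currently retained (at most DIAG_RING_SLOTS). */
static uint32_t ring_count(const diag_ring_t *r) {
    return r->head < DIAG_RING_SLOTS ? r->head : DIAG_RING_SLOTS;
}

/* Index 0 is the oldest retained snapshot; returns NULL when out of range. */
static const diag_snapshot_t *ring_get(const diag_ring_t *r, uint32_t i) {
    uint32_t n = ring_count(r);
    if (i >= n) return NULL;
    return &r->slots[(r->head - n + i) % DIAG_RING_SLOTS];
}
```

Because 128 is a power of two, the modulo indexing stays consistent even when the 32-bit `head` counter eventually wraps.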

Verification

  • 41 native test suites, all passing (51 new tests)
  • Address + UB sanitizers clean
  • cppcheck clean
  • ESP32: RAM 23.8%, Flash 85.9%
  • CH32: RAM 20.5%, Flash 63.0%

Issues closed

Closes #47, closes #48, closes #49, closes #50

Test plan

  • Verify native tests pass: cd SmartEVSE-3/test/native && make clean test
  • Verify sanitizers: make clean test CFLAGS_EXTRA="-fsanitize=address,undefined -fno-omit-frame-pointer"
  • Verify ESP32 build: pio run -e release -d SmartEVSE-3/
  • Verify CH32 build: pio run -e ch32 -d SmartEVSE-3/
  • Flash ESP32 and test REST endpoints (/diag/status, /diag/start?profile=general, /diag/download)
  • Test MQTT Set/DiagProfile command
  • Test WebSocket stream with diag_viewer.html
  • Test auto-dump triggers (simulate error transition)
  • Verify diag_decode.py decodes downloaded .diag files

🤖 Generated with Claude Code

basmeerman and others added 6 commits March 19, 2026 10:12
Add a built-in diagnostic telemetry system that captures 64-byte state
snapshots into a RAM ring buffer (8 KB, 128 slots). Supports multiple
capture profiles (GENERAL, SOLAR, LOADBAL, MODBUS, FAST) and exposes
REST endpoints for status, start/stop, and binary download.

New modules:
- diag_telemetry.h/c: Pure C ring buffer with CRC32 serialization
- diag_sampler.h/cpp: Firmware bridge sampling globals every 1s
- test_diag_telemetry.c: 29 native tests with SbE annotations
- tools/diag_decode.py: Python script to decode .diag binary to JSON
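The commit states that serialization is protected by CRC32 but does not say which CRC-32 variant is used. As an assumption, the sketch below uses the common IEEE 802.3 / zlib form (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF), which is also what Python's `zlib.crc32` computes, so a decoder like diag_decode.py could cross-check it easily. `append_crc32_le` and its little-endian trailer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, IEEE 802.3 / zlib variant (assumed, not confirmed by the PR). */
static uint32_t crc32_ieee(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical trailer: store the CRC of buf[0..payload_len) little-endian
   right after the payload. buf must have room for payload_len + 4 bytes. */
static size_t append_crc32_le(uint8_t *buf, size_t payload_len) {
    assert(buf != NULL);
    uint32_t crc = crc32_ieee(buf, payload_len);
    for (int i = 0; i < 4; i++)
        buf[payload_len + (size_t)i] = (uint8_t)(crc >> (8 * i));
    return payload_len + 4;
}
```

A bitwise loop trades speed for zero table RAM; at one 8 KB dump per request, that is usually an acceptable choice on a microcontroller.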

REST API: /diag/status, /diag/start, /diag/stop, /diag/download

Verified: native tests (39 suites), sanitizers, cppcheck, ESP32 + CH32 builds.
RAM impact: +8.2 KB (23.7% total). Flash: 85.7%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…48)

Add diagnostic telemetry persistence and auto-trigger capabilities:

- diag_storage.h/cpp: LittleFS dump/load with 4-file retention policy,
  extended data capture (per-node LB state), auto-delete oldest on overflow
- Auto-dump triggers: error onset, unexpected C→A disconnect, solar timer
  max, meter timeout — fires automatically when capture is active
- REST endpoints: POST /diag/dump, GET /diag/files, GET /diag/file/*,
  DELETE /diag/file/* for file management
- MQTT Set/DiagProfile command: accepts off/general/solar/loadbal/modbus/fast
  (string or numeric 0-5) to start/stop capture remotely
- 5 new MQTT parser tests for DiagProfile command validation
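The Set/DiagProfile payload accepts either a profile name or a number from 0 to 5. A minimal parser sketch follows. It assumes the numbers map to the names in the order listed (0 = off ... 5 = fast) and that names are matched case-sensitively; `parse_diag_profile` and the `DIAG_*` enum are hypothetical names, not the firmware's.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum, in the order the profiles are listed in the commit. */
typedef enum { DIAG_OFF = 0, DIAG_GENERAL, DIAG_SOLAR, DIAG_LOADBAL, DIAG_MODBUS, DIAG_FAST } diag_profile_t;

/* Returns the profile (0..5) for a valid payload, or -1 if it is rejected. */
static int parse_diag_profile(const char *payload) {
    static const char *const names[] = { "off", "general", "solar", "loadbal", "modbus", "fast" };
    if (payload == NULL || payload[0] == '\0') return -1;
    for (int i = 0; i < (int)(sizeof names / sizeof names[0]); i++)
        if (strcmp(payload, names[i]) == 0) return i;
    /* Numeric form: exactly one digit in the range 0..5. */
    if (payload[0] >= '0' && payload[0] <= '5' && payload[1] == '\0')
        return payload[0] - '0';
    return -1;
}
```

Rejecting anything other than a single digit keeps inputs like "12" or "5x" from being silently truncated to a valid profile.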

Verified: 39 test suites, sanitizers, cppcheck, ESP32 (flash 85.9%) + CH32 builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add real-time diagnostic streaming and Modbus communication monitoring:

- diag_modbus.h/c: Pure C ring buffer for Modbus frame timing events
  (32 entries x 8 bytes = 256 bytes RAM). Records sent/received/error
  events with timestamps from ModbusSend8, MBhandleData, MBhandleError.
- WebSocket /diag/stream endpoint in network_common.cpp: pushes 64-byte
  binary snapshots to connected clients on each sample tick, following
  the existing wsLcdConnections pattern.
- Modbus event ring auto-enables for MODBUS and FAST capture profiles.
- tools/diag_viewer.html: standalone browser diagnostic viewer with
  live WebSocket table, profile control, and .diag download button.
- 8 new native tests for Modbus event ring (test_diag_modbus.c).
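One way a 32 x 8-byte event ring can be used to measure frame timing is sketched below: record sent/received/error events with a millisecond timestamp, then walk back from the newest entry to pair a response with its request. The event layout (`mb_event_t`), the helpers, and the pairing rule are hypothetical; the commit only specifies the entry count, entry size, and event kinds.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MB_EV_SENT = 0, MB_EV_RECEIVED = 1, MB_EV_ERROR = 2 } mb_event_type_t;

/* Hypothetical 8-byte event layout. */
typedef struct {
    uint32_t timestamp_ms; /* e.g. a millisecond tick at the event (assumed) */
    uint8_t  type;         /* mb_event_type_t */
    uint8_t  address;      /* Modbus slave address */
    uint8_t  function;     /* Modbus function code */
    uint8_t  reserved;
} mb_event_t;

#define MB_RING_SLOTS 32u
_Static_assert(sizeof(mb_event_t) == 8, "event must stay 8 bytes");
_Static_assert(MB_RING_SLOTS * sizeof(mb_event_t) == 256, "ring must stay 256 bytes");

typedef struct {
    mb_event_t ev[MB_RING_SLOTS];
    uint32_t   head; /* monotonic write counter */
} mb_ring_t;

static void mb_ring_record(mb_ring_t *r, uint32_t now_ms, mb_event_type_t type,
                           uint8_t address, uint8_t function) {
    mb_event_t *e = &r->ev[r->head % MB_RING_SLOTS];
    e->timestamp_ms = now_ms;
    e->type = (uint8_t)type;
    e->address = address;
    e->function = function;
    e->reserved = 0;
    r->head++;
}

/* Request-to-response latency for the newest RECEIVED event from `address`,
   or -1 if that event is not directly preceded (for this address) by a SENT. */
static int32_t mb_last_response_ms(const mb_ring_t *r, uint8_t address) {
    uint32_t n = r->head < MB_RING_SLOTS ? r->head : MB_RING_SLOTS;
    uint32_t rx_time = 0;
    int have_rx = 0;
    for (uint32_t k = 1; k <= n; k++) { /* newest to oldest */
        const mb_event_t *e = &r->ev[(r->head - k) % MB_RING_SLOTS];
        if (e->address != address) continue;
        if (!have_rx) {
            if (e->type == MB_EV_RECEIVED) { rx_time = e->timestamp_ms; have_rx = 1; }
        } else {
            return e->type == MB_EV_SENT ? (int32_t)(rx_time - e->timestamp_ms) : -1;
        }
    }
    return -1;
}
```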

Verified: 40 test suites, sanitizers, cppcheck, ESP32 (flash 85.9%) + CH32 builds.
RAM impact: +264 bytes (23.8% total).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a .diag file loader and replay test framework for the native test
harness, enabling developers to replay captured diagnostic data:

- diag_loader.h/c: Pure C loader for .diag binary files. Supports
  loading from file, in-memory buffer, and synthetic generation.
  Validates magic, version, CRC32, and handles truncated files.
- test_diag_replay.c: 9 native tests covering loader functionality,
  field mapping to evse_ctx_t, state transition tracking, solar
  oscillation detection, and full serialize/load round-trip.
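The loader's validation order (magic, version, CRC32, truncation) can be sketched against an invented file layout. Everything about the layout below (the "DIAG" magic, the 8-byte header, the little-endian fields, the CRC trailer) is a hypothetical stand-in for illustration, as are `diag_validate` and its result codes; the real .diag format and loader API may differ, including in how partially truncated files are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout (not the real .diag format):
     [0..3] magic "DIAG" | [4..5] version LE | [6..7] snapshot count LE
     [8..]  count x 64-byte snapshots | trailing CRC-32 LE over all prior bytes */
#define DIAG_HDR_LEN  8u
#define DIAG_SNAP_LEN 64u
#define DIAG_VERSION  1u

typedef enum { LOAD_OK, LOAD_TRUNCATED, LOAD_BAD_MAGIC, LOAD_BAD_VERSION, LOAD_BAD_CRC } load_result_t;

/* Bitwise CRC-32 (IEEE / zlib variant, assumed). */
static uint32_t crc32_ieee(const uint8_t *p, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Reads an nbytes-wide little-endian unsigned integer. */
static uint32_t rd_le(const uint8_t *p, int nbytes) {
    uint32_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Header checks first, CRC last; *count_out is written only on LOAD_OK. */
static load_result_t diag_validate(const uint8_t *buf, size_t len, uint16_t *count_out) {
    if (len < DIAG_HDR_LEN) return LOAD_TRUNCATED;
    if (memcmp(buf, "DIAG", 4) != 0) return LOAD_BAD_MAGIC;
    if (rd_le(buf + 4, 2) != DIAG_VERSION) return LOAD_BAD_VERSION;
    uint16_t count = (uint16_t)rd_le(buf + 6, 2);
    size_t body = DIAG_HDR_LEN + (size_t)count * DIAG_SNAP_LEN;
    if (len < body + 4) return LOAD_TRUNCATED;
    if (crc32_ieee(buf, body) != rd_le(buf + body, 4)) return LOAD_BAD_CRC;
    *count_out = count;
    return LOAD_OK;
}
```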

The replay framework operates in advisory mode — snapshots are mapped
to evse_ctx_t fields for analysis rather than strict assertion, since
the 1 Hz sample rate cannot capture all sub-second transients.
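In advisory mode, a replay pass summarises what the samples show instead of asserting on it. The sketch below counts state changes and direct C→A jumps in a sequence of sampled states; the state codes, `replay_report_t`, and `replay_states` are hypothetical. A C→A jump in 1 Hz data may hide a short B phase between two samples, which is exactly why such a finding is reported rather than asserted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state codes; the firmware uses its own constants. */
enum { ST_A = 0, ST_B = 1, ST_C = 2 };

typedef struct {
    unsigned transitions; /* number of state changes between consecutive samples */
    unsigned c_to_a;      /* direct C -> A changes (B may have been missed at 1 Hz) */
} replay_report_t;

/* Advisory pass over sampled states: report findings, never fail. */
static replay_report_t replay_states(const uint8_t *states, size_t n) {
    replay_report_t rep = {0, 0};
    for (size_t i = 1; i < n; i++) {
        if (states[i] == states[i - 1]) continue;
        rep.transitions++;
        if (states[i - 1] == ST_C && states[i] == ST_A) rep.c_to_a++;
    }
    return rep;
}
```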

Verified: 41 test suites, sanitizers, ESP32 + CH32 builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI enforces -Wstack-usage=1024 (GCC). diag_capture_t contains a
256-entry snapshot array (16 KB), which exceeded the limit when
declared as a local variable. Move it to a file-scope static instance
shared across all test functions.
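The fix pattern looks roughly like this: the large object moves out of each test's stack frame into one file-scope static, and every test resets it before use, because it is now shared state. `capture_t`, `g_cap`, `test_setup`, and `test_capture_roundtrip` are hypothetical stand-ins for diag_capture_t and the real tests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for diag_capture_t: 256 x 64-byte snapshots = 16 KB (layout hypothetical). */
typedef struct {
    uint8_t  snapshots[256][64];
    uint32_t count;
} capture_t;

/* Before: `capture_t cap;` inside each test put 16 KB on the stack and tripped
   -Wstack-usage=1024. After: one zero-initialised instance in static storage. */
static capture_t g_cap;

/* Shared state must be reset so tests stay independent of execution order. */
static void test_setup(void) {
    memset(&g_cap, 0, sizeof g_cap);
}

static int test_capture_roundtrip(void) {
    test_setup();
    g_cap.snapshots[0][0] = 0xAB;
    g_cap.count = 1;
    return g_cap.count == 1 && g_cap.snapshots[0][0] == 0xAB;
}
```

The trade-off is that the tests are no longer re-entrant, which is harmless for a single-threaded native test runner.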

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two more test functions (test_push_and_read_single,
test_frozen_allows_read) exceeded the 1024-byte stack limit with
buf[8] + out[8] arrays. Use file-scope static buffers instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
basmeerman merged commit 06dc654 into master on Mar 19, 2026 (11 checks passed)
basmeerman added a commit that referenced this pull request on Mar 19, 2026
- Add Plan 06 (Diagnostic Telemetry) features section with PR #84
- Add Plan 07 (Web UI Modernization) features with PR #85
- Add Plan 09 (Power Input Methods) features: API staleness detection,
  HomeWizard P1 energy data, manual IP fallback, metering diagnostics
- Add per-phase energy MQTT and metering diagnostic counters to MQTT section
- Update Testing & Quality metrics (43 suites, 870+ scenarios)
- Update Roadmap: all 9 plans marked Done with PR links
- Add PR references throughout for traceability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>