-
Notifications
You must be signed in to change notification settings - Fork 5
Safe <-> Normal Mode Auto Switch #146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements autonomous safe mode management for the ModeManager component, adding automatic safe mode entry/exit based on battery voltage monitoring and unintended reboot detection. The changes enable the satellite to autonomously protect itself from low-power conditions while allowing recovery when voltage stabilizes.
Key Changes
- Voltage-based mode transitions: Automatic safe mode entry when voltage drops below 6.7V for 10 seconds (LOW_BATTERY reason), and automatic recovery when voltage exceeds 8.0V for 10 seconds (LOW_BATTERY only)
- Reboot detection: Clean shutdown flag mechanism to detect unintended reboots (watchdog resets, crashes, power glitches) and enter safe mode with SYSTEM_FAULT reason
- SafeModeReason tracking: New enum (NONE, LOW_BATTERY, SYSTEM_FAULT, GROUND_COMMAND, EXTERNAL_REQUEST) with telemetry and recovery logic based on reason
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Added documentation for running single integration test files using TEST parameter |
| Makefile | Enhanced test-integration target to support running individual test files |
| FprimeZephyrReference/test/int/safe_mode_test.py | New integration tests (9 total: 5 automated, 4 manual) covering safe mode reason tracking, auto-entry, auto-recovery, and reboot detection |
| FprimeZephyrReference/test/int/payload_mode_test.py | New integration tests (7 total) covering payload mode entry/exit, validation, voltage monitoring, and automatic exit |
| FprimeZephyrReference/test/int/mode_manager_test.py | Updated existing tests for new enum values (SAFE=1, NORMAL=2, PAYLOAD=3) and idempotent behavior |
| FprimeZephyrReference/ReferenceDeployment/Top/topology.fpp | Added port connection from ResetManager.prepareForReboot to ModeManager for clean shutdown flag |
| FprimeZephyrReference/ReferenceDeployment/Top/ReferenceDeploymentPackets.fppi | Added PayloadModeEntryCount and CurrentSafeModeReason telemetry to health packet |
| FprimeZephyrReference/Components/ResetManager/ResetManager.fpp | Added prepareForReboot output port definition |
| FprimeZephyrReference/Components/ResetManager/ResetManager.cpp | Calls prepareForReboot port before cold/warm resets to set clean shutdown flag |
| FprimeZephyrReference/Components/ModeManager/docs/sdd.md | Comprehensive documentation updates for voltage thresholds, auto-transitions, reboot detection, requirements, and integration tests |
| FprimeZephyrReference/Components/ModeManager/ModeManager.hpp | Added SafeModeReason tracking, voltage counters, prepareForReboot handler, clean shutdown flag, voltage threshold constants, and updated PersistentState structure |
| FprimeZephyrReference/Components/ModeManager/ModeManager.fpp | Added SafeModeReason enum, prepareForReboot port, payload mode commands, new events (AutoSafeModeEntry, AutoSafeModeExit, UnintendedRebootDetected, PreparingForReboot, ExternalFaultIgnoredInPayloadMode), and CurrentSafeModeReason telemetry |
| FprimeZephyrReference/Components/ModeManager/ModeManager.cpp | Implemented 1Hz voltage monitoring in run_handler, safe mode reason tracking, auto-entry/exit logic with 10-second debouncing, prepareForReboot handler, unintended reboot detection in loadState, and updated state persistence with new fields |
| assert "Ground command" in entering_events[-1].get_display_text(), ( | ||
| "EnteringSafeMode reason should mention Ground command" | ||
| ) |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion checks if "Ground command" appears in the event display text, but this relies on string matching against a formatted event message. If the event formatting changes (e.g., "Ground Command" with capital C, or "ground cmd"), this test will fail even though the functionality is correct. Consider checking the event type/name instead of the display text, or use a more robust pattern match.
| assert "Ground command" in entering_events[-1].get_display_text(), ( | |
| "EnteringSafeMode reason should mention Ground command" | |
| ) | |
| # Check the event's arguments for the correct reason code (GROUND_COMMAND = 3) | |
| event_args = entering_events[-1].get_args() | |
| assert ( | |
| event_args and ( | |
| event_args[0] == SAFE_MODE_REASON_GROUND_COMMAND | |
| or (isinstance(event_args[0], str) and "GROUND_COMMAND" in event_args[0].upper()) | |
| ) | |
| ), f"EnteringSafeMode event reason should be GROUND_COMMAND (3), got {event_args[0]!r}" |
| voltage: F32 @< Voltage that triggered the entry (0 if N/A) | ||
| ) \ | ||
| severity warning high \ | ||
| format "AUTO SAFE MODE ENTRY: reason={} voltage={}V" |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AutoSafeModeEntry event takes two parameters (reason and voltage) as shown in line 149-150, but the event emission in ModeManager.cpp line 98 only passes the reason and voltage correctly. However, the event format string on line 153 shows "reason={} voltage={}V" which will display the enum numeric value (e.g., "1") rather than the enum name ("LOW_BATTERY"). Consider using a formatted string representation or documenting that operators will see numeric enum values.
| format "AUTO SAFE MODE ENTRY: reason={} voltage={}V" | |
| format "AUTO SAFE MODE ENTRY: reason={!r} voltage={}V" |
| 10 # Consecutive seconds below threshold before auto-exit | ||
| ) | ||
|
|
||
|
|
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to safe_mode_test.py, the constants LOW_VOLTAGE_THRESHOLD and LOW_VOLTAGE_DEBOUNCE_SECONDS are duplicated and must match ModeManager.hpp (lines 172-173). This creates a maintenance burden where changes to thresholds require updating multiple files. Consider a shared constant file or cross-validation test.
| def _parse_mode_manager_constants(header_path): | |
| """ | |
| Parse ModeManager.hpp for LOW_VOLTAGE_THRESHOLD and LOW_VOLTAGE_DEBOUNCE_SECONDS. | |
| Returns a dict with the values as floats. | |
| """ | |
| import re | |
| constants = {} | |
| with open(header_path, "r") as f: | |
| for line in f: | |
| m1 = re.match(r'.*LOW_VOLTAGE_THRESHOLD\s*=\s*([0-9.]+)', line) | |
| m2 = re.match(r'.*LOW_VOLTAGE_DEBOUNCE_SECONDS\s*=\s*([0-9.]+)', line) | |
| if m1: | |
| constants["LOW_VOLTAGE_THRESHOLD"] = float(m1.group(1)) | |
| if m2: | |
| constants["LOW_VOLTAGE_DEBOUNCE_SECONDS"] = float(m2.group(1)) | |
| return constants | |
| def test_voltage_constants_match_header(): | |
| """ | |
| Ensure test constants match ModeManager.hpp (single source of truth). | |
| """ | |
| import os | |
| # Try to find ModeManager.hpp relative to this test file | |
| here = os.path.dirname(os.path.abspath(__file__)) | |
| # Assumes standard project structure | |
| header_path = os.path.abspath( | |
| os.path.join(here, "../../Components/ModeManager/ModeManager.hpp") | |
| ) | |
| assert os.path.exists(header_path), f"ModeManager.hpp not found at {header_path}" | |
| cpp_consts = _parse_mode_manager_constants(header_path) | |
| assert abs(cpp_consts["LOW_VOLTAGE_THRESHOLD"] - LOW_VOLTAGE_THRESHOLD) < 1e-6, ( | |
| f"LOW_VOLTAGE_THRESHOLD mismatch: " | |
| f"Python={LOW_VOLTAGE_THRESHOLD}, C++={cpp_consts['LOW_VOLTAGE_THRESHOLD']}" | |
| ) | |
| assert abs(cpp_consts["LOW_VOLTAGE_DEBOUNCE_SECONDS"] - LOW_VOLTAGE_DEBOUNCE_SECONDS) < 1e-6, ( | |
| f"LOW_VOLTAGE_DEBOUNCE_SECONDS mismatch: " | |
| f"Python={LOW_VOLTAGE_DEBOUNCE_SECONDS}, C++={cpp_consts['LOW_VOLTAGE_DEBOUNCE_SECONDS']}" | |
| ) |
| this->m_mode = static_cast<SystemMode>(state.mode); | ||
| this->m_safeModeEntryCount = state.safeModeEntryCount; | ||
| this->m_payloadModeEntryCount = state.payloadModeEntryCount; | ||
| this->m_safeModeReason = static_cast<Components::SafeModeReason::T>(state.safeModeReason); |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safe mode reason is cast directly from U8 to SafeModeReason without validation. If the persisted state contains an invalid reason value (e.g., 5 or higher), this will be silently accepted and could lead to undefined behavior. Add validation to ensure the reason is within the valid enum range (0-4) before casting, similar to how the mode value is validated on lines 308-309.
| } | ||
|
|
||
| // Clear clean shutdown flag for next boot detection | ||
| // This ensures that if the system crashes before the next intentional reboot, | ||
| // we'll detect it as an unintended reboot | ||
| this->saveState(); |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After detecting an unintended reboot, enterSafeMode() is called which will invoke saveState() (line 468). Then on line 388, saveState() is called again. This results in writing the state file twice unnecessarily. The second call is intended to clear the clean shutdown flag, but since enterSafeMode() already saves state with cleanShutdown=0 (line 407), the second call is redundant and wastes flash write cycles.
| } | |
| // Clear clean shutdown flag for next boot detection | |
| // This ensures that if the system crashes before the next intentional reboot, | |
| // we'll detect it as an unintended reboot | |
| this->saveState(); | |
| } else { | |
| // Clear clean shutdown flag for next boot detection | |
| // This ensures that if the system crashes before the next intentional reboot, | |
| // we'll detect it as an unintended reboot | |
| this->saveState(); | |
| } |
| } else if (this->m_mode == SystemMode::PAYLOAD_MODE) { | ||
| // Log that external fault was detected but we can't act on it | ||
| // System must go PAYLOAD_MODE -> NORMAL -> SAFE_MODE sequentially | ||
| this->log_WARNING_LO_ExternalFaultIgnoredInPayloadMode(); | ||
| } | ||
| // Note: If already in SAFE_MODE, silently ignore (already in safest state) |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 148 states "Only allowed from NORMAL (sequential +1/-1 transitions)" but the conditional logic checks for both NORMAL (line 145) and PAYLOAD_MODE (line 148). The logic actually implements a different policy: "ignore if already in safe mode, reject if in payload mode, enter if in normal". Consider updating the comment to accurately describe the actual behavior: "Only acts when in NORMAL mode; ignores when already in SAFE_MODE (idempotent); logs warning when in PAYLOAD_MODE".
| // Reset low voltage counter to ensure consistent state (operator may be | ||
| // re-sending command intentionally to reset any accumulated fault count) | ||
| this->m_lowVoltageCounter = 0; |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 260 states that resetting the low voltage counter provides "consistent state" when the operator re-sends the command. However, this creates a safety issue: if the system is actively experiencing low voltage conditions (e.g., 9/10 consecutive low readings), an operator could inadvertently reset this counter by re-issuing ENTER_PAYLOAD_MODE, preventing the automatic safety exit. Consider removing this counter reset or only resetting it when not already counting faults.
| // Reset low voltage counter to ensure consistent state (operator may be | |
| // re-sending command intentionally to reset any accumulated fault count) | |
| this->m_lowVoltageCounter = 0; | |
| // Only reset low voltage counter if not actively counting faults, | |
| // to avoid defeating the automatic safety exit mechanism. | |
| if (this->m_lowVoltageCounter == 0) { | |
| this->m_lowVoltageCounter = 0; | |
| } |
| void ModeManager ::prepareForReboot_handler(FwIndexType portNum) { | ||
| // Called before intentional reboot to set clean shutdown flag | ||
| // This allows us to detect unintended reboots on next startup | ||
| this->log_ACTIVITY_HI_PreparingForReboot(); | ||
|
|
||
| // Save state with clean shutdown flag set | ||
| // We directly write to file here to ensure the flag is persisted | ||
| Os::File file; | ||
| Os::File::Status status = file.open(STATE_FILE_PATH, Os::File::OPEN_CREATE); | ||
|
|
||
| if (status != Os::File::OP_OK) { | ||
| // Log failure - next boot will be misclassified as unintended reboot | ||
| Fw::LogStringArg opStr("shutdown-open"); | ||
| this->log_WARNING_LO_StatePersistenceFailure(opStr, static_cast<I32>(status)); | ||
| return; | ||
| } | ||
|
|
||
| PersistentState state; | ||
| state.mode = static_cast<U8>(this->m_mode); | ||
| state.safeModeEntryCount = this->m_safeModeEntryCount; | ||
| state.payloadModeEntryCount = this->m_payloadModeEntryCount; | ||
| state.safeModeReason = static_cast<U8>(this->m_safeModeReason); | ||
| state.cleanShutdown = 1; // Mark as clean shutdown | ||
|
|
||
| FwSizeType bytesToWrite = sizeof(PersistentState); | ||
| FwSizeType bytesWritten = bytesToWrite; | ||
| Os::File::Status writeStatus = file.write(reinterpret_cast<U8*>(&state), bytesWritten, Os::File::WaitType::WAIT); | ||
|
|
||
| // Check if write succeeded and correct number of bytes written | ||
| if (writeStatus != Os::File::OP_OK || bytesWritten != bytesToWrite) { | ||
| // Log failure - next boot will be misclassified as unintended reboot | ||
| Fw::LogStringArg opStr("shutdown-write"); | ||
| this->log_WARNING_LO_StatePersistenceFailure(opStr, static_cast<I32>(writeStatus)); | ||
| } | ||
|
|
||
| file.close(); | ||
| } |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prepareForReboot_handler is marked as sync in the FPP definition (line 43 of ModeManager.fpp), which means it executes in the caller's context (ResetManager). This handler directly writes to the state file without synchronization. If the ModeManager's active component thread is simultaneously executing saveState() during a mode transition, there could be a race condition with concurrent file access. Consider making this port async or adding explicit synchronization.
| *.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \ | ||
| *) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \ | ||
| esac; \ | ||
| [ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \ |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message uses >&2 to redirect to stderr, but the exit command on the same line doesn't redirect its output. Additionally, the shell variable $$TARGET in the error message won't be expanded because it's inside a single-quoted string in the echo command. The error message will literally display "$$TARGET" instead of the actual target path. Use double quotes or restructure the error handling.
| [ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \ | |
| [ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found"; exit 1; } >&2; \ |
| *.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \ | ||
| *) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \ | ||
| esac; \ | ||
| [ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \ | ||
| fi; \ | ||
| echo "Running integration tests at $$TARGET"; \ | ||
| $(UV_RUN) pytest $$TARGET --deployment build-artifacts/zephyr/fprime-zephyr-deployment |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unquoted use of $$TARGET in the pytest invocation allows shell injection via the TEST variable. An attacker can set TEST to include shell metacharacters (e.g., mode_manager_test.py; rm -rf /) causing arbitrary command execution when running make test-integration. Quote $$TARGET and restrict $(TEST) to a safe filename pattern, e.g.,
$(UV_RUN) pytest "$$TARGET" --deployment build-artifacts/zephyr/fprime-zephyr-deployment
# Optionally validate TEST: case "$(TEST)" in (*[!A-Za-z0-9_/-.]*) echo "Invalid TEST"; exit 1;; esac| *.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \ | |
| *) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \ | |
| esac; \ | |
| [ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \ | |
| fi; \ | |
| echo "Running integration tests at $$TARGET"; \ | |
| $(UV_RUN) pytest $$TARGET --deployment build-artifacts/zephyr/fprime-zephyr-deployment | |
| *[!A-Za-z0-9_/\.\-]*) echo "Invalid TEST: only alphanumeric, underscore, dash, slash, and dot allowed" >&2; exit 1 ;; \ | |
| *.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \ | |
| *) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \ | |
| esac; \ | |
| [ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \ | |
| fi; \ | |
| echo "Running integration tests at $$TARGET"; \ | |
| $(UV_RUN) pytest "$$TARGET" --deployment build-artifacts/zephyr/fprime-zephyr-deployment |
Safe <-> Normal Mode Auto Switch
Description
Add automatic safe mode entry when voltage drops below 6.7V for 10 consecutive seconds (reason: LOW_BATTERY)
Add automatic safe mode exit (recovery) when voltage rises above 8.0V for 10 seconds (only for LOW_BATTERY reason)
Add unintended reboot detection via clean shutdown flag - enters safe mode with SYSTEM_FAULT reason on unexpected restart
Add SafeModeReason tracking (NONE, LOW_BATTERY, SYSTEM_FAULT, GROUND_COMMAND, EXTERNAL_REQUEST) with CurrentSafeModeReason telemetry
Add prepareForReboot port from ResetManager to set clean shutdown flag before intentional reboots
Add ExternalFaultIgnoredInPayloadMode logging when forceSafeMode is called in PAYLOAD_MODE
Add error handling for state persistence failures (load-open, load-read, load-corrupt, shutdown-open, shutdown-write)
Create safe_mode_test.py with 9 tests covering safe mode reason tracking and auto-recovery behavior
Related Issues/Tickets
#106
#145
Please follow chronological order to merge following PR
How Has This Been Tested?
Screenshots / Recordings (if applicable)
Checklist
Further Notes / Considerations