Skip to content

Conversation

@yudataguy
Copy link
Collaborator

Safe <-> Normal Mode Auto Switch

Description

Add automatic safe mode entry when voltage drops below 6.7V for 10 consecutive seconds (reason: LOW_BATTERY)
Add automatic safe mode exit (recovery) when voltage rises above 8.0V for 10 seconds (only for LOW_BATTERY reason)
Add unintended reboot detection via clean shutdown flag - enters safe mode with SYSTEM_FAULT reason on unexpected restart
Add SafeModeReason tracking (NONE, LOW_BATTERY, SYSTEM_FAULT, GROUND_COMMAND, EXTERNAL_REQUEST) with CurrentSafeModeReason telemetry
Add prepareForReboot port from ResetManager to set clean shutdown flag before intentional reboots
Add ExternalFaultIgnoredInPayloadMode logging when forceSafeMode is called in PAYLOAD_MODE
Add error handling for state persistence failures (load-open, load-read, load-corrupt, shutdown-open, shutdown-write)
Create safe_mode_test.py with 9 tests covering safe mode reason tracking and auto-recovery behavior

Related Issues/Tickets

#106

#145

Please follow chronological order to merge following PR

  1. Payload mode
  2. Payload and normal auto switch
  3. Safe and normal auto switch (this PR)

How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Z Tests
  • Manual testing (describe steps)

Screenshots / Recordings (if applicable)

Checklist

  • Written detailed sdd with requirements, channels, ports, commands, telemetry defined and correctly formatted and spelled
  • Have written relevant integration tests and have documented them in the sdd
  • Have done a code review with
  • Have tested this PR on every supported board with correct board definitions

Further Notes / Considerations

@yudataguy yudataguy requested a review from Copilot November 30, 2025 00:02
Copilot finished reviewing on behalf of yudataguy November 30, 2025 00:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements autonomous safe mode management for the ModeManager component, adding automatic safe mode entry/exit based on battery voltage monitoring and unintended reboot detection. The changes enable the satellite to autonomously protect itself from low-power conditions while allowing recovery when voltage stabilizes.

Key Changes

  • Voltage-based mode transitions: Automatic safe mode entry when voltage drops below 6.7V for 10 seconds (LOW_BATTERY reason), and automatic recovery when voltage exceeds 8.0V for 10 seconds (LOW_BATTERY only)
  • Reboot detection: Clean shutdown flag mechanism to detect unintended reboots (watchdog resets, crashes, power glitches) and enter safe mode with SYSTEM_FAULT reason
  • SafeModeReason tracking: New enum (NONE, LOW_BATTERY, SYSTEM_FAULT, GROUND_COMMAND, EXTERNAL_REQUEST) with telemetry and recovery logic based on reason

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
README.md Added documentation for running single integration test files using TEST parameter
Makefile Enhanced test-integration target to support running individual test files
FprimeZephyrReference/test/int/safe_mode_test.py New integration tests (9 total: 5 automated, 4 manual) covering safe mode reason tracking, auto-entry, auto-recovery, and reboot detection
FprimeZephyrReference/test/int/payload_mode_test.py New integration tests (7 total) covering payload mode entry/exit, validation, voltage monitoring, and automatic exit
FprimeZephyrReference/test/int/mode_manager_test.py Updated existing tests for new enum values (SAFE=1, NORMAL=2, PAYLOAD=3) and idempotent behavior
FprimeZephyrReference/ReferenceDeployment/Top/topology.fpp Added port connection from ResetManager.prepareForReboot to ModeManager for clean shutdown flag
FprimeZephyrReference/ReferenceDeployment/Top/ReferenceDeploymentPackets.fppi Added PayloadModeEntryCount and CurrentSafeModeReason telemetry to health packet
FprimeZephyrReference/Components/ResetManager/ResetManager.fpp Added prepareForReboot output port definition
FprimeZephyrReference/Components/ResetManager/ResetManager.cpp Calls prepareForReboot port before cold/warm resets to set clean shutdown flag
FprimeZephyrReference/Components/ModeManager/docs/sdd.md Comprehensive documentation updates for voltage thresholds, auto-transitions, reboot detection, requirements, and integration tests
FprimeZephyrReference/Components/ModeManager/ModeManager.hpp Added SafeModeReason tracking, voltage counters, prepareForReboot handler, clean shutdown flag, voltage threshold constants, and updated PersistentState structure
FprimeZephyrReference/Components/ModeManager/ModeManager.fpp Added SafeModeReason enum, prepareForReboot port, payload mode commands, new events (AutoSafeModeEntry, AutoSafeModeExit, UnintendedRebootDetected, PreparingForReboot, ExternalFaultIgnoredInPayloadMode), and CurrentSafeModeReason telemetry
FprimeZephyrReference/Components/ModeManager/ModeManager.cpp Implemented 1Hz voltage monitoring in run_handler, safe mode reason tracking, auto-entry/exit logic with 10-second debouncing, prepareForReboot handler, unintended reboot detection in loadState, and updated state persistence with new fields

Comment on lines +169 to +171
assert "Ground command" in entering_events[-1].get_display_text(), (
"EnteringSafeMode reason should mention Ground command"
)
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion checks if "Ground command" appears in the event display text, but this relies on string matching against a formatted event message. If the event formatting changes (e.g., "Ground Command" with capital C, or "ground cmd"), this test will fail even though the functionality is correct. Consider checking the event type/name instead of the display text, or use a more robust pattern match.

Suggested change
assert "Ground command" in entering_events[-1].get_display_text(), (
"EnteringSafeMode reason should mention Ground command"
)
# Check the event's arguments for the correct reason code (GROUND_COMMAND = 3)
event_args = entering_events[-1].get_args()
assert (
event_args and (
event_args[0] == SAFE_MODE_REASON_GROUND_COMMAND
or (isinstance(event_args[0], str) and "GROUND_COMMAND" in event_args[0].upper())
)
), f"EnteringSafeMode event reason should be GROUND_COMMAND (3), got {event_args[0]!r}"

Copilot uses AI. Check for mistakes.
voltage: F32 @< Voltage that triggered the entry (0 if N/A)
) \
severity warning high \
format "AUTO SAFE MODE ENTRY: reason={} voltage={}V"
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AutoSafeModeEntry event takes two parameters (reason and voltage) as shown in line 149-150, but the event emission in ModeManager.cpp line 98 only passes the reason and voltage correctly. However, the event format string on line 153 shows "reason={} voltage={}V" which will display the enum numeric value (e.g., "1") rather than the enum name ("LOW_BATTERY"). Consider using a formatted string representation or documenting that operators will see numeric enum values.

Suggested change
format "AUTO SAFE MODE ENTRY: reason={} voltage={}V"
format "AUTO SAFE MODE ENTRY: reason={!r} voltage={}V"

Copilot uses AI. Check for mistakes.
10 # Consecutive seconds below threshold before auto-exit
)


Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to safe_mode_test.py, the constants LOW_VOLTAGE_THRESHOLD and LOW_VOLTAGE_DEBOUNCE_SECONDS are duplicated and must match ModeManager.hpp (lines 172-173). This creates a maintenance burden where changes to thresholds require updating multiple files. Consider a shared constant file or cross-validation test.

Suggested change
def _parse_mode_manager_constants(header_path):
"""
Parse ModeManager.hpp for LOW_VOLTAGE_THRESHOLD and LOW_VOLTAGE_DEBOUNCE_SECONDS.
Returns a dict with the values as floats.
"""
import re
constants = {}
with open(header_path, "r") as f:
for line in f:
m1 = re.match(r'.*LOW_VOLTAGE_THRESHOLD\s*=\s*([0-9.]+)', line)
m2 = re.match(r'.*LOW_VOLTAGE_DEBOUNCE_SECONDS\s*=\s*([0-9.]+)', line)
if m1:
constants["LOW_VOLTAGE_THRESHOLD"] = float(m1.group(1))
if m2:
constants["LOW_VOLTAGE_DEBOUNCE_SECONDS"] = float(m2.group(1))
return constants
def test_voltage_constants_match_header():
"""
Ensure test constants match ModeManager.hpp (single source of truth).
"""
import os
# Try to find ModeManager.hpp relative to this test file
here = os.path.dirname(os.path.abspath(__file__))
# Assumes standard project structure
header_path = os.path.abspath(
os.path.join(here, "../../Components/ModeManager/ModeManager.hpp")
)
assert os.path.exists(header_path), f"ModeManager.hpp not found at {header_path}"
cpp_consts = _parse_mode_manager_constants(header_path)
assert abs(cpp_consts["LOW_VOLTAGE_THRESHOLD"] - LOW_VOLTAGE_THRESHOLD) < 1e-6, (
f"LOW_VOLTAGE_THRESHOLD mismatch: "
f"Python={LOW_VOLTAGE_THRESHOLD}, C++={cpp_consts['LOW_VOLTAGE_THRESHOLD']}"
)
assert abs(cpp_consts["LOW_VOLTAGE_DEBOUNCE_SECONDS"] - LOW_VOLTAGE_DEBOUNCE_SECONDS) < 1e-6, (
f"LOW_VOLTAGE_DEBOUNCE_SECONDS mismatch: "
f"Python={LOW_VOLTAGE_DEBOUNCE_SECONDS}, C++={cpp_consts['LOW_VOLTAGE_DEBOUNCE_SECONDS']}"
)

Copilot uses AI. Check for mistakes.
this->m_mode = static_cast<SystemMode>(state.mode);
this->m_safeModeEntryCount = state.safeModeEntryCount;
this->m_payloadModeEntryCount = state.payloadModeEntryCount;
this->m_safeModeReason = static_cast<Components::SafeModeReason::T>(state.safeModeReason);
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safe mode reason is cast directly from U8 to SafeModeReason without validation. If the persisted state contains an invalid reason value (e.g., 5 or higher), this will be silently accepted and could lead to undefined behavior. Add validation to ensure the reason is within the valid enum range (0-4) before casting, similar to how the mode value is validated on lines 308-309.

Copilot uses AI. Check for mistakes.
Comment on lines +383 to +388
}

// Clear clean shutdown flag for next boot detection
// This ensures that if the system crashes before the next intentional reboot,
// we'll detect it as an unintended reboot
this->saveState();
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After detecting an unintended reboot, enterSafeMode() is called which will invoke saveState() (line 468). Then on line 388, saveState() is called again. This results in writing the state file twice unnecessarily. The second call is intended to clear the clean shutdown flag, but since enterSafeMode() already saves state with cleanShutdown=0 (line 407), the second call is redundant and wastes flash write cycles.

Suggested change
}
// Clear clean shutdown flag for next boot detection
// This ensures that if the system crashes before the next intentional reboot,
// we'll detect it as an unintended reboot
this->saveState();
} else {
// Clear clean shutdown flag for next boot detection
// This ensures that if the system crashes before the next intentional reboot,
// we'll detect it as an unintended reboot
this->saveState();
}

Copilot uses AI. Check for mistakes.
Comment on lines +148 to +153
} else if (this->m_mode == SystemMode::PAYLOAD_MODE) {
// Log that external fault was detected but we can't act on it
// System must go PAYLOAD_MODE -> NORMAL -> SAFE_MODE sequentially
this->log_WARNING_LO_ExternalFaultIgnoredInPayloadMode();
}
// Note: If already in SAFE_MODE, silently ignore (already in safest state)
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 148 states "Only allowed from NORMAL (sequential +1/-1 transitions)" but the conditional logic checks for both NORMAL (line 145) and PAYLOAD_MODE (line 148). The logic actually implements a different policy: "ignore if already in safe mode, reject if in payload mode, enter if in normal". Consider updating the comment to accurately describe the actual behavior: "Only acts when in NORMAL mode; ignores when already in SAFE_MODE (idempotent); logs warning when in PAYLOAD_MODE".

Copilot uses AI. Check for mistakes.
Comment on lines +260 to +262
// Reset low voltage counter to ensure consistent state (operator may be
// re-sending command intentionally to reset any accumulated fault count)
this->m_lowVoltageCounter = 0;
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 260 states that resetting the low voltage counter provides "consistent state" when the operator re-sends the command. However, this creates a safety issue: if the system is actively experiencing low voltage conditions (e.g., 9/10 consecutive low readings), an operator could inadvertently reset this counter by re-issuing ENTER_PAYLOAD_MODE, preventing the automatic safety exit. Consider removing this counter reset or only resetting it when not already counting faults.

Suggested change
// Reset low voltage counter to ensure consistent state (operator may be
// re-sending command intentionally to reset any accumulated fault count)
this->m_lowVoltageCounter = 0;
// Only reset low voltage counter if not actively counting faults,
// to avoid defeating the automatic safety exit mechanism.
if (this->m_lowVoltageCounter == 0) {
this->m_lowVoltageCounter = 0;
}

Copilot uses AI. Check for mistakes.
Comment on lines +162 to +198
void ModeManager ::prepareForReboot_handler(FwIndexType portNum) {
// Called before intentional reboot to set clean shutdown flag
// This allows us to detect unintended reboots on next startup
this->log_ACTIVITY_HI_PreparingForReboot();

// Save state with clean shutdown flag set
// We directly write to file here to ensure the flag is persisted
Os::File file;
Os::File::Status status = file.open(STATE_FILE_PATH, Os::File::OPEN_CREATE);

if (status != Os::File::OP_OK) {
// Log failure - next boot will be misclassified as unintended reboot
Fw::LogStringArg opStr("shutdown-open");
this->log_WARNING_LO_StatePersistenceFailure(opStr, static_cast<I32>(status));
return;
}

PersistentState state;
state.mode = static_cast<U8>(this->m_mode);
state.safeModeEntryCount = this->m_safeModeEntryCount;
state.payloadModeEntryCount = this->m_payloadModeEntryCount;
state.safeModeReason = static_cast<U8>(this->m_safeModeReason);
state.cleanShutdown = 1; // Mark as clean shutdown

FwSizeType bytesToWrite = sizeof(PersistentState);
FwSizeType bytesWritten = bytesToWrite;
Os::File::Status writeStatus = file.write(reinterpret_cast<U8*>(&state), bytesWritten, Os::File::WaitType::WAIT);

// Check if write succeeded and correct number of bytes written
if (writeStatus != Os::File::OP_OK || bytesWritten != bytesToWrite) {
// Log failure - next boot will be misclassified as unintended reboot
Fw::LogStringArg opStr("shutdown-write");
this->log_WARNING_LO_StatePersistenceFailure(opStr, static_cast<I32>(writeStatus));
}

file.close();
}
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prepareForReboot_handler is marked as sync in the FPP definition (line 43 of ModeManager.fpp), which means it executes in the caller's context (ResetManager). This handler directly writes to the state file without synchronization. If the ModeManager's active component thread is simultaneously executing saveState() during a mode transition, there could be a race condition with concurrent file access. Consider making this port async or adding explicit synchronization.

Copilot uses AI. Check for mistakes.
*.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \
*) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \
esac; \
[ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message uses >&2 to redirect to stderr, but the exit command on the same line doesn't redirect its output. Additionally, the shell variable $$TARGET in the error message won't be expanded because it's inside a single-quoted string in the echo command. The error message will literally display "$$TARGET" instead of the actual target path. Use double quotes or restructure the error handling.

Suggested change
[ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \
[ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found"; exit 1; } >&2; \

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +66
*.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \
*) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \
esac; \
[ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \
fi; \
echo "Running integration tests at $$TARGET"; \
$(UV_RUN) pytest $$TARGET --deployment build-artifacts/zephyr/fprime-zephyr-deployment
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unquoted use of $$TARGET in the pytest invocation allows shell injection via the TEST variable. An attacker can set TEST to include shell metacharacters (e.g., mode_manager_test.py; rm -rf /) causing arbitrary command execution when running make test-integration. Quote $$TARGET and restrict $(TEST) to a safe filename pattern, e.g.,

$(UV_RUN) pytest "$$TARGET" --deployment build-artifacts/zephyr/fprime-zephyr-deployment
# Optionally validate TEST: case "$(TEST)" in (*[!A-Za-z0-9_/-.]*) echo "Invalid TEST"; exit 1;; esac
Suggested change
*.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \
*) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \
esac; \
[ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \
fi; \
echo "Running integration tests at $$TARGET"; \
$(UV_RUN) pytest $$TARGET --deployment build-artifacts/zephyr/fprime-zephyr-deployment
*[!A-Za-z0-9_/\.\-]*) echo "Invalid TEST: only alphanumeric, underscore, dash, slash, and dot allowed" >&2; exit 1 ;; \
*.py) TARGET="FprimeZephyrReference/test/int/$(TEST)" ;; \
*) TARGET="FprimeZephyrReference/test/int/$(TEST).py" ;; \
esac; \
[ -e "$$TARGET" ] || { echo "Error: Test file $$TARGET not found" >&2; exit 1; }; \
fi; \
echo "Running integration tests at $$TARGET"; \
$(UV_RUN) pytest "$$TARGET" --deployment build-artifacts/zephyr/fprime-zephyr-deployment

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants