[fix] delay stop until configuration is applied#248
[fix] delay stop until configuration is applied#248Viscous106 wants to merge 2 commits intoopenwisp:masterfrom
Conversation
📝 WalkthroughWalkthroughAdds a new Sequence Diagram(s)sequenceDiagram
participant Admin as Admin (systemctl)
participant Init as openwisp.init
participant Config as Config Apply Process
participant Service as openwisp-agent
Admin->>Init: stop/reload request
Init->>Config: check CONTROL_FILE exists?
alt CONTROL_FILE present
Init->>Config: poll until CONTROL_FILE removed (up to 30 iterations)
Config-->>Init: CONTROL_FILE removed / timeout
end
Init->>Service: stop (or start after reload)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openwisp-config/files/openwisp.init (1)
85-96:⚠️ Potential issue | 🔴 CriticalDo not clear the lock file when the wait times out.
On Line 95, unconditional
rm -f "$CONTROL_FILE"can let stop/reload continue even if config is still applying after 30 seconds, recreating the inconsistency race this PR is fixing.🐛 Proposed fix
wait_applying_config() { # avoid stopping/reloading while configuration is being applied # will wait for a maximum of 30 seconds - for _ in $(seq 1 30); do - if [ -f "$CONTROL_FILE" ]; then - sleep 1 - else - break - fi - done - rm -f "$CONTROL_FILE" + local i=0 + while [ -f "$CONTROL_FILE" ] && [ "$i" -lt 30 ]; do + sleep 1 + i=$((i + 1)) + done + if [ -f "$CONTROL_FILE" ]; then + logger -s "$PROG_NAME stop/reload delayed: config still applying after 30s" -t openwisp -p daemon.warn + return 1 + fi + return 0 } stop_service() { logger -s "$PROG_NAME stopping" -t openwisp -p daemon.info - wait_applying_config + wait_applying_config || return 1 } reload_service() { logger -s "$PROG_NAME received reload trigger" -t openwisp -p daemon.info - wait_applying_config + wait_applying_config || return 1 start }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.init` around lines 85 - 96, The function wait_applying_config currently unconditionally removes the lock with rm -f "$CONTROL_FILE" which can clear the lock even when the loop timed out; remove that unconditional deletion from wait_applying_config and instead return without touching CONTROL_FILE so the lock remains if the wait timed out (i.e., stop/remove the rm -f "$CONTROL_FILE" at the end of wait_applying_config to avoid clearing the lock on timeout).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 85-96: The function wait_applying_config currently unconditionally
removes the lock with rm -f "$CONTROL_FILE" which can clear the lock even when
the loop timed out; remove that unconditional deletion from wait_applying_config
and instead return without touching CONTROL_FILE so the lock remains if the wait
timed out (i.e., stop/remove the rm -f "$CONTROL_FILE" at the end of
wait_applying_config to avoid clearing the lock on timeout).
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
openwisp-config/files/openwisp.init
📜 Review details
🔇 Additional comments (1)
openwisp-config/files/openwisp.init (1)
98-107: Good reuse of synchronization logic across stop and reload.Calling the same wait helper from both paths is the right direction to close the stop/reload race window.
ce1579a to
037f7d0
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openwisp-config/files/openwisp.init (1)
85-96:⚠️ Potential issue | 🟠 MajorUnconditional file removal after timeout may disrupt an ongoing configuration.
If the 30-second timeout is reached while the configuration is still being applied (file still exists),
rm -f "$CONTROL_FILE"will forcibly remove the lock file. This could:
- Cause the configuration process to lose its synchronization signal.
- Allow a subsequent stop/reload to proceed immediately while config is still running.
Consider only removing the file if it was naturally released (loop exited via
break), or at minimum log a warning when the timeout is reached so operators can investigate slow configuration applications.🛠️ Proposed fix with timeout detection and conditional removal
wait_applying_config() { # avoid stopping/reloading while configuration is being applied # will wait for a maximum of 30 seconds + local timed_out=0 for _ in $(seq 1 30); do if [ -f "$CONTROL_FILE" ]; then sleep 1 else break fi done - rm -f "$CONTROL_FILE" + if [ -f "$CONTROL_FILE" ]; then + timed_out=1 + logger -s "$PROG_NAME waited 30s for config apply, proceeding anyway" \ + -t openwisp -p daemon.warn + fi + rm -f "$CONTROL_FILE" }If forcibly proceeding after timeout is intentional (to avoid indefinite blocking), the warning log at least provides visibility. However, consider whether removing the file is safe in the timeout case, or if the service should simply proceed without deleting it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.init` around lines 85 - 96, The wait_applying_config function currently always removes CONTROL_FILE after the 30-second loop which can break synchronization if the file still exists; change it to detect whether the loop exited via timeout or naturally by checking the file presence after the loop (or tracking a flag), and only run rm -f "$CONTROL_FILE" when the file was released naturally (loop broke), otherwise do not remove the file and emit a warning/log message indicating the timeout so operators can investigate; reference wait_applying_config and CONTROL_FILE to locate where to add the timeout detection, conditional removal, and warning log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 85-96: The wait_applying_config function currently always removes
CONTROL_FILE after the 30-second loop which can break synchronization if the
file still exists; change it to detect whether the loop exited via timeout or
naturally by checking the file presence after the loop (or tracking a flag), and
only run rm -f "$CONTROL_FILE" when the file was released naturally (loop
broke), otherwise do not remove the file and emit a warning/log message
indicating the timeout so operators can investigate; reference
wait_applying_config and CONTROL_FILE to locate where to add the timeout
detection, conditional removal, and warning log.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
openwisp-config/files/openwisp.init
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Upload Packages
🔇 Additional comments (2)
openwisp-config/files/openwisp.init (2)
98-101: LGTM!The integration correctly ensures the agent waits for configuration application to complete before procd stops the service. This addresses the issue
#172objective.
103-107: LGTM!Good refactor extracting the wait logic into a reusable helper. The behavior is preserved while enabling code reuse in
stop_service.
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent the OpenWISP config agent from being stopped while a configuration is being applied (which can leave devices in an inconsistent state) by reusing the existing “wait while applying” behavior previously present only for reload.
Changes:
- Extracts the
/tmp/openwisp/applying_confwait loop into a helper (wait_applying_config). - Uses the helper from both
reload_service()andstop_service()so stop is delayed during config application.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
037f7d0 to
246e0c5
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 95-96: The timeout branch currently force-removes the lock file
via rm -f "$CONTROL_FILE", which can clear a still-valid in-progress apply and
re-enable stop/reload mid-configuration; change the logic to validate the
control marker before deleting it: read the PID or token stored in
"$CONTROL_FILE" (the CONTROL_FILE marker), check whether that PID/process is
still running (or verify the token against an active lock/flock), and only
remove the file if the owning process is gone or the lock is stale; update the
timeout handler that contains the rm -f "$CONTROL_FILE" call to perform this
check (use the CONTROL_FILE contents and a PID existence test) instead of
unconditional removal.
- Line 92: The function wait_applying_config currently ends with an unadorned
"return" which inherits the last command's non-zero status when no lock file
exists; update the function wait_applying_config to explicitly return 0 in the
normal "no lock file / success" path (replace the bare return with a "return 0")
so the shell function reports success rather than propagating the failed test
status.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
openwisp-config/files/openwisp.init
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Upload Packages
🔇 Additional comments (2)
openwisp-config/files/openwisp.init (2)
99-102: Good reuse of the shared guard instop_service.Pulling the apply-wait logic into
stop_servicecloses the previously unprotected stop path.
104-107: Good centralization of reload protection.Using the same helper in
reload_servicereduces divergence and keeps stop/reload behavior consistent.
Extends the protection already in place for reload_service to stop_service as well. Extracted the 30-second wait loop into a reusable wait_applying_config() helper function to avoid code duplication. Closes openwisp#172
246e0c5 to
c58683e
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
openwisp-config/files/openwisp.init (1)
95-97:⚠️ Potential issue | 🟠 MajorDo not clear
CONTROL_FILEon timeout; abort stop/reload instead.At Line 95-96, deleting the marker unconditionally can unblock stop/reload while apply is still active, which defeats the PR goal. Treat timeout as failure and make callers honor it.
🔧 Proposed fix
wait_applying_config() { # avoid stopping/reloading while configuration is being applied # will wait for a maximum of 30 seconds for _ in $(seq 1 30); do if [ -f "$CONTROL_FILE" ]; then sleep 1 else return 0 fi done - # timed out: remove stale marker so we can proceed - rm -f "$CONTROL_FILE" + # timed out: keep marker and abort stop/reload to avoid mid-apply interruption + logger -s "$PROG_NAME wait timeout: configuration still applying" -t openwisp -p daemon.warn + return 1 } stop_service() { logger -s "$PROG_NAME stopping" -t openwisp -p daemon.info - wait_applying_config + wait_applying_config || return 1 } reload_service() { logger -s "$PROG_NAME received reload trigger" -t openwisp -p daemon.info - wait_applying_config + wait_applying_config || return 1 start }Also applies to: 101-102, 105-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openwisp-config/files/openwisp.init` around lines 95 - 97, The timeout handler currently removes the CONTROL_FILE marker (rm -f "$CONTROL_FILE"), which must not happen; instead change the timeout branch to leave CONTROL_FILE intact and abort the stop/reload operation by returning or exiting with a non-zero status (e.g., return 1 or exit 1) so callers can detect failure; make this same change for the other identical blocks that remove CONTROL_FILE (the blocks at the same pattern as lines 101-102 and 105-107) and ensure any caller code checks the command's exit code to honor the timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 95-97: The timeout handler currently removes the CONTROL_FILE
marker (rm -f "$CONTROL_FILE"), which must not happen; instead change the
timeout branch to leave CONTROL_FILE intact and abort the stop/reload operation
by returning or exiting with a non-zero status (e.g., return 1 or exit 1) so
callers can detect failure; make this same change for the other identical blocks
that remove CONTROL_FILE (the blocks at the same pattern as lines 101-102 and
105-107) and ensure any caller code checks the command's exit code to honor the
timeout.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
openwisp-config/files/openwisp.init
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Upload Packages
🔇 Additional comments (1)
openwisp-config/files/openwisp.init (1)
85-93: Good extraction and explicit success return inwait_applying_config.Line 92’s
return 0fixes the normal no-lock path to report success deterministically.
|
@nemesifier ,Hi can you have a look at the pr, I have tested it locally , tell me if you need any more changes. Thanks |
Reference to Existing Issue
Closes #172.
Description of Changes
In some cases, the agent may be stopped while a configuration is being applied, leaving the device in an inconsistent state (configuration partially applied, or status not reported back to the OpenWISP server).
The reload_service function already had a mechanism to delay reloading while
CONTROL_FILE(/tmp/openwisp/applying_conf) exists. This PR extends the same protection to stop_service by:This ensures the agent will not be abruptly terminated mid-configuration, regardless of what triggers the stop (system reboot, package upgrade, watchdog script, etc.).
Screenshot