Skip to content

[fix] delay stop until configuration is applied#248

Open
Viscous106 wants to merge 2 commits intoopenwisp:masterfrom
Viscous106:172-bugFix
Open

[fix] delay stop until configuration is applied#248
Viscous106 wants to merge 2 commits intoopenwisp:masterfrom
Viscous106:172-bugFix

Conversation

@Viscous106
Copy link
Contributor

Reference to Existing Issue

Closes #172.

Description of Changes

In some cases, the agent may be stopped while a configuration is being applied, leaving the device in an inconsistent state (configuration partially applied, or status not reported back to the OpenWISP server).

The reload_service function already had a mechanism to delay reloading while CONTROL_FILE (/tmp/openwisp/applying_conf) exists. This PR extends the same protection to stop_service by:

  1. Extracting the 30-second wait loop from reload_service into a reusable helper function wait_applying_config().
  2. Calling wait_applying_config() from both reload_service and stop_service.

This ensures the agent will not be abruptly terminated mid-configuration, regardless of what triggers the stop (system reboot, package upgrade, watchdog script, etc.).

Screenshot

image

@coderabbitai
Copy link

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Walkthrough

Adds a new wait_applying_config() helper to the OpenWISP init script that blocks stop/reload operations while a configuration-application control file exists. The helper polls up to 30 iterations for the CONTROL_FILE to disappear and removes the marker on timeout. stop_service now calls wait_applying_config() before stopping, and reload_service calls it before triggering a start, preventing service interruptions during active configuration application.

Sequence Diagram(s)

sequenceDiagram
    participant Admin as Admin (systemctl)
    participant Init as openwisp.init
    participant Config as Config Apply Process
    participant Service as openwisp-agent

    Admin->>Init: stop/reload request
    Init->>Config: check CONTROL_FILE exists?
    alt CONTROL_FILE present
        Init->>Config: poll until CONTROL_FILE removed (up to 30 iterations)
        Config-->>Init: CONTROL_FILE removed / timeout
    end
    Init->>Service: stop (or start after reload)
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding a delay mechanism to the stop function until configuration is applied.
Description check ✅ Passed The description provides a clear reference to issue #172, explains the problem and solution, and includes a test screenshot demonstrating the fix works as intended.
Linked Issues check ✅ Passed The PR fully addresses issue #172 by extracting the wait logic into a reusable wait_applying_config() function and applying it to both stop_service and reload_service, preventing stops during configuration application.
Out of Scope Changes check ✅ Passed All changes in the PR are directly related to the objective of delaying stop operations until configuration is applied; no out-of-scope modifications are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openwisp-config/files/openwisp.init (1)

85-96: ⚠️ Potential issue | 🔴 Critical

Do not clear the lock file when the wait times out.

On Line 95, unconditional rm -f "$CONTROL_FILE" can let stop/reload continue even if config is still applying after 30 seconds, recreating the inconsistency race this PR is fixing.

🐛 Proposed fix
 wait_applying_config() {
 	# avoid stopping/reloading while configuration is being applied
 	# will wait for a maximum of 30 seconds
-	for _ in $(seq 1 30); do
-		if [ -f "$CONTROL_FILE" ]; then
-			sleep 1
-		else
-			break
-		fi
-	done
-	rm -f "$CONTROL_FILE"
+	local i=0
+	while [ -f "$CONTROL_FILE" ] && [ "$i" -lt 30 ]; do
+		sleep 1
+		i=$((i + 1))
+	done
+	if [ -f "$CONTROL_FILE" ]; then
+		logger -s "$PROG_NAME stop/reload delayed: config still applying after 30s" -t openwisp -p daemon.warn
+		return 1
+	fi
+	return 0
 }
 
 stop_service() {
 	logger -s "$PROG_NAME stopping" -t openwisp -p daemon.info
-	wait_applying_config
+	wait_applying_config || return 1
 }
 
 reload_service() {
 	logger -s "$PROG_NAME received reload trigger" -t openwisp -p daemon.info
-	wait_applying_config
+	wait_applying_config || return 1
 	start
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openwisp-config/files/openwisp.init` around lines 85 - 96, The function
wait_applying_config currently unconditionally removes the lock with rm -f
"$CONTROL_FILE" which can clear the lock even when the loop timed out; remove
that unconditional deletion from wait_applying_config and instead return without
touching CONTROL_FILE so the lock remains if the wait timed out (i.e.,
stop/remove the rm -f "$CONTROL_FILE" at the end of wait_applying_config to
avoid clearing the lock on timeout).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 85-96: The function wait_applying_config currently unconditionally
removes the lock with rm -f "$CONTROL_FILE" which can clear the lock even when
the loop timed out; remove that unconditional deletion from wait_applying_config
and instead return without touching CONTROL_FILE so the lock remains if the wait
timed out (i.e., stop/remove the rm -f "$CONTROL_FILE" at the end of
wait_applying_config to avoid clearing the lock on timeout).

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4549659 and ce1579a.

📒 Files selected for processing (1)
  • openwisp-config/files/openwisp.init
📜 Review details
🔇 Additional comments (1)
openwisp-config/files/openwisp.init (1)

98-107: Good reuse of synchronization logic across stop and reload.

Calling the same wait helper from both paths is the right direction to close the stop/reload race window.

coderabbitai[bot]
coderabbitai bot previously approved these changes Feb 27, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openwisp-config/files/openwisp.init (1)

85-96: ⚠️ Potential issue | 🟠 Major

Unconditional file removal after timeout may disrupt an ongoing configuration.

If the 30-second timeout is reached while the configuration is still being applied (file still exists), rm -f "$CONTROL_FILE" will forcibly remove the lock file. This could:

  1. Cause the configuration process to lose its synchronization signal.
  2. Allow a subsequent stop/reload to proceed immediately while config is still running.

Consider only removing the file if it was naturally released (loop exited via break), or at minimum log a warning when the timeout is reached so operators can investigate slow configuration applications.

🛠️ Proposed fix with timeout detection and conditional removal
 wait_applying_config() {
 	# avoid stopping/reloading while configuration is being applied
 	# will wait for a maximum of 30 seconds
+	local timed_out=0
 	for _ in $(seq 1 30); do
 		if [ -f "$CONTROL_FILE" ]; then
 			sleep 1
 		else
 			break
 		fi
 	done
-	rm -f "$CONTROL_FILE"
+	if [ -f "$CONTROL_FILE" ]; then
+		timed_out=1
+		logger -s "$PROG_NAME waited 30s for config apply, proceeding anyway" \
+			-t openwisp -p daemon.warn
+	fi
+	rm -f "$CONTROL_FILE"
 }

If forcibly proceeding after timeout is intentional (to avoid indefinite blocking), the warning log at least provides visibility. However, consider whether removing the file is safe in the timeout case, or if the service should simply proceed without deleting it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openwisp-config/files/openwisp.init` around lines 85 - 96, The
wait_applying_config function currently always removes CONTROL_FILE after the
30-second loop which can break synchronization if the file still exists; change
it to detect whether the loop exited via timeout or naturally by checking the
file presence after the loop (or tracking a flag), and only run rm -f
"$CONTROL_FILE" when the file was released naturally (loop broke), otherwise do
not remove the file and emit a warning/log message indicating the timeout so
operators can investigate; reference wait_applying_config and CONTROL_FILE to
locate where to add the timeout detection, conditional removal, and warning log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 85-96: The wait_applying_config function currently always removes
CONTROL_FILE after the 30-second loop which can break synchronization if the
file still exists; change it to detect whether the loop exited via timeout or
naturally by checking the file presence after the loop (or tracking a flag), and
only run rm -f "$CONTROL_FILE" when the file was released naturally (loop
broke), otherwise do not remove the file and emit a warning/log message
indicating the timeout so operators can investigate; reference
wait_applying_config and CONTROL_FILE to locate where to add the timeout
detection, conditional removal, and warning log.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce1579a and 037f7d0.

📒 Files selected for processing (1)
  • openwisp-config/files/openwisp.init
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Upload Packages
🔇 Additional comments (2)
openwisp-config/files/openwisp.init (2)

98-101: LGTM!

The integration correctly ensures the agent waits for configuration application to complete before procd stops the service. This addresses the issue #172 objective.


103-107: LGTM!

Good refactor extracting the wait logic into a reusable helper. The behavior is preserved while enabling code reuse in stop_service.

@Viscous106 Viscous106 marked this pull request as ready for review February 27, 2026 04:50
Copilot AI review requested due to automatic review settings February 27, 2026 04:50
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent the OpenWISP config agent from being stopped while a configuration is being applied (which can leave devices in an inconsistent state) by reusing the existing “wait while applying” behavior previously present only for reload.

Changes:

  • Extracts the /tmp/openwisp/applying_conf wait loop into a helper (wait_applying_config).
  • Uses the helper from both reload_service() and stop_service() so stop is delayed during config application.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 95-96: The timeout branch currently force-removes the lock file
via rm -f "$CONTROL_FILE", which can clear a still-valid in-progress apply and
re-enable stop/reload mid-configuration; change the logic to validate the
control marker before deleting it: read the PID or token stored in
"$CONTROL_FILE" (the CONTROL_FILE marker), check whether that PID/process is
still running (or verify the token against an active lock/flock), and only
remove the file if the owning process is gone or the lock is stale; update the
timeout handler that contains the rm -f "$CONTROL_FILE" call to perform this
check (use the CONTROL_FILE contents and a PID existence test) instead of
unconditional removal.
- Line 92: The function wait_applying_config currently ends with an unadorned
"return" which inherits the last command's non-zero status when no lock file
exists; update the function wait_applying_config to explicitly return 0 in the
normal "no lock file / success" path (replace the bare return with a "return 0")
so the shell function reports success rather than propagating the failed test
status.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 037f7d0 and 246e0c5.

📒 Files selected for processing (1)
  • openwisp-config/files/openwisp.init
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Upload Packages
🔇 Additional comments (2)
openwisp-config/files/openwisp.init (2)

99-102: Good reuse of the shared guard in stop_service.

Pulling the apply-wait logic into stop_service closes the previously unprotected stop path.


104-107: Good centralization of reload protection.

Using the same helper in reload_service reduces divergence and keeps stop/reload behavior consistent.

Extends the protection already in place for reload_service to
stop_service as well. Extracted the 30-second wait loop into a
reusable wait_applying_config() helper function to avoid code
duplication.

Closes openwisp#172
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
openwisp-config/files/openwisp.init (1)

95-97: ⚠️ Potential issue | 🟠 Major

Do not clear CONTROL_FILE on timeout; abort stop/reload instead.

At Line 95-96, deleting the marker unconditionally can unblock stop/reload while apply is still active, which defeats the PR goal. Treat timeout as failure and make callers honor it.

🔧 Proposed fix
 wait_applying_config() {
 	# avoid stopping/reloading while configuration is being applied
 	# will wait for a maximum of 30 seconds
 	for _ in $(seq 1 30); do
 		if [ -f "$CONTROL_FILE" ]; then
 			sleep 1
 		else
 			return 0
 		fi
 	done
-	# timed out: remove stale marker so we can proceed
-	rm -f "$CONTROL_FILE"
+	# timed out: keep marker and abort stop/reload to avoid mid-apply interruption
+	logger -s "$PROG_NAME wait timeout: configuration still applying" -t openwisp -p daemon.warn
+	return 1
 }
 
 stop_service() {
 	logger -s "$PROG_NAME stopping" -t openwisp -p daemon.info
-	wait_applying_config
+	wait_applying_config || return 1
 }
 
 reload_service() {
 	logger -s "$PROG_NAME received reload trigger" -t openwisp -p daemon.info
-	wait_applying_config
+	wait_applying_config || return 1
 	start
 }

Also applies to: 101-102, 105-107

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openwisp-config/files/openwisp.init` around lines 95 - 97, The timeout
handler currently removes the CONTROL_FILE marker (rm -f "$CONTROL_FILE"), which
must not happen; instead change the timeout branch to leave CONTROL_FILE intact
and abort the stop/reload operation by returning or exiting with a non-zero
status (e.g., return 1 or exit 1) so callers can detect failure; make this same
change for the other identical blocks that remove CONTROL_FILE (the blocks at
the same pattern as lines 101-102 and 105-107) and ensure any caller code checks
the command's exit code to honor the timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@openwisp-config/files/openwisp.init`:
- Around line 95-97: The timeout handler currently removes the CONTROL_FILE
marker (rm -f "$CONTROL_FILE"), which must not happen; instead change the
timeout branch to leave CONTROL_FILE intact and abort the stop/reload operation
by returning or exiting with a non-zero status (e.g., return 1 or exit 1) so
callers can detect failure; make this same change for the other identical blocks
that remove CONTROL_FILE (the blocks at the same pattern as lines 101-102 and
105-107) and ensure any caller code checks the command's exit code to honor the
timeout.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 246e0c5 and c58683e.

📒 Files selected for processing (1)
  • openwisp-config/files/openwisp.init
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Upload Packages
🔇 Additional comments (1)
openwisp-config/files/openwisp.init (1)

85-93: Good extraction and explicit success return in wait_applying_config.

Line 92’s return 0 fixes the normal no-lock path to report success deterministically.

@Viscous106
Copy link
Contributor Author

@nemesifier ,Hi can you have a look at the pr, I have tested it locally , tell me if you need any more changes. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] The agent may be stopped and started while the configuration is applied

2 participants