[fix] SSH session leak and silent failure on config push error#1307
[fix] SSH session leak and silent failure on config push error#1307mn-ram wants to merge 6 commits intoopenwisp:masterfrom
Conversation
When connector_instance.update_config() raised an exception (e.g. SSH channel closed mid-transfer, device reboot, network partition), the old code swallowed it silently and skipped disconnect(), leaving the SSH session open and the config status permanently stuck in 'modified' with no visible error to operators. Fix: use try/finally in AbstractDeviceConnection.update_config() so disconnect() is always called, and catch the propagated exception in the update_config Celery task to call set_status_error() on the device config, making failures visible in the admin dashboard.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details🧰 Additional context used📓 Path-based instructions (4)**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}📄 CodeRabbit inference engine (Custom checks)
Files:
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sql}📄 CodeRabbit inference engine (Custom checks)
Files:
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sh,bash,sql}📄 CodeRabbit inference engine (Custom checks)
Files:
**/*.{py,html}📄 CodeRabbit inference engine (Custom checks)
Files:
🧠 Learnings (6)📓 Common learnings📚 Learning: 2026-03-17T09:20:10.456ZApplied to files:
📚 Learning: 2026-01-12T22:27:48.342ZApplied to files:
📚 Learning: 2026-01-15T15:05:49.557ZApplied to files:
📚 Learning: 2026-02-17T19:13:10.088ZApplied to files:
📚 Learning: 2026-01-15T15:07:17.354ZApplied to files:
🔇 Additional comments (3)
📝 WalkthroughWalkthroughAbstractDeviceConnection.update_config() was modified to always call disconnect() by replacing the prior try/except/else with try/finally; module-level logging in that method was removed. The Celery task update_config(device_id) now wraps device_conn.update_config() in a broad try/except: it logs the exception and, if the device has a config, sets its status to "error" via device.config.set_status_error(reason=str(e)). Tests were added to verify disconnect() is invoked on failure and the task marks config status as "error" with the exception recorded. Sequence Diagram(s)sequenceDiagram
participant Task as Celery Task
participant DC as DeviceConnection
participant Connector as Connector Instance
participant Config as Device.config (DB)
Task->>DC: instantiate / fetch connection
Task->>DC: call update_config()
DC->>DC: connect()
DC->>Connector: connector.update_config()
alt connector raises exception
Connector-->>DC: raises
else connector succeeds
Connector-->>DC: success
end
DC->>DC: disconnect() <-- always called (finally)
alt exception propagated to Task
DC-->>Task: raises
Task->>Task: logger.exception(...)
Task->>Config: set_status_error(reason=exception message)
else no exception
DC-->>Task: returns success
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openwisp_controller/connection/tasks.py`:
- Around line 59-66: Catch and capture the exception object when calling
device_conn.update_config() and pass a concise error reason into
device.config.set_status_error(reason=...) so the failure is persisted; update
the except block that currently calls logger.exception(...) and
device.config.set_status_error() to store the exception message (or formatted
traceback snippet) as the reason, ensuring device and device.config exist before
calling set_status_error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 139362f1-1908-472a-afb9-6546e2daa837
📒 Files selected for processing (3)
openwisp_controller/connection/base/models.pyopenwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Python==3.12 | django~=4.2.0
- GitHub Check: Python==3.12 | django~=5.1.0
- GitHub Check: Python==3.13 | django~=5.1.0
- GitHub Check: Python==3.13 | django~=5.2.0
- GitHub Check: Python==3.12 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=4.2.0
- GitHub Check: Python==3.11 | django~=4.2.0
- GitHub Check: Python==3.11 | django~=5.1.0
- GitHub Check: Python==3.11 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=5.1.0
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}
📄 CodeRabbit inference engine (Custom checks)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}: Flag potential security vulnerabilities in code
Avoid unnecessary comments or docstrings for code that is already clear
Code formatting is compact and readable. Do not add excessive blank lines, especially inside function or method bodies
Flag unused or redundant code
Ensure variables, functions, classes, and files have descriptive and consistent names
New code must handle errors properly: log errors that cannot be resolved by the user with error level, log unusual conditions with warning level, log important background actions with info level, and provide user-facing messages for errors that the user can solve autonomously
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sql}
📄 CodeRabbit inference engine (Custom checks)
Flag obvious performance regressions, such as heavy loops, repeated I/O, or unoptimized queries
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sh,bash,sql}
📄 CodeRabbit inference engine (Custom checks)
Cryptic or non-obvious code (regex, complex bash commands, or hard-to-read code) must include a concise comment explaining why it is needed and why the complexity is acceptable
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
**/*.{py,html}
📄 CodeRabbit inference engine (Custom checks)
For Django pull requests, ensure all user-facing strings are marked as translatable using the Django i18n framework
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: pandafy
Repo: openwisp/openwisp-controller PR: 1292
File: openwisp_controller/connection/tasks.py:27-31
Timestamp: 2026-03-17T09:20:10.456Z
Learning: In `openwisp_controller/connection/tasks.py`, the `update_config` Celery task only accepts one argument `device_id`, which is always passed as a string (via `str(device.pk)`) from the call site in `openwisp_controller/connection/apps.py`. Do not flag `str(device_id) in task["args"]` as unreliable due to typed args — the args are always strings.
📚 Learning: 2026-03-17T09:20:10.456Z
Learnt from: pandafy
Repo: openwisp/openwisp-controller PR: 1292
File: openwisp_controller/connection/tasks.py:27-31
Timestamp: 2026-03-17T09:20:10.456Z
Learning: In `openwisp_controller/connection/tasks.py`, the `update_config` Celery task only accepts one argument `device_id`, which is always passed as a string (via `str(device.pk)`) from the call site in `openwisp_controller/connection/apps.py`. Do not flag `str(device_id) in task["args"]` as unreliable due to typed args — the args are always strings.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-01-15T15:05:49.557Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/management/commands/clear_last_ip.py:38-42
Timestamp: 2026-01-15T15:05:49.557Z
Learning: In Django projects, when using select_related() to traverse relations (for example, select_related("organization__config_settings")), the traversed relation must not be deferred. If you also use .only() in the same query, include the relation name or FK field (e.g., "organization" or "organization_id") in the .only() list to avoid the error "Field X cannot be both deferred and traversed using select_related at the same time." Apply this guideline to Django code in openwisp_controller/config/management/commands/clear_last_ip.py and similar modules by ensuring any select_related with an accompanying only() includes the related field names to prevent deferred/traversed conflicts.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-02-17T19:13:10.088Z
Learnt from: nemesifier
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/whois/commands.py:0-0
Timestamp: 2026-02-17T19:13:10.088Z
Learning: In reviews for the openwisp/openwisp-controller repository, do not propose changes based on Ruff warnings. The project does not use Ruff as its linter; ignore Ruff-related suggestions and follow the repository’s established linting and configuration rules. This guidance applies to all Python files under the openwisp_controller directory.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-01-15T15:07:17.354Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/geo/estimated_location/tests/tests.py:172-175
Timestamp: 2026-01-15T15:07:17.354Z
Learning: In this repository, flake8 enforces E501 (line too long) via setup.cfg (max-line-length = 88) while ruff ignores E501 via ruff.toml. Therefore, use '# noqa: E501' on lines that intentionally exceed 88 characters to satisfy flake8 without affecting ruff checks. This applies to Python files across the project (any .py) and is relevant for tests as well. Use sparingly and only where breaking lines is not feasible without hurting readability or functionality.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-01-12T22:27:48.342Z
Learnt from: nemesifier
Repo: openwisp/openwisp-controller PR: 1175
File: tests/openwisp2/sample_config/migrations/0008_whoisinfo_organizationconfigsettings_whois_enabled.py:18-67
Timestamp: 2026-01-12T22:27:48.342Z
Learning: In tests/openwisp2/sample_config/models.py and corresponding test migrations, the WHOISInfo model intentionally includes an additional "details" field not present in the base AbstractWHOISInfo model. This is a testing pattern to verify that swappable models (CONFIG_WHOISINFO_MODEL) can be extended with custom fields without errors.
Applied to files:
openwisp_controller/connection/tests/test_models.py
🔇 Additional comments (4)
openwisp_controller/connection/base/models.py (1)
362-368: LGTM! Correct fix for the SSH session leak.Using
try/finallyensuresdisconnect()is always called whenconnector_instance.update_config()exits (success or exception), preventing file descriptor leaks. The exception now propagates to the caller (tasks.py), which handles status updates appropriately.openwisp_controller/connection/tests/test_models.py (3)
902-915: LGTM! Good unit test for disconnect guarantee.The test correctly verifies that
disconnect()is called even whenconnector_instance.update_config()raises an exception. The mock setup is clean and the assertion validates the fix for the SSH session leak.
986-986: LGTM! Updated assertions correctly reflect the new error-handling behavior.The status assertions now expect
"error"instead of"modified"when:
- The openwisp-config version check fails (line 986)
- The restart command fails (lines 1037-1038)
This aligns with the fix that exceptions now propagate and trigger
set_status_error()in the task.Also applies to: 1037-1038
1081-1099: LGTM! Integration test verifies end-to-end error status propagation.The test confirms the full flow: when
exec_commandraises during config push, theupdate_configtask catches the exception and sets config status to"error". UsingTransactionTestCaseis appropriate here since it tests Celery task behavior with database commits.
When update_config() raises, the exception message is now stored as error_reason on the Config object so the failure cause is visible in the admin dashboard.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openwisp_controller/connection/tests/test_models.py`:
- Around line 1128-1144: The test
test_update_config_task_sets_status_error_on_failure currently asserts only that
conf.status becomes "error" but doesn't assert the persisted failure message;
update the test to also assert that conf.error_reason contains the SSH exception
text (e.g., "SSH channel closed") after the mocked exec_command raises—locate
the test function test_update_config_task_sets_status_error_on_failure and add
an assertion on conf.error_reason (after conf.refresh_from_db()) to lock in the
new behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f207dfad-69fa-4446-9bea-10e95713b708
📒 Files selected for processing (2)
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}
📄 CodeRabbit inference engine (Custom checks)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}: Flag potential security vulnerabilities in code
Avoid unnecessary comments or docstrings for code that is already clear
Code formatting is compact and readable. Do not add excessive blank lines, especially inside function or method bodies
Flag unused or redundant code
Ensure variables, functions, classes, and files have descriptive and consistent names
New code must handle errors properly: log errors that cannot be resolved by the user with error level, log unusual conditions with warning level, log important background actions with info level, and provide user-facing messages for errors that the user can solve autonomously
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sql}
📄 CodeRabbit inference engine (Custom checks)
Flag obvious performance regressions, such as heavy loops, repeated I/O, or unoptimized queries
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sh,bash,sql}
📄 CodeRabbit inference engine (Custom checks)
Cryptic or non-obvious code (regex, complex bash commands, or hard-to-read code) must include a concise comment explaining why it is needed and why the complexity is acceptable
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
**/*.{py,html}
📄 CodeRabbit inference engine (Custom checks)
For Django pull requests, ensure all user-facing strings are marked as translatable using the Django i18n framework
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: pandafy
Repo: openwisp/openwisp-controller PR: 1292
File: openwisp_controller/connection/tasks.py:27-31
Timestamp: 2026-03-17T09:20:10.456Z
Learning: In `openwisp_controller/connection/tasks.py`, the `update_config` Celery task only accepts one argument `device_id`, which is always passed as a string (via `str(device.pk)`) from the call site in `openwisp_controller/connection/apps.py`. Do not flag `str(device_id) in task["args"]` as unreliable due to typed args — the args are always strings.
📚 Learning: 2026-03-17T09:20:10.456Z
Learnt from: pandafy
Repo: openwisp/openwisp-controller PR: 1292
File: openwisp_controller/connection/tasks.py:27-31
Timestamp: 2026-03-17T09:20:10.456Z
Learning: In `openwisp_controller/connection/tasks.py`, the `update_config` Celery task only accepts one argument `device_id`, which is always passed as a string (via `str(device.pk)`) from the call site in `openwisp_controller/connection/apps.py`. Do not flag `str(device_id) in task["args"]` as unreliable due to typed args — the args are always strings.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-01-15T15:05:49.557Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/management/commands/clear_last_ip.py:38-42
Timestamp: 2026-01-15T15:05:49.557Z
Learning: In Django projects, when using select_related() to traverse relations (for example, select_related("organization__config_settings")), the traversed relation must not be deferred. If you also use .only() in the same query, include the relation name or FK field (e.g., "organization" or "organization_id") in the .only() list to avoid the error "Field X cannot be both deferred and traversed using select_related at the same time." Apply this guideline to Django code in openwisp_controller/config/management/commands/clear_last_ip.py and similar modules by ensuring any select_related with an accompanying only() includes the related field names to prevent deferred/traversed conflicts.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-02-17T19:13:10.088Z
Learnt from: nemesifier
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/whois/commands.py:0-0
Timestamp: 2026-02-17T19:13:10.088Z
Learning: In reviews for the openwisp/openwisp-controller repository, do not propose changes based on Ruff warnings. The project does not use Ruff as its linter; ignore Ruff-related suggestions and follow the repository’s established linting and configuration rules. This guidance applies to all Python files under the openwisp_controller directory.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-01-15T15:07:17.354Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/geo/estimated_location/tests/tests.py:172-175
Timestamp: 2026-01-15T15:07:17.354Z
Learning: In this repository, flake8 enforces E501 (line too long) via setup.cfg (max-line-length = 88) while ruff ignores E501 via ruff.toml. Therefore, use '# noqa: E501' on lines that intentionally exceed 88 characters to satisfy flake8 without affecting ruff checks. This applies to Python files across the project (any .py) and is relevant for tests as well. Use sparingly and only where breaking lines is not feasible without hurting readability or functionality.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/tests/test_models.py
📚 Learning: 2026-01-12T22:27:48.342Z
Learnt from: nemesifier
Repo: openwisp/openwisp-controller PR: 1175
File: tests/openwisp2/sample_config/migrations/0008_whoisinfo_organizationconfigsettings_whois_enabled.py:18-67
Timestamp: 2026-01-12T22:27:48.342Z
Learning: In tests/openwisp2/sample_config/models.py and corresponding test migrations, the WHOISInfo model intentionally includes an additional "details" field not present in the base AbstractWHOISInfo model. This is a testing pattern to verify that swappable models (CONFIG_WHOISINFO_MODEL) can be extended with custom fields without errors.
Applied to files:
openwisp_controller/connection/tests/test_models.py
🔇 Additional comments (2)
openwisp_controller/connection/tasks.py (1)
64-71: Good failure-handling boundary in the task.Catching the propagated exception here, logging with stack trace, and persisting
set_status_error(reason=str(e))aligns the behavior with the intended non-silent failure flow.openwisp_controller/connection/tests/test_models.py (1)
903-917: These test updates correctly enforce the regression fix.The new disconnect assertion and the
modified→errorexpectation changes align with the new exception propagation and status semantics.Also applies to: 987-987, 1038-1039
|
The CI is failing due to transient infrastructure issues (not related to your code). I have restarted the failed jobs automatically (/3). |
Checklist
Reference to Existing Issue
Closes #1306.
Description of Changes
AbstractDeviceConnection.update_config()had anexcept/elsepattern that caused two bugs:Bug 1 — SSH session leak:
disconnect()was in theelseblock, so it was never called when an exception occurred, leaving the SSH connection open indefinitely.Bug 2 — Silent failure: The
exceptblock swallowed the exception withlogger.exception(e), so it never propagated totasks.py. As a result,set_status_error()was never called and the config status stayedmodifiedwith no visible error in the admin dashboard.Fix: Replace
except/elsewithfinally— both bugs are fixed by this single minimal change:Screenshot
N/A — backend fix with no UI changes.