Review Summary by Qodo
Implement health check plugin architecture with HTTP and KubeVirt plugins
Walkthroughs
Description
• Implement health check plugin architecture replacing legacy classes
• Add HTTP and KubeVirt VM health check plugins with comprehensive features
• Create factory pattern for dynamic plugin discovery and instantiation
• Integrate new plugin system into main run loop with proper error handling
Diagram
flowchart LR
A["Legacy HealthChecker<br/>VirtChecker Classes"] -->|"Replaced by"| B["AbstractHealthCheckPlugin<br/>Base Class"]
B -->|"Discovered by"| C["HealthCheckFactory"]
C -->|"Creates"| D["HttpHealthCheckPlugin"]
C -->|"creates"| E["VirtHealthCheckPlugin"]
C -->|"creates"| F["SimpleHealthCheckPlugin"]
D -->|"integrated in"| G["run_kraken.py"]
E -->|"integrated in"| G
G -->|"collects telemetry"| H["Telemetry Queues"]
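The architecture in the diagram can be illustrated with a minimal sketch. The class names `AbstractHealthCheckPlugin`, `HealthCheckFactory`, and `HttpHealthCheckPlugin` come from the summary above, but every method name, the `"http_health_check"` type string, and the use of `__subclasses__()` for discovery are assumptions made for illustration; the PR's actual discovery mechanism may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class AbstractHealthCheckPlugin(ABC):
    """Base class for health checks; method names are illustrative assumptions."""

    @abstractmethod
    def get_health_check_types(self) -> List[str]:
        """Return the type names this plugin handles."""

    @abstractmethod
    def run_health_check(self, config: dict) -> None:
        """Run the check described by config."""

    def get_return_value(self) -> int:
        # 0 means healthy; plugins set self.ret_value on failure.
        return getattr(self, "ret_value", 0)


class HttpHealthCheckPlugin(AbstractHealthCheckPlugin):
    def get_health_check_types(self) -> List[str]:
        return ["http_health_check"]  # hypothetical type name

    def run_health_check(self, config: dict) -> None:
        self.ret_value = 0  # a real plugin would probe an endpoint here


class HealthCheckFactory:
    """Discovers plugins by walking direct subclasses of the base class."""

    def __init__(self) -> None:
        self.loaded: Dict[str, Type[AbstractHealthCheckPlugin]] = {}
        for cls in AbstractHealthCheckPlugin.__subclasses__():
            for type_name in cls().get_health_check_types():
                self.loaded[type_name] = cls

    def create_plugin(self, type_name: str) -> AbstractHealthCheckPlugin:
        if type_name not in self.loaded:
            raise ValueError(f"no health check plugin for type {type_name!r}")
        return self.loaded[type_name]()
```

With this shape, adding a new check type only requires defining another subclass; the factory picks it up without changes to the run loop.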
File Changes
1. krkn/health_checks/__init__.py
Code Review by Qodo
1. Health failures return code 2
+ if health_checker and health_checker.get_return_value() != 0:
      logging.error("Health check failed for the applications, Please check; exiting")
-     return health_checker.ret_value
+     return health_checker.get_return_value()

- if kubevirt_checker.ret_value != 0:
+ if kubevirt_checker and kubevirt_checker.get_return_value() != 0:
      logging.error("Kubevirt check still had failed VMIs at end of run, Please check; exiting")
-     return kubevirt_checker.ret_value
+     return kubevirt_checker.get_return_value()
1. Health failures return code 2 📘 Rule violation ⛯ Reliability
Health check failures set/propagate exit code 2, but documented semantics require health check failures to use exit code 3+. This can misclassify health failures as critical-alert exit code 2 for automation/CI consumers.
Agent Prompt
## Issue description
Health check failures currently set/propagate exit code `2`, but compliance requires health check failures to use exit codes `3+`.
## Issue Context
- `HttpHealthCheckPlugin` and `VirtHealthCheckPlugin` set `self.ret_value = 2` on health failures.
- `run_kraken.py` returns the plugin return value directly, so the process may exit with `2` for health check failures.
## Fix Focus Areas
- krkn/health_checks/http_health_check_plugin.py[171-176]
- krkn/health_checks/virt_health_check_plugin.py[426-428]
- run_kraken.py[616-622]
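One possible shape for the fix, as a minimal sketch: translate any non-zero plugin return value into a dedicated health check exit code before the process exits. The constant names, the value `3` (taken only from the "3+" requirement above), the `FakeChecker` stand-in, and the helper `resolve_exit_code` are all hypothetical and not part of krkn.

```python
from typing import Optional

# Hypothetical constants: only "health check failures use 3+" comes from the review.
CRITICAL_ALERT_EXIT_CODE = 2
HEALTH_CHECK_FAILURE_EXIT_CODE = 3


class FakeChecker:
    """Stand-in for a health check plugin exposing get_return_value()."""

    def __init__(self, ret_value: int = 0) -> None:
        self.ret_value = ret_value

    def get_return_value(self) -> int:
        return self.ret_value


def resolve_exit_code(checker: Optional[FakeChecker]) -> int:
    """Map any non-zero plugin return value to the health check exit code."""
    if checker is None or checker.get_return_value() == 0:
        return 0
    return HEALTH_CHECK_FAILURE_EXIT_CODE
```

Centralizing the mapping in one helper keeps the plugins free to use any internal failure flag while guaranteeing that callers never see exit code `2` for a health check failure.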
# Collect health check telemetry
if health_check_worker:
    health_check_worker.join()
    try:
        chaos_telemetry.health_checks = health_check_telemetry_queue.get_nowait()
    except queue.Empty:
        chaos_telemetry.health_checks = None
else:
2. Health threads can deadlock 🐞 Bug ⛯ Reliability
Health check workers terminate only when current_iterations reaches iterations, but the main loop can stop early (STOP/alerts) or use iterations=inf in daemon mode. run_kraken then join()s the worker(s), which can block forever and hang the whole run.
Agent Prompt
### Issue description
Health check threads (HTTP + virt) can run indefinitely and block `run_kraken.py` during shutdown because their termination condition depends solely on `current_iterations >= iterations`. This breaks on early STOP/abort paths and is guaranteed to hang in daemon mode (`iterations = inf`).
### Issue Context
- Main loop can exit early (STOP/alert paths).
- Daemon mode sets iterations to infinity.
- Health check workers are joined at the end, requiring them to terminate.
### Fix Focus Areas
- run_kraken.py[274-285]
- run_kraken.py[378-396]
- run_kraken.py[449-457]
- krkn/health_checks/http_health_check_plugin.py[132-208]
- krkn/health_checks/virt_health_check_plugin.py[299-375]
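A common way to remove this kind of hang, sketched here under assumptions: give each worker a `threading.Event` that the main loop sets on every exit path, and bound the `join()` with a timeout. The function names, the polling interval, and the placeholder probe are hypothetical; the real plugins would run their HTTP or VMI checks inside the loop.

```python
import threading
import time


def health_check_loop(stop_event: threading.Event, interval: float, results: list) -> None:
    """Poll until asked to stop instead of waiting for an iteration count."""
    while not stop_event.is_set():
        results.append("checked")  # placeholder for a real HTTP or VMI probe
        # wait() returns early once the event is set, so shutdown is prompt.
        stop_event.wait(interval)


def run_with_worker() -> bool:
    """Start a worker, simulate the main loop, and shut the worker down cleanly."""
    stop_event = threading.Event()
    results: list = []
    worker = threading.Thread(
        target=health_check_loop, args=(stop_event, 0.01, results), daemon=True
    )
    worker.start()
    try:
        time.sleep(0.05)  # stands in for the main chaos loop, which may exit early
    finally:
        stop_event.set()        # signal shutdown on every exit path
        worker.join(timeout=5)  # bounded join so a stuck worker cannot hang the run
    return not worker.is_alive()
```

Because termination no longer depends on `current_iterations >= iterations`, the same shutdown path works for early STOP, alert aborts, and daemon mode.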
Type of change
Description
Revamps health checks into a plugin architecture so that additional health check types can be added in the future.
Related Tickets & Documents
If no related issue, please create one and start the conversation on wants of
Documentation
If checked, a documentation PR must be created and merged in the website repository.
Related Documentation PR (if applicable)
<-- Add the link to the corresponding documentation PR in the website repository -->
Checklist before requesting a review
[ ] Ensure the changes and proposed solution have been discussed in the relevant issue and have received acknowledgment from the community or maintainers. See contributing guidelines
See testing your changes and run on any Kubernetes or OpenShift cluster to validate your changes
REQUIRED:
Description of combination of tests performed and output of run
OR