@marceloneppel
Member

Issue

Solution

Previously, the update_status hook only checked whether the PostgreSQL Pebble service was active, so it could miss cases where the service appeared to be running while its health check was failing. This change extends the restart logic to also monitor the Pebble health check status and to collect diagnostic information when a problem is detected.

Changes:

  • Monitor Pebble health check status in addition to service status
  • Trigger restart when health check is DOWN even if service is ACTIVE
  • Add _get_postgresql_startup_logs() to collect diagnostic information from Pebble logs and PostgreSQL logs before and after restart attempts
  • Add comprehensive unit tests for health check monitoring and error handling
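
The restart decision described in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in src/charm.py: the enums are local stand-ins for the Pebble service and check states, `should_restart` is a hypothetical name, and treating a missing check as "no restart" is an assumption.

```python
from enum import Enum
from typing import Optional


class ServiceStatus(Enum):
    """Local stand-in for Pebble service states (illustration only)."""

    ACTIVE = "active"
    INACTIVE = "inactive"
    ERROR = "error"


class CheckStatus(Enum):
    """Local stand-in for Pebble health check states (illustration only)."""

    UP = "up"
    DOWN = "down"


def should_restart(service: ServiceStatus, check: Optional[CheckStatus]) -> bool:
    """Return True when the PostgreSQL service should be restarted.

    Hypothetical helper: `check` is None when no health check result is
    available, which this sketch assumes does not trigger a restart.
    """
    # Earlier behaviour: only the service status was considered.
    if service is not ServiceStatus.ACTIVE:
        return True
    # Added behaviour: an ACTIVE service with a DOWN health check also restarts.
    return check is CheckStatus.DOWN
```

Keeping the decision in a pure function like this makes the ACTIVE-but-DOWN case easy to cover in unit tests without a running Pebble instance.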

This improves observability and helps diagnose PostgreSQL restart issues by capturing relevant logs at the time of failure.
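
One way to capture logs around a restart attempt is sketched below. It only illustrates the "before and after" pattern: `restart_with_diagnostics`, `collect_logs`, and `restart` are hypothetical names, and the real `_get_postgresql_startup_logs()` may gather and report its data differently.

```python
from typing import Callable, Dict, Optional


def restart_with_diagnostics(
    collect_logs: Callable[[], str],
    restart: Callable[[], None],
) -> Dict[str, Optional[str]]:
    """Run a restart and record log snapshots taken before and after it.

    Both callables are assumptions supplied by the caller: `collect_logs`
    returns the current log text and `restart` performs the restart.
    """
    report: Dict[str, Optional[str]] = {
        "before": collect_logs(),  # state at the time the problem was detected
        "error": None,
        "after": None,
    }
    try:
        restart()
    except Exception as exc:  # broad on purpose: any failure is worth recording
        report["error"] = str(exc)
    # Take the second snapshot whether or not the restart succeeded.
    report["after"] = collect_logs()
    return report
```

Passing the collector and the restart action in as callables keeps the sketch independent of any particular Pebble client and lets tests substitute fakes.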

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

… logs


Signed-off-by: Marcelo Henrique Neppel <[email protected]>
@github-actions github-actions bot added the Libraries: Out of sync The charm libs used are out-of-sync label Dec 19, 2025
@codecov

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 67.85714% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.30%. Comparing base (1ebe727) to head (a19a02a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/charm.py | 67.85% | 8 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1192      +/-   ##
==========================================
- Coverage   73.34%   73.30%   -0.04%     
==========================================
  Files          15       15              
  Lines        3916     3944      +28     
  Branches      574      576       +2     
==========================================
+ Hits         2872     2891      +19     
- Misses        830      838       +8     
- Partials      214      215       +1     


…ption-a-health-check

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
@taurus-forever
Contributor

Nice PR and interesting findings right out of the box: ERROR cannot start service while killing

@marceloneppel
Member Author

> Nice PR and interesting findings right out of the box: ERROR cannot start service while killing

Hopefully, this PR is the solution for the issue that we saw when upgrading PG to 14.20. I believe the new PG behaves slightly differently, and Pebble from Juju 2.9 is not correctly detecting that Patroni has stopped (it seems that Pebble sees PG still running and therefore considers the Patroni service, the parent of the PG process, still active).

Regarding the finding you mentioned, I believe that error may be a side effect of the fix added by this PR. It will need more investigation.
