Skip to content

[DPE-8865] Extract error messages#1320

Merged
marceloneppel merged 9 commits into16/edgefrom
dpe-8865-extract-error-messages
Dec 16, 2025
Merged

[DPE-8865] Extract error messages#1320
marceloneppel merged 9 commits into16/edgefrom
dpe-8865-extract-error-messages

Conversation

@marceloneppel
Copy link
Member

@marceloneppel marceloneppel commented Nov 28, 2025

Issue

Sometimes the error message from pgBackRest is sent to stdout instead of stderr (and we don't log them, as seen in #1280).

Solution

Check both stdout and stderr for errors and warnings. Debugging with the Field Engineering team before the sprint in Gothenburg, I figured out that some errors are reported as warnings (one case is when the replica cannot communicate with the primary through the pgBackRest TLS server because its IP is not in the SANs list, which was already fixed by #1162).

Fixes #1280

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

Sometimes the error message is sent to the stdout instead of the stderr, so it's important to check both

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@github-actions github-actions bot added the Libraries: OK The charm libs used are OK and in-sync label Nov 28, 2025
@codecov
Copy link

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.60%. Comparing base (fbfd2e8) to head (a9f6fdc).
⚠️ Report is 5 commits behind head on 16/edge.

Files with missing lines Patch % Lines
src/backups.py 87.09% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           16/edge    #1320      +/-   ##
===========================================
+ Coverage    70.46%   70.60%   +0.13%     
===========================================
  Files           16       16              
  Lines         4249     4276      +27     
  Branches       684      689       +5     
===========================================
+ Hits          2994     3019      +25     
- Misses        1045     1047       +2     
  Partials       210      210              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@marceloneppel marceloneppel added the bug Something isn't working as expected label Nov 28, 2025
@marceloneppel marceloneppel marked this pull request as ready for review December 4, 2025 12:47
taurus-forever
taurus-forever previously approved these changes Dec 9, 2025
@taurus-forever
Copy link
Contributor

LGTM, it looks like we are missing here # Test when the failure in the stanza check is due to an archive timeout. from 14 branch: https://github.com/canonical/postgresql-operator/pull/1328/files#diff-08e542b8e478d7c63342b8b5958a3f2987418c1cde80604661aba4ffd2100fceR893-R900

Concatenate multiple ERROR/WARN lines with "; " instead of " " for
improved readability when debugging backup issues.
Updated unit test to match new separator format.
Cherry-picked from #1328 (commit 1f1b601).

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel
Copy link
Member Author

LGTM, it looks like we are missing here # Test when the failure in the stanza check is due to an archive timeout. from 14 branch: https://github.com/canonical/postgresql-operator/pull/1328/files#diff-08e542b8e478d7c63342b8b5958a3f2987418c1cde80604661aba4ffd2100fceR893-R900

Yes, we're missing. I initially created this PR to solve the issue it solves, then I ported it's code to the PG 14 branch's PR. I created #1346 to port it.

@github-actions github-actions bot added Libraries: Out of sync The charm libs used are out-of-sync and removed Libraries: OK The charm libs used are OK and in-sync labels Dec 12, 2025
@marceloneppel marceloneppel force-pushed the dpe-8865-extract-error-messages branch from f6ff193 to 1592518 Compare December 12, 2025 13:59
taurus-forever
taurus-forever previously approved these changes Dec 15, 2025
Copy link
Contributor

@taurus-forever taurus-forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Thank you!

onkolahmet
onkolahmet previously approved these changes Dec 15, 2025
Copy link
Contributor

@onkolahmet onkolahmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Marcelo!

Copy link
Contributor

@carlcsaposs-canonical carlcsaposs-canonical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if https://pgbackrest.org/configuration.html#section-log is set, would just using stderr (or just using stdout) be sufficient?

it appears that it might be possible to configure stderr to output warning messages (--log-level-stderr warn) and that including stderr + stdout might lead to log duplication

(--log-level-console refers to stdout—ref: pgbackrest/pgbackrest#569 (comment))

by default, it appears that --log-level-console is warn and --log-level-stderr is off

Port archive timeout error handling (error code 82) from PR #1328 to
allow users to fix network issues and retry with `juju resolve`.
When archive operations timeout, the charm enters error state instead
of blocked state, enabling recovery via juju resolve.
Fixes #1346

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel
Copy link
Member Author

if https://pgbackrest.org/configuration.html#section-log is set, would just using stderr (or just using stdout) be sufficient?

it appears that it might be possible to configure stderr to output warning messages (--log-level-stderr warn) and that including stderr + stdout might lead to log duplication

(--log-level-console refers to stdout—ref: pgbackrest/pgbackrest#569 (comment))

by default, it appears that --log-level-console is warn and --log-level-stderr is off

I think you're right, @carlcsaposs-canonical! I'll test updating those config options.

marceloneppel added a commit that referenced this pull request Dec 15, 2025
Addresses code review feedback from PR #1320 (review #3577653265) about
potential log duplication when checking both stdout and stderr for errors.

Changes:
- Add PGBACKREST_LOG_LEVEL_STDERR constant (--log-level-stderr=warn) to
  ensure all pgBackRest commands output errors/warnings to stderr
- Update all 9 pgBackRest command invocations to include the new flag
- Simplify _extract_error_message() to only check stderr (removes stdout
  parameter entirely)
- Update all 8 call sites and 7 unit tests to use new single-parameter API
- Remove redundant test for stdout handling

This approach eliminates the risk of log duplication while maintaining
predictable error extraction. All errors and warnings now consistently
appear in stderr, making error handling more reliable.

Reference: https://pgbackrest.org/configuration.html#section-log
marceloneppel added a commit that referenced this pull request Dec 15, 2025
Addresses code review feedback from PR #1320 (review #3577653265) about
potential log duplication when checking both stdout and stderr for errors.

Changes:
- Add PGBACKREST_LOG_LEVEL_STDERR constant (--log-level-stderr=warn) to
  ensure all pgBackRest commands output errors/warnings to stderr
- Update all 9 pgBackRest command invocations to include the new flag
- Simplify _extract_error_message() to only check stderr (removes stdout
  parameter entirely)
- Update all 8 call sites and 7 unit tests to use new single-parameter API
- Remove redundant test for stdout handling

This approach eliminates the risk of log duplication while maintaining
predictable error extraction. All errors and warnings now consistently
appear in stderr, making error handling more reliable.

Reference: https://pgbackrest.org/configuration.html#section-log
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel force-pushed the dpe-8865-extract-error-messages branch from 9db9b1e to 13a9ac4 Compare December 15, 2025 13:46
Addresses code review feedback from PR #1320 (review #3577653265) about
potential log duplication when checking both stdout and stderr for errors.

Changes:
- Add PGBACKREST_LOG_LEVEL_STDERR constant (--log-level-stderr=warn) to
  ensure all pgBackRest commands output errors/warnings to stderr
- Update all 9 pgBackRest command invocations to include the new flag
- Simplify _extract_error_message() to only check stderr (removes stdout
  parameter entirely)
- Update all 8 call sites and 7 unit tests to use new single-parameter API
- Remove redundant test for stdout handling

This approach eliminates the risk of log duplication while maintaining
predictable error extraction. All errors and warnings now consistently
appear in stderr, making error handling more reliable.

Reference: https://pgbackrest.org/configuration.html#section-log
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel force-pushed the dpe-8865-extract-error-messages branch from 1c3eb04 to 58e049a Compare December 15, 2025 13:55
This reverts commit 13a9ac4.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
This reverts commit 1c3eb04.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel force-pushed the dpe-8865-extract-error-messages branch from 58e049a to d3a46b6 Compare December 15, 2025 13:56
- Add --log-level-stderr=warn to test_initialise_stanza expected command to
  match implementation from commit d3a46b6
- Replace unused stdout variables with '_' in check_stanza,
  _is_primary_pgbackrest_service_running, and _on_restore_action
- Add --log-level-console=debug to backup command for detailed output
Fixes assertion error in tests/unit/test_backups.py::test_initialise_stanza

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel
Copy link
Member Author

Thanks, @carlcsaposs-canonical! I updated the PR to turn on logging on stderr and use that log instead of both.

@marceloneppel marceloneppel merged commit dad1b02 into 16/edge Dec 16, 2025
248 of 253 checks passed
@marceloneppel marceloneppel deleted the dpe-8865-extract-error-messages branch December 16, 2025 22:32
a-velasco pushed a commit that referenced this pull request Jan 15, 2026
* Extract warning and error messages from pgBackRest logs

Sometimes the error message is sent to the stdout instead of the stderr, so it's important to check both

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Use semicolon separator for pgBackRest error messages

Concatenate multiple ERROR/WARN lines with "; " instead of " " for
improved readability when debugging backup issues.
Updated unit test to match new separator format.
Cherry-picked from #1328 (commit 1f1b601).

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* [DPE-9070] Handle pgBackRest archive timeout in check_stanza (#1350)

Port archive timeout error handling (error code 82) from PR #1328 to
allow users to fix network issues and retry with `juju resolve`.
When archive operations timeout, the charm enters error state instead
of blocked state, enabling recovery via juju resolve.
Fixes #1346

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Standardize pgBackRest logging to use stderr consistently

Addresses code review feedback from PR #1320 (review #3577653265) about
potential log duplication when checking both stdout and stderr for errors.

Changes:
- Add PGBACKREST_LOG_LEVEL_STDERR constant (--log-level-stderr=warn) to
  ensure all pgBackRest commands output errors/warnings to stderr
- Update all 9 pgBackRest command invocations to include the new flag
- Simplify _extract_error_message() to only check stderr (removes stdout
  parameter entirely)
- Update all 8 call sites and 7 unit tests to use new single-parameter API
- Remove redundant test for stdout handling

This approach eliminates the risk of log duplication while maintaining
predictable error extraction. All errors and warnings now consistently
appear in stderr, making error handling more reliable.

Reference: https://pgbackrest.org/configuration.html#section-log
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Revert "Standardize pgBackRest logging to use stderr consistently"

This reverts commit 13a9ac4.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Reapply "Standardize pgBackRest logging to use stderr consistently"

This reverts commit 1c3eb04.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix test_initialise_stanza and cleanup unused variables

- Add --log-level-stderr=warn to test_initialise_stanza expected command to
  match implementation from commit d3a46b6
- Replace unused stdout variables with '_' in check_stanza,
  _is_primary_pgbackrest_service_running, and _on_restore_action
- Add --log-level-console=debug to backup command for detailed output
Fixes assertion error in tests/unit/test_backups.py::test_initialise_stanza

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: andreia <andreia.velasco@canonical.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working as expected Libraries: Out of sync The charm libs used are out-of-sync

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants