

@mdegat01
Contributor

@mdegat01 mdegat01 commented Mar 25, 2025

Proposed change

Report the job stage in which an error occurred. This allows core to distinguish errors that occur during the copy_additional_locations stage from errors that occur while creating the backup itself.

Additionally, tweak the logic of _copy_to_additional_locations so it does not lose track of the copies it successfully made. It will still stop on the first error, but any copies made before that error will be recorded in the backup rather than forgotten, as happens today.
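A minimal sketch of the "stop on first error, but keep what succeeded" behavior described above, assuming a plain mapping of location names to target directories. The function name `copy_to_additional_locations` and the `recorded` dict are hypothetical stand-ins for the private `_copy_to_additional_locations` method and the backup's record of its locations; only the use of `shutil.copy` and the `all_new_locations` name come from the PR summary.

```python
import shutil
from pathlib import Path


def copy_to_additional_locations(
    source: Path,
    targets: dict[str, Path],
    recorded: dict[str, Path],
) -> None:
    """Copy ``source`` into each target, stopping at the first OSError.

    Hypothetical sketch: ``recorded`` stands in for the backup's record of
    known locations. Copies that succeed before a failure are still added
    to it, and the error itself propagates to the caller.
    """
    all_new_locations: dict[str, Path] = {}
    try:
        for name, target in targets.items():
            # shutil.copy returns the path of the new file as a string
            all_new_locations[name] = Path(shutil.copy(source, target))
    finally:
        # Runs on success and on error, so no completed copy is forgotten
        recorded.update(all_new_locations)
```

Using `try`/`finally` instead of catching the exception keeps the error propagating, so the surrounding job can still mark the stage as failed.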

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Summary by CodeRabbit

  • Refactor
    • Streamlined the backup file handling process for more consistent and reliable operations.
  • New Features
    • Enhanced error reporting by including additional context in job and backup processes, improving diagnostics during failure scenarios.
    • Introduced a new attribute stage in the error handling classes for better tracking of job states.
  • Tests
    • Added tests for backup and job error handling to ensure system stability during failure scenarios, including new test functions for error conditions.

@mdegat01 mdegat01 added the new-feature A new feature label Mar 25, 2025
@mdegat01 mdegat01 requested a review from agners March 25, 2025 21:20
@mdegat01 mdegat01 added missing-documentation Added to pull requests that needs a docs, but none is linked needs-client-library Pull requests needs client library changes but none is linked labels Mar 25, 2025
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2025

📝 Walkthrough


The changes update the backup file copying and error reporting mechanisms. In the backup manager module, the _copy_to_additional_locations method now utilizes a new variable, all_new_locations, and directly calls shutil.copy for file operations. In the job module, a new stage attribute is added to the SupervisorJobError class, allowing it to capture additional context during error reporting. New tests have been implemented to simulate copy failures and validate job execution errors, ensuring that error details include the relevant stage information.

Changes

| File(s) | Change Summary |
|---|---|
| supervisor/backups/manager.py | Modified _copy_to_additional_locations: replaced all_locations with all_new_locations and directly uses shutil.copy for copying backup files. |
| supervisor/jobs/__init__.py | Added a new stage attribute to SupervisorJobError and updated its as_dict and capture_error methods to include the stage in error details. |
| tests/api/test_backups.py | Introduced a new test method to simulate errors during backup copying, verifying that error details correctly include the stage information. |
| tests/api/test_jobs.py | Added a new asynchronous test method to validate job execution behavior when an error occurs, ensuring it captures and reports the error stage. |
| tests/jobs/test_job_manager.py | Modified test_notify_on_change to include a "stage" field in error data when capturing errors. |
| tests/backups/test_manager.py | Added a new test method to validate behavior during OS errors in backup operations to multiple locations, checking the health state and recorded locations. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant API as API Client
    participant BM as BackupManager
    participant Copy as shutil.copy
    participant ER as Error Handler
    API->>BM: Initiate backup process
    BM->>Copy: Copy backup file to additional locations
    Copy-->>BM: Raises OSError (simulated)
    BM->>ER: Report error with stage information
    ER-->>BM: Log error, update backup paths
    BM->>API: Return backup details (only original location)
```
```mermaid
sequenceDiagram
    participant Client as API Client
    participant Job as SupervisorJob
    participant Inner as Nested Job Logic
    participant SE as SupervisorJobError
    Client->>Job: Start job execution
    Job->>Job: Set stage "test" and call inner method
    Inner-->>Job: Raise SupervisorError("bad")
    Job->>SE: Capture error with stage "test"
    SE-->>Job: Return error details with stage
    Job->>Client: Provide job info including error and stage
```

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f6d9e6 and d2cb7b9.

📒 Files selected for processing (2)
  • tests/api/test_backups.py (4 hunks)
  • tests/backups/test_manager.py (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
tests/backups/test_manager.py (3)
supervisor/resolution/module.py (1)
  • unhealthy (129-131)
supervisor/backups/backup.py (5)
  • Backup (77-900)
  • name (122-124)
  • location (206-208)
  • slug (112-114)
  • all_locations (211-213)
supervisor/backups/manager.py (2)
  • do_backup_full (571-609)
  • get (106-108)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (4)
tests/backups/test_manager.py (1)

2084-2120: Well-designed test for error handling in multiple locations backup.

This new test properly validates that when an OSError occurs during backup to multiple locations:

  1. The backup is still created in the default location
  2. The additional location is correctly excluded from backup.all_locations
  3. The system health state is properly updated according to the error type
tests/api/test_backups.py (3)

397-401: Good enhancement of error reporting structure.

Adding the stage field to error information provides better context about where in the process the error occurred, which will help with debugging and user feedback.


433-445: Enhanced error reporting with stage information.

The error structure now includes the "stage" field, providing crucial context about where in the backup process the error occurred. This aligns with the PR objective of enhancing error reporting with stage information.


681-729: Good test for error handling during backup copy stage.

This new test validates that when a copy operation to additional locations fails:

  1. The backup is still successfully created in the primary location
  2. Error reporting includes the appropriate "stage" field indicating the failure occurred during "copy_additional_locations"
  3. The job error structure is correctly formatted

This test is valuable as it directly verifies the PR's stated objective of ensuring successful copies are recorded even when errors occur later in the process.


@mdegat01 mdegat01 force-pushed the report-stage-in-job-error branch from efe19c5 to 7b93c4e Compare March 25, 2025 21:25
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/api/test_backups.py (1)

675-676: Consider using the side_effect parameter more specifically.

While the current implementation correctly simulates a failure, you might want to make the test more specific to ensure it only affects copies to the additional location.

```diff
-    with patch("supervisor.backups.manager.shutil.copy", side_effect=OSError):
+    with patch("supervisor.backups.manager.shutil.copy", side_effect=OSError("Permission denied")):
```

This provides a more specific error message that would appear in logs, making debugging easier.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efe19c5 and 7b93c4e.

📒 Files selected for processing (5)
  • supervisor/backups/manager.py (4 hunks)
  • supervisor/jobs/__init__.py (2 hunks)
  • tests/api/test_backups.py (1 hunks)
  • tests/api/test_jobs.py (2 hunks)
  • tests/jobs/test_job_manager.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/jobs/test_job_manager.py
  • tests/api/test_jobs.py
  • supervisor/backups/manager.py
  • supervisor/jobs/__init__.py
🧰 Additional context used
🧬 Code Definitions (1)
tests/api/test_backups.py (4)
tests/conftest.py (3)
  • api_client (488-511)
  • coresys (324-404)
  • backups (654-673)
supervisor/core.py (1)
  • set_state (76-92)
supervisor/backups/backup.py (3)
  • slug (112-114)
  • all_locations (211-213)
  • location (206-208)
supervisor/backups/manager.py (1)
  • get (106-108)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Run tests Python 3.13.2
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
🔇 Additional comments (3)
tests/api/test_backups.py (3)

661-709: Well-structured test for backup failures during copy stage.

This test effectively validates the core objective of the PR - ensuring errors during the copy stage are correctly reported with stage information while preserving successful operations. The test appropriately:

  1. Simulates a failure during the copy process by patching shutil.copy
  2. Verifies the backup exists in the original location
  3. Confirms the error is reported with the correct stage information
  4. Checks that the backup's locations only include the successful copy

690-693: Great verification of the backup state after error.

The test effectively verifies that despite the copy failure, the original backup file still exists and is tracked in the locations dictionary. This validates that successful operations are preserved even when an error occurs during the copy stage.


702-708: Excellent verification of error stage information.

The test properly validates that the error includes the stage information (copy_additional_locations), which is a key requirement from the PR objectives. This ensures that errors can be properly diagnosed by identifying exactly where in the backup process they occurred.

@mdegat01 mdegat01 force-pushed the report-stage-in-job-error branch from 4f975e6 to 8e92e55 Compare March 26, 2025 15:41
@mdegat01 mdegat01 force-pushed the report-stage-in-job-error branch from 8e92e55 to 9f6d9e6 Compare March 26, 2025 18:28

```python
"type": self.type_.__name__,
"message": self.message,
"stage": self.stage,
}
```
Member


This bends the job system to a specific requirement of the Backup system (so far we only have stages for backups). I wonder if it wouldn't be cleaner to just handle and return the error "in-line" in Backup instead of relying on the Job system 🤔

But then, we already leaned in a lot into the Job system, and it seems to work. So 🤷

Contributor Author


We can't return it inline because core isn't listening for that. They do backups asynchronously. This is our only means of communicating status of the backup to core currently.

Also, Backup wasn't the only intended consumer of stage, just the only one so far. For example, we had plans to report status and progress better for Docker actions as well, updates in particular. We just haven't gotten to it yet. There might be other places as well.
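Because core learns the outcome of an asynchronous backup only from the job status, the stage field lets a consumer tell a partial success from an outright failure. A minimal sketch, assuming a job payload shaped as a dict with an `errors` list whose entries carry `type`, `message` and `stage` keys; the function name and the "ok"/"partial"/"failed" labels are hypothetical and do not describe core's actual logic.

```python
from typing import Any

COPY_STAGE = "copy_additional_locations"


def classify_backup_job(job: dict[str, Any]) -> str:
    """Hypothetical consumer-side check of a finished backup job."""
    errors = job.get("errors") or []
    if not errors:
        return "ok"
    if all(error.get("stage") == COPY_STAGE for error in errors):
        # The backup itself was created; only the extra copies failed
        return "partial"
    return "failed"
```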

@mdegat01 mdegat01 merged commit 9222a3c into main Mar 27, 2025
22 checks passed
@mdegat01 mdegat01 deleted the report-stage-in-job-error branch March 27, 2025 14:07
@github-actions github-actions bot locked and limited conversation to collaborators Mar 29, 2025
@mdegat01 mdegat01 removed the needs-client-library Pull requests needs client library changes but none is linked label Aug 26, 2025
@mdegat01 mdegat01 removed the missing-documentation Added to pull requests that needs a docs, but none is linked label Sep 8, 2025