
Conversation


@mdegat01 mdegat01 commented Dec 4, 2024

Proposed change

Follow-up to #5438, with a lot of changes based on subsequent discussions:

  1. No changes to bind mounts at all; that part is entirely backed out. Core will communicate with Supervisor entirely over the API.
  2. The backup APIs and the upload API take a list of locations. Supervisor will actually make the backup in the first one and then copy it to the others (see the sketch after this list).
  3. Partial backup reload is gone; it makes no sense with Core using the API for everything.
  4. Clients can add a dictionary of extra metadata to backups. Supervisor won't validate this other than checking that it is a dictionary. It is included with the backup metadata so clients can keep track of info only they understand about how a backup was made.
  5. Fixed up the multi-location handling in backups to make it more intuitive: we're consolidating backups, not adding locations. A backup tracks all of its paths, although currently nothing can be done with that; picking a location to restore from will be supported in the future. Also added a sort so multi-location backups are consistent across reloads.
  6. Added a task to clean up backups older than two days in the cloud backup folder, as it is intended for transient use.
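
To illustrate items 2 and 4, a request against the partial backup endpoint could look roughly like this (a sketch only; key names follow the schemas in supervisor/api/backups.py, and the client setup is assumed):

import aiohttp

async def make_backup(session: aiohttp.ClientSession) -> dict:
    """Sketch: backup made in the first location, then copied to the others."""
    resp = await session.post(
        "http://supervisor/backups/new/partial",
        json={
            "name": "nightly",
            "folders": ["ssl"],
            # First entry is where the backup is created; the rest get copies.
            # None means local storage; ".cloud_backup" is restricted to Core.
            "location": [None, ".cloud_backup"],
            # Opaque client metadata; Supervisor only checks it is a dictionary.
            "extra": {"client": "example-client", "reason": "nightly"},
        },
    )
    return await resp.json()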

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced backup and restore functionalities, allowing multiple backup locations.
    • Added support for extra metadata during backup creation.
    • Introduced a scheduled task for automatic cleanup of old backups.
    • Improved Docker add-on management with refined mount handling.
  • Bug Fixes

    • Improved error handling and logging throughout backup management processes.
    • Removed outdated mount configurations for Home Assistant, simplifying setup.
  • Tests

    • Expanded test coverage for backup uploads, multiple locations, and metadata handling.
    • Streamlined tests by removing less relevant cases.

These updates enhance the flexibility and robustness of the backup management system, improving user experience and reliability.

@mdegat01 mdegat01 added the new-feature A new feature label Dec 4, 2024
@mdegat01 mdegat01 requested a review from agners December 4, 2024 20:35
@mdegat01 mdegat01 force-pushed the cloud-backup-support-part-2 branch from 2463543 to ee9ac4a on December 4, 2024 20:37

coderabbitai bot commented Dec 4, 2024

📝 Walkthrough

The pull request introduces significant updates to the backup and restore functionalities within the Supervisor API. Key enhancements include the ability to manage multiple backup locations, the addition of new helper functions, and modifications to existing methods to accommodate these changes. New constants and exception classes are introduced, while existing constants and methods are refined or removed to streamline operations. Additionally, the test suite is expanded and updated to reflect these changes, ensuring comprehensive coverage of the new functionalities.

Changes

File Path Change Summary
supervisor/api/backups.py Added _ensure_list method; updated schemas and methods for multiple locations; simplified reload method; modified _location_to_mount and _validate_cloud_backup_location methods.
supervisor/api/const.py Added ATTR_ADDITIONAL_LOCATIONS constant.
supervisor/backups/backup.py Added location_sort_key method; updated constructor and properties; added extra property and consolidate method; refined error handling.
supervisor/backups/const.py Added COPY_ADDITONAL_LOCATIONS enum member.
supervisor/backups/manager.py Changed list_backups return type; added error handling; updated method signatures for backup consolidation and additional locations handling.
supervisor/backups/validate.py Added ATTR_EXTRA variable; updated SCHEMA_BACKUP to include ATTR_EXTRA.
supervisor/const.py Added ATTR_EXTRA constant.
supervisor/docker/addon.py Updated mounts property for additional logic; enhanced error handling in methods.
supervisor/docker/homeassistant.py Removed PATH_BACKUP and PATH_CLOUD_BACKUP variables; updated timeout property.
supervisor/exceptions.py Added BackupDataDiskBadMessageError class.
supervisor/homeassistant/const.py Removed BACKUP_COMPLETE constant.
supervisor/homeassistant/websocket.py Updated MIN_VERSION for WSType.BACKUP_COMPLETE; refined error handling.
supervisor/mounts/manager.py Removed backup mount handling in methods.
supervisor/mounts/mount.py Removed PATH_BACKUP import; modified properties and methods related to mounts.
tests/api/test_backups.py Added tests for upload/download and multiple locations; modified existing tests.
tests/api/test_mounts.py Adjusted user_path assertions and error handling in tests.
tests/backups/test_manager.py Added assertions for all_locations; removed WebSocket message tests.
tests/docker/test_homeassistant.py Removed backup mount configurations in tests.
tests/resolution/fixup/test_mount_execute_remove.py Adjusted active_state and expected service calls in tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BackupManager
    participant Backup
    participant Storage

    User->>BackupManager: Request Backup
    BackupManager->>Backup: Create Backup
    Backup->>Storage: Store Backup Data
    Backup->>BackupManager: Return Backup Info
    BackupManager->>User: Return Backup Success



@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
tests/api/test_mounts.py (1)

Line range hint 1-799: Add test coverage for new backup features

While the test file thoroughly covers the removal of bind mounts and read-only mount functionality, there appear to be gaps in test coverage for key PR objectives:

  1. Multi-location backup support
  2. Extra metadata functionality
  3. Backup location consolidation

Consider adding test cases to cover these new features to ensure they work as expected.
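
A hedged sketch of such a test (fixture names and the response shape are assumptions based on the existing suite):

async def test_backup_to_multiple_locations_api(api_client, coresys):
    """Sketch: backup is made locally, then copied to the cloud backup folder."""
    resp = await api_client.post(
        "/backups/new/partial",
        json={"name": "multi", "folders": ["ssl"], "location": [None, ".cloud_backup"]},
    )
    result = await resp.json()
    assert result["result"] == "ok"

    backup = coresys.backups.get(result["data"]["slug"])
    # all_locations maps each location name to the tarfile path stored there
    assert backup.all_locations.keys() == {None, ".cloud_backup"}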

supervisor/api/backups.py (1)

243-243: Typographical Correction in Error Message

In the error message on line 243, consider correcting "cannot backup to there" to "cannot back up to there" for proper grammar.

Apply this diff to fix the typo:

-                    f"Mount {mount.name} is not used for backups, cannot backup to there"
+                    f"Mount {mount.name} is not used for backups, cannot back up to there"
supervisor/backups/validate.py (1)

136-136: Consider adding size limits to extra metadata dictionary.

While allowing arbitrary dictionaries aligns with the PR objectives, consider adding size limits to prevent potential memory issues.

Example implementation:

-        vol.Optional(ATTR_EXTRA, default=dict): dict,
+        vol.Optional(ATTR_EXTRA, default=dict): vol.All(
+            dict,
+            vol.Length(max=1000),  # Limit number of entries
+            vol.Schema({
+                str: vol.All(  # Ensure keys are strings
+                    vol.Any(str, int, float, bool, dict, list),
+                    # Add custom validator to limit nested structure depth
+                    # and total size if needed
+                )
+            })
+        ),
🛑 Comments failed to post (3)
supervisor/backups/backup.py (1)

245-248: 🛠️ Refactor suggestion

Equality Method May Need to Include All Relevant Attributes

The __eq__ method compares backups based solely on their _data attribute. To ensure backups are truly identical, consider including _locations in the comparison.

Update the method to include _locations:

 def __eq__(self, other: Any) -> bool:
     """Return true if backups have same metadata and locations."""
     return (
         isinstance(other, Backup)
         and self._data == other._data
+        and self._locations == other._locations
     )
📝 Committable suggestion


    def __eq__(self, other: Any) -> bool:
        """Return true if backups have same metadata and locations."""
        return (
            isinstance(other, Backup)
            and self._data == other._data
            and self._locations == other._locations
        )
supervisor/backups/manager.py (1)

368-369: ⚠️ Potential issue

Incorrect Dictionary Key in Backup Consolidation Check

In lines 368-369, self._backups[backup] should be self._backups[backup.slug] because self._backups uses slugs as keys. Using the Backup object directly will result in a KeyError.

Apply this diff to fix the issue:

 if (
     backup.slug in self._backups
-    and backup.all_locations != self._backups[backup].all_locations
+    and backup.all_locations != self._backups[backup.slug].all_locations
 ):
📝 Committable suggestion


            backup.slug in self._backups
            and backup.all_locations != self._backups[backup.slug].all_locations
supervisor/backups/const.py (1)

26-26: ⚠️ Potential issue

Typographical Error in Enum Member Name

There's a typo in the enum member name COPY_ADDITONAL_LOCATIONS. It should be COPY_ADDITIONAL_LOCATIONS to match the correct spelling of "additional".

Apply this diff to correct the typo:

-COPY_ADDITONAL_LOCATIONS = "copy_additional_locations"
+COPY_ADDITIONAL_LOCATIONS = "copy_additional_locations"

Ensure to update all references to this enum member throughout the codebase.

📝 Committable suggestion


    COPY_ADDITIONAL_LOCATIONS = "copy_additional_locations"


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
supervisor/api/backups.py (1)

434-449: Consider adding error handling for temporary directory creation.

While the implementation is good, consider adding explicit error handling for temporary directory creation failures.

 if location and location != LOCATION_CLOUD_BACKUP:
     tmp_path = location.local_where
+    if not tmp_path.exists():
+        try:
+            tmp_path.mkdir(parents=True, exist_ok=True)
+        except OSError as err:
+            raise APIError(f"Failed to create temporary directory: {err}")
supervisor/backups/manager.py (1)

288-331: Consider adding retry mechanism for network-related failures.

While the error handling is robust, network-related failures during backup copying could be transient. Consider implementing a retry mechanism with exponential backoff for network operations.

Example implementation:

from functools import wraps
from pathlib import Path
from shutil import copy
import time

def retry_on_network_error(retries=3, backoff_factor=2):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except OSError as err:
                    last_exception = err
                    if attempt < retries - 1:
                        time.sleep(backoff_factor ** attempt)
                    continue
            raise last_exception
        return wrapper
    return decorator

@retry_on_network_error()
def copy_to_location(src: Path, dst: Path) -> Path:
    return Path(copy(src, dst))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2463543 and ee9ac4a.

📒 Files selected for processing (19)
  • supervisor/api/backups.py (12 hunks)
  • supervisor/api/const.py (1 hunks)
  • supervisor/backups/backup.py (10 hunks)
  • supervisor/backups/const.py (1 hunks)
  • supervisor/backups/manager.py (15 hunks)
  • supervisor/backups/validate.py (2 hunks)
  • supervisor/const.py (1 hunks)
  • supervisor/docker/addon.py (0 hunks)
  • supervisor/docker/homeassistant.py (0 hunks)
  • supervisor/exceptions.py (1 hunks)
  • supervisor/homeassistant/const.py (0 hunks)
  • supervisor/homeassistant/websocket.py (0 hunks)
  • supervisor/mounts/manager.py (0 hunks)
  • supervisor/mounts/mount.py (1 hunks)
  • tests/api/test_backups.py (3 hunks)
  • tests/api/test_mounts.py (5 hunks)
  • tests/backups/test_manager.py (5 hunks)
  • tests/docker/test_homeassistant.py (0 hunks)
  • tests/resolution/fixup/test_mount_execute_remove.py (1 hunks)
💤 Files with no reviewable changes (6)
  • supervisor/homeassistant/const.py
  • supervisor/mounts/manager.py
  • supervisor/docker/homeassistant.py
  • tests/docker/test_homeassistant.py
  • supervisor/homeassistant/websocket.py
  • supervisor/docker/addon.py
✅ Files skipped from review due to trivial changes (1)
  • supervisor/const.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • supervisor/api/const.py
  • supervisor/exceptions.py
  • supervisor/backups/const.py
  • supervisor/backups/validate.py
  • tests/api/test_mounts.py
  • tests/resolution/fixup/test_mount_execute_remove.py
  • tests/backups/test_manager.py
🔇 Additional comments (16)
supervisor/api/backups.py (6)

72-76: LGTM! Helper function implementation is clean and effective.

The _ensure_list helper function provides a clean way to handle both single values and lists, ensuring backward compatibility.
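
The helper is small enough to quote in full (as added in this PR, before the None-handling refinement suggested later in this review):

def _ensure_list(item: Any) -> list:
    """Ensure value is a list."""
    if not isinstance(item, list):
        return [item]
    return item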


100-102: Schema changes effectively support new features.

The schema updates appropriately handle:

  1. Multiple backup locations through _ensure_list
  2. Extra metadata through the new ATTR_EXTRA field

Also applies to: 105-105


233-247: LGTM! Location to mount conversion is well-implemented.

The method properly handles:

  • Cloud backup locations
  • Mount validation
  • Clear error messages

301-313: Multiple location handling is well-implemented.

The implementation correctly:

  1. Validates all locations
  2. Sets primary location
  3. Stores additional locations

463-466: LGTM! Error handling improvement for bad messages.

The error handling properly:

  1. Checks for specific error conditions
  2. Updates system health status
  3. Maintains proper error logging

Line range hint 253-264: Verify cloud backup location validation.

The validation now handles lists of locations, but we should verify this doesn't break existing functionality.

✅ Verification successful

Let me gather more information about how the cloud backup location is used in the codebase.


Let me check how the backup location was handled before this change and look for test cases with a different pattern.


Based on the comprehensive analysis of the code and tests, I can now provide the final response:

Cloud backup location validation change is safe and well-tested.

The validation change to handle lists of locations is part of a larger feature to support multiple backup locations. The changes are safe because:

  • The validation maintains backward compatibility by converting single locations to a list
  • The core validation logic remains unchanged - only Home Assistant can use cloud backup locations
  • The feature is well-tested as evidenced by test cases like test_cloud_backup_core_only and test_backup_to_multiple_locations
  • All existing usages of the validation function have been updated to handle the new list format
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any direct usage of cloud backup location validation
rg -A 3 "_validate_cloud_backup_location" 

Length of output: 2511


Script:

#!/bin/bash
# Check for the definition and usage of LOCATION_CLOUD_BACKUP constant
rg -A 2 "LOCATION_CLOUD_BACKUP\s*=" 

# Check how location is used in the backup-related code
ast-grep --pattern 'location: list[str | None] | str | None'

# Look for any test cases related to cloud backup location validation
rg -l "test.*cloud.*backup.*location|test.*backup.*cloud.*location"

Length of output: 296


Script:

#!/bin/bash
# Check git history for the file to see the previous implementation
git show HEAD -- supervisor/api/backups.py

# Look for backup-related test files
rg -l "test.*backup" tests/

# Check test assertions related to backup locations
rg -A 3 "def test.*backup" tests/

Length of output: 26090

supervisor/mounts/mount.py (1)

33-33: LGTM! Simplified imports align with architectural changes.

The removal of backup-related imports aligns with the PR objective of removing bind mounts and moving to API-only communication.

tests/api/test_backups.py (3)

542-570: LGTM! Comprehensive upload/download test implementation.

The test effectively:

  1. Validates the upload process
  2. Verifies backup integrity
  3. Confirms download functionality

577-607: LGTM! Multiple location backup test is well-structured.

The test thoroughly validates:

  1. Backup creation in multiple locations
  2. File existence verification
  3. Location tracking

612-638: LGTM! Extra metadata test provides good coverage.

The test effectively verifies:

  1. Extra metadata storage
  2. Retrieval functionality
  3. Data integrity
supervisor/backups/backup.py (5)

65-67: LGTM: Clean implementation of location sorting.

The function provides a clean way to sort locations while handling None values appropriately.
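
The function itself isn't quoted in this summary; a plausible shape, assuming None (the default location) should sort first, would be:

def location_sort_key(value: str | None) -> str:
    """Sort key for backup locations, mapping None to the empty string."""
    return value if value else ""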


198-200: LGTM: Enhanced location tracking with paths.

The change from returning a set to a dictionary improves functionality by preserving the mapping between locations and their paths.


207-214: LGTM: Proper handling of location sorting and cloud backup.

The implementation correctly handles location sorting and cloud backup exclusion logic, aligning with the PR objectives for location consolidation.


249-259: LGTM: Robust backup consolidation implementation.

The consolidation logic properly validates both slug and metadata matches before merging, with appropriate error handling.
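
A sketch of the consolidation contract described here, with hypothetical error messages (the real method lives in supervisor/backups/backup.py):

def consolidate(self, backup: "Backup") -> None:
    """Merge the locations of another copy of this backup into this one (sketch)."""
    if self.slug != backup.slug:
        raise BackupInvalidError(
            f"Backup {backup.slug} is not the same backup as {self.slug}"
        )
    if self._data != backup._data:  # metadata must match before merging
        raise BackupInvalidError(
            f"Backup in {backup.location} has slug {self.slug} but different metadata"
        )
    self._locations.update(backup.all_locations)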


177-180: LGTM: Client metadata implementation matches requirements.

The implementation aligns with the PR objectives, allowing clients to store arbitrary metadata as a dictionary.

✅ Verification successful

LGTM: Schema validation properly handles client metadata

The implementation is correctly validated in the schema:

  • ATTR_EXTRA is defined as an optional field with dictionary type
  • Default value is set to dict
  • Parent schema allows extra fields with extra=vol.ALLOW_EXTRA
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify ATTR_EXTRA is properly defined in schema
ast-grep --pattern 'SCHEMA_BACKUP = vol.Schema({
  $$$
  ATTR_EXTRA: vol.Any(dict, None),
  $$$
})'

Length of output: 101


Script:

#!/bin/bash
# Search for ATTR_EXTRA constant definition
rg "ATTR_EXTRA\s*=" -A 1

# Search for schema definition containing ATTR_EXTRA
rg "SCHEMA_BACKUP.*=.*Schema" -A 10

Length of output: 2098


Script:

#!/bin/bash
# Let's check the complete schema in validate.py to ensure ATTR_EXTRA is properly defined
rg "SCHEMA_BACKUP = vol\.Schema" -A 20 supervisor/backups/validate.py

Length of output: 973


Script:

#!/bin/bash
# Let's check the complete schema definition to find ATTR_EXTRA
# Using more context lines since the schema might be longer
rg "SCHEMA_BACKUP = vol\.Schema" -A 40 supervisor/backups/validate.py

Length of output: 1807


Script:

#!/bin/bash
# Let's check even more lines of the schema to find ATTR_EXTRA
# and also look for any schema extra validation settings
rg "SCHEMA_BACKUP = vol\.Schema" -A 60 supervisor/backups/validate.py

Length of output: 2338

supervisor/backups/manager.py (1)

476-478: LGTM: Consistent implementation of backup parameters.

The changes to backup methods properly implement client metadata and additional locations support, maintaining consistency across full and partial backups.

Also applies to: 490-490, 500-500, 526-528, 551-551, 564-569

Comment on lines 332 to 384
 @Job(name="backup_manager_import_backup")
 async def import_backup(
     self,
     tar_file: Path,
+    location: Mount | Literal[LOCATION_CLOUD_BACKUP] | None = None,
+    additional_locations: list[Mount | Literal[LOCATION_CLOUD_BACKUP] | None]
+    | None = None,
 ) -> Backup | None:
     """Check backup tarfile and import it."""
     backup = Backup(self.coresys, tar_file, "temp", None)

     # Read meta data
     if not await backup.load():
         return None

-    # Already exists?
-    if backup.slug in self._backups:
-        _LOGGER.warning("Backup %s already exists! overwriting", backup.slug)
-        self.remove(self.get(backup.slug))
-
-    # Move backup to backup
-    tar_origin = Path(self.sys_config.path_backup, f"{backup.slug}.tar")
+    # Move backup to destination folder
+    tar_origin = Path(self._get_base_path(location), f"{backup.slug}.tar")
     try:
         backup.tarfile.rename(tar_origin)

     except OSError as err:
-        if err.errno == errno.EBADMSG:
+        if err.errno == errno.EBADMSG and location in {LOCATION_CLOUD_BACKUP, None}:
             self.sys_resolution.unhealthy = UnhealthyReason.OSERROR_BAD_MESSAGE
         _LOGGER.error("Can't move backup file to storage: %s", err)
         return None

     # Load new backup
     backup = Backup(self.coresys, tar_origin, backup.slug, None, backup.data)
     if not await backup.load():
+        # Remove invalid backup from location it was moved to
+        backup.tarfile.unlink()
         return None
     _LOGGER.info("Successfully imported %s", backup.slug)

-    self._backups[backup.slug] = backup
+    # Already exists?
+    if (
+        backup.slug in self._backups
+        and backup.all_locations != self._backups[backup].all_locations
+    ):
+        _LOGGER.warning("Backup %s already exists! consolidating", backup.slug)
+        try:
+            self._backups[backup.slug].consolidate(backup)
+        except BackupInvalidError as err:
+            backup.tarfile.unlink()
+            raise BackupInvalidError(
+                f"Cannot import backup {backup.slug} due to: {err!s}", _LOGGER.error
+            ) from err
+    else:
+        self._backups[backup.slug] = backup
+
+    if additional_locations:
+        await self._copy_to_additional_locations(backup, additional_locations)


🛠️ Refactor suggestion

Consider using atomic operations for backup import.

The backup import process involves multiple file operations. Consider making these operations atomic to prevent partial imports in case of failures.

Example implementation:

from pathlib import Path

from atomicwrites import atomic_write

def atomic_copy(src: Path, dst: Path) -> Path:
    # Binary mode: we are copying raw tar bytes
    with atomic_write(dst, mode="wb", overwrite=True) as f:
        f.write(src.read_bytes())
    return dst

# Use in import_backup:
backup_file = atomic_copy(tar_file, tar_origin)

@mdegat01 mdegat01 changed the title from "Foix and extend cloud backup support" to "Fix and extend cloud backup support" Dec 4, 2024

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (6)
tests/misc/test_tasks.py (1)

215-241: Enhance test coverage for backup cleanup.

While the test verifies basic cleanup functionality, consider adding these test cases:

  1. Verify that backups younger than the threshold are not deleted
  2. Test edge cases where backup dates are exactly at the threshold
  3. Add assertions to verify the backup metadata state

Here's a suggested enhancement:

 @pytest.mark.usefixtures("path_extern")
 async def test_core_backup_cleanup(
     tasks: Tasks, coresys: CoreSys, tmp_supervisor_data: Path
 ):
     """Test core backup task cleans up old backup files."""
     coresys.core.state = CoreState.RUNNING
     coresys.hardware.disk.get_disk_free_space = lambda x: 5000
 
     # Put an old and new backup in folder
     copy(get_fixture_path("backup_example.tar"), coresys.config.path_core_backup)
     await coresys.backups.reload(
         location=".cloud_backup", filename="backup_example.tar"
     )
     assert (old_backup := coresys.backups.get("7fed74c8"))
+    # Create a backup just before the threshold
+    threshold_backup = await coresys.backups.do_backup_partial(
+        name="threshold", folders=["ssl"], location=".cloud_backup"
+    )
+    # Artificially age the backup to just before threshold
+    threshold_backup.date = (utcnow() - OLD_BACKUP_THRESHOLD + timedelta(minutes=1)).isoformat()
+
     new_backup = await coresys.backups.do_backup_partial(
         name="test", folders=["ssl"], location=".cloud_backup"
     )
 
     old_tar = old_backup.tarfile
+    threshold_tar = threshold_backup.tarfile
     new_tar = new_backup.tarfile
     # pylint: disable-next=protected-access
     await tasks._core_backup_cleanup()
 
     assert coresys.backups.get(new_backup.slug)
+    assert coresys.backups.get(threshold_backup.slug)  # Should not be deleted
     assert not coresys.backups.get("7fed74c8")
     assert new_tar.exists()
+    assert threshold_tar.exists()  # Should still exist
     assert not old_tar.exists()
+    # Verify backup metadata state
+    assert len(coresys.backups.list_backups) == 2
supervisor/backups/manager.py (5)

10-10: Consider using shutil.copy2 to preserve metadata

The copy function copies file data and permission mode but not metadata like timestamps and flags. Using copy2 preserves all file metadata, which might be important for backup integrity.

Apply this diff:

-from shutil import copy
+from shutil import copy2

And update the usage of copy to copy2 throughout the code.


Line range hint 187-199: Properly validate and sanitize the extra metadata

The extra parameter allows inclusion of additional metadata with backups. Ensure that this data is properly validated and sanitized to prevent security issues like code injection or corruption of backup metadata.


225-225: Update the docstring for the reload method to reflect the return type

The reload method now returns a bool, but the docstring mentions it returns a coroutine. Update the docstring to accurately reflect the method's behavior and return type.

Apply this diff:

     async def reload(
         self,
         location: Mount
         | Literal[LOCATION_CLOUD_BACKUP]
         | type[DEFAULT]
         | None = DEFAULT,
         filename: str | None = None,
     ) -> bool:
         """Load existing backups.

-        Return a coroutine.
+        Return True if successful, False otherwise.
         """

Line range hint 410-464: Check for race conditions during backup to multiple locations

In _do_backup, backups are copied to additional locations. Ensure that concurrent backups don't interfere with each other, potentially causing race conditions or corrupt backups, especially if backups share the same slug.


492-494: Document and validate new parameters in do_backup_full

The parameters extra and additional_locations are added to do_backup_full. Ensure these parameters are documented in the method's docstring and validate them to prevent invalid input.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ee9ac4a and a06a8f0.

📒 Files selected for processing (4)
  • supervisor/backups/manager.py (15 hunks)
  • supervisor/misc/tasks.py (4 hunks)
  • tests/backups/test_manager.py (5 hunks)
  • tests/misc/test_tasks.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/backups/test_manager.py
🔇 Additional comments (6)
tests/misc/test_tasks.py (1)

5-6: LGTM! Import changes are appropriate.

The added imports from standard library modules are necessary for the new test functionality.

Also applies to: 21-21

supervisor/misc/tasks.py (2)

91-95: LGTM! Task registration is well-structured.

The cleanup task is properly registered with appropriate condition (JobCondition.HEALTHY).


46-47: Consider adjusting the timing constants.

The current timing configuration might need adjustment:

  1. The cleanup task runs approximately every 24 hours (86200 seconds), which could lead to backups persisting longer than necessary.
  2. The 2-day threshold for deletion might be too aggressive, potentially removing backups that are still needed.

Let's analyze the backup patterns to determine better timing values:

Also applies to: 50-51

✅ Verification successful

Let me generate additional scripts to analyze the backup patterns and timing.


Based on the code analysis, I can now provide a final response regarding the timing constants:

The timing values are appropriate for their use case.

The cleanup task timing (24 hours) and 2-day threshold are well-justified because:

  1. The cleanup task (RUN_CORE_BACKUP_CLEANUP = 86200) targets only transient core backups stored in the .cloud_backup location, as shown in supervisor/misc/tasks.py:
async def _core_backup_cleanup(self) -> None:
    """Core backup is intended for transient use, remove any old backups that got left behind."""
  2. These backups are created automatically during core operations (like updates) and are not user-created backups, as evidenced by the creation patterns in supervisor/homeassistant/core.py and supervisor/addons/manager.py.

  3. The 2-day retention ensures temporary backups don't accumulate while giving enough time for rollback if needed.
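
For concreteness, the constants under discussion amount to roughly the following (values taken from this review; the exact declarations in supervisor/misc/tasks.py may differ):

from datetime import timedelta

RUN_CORE_BACKUP_CLEANUP = 86200  # seconds between cleanup runs (~24 hours)
OLD_BACKUP_THRESHOLD = timedelta(days=2)  # cloud backup files older than this get removed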

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check backup creation patterns to determine optimal cleanup schedule
rg -A 2 "do_backup_partial|create_backup" | grep -v "test"

Length of output: 69715


Script:

#!/bin/bash
# Search for backup cleanup related code
rg -A 5 "backup.*cleanup|cleanup.*backup" --type py

# Search for backup creation patterns
rg -A 5 "create_backup|do_backup" --type py

# Search for backup configuration and settings
rg -A 5 "backup.*config|backup.*setting" --type py

Length of output: 46278

supervisor/backups/manager.py (3)

22-22: Ensure appropriate handling of the new BackupDataDiskBadMessageError

A new exception BackupDataDiskBadMessageError is introduced. Verify that this exception is used consistently and that all scenarios where errno.EBADMSG might occur are adequately handled, providing clear feedback to the user.


168-171: Validate the setting of unhealthy state upon OSError

When an OSError with errno.EBADMSG is caught, the code sets UnhealthyReason.OSERROR_BAD_MESSAGE. Ensure that this is the appropriate response and that it doesn't conflict with other health checks or lead to unintended side effects.


348-400: Verify backup import and consolidation logic for data integrity

During import_backup, backups are consolidated if one with the same slug exists. Ensure that this process doesn't lead to data loss or inconsistencies, especially when backups have differing metadata or contents.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
supervisor/backups/manager.py (5)

168-171: Enhance error handling for bad message errors

Good addition of specific error handling for bad message errors. However, consider logging the specific error code for better debugging.

 if err.errno == errno.EBADMSG and path in {
     self.sys_config.path_backup,
     self.sys_config.path_core_backup,
 }:
+    _LOGGER.error("Bad message error (errno: %d) for path: %s", err.errno, path)
     self.sys_resolution.unhealthy = UnhealthyReason.OSERROR_BAD_MESSAGE

187-187: Verify type hints for extra metadata

The extra parameter allows for arbitrary dictionary data, but the type hint could be more specific to prevent potential misuse.

-    extra: dict | None = None,
+    extra: dict[str, Any] | None = None,

Also applies to: 199-199


233-243: Improve error handling in backup consolidation

The consolidation logic properly handles invalid backups, but consider adding debug logging for successful consolidations to aid in troubleshooting.

     try:
         self._backups[backup.slug].consolidate(backup)
+        _LOGGER.debug(
+            "Successfully consolidated backup %s from location %s",
+            backup.slug,
+            backup.location,
+        )
     except BackupInvalidError as err:

304-347: Enhance error handling in _copy_to_additional_locations

The error handling is thorough, but consider adding retries for transient network issues when copying to remote locations.

Consider implementing a retry mechanism with exponential backoff for network-related errors:

from pathlib import Path
from shutil import copy

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(OSError),
)
def copy_with_retry(src: Path, dst: Path) -> Path:
    """Copy file with retry mechanism for network issues."""
    return Path(copy(src, dst))
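
With the decorator in place, _copy_to_additional_locations could call copy_with_retry in place of the bare copy, keeping the retry policy out of the main backup logic.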

544-546: Ensure consistent parameter ordering across methods

The parameter ordering in do_backup_partial and do_backup_full is consistent, which is good. However, consider grouping related parameters together using a dataclass for better maintainability.

Consider creating a backup options dataclass:

from dataclasses import dataclass
from typing import Any, Literal

@dataclass
class BackupOptions:
    """Options for backup operations."""

    password: str | None = None
    compressed: bool = True
    location: Mount | Literal[LOCATION_CLOUD_BACKUP] | type[DEFAULT] | None = DEFAULT
    extra: dict[str, Any] | None = None
    additional_locations: list[Mount | Literal[LOCATION_CLOUD_BACKUP] | None] | None = None

Also applies to: 493-495
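
A call site would then shrink to something like the following (hypothetical; do_backup_full keeps its flat signature in this PR):

options = BackupOptions(
    compressed=True,
    extra={"client": "cli", "reason": "nightly"},
    additional_locations=[LOCATION_CLOUD_BACKUP],
)
backup = await self.do_backup_full("nightly", options)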

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a06a8f0 and 664a3d1.

📒 Files selected for processing (1)
  • supervisor/backups/manager.py (16 hunks)
🔇 Additional comments (1)
supervisor/backups/manager.py (1)

382-400: 🛠️ Refactor suggestion

Enhance backup import consolidation logic

The backup import consolidation logic correctly handles existing backups, but consider adding validation for the extra metadata when consolidating backups.

Add validation for extra metadata during consolidation:

def _validate_extra_metadata(self, extra: dict | None) -> bool:
    """Validate extra metadata structure."""
    if extra is None:
        return True
    return isinstance(extra, dict) and all(
        isinstance(k, str) for k in extra.keys()
    )


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (8)
tests/backups/conftest.py (3)

61-61: Fix typo in docstring

The docstring contains a typo: "prcoessed" should be "processed".

-    """Return a list of prcoessed backup locations."""
+    """Return a list of processed backup locations."""

58-60: Add type hint for request.param

The fixture uses request.param but its type is not hinted in the function signature. Add a type hint to improve code clarity.

-async def fixture_backup_locations(
-    request: pytest.FixtureRequest, coresys: CoreSys, mount_propagation, mock_is_mount
-) -> list[LOCATION_TYPE]:
+async def fixture_backup_locations(
+    request: pytest.FixtureRequest[list[str | None]], coresys: CoreSys, mount_propagation, mock_is_mount
+) -> list[LOCATION_TYPE]:

63-63: Use more descriptive variable name

The loaded flag could be more descriptive to better indicate its purpose.

-    loaded = False
+    mounts_loaded = False
supervisor/api/backups.py (2)

71-75: Add type validation in _ensure_list helper

The helper function should validate that the input is not already a list before wrapping it.

 def _ensure_list(item: Any) -> list:
     """Ensure value is a list."""
-    if not isinstance(item, list):
+    if item is None:
+        return []
+    elif not isinstance(item, list):
         return [item]
     return item
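
With the suggested None handling, the helper normalizes inputs like so:

assert _ensure_list(None) == []
assert _ensure_list("my_mount") == ["my_mount"]
assert _ensure_list(["a", "b"]) == ["a", "b"]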

Line range hint 299-326: Improve error handling in copy_to_additional_locations

The error handling could be improved by:

  1. Adding more context about the source of the error
  2. Preserving the original error information
  3. Using a more specific exception type for mount-related errors
             except OSError as err:
-                msg = f"Could not copy backup to {location.name if isinstance(location, Mount) else location} due to: {err!s}"
+                location_name = location.name if isinstance(location, Mount) else location
+                msg = (
+                    f"Failed to copy backup from {backup.tarfile} to {location_name}: "
+                    f"{err.__class__.__name__}: {err}"
+                )

                 if err.errno == errno.EBADMSG and location in {
                     LOCATION_CLOUD_BACKUP,
                     None,
                 }:
-                    raise BackupDataDiskBadMessageError(msg, _LOGGER.error) from err
-                raise BackupError(msg, _LOGGER.error) from err
+                    exc = BackupDataDiskBadMessageError(msg, _LOGGER.error)
+                else:
+                    exc = BackupMountError(msg, _LOGGER.error) if isinstance(location, Mount) else BackupError(msg, _LOGGER.error)
+                raise exc from err
tests/backups/test_manager.py (3)

1720-1727: Consider adding docstrings to explain test parameters.

The parametrize decorator would benefit from docstrings explaining the test scenarios and expected outcomes for each parameter combination.

 @pytest.mark.parametrize(
     ("backup_locations", "location_name", "healthy_expected"),
     [
-        (["test"], "test", True),
-        ([None], None, False),
+        (["test"], "test", True),  # Named location should maintain healthy state
+        ([None], None, False),  # Default location should mark system unhealthy
     ],
     indirect=["backup_locations"],
 )

Line range hint 1908-1977: Consider adding metadata consistency checks.

While the tests verify location tracking well, consider adding assertions to verify that backup metadata (slug, date, etc.) remains consistent across locations.

     assert backup.location is None
     assert backup.locations == [None]
     assert backup.all_locations.keys() == {".cloud_backup", None}
+    # Verify backup metadata consistency
+    assert backup.slug == "7fed74c8"
+    assert backup.name == "backup_example"  # Assuming this is the name
+    assert backup.date  # Verify date is preserved

1980-2014: Consider adding error case tests for backup removal.

The tests effectively cover successful removal scenarios, but consider adding test cases for:

  • Permission errors during removal
  • Partial failures (some locations fail to remove)
  • Race conditions (file already removed)

Example test case to add:

async def test_backup_remove_partial_failure(
    coresys: CoreSys, tmp_supervisor_data: Path
):
    """Test removing a backup with failures in some locations."""
    backup_file = get_fixture_path("backup_example.tar")
    location_1 = Path(copy(backup_file, coresys.config.path_backup))
    location_2 = Path(copy(backup_file, coresys.config.path_core_backup))

    await coresys.backups.reload()
    backup = coresys.backups.get("7fed74c8")
    
    # Simulate permission error in one location
    location_1.chmod(0o000)
    
    coresys.backups.remove(backup)
    assert location_1.exists()  # Should still exist due to permission error
    assert not location_2.exists()  # Should be removed successfully
    assert backup.all_locations == {None: location_1}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 664a3d1 and 0b31fc2.

📒 Files selected for processing (5)
  • supervisor/api/backups.py (13 hunks)
  • supervisor/backups/const.py (2 hunks)
  • supervisor/backups/manager.py (18 hunks)
  • tests/backups/conftest.py (2 hunks)
  • tests/backups/test_manager.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • supervisor/backups/const.py
🔇 Additional comments (5)
supervisor/api/backups.py (1)

431-446: ⚠️ Potential issue

Handle partial failures in upload method

The upload method should handle partial failures when copying to additional locations and ensure cleanup in case of errors.

     if additional_locations:
         self._change_stage(BackupJobStage.COPY_ADDITONAL_LOCATIONS, backup)
         try:
             await self._copy_to_additional_locations(
                 backup, additional_locations
             )
         except BackupError as err:
             self.sys_jobs.capture_error(err)
+            # Clean up the partially copied backup
+            self.remove(backup)
+            return None

Likely invalid or redundant comment.

supervisor/backups/manager.py (3)

63-65: Convert dict_values to list in list_backups property

The property is annotated to return list[Backup] but returns a dict_values object.


442-450: Improve error handling in _do_backup for additional locations

The error handling for additional locations copy should be consistent with other backup error handling.


221-231: 🛠️ Refactor suggestion

Improve error handling in backup consolidation

The consolidation error handling should provide more context about the failure and ensure proper cleanup.

                     try:
                         self._backups[backup.slug].consolidate(backup)
                     except BackupInvalidError as err:
                         _LOGGER.error(
-                            "Ignoring backup %s in %s due to: %s",
+                            "Failed to consolidate backup %s from %s with existing backup: %s",
                             backup.slug,
                             backup.location,
                             err,
                         )
+                        # Clean up the invalid backup file
+                        backup.tarfile.unlink()
                         return False

Likely invalid or redundant comment.

tests/backups/test_manager.py (1)

1729-1748: LGTM! Comprehensive error handling tests.

The error handling tests effectively cover:

  • Different error scenarios (EBUSY, EBADMSG)
  • System health state verification
  • Location-specific error handling

@mdegat01 mdegat01 merged commit 6e32144 into main Dec 5, 2024
20 checks passed
@mdegat01 mdegat01 deleted the cloud-backup-support-part-2 branch December 5, 2024 05:07
@github-actions github-actions bot locked and limited conversation to collaborators Dec 7, 2024