ericchapman80/wd-mycloud-rsync-recovery

WD MyCloud Rsync Recovery

Modern rsync-based recovery toolkit for Western Digital MyCloud NAS devices. It uses battle-tested rsync and reconstructs the original file paths from the device's SQLite database.

🚀 Recommended approach for MyCloud recovery. Simpler, faster, and more reliable than SDK-based methods.

⚠️ Platform Support: macOS and Linux. Windows has limited support via WSL2 only (see Installation Guide).

⚠️ Disclaimer: This software is provided "as is" without warranty of any kind. The authors are not responsible for any data loss, corruption, or other issues that may occur. Always maintain backups of your original data before attempting recovery. Use at your own risk.


☕ Support This Project

If this tool saved your data, consider supporting continued development.


Why Rsync?

  • Automatic timestamp preservation - No separate mtime sync needed
  • Native resume capability - Interrupted recoveries continue seamlessly
  • Battle-tested reliability - Decades of proven rsync stability
  • Better performance - Optimized I/O patterns
  • Lower memory usage - ~50 MB vs 2-10 GB (SDK approach)
  • Simpler operation - Fewer manual steps

Alternative: SDK Toolkit

If you need Python API access or prefer the REST SDK approach, see wd-mycloud-python-recovery.


Quick Start

macOS users (install system dependencies first):

# Install system dependencies with Homebrew
brew install rsync python@3.12

Setup with Poetry (recommended):

# Standard setup (asks permission to modify shell config)
./setup.sh

# Minimal setup (no shell config modification)
./setup.sh --no-shell-config

# Reload your shell to apply UTF-8 settings (if you chose to modify)
source ~/.zshrc  # or ~/.bashrc

# Activate Poetry shell (on Poetry 2.x this needs the poetry-plugin-shell plugin; `poetry run` works without it)
poetry shell

# Run preflight analysis
python preflight.py /path/to/source /path/to/dest

# Run recovery
python rsync_restore.py --db index.db --source-root /source --dest-root /dest

# Monitor progress (in another terminal)
./monitor.sh

Alternative: Direct commands with Poetry:

poetry run python preflight.py /path/to/source /path/to/dest
poetry run python rsync_restore.py --db index.db --source-root /source --dest-root /dest

Features

Core Recovery

  • Multi-threaded rsync operations for optimal performance
  • Progress monitoring with real-time statistics
  • Automatic timestamp preservation (no manual sync needed)
  • Resume capability for interrupted transfers
  • Path reconstruction from SQLite database
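As a rough illustration of how the features above fit together, here is a minimal sketch of running rsync from Python on a small thread pool. It is not the code in rsync_restore.py: `build_rsync_cmd` and `copy_all` are hypothetical names, and one rsync process per file is an assumption made to keep the example short. The flags are standard rsync options: `-a` (archive mode) preserves modification times, and `--partial` keeps partially transferred files so a rerun can pick up where it stopped.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_rsync_cmd(src, dest):
    """Return an rsync command line for copying one file (hypothetical helper)."""
    # -a (archive) preserves modification times and permissions;
    # --partial keeps partially transferred files so a rerun can continue.
    return ["rsync", "-a", "--partial", src, dest]


def copy_all(pairs, threads=4):
    """Run one rsync per (src, dest) pair on a small thread pool.

    Returns the rsync exit codes in the same order as `pairs`.
    """
    def run(pair):
        return subprocess.run(build_rsync_cmd(*pair)).returncode

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(run, pairs))
```

Because rsync skips files whose size and modification time already match at the destination, rerunning the same pairs after an interruption mostly costs a metadata check per file.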

Cleanup Mode

  • Orphan detection - Find files in destination not in database
  • Pattern-based protection - Exclude specific paths from cleanup
  • Dry-run mode - Preview changes before deleting
  • Interactive wizard - Guided cleanup with prompts
  • Config persistence - Save cleanup settings
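The orphan detection, pattern protection, and dry-run ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not the toolkit's implementation: `find_orphans` and `cleanup` are hypothetical names, the set of expected paths is assumed to have been derived from the database already, and protection patterns are assumed to be shell-style globs.

```python
import fnmatch
import os


def find_orphans(dest_root, expected, protect=()):
    """Return relative paths under dest_root that are not in `expected`.

    `expected` is a set of relative paths that should exist;
    `protect` is a list of glob patterns that are never reported.
    """
    orphans = []
    for dirpath, _dirs, files in os.walk(dest_root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), dest_root)
            rel = rel.replace(os.sep, "/")
            if rel in expected:
                continue
            if any(fnmatch.fnmatch(rel, pat) for pat in protect):
                continue
            orphans.append(rel)
    return sorted(orphans)


def cleanup(dest_root, orphans, dry_run=True):
    """Delete orphans, or only print them when dry_run is True."""
    for rel in orphans:
        if dry_run:
            print(f"would delete: {rel}")
        else:
            os.remove(os.path.join(dest_root, rel))
```

Defaulting `dry_run` to True means a caller has to opt in to deletion explicitly, which matches the preview-first workflow described above.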

Monitoring & Analysis

  • Preflight checks - System analysis and recommendations
  • Thread optimization - Automatic thread count tuning
  • Disk space warnings - Proactive space management
  • Transfer statistics - Detailed progress reporting
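To give a sense of what a preflight check involves, here is a small sketch of a disk space test and a thread count heuristic. Both functions are hypothetical and the rules they apply (5% headroom, one worker per CPU capped at 8) are assumptions chosen for illustration; preflight.py may use different criteria.

```python
import os
import shutil


def has_enough_space(dest_root, bytes_needed, headroom=0.05):
    """True if dest_root has room for bytes_needed plus a safety margin."""
    usage = shutil.disk_usage(dest_root)
    return usage.free >= bytes_needed + usage.total * headroom


def suggest_threads(cpu_count=None, cap=8):
    """Pick a worker count: one per CPU, at least 1, at most `cap`."""
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, min(cpus, cap))
```

For recoveries from a network mount, the practical limit is often the source's I/O rather than CPU count, so a cap like this is a starting point to tune, not a measured optimum.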

Tools

  • rsync_restore.py - Main recovery script (rsync wrapper with intelligent path handling)
  • preflight.py - System analysis and thread recommendations
  • monitor.sh - Real-time progress monitoring

Testing

Test Coverage: 70-76% (467+ tests, 5,722 lines of test code)

# Run all tests
./run_tests.sh

# Run with coverage report
./run_tests.sh html

# Run specific test suites
poetry run pytest tests/test_symlink_farm.py -v          # Symlink farm tests
poetry run pytest tests/test_preflight_integration.py -v  # Integration tests
poetry run pytest tests/test_cleanup_integration.py -v    # Cleanup workflows

Test Suite:

  • Unit Tests (202 tests): Symlink farm, preflight, cleanup, user interaction
  • Integration Tests (127 tests): End-to-end workflows, component interaction
  • Additional Tests (60+ tests): Progress monitoring, database operations, error handling

Comparison: Rsync vs SDK Toolkit

Feature                | Rsync Toolkit (This)  | SDK Toolkit
Timestamp Preservation | Automatic             | Requires sync_mtime.py
Resume                 | Native rsync support  | Limited
Memory Usage           | ~50 MB                | 2-10 GB
Performance            | Optimized I/O         | Good
Complexity             | Lower                 | Higher
Development            | Active                | Open source
Test Coverage          | 70-76%                | 63%
API Access             | No                    | Yes (REST SDK)

When to Use Which Toolkit

Use this rsync toolkit when:

  • Starting a new recovery project (recommended)
  • Want simplest operation with automatic features
  • Need reliable resume capability
  • Prefer battle-tested tools (rsync)
  • Want active development and new features

Use SDK toolkit when:

  • Need Python API access to MyCloud device
  • Working where rsync is unavailable
  • Require programmatic control over recovery
  • Need symlink deduplication feature
  • Prefer REST API approach

Development Status

Active Development

  • Comprehensive test suite with 70-76% coverage
  • All critical workflows tested and validated
  • Integration tests ensure components work together
  • Regular updates and new features

Running Over SSH

For long-running recoveries over SSH, use tmux or screen to prevent disconnection from killing your session:

# Start a detachable session
tmux new -s recovery
# or: screen -S recovery

# Run your recovery inside the session
poetry shell
python rsync_restore.py --db index.db --source-root /source --dest-root /dest

# Detach: Ctrl+B then D (tmux) or Ctrl+A then D (screen)
# Reattach later:
tmux attach -t recovery
# or: screen -r recovery

Monitoring

The monitor.sh script tracks system health during long operations:

# Run in background
nohup ./monitor.sh /path/to/monitor.log 30 > /dev/null 2>&1 &

# Watch the log
tail -f /path/to/monitor.log

What it monitors:

  • Script status (running/stopped)
  • NFS mount status (OK/stalled/unmounted)
  • Memory usage %
  • System load average
  • Open file descriptors
  • Disk I/O wait %

FAQ

Why do I see "File not found in database" errors?

Files may be missing from the database due to corruption or interrupted operations on the MyCloud device. These unmatched files are reported but don't affect the recovery of other files.

How is the database structured?

See docs/DATABASE_SCHEMA.md for full schema documentation. Key points:

  • Main table is Files
  • contentID maps to on-disk filename (e.g., a22236cwsmelmd4on2qs2jdf)
  • name is the original human-readable filename
  • parentID builds the directory tree structure
  • Files are stored in sharded directories: /files/a/a22236..., /files/b/b12345...
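The key points above are enough to sketch how an original path can be rebuilt from the database. The example below is a simplified illustration, not the toolkit's code: `build_paths` is a hypothetical name, the column types are guessed, and it assumes that directory rows have an empty contentID. See docs/DATABASE_SCHEMA.md for the real schema.

```python
import sqlite3


def build_paths(conn):
    """Map each file's id to (original relative path, sharded on-disk path)."""
    rows = conn.execute(
        "SELECT id, name, parentID, contentID FROM Files"
    ).fetchall()
    by_id = {r[0]: r for r in rows}
    result = {}
    for file_id, name, parent_id, content_id in rows:
        if not content_id:  # assumption: directories carry no contentID
            continue
        parts = [name]
        seen = {file_id}
        # Walk parentID links up to the root, guarding against cycles.
        while parent_id in by_id and parent_id not in seen:
            seen.add(parent_id)
            _, parent_name, parent_id, _ = by_id[parent_id]
            parts.append(parent_name)
        original = "/".join(reversed(parts))
        # Sharding: the first character of contentID names the directory.
        on_disk = f"files/{content_id[0]}/{content_id}"
        result[file_id] = (original, on_disk)
    return result


# Tiny in-memory database with made-up rows for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Files (id TEXT, name TEXT, parentID TEXT, contentID TEXT)"
)
conn.executemany("INSERT INTO Files VALUES (?, ?, ?, ?)", [
    ("d1", "Photos", None, None),
    ("d2", "2019", "d1", None),
    ("f1", "beach.jpg", "d2", "a22236cwsmelmd4on2qs2jdf"),
])
print(build_paths(conn)["f1"])
# → ('Photos/2019/beach.jpg', 'files/a/a22236cwsmelmd4on2qs2jdf')
```

Loading all rows into a dictionary keeps the parent lookups in memory, which is simpler than a recursive SQL query and is usually fine for an index of this kind.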

How can I safely inspect the database?

sqlite3 index.db
-- Show tables
.tables

-- Show schema
.schema Files

-- Sample records
SELECT id, name, contentID FROM Files LIMIT 10;

-- Count files
SELECT COUNT(*) FROM Files;
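If you prefer to inspect the database from Python, one cautious option is to open it read-only so that a stray statement cannot modify it. The sketch below uses the standard `sqlite3` module and SQLite's `mode=ro` URI parameter; `open_readonly` and `count_files` are hypothetical helper names, not part of this toolkit. Working on a copy of index.db adds a further layer of safety.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open an SQLite file read-only so inspection cannot modify it."""
    # as_uri() yields a percent-encoded file:// URI; mode=ro makes
    # any write attempt raise sqlite3.OperationalError.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def count_files(db_path):
    """Return the number of rows in the Files table."""
    conn = open_readonly(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM Files").fetchone()[0]
    finally:
        conn.close()
```

Calling `count_files("index.db")` corresponds to the `SELECT COUNT(*) FROM Files;` query shown above.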

Contributing

  • Issues: Report bugs or request features via GitHub issues
  • Pull Requests: Contributions welcome!
  • SDK Alternative: wd-mycloud-python-recovery

License

See LICENSE file.

Credits

Original mycloud-restsdk concept by springfielddatarecovery

Rsync approach, testing, and toolkit development by @ericchapman80

Legacy Python tool: wd-mycloud-python-recovery
