Modern rsync-based recovery toolkit for Western Digital MyCloud NAS devices. It uses battle-tested rsync and reconstructs the original file paths from the device's SQLite database.
🚀 Recommended approach for MyCloud recovery. Simpler, faster, and more reliable than SDK-based methods.
⚠️ Platform Support: macOS and Linux. Windows has limited support via WSL2 only (see Installation Guide).
⚠️ Disclaimer: This software is provided "as is" without warranty of any kind. The authors are not responsible for any data loss, corruption, or other issues that may occur. Always maintain backups of your original data before attempting recovery. Use at your own risk.
If this tool saved your data, consider supporting continued development:
- GitHub Sponsors: Sponsor @ericchapman80
- Buy Me a Coffee: buymeacoffee.com/ericchapman80
- Automatic timestamp preservation - No separate mtime sync needed
- Native resume capability - Interrupted recoveries continue seamlessly
- Battle-tested reliability - Decades of proven rsync stability
- Better performance - Optimized I/O patterns
- Lower memory usage - ~50 MB vs 2-10 GB (SDK approach)
- Simpler operation - Fewer manual steps
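As a rough illustration of where the timestamp preservation and resume behavior listed above come from, here is a minimal sketch of how a wrapper might assemble an rsync call. The helper name `build_rsync_command` is hypothetical, and this README does not state which flags `rsync_restore.py` actually passes. `-a` (archive mode, which preserves modification times) and `--partial` (keep partially transferred files) are standard rsync options.

```python
import shlex


def build_rsync_command(source_dir, dest_dir, dry_run=False):
    """Build an rsync argument list (hypothetical helper, not the toolkit's own code)."""
    # -a: archive mode, which preserves modification times and permissions.
    # --partial: keep partially transferred files if the run is interrupted.
    cmd = ["rsync", "-a", "--partial"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", dest_dir.rstrip("/") + "/"]
    return cmd


print(shlex.join(build_rsync_command("/source", "/dest", dry_run=True)))
# prints: rsync -a --partial --dry-run /source/ /dest/
```

Because rsync by default skips files whose size and modification time already match at the destination, rerunning the same command after an interruption only transfers what is still missing.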
If you need Python API access or prefer the REST SDK approach, see wd-mycloud-python-recovery.
macOS users (install system dependencies first):
# From repository root
brew install rsync python@3.12
Setup with Poetry (recommended):
# Standard setup (asks permission to modify shell config)
./setup.sh
# Minimal setup (no shell config modification)
./setup.sh --no-shell-config
# Reload your shell to apply UTF-8 settings (if you chose to modify)
source ~/.zshrc # or ~/.bashrc
# Activate Poetry shell
poetry shell
# Run preflight analysis
python preflight.py /path/to/source /path/to/dest
# Run recovery
python rsync_restore.py --db index.db --source-root /source --dest-root /dest
# Monitor progress (in another terminal)
./monitor.sh
Alternative: Direct commands with Poetry:
poetry run python preflight.py /path/to/source /path/to/dest
poetry run python rsync_restore.py --db index.db --source-root /source --dest-root /dest
- Multi-threaded rsync operations for optimal performance
- Progress monitoring with real-time statistics
- Automatic timestamp preservation (no manual sync needed)
- Resume capability for interrupted transfers
- Path reconstruction from SQLite database
- Orphan detection - Find files in destination not in database
- Pattern-based protection - Exclude specific paths from cleanup
- Dry-run mode - Preview changes before deleting
- Interactive wizard - Guided cleanup with prompts
- Config persistence - Save cleanup settings
- Preflight checks - System analysis and recommendations
- Thread optimization - Automatic thread count tuning
- Disk space warnings - Proactive space management
- Transfer statistics - Detailed progress reporting
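The orphan detection, pattern-based protection, and dry-run features listed above can be pictured with a small sketch. It is not the toolkit's cleanup code: the function name `find_orphans` and its parameters are hypothetical, and it assumes the set of expected relative paths has already been derived from the database. It only reports candidates and never deletes anything, which is what a dry run amounts to.

```python
import fnmatch
import os


def find_orphans(dest_root, known_paths, protected_patterns=()):
    """Return relative paths under dest_root that are absent from known_paths.

    known_paths: set of relative paths expected from the database (assumed input).
    protected_patterns: glob patterns whose matches are never reported.
    """
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(dest_root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), dest_root)
            rel = rel.replace(os.sep, "/")  # normalize separators for comparison
            if rel in known_paths:
                continue  # file is accounted for in the database
            if any(fnmatch.fnmatch(rel, pat) for pat in protected_patterns):
                continue  # file is protected from cleanup
            orphans.append(rel)
    return sorted(orphans)
```

Calling `find_orphans("/dest", expected, ["keep/*"])` would list every file under `/dest` that is neither in `expected` nor under `keep/`, leaving the decision to delete to a separate, explicit step.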
- rsync_restore.py - Main recovery script (rsync wrapper with intelligent path handling)
- preflight.py - System analysis and thread recommendations
- monitor.sh - Real-time progress monitoring
Test Coverage: 70-76% (467+ tests, 5,722 lines of test code)
# Run all tests
./run_tests.sh
# Run with coverage report
./run_tests.sh html
# Run specific test suites
poetry run pytest tests/test_symlink_farm.py -v # Symlink farm tests
poetry run pytest tests/test_preflight_integration.py -v # Integration tests
poetry run pytest tests/test_cleanup_integration.py -v # Cleanup workflows
Test Suite:
- Unit Tests (202 tests): Symlink farm, preflight, cleanup, user interaction
- Integration Tests (127 tests): End-to-end workflows, component interaction
- Additional Tests (60+ tests): Progress monitoring, database operations, error handling
| Feature | Rsync Toolkit (This) | SDK Toolkit |
|---|---|---|
| Timestamp Preservation | Automatic | Requires sync_mtime.py |
| Resume | Native rsync support | Limited |
| Memory Usage | ~50 MB | 2-10 GB |
| Performance | Optimized I/O | Good |
| Complexity | Lower | Higher |
| Development | Active | Open source |
| Test Coverage | 70-76% | 63% |
| API Access | No | Yes (REST SDK) |
Use this rsync toolkit when:
- Starting a new recovery project (recommended)
- Want simplest operation with automatic features
- Need reliable resume capability
- Prefer battle-tested tools (rsync)
- Want active development and new features
Use SDK toolkit when:
- Need Python API access to MyCloud device
- Working where rsync is unavailable
- Require programmatic control over recovery
- Need symlink deduplication feature
- Prefer REST API approach
- Installation Guide: docs/INSTALLATION.md - macOS, Linux, Windows (WSL)
- Full Usage Guide: docs/USAGE_GUIDE.md - Complete step-by-step recovery guide
- Database Schema: docs/DATABASE_SCHEMA.md - SQLite schema reference
- Legacy Python Tool: wd-mycloud-python-recovery
✅ Active Development
- Comprehensive test suite with 70-76% coverage
- All critical workflows tested and validated
- Integration tests ensure components work together
- Regular updates and new features
For long-running recoveries over SSH, use tmux or screen to prevent disconnection from killing your session:
# Start a detachable session
tmux new -s recovery
# or: screen -S recovery
# Run your recovery inside the session
poetry shell
python rsync_restore.py --db index.db --source-root /source --dest-root /dest
# Detach: Ctrl+B then D (tmux) or Ctrl+A then D (screen)
# Reattach later:
tmux attach -t recovery
# or: screen -r recovery
The monitor.sh script tracks system health during long operations:
# Run in background
nohup ./monitor.sh /path/to/monitor.log 30 > /dev/null 2>&1 &
# Watch the log
tail -f /path/to/monitor.log
What it monitors:
- Script status (running/stopped)
- NFS mount status (OK/stalled/unmounted)
- Memory usage %
- System load average
- Open file descriptors
- Disk I/O wait %
Why do I see "File not found in database" errors?
Files may be missing from the database due to corruption or interrupted operations on the MyCloud device. These unmatched files are reported but don't affect the recovery of other files.
How is the database structured?
See docs/DATABASE_SCHEMA.md for full schema documentation. Key points:
- Main table is Files
- contentID maps to on-disk filename (e.g., a22236cwsmelmd4on2qs2jdf)
- name is the original human-readable filename
- parentID builds the directory tree structure
- Files are stored in sharded directories: /files/a/a22236..., /files/b/b12345...
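The key points above can be sketched in a few lines of Python. This is a simplified illustration, not the toolkit's implementation: the table here holds only the four columns named above, the helper names `reconstruct_path` and `sharded_location` are hypothetical, and it assumes top-level entries have a NULL parentID and that the shard directory is the first character of contentID, as the example paths suggest.

```python
import sqlite3


def reconstruct_path(conn, file_id):
    """Walk parentID links upward to build the original relative path."""
    parts = []
    seen = set()
    current = file_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parentID chain at {current!r}")
        seen.add(current)
        row = conn.execute(
            "SELECT name, parentID FROM Files WHERE id = ?", (current,)
        ).fetchone()
        if row is None:
            break  # parent row missing: return the partial path
        name, parent_id = row
        parts.append(name)
        current = parent_id
    return "/".join(reversed(parts))


def sharded_location(content_id):
    """Return the on-disk path, assuming shards keyed by the first character."""
    return f"/files/{content_id[0]}/{content_id}"


# Tiny in-memory stand-in for index.db (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Files (id TEXT PRIMARY KEY, name TEXT, parentID TEXT, contentID TEXT)")
conn.executemany(
    "INSERT INTO Files VALUES (?, ?, ?, ?)",
    [
        ("d1", "Photos", None, None),
        ("d2", "2020", "d1", None),
        ("f1", "beach.jpg", "d2", "a22236cwsmelmd4on2qs2jdf"),
    ],
)

print(reconstruct_path(conn, "f1"))                  # Photos/2020/beach.jpg
print(sharded_location("a22236cwsmelmd4on2qs2jdf"))  # /files/a/a22236cwsmelmd4on2qs2jdf
```

The cycle check is a defensive choice for databases damaged on the device; a missing parent row yields a partial path rather than an exception so that one broken link does not stop a scan.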
How can I safely inspect the database?
sqlite3 index.db
-- Show tables
.tables
-- Show schema
.schema Files
-- Sample records
SELECT id, name, contentID FROM Files LIMIT 10;
-- Count files
SELECT COUNT(*) FROM Files;
- Issues: Report bugs or request features via GitHub issues
- Pull Requests: Contributions welcome!
- SDK Alternative: wd-mycloud-python-recovery
See LICENSE file.
Original mycloud-restsdk concept by springfielddatarecovery
Rsync approach, testing, and toolkit development by @ericchapman80
Legacy Python tool: wd-mycloud-python-recovery