Optimize TigrisFS sync performance using find command #399
base: main
Replace Python's scandir() with a server-side find command using -printf to eliminate thousands of network stat() calls on network filesystems.

Changes:
- Rewrite scan_directory() to use find -printf, which returns path, mtime, and size in one operation
- Remove obsolete methods: _scan_directory_full, _scan_directory_modified_since, _quick_count_files
- Update scan() to stream results from the unified scan_directory()
- Add _count_files() helper using find | wc -l

Expected performance for 1.4k files on TigrisFS:
- Full sync: 52 minutes → 2-3 minutes (up to 26x faster)
- Incremental sync: No change (already fast at 200-600ms)

Fixes #398

Co-authored-by: Paul Hernandez <[email protected]>
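For readers following along, here is a minimal sketch of the approach the commit message describes. It is not the code from this PR: the `FileEntry` record, the tab-separated output format, and the function bodies are assumptions made for illustration. Only the names `scan_directory` and `_count_files` and the use of `find -printf` and `find | wc -l` come from the description above. Note that `-printf` is a GNU find feature and is not available in BSD/macOS find.

```python
import subprocess
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record type; the PR's real return type is not shown."""
    path: str
    mtime: float  # seconds since the epoch, from find's %T@
    size: int     # bytes, from find's %s


def parse_find_line(line: str) -> FileEntry:
    """Parse one 'path<TAB>mtime<TAB>size' line emitted by find -printf.

    Splitting from the right keeps tabs inside the path intact.
    Paths containing newlines are not handled by this sketch.
    """
    path, mtime, size = line.rsplit("\t", 2)
    return FileEntry(path=path, mtime=float(mtime), size=int(size))


def scan_directory(root: str) -> Iterator[FileEntry]:
    """Stream path, mtime, and size for every file under root.

    A single find process gathers the metadata, so Python issues
    no per-file stat() calls itself.
    """
    # Assumed format: %p = path, %T@ = mtime (epoch seconds), %s = size.
    cmd = ["find", root, "-type", "f", "-printf", "%p\\t%T@\\t%s\\n"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            line = line.rstrip("\n")
            if line:
                yield parse_find_line(line)
    if proc.returncode != 0:
        raise RuntimeError(f"find exited with status {proc.returncode}")


def _count_files(root: str) -> int:
    """Count files with 'find | wc -l', passing root as a shell argument."""
    result = subprocess.run(
        ["sh", "-c", 'find "$1" -type f | wc -l', "sh", root],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

Keeping the line parser separate from the subprocess call means the parsing logic can be unit-tested without GNU find installed, which matters on platforms where `-printf` is unavailable.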
Claude finished @phernandez's task

PR Review: TigrisFS Sync Performance Optimization

I've completed a thorough review of this PR against our team checklist. Overall, this is a well-executed performance optimization with excellent expected results (26x speedup). However, there are several critical issues that need to be addressed before merging.

🚨 Critical Issues

1. Platform Compatibility:
Fixes #398
Replace Python's scandir() with server-side find command using -printf to eliminate thousands of network stat() calls on network filesystems.
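Changes
- Rewrite scan_directory() to use find -printf, which returns path, mtime, and size in one operation
- Remove obsolete methods: _scan_directory_full, _scan_directory_modified_since, _quick_count_files
- Update scan() to stream results from the unified scan_directory()
- Add _count_files() helper using find | wc -l

Expected Performance
For 1.4k files on TigrisFS:
- Full sync: 52 minutes → 2-3 minutes (up to 26x faster)
- Incremental sync: No change (already fast at 200-600ms)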
Generated with Claude Code