Enhance _info method to check file and directory info in parallel #786
yuxin00j wants to merge 15 commits into fsspec:main
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (main vs. #786):
              main     #786      +/-
Coverage    75.98%   76.44%   +0.46%
Files           14       15       +1
Lines         2665     2679      +14
Hits          2025     2048      +23
Misses         640      631       -9
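The coverage figures follow from the hit and miss counts. Codecov computes coverage as hits divided by the total of hits, misses and partials, and since no partial lines are reported here, Hits + Misses equals Lines. The short check below reproduces the numbers, assuming Codecov's default settings of two-decimal precision with rounding down. That assumption explains why 2048/2679 ≈ 76.446% is shown as 76.44% and not 76.45%. `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits, misses, partials=0):
    """Hypothetical helper: Codecov-style coverage percentage.

    Assumes Codecov's defaults of 2-decimal precision, rounded down.
    """
    total = hits + misses + partials
    pct = Decimal(hits) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(2025, 640)  # main: 2665 coverable lines
head = coverage_pct(2048, 631)  # #786: 2679 coverable lines
print(base, head, head - base)
```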
Hi @ankitaluthra1, you can check the update on the _info optimization here and in #780.
/gcbrun
@yuxin00j Can you please check the e2e failure?
Hi @ankitaluthra1, I have fixed the test failure.
QQ: Was the 30% to 60% improvement also observed for HNS buckets where we are parallelizing get_object and get_folder calls? |
Is the speedup for file paths related to the changes in this PR? I am assuming it is just variance, since this change shouldn't affect latency for file paths. Let me know if I am missing something here.
Yes. There's an improvement for all 3 bucket types when the target is a folder.
Yeah, I think you're right. It should be just variance. |
…and format with black
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…wait and simplify parallel task evaluation in _info
/gcbrun |
Optimize the performance of the _info method by enabling concurrent checks for file paths and directory listings.
Early Return Strategy: If _get_object completes first and resolves to a valid file (not a directory marker), _info cancels the pending directory-scan tasks and returns the file metadata immediately.
Fallback Logic: If _get_object fails or returns a directory marker, _info falls back to the result of the directory tree scan.
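The early-return and fallback rules can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the code in this PR. `info_parallel`, `_discard`, `fake_get_object`, `fake_list_dir` and the in-memory `FAKE_BUCKET` are hypothetical stand-ins for _info, _get_object and the directory scan. The sketch also assumes that a missing object surfaces as `FileNotFoundError` and that a directory marker is a key ending in "/". The object result decides which answer wins, so the sketch simply awaits the object lookup first while the listing keeps running concurrently in the background.

```python
import asyncio

# Hypothetical in-memory bucket: object key -> size in bytes.
# A key ending in "/" plays the role of a directory marker object.
FAKE_BUCKET = {
    "bucket/data/file.csv": 42,
    "bucket/logs/": 0,
    "bucket/logs/app.log": 7,
}
LISTINGS_FINISHED = []  # records listings that ran to completion


async def fake_get_object(path):
    """Hypothetical stand-in for _get_object: exact-key metadata lookup."""
    await asyncio.sleep(0.01)  # simulated single-object round trip
    if path not in FAKE_BUCKET:
        raise FileNotFoundError(path)
    kind = "directory" if path.endswith("/") else "file"
    return {"name": path, "type": kind, "size": FAKE_BUCKET[path]}


async def fake_list_dir(path):
    """Hypothetical stand-in for the directory scan: slower prefix listing."""
    await asyncio.sleep(0.1)  # simulated listing latency
    LISTINGS_FINISHED.append(path)
    prefix = path.rstrip("/") + "/"
    if not any(key.startswith(prefix) for key in FAKE_BUCKET):
        raise FileNotFoundError(path)
    return {"name": path, "type": "directory", "size": 0}


async def _discard(task):
    """Cancel a task and wait for it, ignoring its result or exception."""
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)


async def info_parallel(path, get_object, list_dir):
    # Start both lookups together so the listing overlaps the object fetch.
    obj_task = asyncio.create_task(get_object(path))
    dir_task = asyncio.create_task(list_dir(path))
    try:
        obj = await obj_task
    except FileNotFoundError:
        obj = None  # no object at this exact key
    except BaseException:
        await _discard(dir_task)  # don't leave the listing running
        raise
    if obj is not None and obj["type"] == "file":
        # Early return: a real file wins, so the listing is no longer needed.
        await _discard(dir_task)
        return obj
    # Fallback: missing object or directory marker, so use the listing result.
    # It raises FileNotFoundError if nothing exists under the prefix either.
    return await dir_task


async def main():
    for path in ("bucket/data/file.csv", "bucket/logs/", "bucket/logs"):
        print(await info_parallel(path, fake_get_object, fake_list_dir))


if __name__ == "__main__":
    asyncio.run(main())
```

Suppose the earlier sequential version issued the two calls one after the other. Starting both tasks up front then means a folder lookup pays roughly the latency of the slower call rather than the sum of both, which fits the folder speedups reported below. A file lookup pays only for the object fetch plus the cost of cancelling the listing.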
Benchmark results
Folder Info
Execution times consistently dropped by 30% to 60% across all single-threaded and multi-process configurations.
File Info
Results are generally neutral.
Bucket Info
This optimization does not affect info calls on buckets.