Microbenchmarks improvements and bug fixes #799
Mahalaxmibejugam wants to merge 7 commits into fsspec:main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
+ Coverage   75.98%   76.32%   +0.33%
==========================================
  Files          14       14
  Lines        2665     2665
==========================================
+ Hits         2025     2034       +9
+ Misses        640      631       -9
```
```yaml
    - 131072
folders:
  - 256
sample_size:
```
Previously, we created only 100 files and 100 folders in the bucket and called info on all 200 paths (files and folders). Now that I've modified the benchmark to include 65k and 130k files, calling info on all 65k paths would not yield significantly more data points than calling it on only 100 paths, and would unnecessarily increase the benchmark's runtime.
Segregated the scenarios for files and folders, so there is no need for sampling now. For file scenarios, 10k files are created and info is called on each of them. For folder scenarios, we create 65k files and 256 folders and call info on all 256 folders.
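A minimal sketch of what the segregated scenario config might look like. The field names (`files`, `folders`, `name`) are assumptions for illustration, not the actual benchmark schema:

```yaml
# Hypothetical sketch -- field names assumed, not the real config schema.
scenarios:
  - name: "info_files"
    files: 10000       # info called on each file
  - name: "info_folders"
    files: 65536       # files created to populate the folders
    folders: 256       # info called on each of the 256 folders
```

Splitting the scenarios this way removes the need for sampling, since each scenario's path count is sized to its own operation.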
```diff
 scenarios:
   - name: "delete_flat"
-    folders: [256]
+    folders: [1024, 2048, 4096]
```
Instead of updating the folder counts, I'd suggest creating a new scenario with these options. This will impact the daily runs, since creating these folders during setup will take a long time. So if you really want a daily trigger that compares a large number of folders, it's better to create separate scenarios and a trigger pointing to those scenarios.
Just increasing the number of folders won't actually increase the setup time: we make no explicit mkdir calls to create folders; they are implicitly created during file creation.
But since we are increasing the scenarios from one (256) to three (1024, 2048, 4096), more scenarios will run, so the delete benchmarks will take more time. However, I suggest keeping them, because the number of folders contributes significantly to the delete benchmark's latency, and we only observe latency differences between HNS and standard buckets at 2k and 4k folders.
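The point about implicit folder creation can be illustrated with a toy model of an object store. This is a sketch of the general idea, not the benchmark's setup code: in a flat key/value bucket, "folders" are just key prefixes, so creating files under nested paths materializes the folders without any mkdir calls.

```python
# Toy model of a flat object store (like a GCS bucket): folders are
# derived from key prefixes, so creating files implicitly creates them.
objects = {}

def create_file(path, data=b""):
    objects[path] = data  # just a key/value write, no mkdir anywhere

def list_folders(prefix=""):
    # Derive "folders" from key prefixes, the way bucket listing does.
    folders = set()
    for key in objects:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            if "/" in rest:
                folders.add(prefix + rest.split("/", 1)[0])
    return sorted(folders)

# Creating 4 files under 2 nested paths implicitly creates both folders.
for i in range(2):
    for j in range(2):
        create_file(f"bucket/folder{i}/file{j}.txt")

print(list_folders("bucket/"))  # ['bucket/folder0', 'bucket/folder1']
```

This is why raising the folder counts adds no explicit setup calls; the extra cost shows up only in the number of scenarios that run.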
This PR includes the following changes to the microbenchmarks suite:
Fix chunking in test_info_multi_threaded: Corrected the handling of paths in the multi-threaded info benchmark so that paths are properly distributed across threads instead of being passed as a single tuple.
Add more files in info benchmarks: Increased the number of files in info benchmarks to provide a more rigorous performance test.
Remove sleep from rename benchmarks: Removed a sleep call that was added to work around a Long Running Operation (LRO) issue that has since been fixed.
Add more folders in rm benchmarks: Increased the number of folders in rm benchmarks to better measure performance under scale.
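The chunking fix in the first item can be sketched as follows. This is an illustrative reconstruction, not the actual `test_info_multi_threaded` code; the helper names (`chunk`, `info_worker`, `multi_threaded_info`) are hypothetical:

```python
# Sketch of distributing paths across threads instead of handing every
# worker one big tuple (helper names are hypothetical, for illustration).
from concurrent.futures import ThreadPoolExecutor

def chunk(paths, n_threads):
    # Round-robin split so every thread gets a near-equal share of paths.
    return [paths[i::n_threads] for i in range(n_threads)]

def info_worker(fs_info, paths):
    # Each worker calls info() on its own chunk of paths.
    return [fs_info(p) for p in paths]

def multi_threaded_info(fs_info, paths, n_threads=4):
    chunks = chunk(paths, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(lambda c: info_worker(fs_info, c), chunks)
    return [r for part in results for r in part]

paths = [f"bucket/file{i}.txt" for i in range(10)]
out = multi_threaded_info(lambda p: {"name": p}, paths)
print(len(out))  # 10 -- every path was visited exactly once
```

The bug being fixed is the degenerate case where the whole path collection reaches one worker as a single tuple, so the benchmark measures one thread's serial work rather than genuinely parallel info calls.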