Microbenchmarks improvements and bug fixes #799

Open
Mahalaxmibejugam wants to merge 7 commits into fsspec:main from ankitaluthra1:benchmark-changes

@Mahalaxmibejugam
Contributor

This PR includes the following changes to the microbenchmarks suite:

  • Fix chunking in test_info_multi_threaded: Corrected the handling of paths in multi-threaded info benchmarks to ensure proper distribution across threads instead of passing a single tuple.

  • Add more files in info benchmarks: Increased the number of files in info benchmarks to provide a more rigorous performance test.

  • Remove sleep from rename benchmarks: Removed a sleep call that was added to work around a Long Running Operation (LRO) issue that has since been fixed.

  • Add more folders in rm benchmarks: Increased the number of folders in rm benchmarks to better measure performance under scale.
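To illustrate the chunking fix in the first item, here is a minimal sketch of distributing paths across worker threads. The helper names (`chunk_paths`, `info_worker`, `run_info_multi_threaded`) and the `info_fn` callable are hypothetical and are not taken from the benchmark code; the sketch only shows the general idea of giving each thread its own slice of paths rather than one tuple containing all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_paths(paths, num_threads):
    """Split paths into num_threads contiguous chunks of near-equal size."""
    paths = list(paths)
    size, extra = divmod(len(paths), num_threads)
    chunks = []
    start = 0
    for i in range(num_threads):
        # The first `extra` chunks take one additional path each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(paths[start:end])
        start = end
    return chunks


def info_worker(info_fn, chunk):
    """Call info_fn on every path in this thread's chunk."""
    return [info_fn(path) for path in chunk]


def run_info_multi_threaded(info_fn, paths, num_threads):
    """Run info_fn over all paths, one chunk per thread (hypothetical helper)."""
    chunks = chunk_paths(paths, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(info_worker, info_fn, chunk) for chunk in chunks]
        # Collect results in chunk order so the output order matches the input.
        return [result for future in futures for result in future.result()]
```

In a real benchmark, `info_fn` would presumably be something like the filesystem's `info` method; here any callable that accepts a path works.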

@Mahalaxmibejugam Mahalaxmibejugam changed the title Update micro benchmarks Microbenchmarks improvements and bug fixes Apr 1, 2026
@codecov

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.32%. Comparing base (d35f8f8) to head (072c124).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
+ Coverage   75.98%   76.32%   +0.33%     
==========================================
  Files          14       14              
  Lines        2665     2665              
==========================================
+ Hits         2025     2034       +9     
+ Misses        640      631       -9     

- 131072
folders:
- 256
sample_size:
Contributor

Why are we sampling?

Contributor Author

Previously, we created only 100 files and 100 folders in the bucket and called info on all 200 paths (files and folders). Now that the benchmark creates 65k and 130k files, calling info on all 65k paths would not yield meaningfully more data points than calling it on 100 paths, and it would unnecessarily increase the benchmark's runtime.
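As a rough sketch of the sampling idea described here: pick a fixed-size, reproducible subset of the created paths and call info only on those. The function name `sample_paths` and the `seed` parameter are hypothetical, and how the benchmark actually consumed `sample_size` is an assumption.

```python
import random


def sample_paths(paths, sample_size, seed=0):
    """Return a reproducible random subset of paths (hypothetical helper).

    If sample_size covers the whole population, all paths are returned.
    """
    paths = list(paths)
    if sample_size >= len(paths):
        return paths
    # A dedicated Random instance keeps the sample stable across runs
    # without touching the global random state.
    return random.Random(seed).sample(paths, sample_size)
```

A fixed seed keeps the sampled set identical between runs, which makes latency numbers easier to compare.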

Contributor Author

Segregated the scenarios for files and folders, so sampling is no longer needed. For file scenarios, 10k files are created and info is called on each of them. For folder scenarios, we create 65k files and 256 folders and call info on all 256 folders.

scenarios:
- name: "delete_flat"
folders: [256]
folders: [1024, 2048, 4096]
Contributor

Instead of updating the folders, I'd suggest creating a new scenario with these options. This change will impact the daily runs, since creating these as part of setup will take a long time. If you really want a daily trigger that compares large numbers of folders, it would be better to create separate scenarios and a trigger that points to them.

Contributor Author

Just increasing the number of folders won't actually increase the setup time: we don't make explicit mkdir calls to create folders; they are created implicitly during file creation.

However, since we are going from one scenario (256) to three (1024, 2048, 4096), more scenarios will run and the delete benchmarks will take longer. I still suggest keeping them, because the number of folders contributes significantly to the delete benchmark's latency, and we only observe latency differences between HNS and standard buckets at 2k and 4k folders.
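To make the point about implicit folder creation concrete, here is a small sketch. The path layout (`folder_<n>/file_<n>`) and the helper names are hypothetical and do not reflect the benchmark's actual setup code. It shows that, for a fixed number of files, the number of object writes stays the same whatever the folder count, because folders exist only as parent prefixes of the file paths.

```python
import posixpath


def build_file_paths(base, num_files, num_folders):
    """Spread num_files round-robin across num_folders (hypothetical layout)."""
    return [
        f"{base}/folder_{i % num_folders}/file_{i}" for i in range(num_files)
    ]


def implied_folders(paths):
    """Folders that exist only because some file path lives under them."""
    return {posixpath.dirname(path) for path in paths}


# Same number of file writes in both cases; only the prefixes differ.
small = build_file_paths("bucket/bench", 8192, 256)
large = build_file_paths("bucket/bench", 8192, 4096)
print(len(small), len(implied_folders(small)))
print(len(large), len(implied_folders(large)))
```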

@Mahalaxmibejugam Mahalaxmibejugam requested a review from jasha26 April 3, 2026 10:52