Enhance _info method to check file and directory info in parallel#786

Open
yuxin00j wants to merge 15 commits into fsspec:main from ankitaluthra1:optimize-info

@yuxin00j
Contributor

@yuxin00j yuxin00j commented Mar 25, 2026

Optimize the _info method by running the file lookup (_get_object) and the directory listing concurrently.

  • Early Return Strategy: If _get_object completes first and resolves to a valid file (not a directory marker), the directory scan tasks are cancelled and the file metadata is returned immediately.

  • Fallback Logic: If _get_object fails or yields a directory marker, _info falls back to the result of the directory tree scan.
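
The two strategies above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `info_concurrent`, `is_directory_marker`, `fake_get_object`, `fake_list_directory` and the in-memory `STORE` are all hypothetical stand-ins, and the directory-marker test (zero-byte object whose name ends in "/") is assumed for the example. For simplicity the sketch always waits for the file lookup before deciding, rather than reacting to whichever task finishes first.

```python
import asyncio

# Hypothetical in-memory "bucket" used only for this illustration.
STORE = {
    "bucket/a.txt": {"name": "bucket/a.txt", "size": 5},
    "bucket/dir/b.txt": {"name": "bucket/dir/b.txt", "size": 7},
    "bucket/empty/": {"name": "bucket/empty/", "size": 0},
}


async def fake_get_object(path):
    # Stand-in for the file lookup; raises if the object does not exist.
    await asyncio.sleep(0.01)
    if path in STORE:
        return STORE[path]
    raise FileNotFoundError(path)


async def fake_list_directory(path):
    # Stand-in for the (slower) directory scan under a prefix.
    await asyncio.sleep(0.05)
    prefix = path.rstrip("/") + "/"
    return [k for k in STORE if k.startswith(prefix) and k != prefix]


def is_directory_marker(obj):
    # Assumption: a zero-byte object whose name ends in "/" marks a directory.
    return obj["name"].endswith("/") and obj["size"] == 0


async def info_concurrent(path, get_object, list_directory):
    # Start both checks at once so neither waits for the other to begin.
    file_task = asyncio.ensure_future(get_object(path))
    dir_task = asyncio.ensure_future(list_directory(path))
    try:
        try:
            obj = await file_task
        except FileNotFoundError:
            obj = None

        # Early return: a real file was found, so the listing is not needed.
        if obj is not None and not is_directory_marker(obj):
            return {"name": path, "type": "file", "size": obj["size"]}

        # Fallback: the lookup failed or returned a directory marker.
        entries = await dir_task
        if entries or obj is not None:
            return {"name": path, "type": "directory", "size": 0}
        raise FileNotFoundError(path)
    finally:
        # Cancel the directory scan if it is still running (early-return path).
        if not dir_task.done():
            dir_task.cancel()


if __name__ == "__main__":
    print(asyncio.run(info_concurrent("bucket/a.txt", fake_get_object, fake_list_directory)))
    print(asyncio.run(info_concurrent("bucket/dir", fake_get_object, fake_list_directory)))
```

In this sketch a folder lookup costs roughly the slower of the two calls instead of their sum, while a file lookup still costs about one file request because the listing is cancelled as soon as the file is found.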

Benchmark run result

Folder Info

Execution times consistently dropped by 30% to 60% across all single-threaded and multi-process configurations.

File Info

Results are generally neutral.

Bucket Info

This optimization does not affect info calls on buckets.

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.44%. Comparing base (e70bc65) to head (df864d8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #786      +/-   ##
==========================================
+ Coverage   75.98%   76.44%   +0.46%     
==========================================
  Files          14       15       +1     
  Lines        2665     2679      +14     
==========================================
+ Hits         2025     2048      +23     
+ Misses        640      631       -9     

@yuxin00j yuxin00j marked this pull request as ready for review March 26, 2026 02:22
@yuxin00j yuxin00j changed the title from "Enhance _info method to check file and directory info in parallel.Optimize info" to "Enhance _info method to check file and directory info in parallel" Mar 26, 2026
@yuxin00j
Contributor Author

Hi @ankitaluthra1, you can review the updated _info optimization here and in #780.

@ankitaluthra1
Collaborator

/gcbrun

@ankitaluthra1
Collaborator

@yuxin00j Can you please check the e2e failure?

@yuxin00j
Contributor Author

yuxin00j commented Apr 2, 2026

Hi @ankitaluthra1, I have fixed the test failure.

@Mahalaxmibejugam
Contributor

QQ: Was the 30% to 60% improvement also observed for HNS buckets where we are parallelizing get_object and get_folder calls?

@Mahalaxmibejugam
Contributor

> File Info: Results are mixed but generally neutral, showing minor speedups of up to 24.6% in high process count runs. One outlier showed a minor regression in deep regional tests.

Is the speedup for file paths related to the changes in this PR? I am assuming it is variance and not related to this PR as the latency for file paths shouldn't be impacted by this change, let me know if I am missing something here.

@yuxin00j
Contributor Author

yuxin00j commented Apr 6, 2026

> QQ: Was the 30% to 60% improvement also observed for HNS buckets where we are parallelizing get_object and get_folder calls?

Yes. All 3 bucket types show an improvement when the target is a folder.

@yuxin00j
Contributor Author

yuxin00j commented Apr 6, 2026

> File Info: Results are mixed but generally neutral, showing minor speedups of up to 24.6% in high process count runs. One outlier showed a minor regression in deep regional tests.
>
> Is the speedup for file paths related to the changes in this PR? I am assuming it is variance and not related to this PR as the latency for file paths shouldn't be impacted by this change, let me know if I am missing something here.

Yeah, I think you're right. It should be just variance.

yuxin00j added a commit to ankitaluthra1/gcsfs that referenced this pull request Apr 6, 2026
@ankitaluthra1
Collaborator

/gcbrun

4 participants